Google Spanner
Spanner is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL, providing up to 99.999% availability in one easy solution.
This notebook goes over how to use Spanner for vector search with the SpannerVectorStore class.
Learn more about the package on GitHub.
Before You Begin
To run this notebook, you will need to do the following:
- Create a Google Cloud Project
- Enable the Cloud Spanner API
- Create a Spanner instance
- Create a Spanner database
Library Installation
The integration lives in its own langchain-google-spanner package, so we need to install it.
%pip install --upgrade --quiet langchain-google-spanner
Note: you may need to restart the kernel to use updated packages.
Colab only: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top.
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython
# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)
Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.
- If you are using Colab to run this notebook, use the cell below and continue.
- If you are using Vertex AI Workbench, check out the setup instructions here.
from google.colab import auth
auth.authenticate_user()
Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.
If you don't know your project ID, try the following:
- Run gcloud config list.
- Run gcloud projects list.
- See the support page: Locate the project ID.
# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.
PROJECT_ID = "my-project-id" # @param {type:"string"}
# Set the project id
!gcloud config set project {PROJECT_ID}
API Enablement
The langchain-google-spanner package requires that you enable the Spanner API in your Google Cloud Project.
# enable Spanner API
!gcloud services enable spanner.googleapis.com
Basic Usage
Set Spanner database values
Find your database values in the Spanner Instances page.
# @title Set Your Values Here { display-mode: "form" }
INSTANCE = "my-instance" # @param {type: "string"}
DATABASE = "my-database" # @param {type: "string"}
TABLE_NAME = "vectors_search_data" # @param {type: "string"}
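Spanner databases are addressed by a fully qualified resource name of the form projects/{project}/instances/{instance}/databases/{database}. The helper below is a hypothetical convenience, not part of langchain-google-spanner, that assembles that name from values like the ones set above. It only formats strings and never contacts Spanner, which makes it handy for logging or for passing to other Google Cloud tools.

```python
def spanner_database_path(project_id: str, instance_id: str, database_id: str) -> str:
    """Build the fully qualified Spanner database resource name.

    Hypothetical helper for illustration: it formats strings only and
    does not talk to Spanner.
    """
    for label, value in (
        ("project_id", project_id),
        ("instance_id", instance_id),
        ("database_id", database_id),
    ):
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return f"projects/{project_id}/instances/{instance_id}/databases/{database_id}"


# Example with the placeholder values used in this notebook.
print(spanner_database_path("my-project-id", "my-instance", "my-database"))
```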
Initialize a table
The SpannerVectorStore class instance requires a database table with id, content, and embedding columns.
The helper method init_vector_store_table() can create a table with the proper schema for you.
from langchain_google_spanner import SecondaryIndex, SpannerVectorStore, TableColumn
SpannerVectorStore.init_vector_store_table(
instance_id=INSTANCE,
database_id=DATABASE,
table_name=TABLE_NAME,
id_column="row_id",
metadata_columns=[
TableColumn(name="metadata", type="JSON", is_null=True),
TableColumn(name="title", type="STRING(MAX)", is_null=False),
],
secondary_indexes=[
SecondaryIndex(index_name="row_id_and_title", columns=["row_id", "title"])
],
)
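The exact DDL that init_vector_store_table() issues is decided by the library. As a rough mental model only, the sketch below prints the kind of GoogleSQL statements that a table with an ID column, a content column, an embedding column, the two metadata columns, and the secondary index from the call above could correspond to. The Column dataclass, the content and embedding column names, and the column types chosen for them are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    # Hypothetical stand-in mirroring the (name, type, is_null) fields of TableColumn.
    name: str
    type: str
    is_null: bool = True


def sketch_ddl(
    table: str,
    id_column: str,
    metadata_columns: List[Column],
    index_name: str,
    index_columns: List[str],
) -> List[str]:
    """Return illustrative GoogleSQL DDL; NOT the library's actual output."""
    columns = [
        Column(id_column, "STRING(36)", is_null=False),  # assumed: UUID strings as IDs
        Column("content", "STRING(MAX)"),  # assumed content column name
        Column("embedding", "ARRAY<FLOAT64>"),  # assumed embedding column name and type
        *metadata_columns,
    ]
    column_defs = ",\n  ".join(
        f"{c.name} {c.type}" + ("" if c.is_null else " NOT NULL") for c in columns
    )
    create_table = f"CREATE TABLE {table} (\n  {column_defs}\n) PRIMARY KEY ({id_column})"
    create_index = f"CREATE INDEX {index_name} ON {table} ({', '.join(index_columns)})"
    return [create_table, create_index]


statements = sketch_ddl(
    "vectors_search_data",
    "row_id",
    [Column("metadata", "JSON", True), Column("title", "STRING(MAX)", False)],
    "row_id_and_title",
    ["row_id", "title"],
)
for stmt in statements:
    print(stmt)
```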
Create an embedding class instance
You can use any LangChain embeddings model. You may need to enable the Vertex AI API to use VertexAIEmbeddings. We recommend pinning the embedding model's version for production; learn more about the Text embeddings models.
# enable Vertex AI API
!gcloud services enable aiplatform.googleapis.com
from langchain_google_vertexai import VertexAIEmbeddings
embeddings = VertexAIEmbeddings(
model_name="textembedding-gecko@latest", project=PROJECT_ID
)
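Because any LangChain embeddings model works here, a deterministic stand-in that does not call Vertex AI can be useful during local experiments. The class below is a hypothetical toy. It exposes embed_documents() and embed_query(), the two methods of LangChain's Embeddings interface, but it hashes tokens into a fixed-size vector instead of using a learned model, so its vectors carry no real semantic meaning. In real code you would subclass langchain_core.embeddings.Embeddings; that import is left out so the sketch runs with the standard library alone.

```python
import hashlib
import math
from typing import List


class HashingToyEmbeddings:
    """Deterministic bag-of-words hashing embeddings (illustration only)."""

    def __init__(self, size: int = 16) -> None:
        self.size = size

    def _embed(self, text: str) -> List[float]:
        vector = [0.0] * self.size
        for token in text.lower().split():
            # SHA-256 gives a bucket that is stable across runs,
            # unlike Python's salted built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
            vector[int(digest, 16) % self.size] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # L2-normalize so cosine similarity reduces to a dot product;
        # an empty text stays an all-zero vector.
        return [v / norm for v in vector] if norm else vector

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


toy = HashingToyEmbeddings(size=8)
print(len(toy.embed_query("vector search with Spanner")))  # 8
```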
SpannerVectorStore
To initialize the SpannerVectorStore class, you need to provide four required arguments. The other arguments are optional and only need to be passed if they differ from the defaults.
- instance_id - The name of the Spanner instance.
- database_id - The name of the Spanner database.
- table_name - The name of the table within the database that stores the documents and their embeddings.
- embedding_service - The Embeddings implementation used to generate the embeddings.
db = SpannerVectorStore(
    instance_id=INSTANCE,
    database_id=DATABASE,
    table_name=TABLE_NAME,
    # Must match the id_column passed to init_vector_store_table() above.
    id_column="row_id",
    ignore_metadata_columns=[],
    embedding_service=embeddings,
    metadata_json_column="metadata",
)
Add Documents
To add documents to the vector store, load them and assign each one a unique ID.
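import uuid
from langchain_community.document_loaders import HNLoader
# Load the comments of a Hacker News thread as LangChain Documents.
loader = HNLoader("https://news.ycombinator.com/item?id=34817881")
documents = loader.load()
# Generate one unique ID per document, then write the documents to Spanner.
ids = [str(uuid.uuid4()) for _ in range(len(documents))]
db.add_documents(documents=documents, ids=ids)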
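Random version 4 UUIDs, like the ones generated in the cell above, suit Spanner primary keys well: Spanner's schema design guidance warns against monotonically increasing keys such as timestamps or sequence numbers, which concentrate writes on a single split. The hypothetical helper below pairs each text with such an ID. It only generates strings and does not touch Spanner.

```python
import uuid
from typing import List, Sequence, Tuple


def assign_ids(texts: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each text with a random UUIDv4 string (hypothetical helper).

    Random UUIDs spread new rows across the key space instead of
    always appending to the end of it.
    """
    return [(str(uuid.uuid4()), text) for text in texts]


pairs = assign_ids(["first comment", "second comment", "third comment"])
for row_id, text in pairs:
    print(row_id, "->", text)
```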
Search Documents
To search documents in the vector store with similarity search:
db.similarity_search(query="What is a vector store?", k=3)
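Conceptually, a similarity search embeds the query and returns the k stored rows whose embeddings are closest to it. Spanner's GoogleSQL offers distance functions such as COSINE_DISTANCE() and EUCLIDEAN_DISTANCE() for this; which one SpannerVectorStore uses depends on its query parameters. The pure-Python sketch below shows the same idea with cosine distance over an in-memory dictionary. The rows and their 3-dimensional vectors are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction. Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], rows: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (row_id, distance) pairs closest to the query."""
    scored = [(row_id, cosine_distance(query, emb)) for row_id, emb in rows.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


rows = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
print([row_id for row_id, _ in top_k([1.0, 0.05, 0.0], rows, k=2)])  # ['a', 'b']
```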
Search Documents with Max Marginal Relevance
To search documents in the vector store with maximal marginal relevance (MMR) search:
db.max_marginal_relevance_search("Testing the langchain integration with spanner", k=3)
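Maximal marginal relevance re-ranks candidates to balance relevance to the query against redundancy with the results already picked. The sketch below is a plain-Python version of the standard greedy MMR rule, score = lambda_mult * sim(query, d) - (1 - lambda_mult) * max sim(d, selected), using cosine similarity. The lambda_mult name mirrors the parameter LangChain's MMR search accepts, but the vectors are made up and the library's own implementation may differ in details, such as how many candidates it fetches before re-ranking.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedy MMR selection; returns indices into `candidates`."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        relevance = cosine_similarity(query, candidates[i])
        # Highest similarity to anything already chosen (0 before the first pick).
        redundancy = max(
            (cosine_similarity(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [
    [1.0, 0.0],  # 0: relevant
    [1.0, 0.1],  # 1: most relevant, near-duplicate of 0
    [0.5, 1.0],  # 2: a bit less relevant, but different
]
print(mmr([1.0, 0.5], candidates, k=2))  # [1, 2]
```

A pure similarity ranking would return candidates 1 and 0, two near-duplicates; MMR swaps the second pick for the more diverse candidate 2. Setting lambda_mult=1.0 removes the redundancy penalty and falls back to plain relevance ordering.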
Delete Documents by ID
To remove documents from the vector store, pass the IDs that correspond to the values in the `row_id` column configured when initializing the vector store table.
db.delete(ids=["id1", "id2"])
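Deleting by ID amounts to removing the rows whose ID column matches the given values. As an illustration only (the statement SpannerVectorStore.delete() actually runs is up to the library), the hypothetical helper below builds a parameterized GoogleSQL DELETE that matches IDs with IN UNNEST(@ids), which keeps the values out of the SQL text. It only returns the statement and its parameters.

```python
from typing import Dict, List, Tuple


def build_delete_by_ids(
    table: str, id_column: str, ids: List[str]
) -> Tuple[str, Dict[str, List[str]]]:
    """Return (sql, params) for deleting rows by ID. Hypothetical sketch."""
    if not ids:
        raise ValueError("ids must not be empty")
    # Table and column names cannot be query parameters, so they are interpolated;
    # pass only trusted identifiers. The ID values travel as the @ids parameter.
    sql = f"DELETE FROM {table} WHERE {id_column} IN UNNEST(@ids)"
    return sql, {"ids": ids}


sql, params = build_delete_by_ids("vectors_search_data", "row_id", ["id1", "id2"])
print(sql)  # DELETE FROM vectors_search_data WHERE row_id IN UNNEST(@ids)
print(params)  # {'ids': ['id1', 'id2']}
```

With the google-cloud-spanner client, a statement like this would typically run inside a read-write transaction via transaction.execute_update(sql, params=params, param_types={"ids": param_types.Array(param_types.STRING)}).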
Delete Documents by Content and Metadata
To remove documents from the vector store, you can also pass the documents themselves. The content column and the metadata columns provided during VectorStore initialization are used to find the rows that correspond to the documents, and any matching rows are then deleted.
db.delete(documents=[documents[0], documents[1]])
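When deleting by document, rows are matched on the content column and the metadata columns rather than on IDs. The in-memory sketch below mimics that matching rule over a list of dict rows: a row matches when its content and every compared metadata field equal a document's. The row layout, the content and row_id keys, the plain-dict documents (shaped like LangChain's page_content and metadata attributes), and the choice of which metadata keys to compare are all assumptions for illustration, not the library's internals.

```python
from typing import Any, Dict, Iterable, List


def rows_matching_documents(
    rows: List[Dict[str, Any]],
    documents: Iterable[Dict[str, Any]],
    metadata_keys: List[str],
) -> List[str]:
    """Return the IDs of rows whose content and metadata match any document."""
    # A list (not a set) so that unhashable metadata values such as JSON dicts still work.
    wanted = [
        (doc["page_content"], tuple(doc["metadata"].get(key) for key in metadata_keys))
        for doc in documents
    ]
    matches = []
    for row in rows:
        key = (row["content"], tuple(row.get(k) for k in metadata_keys))
        if key in wanted:
            matches.append(row["row_id"])
    return matches


rows = [
    {"row_id": "r1", "content": "Spanner is great", "title": "HN"},
    {"row_id": "r2", "content": "Spanner is great", "title": "Other"},
    {"row_id": "r3", "content": "Vectors!", "title": "HN"},
]
docs = [{"page_content": "Spanner is great", "metadata": {"title": "HN"}}]
print(rows_matching_documents(rows, docs, ["title"]))  # ['r1']
```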