MongoDBAtlasVectorSearch(
    self,
    collection: Collection[Dict[str, Any]],
    embedding: Union[Embeddings, str],
    ...
)

| Name | Type | Description |
|---|---|---|
| collection* | Collection[Dict[str, Any]] | MongoDB collection to add the texts to. |
| embedding* | Union[Embeddings, str] | Text embedding model to use. If a string is passed, it will be used to create an AutoEmbeddings class with the given model name. |
| text_key | Union[str, List[str]] | Default: 'text' |
| index_name | str | Default: 'vector_index' |
| embedding_key | Optional[str] | Default: 'embedding' |
| relevance_score_fn | Optional[str] | Default: 'cosine' |
| auto_create_index | Optional[bool] | Default: None |
| dimensions | int | Default: -1 |
| auto_index_timeout | int | Default: 15 |

MongoDB Atlas vector store integration.
MongoDBAtlasVectorSearch performs data operations on text, embeddings and arbitrary data. In addition to CRUD operations, the VectorStore provides Vector Search based on similarity of embedding vectors following the Hierarchical Navigable Small Worlds (HNSW) algorithm.
It supports several search types for retrieving and scoring results: "similarity" (default), "mmr", and "similarity_score_threshold". These are selected through the search_type argument of as_retriever, which provides the Runnable.invoke(query) API, allowing MongoDBAtlasVectorSearch to be used within a chain.
Setup:
* Set up a MongoDB Atlas cluster. The free tier M0 will allow you to start. Search Indexes are only available on Atlas, the fully managed cloud service, not the self-managed MongoDB. Follow this guide.
* Create a Collection and a Vector Search Index. The procedure is described here. You can optionally supply a dimensions argument to programmatically create a Vector Search Index.
* Install langchain-mongodb:

.. code-block:: bash

    pip install -qU langchain-mongodb pymongo

.. code-block:: python

    import getpass

    MONGODB_ATLAS_CONNECTION_STRING = getpass.getpass("MongoDB Atlas Connection String:")

Key init args (indexing params):
    embedding: Embeddings
        Embedding function to use.

Key init args (client params):
    collection: Collection
        MongoDB collection to use.
    index_name: str
        Name of the Atlas Search index.

Instantiate:
.. code-block:: python

    from pymongo import MongoClient
    from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
    from langchain_openai import OpenAIEmbeddings

    # Build the store directly from the connection string collected above.
    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_ATLAS_CONNECTION_STRING,
        namespace="db_name.collection_name",
        embedding=OpenAIEmbeddings(),
        index_name="vector_index",
        text_key="text_field",
    )

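The namespace argument packs the database and collection names into one dotted string. A minimal sketch of how such a string can be split, assuming the "db_name.collection_name" convention shown above; split_namespace is a hypothetical helper for illustration and is not part of langchain-mongodb:

```python
from typing import Tuple


def split_namespace(namespace: str) -> Tuple[str, str]:
    """Split "db.collection" into (db, collection).

    Hypothetical helper. It splits on the first dot only, because MongoDB
    collection names may themselves contain dots while database names may not.
    """
    db_name, sep, collection_name = namespace.partition(".")
    if not sep or not db_name or not collection_name:
        raise ValueError(f"expected 'db_name.collection_name', got {namespace!r}")
    return db_name, collection_name


db_name, collection_name = split_namespace("db_name.collection_name")
print(db_name, collection_name)
```

Validating the string up front gives a clearer error than a failed lookup against the cluster later on.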
Add Documents:
.. code-block:: python

    from langchain_core.documents import Document

    document_1 = Document(page_content="foo", metadata={"baz": "bar"})
    document_2 = Document(page_content="thud", metadata={"bar": "baz"})
    document_3 = Document(page_content="i will be deleted :(")

    documents = [document_1, document_2, document_3]
    ids = ["1", "2", "3"]
    vector_store.add_documents(documents=documents, ids=ids)

Delete Documents:
.. code-block:: python

    vector_store.delete(ids=["3"])

Search:
.. code-block:: python

    results = vector_store.similarity_search(query="thud", k=1)
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")

.. code-block:: python

    * thud [{'_id': '2', 'bar': 'baz'}]

Search with filter:
.. code-block:: python

    results = vector_store.similarity_search(query="thud", k=1, post_filter=[{"bar": "baz"}])
    for doc in results:
        print(f"* {doc.page_content} [{doc.metadata}]")

.. code-block:: python

    * thud [{'_id': '2', 'bar': 'baz'}]

Search with score:
.. code-block:: python

    results = vector_store.similarity_search_with_score(query="qux", k=1)
    for doc, score in results:
        print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

.. code-block:: python

    * [SIM=0.916096] foo [{'_id': '1', 'baz': 'bar'}]

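The score returned with each document depends on the similarity function configured on the index. As a rough sketch, assuming the normalisation described in the Atlas Vector Search documentation (cosine and dotProduct mapped to (1 + s) / 2, euclidean mapped to 1 / (1 + d)); atlas_style_score is a hypothetical name, and the real scores are computed server-side:

```python
import math
from typing import Sequence


def atlas_style_score(a: Sequence[float], b: Sequence[float], similarity: str = "cosine") -> float:
    """Approximate a normalised score in [0, 1] for two vectors (illustrative only)."""
    dot = sum(x * y for x, y in zip(a, b))
    if similarity == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return (1 + dot / norm) / 2
    if similarity == "dotProduct":
        # Assumes unit-length vectors, so the dot product lies in [-1, 1].
        return (1 + dot) / 2
    if similarity == "euclidean":
        return 1 / (1 + math.dist(a, b))
    raise ValueError(f"unsupported similarity: {similarity}")


print(atlas_style_score([1.0, 0.0], [1.0, 0.0]))               # identical vectors
print(atlas_style_score([1.0, 0.0], [0.0, 1.0]))               # orthogonal vectors
print(atlas_style_score([0.0, 0.0], [3.0, 4.0], "euclidean"))  # vectors at distance 5
```

Under these assumptions a higher score always means a closer match, whichever similarity function the index uses.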
Async:
.. code-block:: python

    # add documents
    # await vector_store.aadd_documents(documents=documents, ids=ids)

    # delete documents
    # await vector_store.adelete(ids=["3"])

    # search
    # results = await vector_store.asimilarity_search(query="thud", k=1)

    # search with score
    results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
    for doc, score in results:
        print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

.. code-block:: python

    * [SIM=0.916096] foo [{'_id': '1', 'baz': 'bar'}]

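The awaitable methods must be called from inside a coroutine. A minimal sketch of driving them from a plain script with asyncio.run and running several searches concurrently; FakeStore is a hypothetical stand-in (it returns strings, not Document objects) so that the example runs without a cluster, and with a real store you would pass the vector_store built earlier:

```python
import asyncio


class FakeStore:
    """Hypothetical stand-in exposing one async method shaped like the store's."""

    async def asimilarity_search_with_score(self, query: str, k: int = 4):
        await asyncio.sleep(0)  # yield control, as a network call would
        return [(f"doc for {query!r}", 1.0)][:k]


async def search_many(store, queries):
    # Start all searches, then wait for them together; results keep query order.
    tasks = [store.asimilarity_search_with_score(query=q, k=1) for q in queries]
    return await asyncio.gather(*tasks)


results = asyncio.run(search_many(FakeStore(), ["qux", "thud"]))
for hits in results:
    print(hits)
```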
Use as Retriever:
.. code-block:: python

    retriever = vector_store.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
    )
    retriever.invoke("thud")

.. code-block:: python

    [Document(metadata={'_id': '2', 'embedding': [-0.01850726455450058, -0.0014740974875167012, -0.009762819856405258, ...], 'bar': 'baz'}, page_content='thud')]

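To make the roles of k, fetch_k, and lambda_mult concrete, here is a pure-Python sketch of maximal marginal relevance (MMR) selection. It is an illustration of the general technique under simplifying assumptions (cosine similarity, small in-memory lists), not the library's implementation; mmr_select and cosine are hypothetical names:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# `candidates` stands in for the fetch_k vectors returned by the initial search.
query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.15], [1.0, -1.0]]
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # balances relevance and diversity
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With lambda_mult=1.0 the selection reduces to plain top-k by relevance; lower values increasingly penalise candidates that resemble ones already chosen.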
text_key
    MongoDB field that will contain the text for each document. A list of fields can be given; the first one will be used as the text key. Default: 'text'

index_name
    Name of an existing Atlas Vector Search index.

embedding_key
    Field that will contain the embedding for each document. Should be None if embedding is an instance of AutoEmbeddings.

relevance_score_fn
    The similarity score used for the index. Currently supported: 'euclidean', 'cosine', and 'dotProduct'. Should be None if embedding is an instance of AutoEmbeddings.

auto_create_index
    Whether to automatically create an index if it does not exist. By default, if no search index named index_name exists, one will be created.

dimensions
    Number of dimensions in the embedding. If the value is not provided and auto_create_index is true, the value will be inferred. Should be -1 if embedding is an instance of AutoEmbeddings.

auto_index_timeout
    Timeout in seconds to wait for an auto-created index to be ready.
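The index-related parameters (embedding_key, relevance_score_fn, dimensions) correspond to fields of a vector search index definition. A hedged sketch of what such a definition might look like, assuming the "fields" layout with a "vector" entry used by Atlas Vector Search index definitions; vector_index_definition is a hypothetical helper and 1536 is only an example dimension:

```python
from typing import Any, Dict


def vector_index_definition(
    dimensions: int,
    path: str = "embedding",
    similarity: str = "cosine",
) -> Dict[str, Any]:
    """Build an index definition document (hypothetical helper, for illustration)."""
    if similarity not in {"euclidean", "cosine", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",             # marks the field as a vector field
                "path": path,                 # document field holding the embedding
                "numDimensions": dimensions,  # must match the embedding model's output size
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition(dimensions=1536)
print(definition)
```

The dimension count has to match the embedding model's output length, which is why it must either be supplied or inferred before an index can be created automatically.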