MongoDB Atlas vector store integration.
MongoDBAtlasVectorSearch performs data operations on text, embeddings and arbitrary data. In addition to CRUD operations, the VectorStore provides Vector Search based on similarity of embedding vectors following the Hierarchical Navigable Small Worlds (HNSW) algorithm.
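HNSW is an approximate nearest-neighbour method. It avoids comparing the query against every stored vector. As a reference point, the exact search it approximates can be sketched in plain Python as a brute-force scan using cosine similarity. This is not Atlas's HNSW implementation, and the function names and toy 2-D vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector and return the k most similar (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings", for illustration only.
store = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(exact_knn([1.0, 0.1], store, k=2))
```

On a large collection, this linear scan touches every vector for every query. An HNSW graph is built to avoid that cost, at the price of occasionally missing a true nearest neighbour.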
It supports several search types for selecting and scoring results: "similarity" (the default), "mmr" (maximal marginal relevance), and "similarity_score_threshold". They are selected with the search_type argument of as_retriever. This returns a retriever that exposes the Runnable.invoke(query) API, so MongoDBAtlasVectorSearch can be used within a chain.
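With the LangChain API, a mode is chosen with calls such as vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "fetch_k": 20}), or search_type="similarity_score_threshold" with search_kwargs={"score_threshold": 0.8}. The plain-Python sketch below shows what each mode does to a set of candidate vectors. The helper names are hypothetical. The MMR step is the standard greedy maximal-marginal-relevance selection, not a copy of LangChain's code. For simplicity, the threshold is applied to raw cosine scores.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_search(query: Vector, docs: List[Vector], k: int) -> List[int]:
    """'similarity': indices of the k documents most similar to the query."""
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]


def threshold_search(query: Vector, docs: List[Vector], k: int, score_threshold: float) -> List[int]:
    """'similarity_score_threshold': the top k, minus anything scoring below the threshold."""
    return [i for i in similarity_search(query, docs, k) if cosine(query, docs[i]) >= score_threshold]


def mmr_search(query: Vector, docs: List[Vector], k: int, lambda_mult: float = 0.5) -> List[int]:
    """'mmr': greedily pick documents that are relevant to the query but not redundant."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Highest similarity to anything already chosen (0 for the first pick).
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]
query = [0.8, 0.6]
print(similarity_search(query, docs, k=2))                      # [1, 0]
print(threshold_search(query, docs, k=3, score_threshold=0.7))  # [1, 0]
print(mmr_search(query, docs, k=2))                             # [1, 2]
```

Plain similarity returns the two near-duplicate vectors 1 and 0. MMR swaps the second pick for vector 2, which is less relevant but adds diversity.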
Translator between MongoDB Query API and LangChain's StructuredQuery.
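Conceptually, the translator walks a tree of comparisons and logical operations and emits the equivalent MongoDB filter document. LangChain's real translator operates on the Comparison and Operation classes from langchain_core.structured_query. The sketch below uses hypothetical stand-in dataclasses so it runs with the standard library alone. It supports only and/or plus the comparison operators listed in its mapping.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# Hypothetical stand-ins for LangChain's Comparison / Operation nodes.
@dataclass
class Comparison:
    comparator: str  # e.g. "lt", "gt", "eq", "in"
    attribute: str
    value: Any


@dataclass
class Operation:
    operator: str  # "and" or "or"
    arguments: List["Node"]


Node = Union[Comparison, Operation]

# Map the stand-in operator names onto MongoDB query operators.
COMPARATORS = {"eq": "$eq", "ne": "$ne", "lt": "$lt", "lte": "$lte",
               "gt": "$gt", "gte": "$gte", "in": "$in", "nin": "$nin"}
OPERATORS = {"and": "$and", "or": "$or"}


def to_mql(node: Node) -> Dict[str, Any]:
    """Recursively translate a comparison/operation tree into a MongoDB filter."""
    if isinstance(node, Comparison):
        return {node.attribute: {COMPARATORS[node.comparator]: node.value}}
    return {OPERATORS[node.operator]: [to_mql(arg) for arg in node.arguments]}


query_filter = Operation("and", [Comparison("lt", "year", 1960), Comparison("gt", "rating", 8)])
print(to_mql(query_filter))
# {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}
```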
With Vector Search indexes, you can index boolean, date, number, objectId, string, and UUID fields to pre-filter your data. Pre-filtering narrows the scope of a semantic search so that only the vectors of matching documents are compared with the query. Reducing the number of documents against which similarity comparisons are run can decrease query latency and increase the accuracy of search results.
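In the MongoDB Query API, the pre-filter is the filter field of the $vectorSearch aggregation stage, which sits next to index, path, queryVector, numCandidates, and limit. The minimal sketch below assembles such a stage as a Python dict. The helper name vector_search_stage is hypothetical, and its defaults ("embedding" as the vector path, 100 candidates, 4 results) are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional


def vector_search_stage(
    query_vector: List[float],
    index: str,
    path: str = "embedding",
    limit: int = 4,
    num_candidates: int = 100,
    pre_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a $vectorSearch aggregation stage, optionally with a pre-filter."""
    stage: Dict[str, Any] = {
        "index": index,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if pre_filter is not None:
        # Only fields indexed with type "filter" may be referenced here.
        stage["filter"] = pre_filter
    return {"$vectorSearch": stage}


pipeline = [
    vector_search_stage([0.1, 0.2, 0.3], "vector_index", pre_filter={"genre": {"$eq": "drama"}})
]
print(pipeline)
```

With PyMongo, a pipeline like this would be passed to collection.aggregate(pipeline). Only documents whose genre is "drama" are then considered as candidates for the similarity ranking.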
Retriever that uses an LLM to deduce filters for the Vector Search algorithm.
This can greatly increase the power of vector search on collections with structured metadata.
Before calling the search algorithm of the vector store, this retriever first prompts an LLM to find logical statements (e.g. and, in) in a semantic query. From the response, it forms a structured query, which it passes to the VectorStore as filters.
The fields in which to look for conditions are specified by metadata_field_info,
a simple list of attribute information (name, type, description) for each field.
See `How to do "self-querying" retrieval <https://python.langchain.com/docs/how_to/self_query/>`_
for more information.
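The step from LLM response to filter can be sketched as a small parser. This sketch assumes the LLM writes its filter as nested function calls such as and(lt("year", 1960), gt("rating", 8)), in the spirit of LangChain's query-constructor prompt. It also assumes that the allowed attribute names come from metadata_field_info. LangChain itself uses a grammar-based parser and a separate translator. This hypothetical FilterParser folds both into a single pass and omits lists (and therefore in), booleans, and negative numbers.

```python
import re
from typing import Any, Dict, List, Optional, Set, Tuple

# Tokens: numbers, double-quoted strings, bare names, or single punctuation characters.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<num>\d+(?:\.\d+)?)|"(?P<str>[^"]*)"|(?P<name>[A-Za-z_]\w*)|(?P<punct>\S))'
)
COMPARATORS = {"eq": "$eq", "ne": "$ne", "lt": "$lt", "lte": "$lte", "gt": "$gt", "gte": "$gte"}
OPERATORS = {"and": "$and", "or": "$or"}
Token = Tuple[str, Any]


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    for m in TOKEN_RE.finditer(text):
        if m.group("num") is not None:
            num = m.group("num")
            tokens.append(("value", float(num) if "." in num else int(num)))
        elif m.group("str") is not None:
            tokens.append(("value", m.group("str")))
        elif m.group("name") is not None:
            tokens.append(("name", m.group("name")))
        else:
            tokens.append(("punct", m.group("punct")))
    return tokens


class FilterParser:
    """Recursive-descent parser: LLM filter string -> MongoDB filter dict."""

    def __init__(self, text: str, allowed_fields: Set[str]) -> None:
        self.tokens = tokenize(text)
        self.pos = 0
        self.allowed_fields = allowed_fields

    def _peek(self) -> Optional[Token]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self) -> Token:
        token = self._peek()
        if token is None:
            raise ValueError("unexpected end of filter")
        self.pos += 1
        return token

    def _expect(self, punct: str) -> None:
        if self._next() != ("punct", punct):
            raise ValueError(f"expected {punct!r}")

    def parse(self) -> Dict[str, Any]:
        result = self._call()
        if self._peek() is not None:
            raise ValueError("unexpected trailing input")
        return result

    def _call(self) -> Dict[str, Any]:
        kind, func = self._next()
        self._expect("(")
        if kind == "name" and func in OPERATORS:
            args = [self._call()]
            while self._peek() == ("punct", ","):
                self.pos += 1
                args.append(self._call())
            self._expect(")")
            return {OPERATORS[func]: args}
        if kind == "name" and func in COMPARATORS:
            kind, attribute = self._next()
            # Reject attributes the LLM invented (not listed in metadata_field_info).
            if kind != "value" or attribute not in self.allowed_fields:
                raise ValueError(f"unknown field {attribute!r}")
            self._expect(",")
            kind, value = self._next()
            if kind != "value":
                raise ValueError(f"expected a literal value, got {value!r}")
            self._expect(")")
            return {attribute: {COMPARATORS[func]: value}}
        raise ValueError(f"unknown function {func!r}")


fields = {"genre", "year", "rating"}  # names taken from metadata_field_info
llm_filter = 'and(lt("year", 1960), gt("rating", 8))'
print(FilterParser(llm_filter, fields).parse())
# {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}
```

Checking attribute names against metadata_field_info is one simple guard against the LLM producing filters on fields that were never indexed.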
Every field you want to filter on must be indexed with type filter in an index definition of type vectorSearch.
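An index definition of this kind has one vector field plus one filter entry for each metadata field. The sketch below builds that JSON document as a Python dict. The helper name, the "embedding" path, and the "cosine" similarity are assumptions. The value 384 is the embedding size of the all-MiniLM-L6-v2 model.

```python
from typing import Any, Dict, List


def vector_index_definition(
    dimensions: int,
    filter_fields: List[str],
    path: str = "embedding",
    similarity: str = "cosine",
) -> Dict[str, Any]:
    """Build the definition body of an Atlas index of type "vectorSearch"."""
    fields: List[Dict[str, Any]] = [
        {"type": "vector", "path": path, "numDimensions": dimensions, "similarity": similarity}
    ]
    # Each field used in a $vectorSearch pre-filter needs its own "filter" entry.
    fields += [{"type": "filter", "path": name} for name in filter_fields]
    return {"fields": fields}


definition = vector_index_definition(384, ["genre", "year", "rating"])
print(definition)
```

With PyMongo 4.7 or later, a definition like this can be created with collection.create_search_index(SearchIndexModel(definition=definition, name="vector_index", type="vectorSearch")). In langchain_mongodb, create_vector_search_index(dimensions=..., filters=[...]) is meant to spare you from writing it by hand.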
Example usage:

.. code-block:: python

    from langchain.chains.query_constructor.schema import AttributeInfo
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_mongodb.retrievers import MongoDBAtlasSelfQueryRetriever
    from langchain_ollama.embeddings import OllamaEmbeddings

    # Placeholder settings (hypothetical values; adjust for your deployment)
    DB_NAME = "langchain_db"
    COLLECTION_NAME = "movies"
    TIMEOUT = 120  # seconds to wait for the search index to be ready
    dimensions = 384  # embedding size produced by all-minilm:l6-v2
    # `llm` (a LangChain chat model) and `fictitious_movies` (a list of
    # Documents with genre/year/rating metadata) are assumed to be defined.

    # Start with the standard MongoDB Atlas vector store
    vectorstore = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string="mongodb://127.0.0.1:40947/?directConnection=true",
        namespace=f"{DB_NAME}.{COLLECTION_NAME}",
        embedding=OllamaEmbeddings(model="all-minilm:l6-v2"),
    )

    # Define metadata describing the data
    metadata_field_info = [
        AttributeInfo(
            name="genre",
            description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'animated']",
            type="string",
        ),
        AttributeInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        AttributeInfo(
            name="rating", description="A 1-10 rating for the movie", type="float"
        ),
    ]

    # Create a vector search index that can also pre-filter on the metadata fields
    vectorstore.create_vector_search_index(
        dimensions=dimensions,
        filters=[f.name for f in metadata_field_info],
        wait_until_complete=TIMEOUT,
    )

    # Add documents; their embeddings are computed and stored alongside them
    vectorstore.add_documents(fictitious_movies)

    # Create the retriever from the VectorStore, an LLM and info about the documents
    retriever = MongoDBAtlasSelfQueryRetriever.from_llm(
        llm=llm,
        vectorstore=vectorstore,
        metadata_field_info=metadata_field_info,
        document_contents="Descriptions of movies",
        enable_limit=True,
    )

    # This example results in the following composite filter sent to $vectorSearch:
    # {'filter': {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}}
    print(retriever.invoke("Movies made before 1960 that are rated higher than 8"))
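To make the composite filter in the example concrete, here is a tiny in-memory evaluator for the subset of MQL it uses ($and, $or, and comparison operators). It is applied to made-up metadata dicts. It only illustrates which documents survive the pre-filter before any similarity ranking happens, whereas Atlas evaluates the filter server-side. The helper and the sample rows are hypothetical.

```python
import operator
from typing import Any, Callable, Dict

# Subset of MongoDB comparison operators, mapped to Python predicates.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$lt": operator.lt, "$lte": operator.le,
    "$gt": operator.gt, "$gte": operator.ge,
    "$in": lambda field, values: field in values,
}


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies the (simplified) MQL filter."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        else:
            # Simplification: a document missing the field never matches.
            if key not in doc:
                return False
            for op, target in condition.items():
                if not OPS[op](doc[key], target):
                    return False
    return True


composite = {"$and": [{"year": {"$lt": 1960}}, {"rating": {"$gt": 8}}]}
movies = [
    {"title": "Movie A", "year": 1954, "rating": 8.5},
    {"title": "Movie B", "year": 1954, "rating": 7.0},
    {"title": "Movie C", "year": 1995, "rating": 9.0},
]
print([m["title"] for m in movies if matches(m, composite)])
# ['Movie A']
```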