Retriever that uses an LLM to deduce filters for the Vector Search algorithm.
This can greatly increase the power of vector search on collections with structured metadata.
Before calling the vector store's search algorithm, this retriever first prompts an LLM to find logical statements (e.g. and, in) in a semantic query. From the response, it forms a structured query, which it passes to the VectorStore as filters.
The fields to search for conditions are specified by metadata_field_info,
a simple list of attribute information for each field: name, type, and description.
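The translation step described above can be pictured with a minimal sketch. The `Comparison` and `Operation` classes and the `to_mql` helper below are hypothetical, simplified stand-ins written for illustration; they are not the library's own structured query classes or translator.

```python
from dataclasses import dataclass
from typing import Any, Union


# Hypothetical stand-ins for the nodes of a structured query.
@dataclass
class Comparison:
    comparator: str  # e.g. "lt", "gt", "eq", "in"
    attribute: str   # metadata field name, e.g. "year"
    value: Any


@dataclass
class Operation:
    operator: str    # "and" or "or"
    arguments: list  # nested Comparison / Operation nodes


Filter = Union[Comparison, Operation]


def to_mql(node: Filter) -> dict:
    """Recursively turn a structured filter into a MongoDB-style filter document."""
    if isinstance(node, Comparison):
        return {node.attribute: {f"${node.comparator}": node.value}}
    return {f"${node.operator}": [to_mql(arg) for arg in node.arguments]}


# "Movies made before 1960 that are rated higher than 8"
structured = Operation(
    "and",
    [Comparison("lt", "year", 1960), Comparison("gt", "rating", 8)],
)
mql_filter = to_mql(structured)
print(mql_filter)
```

In the real retriever the structured query is produced by the LLM and translated by the retriever's own translator; the sketch only shows the shape of the mapping from logical statements to a filter document.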
See `How to do "self-querying" retrieval <https://python.langchain.com/docs/how_to/self_query/>`_
for more information.
You must index each field that you want to filter by as the filter type in a vectorSearch index definition.
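As a rough illustration of that requirement, the sketch below builds a vectorSearch index definition in which each metadata field appears as a `filter` field next to the `vector` field. The helper name `vector_index_definition`, the `embedding` path, the dimension of 384, and the cosine similarity are all assumptions chosen for the example.

```python
def vector_index_definition(embedding_path, dimensions, filter_paths, similarity="cosine"):
    """Build a vectorSearch index definition with one vector field and N filter fields."""
    fields = [
        {
            "type": "vector",
            "path": embedding_path,
            "numDimensions": dimensions,
            "similarity": similarity,
        }
    ]
    # Every field used in a filter needs its own entry of type "filter".
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {"fields": fields}


# Assumed values: embedding stored under "embedding", 384-dimensional vectors.
definition = vector_index_definition("embedding", 384, ["genre", "year", "rating"])
print(definition)
```

When working through the vector store, `create_vector_search_index(..., filters=[...])` in the usage example plays this role, so a definition like this would only be written by hand when managing the index directly.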
Example usage:
.. code-block:: python

    # AttributeInfo import path as used in the linked how-to guide;
    # it may differ across LangChain versions.
    from langchain.chains.query_constructor.schema import AttributeInfo
    from langchain_mongodb.retrievers import MongoDBAtlasSelfQueryRetriever
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_ollama.embeddings import OllamaEmbeddings

    # DB_NAME, COLLECTION_NAME, dimensions, TIMEOUT, fictitious_movies and llm
    # are assumed to be defined elsewhere.

    # Start with the standard MongoDB Atlas vector store
    vectorstore = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string="mongodb://127.0.0.1:40947/?directConnection=true",
        namespace=f"{DB_NAME}.{COLLECTION_NAME}",
        embedding=OllamaEmbeddings(model="all-minilm:l6-v2"),
    )

    # Define metadata describing the data
    metadata_field_info = [
        AttributeInfo(
            name="genre",
            description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'animated']",
            type="string",
        ),
        AttributeInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        AttributeInfo(
            name="rating", description="A 1-10 rating for the movie", type="float"
        ),
    ]

    # Create search index with filters
    vectorstore.create_vector_search_index(
        dimensions=dimensions,
        filters=[f.name for f in metadata_field_info],
        wait_until_complete=TIMEOUT,
    )

    # Add documents, including embeddings
    vectorstore.add_documents(fictitious_movies)

    # Create the retriever from the VectorStore, an LLM and info about the documents
    retriever = MongoDBAtlasSelfQueryRetriever.from_llm(
        llm=llm,
        vectorstore=vectorstore,
        metadata_field_info=metadata_field_info,
        document_contents="Descriptions of movies",
        enable_limit=True,
    )

    # This example results in the following composite filter sent to $vectorSearch:
    # {'filter': {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}}
    print(retriever.invoke("Movies made before 1960 that are rated higher than 8"))
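To make the composite filter in the example concrete, the sketch below shows roughly where it would sit inside a `$vectorSearch` aggregation stage. The helper `vector_search_stage`, the index name `vector_index`, the `embedding` path, the three-element query vector, and the `numCandidates = 10 * k` heuristic are assumptions for illustration; the stage the retriever actually issues may differ.

```python
def vector_search_stage(query_vector, pre_filter, k=4, index="vector_index", path="embedding"):
    """Assemble a $vectorSearch stage that applies a metadata filter before ranking."""
    return {
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": k * 10,  # assumed heuristic, not a library default
            "limit": k,
            "filter": pre_filter,
        }
    }


composite_filter = {"$and": [{"year": {"$lt": 1960}}, {"rating": {"$gt": 8}}]}
# Dummy query vector; a real one comes from the embedding model.
stage = vector_search_stage([0.1, 0.2, 0.3], composite_filter, k=4)
print(stage)
```

Because the filter is applied inside the search stage, only documents whose indexed metadata satisfies both conditions are candidates for similarity ranking.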
See Also:
`Run Vector Search Queries <https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#run-vector-search-queries>`_
`How to Index Fields for Vector Search <https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type>`_
The underlying vector store (~langchain_mongodb.vectorstores.MongoDBAtlasVectorSearch) from which documents will be retrieved.
Translator for turning LangChain internal query language into Atlas search params.
The query constructor chain for generating the vector store queries.
The search type to perform on the vector store.
Keyword arguments to pass to MongoDBAtlasVectorSearch (e.g. {'k':10}).
Logs the structured query generated by the LLM.
Use the original query instead of the LLM's revised query, which removes the filter statements.