Class●Since v0.3

MongoDBAtlasSelfQueryRetriever

Retriever that uses an LLM to deduce filters for the Vector Search algorithm.

This can greatly increase the power of vector search on collections with structured metadata.

Before calling the vector store's search algorithm, this retriever first prompts an LLM to find logical conditions (e.g. and, in) in a semantic query. From the response, it forms a structured query, which it passes to the VectorStore as filters.
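As a rough illustration of the last step, the sketch below turns a small structured query tree into a filter dictionary of the kind accepted by $vectorSearch. The ``Comparison`` and ``Operation`` classes and the ``to_mql_filter`` function are hypothetical stand-ins written for this example; they are not the classes or the translator that LangChain or langchain-mongodb actually use.

```python
from dataclasses import dataclass
from typing import Any, List, Union


# Hypothetical stand-ins for the structured query an LLM response is parsed into.
# They only mimic the general shape (comparisons combined by logical operators).
@dataclass
class Comparison:
    comparator: str  # e.g. "lt", "gt", "eq", "in"
    attribute: str   # name of the metadata field
    value: Any       # value to compare against


@dataclass
class Operation:
    operator: str                                   # "and" or "or"
    arguments: List[Union["Comparison", "Operation"]]  # nested conditions


def to_mql_filter(node: Union[Comparison, Operation]) -> dict:
    """Recursively translate a structured query tree into a MongoDB-style filter."""
    if isinstance(node, Comparison):
        return {node.attribute: {f"${node.comparator}": node.value}}
    if isinstance(node, Operation):
        return {f"${node.operator}": [to_mql_filter(arg) for arg in node.arguments]}
    raise TypeError(f"Unsupported node type: {type(node).__name__}")


# "Movies made before 1960 that are rated higher than 8"
query = Operation(
    operator="and",
    arguments=[
        Comparison(comparator="lt", attribute="year", value=1960),
        Comparison(comparator="gt", attribute="rating", value=8),
    ],
)
mql_filter = to_mql_filter(query)
print(mql_filter)
```

The semantic part of the query (here, "movies") would still be embedded and searched by similarity; only the logical conditions end up in the filter.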

The fields in which to look for conditions are specified by metadata_field_info, a simple list of attribute information for each field: name, type, and description. See `How to do "self-querying" retrieval <https://python.langchain.com/docs/how_to/self_query/>`_ for more information.

You must index each field that you want to filter your data by as the filter type in a vectorSearch type index definition.
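For orientation, a vectorSearch type index definition with filter fields might look like the dictionary built below. This is a minimal sketch: the helper ``build_vector_index_definition`` is hypothetical, and the embedding path ``"embedding"``, the ``cosine`` similarity, and the dimension count of 384 are assumptions chosen for illustration, not values taken from this class.

```python
import json
from typing import List


def build_vector_index_definition(
    filter_paths: List[str],
    dimensions: int,
    embedding_path: str = "embedding",  # assumed name of the vector field
    similarity: str = "cosine",         # assumed similarity function
) -> dict:
    """Build a vectorSearch index definition with one vector field and N filter fields."""
    fields = [
        {
            "type": "vector",
            "path": embedding_path,
            "numDimensions": dimensions,
            "similarity": similarity,
        }
    ]
    # Every metadata field used in a filter needs its own "filter" entry.
    fields.extend({"type": "filter", "path": path} for path in filter_paths)
    return {"fields": fields}


# Illustrative values only: three metadata fields and 384-dimensional embeddings.
definition = build_vector_index_definition(["genre", "year", "rating"], dimensions=384)
print(json.dumps(definition, indent=2))
```

In the example usage on this page, passing ``filters`` to ``create_vector_search_index`` serves the same purpose, so a definition like this does not need to be written by hand.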

Example usage:

.. code-block:: python

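# AttributeInfo import path may differ between LangChain versions
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain_mongodb.retrievers import MongoDBAtlasSelfQueryRetriever
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_ollama.embeddings import OllamaEmbeddings

# DB_NAME, COLLECTION_NAME, dimensions, TIMEOUT, llm and fictitious_movies
# are assumed to be defined elsewhere.

# Start with the standard MongoDB Atlas vector store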
vectorstore = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string="mongodb://127.0.0.1:40947/?directConnection=true",
    namespace=f"{DB_NAME}.{COLLECTION_NAME}",
    embedding=OllamaEmbeddings(model="all-minilm:l6-v2")
)
# Define metadata describing the data
metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'animated']",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]

# Create the search index, indexing each metadata field as a filter
vectorstore.create_vector_search_index(
    dimensions=dimensions,
    filters=[f.name for f in metadata_field_info],
    wait_until_complete=TIMEOUT
)

# Add documents, including embeddings
vectorstore.add_documents(fictitious_movies)

# Create the retriever from the VectorStore, an LLM and info about the documents
retriever = MongoDBAtlasSelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    metadata_field_info=metadata_field_info,
    document_contents="Descriptions of movies",
    enable_limit=True
)

# This example results in the following composite filter sent to $vectorSearch:
# {'filter': {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}}
print(retriever.invoke("Movies made before 1960 that are rated higher than 8"))

MongoDBAtlasSelfQueryRetriever()

Bases

SelfQueryRetriever

See Also:

`Run Vector Search Queries <https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#run-vector-search-queries>`_
`How to Index Fields for Vector Search <https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type>`_
:class:`~langchain_mongodb.vectorstores.MongoDBAtlasVectorSearch`