Search Retrievers of various types.
Use MongoDBAtlasVectorSearch.as_retriever(**kwargs)
to create MongoDB's core vector search retriever.
This retriever performs full-text searches using Lucene's standard analyzer with BM25 scoring.
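To make the scoring side concrete, here is a minimal, self-contained sketch of textbook Okapi BM25 in plain Python. It is an illustration, not Atlas Search's implementation. The tokenizer is a crude lowercase-word stand-in for Lucene's standard analyzer, k1=1.2 and b=0.75 are the usual defaults, the idf uses the Lucene-style form log(1 + (N - n + 0.5) / (n + 0.5)), and the names tokenize and bm25_scores are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude stand-in (assumption) for Lucene's standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return a textbook BM25 score for each document in docs."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rarer terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "MongoDB Atlas full text search",
    "Vector search with embeddings",
    "Cooking pasta at home",
]
print(bm25_scores("full text search", docs))
```

The first document matches all three query terms and scores highest, the second matches only the common term "search", and the third matches nothing and scores 0.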
RunnableSerializable API of MongoDB GraphRAG.
Hybrid Search Retriever combines vector and full-text searches, weighting them via the Reciprocal Rank Fusion (RRF) algorithm.
Increasing the vector_penalty reduces the importance of the vector search results. Increasing the fulltext_penalty correspondingly reduces the full-text score. For more on the algorithm, see https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking
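The fusion step can be sketched in a few lines of plain Python. This assumes the common RRF form in which a document at 1-based rank r in one result list contributes 1 / (r + penalty) to its fused score. The function name rrf_fuse and the penalty value 60 (the constant conventionally used for RRF) are illustrative assumptions, and the retriever's actual aggregation pipeline may differ in detail.

```python
def rrf_fuse(vector_ids, fulltext_ids, vector_penalty=60.0, fulltext_penalty=60.0):
    """Fuse two ranked lists of document ids with Reciprocal Rank Fusion (sketch)."""
    scores = {}
    # Each list contributes 1 / (rank + penalty); a larger penalty shrinks
    # that list's contribution and flattens the gaps between its ranks.
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + vector_penalty)
    for rank, doc_id in enumerate(fulltext_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + fulltext_penalty)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["a", "b", "c"]
fulltext_hits = ["c", "a", "d"]
print(rrf_fuse(vector_hits, fulltext_hits))  # ['a', 'c', 'b', 'd']
print(rrf_fuse(vector_hits, fulltext_hits, vector_penalty=1000.0, fulltext_penalty=1.0))  # ['c', 'a', 'd', 'b']
```

With equal penalties, documents found by both searches rise to the top. Raising vector_penalty (here to 1000) makes the vector contributions tiny and nearly uniform, so the full-text ranking dominates the fused order.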
MongoDB Atlas's ParentDocumentRetriever
“Parent Document Retrieval” is a common approach to enhance the performance of retrieval methods in RAG by providing the LLM with a broader context to consider. In essence, we divide the original documents into relatively small chunks, embed each one, and store them in a vector database. Using such small chunks (a sentence or a couple of sentences) helps the embedding model better capture their meaning. The search then returns the parent documents of the best-matching chunks; if two high-scoring chunks come from the same document, the query response includes that parent document just once. One can control the number of chunks found in the vector_search_stage by setting search_kwargs={'top_k': n}. The number of query responses will therefore be at most top_k.
In this implementation, we can store both parent and child documents in a single collection while only having to compute and index embedding vectors for the chunks!
This is achieved by backing both the
vectorstore, :class:`~langchain_mongodb.vectorstores.MongoDBAtlasVectorSearch`,
and the docstore, :class:`~langchain_mongodb.docstores.MongoDBDocStore`,
by the same MongoDB Collection.
For more details, see the superclasses
:class:`~langchain.retrievers.parent_document_retriever.ParentDocumentRetriever`
and :class:`~langchain.retrievers.MultiVectorRetriever`.
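A toy, in-memory sketch of the parent-document idea: one Python list stands in for the shared collection, only the chunks carry an embedding, the search ranks chunks and keeps the top_k, and each matching parent is fetched once by _id from that same list. The field names doc_id, embedding, and text, the 2-dimensional vectors, and the helper names are hypothetical, and the library works against a real MongoDB collection rather than Python loops.

```python
import math

# Hypothetical single "collection": parent documents have no embedding,
# while each chunk stores an embedding plus a doc_id pointing at its parent.
collection = [
    {"_id": "p1", "text": "Full parent document about vector search."},
    {"_id": "p2", "text": "Full parent document about full-text search."},
    {"_id": "c1", "doc_id": "p1", "text": "vector chunk", "embedding": [1.0, 0.0]},
    {"_id": "c2", "doc_id": "p2", "text": "full-text chunk", "embedding": [0.0, 1.0]},
    {"_id": "c3", "doc_id": "p1", "text": "another vector chunk", "embedding": [0.9, 0.1]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_parents(query_vector, top_k=4):
    """Rank chunks by similarity, keep top_k, return each parent document once."""
    # Only documents with an embedding (the chunks) take part in the search.
    chunks = [d for d in collection if "embedding" in d]
    ranked = sorted(chunks, key=lambda d: cosine(query_vector, d["embedding"]), reverse=True)
    top_chunks = ranked[:top_k]
    # dict.fromkeys drops duplicate parent ids while keeping best-first order.
    parent_ids = list(dict.fromkeys(c["doc_id"] for c in top_chunks))
    # Parents are looked up by _id in the same collection.
    by_id = {d["_id"]: d for d in collection}
    return [by_id[pid] for pid in parent_ids]


print([p["_id"] for p in retrieve_parents([1.0, 0.0], top_k=2)])  # ['p1']
print([p["_id"] for p in retrieve_parents([1.0, 0.0], top_k=3)])  # ['p1', 'p2']
```

Both of the two best chunks belong to p1, so top_k=2 yields a single parent; the number of parents returned can never exceed top_k.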
Retriever that uses an LLM to deduce filters for the vector search algorithm.
This can greatly increase the power of vector search on collections with structured metadata.
Before calling the search algorithm of the vector store, this retriever first prompts an LLM to find logical conditions (e.g. and, in) in a semantic query. From the response, it forms a structured query, which it passes to the VectorStore as filters.
The fields in which to look for conditions are specified by metadata_field_info,
a simple list of attribute information (name, type, description) for each field.
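The last step, turning the LLM's structured query into a $vectorSearch filter, can be sketched as a small translation function. The (attribute, comparator, value) tuple format, the comparator names, and to_mql_filter are hypothetical stand-ins for LangChain's structured-query objects and the retriever's own translator; the output shape matches the $and filter for "movies made before 1960 that are rated higher than 8" in the example usage.

```python
# Hypothetical comparator names mapped to MongoDB query operators.
OPERATORS = {
    "eq": "$eq", "ne": "$ne",
    "gt": "$gt", "gte": "$gte",
    "lt": "$lt", "lte": "$lte",
    "in": "$in", "nin": "$nin",
}


def to_mql_filter(comparisons):
    """Translate ANDed (attribute, comparator, value) triples into an MQL filter."""
    clauses = [{attr: {OPERATORS[op]: value}} for attr, op, value in comparisons]
    # No conditions means no filter; a single condition needs no $and wrapper.
    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(to_mql_filter([("year", "lt", 1960), ("rating", "gt", 8)]))
# {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}
```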
See `How to do "self-querying" retrieval <https://python.langchain.com/docs/how_to/self_query/>`_
for more information.
You must index each field you want to filter your data by as a filter-type field in a vectorSearch-type index definition.
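For reference, an Atlas Vector Search index definition that meets this requirement lists the embedding field with type "vector" and each filterable field with type "filter". The helper below only builds that document as a Python dict. The embedding path "embedding", the "cosine" similarity, and the helper name vector_index_definition are assumptions; in the example usage, create_vector_search_index is asked to create an index of this kind from its filters argument.

```python
def vector_index_definition(dimensions, filter_paths, embedding_path="embedding", similarity="cosine"):
    """Build a vectorSearch index definition with extra filter fields (sketch)."""
    fields = [
        {
            "type": "vector",
            "path": embedding_path,  # assumed name of the embedding field
            "numDimensions": dimensions,
            "similarity": similarity,  # assumed similarity metric
        }
    ]
    # Every metadata field you want to filter on must appear as a "filter" field.
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {"fields": fields}


print(vector_index_definition(384, ["genre", "year", "rating"]))
```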
Example usage:
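.. code-block:: python

    from langchain.chains.query_constructor.schema import AttributeInfo
    from langchain_core.documents import Document
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_mongodb.retrievers import MongoDBAtlasSelfQueryRetriever
    from langchain_ollama import ChatOllama
    from langchain_ollama.embeddings import OllamaEmbeddings

    # Placeholder settings (assumptions): adjust these for your own deployment
    DB_NAME = "langchain_db"
    COLLECTION_NAME = "movies"
    dimensions = 384  # output size of the all-minilm:l6-v2 embedding model
    TIMEOUT = 120  # seconds to wait for the search index to become queryable
    llm = ChatOllama(model="llama3.2")  # assumed model; any LangChain chat model works

    # A few hypothetical documents whose metadata matches metadata_field_info
    fictitious_movies = [
        Document(
            page_content="A crew of explorers travels through a wormhole",
            metadata={"genre": "science fiction", "year": 2014, "rating": 8.6},
        ),
        Document(
            page_content="Twelve jurors argue over a verdict in one hot room",
            metadata={"genre": "drama", "year": 1957, "rating": 9.0},
        ),
    ]

    # Start with the standard MongoDB Atlas vector store
    vectorstore = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string="mongodb://127.0.0.1:40947/?directConnection=true",
        namespace=f"{DB_NAME}.{COLLECTION_NAME}",
        embedding=OllamaEmbeddings(model="all-minilm:l6-v2"),
    )

    # Define metadata describing the data
    metadata_field_info = [
        AttributeInfo(
            name="genre",
            description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'animated']",
            type="string",
        ),
        AttributeInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        AttributeInfo(
            name="rating", description="A 1-10 rating for the movie", type="float"
        ),
    ]

    # Create search index with filters
    vectorstore.create_vector_search_index(
        dimensions=dimensions,
        filters=[f.name for f in metadata_field_info],
        wait_until_complete=TIMEOUT,
    )

    # Add documents, including embeddings
    vectorstore.add_documents(fictitious_movies)

    # Create the retriever from the VectorStore, an LLM and info about the documents
    retriever = MongoDBAtlasSelfQueryRetriever.from_llm(
        llm=llm,
        vectorstore=vectorstore,
        metadata_field_info=metadata_field_info,
        document_contents="Descriptions of movies",
        enable_limit=True,
    )

    # This example results in the following composite filter sent to $vectorSearch:
    # {'filter': {'$and': [{'year': {'$lt': 1960}}, {'rating': {'$gt': 8}}]}}
    print(retriever.invoke("Movies made before 1960 that are rated higher than 8"))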