MongoDB Atlas's ParentDocumentRetriever
“Parent Document Retrieval” is a common approach to enhance the performance of retrieval methods in RAG by providing the LLM with a broader context to consider. In essence, we divide the original documents into relatively small chunks, embed each one, and store them in a vector database. Using such small chunks (a sentence or a couple of sentences) helps the embedding models better reflect their meaning. At query time, the chunks most similar to the query are found, and the parent document of each chunk is returned in its place. If two high-scoring chunks are contained in the same document, the query response will include that parent document just once. One can control the number of chunks found in the vector_search_stage by setting search_kwargs = {'top_k': n}. Because a parent shared by several chunks is returned only once, the number of query responses will be <= top_k.
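The flow above, including why fewer than top_k documents can come back, can be illustrated with a minimal, stdlib-only sketch. It is not the langchain_mongodb implementation: `embed`, `cosine`, `split_sentences`, and `retrieve_parents` are hypothetical helpers, and a bag-of-words cosine similarity stands in for a real embedding model and vector index.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in "embedding" (assumption): a bag of lowercase words.
    # A real setup would call an embedding model here.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_sentences(text: str) -> list[str]:
    # Child chunks: one sentence each.
    return [s.strip() for s in text.split(".") if s.strip()]


def retrieve_parents(query: str, parents: dict[str, str], top_k: int) -> list[str]:
    # 1. Split every parent into small chunks, remembering each chunk's parent id.
    chunks = [(pid, c) for pid, text in parents.items() for c in split_sentences(text)]
    # 2. Score every chunk against the query and keep the top_k best chunks.
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), pid) for pid, c in chunks), reverse=True)[:top_k]
    # 3. Map chunks back to parents. `scored` is sorted best-first, so the first
    #    hit for a parent is its best score; later hits for it are dropped.
    best: dict[str, float] = {}
    for score, pid in scored:
        best.setdefault(pid, score)
    return [parents[pid] for pid in best]


parents = {
    "mongo": "MongoDB stores documents in collections. Atlas adds vector search to MongoDB.",
    "python": "Python is a programming language. Python has a large standard library.",
}
result = retrieve_parents("vector search in MongoDB Atlas", parents, top_k=2)
print(len(result))  # 1: both top chunks belong to the "mongo" parent
```

Both of the two best chunks come from the same parent, so a request with top_k=2 yields a single document, which is the `<= top_k` behaviour described above.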
In this implementation, we can store both parent and child documents in a single collection while only having to compute and index embedding vectors for the chunks!
This is achieved by backing both the
vectorstore, :class:`~langchain_mongodb.vectorstores.MongoDBAtlasVectorSearch`,
and the docstore, :class:`~langchain_mongodb.docstores.MongoDBDocStore`,
by the same MongoDB Collection.
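To show how one collection can serve both roles, the sketch below writes out, as plain Python data, an aggregation pipeline of the kind that could perform this lookup. It is a hand-written approximation, not the pipeline the library actually builds. `build_parent_pipeline` is hypothetical, and it assumes that each chunk stores its parent's `_id` in a top-level field named by `id_key` and that parent documents carry no embedding field, so `$vectorSearch` only ever matches chunks.

```python
def build_parent_pipeline(
    query_vector: list[float],
    *,
    index_name: str,
    embedding_key: str,
    collection_name: str,
    id_key: str = "doc_id",
    top_k: int = 4,
    oversampling_factor: int = 10,
) -> list[dict]:
    """Hypothetical helper: chunk search, then a self-join to the parents."""
    return [
        # 1. Vector search over the chunks (parents have no embedding, by assumption).
        {
            "$vectorSearch": {
                "index": index_name,
                "path": embedding_key,
                "queryVector": query_vector,
                "numCandidates": top_k * oversampling_factor,
                "limit": top_k,
            }
        },
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
        # 2. Self-join on the SAME collection: chunk[id_key] -> parent _id.
        {
            "$lookup": {
                "from": collection_name,
                "localField": id_key,
                "foreignField": "_id",
                "as": "parent",
            }
        },
        {"$unwind": "$parent"},
        # 3. Collapse chunks sharing a parent, keeping the best chunk score.
        {"$group": {"_id": "$parent._id", "score": {"$max": "$score"}, "parent": {"$first": "$parent"}}},
        {"$sort": {"score": -1}},
    ]


pipeline = build_parent_pipeline(
    [0.1, 0.2, 0.3],
    index_name="vector_index",
    embedding_key="embedding",
    collection_name="docs",
    top_k=5,
)
```

With PyMongo, a list like this would be run via `collection.aggregate(pipeline)`; building it needs no server. The `$group` stage is what makes the number of returned parents at most `top_k`.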
For more details, see the superclasses
:class:`~langchain.retrievers.parent_document_retriever.ParentDocumentRetriever`
and :class:`~langchain.retrievers.MultiVectorRetriever`.
vectorstore: Vectorstore API to add, embed, and search through child documents.
docstore: Provides an API around the Collection to add the parent documents.
id_key: Key stored in metadata pointing to the parent document.
search_kwargs: Kwargs to be passed to vector_search_stage, e.g. {'top_k': 5}.
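To make the role of the parent pointer concrete, here is a hypothetical snapshot of the shared collection with one parent and two child chunks, plus two small helpers for telling them apart. The field names (`text`, `embedding`, `doc_id`) and the placement of the pointer at the top level of each chunk, rather than in a nested metadata object, are assumptions for illustration; `is_child` and `parent_of` are hypothetical helpers.

```python
ID_KEY = "doc_id"  # assumed value of the retriever's id_key

# Hypothetical snapshot of the single collection: one parent, two chunks.
# Only the chunks carry an embedding and a pointer back to their parent.
collection = [
    {"_id": "p1", "text": "MongoDB stores documents. Atlas adds vector search."},
    {"_id": "c1", "text": "MongoDB stores documents.", "embedding": [0.9, 0.1], ID_KEY: "p1"},
    {"_id": "c2", "text": "Atlas adds vector search.", "embedding": [0.2, 0.8], ID_KEY: "p1"},
]


def is_child(doc: dict, id_key: str = ID_KEY) -> bool:
    """A chunk is any document that stores a pointer to a parent."""
    return id_key in doc


def parent_of(child: dict, docs: list[dict], id_key: str = ID_KEY) -> dict:
    """Follow the pointer stored under id_key back to the parent's _id."""
    return next(d for d in docs if d["_id"] == child[id_key] and not is_child(d, id_key))


children = [d for d in collection if is_child(d)]
parent_ids = {parent_of(c, collection)["_id"] for c in children}
print(len(children), parent_ids)  # 2 {'p1'}
```

Because only the chunks have an `embedding`, only they need to be covered by the vector index, while the parents sit alongside them in the same collection.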
Construct the retriever using a single Collection shared by the VectorStore and the DocStore.
See parent classes
:class:`~langchain.retrievers.parent_document_retriever.ParentDocumentRetriever`
and :class:`~langchain.retrievers.MultiVectorRetriever` for further details.
Close the resources used by the MongoDBAtlasParentDocumentRetriever.