LangChain Reference
    Module · Since v1.0

    parent_document_retriever

    Classes

    class MultiVectorRetriever

    Retriever that supports multiple embeddings per parent document.

    This retriever is designed for scenarios where documents are split into smaller chunks for embedding and vector search, but retrieval returns the original parent documents rather than individual chunks.

    It works by:

    • Performing similarity (or MMR) search over embedded child chunks
    • Collecting unique parent document IDs from chunk metadata
    • Fetching and returning the corresponding parent documents from the docstore
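The three steps above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the class's implementation: Doc, embed, cosine, and multi_vector_retrieve are hypothetical names, a bag-of-words cosine score stands in for a real embedding model and vector store, and a plain dict stands in for the docstore. The doc_id metadata key is chosen to match the class's default id_key.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def multi_vector_retrieve(query, child_chunks, docstore, id_key="doc_id", k=4):
    # Step 1: similarity search over the embedded child chunks (top k).
    q = embed(query)
    ranked = sorted(child_chunks, key=lambda c: cosine(q, embed(c.page_content)), reverse=True)[:k]

    # Step 2: collect unique parent IDs from chunk metadata, in rank order.
    parent_ids = []
    for chunk in ranked:
        pid = chunk.metadata.get(id_key)
        if pid is not None and pid not in parent_ids:
            parent_ids.append(pid)

    # Step 3: fetch the parents from the docstore, skipping any missing IDs.
    return [docstore[pid] for pid in parent_ids if pid in docstore]


# Usage: two parents, each indexed through two small child chunks.
docstore = {
    "p1": Doc("Cats are small domesticated carnivores that purr."),
    "p2": Doc("Rust is a systems programming language focused on safety."),
}
child_chunks = [
    Doc("cats purr", {"doc_id": "p1"}),
    Doc("small domesticated carnivores", {"doc_id": "p1"}),
    Doc("rust systems programming language", {"doc_id": "p2"}),
    Doc("memory safety without garbage collection", {"doc_id": "p2"}),
]
# The top 3 chunks belong to p1, p2, p1, so two unique parents come back.
results = multi_vector_retrieve("cats purr safety", child_chunks, docstore, k=3)
```

Deduplicating while preserving rank order means a parent that owns several matching chunks is returned once, at the position of its best-ranked chunk.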

    This pattern is commonly used in retrieval-augmented generation (RAG) pipelines to improve answer grounding while preserving full document context.

    class ParentDocumentRetriever

    Retrieve small chunks, then return their parent documents.

    When splitting documents for retrieval, there are often conflicting desires:

    1. You may want small documents, so that their embeddings reflect their meaning as accurately as possible. If a document is too long, its embedding can lose meaning.
    2. You want documents long enough that the context of each chunk is retained.

    The ParentDocumentRetriever strikes that balance by splitting documents into small chunks and storing those chunks. During retrieval, it first fetches the small chunks, then looks up the parent IDs for those chunks and returns the larger parent documents.
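The split, store, and look-up flow described above can be sketched as a small class. ToyParentDocumentRetriever, split_words, and retrieve are hypothetical, dependency-free stand-ins rather than the library's API: fixed-size word windows replace a real text splitter, word-overlap counting replaces vector similarity search, and a dict keyed by generated UUIDs replaces the docstore.

```python
import uuid


def split_words(text: str, chunk_size: int) -> list[str]:
    """Hypothetical child splitter: consecutive windows of `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


class ToyParentDocumentRetriever:
    """Hypothetical, dependency-free sketch of the parent-document pattern."""

    def __init__(self, child_chunk_size: int = 4, id_key: str = "doc_id"):
        self.child_chunk_size = child_chunk_size
        self.id_key = id_key
        self.docstore: dict[str, str] = {}             # parent ID -> full parent text
        self.child_index: list[tuple[str, dict]] = []  # (child text, metadata) pairs

    def add_documents(self, parents: list[str]) -> list[str]:
        ids = []
        for parent in parents:
            parent_id = str(uuid.uuid4())
            # Split the parent into small chunks, tagging each with the parent's ID.
            for chunk in split_words(parent, self.child_chunk_size):
                self.child_index.append((chunk, {self.id_key: parent_id}))
            # Keep the full parent so it can be returned at query time.
            self.docstore[parent_id] = parent
            ids.append(parent_id)
        return ids

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank small chunks by word overlap (a stand-in for vector similarity).
        q = set(query.lower().split())
        ranked = sorted(
            self.child_index,
            key=lambda item: len(q & set(item[0].lower().split())),
            reverse=True,
        )
        # Look up the parent ID of each top chunk, dropping duplicates.
        parent_ids: list[str] = []
        for _, meta in ranked[:k]:
            if meta[self.id_key] not in parent_ids:
                parent_ids.append(meta[self.id_key])
        # Return the larger parent documents, not the chunks themselves.
        return [self.docstore[pid] for pid in parent_ids]


retriever = ToyParentDocumentRetriever(child_chunk_size=4)
retriever.add_documents([
    "Cats are small carnivores. Most cats purr when content.",
    "Rust is a systems language that guarantees memory safety.",
])
hits = retriever.retrieve("why do cats purr", k=1)
```

Only the best-matching small chunk ("Most cats purr when") is retrieved, yet the caller receives the full first document: the small chunk drives the match, and the parent supplies the context.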

    Note that "parent document" refers to the document that a small chunk originated from. This can be either the whole raw document or a larger chunk.
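To illustrate the two meanings of "parent document", the hypothetical helper below builds the parent/child index either from whole raw documents or from larger chunks produced by a second, coarser split. In LangChain this choice is made through the retriever's child_splitter and optional parent_splitter; here split_words and build_parent_child_index are dependency-free stand-ins that count words instead of using a real text splitter.

```python
import uuid


def split_words(text: str, chunk_size: int) -> list[str]:
    """Hypothetical splitter: consecutive windows of `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_parent_child_index(raw_docs, child_size, parent_size=None):
    """Return (docstore, children): docstore maps parent ID -> parent text,
    children is a list of (child text, parent ID) pairs."""
    docstore: dict[str, str] = {}
    children: list[tuple[str, str]] = []
    for raw in raw_docs:
        # No parent split: the whole raw document is the parent.
        # With a parent split: each larger chunk of the raw document is a parent.
        parents = [raw] if parent_size is None else split_words(raw, parent_size)
        for parent in parents:
            parent_id = str(uuid.uuid4())
            docstore[parent_id] = parent
            children.extend((child, parent_id) for child in split_words(parent, child_size))
    return docstore, children


raw_docs = ["one two three four five six seven eight"]
whole_store, whole_children = build_parent_child_index(raw_docs, child_size=2)
chunk_store, chunk_children = build_parent_child_index(raw_docs, child_size=2, parent_size=4)
```

With the same eight-word input, the first call stores one parent and the second stores two. The searchable child chunks are identical in both cases; only the parent they point back to changes.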
