LangChain Reference
    Module · Since v0.2

    retriever

    Retriever tool.

    Functions

    function
    aformat_document

    Async format a document into a string based on a prompt template.

    First, this pulls information from the document from two sources:

    1. page_content: The value of document.page_content is assigned to a variable named page_content.
    2. metadata: Each key in document.metadata is assigned to a variable of the same name.

    Those variables are then passed into the prompt to produce a formatted string.

    function
    format_document

    Format a document into a string based on a prompt template.

    First, this pulls information from the document from two sources:

    1. page_content: The value of document.page_content is assigned to a variable named page_content.
    2. metadata: Each key in document.metadata is assigned to a variable of the same name.

    Those variables are then passed into the prompt to produce a formatted string.

    function
    create_retriever_tool

    Create a tool to do retrieval of documents.

    Classes

    class
    Document

    Class for storing a piece of text and associated metadata.

    Note

    Document is for retrieval workflows, not chat I/O. For sending text to an LLM in a conversation, use message types from langchain.messages.

    class
    BasePromptTemplate

    Base class for all prompt templates, returning a prompt.

    class
    PromptTemplate

    Prompt template for a language model.

    A prompt template consists of a string template. It accepts a set of parameters from the user that can be used to generate a prompt for a language model.

    The template can be written in f-string (default), jinja2, or mustache syntax.

    Security

    Prefer template_format='f-string' over template_format='jinja2'. NEVER accept jinja2 templates from untrusted sources, as they may lead to arbitrary Python code execution.

    As of LangChain 0.0.329, Jinja2 templates are rendered using Jinja2's SandboxedEnvironment by default. This sandboxing should be treated as a best-effort approach rather than a guarantee of security, as it is an opt-out rather than opt-in approach.

    Despite the sandboxing, we recommend never using jinja2 templates from untrusted sources.

    class
    StructuredTool

    Tool that can operate on any number of inputs.

    class
    BaseRetriever

    Abstract base class for a document retrieval system.

    A retrieval system is defined as something that can take string queries and return the most 'relevant' documents from some source.

    Usage:

    A retriever follows the standard Runnable interface and should be used via the standard Runnable methods: invoke, ainvoke, batch, and abatch.

    Implementation:

    When implementing a custom retriever, the class should implement the _get_relevant_documents method to define the logic for retrieving documents.

    Optionally, an async-native implementation can be provided by overriding the _aget_relevant_documents method.

    Retriever that returns the first 5 documents from a list of documents
    from langchain_core.documents import Document
    from langchain_core.retrievers import BaseRetriever
    
    class SimpleRetriever(BaseRetriever):
        docs: list[Document]
        k: int = 5
    
        def _get_relevant_documents(self, query: str) -> list[Document]:
            """Return the first k documents from the list of documents"""
            return self.docs[:self.k]
    
        async def _aget_relevant_documents(self, query: str) -> list[Document]:
            """(Optional) async native implementation."""
            return self.docs[:self.k]
    Simple retriever based on a scikit-learn vectorizer
    from typing import Any

    from langchain_core.documents import Document
    from langchain_core.retrievers import BaseRetriever
    from sklearn.metrics.pairwise import cosine_similarity

    class TFIDFRetriever(BaseRetriever):
        vectorizer: Any  # fitted scikit-learn vectorizer, e.g. TfidfVectorizer
        docs: list[Document]
        tfidf_array: Any  # vectorized docs, shape (n_docs, n_features)
        k: int = 4

        def _get_relevant_documents(self, query: str) -> list[Document]:
            # Input: one query string; output: matrix of shape (1, n_features)
            query_vec = self.vectorizer.transform([query])
            # Cosine similarity of the query with each doc, flattened to (n_docs,)
            results = cosine_similarity(self.tfidf_array, query_vec).reshape((-1,))
            # Indices of the k highest scores, most similar first
            return [self.docs[i] for i in results.argsort()[-self.k :][::-1]]

    class
    RetrieverInput

    Input to the retriever.

    Type Aliases

    typeAlias
    Callbacks: list[BaseCallbackHandler] | BaseCallbackManager | None