    Module • Since v0.1

    base

    Base classes for media and documents.

    This module contains core abstractions for data retrieval and processing workflows:

    • BaseMedia: Base class providing id and metadata fields
    • Blob: Raw data loading (files, binary data) - used by document loaders
    • Document: Text content for retrieval (RAG, vector stores, semantic search)
    Not for LLM chat messages

    These classes are for data processing pipelines, not LLM I/O. For multimodal content in chat messages (images, audio in conversations), see langchain.messages content blocks instead.

    Classes

    class
    Serializable

    Serializable base class.

    This class is used to serialize objects to JSON.

    It relies on the following methods and properties:

    • is_lc_serializable: Is this class serializable?

      By design, even if a class inherits from Serializable, it is not serializable by default. This is to prevent accidental serialization of objects that should not be serialized.

    • get_lc_namespace: Get the namespace of the LangChain object.

      During deserialization, this namespace is used to identify the correct class to instantiate.

      Please see the Reviver class in langchain_core.load.load for more details.

      During deserialization, an additional mapping is used to handle classes that have moved or been renamed across package versions.

    • lc_secrets: A map of constructor argument names to secret ids.

    • lc_attributes: List of additional attribute names that should be included as part of the serialized representation.
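The opt-in design described above can be pictured with a small standard-library sketch. This is a simplified stand-in, not LangChain's implementation: SketchSerializable, PlainClient, HostedClient, the EXAMPLE_API_KEY secret id, and the exact shape of the returned dictionary are all assumptions made for illustration, and lc_attributes is left out for brevity.

```python
from typing import Any, Dict, List


class SketchSerializable:
    """Toy model of the opt-in serialization pattern (hypothetical class)."""

    def __init__(self, **kwargs: Any) -> None:
        self._kwargs = kwargs

    @classmethod
    def is_lc_serializable(cls) -> bool:
        # Opt-in: a subclass is not serializable unless it overrides this.
        return False

    @classmethod
    def get_lc_namespace(cls) -> List[str]:
        # Namespace that a loader would use to find the class again.
        return cls.__module__.split(".")

    @property
    def lc_secrets(self) -> Dict[str, str]:
        # Maps constructor argument names to secret ids.
        return {}

    def to_json(self) -> Dict[str, Any]:
        class_id = [*self.get_lc_namespace(), type(self).__name__]
        if not self.is_lc_serializable():
            # Classes that did not opt in expose no constructor arguments.
            return {"type": "not_implemented", "id": class_id}
        secrets = self.lc_secrets
        kwargs = {
            # Secret values are replaced by a reference to their secret id.
            name: {"type": "secret", "id": [secrets[name]]} if name in secrets else value
            for name, value in self._kwargs.items()
        }
        return {"type": "constructor", "id": class_id, "kwargs": kwargs}


class PlainClient(SketchSerializable):
    """Inherits the base class but never opts in."""


class HostedClient(SketchSerializable):
    """Opts in and declares which argument is a secret."""

    @classmethod
    def is_lc_serializable(cls) -> bool:
        return True

    @classmethod
    def get_lc_namespace(cls) -> List[str]:
        return ["example", "clients"]

    @property
    def lc_secrets(self) -> Dict[str, str]:
        return {"api_key": "EXAMPLE_API_KEY"}


plain = PlainClient(api_key="sk-123").to_json()
hosted = HostedClient(model="demo", api_key="sk-123").to_json()

print(plain["type"])
print(hosted["kwargs"])
```

In this sketch the class that merely inherits the base produces no constructor arguments at all, while the class that opts in emits its arguments with the secret value swapped for its id, so the raw key never reaches the serialized form.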

    class
    BaseMedia

    Base class for content used in retrieval and data processing workflows.

    Provides common fields for content that needs to be stored, indexed, or searched.

    Note

    For multimodal content in chat messages (images, audio sent to/from LLMs), use langchain.messages content blocks instead.

    class
    Blob

    Raw data abstraction for document loading and file processing.

    Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.

    Inspired by Mozilla's Blob

    Initialize a blob from in-memory data
    from langchain_core.documents import Blob
    
    blob = Blob.from_data("Hello, world!")
    
    # Read the blob as a string
    print(blob.as_string())
    
    # Read the blob as bytes
    print(blob.as_bytes())
    
    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())
    Load from memory and specify MIME type and metadata
    from langchain_core.documents import Blob
    
    blob = Blob.from_data(
        data="Hello, world!",
        mime_type="text/plain",
        metadata={"source": "https://example.com"},
    )
    Load the blob from a file
    from langchain_core.documents import Blob
    
    blob = Blob.from_path("path/to/file.txt")
    
    # Read the blob as a string
    print(blob.as_string())
    
    # Read the blob as bytes
    print(blob.as_bytes())
    
    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())
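To illustrate why a blob decouples data loading from parsing, here is a minimal standard-library sketch. SketchBlob and count_words are hypothetical names and the class is a simplified stand-in, not the Blob class itself; the point is only that a parser written against the blob interface does not need to know whether the data lives in memory or behind a file reference.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class SketchBlob:
    """Toy blob: holds data in memory, or only a path to it (hypothetical class)."""

    data: Optional[Union[bytes, str]] = None
    path: Optional[Path] = None
    encoding: str = "utf-8"

    def as_bytes(self) -> bytes:
        if isinstance(self.data, bytes):
            return self.data
        if isinstance(self.data, str):
            return self.data.encode(self.encoding)
        if self.path is not None:
            # File-backed blobs are read only when the content is requested.
            return self.path.read_bytes()
        raise ValueError("blob has neither data nor path")

    def as_string(self) -> str:
        if isinstance(self.data, str):
            return self.data
        return self.as_bytes().decode(self.encoding)


def count_words(blob: SketchBlob) -> int:
    """Hypothetical parser that depends only on the blob interface."""
    return len(blob.as_string().split())


in_memory = SketchBlob(data="Hello, world!")

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "file.txt"
    file_path.write_text("Hello, world!", encoding="utf-8")
    on_disk = SketchBlob(path=file_path)
    # The same parser handles both blobs without knowing their origin.
    counts = (count_words(in_memory), count_words(on_disk))

print(counts)
```

The parser never opens a file or inspects a path, so the same function can be reused behind any loader that produces blobs.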
    class
    Document

    Class for storing a piece of text and associated metadata.

    Note

    Document is for retrieval workflows, not chat I/O. For sending text to an LLM in a conversation, use message types from langchain.messages.

    Type Aliases

    typeAlias
    PathLike