LangChain Reference
langchain_core.document_loaders.base

Module · Since v0.1

    Abstract interface for document loader implementations.

Functions

run_in_executor: Run a function in an executor.

Classes

Document: Class for storing a piece of text and associated metadata.
Blob: Raw data abstraction for document loading and file processing.
BaseLoader: Interface for document loaders.
BaseBlobParser: Abstract interface for blob parsers.

class Document

Class for storing a piece of text and associated metadata.

Note

Document is for retrieval workflows, not chat I/O. For sending text to an LLM in a conversation, use message types from langchain.messages.

class BaseLoader

Interface for document loaders.

Implementations should implement the lazy-loading method (lazy_load) using generators to avoid loading all documents into memory at once.

load is provided purely as a convenience and should not be overridden.

class BaseBlobParser

Abstract interface for blob parsers.

A blob parser provides a way to parse raw data stored in a blob into one or more Document objects.

The parser can be composed with blob loaders, making it easy to reuse a parser independently of how the blob was originally loaded.

class Blob

Raw data abstraction for document loading and file processing.

Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.

Inspired by Mozilla's Blob.

    Initialize a blob from in-memory data
    from langchain_core.documents import Blob
    
    blob = Blob.from_data("Hello, world!")
    
    # Read the blob as a string
    print(blob.as_string())
    
    # Read the blob as bytes
    print(blob.as_bytes())
    
    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())
    Load from memory and specify MIME type and metadata
    from langchain_core.documents import Blob
    
    blob = Blob.from_data(
        data="Hello, world!",
        mime_type="text/plain",
        metadata={"source": "https://example.com"},
    )
    Load the blob from a file
    from langchain_core.documents import Blob
    
    blob = Blob.from_path("path/to/file.txt")
    
    # Read the blob as a string
    print(blob.as_string())
    
    # Read the blob as bytes
    print(blob.as_bytes())
    
    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())