LangChain Reference
    Module, since v0.1

    document_loaders

    Document loaders.

    Functions

    function
    import_attr

    Import an attribute from a module located in a package.

    This utility function is used in custom __getattr__ methods within __init__.py files to dynamically import attributes.
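
    The lazy-import pattern described above can be sketched with the standard library alone. The sketch below does not call import_attr itself; the lookup table, the lazy_getattr helper, and the demo_package module are hypothetical names chosen for illustration, and the real utility may resolve module paths differently.

```python
import importlib
import types

# Hypothetical lookup table: public attribute name -> module that defines it.
_dynamic_imports = {"OrderedDict": "collections", "dumps": "json"}


def lazy_getattr(name: str) -> object:
    """Import the defining module only when the attribute is first requested."""
    module_name = _dynamic_imports.get(name)
    if module_name is None:
        raise AttributeError(f"module has no attribute {name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, name)


# Stand-in for a package's __init__.py: a module-level __getattr__ (PEP 562)
# is consulted only when normal attribute lookup on the module fails.
demo = types.ModuleType("demo_package")
demo.__getattr__ = lazy_getattr

result = demo.dumps({"a": 1})  # json is imported here, on first access
print(result)
```

    In a real package, the function would be defined as __getattr__ directly in __init__.py, so that importing the package stays cheap until an attribute is actually used.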

    Classes

    class
    BaseBlobParser

    Abstract interface for blob parsers.

    A blob parser provides a way to parse raw data stored in a blob into one or more Document objects.

    The parser can be composed with blob loaders, making it easy to reuse a parser independent of how the blob was originally loaded.

    class
    BaseLoader

    Interface for document loaders.

    Implementations should implement the lazy-loading method using generators to avoid loading all documents into memory at once.

    load is provided just for user convenience and should not be overridden.

    class
    Blob

    Raw data abstraction for document loading and file processing.

    Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.

    Inspired by Mozilla's Blob.

    Initialize a blob from in-memory data:
    from langchain_core.documents import Blob
    
    blob = Blob.from_data("Hello, world!")
    
    # Read the blob as a string
    print(blob.as_string())
    
    # Read the blob as bytes
    print(blob.as_bytes())
    
    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())

    Load from memory and specify MIME type and metadata:
    from langchain_core.documents import Blob
    
    blob = Blob.from_data(
        data="Hello, world!",
        mime_type="text/plain",
        metadata={"source": "https://example.com"},
    )

    Load the blob from a file:
    from langchain_core.documents import Blob
    
    blob = Blob.from_path("path/to/file.txt")
    
    # Read the blob as a string
    print(blob.as_string())
    
    # Read the blob as bytes
    print(blob.as_bytes())
    
    # Read the blob as a byte stream
    with blob.as_bytes_io() as f:
        print(f.read())

    class
    BlobLoader

    Abstract interface for blob loader implementations.

    An implementation should load raw content from a storage system according to some criteria and return that content lazily as a stream of blobs.

    class
    LangSmithLoader

    Load LangSmith Dataset examples as Document objects.

    Loads the example inputs as the Document page content and places the entire example into the Document metadata. This allows you to easily create few-shot example retrievers from the loaded documents.

    Lazy loading:
    from langchain_core.document_loaders import LangSmithLoader
    
    loader = LangSmithLoader(dataset_id="...", limit=100)
    docs = []
    for doc in loader.lazy_load():
        docs.append(doc)
    # -> [Document("...", metadata={"inputs": {...}, "outputs": {...}, ...}), ...]

    Type Aliases

    typeAlias
    PathLike

    Modules

    module
    langsmith

    LangSmith document loader.

    module
    blob_loaders

    Schema for Blobs and Blob Loaders.

    The goal is to facilitate decoupling of content loading from content parsing code. In addition, content loading code should provide a lazy loading interface by default.

    module
    base

    Abstract interface for document loader implementations.
