Base classes for media and documents.
This module contains core abstractions for data retrieval and processing workflows:
BaseMedia: Base class providing id and metadata fields
Blob: Raw data loading (files, binary data) - used by document loaders
Document: Text content for retrieval (RAG, vector stores, semantic search)

These classes are for data processing pipelines, not LLM I/O. For multimodal
content in chat messages (images, audio in conversations), see
langchain.messages content blocks instead.
Serializable base class.
This class is used to serialize objects to JSON.
It relies on the following methods and properties:
is_lc_serializable: Is this class serializable?
By design, even if a class inherits from Serializable, it is not serializable
by default. This is to prevent accidental serialization of objects that should
not be serialized.
get_lc_namespace: Get the namespace of the LangChain object.
During deserialization, this namespace is used to identify the correct class to instantiate.
Please see the Reviver class in langchain_core.load.load for more details.
During deserialization, an additional mapping is used to handle classes that have moved or been renamed across package versions.
lc_secrets: A map of constructor argument names to secret ids.
lc_attributes: List of additional attribute names that should be included
as part of the serialized representation.
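The contract above can be sketched in plain Python. This is a minimal illustration only, not the actual langchain_core implementation; the class name, constructor arguments, and return values here are invented for the example, while the method and property names follow the description above:

```python
from typing import Any


class MyChatModel:
    """Illustrative class following the Serializable contract described above."""

    def __init__(self, api_key: str, model: str = "my-model") -> None:
        self.api_key = api_key
        self.model = model

    @classmethod
    def is_lc_serializable(cls) -> bool:
        # Opt in explicitly: inheriting alone does not make a class
        # serializable, preventing accidental serialization.
        return True

    @classmethod
    def get_lc_namespace(cls) -> list[str]:
        # Namespace used during deserialization to locate this class.
        return ["my_package", "chat_models"]

    @property
    def lc_secrets(self) -> dict[str, str]:
        # Map constructor argument names to secret ids so raw secret
        # values are never written into the serialized representation.
        return {"api_key": "MY_API_KEY"}

    @property
    def lc_attributes(self) -> dict[str, Any]:
        # Additional attributes to include in the serialized representation.
        return {"model": self.model}


model = MyChatModel(api_key="sk-...")
print(model.lc_secrets)  # {'api_key': 'MY_API_KEY'}
```

The point of `lc_secrets` is that serialization emits the secret id (e.g. an environment variable name) in place of the value, and deserialization resolves it back.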
Base class for content used in retrieval and data processing workflows.
Provides common fields for content that needs to be stored, indexed, or searched.
For multimodal content in chat messages (images, audio sent to/from LLMs),
use langchain.messages content blocks instead.
Raw data abstraction for document loading and file processing.
Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.
Inspired by Mozilla's Blob API.
from langchain_core.documents import Blob

blob = Blob.from_data("Hello, world!")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

from langchain_core.documents import Blob

blob = Blob.from_path("path/to/file.txt")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Class for storing a piece of text and associated metadata.
Document is for retrieval workflows, not chat I/O. For sending text
to an LLM in a conversation, use message types from langchain.messages.
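The shape of a retrieval document can be sketched with a minimal stand-in. This is an illustration only, not the real class (which lives in langchain_core.documents); the field names mirror the description above, and the example content and source URL are invented:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:
    """Minimal stand-in mirroring the fields described above:
    text content plus metadata, with an optional id for indexing."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


# A typical retrieval-oriented document: the text to embed and index,
# plus provenance metadata such as where it was loaded from.
doc = Document(
    page_content="LangChain separates loading from parsing.",
    metadata={"source": "https://example.com/guide"},
)
print(doc.page_content)
```

Keeping provenance in `metadata` (rather than in the text itself) lets vector stores filter and cite results without re-parsing the content.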