Base classes for media and documents.
This module contains core abstractions for data retrieval and processing workflows:
BaseMedia: Base class providing id and metadata fields
Blob: Raw data loading (files, binary data) - used by document loaders
Document: Text content for retrieval (RAG, vector stores, semantic search)

These classes are for data processing pipelines, not LLM I/O. For multimodal
content in chat messages (images, audio in conversations), see
langchain.messages content blocks instead.
Serializable base class.
This class is used to serialize objects to JSON.
It relies on the following methods and properties:
is_lc_serializable: Is this class serializable?
By design, even if a class inherits from Serializable, it is not serializable
by default. This is to prevent accidental serialization of objects that should
not be serialized.
get_lc_namespace: Get the namespace of the LangChain object.
During deserialization, this namespace is used to identify the correct class to instantiate.
Please see the Reviver class in langchain_core.load.load for more details.
During deserialization, an additional mapping is used to handle classes that have moved or been renamed across package versions.
lc_secrets: A map of constructor argument names to secret ids.
lc_attributes: List of additional attribute names that should be included
as part of the serialized representation.
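The contract above can be sketched in plain Python. This is a minimal illustration only, not the actual langchain_core implementation; the class name, constructor arguments, and return values here are invented for the example, while the method and property names follow the description above:

```python
from typing import Any


class MyChatModel:
    """Illustrative class following the Serializable contract described above."""

    def __init__(self, api_key: str, model: str = "my-model") -> None:
        self.api_key = api_key
        self.model = model

    @classmethod
    def is_lc_serializable(cls) -> bool:
        # Opt in explicitly: inheriting alone does not make a class
        # serializable, preventing accidental serialization.
        return True

    @classmethod
    def get_lc_namespace(cls) -> list[str]:
        # Namespace used during deserialization to locate this class.
        return ["my_package", "chat_models"]

    @property
    def lc_secrets(self) -> dict[str, str]:
        # Map constructor argument names to secret ids so raw secret
        # values are never written into the serialized representation.
        return {"api_key": "MY_API_KEY"}

    @property
    def lc_attributes(self) -> dict[str, Any]:
        # Additional attributes to include in the serialized representation.
        return {"model": self.model}


model = MyChatModel(api_key="sk-...")
print(model.lc_secrets)  # {'api_key': 'MY_API_KEY'}
```

The point of `lc_secrets` is that serialization emits the secret id (e.g. an environment variable name) in place of the value, and deserialization resolves it back.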
Base class for content used in retrieval and data processing workflows.
Provides common fields for content that needs to be stored, indexed, or searched.
For multimodal content in chat messages (images, audio sent to/from LLMs),
use langchain.messages content blocks instead.
Raw data abstraction for document loading and file processing.
Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.
Inspired by Mozilla's Blob API.
from langchain_core.documents import Blob

blob = Blob.from_data("Hello, world!")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

from langchain_core.documents import Blob

blob = Blob.from_path("path/to/file.txt")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Class for storing a piece of text and associated metadata.
Document is for retrieval workflows, not chat I/O. For sending text
to an LLM in a conversation, use message types from langchain.messages.
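The shape of a retrieval document can be sketched with a minimal stand-in. This is an illustration only, not the real class (which lives in langchain_core.documents); the field names mirror the description above, and the example content and source URL are invented:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Document:
    """Minimal stand-in mirroring the fields described above:
    text content plus metadata, with an optional id for indexing."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None


# A typical retrieval-oriented document: the text to embed and index,
# plus provenance metadata such as where it was loaded from.
doc = Document(
    page_content="LangChain separates loading from parsing.",
    metadata={"source": "https://example.com/guide"},
)
print(doc.page_content)
```

Keeping provenance in `metadata` (rather than in the text itself) lets vector stores filter and cite results without re-parsing the content.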