Abstract interface for document loader implementations.
Run a function in an executor.
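A hedged sketch of the idea, assuming the helper is langchain_core.runnables.config.run_in_executor with an (executor_or_config, func, *args) calling convention; the blocking_parse function is purely illustrative:

import asyncio

from langchain_core.runnables.config import run_in_executor


def blocking_parse(text: str) -> str:
    # Stand-in for synchronous, potentially slow work (e.g. parsing a large file).
    return text.upper()


async def main() -> None:
    # Passing None for the executor/config falls back to the default executor.
    result = await run_in_executor(None, blocking_parse, "hello")
    print(result)  # HELLO


asyncio.run(main())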
Class for storing a piece of text and associated metadata.
Document is for retrieval workflows, not chat I/O. For sending text
to an LLM in a conversation, use message types from langchain.messages.
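For example, a minimal sketch of constructing a Document (page_content and metadata are the real field names; the text and metadata values are illustrative):

from langchain_core.documents import Document

# page_content holds the text; metadata carries arbitrary key/value context
# such as the source, page number, or timestamps.
doc = Document(
    page_content="LangChain makes it easy to build retrieval pipelines.",
    metadata={"source": "https://example.com/intro", "page": 1},
)

print(doc.page_content)
print(doc.metadata["source"])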
Interface for document loaders.
Implementations should implement the lazy-loading method using generators to avoid loading all documents into memory at once.
The load method is provided just as a user convenience and should not be overridden.
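As an illustrative sketch only (the LineLoader class, its constructor, and its parsing logic are hypothetical, while BaseLoader, lazy_load, load, and Document are the real langchain_core names), a custom loader could implement lazy_load as a generator:

from collections.abc import Iterator

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class LineLoader(BaseLoader):
    """Hypothetical loader that yields one Document per non-empty line of a text file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def lazy_load(self) -> Iterator[Document]:
        # Generator-based implementation: documents are produced one at a time,
        # so the file is never materialized as a full list of Documents in memory.
        with open(self.path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if line.strip():
                    yield Document(
                        page_content=line.strip(),
                        metadata={"source": self.path, "line": i},
                    )


# load() is inherited from BaseLoader and simply collects lazy_load() into a list.
# docs = LineLoader("notes.txt").load()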
Abstract interface for blob parsers.
A blob parser provides a way to parse raw data stored in a blob into one or more
Document objects.
The parser can be composed with blob loaders, making it easy to reuse a parser independent of how the blob was originally loaded.
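As a hedged sketch (PlainTextParser and its decoding logic are hypothetical; BaseBlobParser, lazy_parse, parse, and Blob are the real langchain_core names), a parser might look like:

from collections.abc import Iterator

from langchain_core.document_loaders import BaseBlobParser, Blob
from langchain_core.documents import Document


class PlainTextParser(BaseBlobParser):
    """Hypothetical parser that turns any blob into a single text Document."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        # The parser only sees the Blob, so it works the same whether the bytes
        # came from disk, an object store, or in-memory data.
        yield Document(
            page_content=blob.as_string(),
            metadata={"source": blob.source},
        )


parser = PlainTextParser()
docs = parser.parse(Blob.from_data("Hello, world!"))  # parse() collects lazy_parse()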
Raw data abstraction for document loading and file processing.
Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.
Inspired by Mozilla's Blob API.

Create a blob from in-memory data:

from langchain_core.documents import Blob
blob = Blob.from_data("Hello, world!")
# Read the blob as a string
print(blob.as_string())
# Read the blob as bytes
print(blob.as_bytes())
# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Create a blob with an explicit MIME type and metadata:

from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

Create a blob that references a file on disk:

from langchain_core.documents import Blob
blob = Blob.from_path("path/to/file.txt")
# Read the blob as a string
print(blob.as_string())
# Read the blob as bytes
print(blob.as_bytes())
# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())