Document loaders.
Abstract interface for blob parsers.
A blob parser provides a way to parse raw data stored in a blob into one or more
Document objects.
The parser can be composed with blob loaders, making it easy to reuse a parser independent of how the blob was originally loaded.
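The composition described above can be sketched in plain Python. This is a minimal illustration rather than the library's own code: SimpleBlob, SimpleDocument, LineParser, and parse_all are hypothetical stand-ins, assumed here only to show how one parser can serve blobs regardless of where they came from.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class SimpleBlob:
    """Hypothetical stand-in for a blob: raw text plus metadata."""
    data: str
    metadata: dict = field(default_factory=dict)


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineParser:
    """Hypothetical parser: one document per non-empty line of a blob."""

    def lazy_parse(self, blob: SimpleBlob) -> Iterator[SimpleDocument]:
        for number, line in enumerate(blob.data.splitlines(), start=1):
            if line.strip():
                yield SimpleDocument(line.strip(), {**blob.metadata, "line": number})


def parse_all(blobs: Iterable[SimpleBlob], parser: LineParser) -> Iterator[SimpleDocument]:
    """Apply one parser to blobs from any source."""
    for blob in blobs:
        yield from parser.lazy_parse(blob)


# Blobs from two different origins share the same parser.
in_memory = [SimpleBlob("alpha\nbeta", {"source": "memory"})]
fetched = [SimpleBlob("gamma\n\ndelta", {"source": "https://example.com"})]
docs = list(parse_all(in_memory + fetched, LineParser()))
print([d.page_content for d in docs])
```

Because the parser only sees blobs, swapping the source of those blobs requires no change to the parsing code.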
Interface for document loaders.
Implementations should implement the lazy-loading method (lazy_load) as a generator so that documents are not all loaded into memory at once.
The load method is provided only for user convenience and should not be overridden.
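The relationship between the lazy method and the convenience method might look roughly like the sketch below. SketchLoader and CountingLoader are hypothetical classes written for this illustration, and documents are represented as plain strings for brevity; the library's actual base class may differ in detail.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class SketchLoader(ABC):
    """Hypothetical stand-in for a document loader interface."""

    @abstractmethod
    def lazy_load(self) -> Iterator[str]:
        """Yield documents one at a time."""

    def load(self) -> List[str]:
        """Convenience wrapper: materialize everything lazy_load yields."""
        return list(self.lazy_load())


class CountingLoader(SketchLoader):
    """Hypothetical loader that generates n small text documents."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.produced = 0  # tracks how many documents were actually built

    def lazy_load(self) -> Iterator[str]:
        for i in range(self.n):
            self.produced += 1
            yield f"document {i}"


# Pulling a single item from a large loader builds only that one document.
loader = CountingLoader(1_000_000)
first = next(loader.lazy_load())
print(first, loader.produced)

# The inherited load method collects everything into a list.
small = CountingLoader(3)
print(small.load())
```

Subclasses override only lazy_load; load stays defined once in the base class, which is why overriding it is discouraged.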
Raw data abstraction for document loading and file processing.
Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.
Inspired by Mozilla's Blob.
from langchain_core.documents import Blob
blob = Blob.from_data("Hello, world!")
# Read the blob as a string
print(blob.as_string())
# Read the blob as bytes
print(blob.as_bytes())
# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

from langchain_core.documents import Blob
blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

from langchain_core.documents import Blob
blob = Blob.from_path("path/to/file.txt")
# Read the blob as a string
print(blob.as_string())
# Read the blob as bytes
print(blob.as_bytes())
# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Abstract interface for blob loader implementations.
Implementers should be able to load raw content from a storage system according to some criteria and return that content lazily as a stream of blobs.
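As a rough sketch of that contract, the example below walks a directory and yields one blob per matching file, reading content only when asked. PathBlob and SuffixBlobLoader are hypothetical stand-ins, and the method name yield_blobs is an assumption made for this illustration.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class PathBlob:
    """Hypothetical stand-in for a blob held by file reference."""
    path: Path

    def as_string(self) -> str:
        # Content is read only when requested, not when the blob is created.
        return self.path.read_text(encoding="utf-8")


class SuffixBlobLoader:
    """Hypothetical blob loader: files under a directory with a given suffix."""

    def __init__(self, root: Path, suffix: str) -> None:
        self.root = root
        self.suffix = suffix

    def yield_blobs(self) -> Iterator[PathBlob]:
        # The selection criterion here is the file suffix; blobs are yielded lazily.
        for path in sorted(self.root.rglob(f"*{self.suffix}")):
            if path.is_file():
                yield PathBlob(path)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("first", encoding="utf-8")
    (root / "b.txt").write_text("second", encoding="utf-8")
    (root / "notes.md").write_text("skipped", encoding="utf-8")

    blob_loader = SuffixBlobLoader(root, ".txt")
    contents = [blob.as_string() for blob in blob_loader.yield_blobs()]

print(contents)
```

The loader decides which raw items exist and hands them over one by one; what those items mean is left to whatever parser consumes the stream.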
Load LangSmith Dataset examples as Document objects.
Loads the example inputs as the Document page content and places the entire
example into the Document metadata. This allows you to easily create few-shot
example retrievers from the loaded documents.
from langchain_core.document_loaders import LangSmithLoader
loader = LangSmithLoader(dataset_id="...", limit=100)
docs = []
for doc in loader.lazy_load():
    docs.append(doc)
# -> [Document("...", metadata={"inputs": {...}, "outputs": {...}, ...}), ...]

LangSmith document loader.
Schema for Blobs and Blob Loaders.
The goal is to facilitate decoupling of content loading from content parsing code. In addition, content loading code should provide a lazy loading interface by default.
Abstract interface for document loader implementations.