Implementers should be able to load raw content from a storage system according to some criteria and return it lazily as a stream of blobs.

Raw data abstraction for document loading and file processing.

Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.

Inspired by Mozilla's Blob.

Initialize a blob from in-memory data

from langchain_core.documents import Blob

blob = Blob.from_data("Hello, world!")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Load from memory and specify MIME type and metadata

from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

Load the blob from a file

from langchain_core.documents import Blob

blob = Blob.from_path("path/to/file.txt")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Load LangSmith Dataset examples as Document objects.

Loads the example inputs as the Document page content and places the entire example into the Document metadata. This allows you to easily create few-shot example retrievers from the loaded documents.

Lazy loading

from langchain_core.document_loaders import LangSmithLoader

loader = LangSmithLoader(dataset_id="...", limit=100)
docs = []
for doc in loader.lazy_load():
    docs.append(doc)

# -> [Document("...", metadata={"inputs": {...}, "outputs": {...}, ...}), ...]

Schema for Blobs and Blob Loaders.

The goal is to facilitate decoupling of content loading from content parsing code. In addition, content loading code should provide a lazy loading interface by default.

LangSmith document loader.

Abstract interface for document loader implementations.
