Raw data abstraction for document loading and file processing.
Represents raw bytes or text, either in-memory or by file reference. Used primarily by document loaders to decouple data loading from parsing.
Inspired by Mozilla's Blob.

# Example: create a blob from in-memory data
from langchain_core.documents import Blob

blob = Blob.from_data("Hello, world!")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

# Example: create a blob from in-memory data with a MIME type and metadata
from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

# Example: create a blob that refers to a file
from langchain_core.documents import Blob

blob = Blob.from_path("path/to/file.txt")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Raw data associated with the Blob.
MIME type, not to be confused with a file extension.
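To illustrate the difference between a MIME type and a file extension, the sketch below uses Python's standard-library mimetypes module, which can only guess a type from a file name. It is an illustration with made-up file names, not a description of how Blob determines its MIME type.

```python
import mimetypes

# guess_type() looks only at the file name, never at the file's contents.
txt_type, _ = mimetypes.guess_type("notes.txt")
print(txt_type)  # text/plain

# An extension the registry does not know yields no MIME type at all.
unknown_type, _ = mimetypes.guess_type("notes.no-such-extension")
print(unknown_type)  # None
```

Because the extension is only a hint, passing the MIME type explicitly is the safer choice when it is known.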
Encoding to use when decoding the bytes into a string.
Defaults to utf-8.
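The effect of the encoding setting can be seen with plain Python bytes, independent of Blob. The sketch below decodes the same bytes with two encodings; the sample text and variable names are chosen only for illustration.

```python
# The same text, stored as UTF-8 encoded bytes.
raw = "café".encode("utf-8")

# Decoding with the matching encoding recovers the original text.
decoded_ok = raw.decode("utf-8")
print(decoded_ok)  # café

# Decoding with a different encoding succeeds here but garbles the text.
decoded_wrong = raw.decode("latin-1")
print(decoded_wrong)  # cafÃ©
```

A mismatch like this raises no error, so setting the encoding explicitly matters whenever the bytes are not UTF-8.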
Location where the original content was found.
The source location of the blob as a string if known, otherwise None.
If a path is associated with the Blob, the source defaults to the path location,
unless a metadata field called 'source' is set explicitly, in which
case that value is used instead.
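The precedence described above can be sketched as a small standalone function. Note that resolve_source is a hypothetical helper written for illustration under the rule as stated here; it is not part of langchain_core and is not the library's implementation.

```python
from pathlib import Path
from typing import Any, Optional, Union


def resolve_source(
    path: Optional[Union[str, Path]] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the source location by precedence."""
    # An explicit 'source' entry in the metadata takes precedence.
    if metadata and "source" in metadata:
        return metadata["source"]
    # Otherwise fall back to the path, if one is known.
    if path is not None:
        return str(path)
    # Neither is available, so the source is unknown.
    return None


from_path_only = resolve_source(path="docs/a.txt")
overridden = resolve_source(
    path="docs/a.txt",
    metadata={"source": "https://example.com"},
)
unknown = resolve_source()

print(from_path_only)  # docs/a.txt
print(overridden)      # https://example.com
print(unknown)         # None
```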
Return True as this class is serializable.
Get the namespace of the LangChain object.
Return a unique identifier for this class for serialization purposes.
Convert the object to a JSON-serializable format.
Serialize a "not implemented" object.