Documents
langchain_core.documents.base.BaseMedia
Bases: Serializable
Base class for representing media content.
Media objects can be used to represent raw data, such as text or binary data.
LangChain Media objects allow associating metadata and an optional identifier with the content.
The presence of an ID and metadata makes it easier to store, index, and search over the content in a structured way.
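To illustrate why an ID and metadata help, here is a minimal in-memory index in plain Python. `MediaItem` and `MediaIndex` are hypothetical names used only for this sketch; they are not LangChain classes and do not reflect how LangChain stores content.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MediaItem:
    """Hypothetical stand-in for a media object: content plus optional id and metadata."""

    content: str
    id: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class MediaIndex:
    """Hypothetical in-memory index: lookup by id, search by metadata."""

    def __init__(self) -> None:
        self._items: dict[str, MediaItem] = {}

    def add(self, item: MediaItem) -> str:
        # Assign a UUID when the item has no id (a convention, not a requirement).
        if item.id is None:
            item.id = str(uuid.uuid4())
        self._items[item.id] = item
        return item.id

    def get(self, item_id: str) -> MediaItem | None:
        return self._items.get(item_id)

    def search(self, **filters: Any) -> list[MediaItem]:
        # Keep items whose metadata matches every key/value filter.
        return [
            item
            for item in self._items.values()
            if all(item.metadata.get(k) == v for k, v in filters.items())
        ]


index = MediaIndex()
first_id = index.add(MediaItem("Hello, world!", metadata={"source": "a.txt"}))
index.add(MediaItem("Goodbye!", id="doc-2", metadata={"source": "b.txt"}))
```

Direct lookups use the ID, while the metadata supports filtering without inspecting the content itself.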
METHOD | DESCRIPTION |
---|---|
__init__ | |
is_lc_serializable | Is this class serializable? |
get_lc_namespace | Get the namespace of the LangChain object. |
lc_id | Return a unique identifier for this class for serialization purposes. |
to_json | Serialize the object to JSON. |
to_json_not_implemented | Serialize a "not implemented" object. |
id (class-attribute, instance-attribute)
An optional identifier for the document.
Ideally this should be unique across the document collection and formatted as a UUID, but this is not enforced.
Added in version 0.2.11
metadata (class-attribute, instance-attribute)
Arbitrary metadata associated with the content.
lc_secrets (property)
A map of constructor argument names to secret IDs.
For example, {"openai_api_key": "OPENAI_API_KEY"}.
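One way such a map can be used: when constructor kwargs are serialized, any value whose argument name appears in the map is replaced by a placeholder naming the secret instead of carrying its value. The helper `redact_secrets` and the placeholder shape are assumptions modeled loosely on LangChain's serialized-secret format; this is a sketch, not the library's implementation.

```python
from typing import Any


def redact_secrets(kwargs: dict[str, Any], secrets_map: dict[str, str]) -> dict[str, Any]:
    """Return a copy of kwargs with secret values replaced by id placeholders.

    Hypothetical helper: secrets_map maps constructor argument names to
    secret ids (for example, environment variable names).
    """
    redacted: dict[str, Any] = {}
    for name, value in kwargs.items():
        if name in secrets_map:
            # Store only the secret's id, never the secret value itself.
            redacted[name] = {"lc": 1, "type": "secret", "id": [secrets_map[name]]}
        else:
            redacted[name] = value
    return redacted


kwargs = {"openai_api_key": "sk-not-a-real-key", "temperature": 0.2}
serialized = redact_secrets(kwargs, {"openai_api_key": "OPENAI_API_KEY"})
```

The original `kwargs` dictionary is left untouched; only the copy is redacted.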
lc_attributes (property)
lc_attributes: dict
A mapping of attribute names to values that should be included in the serialized kwargs.
These attributes must be accepted by the constructor.
Defaults to an empty dictionary.
is_lc_serializable (classmethod)
is_lc_serializable() -> bool
Is this class serializable?
By design, even if a class inherits from Serializable, it is not serializable by default. This prevents accidental serialization of objects that should not be serialized.
RETURNS | DESCRIPTION |
---|---|
bool | Whether the class is serializable. Default is False. |
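The opt-in design can be sketched in plain Python: the base class reports False, a subclass must explicitly override the classmethod to return True, and a serializer treats anything that has not opted in as not implemented. All names here (`SerializableBase`, `Note`, `Scratch`, `serialize`) are hypothetical.

```python
from typing import Any


class SerializableBase:
    """Hypothetical base class: serialization is off unless a subclass opts in."""

    @classmethod
    def is_lc_serializable(cls) -> bool:
        return False


class Note(SerializableBase):
    def __init__(self, text: str) -> None:
        self.text = text

    @classmethod
    def is_lc_serializable(cls) -> bool:
        # Explicit opt-in.
        return True


class Scratch(SerializableBase):
    # Inherits the default and therefore stays non-serializable.
    def __init__(self, text: str) -> None:
        self.text = text


def serialize(obj: SerializableBase) -> dict[str, Any]:
    """Serialize opted-in objects; mark everything else as not implemented."""
    if type(obj).is_lc_serializable():
        return {"type": "constructor", "kwargs": dict(vars(obj))}
    return {"type": "not_implemented"}
```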
get_lc_namespace (classmethod)
Get the namespace of the LangChain object.
lc_id (classmethod)
Return a unique identifier for this class for serialization purposes.
The unique identifier is a list of strings that describes the path to the object.
For example, for the class langchain.llms.openai.OpenAI, the id is ["langchain", "llms", "openai", "OpenAI"].
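Conceptually, the identifier is the object's namespace followed by its class name. The sketch below reproduces the example above with hypothetical helpers `make_lc_id` and `namespace_from_module`; it assumes the namespace is given as module-path segments and is not LangChain's own code.

```python
def namespace_from_module(module_path: str) -> list[str]:
    """Split a dotted module path such as 'langchain.llms.openai' into segments."""
    return module_path.split(".")


def make_lc_id(namespace: list[str], class_name: str) -> list[str]:
    """Build a serialization id: namespace segments followed by the class name."""
    return [*namespace, class_name]


openai_id = make_lc_id(namespace_from_module("langchain.llms.openai"), "OpenAI")
```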
to_json
Serialize the object to JSON.
RAISES | DESCRIPTION |
---|---|
ValueError | If the class has deprecated attributes. |
RETURNS | DESCRIPTION |
---|---|
SerializedConstructor \| SerializedNotImplemented | A JSON-serializable object or a SerializedNotImplemented object. |
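The two return types can be pictured as plain dictionaries. This sketch approximates their shape: a constructor record carrying an `id` and `kwargs`, or a not-implemented record carrying an `id` and a `repr`. The helper `to_json_sketch`, the example id, and the exact keys are assumptions for illustration; inspect the output of your installed version before relying on them.

```python
import json
from typing import Any


def to_json_sketch(
    obj: Any, lc_id: list[str], kwargs: dict[str, Any], serializable: bool
) -> dict[str, Any]:
    """Hypothetical serializer returning a constructor or not-implemented record."""
    if serializable:
        # Enough information to rebuild the object: its id path and constructor kwargs.
        return {"lc": 1, "type": "constructor", "id": lc_id, "kwargs": kwargs}
    # Otherwise record only what the object is, not how to rebuild it.
    return {"lc": 1, "type": "not_implemented", "id": lc_id, "repr": repr(obj)}


record = to_json_sketch(
    None,
    ["my_package", "notes", "Note"],  # hypothetical id
    {"page_content": "Hello, world!"},
    serializable=True,
)
text = json.dumps(record)  # both record shapes are JSON-compatible dicts
```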
to_json_not_implemented
Serialize a "not implemented" object.
RETURNS | DESCRIPTION |
---|---|
SerializedNotImplemented | |
langchain_core.documents.base.Blob
Bases: BaseMedia
A Blob represents raw data either by reference (a path) or by value (in-memory data).
It provides an interface to materialize the blob in different representations, and helps decouple the development of data loaders from the downstream parsing of the raw data.
Inspired by: https://developer.mozilla.org/en-US/docs/Web/API/Blob
Example: Initialize a blob from in-memory data
from langchain_core.documents import Blob
blob = Blob.from_data("Hello, world!")
# Read the blob as a string
print(blob.as_string())
# Read the blob as bytes
print(blob.as_bytes())
# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())
Example: Load from memory and specify mime-type and metadata
from langchain_core.documents import Blob
blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)
Example: Load the blob from a file
from langchain_core.documents import Blob
blob = Blob.from_path("path/to/file.txt")
# Read the blob as a string
print(blob.as_string())
# Read the blob as bytes
print(blob.as_bytes())
# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())
METHOD | DESCRIPTION |
---|---|
check_blob_is_valid | Verify that either data or path is provided. |
as_string | Read data as a string. |
as_bytes | Read data as bytes. |
as_bytes_io | Read data as a byte stream. |
from_path | Load the blob from a path-like object. |
from_data | Initialize the blob from in-memory data. |
__repr__ | Return the blob representation. |
__init__ | |
is_lc_serializable | Is this class serializable? |
get_lc_namespace | Get the namespace of the LangChain object. |
lc_id | Return a unique identifier for this class for serialization purposes. |
to_json | Serialize the object to JSON. |
to_json_not_implemented | Serialize a "not implemented" object. |
data (class-attribute, instance-attribute)
Raw data associated with the blob.
mimetype (class-attribute, instance-attribute)
mimetype: str | None = None
The MIME type of the data, not to be confused with a file extension.
encoding (class-attribute, instance-attribute)
encoding: str = 'utf-8'
Encoding to use when decoding the bytes into a string.
Defaults to utf-8.
path (class-attribute, instance-attribute)
Location where the original content was found.
source (property)
source: str | None
The source location of the blob as a string, if known; otherwise None.
If a path is associated with the blob, the source defaults to the path location, unless it is explicitly set via a metadata field called "source", in which case that value is used instead.
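The precedence rule can be written as a small function. `resolve_source` is a hypothetical helper that mirrors the behavior described above: an explicit metadata "source" wins, then the path, otherwise None.

```python
from __future__ import annotations

from pathlib import PurePath
from typing import Any


def resolve_source(
    path: str | PurePath | None, metadata: dict[str, Any] | None
) -> str | None:
    """Return metadata['source'] if present, else str(path), else None."""
    if metadata and "source" in metadata:
        return metadata["source"]
    return str(path) if path else None
```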
lc_secrets (property)
A map of constructor argument names to secret IDs.
For example, {"openai_api_key": "OPENAI_API_KEY"}.
lc_attributes (property)
lc_attributes: dict
A mapping of attribute names to values that should be included in the serialized kwargs.
These attributes must be accepted by the constructor.
Defaults to an empty dictionary.
id (class-attribute, instance-attribute)
An optional identifier for the document.
Ideally this should be unique across the document collection and formatted as a UUID, but this is not enforced.
Added in version 0.2.11
metadata (class-attribute, instance-attribute)
Arbitrary metadata associated with the content.
check_blob_is_valid (classmethod)
Verify that either data or path is provided.
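A check of this kind can be sketched as follows. `check_data_or_path` is a hypothetical function operating on raw constructor values; it is not the validator LangChain registers on Blob, and it treats "provided" as "the key is present".

```python
from __future__ import annotations

from typing import Any


def check_data_or_path(values: dict[str, Any]) -> dict[str, Any]:
    """Raise ValueError unless at least one of 'data' or 'path' is supplied."""
    if "data" not in values and "path" not in values:
        raise ValueError("Either data or path must be provided")
    return values
```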
as_string
as_string() -> str
Read data as a string.
RAISES | DESCRIPTION |
---|---|
ValueError | If the blob cannot be represented as a string. |
RETURNS | DESCRIPTION |
---|---|
str | The data as a string. |
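The conversion rules can be sketched with a standalone function. `blob_as_string` is hypothetical; it assumes the blob holds either in-memory `data` (str or bytes) or a filesystem `path`, and it uses `encoding` only when bytes must be decoded.

```python
from __future__ import annotations

from pathlib import Path


def blob_as_string(
    data: str | bytes | None, path: str | Path | None, encoding: str = "utf-8"
) -> str:
    """Return the blob content as text (hypothetical helper)."""
    if data is None and path:
        # Content lives on disk: read and decode it with the given encoding.
        return Path(path).read_text(encoding=encoding)
    if isinstance(data, bytes):
        return data.decode(encoding)
    if isinstance(data, str):
        return data
    raise ValueError("Unable to represent blob as a string")
```

Passing the right `encoding` matters for bytes that are not UTF-8, for example Latin-1 text.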
as_bytes
as_bytes() -> bytes
Read data as bytes.
RAISES | DESCRIPTION |
---|---|
ValueError | If the blob cannot be represented as bytes. |
RETURNS | DESCRIPTION |
---|---|
bytes | The data as bytes. |
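The reverse direction can be sketched the same way. `blob_as_bytes` is a hypothetical helper: bytes pass through unchanged, text is encoded with the blob's encoding, and a path is read in binary mode.

```python
from __future__ import annotations

from pathlib import Path


def blob_as_bytes(
    data: str | bytes | None, path: str | Path | None, encoding: str = "utf-8"
) -> bytes:
    """Return the blob content as bytes (hypothetical helper)."""
    if isinstance(data, bytes):
        return data
    if isinstance(data, str):
        # Text is encoded with the blob's encoding.
        return data.encode(encoding)
    if data is None and path:
        return Path(path).read_bytes()
    raise ValueError("Unable to represent blob as bytes")
```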
as_bytes_io
as_bytes_io() -> Generator[BytesIO | BufferedReader, None, None]
Read data as a byte stream.
RAISES | DESCRIPTION |
---|---|
NotImplementedError | If the blob cannot be represented as a byte stream. |
YIELDS | DESCRIPTION |
---|---|
BytesIO \| BufferedReader | The data as a byte stream. |
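Because the return type is a generator, the method is used as a context manager, as in the examples above. This sketch shows that pattern with `contextlib.contextmanager`: in-memory bytes are wrapped in a `BytesIO`, while a path is opened in binary mode, which yields a `BufferedReader`. `blob_bytes_io` is a hypothetical name, and in this sketch any other input raises NotImplementedError, following the table above.

```python
from __future__ import annotations

import contextlib
from io import BufferedReader, BytesIO
from pathlib import Path
from typing import Generator


@contextlib.contextmanager
def blob_bytes_io(
    data: bytes | None, path: str | Path | None
) -> Generator[BytesIO | BufferedReader, None, None]:
    """Yield a readable binary stream over the blob content (hypothetical helper)."""
    if isinstance(data, bytes):
        yield BytesIO(data)
    elif data is None and path:
        # open(..., "rb") returns a BufferedReader; the with-block closes it.
        with open(path, "rb") as f:
            yield f
    else:
        raise NotImplementedError("Unable to convert blob to a byte stream")
```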
from_path (classmethod)
from_path(
    path: PathLike,
    *,
    encoding: str = "utf-8",
    mime_type: str | None = None,
    guess_type: bool = True,
    metadata: dict | None = None
) -> Blob
Load the blob from a path-like object.
PARAMETER | DESCRIPTION |
---|---|
path | Path-like object pointing to the file to be read. TYPE: PathLike |
encoding | Encoding to use when decoding the bytes into a string. TYPE: str |
mime_type | If provided, will be set as the mime-type of the data. TYPE: str \| None |
guess_type | If True, the mime-type will be guessed from the file extension if a mime-type was not provided. TYPE: bool |
metadata | Metadata to associate with the blob. TYPE: dict \| None |
RETURNS | DESCRIPTION |
---|---|
Blob | Blob instance |
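The interaction of `mime_type` and `guess_type` can be sketched with the standard library's `mimetypes.guess_type`. `choose_mimetype` is a hypothetical helper, and the use of `mimetypes` is this sketch's assumption rather than a statement about LangChain internals.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path


def choose_mimetype(
    path: str | Path, mime_type: str | None = None, guess_type: bool = True
) -> str | None:
    """Prefer an explicit mime_type; otherwise optionally guess from the extension."""
    if mime_type is None and guess_type:
        guessed, _encoding = mimetypes.guess_type(str(path))
        return guessed
    return mime_type
```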
from_data (classmethod)
from_data(
    data: str | bytes,
    *,
    encoding: str = "utf-8",
    mime_type: str | None = None,
    path: str | None = None,
    metadata: dict | None = None
) -> Blob
Initialize the blob from in-memory data.
PARAMETER | DESCRIPTION |
---|---|
data | The in-memory data associated with the blob. TYPE: str \| bytes |
encoding | Encoding to use when decoding the bytes into a string. TYPE: str |
mime_type | If provided, will be set as the mime-type of the data. TYPE: str \| None |
path | If provided, will be set as the source from which the data came. TYPE: str \| None |
metadata | Metadata to associate with the blob. TYPE: dict \| None |
RETURNS | DESCRIPTION |
---|---|
Blob | Blob instance |
is_lc_serializable (classmethod)
is_lc_serializable() -> bool
Is this class serializable?
By design, even if a class inherits from Serializable, it is not serializable by default. This prevents accidental serialization of objects that should not be serialized.
RETURNS | DESCRIPTION |
---|---|
bool | Whether the class is serializable. Default is False. |
get_lc_namespace (classmethod)
Get the namespace of the LangChain object.
lc_id (classmethod)
Return a unique identifier for this class for serialization purposes.
The unique identifier is a list of strings that describes the path to the object.
For example, for the class langchain.llms.openai.OpenAI, the id is ["langchain", "llms", "openai", "OpenAI"].
to_json
Serialize the object to JSON.
RAISES | DESCRIPTION |
---|---|
ValueError | If the class has deprecated attributes. |
RETURNS | DESCRIPTION |
---|---|
SerializedConstructor \| SerializedNotImplemented | A JSON-serializable object or a SerializedNotImplemented object. |
to_json_not_implemented
Serialize a "not implemented" object.
RETURNS | DESCRIPTION |
---|---|
SerializedNotImplemented | |
langchain_core.documents.base.Document
Bases: BaseMedia
Class for storing a piece of text and associated metadata.
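Example: Create a document with text and metadata
from langchain_core.documents import Document
document = Document(
    page_content="Hello, world!",
    metadata={"source": "https://example.com"},
)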
METHOD | DESCRIPTION |
---|---|
__init__ | Pass page_content in as a positional or named argument. |
is_lc_serializable | Return True, as this class is serializable. |
get_lc_namespace | Get the namespace of the LangChain object. |
__str__ | Override str to restrict it to page_content and metadata. |
lc_id | Return a unique identifier for this class for serialization purposes. |
to_json | Serialize the object to JSON. |
to_json_not_implemented | Serialize a "not implemented" object. |
lc_secrets (property)
A map of constructor argument names to secret IDs.
For example, {"openai_api_key": "OPENAI_API_KEY"}.
lc_attributes (property)
lc_attributes: dict
A mapping of attribute names to values that should be included in the serialized kwargs.
These attributes must be accepted by the constructor.
Defaults to an empty dictionary.
id (class-attribute, instance-attribute)
An optional identifier for the document.
Ideally this should be unique across the document collection and formatted as a UUID, but this is not enforced.
Added in version 0.2.11
metadata (class-attribute, instance-attribute)
Arbitrary metadata associated with the content.
__init__
Pass page_content in as a positional or named argument.
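Accepting `page_content` either positionally or by keyword can be achieved with a signature like the one in this sketch, which collects the remaining keyword arguments (here `metadata` and `id`). `TextRecord` is a hypothetical class, not LangChain's Document, and rejecting unknown keywords is this sketch's own choice.

```python
from __future__ import annotations

from typing import Any


class TextRecord:
    """Hypothetical text container whose first argument may be positional or named."""

    def __init__(self, page_content: str, **kwargs: Any) -> None:
        self.page_content = page_content
        self.metadata: dict[str, Any] = kwargs.pop("metadata", {})
        self.id: str | None = kwargs.pop("id", None)
        if kwargs:
            # Unknown keyword arguments are rejected in this sketch.
            raise TypeError(f"Unexpected arguments: {sorted(kwargs)}")


positional = TextRecord("Hello, world!")
named = TextRecord(page_content="Hello, world!", metadata={"source": "a.txt"})
```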
is_lc_serializable (classmethod)
is_lc_serializable() -> bool
Return True, as this class is serializable.
get_lc_namespace (classmethod)
Get the namespace of the LangChain object.
__str__
__str__() -> str
Override str to restrict it to page_content and metadata.
RETURNS | DESCRIPTION |
---|---|
str | A string representation of the Document. |
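The point of the override is that `str()` shows only the content and its metadata, leaving out other fields such as `id`, while `repr()` still shows everything. A minimal sketch with a hypothetical `SimpleDoc` dataclass follows; the exact output format is an assumption, not the format LangChain prints.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDoc:
    """Hypothetical document-like record."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str | None = None

    def __str__(self) -> str:
        # Restrict the string form to page_content and metadata; omit id.
        if self.metadata:
            return f"page_content={self.page_content!r} metadata={self.metadata}"
        return f"page_content={self.page_content!r}"
```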
lc_id (classmethod)
Return a unique identifier for this class for serialization purposes.
The unique identifier is a list of strings that describes the path to the object.
For example, for the class langchain.llms.openai.OpenAI, the id is ["langchain", "llms", "openai", "OpenAI"].
to_json
Serialize the object to JSON.
RAISES | DESCRIPTION |
---|---|
ValueError | If the class has deprecated attributes. |
RETURNS | DESCRIPTION |
---|---|
SerializedConstructor \| SerializedNotImplemented | A JSON-serializable object or a SerializedNotImplemented object. |
to_json_not_implemented
Serialize a "not implemented" object.
RETURNS | DESCRIPTION |
---|---|
SerializedNotImplemented | |