Documents

langchain_core.documents.base.BaseMedia

Bases: Serializable

Used to represent media content.

Media objects can be used to represent raw data, such as text or binary data.

LangChain Media objects allow associating metadata and an optional identifier with the content.

The presence of an ID and metadata makes it easier to store, index, and search over the content in a structured way.

METHOD DESCRIPTION
__init__
is_lc_serializable

Is this class serializable?

get_lc_namespace

Get the namespace of the LangChain object.

lc_id

Return a unique identifier for this class for serialization purposes.

to_json

Serialize the object to JSON.

to_json_not_implemented

Serialize a "not implemented" object.

id class-attribute instance-attribute

id: str | None = Field(default=None, coerce_numbers_to_str=True)

An optional identifier for the document.

Ideally this should be unique across the document collection and formatted as a UUID, but this will not be enforced.

Added in version 0.2.11

metadata class-attribute instance-attribute

metadata: dict = Field(default_factory=dict)

Arbitrary metadata associated with the content.

lc_secrets property

lc_secrets: dict[str, str]

A map of constructor argument names to secret ids.

For example, {"openai_api_key": "OPENAI_API_KEY"}

lc_attributes property

lc_attributes: dict

List of attribute names that should be included in the serialized kwargs.

These attributes must be accepted by the constructor.

Default is an empty dictionary.

__init__

__init__(*args: Any, **kwargs: Any) -> None

is_lc_serializable classmethod

is_lc_serializable() -> bool

Is this class serializable?

By design, even if a class inherits from Serializable, it is not serializable by default. This is to prevent accidental serialization of objects that should not be serialized.

RETURNS DESCRIPTION
bool

Whether the class is serializable. Default is False.

get_lc_namespace classmethod

get_lc_namespace() -> list[str]

Get the namespace of the LangChain object.

For example, if the class is langchain.llms.openai.OpenAI, then the namespace is ["langchain", "llms", "openai"]

RETURNS DESCRIPTION
list[str]

The namespace.

lc_id classmethod

lc_id() -> list[str]

Return a unique identifier for this class for serialization purposes.

The unique identifier is a list of strings that describes the path to the object.

For example, for the class langchain.llms.openai.OpenAI, the id is ["langchain", "llms", "openai", "OpenAI"].

to_json

to_json() -> SerializedConstructor | SerializedNotImplemented

Serialize the object to JSON.

RAISES DESCRIPTION
ValueError

If the class has deprecated attributes.

RETURNS DESCRIPTION
SerializedConstructor | SerializedNotImplemented

A JSON-serializable object or a SerializedNotImplemented object.

to_json_not_implemented

to_json_not_implemented() -> SerializedNotImplemented

Serialize a "not implemented" object.

RETURNS DESCRIPTION
SerializedNotImplemented

SerializedNotImplemented.

langchain_core.documents.base.Blob

Bases: BaseMedia

Blob represents raw data by either reference or value.

Provides an interface to materialize the blob in different representations, and helps decouple the development of data loaders from the downstream parsing of the raw data.

Inspired by: https://developer.mozilla.org/en-US/docs/Web/API/Blob

Example: Initialize a blob from in-memory data

from langchain_core.documents import Blob

blob = Blob.from_data(b"Hello, world!")  # bytes data, so it can also be read as a byte stream

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())

Example: Load from memory and specify mime-type and metadata

from langchain_core.documents import Blob

blob = Blob.from_data(
    data="Hello, world!",
    mime_type="text/plain",
    metadata={"source": "https://example.com"},
)

Example: Load the blob from a file

from langchain_core.documents import Blob

blob = Blob.from_path("path/to/file.txt")

# Read the blob as a string
print(blob.as_string())

# Read the blob as bytes
print(blob.as_bytes())

# Read the blob as a byte stream
with blob.as_bytes_io() as f:
    print(f.read())
METHOD DESCRIPTION
check_blob_is_valid

Verify that either data or path is provided.

as_string

Read data as a string.

as_bytes

Read data as bytes.

as_bytes_io

Read data as a byte stream.

from_path

Load the blob from a path-like object.

from_data

Initialize the blob from in-memory data.

__repr__

Return the blob representation.

__init__
is_lc_serializable

Is this class serializable?

get_lc_namespace

Get the namespace of the LangChain object.

lc_id

Return a unique identifier for this class for serialization purposes.

to_json

Serialize the object to JSON.

to_json_not_implemented

Serialize a "not implemented" object.

data class-attribute instance-attribute

data: bytes | str | None = None

Raw data associated with the blob.

mimetype class-attribute instance-attribute

mimetype: str | None = None

The MIME type of the data; not to be confused with a file extension.

encoding class-attribute instance-attribute

encoding: str = 'utf-8'

Encoding to use when decoding the bytes into a string. Defaults to utf-8.

path class-attribute instance-attribute

path: PathLike | None = None

Location where the original content was found.

source property

source: str | None

The source location of the blob as a string if known, otherwise None.

If the metadata contains a field called "source", that value is used. Otherwise, if a path is associated with the blob, the source defaults to the path location.

lc_secrets property

lc_secrets: dict[str, str]

A map of constructor argument names to secret ids.

For example, {"openai_api_key": "OPENAI_API_KEY"}

lc_attributes property

lc_attributes: dict

List of attribute names that should be included in the serialized kwargs.

These attributes must be accepted by the constructor.

Default is an empty dictionary.

id class-attribute instance-attribute

id: str | None = Field(default=None, coerce_numbers_to_str=True)

An optional identifier for the document.

Ideally this should be unique across the document collection and formatted as a UUID, but this will not be enforced.

Added in version 0.2.11

metadata class-attribute instance-attribute

metadata: dict = Field(default_factory=dict)

Arbitrary metadata associated with the content.

check_blob_is_valid classmethod

check_blob_is_valid(values: dict[str, Any]) -> Any

Verify that either data or path is provided.

as_string

as_string() -> str

Read data as a string.

RAISES DESCRIPTION
ValueError

If the blob cannot be represented as a string.

RETURNS DESCRIPTION
str

The data as a string.

as_bytes

as_bytes() -> bytes

Read data as bytes.

RAISES DESCRIPTION
ValueError

If the blob cannot be represented as bytes.

RETURNS DESCRIPTION
bytes

The data as bytes.

as_bytes_io

as_bytes_io() -> Generator[BytesIO | BufferedReader, None, None]

Read data as a byte stream.

RAISES DESCRIPTION
NotImplementedError

If the blob cannot be represented as a byte stream.

YIELDS DESCRIPTION
BytesIO | BufferedReader

The data as a byte stream.

from_path classmethod

from_path(
    path: PathLike,
    *,
    encoding: str = "utf-8",
    mime_type: str | None = None,
    guess_type: bool = True,
    metadata: dict | None = None
) -> Blob

Load the blob from a path-like object.

PARAMETER DESCRIPTION
path

Path-like object pointing to the file to be read

TYPE: PathLike

encoding

Encoding to use if decoding the bytes into a string

TYPE: str DEFAULT: 'utf-8'

mime_type

If provided, will be set as the mime-type of the data

TYPE: str | None DEFAULT: None

guess_type

If True, the mimetype will be guessed from the file extension, if a mime-type was not provided

TYPE: bool DEFAULT: True

metadata

Metadata to associate with the blob

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Blob

Blob instance

from_data classmethod

from_data(
    data: str | bytes,
    *,
    encoding: str = "utf-8",
    mime_type: str | None = None,
    path: str | None = None,
    metadata: dict | None = None
) -> Blob

Initialize the blob from in-memory data.

PARAMETER DESCRIPTION
data

The in-memory data associated with the blob

TYPE: str | bytes

encoding

Encoding to use if decoding the bytes into a string

TYPE: str DEFAULT: 'utf-8'

mime_type

If provided, will be set as the mime-type of the data

TYPE: str | None DEFAULT: None

path

If provided, will be set as the source from which the data came

TYPE: str | None DEFAULT: None

metadata

Metadata to associate with the blob

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Blob

Blob instance

__repr__

__repr__() -> str

Return the blob representation.

__init__

__init__(*args: Any, **kwargs: Any) -> None

is_lc_serializable classmethod

is_lc_serializable() -> bool

Is this class serializable?

By design, even if a class inherits from Serializable, it is not serializable by default. This is to prevent accidental serialization of objects that should not be serialized.

RETURNS DESCRIPTION
bool

Whether the class is serializable. Default is False.

get_lc_namespace classmethod

get_lc_namespace() -> list[str]

Get the namespace of the LangChain object.

For example, if the class is langchain.llms.openai.OpenAI, then the namespace is ["langchain", "llms", "openai"]

RETURNS DESCRIPTION
list[str]

The namespace.

lc_id classmethod

lc_id() -> list[str]

Return a unique identifier for this class for serialization purposes.

The unique identifier is a list of strings that describes the path to the object.

For example, for the class langchain.llms.openai.OpenAI, the id is ["langchain", "llms", "openai", "OpenAI"].

to_json

to_json() -> SerializedConstructor | SerializedNotImplemented

Serialize the object to JSON.

RAISES DESCRIPTION
ValueError

If the class has deprecated attributes.

RETURNS DESCRIPTION
SerializedConstructor | SerializedNotImplemented

A JSON-serializable object or a SerializedNotImplemented object.

to_json_not_implemented

to_json_not_implemented() -> SerializedNotImplemented

Serialize a "not implemented" object.

RETURNS DESCRIPTION
SerializedNotImplemented

SerializedNotImplemented.

langchain_core.documents.base.Document

Bases: BaseMedia

Class for storing a piece of text and associated metadata.

Example
from langchain_core.documents import Document

document = Document(
    page_content="Hello, world!", metadata={"source": "https://example.com"}
)
METHOD DESCRIPTION
__init__

Pass page_content in as positional or named arg.

is_lc_serializable

Return True as this class is serializable.

get_lc_namespace

Get the namespace of the LangChain object.

__str__

Override str to restrict it to page_content and metadata.

lc_id

Return a unique identifier for this class for serialization purposes.

to_json

Serialize the object to JSON.

to_json_not_implemented

Serialize a "not implemented" object.

page_content instance-attribute

page_content: str

String text.

lc_secrets property

lc_secrets: dict[str, str]

A map of constructor argument names to secret ids.

For example, {"openai_api_key": "OPENAI_API_KEY"}

lc_attributes property

lc_attributes: dict

List of attribute names that should be included in the serialized kwargs.

These attributes must be accepted by the constructor.

Default is an empty dictionary.

id class-attribute instance-attribute

id: str | None = Field(default=None, coerce_numbers_to_str=True)

An optional identifier for the document.

Ideally this should be unique across the document collection and formatted as a UUID, but this will not be enforced.

Added in version 0.2.11

metadata class-attribute instance-attribute

metadata: dict = Field(default_factory=dict)

Arbitrary metadata associated with the content.

__init__

__init__(page_content: str, **kwargs: Any) -> None

Pass page_content in as positional or named arg.

is_lc_serializable classmethod

is_lc_serializable() -> bool

Return True as this class is serializable.

get_lc_namespace classmethod

get_lc_namespace() -> list[str]

Get the namespace of the LangChain object.

RETURNS DESCRIPTION
list[str]

["langchain", "schema", "document"]

__str__

__str__() -> str

Override str to restrict it to page_content and metadata.

RETURNS DESCRIPTION
str

A string representation of the Document.

lc_id classmethod

lc_id() -> list[str]

Return a unique identifier for this class for serialization purposes.

The unique identifier is a list of strings that describes the path to the object.

For example, for the class langchain.llms.openai.OpenAI, the id is ["langchain", "llms", "openai", "OpenAI"].

to_json

to_json() -> SerializedConstructor | SerializedNotImplemented

Serialize the object to JSON.

RAISES DESCRIPTION
ValueError

If the class has deprecated attributes.

RETURNS DESCRIPTION
SerializedConstructor | SerializedNotImplemented

A JSON-serializable object or a SerializedNotImplemented object.

to_json_not_implemented

to_json_not_implemented() -> SerializedNotImplemented

Serialize a "not implemented" object.

RETURNS DESCRIPTION
SerializedNotImplemented

SerializedNotImplemented.