
Embeddings

Embeddings.

Modules:

Name Description
base

Factory functions for embeddings.

cache

Code for a cache-backed embedder.

Classes:

Name Description
Embeddings

Interface for embedding models.

CacheBackedEmbeddings

Interface for caching results from embedding models.

Functions:

Name Description
init_embeddings

Initialize an embeddings model from a model name and optional provider.

Embeddings

Bases: ABC

Interface for embedding models.

This is an interface meant for implementing text embedding models.

Text embedding models are used to map text to a vector (a point in n-dimensional space).

Texts that are similar will usually be mapped to points that are close to each other in this space. The exact details of what's considered "similar" and how "distance" is measured in this space are dependent on the specific embedding model.

This abstraction contains a method for embedding a list of documents and a method for embedding a query text. The embedding of a query text is expected to be a single vector, while the embedding of a list of documents is expected to be a list of vectors.

Usually the query embedding is identical to the document embedding, but the abstraction allows treating them independently.

In addition to the synchronous methods, this interface also provides asynchronous versions of the methods.

By default, the asynchronous methods are implemented using the synchronous methods; however, implementations may choose to override the asynchronous methods with an async native implementation for performance reasons.
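The sync/async pattern above can be illustrated with a toy, self-contained implementation (hypothetical, not a real embedding model): the embedding values are derived from text length purely for demonstration, and the async methods delegate to the sync ones on a worker thread, mirroring the default behavior described here.

```python
import asyncio


class LengthEmbeddings:
    """Toy stand-in for the Embeddings contract (illustration only):
    maps each text to a deterministic 3-dimensional vector derived
    from its length."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Embed search docs: one vector per input text.
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # Embed query text: a single vector.
        n = float(len(text))
        return [n, n / 2, n / 4]

    async def aembed_documents(self, texts: list[str]) -> list[list[float]]:
        # Default-style async variant: delegate to the sync method on a
        # worker thread so the event loop is not blocked.
        return await asyncio.to_thread(self.embed_documents, texts)

    async def aembed_query(self, text: str) -> list[float]:
        return await asyncio.to_thread(self.embed_query, text)
```

A native-async implementation would instead override `aembed_documents`/`aembed_query` to call its provider's async client directly.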

Methods:

Name Description
embed_documents

Embed search docs.

embed_query

Embed query text.

aembed_documents

Asynchronously embed search docs.

aembed_query

Asynchronously embed query text.

embed_documents abstractmethod

embed_documents(texts: list[str]) -> list[list[float]]

Embed search docs.

Parameters:

- texts (list[str], required): List of text to embed.

Returns:

- list[list[float]]: List of embeddings.

embed_query abstractmethod

embed_query(text: str) -> list[float]

Embed query text.

Parameters:

- text (str, required): Text to embed.

Returns:

- list[float]: Embedding.

aembed_documents async

aembed_documents(texts: list[str]) -> list[list[float]]

Asynchronously embed search docs.

Parameters:

- texts (list[str], required): List of text to embed.

Returns:

- list[list[float]]: List of embeddings.

aembed_query async

aembed_query(text: str) -> list[float]

Asynchronously embed query text.

Parameters:

- text (str, required): Text to embed.

Returns:

- list[float]: Embedding.

CacheBackedEmbeddings

Bases: Embeddings

Interface for caching results from embedding models.

The interface works with any store that implements the abstract store interface, accepting keys of type str and values of type list of floats.

If need be, the interface can be extended to accept other implementations of the value serializer and deserializer, as well as the key encoder.

Note that by default only document embeddings are cached. To cache query embeddings too, pass a query_embedding_store to the constructor.

Examples:

.. code-block:: python

    from langchain.embeddings import CacheBackedEmbeddings
    from langchain.storage import LocalFileStore
    from langchain_community.embeddings import OpenAIEmbeddings

    store = LocalFileStore('./my_cache')

    underlying_embedder = OpenAIEmbeddings()
    embedder = CacheBackedEmbeddings.from_bytes_store(
        underlying_embedder, store, namespace=underlying_embedder.model
    )

    # Embedding is computed and cached
    embeddings = embedder.embed_documents(["hello", "goodbye"])

    # Embeddings are retrieved from the cache, no computation is done
    embeddings = embedder.embed_documents(["hello", "goodbye"])
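The core cache-backed lookup can be sketched independently of LangChain. In this minimal sketch a plain dict stands in for the store (the real class uses a BaseStore with batched mget/mset), and `embed_fn` stands in for the underlying embedder:

```python
def cached_embed_documents(texts, store, embed_fn):
    """Sketch of cache-backed batch embedding: look up every text,
    embed only the cache misses, write those back, and return the
    vectors in the original order."""
    vectors = [store.get(t) for t in texts]
    missing = [i for i, v in enumerate(vectors) if v is None]
    if missing:
        new_vectors = embed_fn([texts[i] for i in missing])
        for i, vec in zip(missing, new_vectors):
            store[texts[i]] = vec  # populate the cache
            vectors[i] = vec
    return vectors
```

On a second call with the same texts, `embed_fn` is never invoked: all vectors come from the store.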

Methods:

Name Description
__init__

Initialize the embedder.

embed_documents

Embed a list of texts.

aembed_documents

Embed a list of texts.

embed_query

Embed query text.

aembed_query

Embed query text.

from_bytes_store

On-ramp that adds the necessary serialization and encoding to the store.

__init__

__init__(
    underlying_embeddings: Embeddings,
    document_embedding_store: BaseStore[str, list[float]],
    *,
    batch_size: int | None = None,
    query_embedding_store: BaseStore[str, list[float]]
    | None = None,
) -> None

Initialize the embedder.

Parameters:

- underlying_embeddings (Embeddings, required): The embedder to use for computing embeddings.
- document_embedding_store (BaseStore[str, list[float]], required): The store to use for caching document embeddings.
- batch_size (int | None, default None): The number of documents to embed between store updates.
- query_embedding_store (BaseStore[str, list[float]] | None, default None): The store to use for caching query embeddings. If None, query embeddings are not cached.

embed_documents

embed_documents(texts: list[str]) -> list[list[float]]

Embed a list of texts.

The method first checks the cache for the embeddings. If the embeddings are not found, the method uses the underlying embedder to embed the documents and stores the results in the cache.

Parameters:

- texts (list[str], required): A list of texts to embed.

Returns:

- list[list[float]]: A list of embeddings for the given texts.

aembed_documents async

aembed_documents(texts: list[str]) -> list[list[float]]

Embed a list of texts.

The method first checks the cache for the embeddings. If the embeddings are not found, the method uses the underlying embedder to embed the documents and stores the results in the cache.

Parameters:

- texts (list[str], required): A list of texts to embed.

Returns:

- list[list[float]]: A list of embeddings for the given texts.

embed_query

embed_query(text: str) -> list[float]

Embed query text.

By default, this method does not cache queries. To enable query caching, pass a query_embedding_store to the constructor (or set query_embedding_cache when using from_bytes_store).

Parameters:

- text (str, required): The text to embed.

Returns:

- list[float]: The embedding for the given text.

aembed_query async

aembed_query(text: str) -> list[float]

Embed query text.

By default, this method does not cache queries. To enable query caching, pass a query_embedding_store to the constructor (or set query_embedding_cache when using from_bytes_store).

Parameters:

- text (str, required): The text to embed.

Returns:

- list[float]: The embedding for the given text.

from_bytes_store classmethod

from_bytes_store(
    underlying_embeddings: Embeddings,
    document_embedding_cache: ByteStore,
    *,
    namespace: str = "",
    batch_size: int | None = None,
    query_embedding_cache: bool | ByteStore = False,
    key_encoder: Callable[[str], str]
    | Literal[
        "sha1", "blake2b", "sha256", "sha512"
    ] = "sha1",
) -> CacheBackedEmbeddings

On-ramp that adds the necessary serialization and encoding to the store.

Parameters:

- underlying_embeddings (Embeddings, required): The embedder to use for embedding.
- document_embedding_cache (ByteStore, required): The cache to use for storing document embeddings.
- namespace (str, default ''): The namespace to use for the document cache. This namespace is used to avoid collisions with other caches. For example, set it to the name of the embedding model used.
- batch_size (int | None, default None): The number of documents to embed between store updates.
- query_embedding_cache (bool | ByteStore, default False): The cache to use for storing query embeddings. True to use the same cache as document embeddings; False to not cache query embeddings.
- key_encoder (Callable[[str], str] | Literal['sha1', 'blake2b', 'sha256', 'sha512'], default 'sha1'): Optional callable to encode keys. If not provided, a default encoder using SHA-1 is used. SHA-1 is not collision-resistant, and a motivated attacker could craft two different texts that hash to the same cache key. New applications should use one of the alternative encoders, or provide a custom, strong key-encoder function, to avoid this risk. If you change the key encoder on an existing cache, consider creating a new cache instead, to avoid potential collisions with existing keys or duplicate keys for the same text.

Returns:

- CacheBackedEmbeddings: An instance of CacheBackedEmbeddings that uses the provided cache.
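A custom key encoder of the kind key_encoder accepts might look like the following. This is a hypothetical helper, not part of the library: it hashes the text with a collision-resistant algorithm and prefixes a namespace so embeddings from different models do not collide in a shared store.

```python
import hashlib
from typing import Callable


def make_key_encoder(namespace: str, algorithm: str = "sha256") -> Callable[[str], str]:
    """Return an encoder mapping text -> namespaced hex digest.
    `algorithm` is any name accepted by hashlib.new (e.g. 'sha256',
    'blake2b')."""

    def encode(text: str) -> str:
        digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
        return f"{namespace}{digest}"

    return encode
```

Because the namespace is baked into every key, two encoders built with different namespaces always yield distinct cache keys for the same text.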

init_embeddings

init_embeddings(
    model: str,
    *,
    provider: str | None = None,
    **kwargs: Any,
) -> Embeddings | Runnable[Any, list[float]]

Initialize an embeddings model from a model name and optional provider.

Note: You must have the integration package corresponding to the model provider installed.

Parameters:

- model (str, required): Name of the model to use. Can be either:
    - A model string like "openai:text-embedding-3-small"
    - Just the model name, if provider is specified
- provider (str | None, default None): Optional explicit provider name. If not specified, an attempt is made to parse it from the model string. Supported providers and their required packages:

  {_get_provider_list()}

- **kwargs (Any): Additional model-specific parameters passed to the embedding model. These vary by provider; see the provider-specific documentation for details.

Returns:

- Embeddings | Runnable[Any, list[float]]: An Embeddings instance that can generate embeddings for text.

Raises:

- ValueError: If the model provider is not supported or cannot be determined.
- ImportError: If the required provider package is not installed.

.. dropdown:: Example Usage
    :open:

.. code-block:: python

    # Using a model string
    model = init_embeddings("openai:text-embedding-3-small")
    model.embed_query("Hello, world!")

    # Using explicit provider
    model = init_embeddings(model="text-embedding-3-small", provider="openai")
    model.embed_documents(["Hello, world!", "Goodbye, world!"])

    # With additional parameters
    model = init_embeddings("openai:text-embedding-3-small", api_key="sk-...")

.. versionadded:: 0.3.9