Elasticsearch embeddings cache.
Caches embeddings in Elasticsearch to avoid repeated embedding computations.
```python
ElasticsearchEmbeddingsCache(
    index_name: str,
    *,
    store_input: bool = True,
    metadata: Optional[Dict[str, Any]] = None,
    namespace: Optional[str] = None,
    maximum_duplicates_allowed: int = 1,
    client: Optional[Elasticsearch] = None,
    es_url: Optional[str] = None,
    es_cloud_id: Optional[str] = None,
    es_user: Optional[str] = None,
    es_api_key: Optional[str] = None,
    es_password: Optional[str] = None,
)
```

Setup:
Install langchain_elasticsearch and start Elasticsearch locally using
the start-local script.
```shell
pip install -qU langchain_elasticsearch
curl -fsSL https://elastic.co/start-local | sh
```
This will create an elastic-start-local folder. To start Elasticsearch and Kibana:

```shell
cd elastic-start-local
./start.sh
```
Elasticsearch will be available at http://localhost:9200. The password
for the elastic user and API key are stored in the .env file in the
elastic-start-local folder.
Initialize the Elasticsearch embeddings cache store.
Instantiate:

```python
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_url="http://localhost:9200",
)
```
Instantiate with API key:

```python
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_url="http://localhost:9200",
    es_api_key="your-api-key",
)
```
Instantiate from cloud:

```python
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_cloud_id="<cloud_id>",
    es_api_key="your-api-key",
)
```
Instantiate from existing connection:

```python
from elasticsearch import Elasticsearch
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

client = Elasticsearch("http://localhost:9200")
cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    client=client,
)
```
Use with CacheBackedEmbeddings:

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

underlying_embeddings = OpenAIEmbeddings()
cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_url="http://localhost:9200",
)
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings,
    cache,
    namespace=underlying_embeddings.model,
)
```
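With the wrapper above, only cache misses reach the underlying embeddings model; repeated inputs are served from the store. A stdlib-only sketch of this cache-backed pattern (the class and method names here are illustrative stand-ins, not the real LangChain API):

```python
import hashlib

class FakeEmbedder:
    """Stand-in for a real embeddings model; counts how often it is called."""
    def __init__(self):
        self.calls = 0

    def embed(self, text: str) -> list:
        self.calls += 1
        # Deterministic pseudo-embedding derived from a hash of the text.
        return [float(b) for b in hashlib.sha256(text.encode()).digest()[:4]]

class CachingEmbedder:
    """Compute an embedding only when its cache key is missing from the store."""
    def __init__(self, underlying, store, namespace=""):
        self.underlying = underlying
        self.store = store          # any mapping from key -> embedding
        self.namespace = namespace  # prefix that keeps different models apart

    def _key(self, text: str) -> str:
        return self.namespace + hashlib.sha1(text.encode()).hexdigest()

    def embed_documents(self, texts):
        out = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:           # cache miss: call the model
                self.store[key] = self.underlying.embed(text)
            out.append(self.store[key])         # cache hit: reuse the stored value
        return out

model = FakeEmbedder()
cached = CachingEmbedder(model, store={}, namespace="demo-model/")
cached.embed_documents(["hello", "world", "hello"])
cached.embed_documents(["hello"])
print(model.calls)  # 2 -- each unique text was embedded exactly once
```

The namespace prefix serves the same purpose as the `namespace` argument above: embeddings from different models never collide on the same key.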
For synchronous applications, use the ElasticsearchEmbeddingsCache class.
For asynchronous applications, use the AsyncElasticsearchEmbeddingsCache
class.
| Name | Type | Default | Description |
|---|---|---|---|
| index_name* | str | required | The name of the index or alias to use for the cache. If it doesn't exist, an index is created according to the default mapping. |
| store_input | bool | True | Whether to store the input text in the cache. |
| metadata | dict | None | Additional metadata to store in the cache for filtering. Must be JSON serializable. |
| namespace | str | None | A namespace to organize the cache. |
| maximum_duplicates_allowed | int | 1 | Maximum duplicate keys permitted when using aliases across multiple indices. |
| client | Elasticsearch | None | Pre-existing Elasticsearch connection. Provide either this or credentials. |
| es_url | str | None | URL of the Elasticsearch instance. |
| es_cloud_id | str | None | Cloud ID of the Elasticsearch instance. |
| es_user | str | None | Username for Elasticsearch. |
| es_api_key | str | None | API key for Elasticsearch. |
| es_password | str | None | Password for Elasticsearch. |
The cache serializes embeddings for storage: `encode_vector` encodes vector bytes as a base64 string, and `decode_vector` decodes a base64 string back into vector bytes.
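A minimal stdlib sketch of this base64 round trip (the little-endian float32 packing shown is an assumption made for the demo, not necessarily the library's exact wire format):

```python
import base64
import struct

def encode_vector(data: bytes) -> str:
    # Raw bytes -> base64 text, safe to store in a JSON document field.
    return base64.b64encode(data).decode("ascii")

def decode_vector(data: str) -> bytes:
    # Base64 text -> the original raw bytes.
    return base64.b64decode(data)

vector = [0.1, -0.2, 0.3]
raw = struct.pack("<3f", *vector)  # pack three floats as little-endian float32
restored = struct.unpack("<3f", decode_vector(encode_vector(raw)))
print([round(x, 6) for x in restored])  # values survive the round trip
```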
The store operations follow LangChain's key-value `ByteStore` interface, plus a document-building helper:

- `mget`: Get the values associated with the given keys.
- `build_document`: Build the Elasticsearch document for storing a single embedding.
- `mset`: Set the values for the given keys.
- `mdelete`: Delete the given keys and their associated values.
- `yield_keys`: Get an iterator over keys that match the given prefix.
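A minimal in-memory analogue of these store operations (a sketch for illustration only, not the Elasticsearch-backed implementation):

```python
from typing import Iterator, Optional, Sequence, Tuple

class InMemoryByteStore:
    """In-memory stand-in exposing the four store operations described above."""
    def __init__(self) -> None:
        self._data = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> list:
        # Missing keys come back as None, keeping positions aligned with `keys`.
        return [self._data.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self, prefix: str = "") -> Iterator[str]:
        return (key for key in self._data if key.startswith(prefix))

store = InMemoryByteStore()
store.mset([("ns/a", b"\x01"), ("ns/b", b"\x02"), ("other", b"\x03")])
print(store.mget(["ns/a", "missing"]))  # [b'\x01', None]
print(sorted(store.yield_keys("ns/")))  # ['ns/a', 'ns/b']
store.mdelete(["ns/a"])
print(store.mget(["ns/a"]))             # [None]
```

Prefix-based key iteration is what makes namespacing work: a namespace is simply a shared key prefix that can be listed or cleared as a unit.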