Elasticsearch embeddings cache.
Caches embeddings in Elasticsearch to avoid repeated embedding computations.
```python
ElasticsearchEmbeddingsCache(
    index_name: str,
    *,
    store_input: bool = True,
    metadata: Optional[Dict[str, Any]] = None,
    namespace: Optional[str] = None,
    maximum_duplicates_allowed: int = 1,
    client: Optional[Elasticsearch] = None,
    es_url: Optional[str] = None,
    es_cloud_id: Optional[str] = None,
    es_user: Optional[str] = None,
    es_api_key: Optional[str] = None,
    es_password: Optional[str] = None,
)
```

Setup:
Install langchain_elasticsearch and start Elasticsearch locally using
the start-local script.
```shell
pip install -qU langchain_elasticsearch
curl -fsSL https://elastic.co/start-local | sh
```
This will create an elastic-start-local folder. To start Elasticsearch and Kibana:

```shell
cd elastic-start-local
./start.sh
```
Elasticsearch will be available at http://localhost:9200. The password
for the elastic user and API key are stored in the .env file in the
elastic-start-local folder.
Initialize the Elasticsearch embeddings cache store.
Instantiate:

```python
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_url="http://localhost:9200",
)
```
Instantiate with API key:

```python
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_url="http://localhost:9200",
    es_api_key="your-api-key",
)
```
Instantiate from cloud:

```python
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_cloud_id="<cloud_id>",
    es_api_key="your-api-key",
)
```
Instantiate from existing connection:

```python
from elasticsearch import Elasticsearch
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

client = Elasticsearch("http://localhost:9200")
cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    client=client,
)
```
Use with CacheBackedEmbeddings:

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

underlying_embeddings = OpenAIEmbeddings()
cache = ElasticsearchEmbeddingsCache(
    index_name="embeddings-cache",
    es_url="http://localhost:9200",
)
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings,
    cache,
    namespace=underlying_embeddings.model,
)
```
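With the wrapper above, only cache misses reach the underlying embeddings model; repeated inputs are served from the store. A stdlib-only sketch of this cache-backed pattern (the class and method names here are illustrative stand-ins, not the real LangChain API):

```python
import hashlib

class FakeEmbedder:
    """Stand-in for a real embeddings model; counts how often it is called."""
    def __init__(self):
        self.calls = 0

    def embed(self, text: str) -> list:
        self.calls += 1
        # Deterministic pseudo-embedding derived from a hash of the text.
        return [float(b) for b in hashlib.sha256(text.encode()).digest()[:4]]

class CachingEmbedder:
    """Compute an embedding only when its cache key is missing from the store."""
    def __init__(self, underlying, store, namespace=""):
        self.underlying = underlying
        self.store = store          # any mapping from key -> embedding
        self.namespace = namespace  # prefix that keeps different models apart

    def _key(self, text: str) -> str:
        return self.namespace + hashlib.sha1(text.encode()).hexdigest()

    def embed_documents(self, texts):
        out = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:           # cache miss: call the model
                self.store[key] = self.underlying.embed(text)
            out.append(self.store[key])         # cache hit: reuse the stored value
        return out

model = FakeEmbedder()
cached = CachingEmbedder(model, store={}, namespace="demo-model/")
cached.embed_documents(["hello", "world", "hello"])
cached.embed_documents(["hello"])
print(model.calls)  # 2 -- each unique text was embedded exactly once
```

The namespace prefix serves the same purpose as the `namespace` argument above: embeddings from different models never collide on the same key.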
For synchronous applications, use the ElasticsearchEmbeddingsCache class.
For asynchronous applications, use the AsyncElasticsearchEmbeddingsCache
class.
| Name | Type | Default | Description |
|---|---|---|---|
| index_name* | str | required | The name of the index or alias to use for the cache. If it doesn't exist, an index is created according to the default mapping. |
| store_input | bool | True | Whether to store the input text in the cache. |
| metadata | dict | None | Additional metadata to store in the cache for filtering. Must be JSON serializable. |
| namespace | str | None | A namespace to organize the cache. |
| maximum_duplicates_allowed | int | 1 | Maximum duplicate keys permitted when using aliases across multiple indices. |
| client | Elasticsearch | None | Pre-existing Elasticsearch connection. Provide either this or credentials. |
| es_url | str | None | URL of the Elasticsearch instance. |
| es_cloud_id | str | None | Cloud ID of the Elasticsearch instance. |
| es_user | str | None | Username for Elasticsearch. |
| es_api_key | str | None | API key for Elasticsearch. |
| es_password | str | None | Password for Elasticsearch. |
The cache serializes embeddings for storage: `encode_vector` encodes vector bytes as a base64 string, and `decode_vector` decodes a base64 string back into vector bytes.
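A minimal stdlib sketch of this base64 round trip (the little-endian float32 packing shown is an assumption made for the demo, not necessarily the library's exact wire format):

```python
import base64
import struct

def encode_vector(data: bytes) -> str:
    # Raw bytes -> base64 text, safe to store in a JSON document field.
    return base64.b64encode(data).decode("ascii")

def decode_vector(data: str) -> bytes:
    # Base64 text -> the original raw bytes.
    return base64.b64decode(data)

vector = [0.1, -0.2, 0.3]
raw = struct.pack("<3f", *vector)  # pack three floats as little-endian float32
restored = struct.unpack("<3f", decode_vector(encode_vector(raw)))
print([round(x, 6) for x in restored])  # values survive the round trip
```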
The store operations follow LangChain's key-value `ByteStore` interface, plus a document-building helper:

- `mget`: Get the values associated with the given keys.
- `build_document`: Build the Elasticsearch document for storing a single embedding.
- `mset`: Set the values for the given keys.
- `mdelete`: Delete the given keys and their associated values.
- `yield_keys`: Get an iterator over keys that match the given prefix.
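A minimal in-memory analogue of these store operations (a sketch for illustration only, not the Elasticsearch-backed implementation):

```python
from typing import Iterator, Optional, Sequence, Tuple

class InMemoryByteStore:
    """In-memory stand-in exposing the four store operations described above."""
    def __init__(self) -> None:
        self._data = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> list:
        # Missing keys come back as None, keeping positions aligned with `keys`.
        return [self._data.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self, prefix: str = "") -> Iterator[str]:
        return (key for key in self._data if key.startswith(prefix))

store = InMemoryByteStore()
store.mset([("ns/a", b"\x01"), ("ns/b", b"\x02"), ("other", b"\x03")])
print(store.mget(["ns/a", "missing"]))  # [b'\x01', None]
print(sorted(store.yield_keys("ns/")))  # ['ns/a', 'ns/b']
store.mdelete(["ns/a"])
print(store.mget(["ns/a"]))             # [None]
```

Prefix-based key iteration is what makes namespacing work: a namespace is simply a shared key prefix that can be listed or cleared as a unit.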