| Name | Type | Description |
|---|---|---|
| index_name* | str | Name of the index to create. |
| embedding | Embeddings | Default: None. Embedding function to use. |
| custom_index_settings | Optional[Dict[str, Any]] | Default: None |
| client | Optional[Elasticsearch or AsyncElasticsearch] | Default: None |
| es_url | Optional[str] | Default: None |
| es_cloud_id | Optional[str] | Default: None |
| es_user | Optional[str] | Default: None |
| es_password | Optional[str] | Default: None |
| es_api_key | Optional[str] | Default: None |
| es_params | Optional[Dict[str, Any]] | Default: None |
| num_dimensions | Optional[int] | Default: None |
| metadata_mappings | Optional[Dict[str, Any]] | Default: None |
| vector_query_field | str | Default: 'vector' |
| query_field | str | Default: 'text' |
| distance_strategy | Optional[Literal[DistanceStrategy.COSINE, DistanceStrategy.DOT_PRODUCT, DistanceStrategy.EUCLIDEAN_DISTANCE, DistanceStrategy.MAX_INNER_PRODUCT]] | |
| strategy | Union[BaseRetrievalStrategy, AsyncRetrievalStrategy] | Default: ApproxRetrievalStrategy() |

| Name | Type |
|---|---|
| index_name | str |
| embedding | Optional[Embeddings] |
| client | Optional[AsyncElasticsearch] |
| es_url | Optional[str] |
| es_cloud_id | Optional[str] |
| es_user | Optional[str] |
| es_api_key | Optional[str] |
| es_password | Optional[str] |
| vector_query_field | str |
| query_field | str |
| distance_strategy | Optional[Literal[DistanceStrategy.COSINE, DistanceStrategy.DOT_PRODUCT, DistanceStrategy.EUCLIDEAN_DISTANCE, DistanceStrategy.MAX_INNER_PRODUCT]] |
| strategy | Union[BaseRetrievalStrategy, AsyncRetrievalStrategy] |
| es_params | Optional[Dict[str, Any]] |
| custom_index_settings | Optional[Dict[str, Any]] |
| num_dimensions | Optional[int] |
| metadata_mappings | Optional[Dict[str, Any]] |
Elasticsearch vector store.
Setup:
Install langchain_elasticsearch and start Elasticsearch locally using
the start-local script.
pip install -qU langchain_elasticsearch
curl -fsSL https://elastic.co/start-local | sh
This will create an elastic-start-local folder. To start Elasticsearch
and Kibana:
cd elastic-start-local
./start.sh
Elasticsearch will be available at http://localhost:9200. The password
for the elastic user and API key are stored in the .env file in the
elastic-start-local folder.
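Since the credentials live in the .env file, one way to avoid hard-coding them is to read that file and turn it into constructor arguments. The following is a minimal sketch, not part of the library: `read_env_file` and `connection_settings` are hypothetical helpers, and the variable names `ES_LOCAL_API_KEY` and `ES_LOCAL_PASSWORD` are assumptions about what the start-local script writes, so check your own .env file before relying on them.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def connection_settings(env):
    """Build keyword arguments for the store from parsed .env values.

    ES_LOCAL_API_KEY and ES_LOCAL_PASSWORD are assumed variable names.
    """
    settings = {"es_url": "http://localhost:9200"}
    if env.get("ES_LOCAL_API_KEY"):
        settings["es_api_key"] = env["ES_LOCAL_API_KEY"]
    elif env.get("ES_LOCAL_PASSWORD"):
        settings["es_user"] = "elastic"
        settings["es_password"] = env["ES_LOCAL_PASSWORD"]
    return settings
```

The resulting dictionary could then be unpacked into the constructor, for example `ElasticsearchStore(index_name="langchain-demo", embedding=..., **connection_settings(read_env_file("elastic-start-local/.env")))`.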
Initialize the AsyncElasticsearchStore instance.
Instantiate:
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
vector_store = ElasticsearchStore(
    index_name="langchain-demo",
    embedding=OpenAIEmbeddings(),
    es_url="http://localhost:9200",
)
Instantiate with API key (URL):
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
store = ElasticsearchStore(
    index_name="langchain-demo",
    embedding=OpenAIEmbeddings(),
    es_url="http://localhost:9200",
    es_api_key="your-api-key"
)
Instantiate with username/password (URL):
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
store = ElasticsearchStore(
    index_name="langchain-demo",
    embedding=OpenAIEmbeddings(),
    es_url="http://localhost:9200",
    es_user="elastic",
    es_password="password"
)
If you want to use a cloud-hosted Elasticsearch instance, pass the es_cloud_id argument instead of the es_url argument.
Instantiate from cloud (with username/password):
from langchain_elasticsearch.vectorstores import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
store = ElasticsearchStore(
    embedding=OpenAIEmbeddings(),
    index_name="langchain-demo",
    es_cloud_id="<cloud_id>",
    es_user="elastic",
    es_password="<password>"
)
Instantiate from cloud (with API key):
from langchain_elasticsearch.vectorstores import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
store = ElasticsearchStore(
    embedding=OpenAIEmbeddings(),
    index_name="langchain-demo",
    es_cloud_id="<cloud_id>",
    es_api_key="your-api-key"
)
You can also connect to an existing Elasticsearch instance by passing in a pre-existing Elasticsearch connection via the client argument.
Instantiate from existing connection:
from langchain_elasticsearch.vectorstores import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
from elasticsearch import Elasticsearch
client = Elasticsearch("http://localhost:9200")
store = ElasticsearchStore(
    embedding=OpenAIEmbeddings(),
    index_name="langchain-demo",
    client=client
)
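The connection options above fall into two styles: a ready-made client, or a URL or cloud ID plus credentials. The sketch below shows one way to check that a caller picked exactly one style before constructing the store. `resolve_connection` is a hypothetical helper written for illustration; the library performs its own validation, which may differ from these checks.

```python
def resolve_connection(client=None, es_url=None, es_cloud_id=None,
                       es_api_key=None, es_user=None, es_password=None):
    """Return keyword arguments for exactly one connection style.

    Hypothetical pre-flight check; not the library's own validation.
    """
    credentials = {
        "es_url": es_url,
        "es_cloud_id": es_cloud_id,
        "es_api_key": es_api_key,
        "es_user": es_user,
        "es_password": es_password,
    }
    provided = {k: v for k, v in credentials.items() if v is not None}

    if client is not None:
        # A pre-existing client already carries its own address and auth.
        if provided:
            raise ValueError(
                "Pass either client or credentials, not both: "
                + ", ".join(sorted(provided))
            )
        return {"client": client}

    if es_url is None and es_cloud_id is None:
        raise ValueError("Provide client, es_url, or es_cloud_id.")
    if es_url is not None and es_cloud_id is not None:
        raise ValueError("Provide es_url or es_cloud_id, not both.")
    return provided
```

The returned dictionary can be unpacked into the constructor alongside `index_name` and `embedding`.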
Class methods (afrom_texts, afrom_documents) accept the same connection options:
Instantiate from texts with credentials:
from langchain_elasticsearch import ElasticsearchStore
store = await ElasticsearchStore.afrom_texts(
    texts=["text1", "text2"],
    index_name="langchain-demo",
    es_url="http://localhost:9200"
)
Instantiate from texts with client:
from langchain_elasticsearch import ElasticsearchStore
from elasticsearch import Elasticsearch
client = Elasticsearch("http://localhost:9200")
store = await ElasticsearchStore.afrom_texts(
    texts=["text1", "text2"],
    index_name="langchain-demo",
    client=client
)
Add Documents:
from langchain_core.documents import Document
document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")
documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
Delete Documents:
vector_store.delete(ids=["3"])
Search:
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* thud [{'bar': 'baz'}]
Search with filter:
results = vector_store.similarity_search(
    query="thud",
    k=1,
    filter=[{"term": {"metadata.bar.keyword": "baz"}}],
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* thud [{'bar': 'baz'}]
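The filter in the example above is a list of Elasticsearch term clauses keyed by `metadata.<field>.keyword`. When filtering on several metadata fields, building that list by hand gets repetitive, so here is a small sketch of a helper that generates it. `metadata_term_filters` is a hypothetical name, and the sketch assumes default dynamic mappings, where string values get a `.keyword` sub-field and other types are matched on the field itself; a custom `metadata_mappings` may require different field names.

```python
def metadata_term_filters(metadata):
    """Turn {"bar": "baz"} into [{"term": {"metadata.bar.keyword": "baz"}}]."""
    filters = []
    for key, value in metadata.items():
        if isinstance(value, str):
            # Assumption: strings are dynamically mapped with a .keyword sub-field.
            field = f"metadata.{key}.keyword"
        else:
            # Numbers and booleans are matched on the field directly.
            field = f"metadata.{key}"
        filters.append({"term": {field: value}})
    return filters
```

The result can be passed as the `filter` argument, for example `vector_store.similarity_search(query="thud", k=1, filter=metadata_term_filters({"bar": "baz"}))`.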
Search with score:
results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.916092] foo [{'baz': 'bar'}]
Async:
from langchain_elasticsearch import AsyncElasticsearchStore
vector_store = AsyncElasticsearchStore(...)
# add documents
await vector_store.aadd_documents(documents=documents, ids=ids)
# delete documents
await vector_store.adelete(ids=["3"])
# search
results = await vector_store.asimilarity_search(query="thud", k=1)
# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.916092] foo [{'baz': 'bar'}]
Use as Retriever:
pip install "elasticsearch[vectorstore_mmr]"
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
[Document(metadata={'bar': 'baz'}, page_content='thud')]
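To make the `k`, `fetch_k`, and `lambda_mult` settings above more concrete, here is a plain-Python sketch of maximal marginal relevance (MMR) re-ranking. It illustrates the general technique, not the code the retriever actually runs: `fetch_k` candidates are assumed to have been fetched already, and `mmr_select` (a hypothetical name) picks `k` of them, trading relevance to the query against similarity to results already chosen.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k=1, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance."""
    remaining = list(range(len(candidate_vecs)))
    selected = []
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, candidate_vecs[i])
            # Highest similarity to anything already picked (0 if nothing is).
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` the selection reduces to plain relevance ranking; lower values push the second and later picks toward documents that differ from those already selected.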
Advanced Uses:
By default, ElasticsearchStore uses ApproxRetrievalStrategy, which performs approximate nearest neighbor search with the HNSW algorithm. This is usually the fastest option on large indexes, at the cost of approximate rather than exact results.
If you want to use the Brute force / Exact strategy for searching vectors, you can pass in the ExactRetrievalStrategy to the ElasticsearchStore constructor.
Use ExactRetrievalStrategy:
from langchain_elasticsearch.vectorstores import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
store = ElasticsearchStore(
    embedding=OpenAIEmbeddings(),
    index_name="langchain-demo",
    es_url="http://localhost:9200",
    strategy=ElasticsearchStore.ExactRetrievalStrategy()
)
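For intuition about what a brute-force (exact) strategy computes, the sketch below scores every stored vector against the query and keeps the best `k`. It is a conceptual illustration in plain Python, not how Elasticsearch executes the query; `exact_top_k` and the sample vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query_vec, doc_vecs, k=1):
    """Score every stored vector against the query and keep the k best.

    doc_vecs maps a document id to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


docs = {"1": [1.0, 0.0], "2": [0.0, 1.0], "3": [0.7, 0.7]}
top_two = exact_top_k([0.9, 0.1], docs, k=2)  # ids "1" then "3"
```

Because every vector is compared, the cost grows linearly with the number of documents, which is the trade-off an approximate index such as HNSW avoids.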
Both strategies require you to choose the similarity metric when the index is created. The default is cosine similarity, but you can also use dot product or Euclidean distance.
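The choice of metric matters because the three measures can rank the same vectors differently. The following sketch, using made-up two-dimensional vectors and a hypothetical `rank` helper, shows that cosine similarity ignores vector length, dot product rewards it, and Euclidean distance is a distance where smaller is better.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, docs, metric):
    """Order document names from best to worst match under one metric."""
    if metric == "euclidean":
        # A distance: the smallest value is the closest match.
        return sorted(docs, key=lambda name: euclidean(query, docs[name]))
    score = cosine if metric == "cosine" else dot
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)


query = [1.0, 0.0]
docs = {"short": [0.5, 0.0], "long": [3.0, 3.0]}

by_cosine = rank(query, docs, "cosine")        # same direction wins
by_dot = rank(query, docs, "dot")              # larger magnitude wins
by_euclidean = rank(query, docs, "euclidean")  # nearest point wins
```

When all vectors are normalized to unit length, cosine similarity and dot product produce the same ordering.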
Use dot product similarity:
from langchain_elasticsearch.vectorstores import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings
from langchain_elasticsearch import DistanceStrategy
store = ElasticsearchStore(
    "langchain-demo",
    embedding=OpenAIEmbeddings(),
    es_url="http://localhost:9200",
    distance_strategy="DOT_PRODUCT"
)

| Name | Description |
|---|---|
| custom_index_settings | A dictionary of custom settings for the index. This can include the number of shards, the number of replicas, analysis settings, and other index-specific settings. If not provided, default settings are used. If the same setting is provided by both the user and the strategy, an error is raised. |
| client | Pre-existing Elasticsearch connection. Provide either this or credentials. |
| es_url | URL of the Elasticsearch instance to connect to. |
| es_cloud_id | Cloud ID of the Elasticsearch instance to connect to. |
| es_user | Username to use when connecting to Elasticsearch. |
| es_password | Password to use when connecting to Elasticsearch. |
| es_api_key | API key to use when connecting to Elasticsearch. |
| es_params | Additional parameters for the Elasticsearch client. |
| num_dimensions | Number of dimensions of the embeddings. |
| metadata_mappings | Metadata mappings for the index. |
| vector_query_field | Name of the field containing the vector query. Default is 'vector'. |
| query_field | Name of the field containing the text query. Default is 'text'. |
| distance_strategy | Distance strategy to use. |
| strategy | Retrieval strategy to use. Default is ApproxRetrievalStrategy(). |
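The conflict rule described for custom_index_settings (an error when the user and the strategy both define the same setting) can be pictured with a short sketch. `merge_index_settings` is a hypothetical function, the setting names are illustrative, and only top-level keys are compared here; the library's actual check may work differently.

```python
def merge_index_settings(strategy_settings, custom_index_settings=None):
    """Combine both dictionaries, refusing keys that appear in both.

    Hypothetical illustration of the documented conflict rule.
    """
    custom = custom_index_settings or {}
    conflicts = sorted(set(strategy_settings) & set(custom))
    if conflicts:
        raise ValueError(
            f"Settings defined by both user and strategy: {conflicts}"
        )
    return {**strategy_settings, **custom}


# Illustrative values only: the strategy sets one key, the user sets others.
merged = merge_index_settings(
    {"default_pipeline": "my-pipeline"},
    {"number_of_shards": 1, "number_of_replicas": 0},
)
```

Failing loudly on overlap, rather than letting one side silently win, makes it obvious when a custom setting would change what the retrieval strategy depends on.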