Chroma vector store integration.
Chroma(
    self,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    ...
)

| Name | Type | Description |
|---|---|---|
| collection_name | str | Default: _LANGCHAIN_DEFAULT_COLLECTION_NAME. Name of the collection to create. |
| embedding_function | Embeddings \| None | Default: None. Embedding class object. Used to embed texts. |
| persist_directory | str \| None | Default: None |
| host | str \| None | Default: None |
| port | int \| None | Default: None |
| ssl | bool | Default: False |
| headers | dict[str, str] \| None | Default: None |
| chroma_cloud_api_key | str \| None | Default: None |
| tenant | str \| None | Default: None |
| database | str \| None | Default: None |
| client_settings | chromadb.config.Settings \| None | Default: None |
| collection_metadata | dict \| None | Default: None |
| collection_configuration | CreateCollectionConfiguration \| None | Default: None |
| client | chromadb.ClientAPI \| None | Default: None |
| relevance_score_fn | Callable[[float], float] \| None | Default: None |
| create_collection_if_not_exists | bool \| None | Default: True |
Setup:
Install chromadb, langchain-chroma packages:
pip install -qU chromadb langchain-chroma
Key init args (indexing params):
- collection_name: Name of the collection.
- embedding_function: Embedding function to use.

Key init args (client params):
- client: Chroma client to use.
- client_settings: Chroma client settings.
- persist_directory: Directory to persist the collection.
- host: Hostname of a deployed Chroma server.
- port: Connection port for a deployed Chroma server. Default is 8000.
- ssl: Whether to establish an SSL connection with a deployed Chroma server. Default is False.
- headers: HTTP headers to send to a deployed Chroma server.
- chroma_cloud_api_key: Chroma Cloud API key.
- tenant: Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.
- database: Database name. Required for Chroma Cloud connections. Default is 'default_database'.
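The client parameters above suggest three ways to reach Chroma: a local persisted directory, a deployed server, or Chroma Cloud. The sketch below groups them as keyword-argument dictionaries. The helper name connection_kwargs, the mode labels, and every example value (path, hostname, placeholders) are hypothetical and not part of langchain-chroma; which parameters may be combined is not stated here, so treat the grouping as an assumption.

```python
from typing import Any


def connection_kwargs(mode: str) -> dict[str, Any]:
    """Return illustrative Chroma(...) keyword arguments for one connection mode.

    Hypothetical helper: the mode names and example values are assumptions.
    """
    if mode == "local":
        # Persist the collection to a directory on disk.
        return {"persist_directory": "./chroma_data"}
    if mode == "server":
        # Deployed Chroma server; 8000 and False are the defaults stated above.
        return {"host": "localhost", "port": 8000, "ssl": False}
    if mode == "cloud":
        # Chroma Cloud connections require tenant and database with the API key.
        return {
            "chroma_cloud_api_key": "<your-api-key>",
            "tenant": "<your-tenant-id>",
            "database": "<your-database>",
        }
    raise ValueError(f"unknown mode: {mode!r}")


server_kwargs = connection_kwargs("server")
print(server_kwargs)

# Intended use (requires langchain-chroma and an embedding function):
# vector_store = Chroma(
#     collection_name="foo",
#     embedding_function=OpenAIEmbeddings(),
#     **server_kwargs,
# )
```

Keeping the connection settings in one dictionary makes it easier to switch between a local directory and a remote server without touching the rest of the setup code.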
Instantiate:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)
Add Documents:
from langchain_core.documents import Document
document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")
documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
Update Documents:
updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)
vector_store.update_documents(ids=["1"], documents=[updated_document])
Delete Documents:
vector_store.delete(ids=["3"])
Search:
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* thud [{'bar': 'baz'}]
Search with filter:
results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* foo [{'baz': 'bar'}]
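As a rough mental model only, a flat filter such as {"baz": "bar"} can be read as an equality test on each listed metadata key. The function matches_filter below is a hypothetical stand-in written in plain Python; it is not Chroma's implementation and ignores any richer filter syntax the server may support.

```python
def matches_filter(metadata: dict, flt: dict) -> bool:
    """Return True when every key in flt is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


# Toy metadata mirroring the three documents as they were first added.
stored = {
    "1": {"baz": "bar"},
    "2": {"bar": "baz"},
    "3": {},
}

# Keep only the ids whose metadata satisfies the filter.
candidates = [
    doc_id for doc_id, meta in stored.items() if matches_filter(meta, {"baz": "bar"})
]
print(candidates)
```

Under this reading, only matching documents are eligible, and the similarity ranking then applies to that reduced set.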
Search with score:
results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}]
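In the output above, the exact match for "qux" prints SIM=0.000000, which is consistent with the score being a distance where lower means closer. The sketch below illustrates that reading with a squared Euclidean distance on toy vectors. The vectors are made up, and the choice of squared L2 is an assumption; the actual metric depends on how the collection is configured.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up 3-dimensional "embeddings" for illustration only.
query = [0.5, 0.25, 0.0]
stored = {
    "thud": [0.0, 0.25, 0.5],
    "qux": [0.5, 0.25, 0.0],  # identical to the query vector
}

# Rank by ascending distance: the smallest value is the best match.
ranked = sorted(stored, key=lambda name: squared_l2(query, stored[name]))
for name in ranked:
    print(f"* [SIM={squared_l2(query, stored[name]):3f}] {name}")
```

Because the value is a distance, sorting ascending puts the best match first.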
Async:
# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)
# delete documents
# await vector_store.adelete(ids=["3"])
# search
# results = await vector_store.asimilarity_search(query="thud", k=1)
# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.335463] foo [{'baz': 'bar'}]
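The await calls above only run inside a coroutine, or in an environment with top-level await such as a notebook. The sketch below shows the surrounding scaffolding with asyncio.run. To stay self-contained it uses StubStore, a hypothetical stand-in that returns canned (text, score) pairs rather than Document objects; with a real vector store you would pass the Chroma instance instead.

```python
import asyncio


class StubStore:
    """Hypothetical stand-in for a vector store; returns canned (text, score) pairs."""

    async def asimilarity_search_with_score(self, query: str, k: int):
        await asyncio.sleep(0)  # yield control, as an I/O-bound call would
        canned = [("qux", 0.0), ("thud", 0.5), ("foo", 0.75)]
        return canned[:k]


async def search(store, query: str, k: int):
    # Same call shape as the async example above.
    return await store.asimilarity_search_with_score(query=query, k=k)


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
results = asyncio.run(search(StubStore(), "qux", k=2))
for text, score in results:
    print(f"* [SIM={score:3f}] {text}")
```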
Use as Retriever:
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
[Document(metadata={'bar': 'baz'}, page_content='thud')]

Parameter details:
- persist_directory: Directory to persist the collection.
- host: Hostname of a deployed Chroma server.
- port: Connection port for a deployed Chroma server. Default is 8000.
- ssl: Whether to establish an SSL connection with a deployed Chroma server. Default is False.
- headers: HTTP headers to send to a deployed Chroma server.
- chroma_cloud_api_key: Chroma Cloud API key.
- tenant: Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.
- database: Database name. Required for Chroma Cloud connections. Default is 'default_database'.
- client_settings: Chroma client settings.
- collection_metadata: Collection configurations.
- collection_configuration: Index configuration for the collection.
- client: Chroma client. Documentation: https://docs.trychroma.com/reference/python/client
- relevance_score_fn: Function to calculate relevance score from distance. Used only in similarity_search_with_relevance_scores.
- create_collection_if_not_exists: Whether to create the collection if it doesn't exist. Defaults to True.
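relevance_score_fn has type Callable[[float], float]: it receives a distance and returns a relevance score. A minimal sketch follows. The mapping 1 / (1 + d) is an arbitrary illustrative choice, not the library's default, and it assumes non-negative distances; the name distance_to_relevance is likewise hypothetical.

```python
def distance_to_relevance(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1]; smaller distance, higher score.

    Illustrative mapping only; pick one that suits your distance metric.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


for d in (0.0, 1.0, 3.0):
    print(d, distance_to_relevance(d))

# Intended use (requires langchain-chroma and an embedding function):
# vector_store = Chroma(
#     collection_name="foo",
#     embedding_function=OpenAIEmbeddings(),
#     relevance_score_fn=distance_to_relevance,
# )
```

A monotonically decreasing mapping like this keeps the ranking order of the underlying distances while putting scores on a bounded scale.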