`langchain-chroma`¶

Reference docs

This page contains reference documentation for Chroma. See the docs for conceptual guides, tutorials, and examples on using Chroma modules.

langchain_chroma ¶

LangChain integration for Chroma vector database.

Chroma ¶

Bases: VectorStore

Chroma vector store integration.

Setup

Install chromadb, langchain-chroma packages:

pip install -qU chromadb langchain-chroma

Key init args — indexing params: collection_name: Name of the collection. embedding_function: Embedding function to use.

Key init args — client params: client: Chroma client to use. client_settings: Chroma client settings. persist_directory: Directory to persist the collection. host: Hostname of a deployed Chroma server. port: Connection port for a deployed Chroma server. Default is 8000. ssl: Whether to establish an SSL connection with a deployed Chroma server. Default is False. headers: HTTP headers to send to a deployed Chroma server. chroma_cloud_api_key: Chroma Cloud API key. tenant: Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. database: Database name. Required for Chroma Cloud connections. Default is 'default_database'.

Instantiate

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)

Add Documents

from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)

Update Documents

updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)

vector_store.update_documents(ids=["1"], documents=[updated_document])

Delete Documents

vector_store.delete(ids=["3"])

Search

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

*thud[{"baz": "bar"}]

Search with filter

results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

*foo[{"baz": "bar"}]

Search with score

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}]

Async

# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = vector_store.asimilarity_search(query="thud",k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.335463] foo [{'baz': 'bar'}]

Use as Retriever

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")

[Document(metadata={"baz": "bar"}, page_content="thud")]

METHOD	DESCRIPTION
`aget_by_ids`	Async get documents by their IDs.
`adelete`	Async delete by vector ID or other criteria.
`aadd_texts`	Async run more texts through the embeddings and add to the `VectorStore`.
`add_documents`	Add or update documents in the `VectorStore`.
`aadd_documents`	Async run more documents through the embeddings and add to the `VectorStore`.
`search`	Return docs most similar to query using a specified search type.
`asearch`	Async return docs most similar to query using a specified search type.
`asimilarity_search_with_score`	Async run similarity search with distance.
`similarity_search_with_relevance_scores`	Return docs and relevance scores in the range `[0, 1]`.
`asimilarity_search_with_relevance_scores`	Async return docs and relevance scores in the range `[0, 1]`.
`asimilarity_search`	Async return docs most similar to query.
`asimilarity_search_by_vector`	Async return docs most similar to embedding vector.
`amax_marginal_relevance_search`	Async return docs selected using the maximal marginal relevance.
`amax_marginal_relevance_search_by_vector`	Async return docs selected using the maximal marginal relevance.
`afrom_documents`	Async return `VectorStore` initialized from documents and embeddings.
`afrom_texts`	Async return `VectorStore` initialized from texts and embeddings.
`as_retriever`	Return `VectorStoreRetriever` initialized from this `VectorStore`.
`__init__`	Initialize with a Chroma client.
`encode_image`	Get base64 string from image URI.
`fork`	Fork this vector store.
`add_images`	Run more images through the embeddings and add to the `VectorStore`.
`add_texts`	Run more texts through the embeddings and add to the `VectorStore`.
`hybrid_search`	Run hybrid search with Chroma.
`similarity_search`	Run similarity search with Chroma.
`similarity_search_by_vector`	Return docs most similar to embedding vector.
`similarity_search_by_vector_with_relevance_scores`	Return docs most similar to embedding vector and similarity score.
`similarity_search_with_score`	Run similarity search with Chroma with distance.
`similarity_search_with_vectors`	Run similarity search with Chroma with vectors.
`similarity_search_by_image`	Search for similar images based on the given image URI.
`similarity_search_by_image_with_relevance_score`	Search for similar images based on the given image URI.
`max_marginal_relevance_search_by_vector`	Return docs selected using the maximal marginal relevance.
`max_marginal_relevance_search`	Return docs selected using the maximal marginal relevance.
`delete_collection`	Delete the collection.
`reset_collection`	Resets the collection.
`get`	Gets the collection.
`get_by_ids`	Get documents by their IDs.
`update_document`	Update a document in the collection.
`update_documents`	Update a document in the collection.
`from_texts`	Create a Chroma vectorstore from a raw documents.
`from_documents`	Create a Chroma vectorstore from a list of documents.
`delete`	Delete by vector IDs.

embeddings `property` ¶

embeddings: Embeddings | None

Access the query embedding object.

aget_by_ids `async` ¶

aget_by_ids(ids: Sequence[str]) -> list[Document]

Async get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER	DESCRIPTION
`ids`	List of IDs to retrieve. TYPE: `Sequence[str]`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects.

adelete `async` ¶

adelete(ids: list[str] | None = None, **kwargs: Any) -> bool | None

Async delete by vector ID or other criteria.

PARAMETER	DESCRIPTION
`ids`	List of IDs to delete. If `None`, delete all. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs`	Other keyword arguments that subclasses might use. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`bool \| None`	`True` if deletion is successful, `False` otherwise, `None` if not implemented.

aadd_texts `async` ¶

aadd_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]

Async run more texts through the embeddings and add to the VectorStore.

PARAMETER	DESCRIPTION
`texts`	Iterable of strings to add to the `VectorStore`. TYPE: `Iterable[str]`
`metadatas`	Optional list of metadatas associated with the texts. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs`	`VectorStore` specific parameters. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs from adding the texts into the `VectorStore`.

RAISES	DESCRIPTION
`ValueError`	If the number of metadatas does not match the number of texts.
`ValueError`	If the number of IDs does not match the number of texts.

add_documents ¶

add_documents(documents: list[Document], **kwargs: Any) -> list[str]

Add or update documents in the VectorStore.

PARAMETER	DESCRIPTION
`documents`	Documents to add to the `VectorStore`. TYPE: `list[Document]`
`**kwargs`	Additional keyword arguments. If kwargs contains IDs and documents contain ids, the IDs in the kwargs will receive precedence. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added texts.

aadd_documents `async` ¶

aadd_documents(documents: list[Document], **kwargs: Any) -> list[str]

Async run more documents through the embeddings and add to the VectorStore.

PARAMETER	DESCRIPTION
`documents`	Documents to add to the `VectorStore`. TYPE: `list[Document]`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added texts.

search ¶

search(query: str, search_type: str, **kwargs: Any) -> list[Document]

Return docs most similar to query using a specified search type.

PARAMETER	DESCRIPTION
`query`	Input text. TYPE: `str`
`search_type`	Type of search to perform. Can be `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`. TYPE: `str`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects most similar to the query.

RAISES	DESCRIPTION
`ValueError`	If `search_type` is not one of `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`.

asearch `async` ¶

asearch(query: str, search_type: str, **kwargs: Any) -> list[Document]

Async return docs most similar to query using a specified search type.

PARAMETER	DESCRIPTION
`query`	Input text. TYPE: `str`
`search_type`	Type of search to perform. Can be `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`. TYPE: `str`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects most similar to the query.

RAISES	DESCRIPTION
`ValueError`	If `search_type` is not one of `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`.

asimilarity_search_with_score `async` ¶

asimilarity_search_with_score(
    *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]

Async run similarity search with distance.

PARAMETER	DESCRIPTION
`*args`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `()`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of tuples of `(doc, similarity_score)`.

similarity_search_with_relevance_scores ¶

similarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER	DESCRIPTION
`query`	Input text. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Kwargs to be passed to similarity search. Should include `score_threshold`, an optional floating point value between `0` to `1` to filter the resulting set of retrieved docs. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of tuples of `(doc, similarity_score)`.

asimilarity_search_with_relevance_scores `async` ¶

asimilarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Async return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER	DESCRIPTION
`query`	Input text. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Kwargs to be passed to similarity search. Should include `score_threshold`, an optional floating point value between `0` to `1` to filter the resulting set of retrieved docs. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of tuples of `(doc, similarity_score)`

asimilarity_search `async` ¶

asimilarity_search(query: str, k: int = 4, **kwargs: Any) -> list[Document]

Async return docs most similar to query.

PARAMETER	DESCRIPTION
`query`	Input text. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects most similar to the query.

asimilarity_search_by_vector `async` ¶

asimilarity_search_by_vector(
    embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]

Async return docs most similar to embedding vector.

PARAMETER	DESCRIPTION
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects most similar to the query vector.

amax_marginal_relevance_search `async` ¶

amax_marginal_relevance_search(
    query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`query`	Text to look up documents similar to. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`fetch_k`	Number of `Document` objects to fetch to pass to MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.

amax_marginal_relevance_search_by_vector `async` ¶

amax_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    **kwargs: Any,
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`fetch_k`	Number of `Document` objects to fetch to pass to MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.

afrom_documents `async` `classmethod` ¶

afrom_documents(
    documents: list[Document], embedding: Embeddings, **kwargs: Any
) -> Self

Async return VectorStore initialized from documents and embeddings.

PARAMETER	DESCRIPTION
`documents`	List of `Document` objects to add to the `VectorStore`. TYPE: `list[Document]`
`embedding`	Embedding function to use. TYPE: `Embeddings`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Self`	`VectorStore` initialized from documents and embeddings.

afrom_texts `async` `classmethod` ¶

afrom_texts(
    texts: list[str],
    embedding: Embeddings,
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> Self

Async return VectorStore initialized from texts and embeddings.

PARAMETER	DESCRIPTION
`texts`	Texts to add to the `VectorStore`. TYPE: `list[str]`
`embedding`	Embedding function to use. TYPE: `Embeddings`
`metadatas`	Optional list of metadatas associated with the texts. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs associated with the texts. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Self`	`VectorStore` initialized from texts and embeddings.

as_retriever ¶

as_retriever(**kwargs: Any) -> VectorStoreRetriever

Return VectorStoreRetriever initialized from this VectorStore.

PARAMETER DESCRIPTION

**kwargs

Keyword arguments to pass to the search function.

Can include:

search_type: Defines the type of search that the Retriever should perform. Can be 'similarity' (default), 'mmr', or 'similarity_score_threshold'.
search_kwargs: Keyword arguments to pass to the search function.

Can include things like:
- k: Amount of documents to return (Default: 4)
- score_threshold: Minimum relevance threshold for similarity_score_threshold
- fetch_k: Amount of documents to pass to MMR algorithm (Default: 20)
- lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum. (Default: 0.5)
- filter: Filter by document metadata

TYPE: Any DEFAULT: {}

RETURNS	DESCRIPTION
`VectorStoreRetriever`	Retriever class for `VectorStore`.

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={"k": 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}
)

init ¶

__init__(
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    client: ClientAPI | None = None,
    relevance_score_fn: Callable[[float], float] | None = None,
    create_collection_if_not_exists: bool | None = True,
    *,
    ssl: bool = False,
) -> None

Initialize with a Chroma client.

PARAMETER	DESCRIPTION
`collection_name`	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`embedding_function`	Embedding class object. Used to embed texts. TYPE: `Embeddings \| None` DEFAULT: `None`
`persist_directory`	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host`	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port`	Connection port for a deployed Chroma server. Default is 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl`	Whether to establish an SSL connection with a deployed Chroma server. Default is False. TYPE: `bool` DEFAULT: `False`
`headers`	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key`	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant`	Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database`	Database name. Required for Chroma Cloud connections. Default is 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`client_settings`	Chroma client settings TYPE: `Settings \| None` DEFAULT: `None`
`collection_metadata`	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration`	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`client`	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`relevance_score_fn`	Function to calculate relevance score from distance. Used only in `similarity_search_with_relevance_scores` TYPE: `Callable[[float], float] \| None` DEFAULT: `None`
`create_collection_if_not_exists`	Whether to create collection if it doesn't exist. Defaults to `True`. TYPE: `bool \| None` DEFAULT: `True`

__ensure_collection ¶

__ensure_collection() -> None

Ensure that the collection exists or create it.

__query_collection ¶

__query_collection(
    query_texts: list[str] | None = None,
    query_embeddings: list[list[float]] | None = None,
    n_results: int = 4,
    where: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document] | QueryResult

Query the chroma collection.

PARAMETER	DESCRIPTION
`query_texts`	List of query texts. TYPE: `list[str] \| None` DEFAULT: `None`
`query_embeddings`	List of query embeddings. TYPE: `list[list[float]] \| None` DEFAULT: `None`
`n_results`	Number of results to return. TYPE: `int` DEFAULT: `4`
`where`	dict used to filter results by metadata. E.g. {"color" : "red"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document] \| QueryResult`	List of `n_results` nearest neighbor embeddings for provided
`list[Document] \| QueryResult`	query_embeddings or query_texts.

See more: https://docs.trychroma.com/reference/py-collection#query

encode_image `staticmethod` ¶

encode_image(uri: str) -> str

Get base64 string from image URI.

fork ¶

fork(new_name: str) -> Chroma

Fork this vector store.

PARAMETER	DESCRIPTION
`new_name`	New name for the forked store. TYPE: `str`

RETURNS	DESCRIPTION
`Chroma`	A new Chroma store forked from this vector store.

add_images ¶

add_images(
    uris: list[str], metadatas: list[dict] | None = None, ids: list[str] | None = None
) -> list[str]

Run more images through the embeddings and add to the VectorStore.

PARAMETER	DESCRIPTION
`uris`	File path to the image. TYPE: `list[str]`
`metadatas`	Optional list of metadatas. When querying, you can filter on this metadata. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs. (Items without IDs will be assigned UUIDs) TYPE: `list[str] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added images.

RAISES	DESCRIPTION
`ValueError`	When metadata is incorrect.

add_texts ¶

add_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]

Run more texts through the embeddings and add to the VectorStore.

PARAMETER	DESCRIPTION
`texts`	Texts to add to the `VectorStore`. TYPE: `Iterable[str]`
`metadatas`	Optional list of metadatas. When querying, you can filter on this metadata. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs. (Items without IDs will be assigned UUIDs) TYPE: `list[str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[str]`	List of IDs of the added texts.

RAISES	DESCRIPTION
`ValueError`	When metadata is incorrect.

hybrid_search ¶

hybrid_search(search: Search) -> list[Document]

Run hybrid search with Chroma.

PARAMETER	DESCRIPTION
`search`	The Search configuration for hybrid search. TYPE: `Search`

RETURNS	DESCRIPTION
`list[Document]`	A list of documents resulting from the search operation.

Example

from chromadb import Search, K, Knn, Rrf

Create RRF ranking with text query¶

hybrid_rank = Rrf( ranks=[ Knn(query="query", return_rank=True, limit=300), Knn(query="query learning applications", key="sparse_embedding") ], weights=[2.0, 1.0], # Dense 2x more important k=60 )

Build complete the search strategy¶

search = (Search() .where( (K("language") == "en") & (K("year") >= 2020) ) .rank(hybrid_rank) .limit(10) .select(K.DOCUMENT, K.SCORE, "title", "year") )

results = vector_store.hybrid_search(search)

similarity_search ¶

similarity_search(
    query: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[Document]

Run similarity search with Chroma.

PARAMETER	DESCRIPTION
`query`	Query text to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of documents most similar to the query text.

similarity_search_by_vector ¶

similarity_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

Return docs most similar to embedding vector.

PARAMETER	DESCRIPTION
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of Documents to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects most similar to the query vector.

similarity_search_by_vector_with_relevance_scores ¶

similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, float]]

Return docs most similar to embedding vector and similarity score.

PARAMETER	DESCRIPTION
`embedding`	Embedding to look up documents similar to. TYPE: `List[float]`
`k`	Number of Documents to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by the documents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of documents most similar to the query text and relevance score
`list[tuple[Document, float]]`	in float for each. Lower score represents more similarity.

similarity_search_with_score ¶

similarity_search_with_score(
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, float]]

Run similarity search with Chroma with distance.

PARAMETER	DESCRIPTION
`query`	Query text to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of documents most similar to the query text and
`list[tuple[Document, float]]`	distance in float for each. Lower score represents more similarity.

similarity_search_with_vectors ¶

similarity_search_with_vectors(
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, ndarray]]

Run similarity search with Chroma with vectors.

PARAMETER	DESCRIPTION
`query`	Query text to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by the document contents. E.g. {"$contains": "hello"}. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, ndarray]]`	List of documents most similar to the query text and
`list[tuple[Document, ndarray]]`	embedding vectors for each.

similarity_search_by_image ¶

similarity_search_by_image(
    uri: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[Document]

Search for similar images based on the given image URI.

PARAMETER	DESCRIPTION
`uri`	URI of the image to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`**kwargs`	Additional arguments to pass to function. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of Images most similar to the provided image. Each element in list is a
`list[Document]`	LangChain Document Object. The page content is b64 encoded image, metadata
`list[Document]`	is default or as defined by user.

RAISES	DESCRIPTION
`ValueError`	If the embedding function does not support image embeddings.

similarity_search_by_image_with_relevance_score ¶

similarity_search_by_image_with_relevance_score(
    uri: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[tuple[Document, float]]

Search for similar images based on the given image URI.

PARAMETER	DESCRIPTION
`uri`	URI of the image to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`**kwargs`	Additional arguments to pass to function. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[tuple[Document, float]]`	List of tuples containing documents similar to the query image and their
`list[tuple[Document, float]]`	similarity scores. 0^th element in each tuple is a LangChain Document Object.
`list[tuple[Document, float]]`	The page content is b64 encoded img, metadata is default or defined by user.

RAISES	DESCRIPTION
`ValueError`	If the embedding function does not support image embeddings.

max_marginal_relevance_search_by_vector ¶

max_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`fetch_k`	Number of `Document` objects to fetch to pass to MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between 0 and 1 that determines the degree of diversity among the results with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by the document contents. e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.

max_marginal_relevance_search ¶

max_marginal_relevance_search(
    query: str,
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER	DESCRIPTION
`query`	Text to look up documents similar to. TYPE: `str`
`k`	Number of Documents to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`fetch_k`	Number of Documents to fetch to pass to MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	dict used to filter by the document contents. e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to Chroma collection query. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.

RAISES	DESCRIPTION
`ValueError`	If the embedding function is not provided.

delete_collection ¶

delete_collection() -> None

Delete the collection.

reset_collection ¶

reset_collection() -> None

Resets the collection.

Resets the collection by deleting the collection and recreating an empty one.

get ¶

get(
    ids: str | list[str] | None = None,
    where: Where | None = None,
    limit: int | None = None,
    offset: int | None = None,
    where_document: WhereDocument | None = None,
    include: list[str] | None = None,
) -> dict[str, Any]

Gets the collection.

PARAMETER	DESCRIPTION
`ids`	The ids of the embeddings to get. Optional. TYPE: `str \| list[str] \| None` DEFAULT: `None`
`where`	A Where type dict used to filter results by. E.g. `{"$and": [{"color": "red"}, {"price": 4.20}]}` Optional. TYPE: `Where \| None` DEFAULT: `None`
`limit`	The number of documents to return. Optional. TYPE: `int \| None` DEFAULT: `None`
`offset`	The offset to start returning results from. Useful for paging results with limit. Optional. TYPE: `int \| None` DEFAULT: `None`
`where_document`	A WhereDocument type dict used to filter by the documents. E.g. `{"$contains": "hello"}`. Optional. TYPE: `WhereDocument \| None` DEFAULT: `None`
`include`	A list of what to include in the results. Can contain `"embeddings"`, `"metadatas"`, `"documents"`. Ids are always included. Defaults to `["metadatas", "documents"]`. Optional. TYPE: `list[str] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, Any]`	A dict with the keys `"ids"`, `"embeddings"`, `"metadatas"`, `"documents"`.

get_by_ids ¶

get_by_ids(ids: Sequence[str]) -> list[Document]

Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER	DESCRIPTION
`ids`	List of ids to retrieve. TYPE: `Sequence[str]`

RETURNS	DESCRIPTION
`list[Document]`	List of `Document` objects.

Added in 0.2.1

update_document ¶

update_document(document_id: str, document: Document) -> None

Update a document in the collection.

PARAMETER	DESCRIPTION
`document_id`	ID of the document to update. TYPE: `str`
`document`	Document to update. TYPE: `Document`

update_documents ¶

update_documents(ids: list[str], documents: list[Document]) -> None

Update a document in the collection.

PARAMETER	DESCRIPTION
`ids`	List of ids of the document to update. TYPE: `list[str]`
`documents`	List of documents to update. TYPE: `list[Document]`

RAISES	DESCRIPTION
`ValueError`	If the embedding function is not provided.

from_texts `classmethod` ¶

from_texts(
    texts: list[str],
    embedding: Embeddings | None = None,
    metadatas: list[dict] | None = None,
    ids: list[str] | None = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    client: ClientAPI | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma

Create a Chroma vectorstore from a raw documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER	DESCRIPTION
`texts`	List of texts to add to the collection. TYPE: `list[str]`
`collection_name`	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`persist_directory`	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host`	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port`	Connection port for a deployed Chroma server. Default is 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl`	Whether to establish an SSL connection with a deployed Chroma server. Default is False. TYPE: `bool` DEFAULT: `False`
`headers`	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key`	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant`	Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database`	Database name. Required for Chroma Cloud connections. Default is 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`embedding`	Embedding function. TYPE: `Embeddings \| None` DEFAULT: `None`
`metadatas`	List of metadatas. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	List of document IDs. TYPE: `list[str] \| None` DEFAULT: `None`
`client_settings`	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`client`	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`collection_metadata`	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration`	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to initialize a Chroma client. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Chroma`	Chroma vectorstore. TYPE: `Chroma`

from_documents `classmethod` ¶

from_documents(
    documents: list[Document],
    embedding: Embeddings | None = None,
    ids: list[str] | None = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    client: ClientAPI | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma

Create a Chroma vectorstore from a list of documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER	DESCRIPTION
`collection_name`	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`persist_directory`	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host`	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port`	Connection port for a deployed Chroma server. Default is 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl`	Whether to establish an SSL connection with a deployed Chroma server. TYPE: `bool` DEFAULT: `False`
`headers`	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key`	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant`	Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database`	Database name. Required for Chroma Cloud connections. Default is 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`ids`	List of document IDs. TYPE: `list[str] \| None` DEFAULT: `None`
`documents`	List of documents to add to the `VectorStore`. TYPE: `list[Document]`
`embedding`	Embedding function. TYPE: `Embeddings \| None` DEFAULT: `None`
`client_settings`	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`client`	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`collection_metadata`	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration`	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to initialize a Chroma client. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`Chroma`	Chroma vectorstore. TYPE: `Chroma`

delete ¶

delete(ids: list[str] | None = None, **kwargs: Any) -> None

Delete by vector IDs.

PARAMETER	DESCRIPTION
`ids`	List of ids to delete. TYPE: `list[str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

langchain-chroma¶

langchain_chroma ¶

Chroma ¶

embeddings property ¶

aget_by_ids async ¶

adelete async ¶

aadd_texts async ¶

add_documents ¶

aadd_documents async ¶

search ¶

asearch async ¶

asimilarity_search_with_score async ¶

similarity_search_with_relevance_scores ¶

asimilarity_search_with_relevance_scores async ¶

asimilarity_search async ¶

asimilarity_search_by_vector async ¶

amax_marginal_relevance_search async ¶

amax_marginal_relevance_search_by_vector async ¶

afrom_documents async classmethod ¶

afrom_texts async classmethod ¶

as_retriever ¶

__init__ ¶

__ensure_collection ¶

__query_collection ¶

encode_image staticmethod ¶

fork ¶

add_images ¶

add_texts ¶

hybrid_search ¶

Create RRF ranking with text query¶

Build complete the search strategy¶

similarity_search ¶

similarity_search_by_vector ¶

similarity_search_by_vector_with_relevance_scores ¶

similarity_search_with_score ¶

similarity_search_with_vectors ¶

similarity_search_by_image ¶

similarity_search_by_image_with_relevance_score ¶

max_marginal_relevance_search_by_vector ¶

max_marginal_relevance_search ¶

delete_collection ¶

reset_collection ¶

get ¶

get_by_ids ¶

update_document ¶

update_documents ¶

from_texts classmethod ¶

from_documents classmethod ¶

delete ¶

`langchain-chroma`¶

embeddings `property` ¶

aget_by_ids `async` ¶

adelete `async` ¶

aadd_texts `async` ¶

aadd_documents `async` ¶

asearch `async` ¶

asimilarity_search_with_score `async` ¶

asimilarity_search_with_relevance_scores `async` ¶

asimilarity_search `async` ¶

asimilarity_search_by_vector `async` ¶

amax_marginal_relevance_search `async` ¶

amax_marginal_relevance_search_by_vector `async` ¶

afrom_documents `async` `classmethod` ¶

afrom_texts `async` `classmethod` ¶

init ¶

encode_image `staticmethod` ¶

from_texts `classmethod` ¶

from_documents `classmethod` ¶