
langchain-chroma

Modules:

Name Description
vectorstores

This is the langchain_chroma.vectorstores module.

Classes:

Name Description
Chroma

Chroma vector store integration.

Chroma

Bases: VectorStore

Chroma vector store integration.

Setup

Install chromadb, langchain-chroma packages:

.. code-block:: bash

pip install -qU chromadb langchain-chroma

Key init args (indexing params):

collection_name: str
    Name of the collection.
embedding_function: Embeddings
    Embedding function to use.

Key init args (client params):

client: Optional[Client]
    Chroma client to use.
client_settings: Optional[chromadb.config.Settings]
    Chroma client settings.
persist_directory: Optional[str]
    Directory to persist the collection.
host: Optional[str]
    Hostname of a deployed Chroma server.
port: Optional[int]
    Connection port for a deployed Chroma server. Default is 8000.
ssl: Optional[bool]
    Whether to establish an SSL connection with a deployed Chroma server. Default is False.
headers: Optional[dict[str, str]]
    HTTP headers to send to a deployed Chroma server.
chroma_cloud_api_key: Optional[str]
    Chroma Cloud API key.
tenant: Optional[str]
    Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.
database: Optional[str]
    Database name. Required for Chroma Cloud connections. Default is 'default_database'.

Instantiate

.. code-block:: python

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)
Add Documents

.. code-block:: python

from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
Update Documents

.. code-block:: python

updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)

vector_store.update_documents(ids=["1"], documents=[updated_document])
Delete Documents

.. code-block:: python

vector_store.delete(ids=["3"])
Search with filter

.. code-block:: python

results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

.. code-block:: python

* foo [{'baz': 'bar'}]
Search with score

.. code-block:: python

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

.. code-block:: python

* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}]
Async

.. code-block:: python

# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

.. code-block:: python

* [SIM=0.335463] foo [{'baz': 'bar'}]
Use as Retriever

.. code-block:: python

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")

.. code-block:: python

[Document(metadata={"bar": "baz"}, page_content="thud")]

Methods:

Name Description
aget_by_ids

Async get documents by their IDs.

adelete

Async delete by vector ID or other criteria.

aadd_texts

Async run more texts through the embeddings and add to the vectorstore.

add_documents

Add or update documents in the vectorstore.

aadd_documents

Async run more documents through the embeddings and add to the vectorstore.

search

Return docs most similar to query using a specified search type.

asearch

Async return docs most similar to query using a specified search type.

asimilarity_search_with_score

Async run similarity search with distance.

similarity_search_with_relevance_scores

Return docs and relevance scores in the range [0, 1].

asimilarity_search_with_relevance_scores

Async return docs and relevance scores in the range [0, 1].

asimilarity_search

Async return docs most similar to query.

asimilarity_search_by_vector

Async return docs most similar to embedding vector.

amax_marginal_relevance_search

Async return docs selected using the maximal marginal relevance.

amax_marginal_relevance_search_by_vector

Async return docs selected using the maximal marginal relevance.

afrom_documents

Async return VectorStore initialized from documents and embeddings.

afrom_texts

Async return VectorStore initialized from texts and embeddings.

as_retriever

Return VectorStoreRetriever initialized from this VectorStore.

__init__

Initialize with a Chroma client.

encode_image

Get base64 string from image URI.

fork

Fork this vector store.

add_images

Run more images through the embeddings and add to the vectorstore.

add_texts

Run more texts through the embeddings and add to the vectorstore.

similarity_search

Run similarity search with Chroma.

similarity_search_by_vector

Return docs most similar to embedding vector.

similarity_search_by_vector_with_relevance_scores

Return docs most similar to embedding vector and similarity score.

similarity_search_with_score

Run similarity search with Chroma with distance.

similarity_search_with_vectors

Run similarity search with Chroma with vectors.

similarity_search_by_image

Search for similar images based on the given image URI.

similarity_search_by_image_with_relevance_score

Search for similar images based on the given image URI.

max_marginal_relevance_search_by_vector

Return docs selected using the maximal marginal relevance.

max_marginal_relevance_search

Return docs selected using the maximal marginal relevance.

delete_collection

Delete the collection.

reset_collection

Resets the collection.

get

Gets the collection.

get_by_ids

Get documents by their IDs.

update_document

Update a document in the collection.

update_documents

Update documents in the collection.

from_texts

Create a Chroma vectorstore from a list of raw texts.

from_documents

Create a Chroma vectorstore from a list of documents.

delete

Delete by vector IDs.

Attributes:

Name Type Description
embeddings Optional[Embeddings]

Access the query embedding object.

embeddings property

embeddings: Optional[Embeddings]

Access the query embedding object.

aget_by_ids async

aget_by_ids(ids: Sequence[str]) -> list[Document]

Async get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

Parameters:

Name Type Description Default
ids Sequence[str]

List of ids to retrieve.

required

Returns:

Type Description
list[Document]

List of Documents.

Added in version 0.2.11
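
Because the result order is not guaranteed and missing IDs are skipped silently, callers typically re-key the result by ID. The sketch below illustrates that pattern. `Doc` is a hypothetical stand-in for `Document`, and `align_by_id` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Document with an ID field."""

    id: str
    page_content: str


def align_by_id(requested_ids, returned_docs):
    """Pair each requested ID with its document, or None if it was not found."""
    by_id = {doc.id: doc for doc in returned_docs}
    return [(rid, by_id.get(rid)) for rid in requested_ids]


# Simulated result: out of order, and ID "3" is missing.
returned = [Doc("2", "thud"), Doc("1", "foo")]
aligned = align_by_id(["1", "3", "2"], returned)
```

Looking documents up by their ID field keeps the caller correct regardless of the order in which the store returns them.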

adelete async

adelete(
    ids: list[str] | None = None, **kwargs: Any
) -> bool | None

Async delete by vector ID or other criteria.

Parameters:

Name Type Description Default
ids list[str] | None

List of ids to delete. If None, delete all. Default is None.

None
**kwargs Any

Other keyword arguments that subclasses might use.

{}

Returns:

Type Description
bool | None

True if deletion is successful, False otherwise, None if not implemented.

aadd_texts async

aadd_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any
) -> list[str]

Async run more texts through the embeddings and add to the vectorstore.

Parameters:

Name Type Description Default
texts Iterable[str]

Iterable of strings to add to the vectorstore.

required
metadatas list[dict] | None

Optional list of metadatas associated with the texts. Default is None.

None
ids list[str] | None

Optional list of IDs associated with the texts.

None
**kwargs Any

vectorstore specific parameters.

{}

Returns:

Type Description
list[str]

List of ids from adding the texts into the vectorstore.

Raises:

Type Description
ValueError

If the number of metadatas does not match the number of texts.

ValueError

If the number of ids does not match the number of texts.

add_documents

add_documents(
    documents: list[Document], **kwargs: Any
) -> list[str]

Add or update documents in the vectorstore.

Parameters:

Name Type Description Default
documents list[Document]

Documents to add to the vectorstore.

required
kwargs Any

Additional keyword arguments. If kwargs contains ids and the documents also carry ids, the ids in kwargs take precedence.

{}

Returns:

Type Description
list[str]

List of IDs of the added texts.
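
The precedence rule for IDs can be pictured with a small sketch. This is a simplified illustration under stated assumptions, not the library's implementation: `Doc` is a hypothetical stand-in for `Document`, and `resolve_ids` is an invented helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document that may carry its own ID."""

    page_content: str
    id: Optional[str] = None


def resolve_ids(documents, ids=None):
    """Return the IDs to store: explicit ids win over IDs on the documents."""
    if ids is not None:
        if len(ids) != len(documents):
            raise ValueError("ids and documents must have the same length")
        return list(ids)
    return [doc.id for doc in documents]


docs = [Doc("foo", id="a"), Doc("thud", id="b")]
explicit = resolve_ids(docs, ids=["1", "2"])  # ids passed by the caller win
fallback = resolve_ids(docs)                  # otherwise the document IDs are used
```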

aadd_documents async

aadd_documents(
    documents: list[Document], **kwargs: Any
) -> list[str]

Async run more documents through the embeddings and add to the vectorstore.

Parameters:

Name Type Description Default
documents list[Document]

Documents to add to the vectorstore.

required
kwargs Any

Additional keyword arguments.

{}

Returns:

Type Description
list[str]

List of IDs of the added texts.

search

search(
    query: str, search_type: str, **kwargs: Any
) -> list[Document]

Return docs most similar to query using a specified search type.

Parameters:

Name Type Description Default
query str

Input text.

required
search_type str

Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".

required
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[Document]

List of Documents most similar to the query.

Raises:

Type Description
ValueError

If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".
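
A rough picture of how a search type can be routed to a search method, and where the ValueError comes from, is sketched below. The `handlers` mapping and `dispatch_search` are hypothetical names used only for illustration; the lambdas are placeholders for real search methods.

```python
def dispatch_search(query, search_type, handlers, **kwargs):
    """Route a query to the handler registered for search_type."""
    if search_type not in handlers:
        allowed = ", ".join(sorted(handlers))
        raise ValueError(
            f"search_type {search_type!r} not allowed, expected one of: {allowed}"
        )
    return handlers[search_type](query, **kwargs)


# Placeholder handlers that only report which branch was taken.
handlers = {
    "similarity": lambda query, **kw: f"similarity:{query}",
    "mmr": lambda query, **kw: f"mmr:{query}",
    "similarity_score_threshold": lambda query, **kw: f"threshold:{query}",
}

result = dispatch_search("thud", "mmr", handlers, k=1)
```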

asearch async

asearch(
    query: str, search_type: str, **kwargs: Any
) -> list[Document]

Async return docs most similar to query using a specified search type.

Parameters:

Name Type Description Default
query str

Input text.

required
search_type str

Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".

required
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[Document]

List of Documents most similar to the query.

Raises:

Type Description
ValueError

If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".

asimilarity_search_with_score async

asimilarity_search_with_score(
    *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]

Async run similarity search with distance.

Parameters:

Name Type Description Default
*args Any

Arguments to pass to the search method.

()
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[tuple[Document, float]]

List of Tuples of (doc, similarity_score).

similarity_search_with_relevance_scores

similarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters:

Name Type Description Default
query str

Input text.

required
k int

Number of Documents to return. Defaults to 4.

4
**kwargs Any

kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 to filter the resulting set of retrieved docs.

{}

Returns:

Type Description
list[tuple[Document, float]]

List of Tuples of (doc, similarity_score).
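
The effect of score_threshold can be sketched as a plain filter over (doc, score) pairs. This is a minimal illustration that assumes relevance scores where higher is better; `filter_by_threshold` is a hypothetical helper and the strings stand in for Document objects.

```python
def filter_by_threshold(docs_and_scores, score_threshold=None):
    """Keep only the pairs whose relevance score reaches the threshold."""
    if score_threshold is None:
        return list(docs_and_scores)
    return [
        (doc, score)
        for doc, score in docs_and_scores
        if score >= score_threshold
    ]


scored = [("foo", 0.92), ("thud", 0.55), ("qux", 0.80)]
kept = filter_by_threshold(scored, score_threshold=0.8)
```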

asimilarity_search_with_relevance_scores async

asimilarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Async return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters:

Name Type Description Default
query str

Input text.

required
k int

Number of Documents to return. Defaults to 4.

4
**kwargs Any

kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 to filter the resulting set of retrieved docs.

{}

Returns:

Type Description
list[tuple[Document, float]]

List of Tuples of (doc, similarity_score).

asimilarity_search async

asimilarity_search(
    query: str, k: int = 4, **kwargs: Any
) -> list[Document]

Async return docs most similar to query.

Parameters:

Name Type Description Default
query str

Input text.

required
k int

Number of Documents to return. Defaults to 4.

4
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[Document]

List of Documents most similar to the query.

asimilarity_search_by_vector async

asimilarity_search_by_vector(
    embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]

Async return docs most similar to embedding vector.

Parameters:

Name Type Description Default
embedding list[float]

Embedding to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

4
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[Document]

List of Documents most similar to the query vector.

amax_marginal_relevance_search async

amax_marginal_relevance_search(
    query: str,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    **kwargs: Any
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:

Name Type Description Default
query str

Text to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

4
fetch_k int

Number of Documents to fetch to pass to MMR algorithm. Default is 20.

20
lambda_mult float

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

0.5
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[Document]

List of Documents selected by maximal marginal relevance.
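
To make the trade-off between relevance and diversity concrete, here is a minimal pure-Python sketch of greedy maximal marginal relevance selection. It assumes cosine similarity and invented helper names (`cosine`, `mmr_select`); it illustrates the idea and is not the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Greedily pick up to k candidate indices by MMR score.

    score = lambda_mult * sim(query, candidate)
            - (1 - lambda_mult) * max sim(candidate, already selected)
    """
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidate_vecs[i])
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # the first two are near-duplicates

relevance_only = mmr_select(query, candidates, k=2, lambda_mult=1.0)
more_diverse = mmr_select(query, candidates, k=2, lambda_mult=0.25)
```

With lambda_mult at 1.0 the two near-duplicates are chosen; lowering it penalizes the second near-duplicate and the orthogonal candidate is picked instead. In this picture, fetch_k corresponds to the number of candidates handed to the selection step.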

amax_marginal_relevance_search_by_vector async

amax_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    **kwargs: Any
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:

Name Type Description Default
embedding list[float]

Embedding to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

4
fetch_k int

Number of Documents to fetch to pass to MMR algorithm. Default is 20.

20
lambda_mult float

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

0.5
**kwargs Any

Arguments to pass to the search method.

{}

Returns:

Type Description
list[Document]

List of Documents selected by maximal marginal relevance.

afrom_documents async classmethod

afrom_documents(
    documents: list[Document],
    embedding: Embeddings,
    **kwargs: Any
) -> Self

Async return VectorStore initialized from documents and embeddings.

Parameters:

Name Type Description Default
documents list[Document]

List of Documents to add to the vectorstore.

required
embedding Embeddings

Embedding function to use.

required
kwargs Any

Additional keyword arguments.

{}

Returns:

Name Type Description
VectorStore Self

VectorStore initialized from documents and embeddings.

afrom_texts async classmethod

afrom_texts(
    texts: list[str],
    embedding: Embeddings,
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any
) -> Self

Async return VectorStore initialized from texts and embeddings.

Parameters:

Name Type Description Default
texts list[str]

Texts to add to the vectorstore.

required
embedding Embeddings

Embedding function to use.

required
metadatas list[dict] | None

Optional list of metadatas associated with the texts. Default is None.

None
ids list[str] | None

Optional list of IDs associated with the texts.

None
kwargs Any

Additional keyword arguments.

{}

Returns:

Name Type Description
VectorStore Self

VectorStore initialized from texts and embeddings.

as_retriever

as_retriever(**kwargs: Any) -> VectorStoreRetriever

Return VectorStoreRetriever initialized from this VectorStore.

Parameters:

Name Type Description Default
**kwargs Any

Keyword arguments to pass to the search function. Can include:

search_type (Optional[str])
    Defines the type of search that the Retriever should perform. Can be "similarity" (default), "mmr", or "similarity_score_threshold".
search_kwargs (Optional[Dict])
    Keyword arguments to pass to the search function. Can include things like:

    - k: Amount of documents to return (Default: 4)
    - score_threshold: Minimum relevance threshold for similarity_score_threshold
    - fetch_k: Amount of documents to pass to MMR algorithm (Default: 20)
    - lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum. (Default: 0.5)
    - filter: Filter by document metadata

{}

Returns:

Name Type Description
VectorStoreRetriever VectorStoreRetriever

Retriever class for VectorStore.

Examples:

.. code-block:: python

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50}
)

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={"k": 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}
)

__init__

__init__(
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Optional[Embeddings] = None,
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[Settings] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[
        CreateCollectionConfiguration
    ] = None,
    client: Optional[ClientAPI] = None,
    relevance_score_fn: Optional[
        Callable[[float], float]
    ] = None,
    create_collection_if_not_exists: Optional[bool] = True,
    *,
    ssl: bool = False
) -> None

Initialize with a Chroma client.

Parameters:

Name Type Description Default
collection_name str

Name of the collection to create.

_LANGCHAIN_DEFAULT_COLLECTION_NAME
embedding_function Optional[Embeddings]

Embedding class object. Used to embed texts.

None
persist_directory Optional[str]

Directory to persist the collection.

None
host Optional[str]

Hostname of a deployed Chroma server.

None
port Optional[int]

Connection port for a deployed Chroma server. Default is 8000.

None
ssl bool

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

False
headers Optional[dict[str, str]]

HTTP headers to send to a deployed Chroma server.

None
chroma_cloud_api_key Optional[str]

Chroma Cloud API key.

None
tenant Optional[str]

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

None
database Optional[str]

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

None
client_settings Optional[Settings]

Chroma client settings.

None
collection_metadata Optional[dict]

Collection configurations.

None
collection_configuration Optional[CreateCollectionConfiguration]

Index configuration for the collection. Defaults to None.

None
client Optional[ClientAPI]

Chroma client to use.

None
relevance_score_fn Optional[Callable[[float], float]]

Function to calculate a relevance score from a distance. Used only in similarity_search_with_relevance_scores.

None
create_collection_if_not_exists Optional[bool]

Whether to create collection if it doesn't exist. Defaults to True.

True
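
A relevance_score_fn maps a raw distance to a relevance score. The sketch below shows two common conventions under explicit assumptions: cosine distance for the first function, and Euclidean distance between unit-norm embeddings for the second. Both function names are hypothetical, and the right mapping depends on the distance metric your collection actually uses.

```python
import math


def cosine_distance_to_relevance(distance: float) -> float:
    """Map a cosine distance to [0, 1]; assumes 0 means identical vectors."""
    return min(1.0, max(0.0, 1.0 - distance))


def l2_distance_to_relevance(distance: float) -> float:
    """Map a Euclidean distance to [0, 1]; assumes unit-norm embeddings."""
    return min(1.0, max(0.0, 1.0 - distance / math.sqrt(2)))


# Either function matches Callable[[float], float] and could be supplied as
# relevance_score_fn when constructing the store.
identical = cosine_distance_to_relevance(0.0)
partial = cosine_distance_to_relevance(0.25)
far = cosine_distance_to_relevance(1.5)
```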

__ensure_collection

__ensure_collection() -> None

Ensure that the collection exists or create it.

__query_collection

__query_collection(
    query_texts: Optional[list[str]] = None,
    query_embeddings: Optional[list[list[float]]] = None,
    n_results: int = 4,
    where: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> Union[list[Document], QueryResult]

Query the chroma collection.

Parameters:

Name Type Description Default
query_texts Optional[list[str]]

List of query texts.

None
query_embeddings Optional[list[list[float]]]

List of query embeddings.

None
n_results int

Number of results to return. Defaults to 4.

4
where Optional[dict[str, str]]

dict used to filter results by metadata. E.g. {"color" : "red"}.

None
where_document Optional[dict[str, str]]

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
Union[list[Document], QueryResult]

List of n_results nearest neighbor embeddings for provided query_embeddings or query_texts.

See more: https://docs.trychroma.com/reference/py-collection#query
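
To give a feel for what the where and where_document filters express, the following sketch evaluates a small subset of filter shapes in plain Python: simple equality, `$and`, `$or`, and `$contains`. The matcher functions are hypothetical and only approximate the semantics for illustration; the linked Chroma reference remains the authority on supported operators.

```python
def matches_where(metadata, where):
    """Check metadata against equality clauses and nested $and / $or lists."""
    for key, expected in where.items():
        if key == "$and":
            if not all(matches_where(metadata, clause) for clause in expected):
                return False
        elif key == "$or":
            if not any(matches_where(metadata, clause) for clause in expected):
                return False
        elif metadata.get(key) != expected:
            return False
    return True


def matches_where_document(text, where_document):
    """Check document text against a {"$contains": ...} clause."""
    needle = where_document.get("$contains")
    return needle is None or needle in text


red_item = {"color": "red", "price": 4.20}
blue_item = {"color": "blue", "price": 4.20}
combined = {"$and": [{"color": "red"}, {"price": 4.20}]}
```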

encode_image staticmethod

encode_image(uri: str) -> str

Get base64 string from image URI.
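
As a rough sketch of what such an encoding step involves, the example below reads a file and returns its bytes as a base64 string. It assumes the URI is a local file path; `encode_file_as_base64` is a hypothetical helper, and the temporary file holds placeholder bytes rather than a real image.

```python
import base64
import os
import tempfile


def encode_file_as_base64(path: str) -> str:
    """Read a file in binary mode and return its contents as a base64 string."""
    with open(path, "rb") as file_handle:
        return base64.b64encode(file_handle.read()).decode("utf-8")


# Write placeholder bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG fake image bytes")
    tmp_path = tmp.name

encoded = encode_file_as_base64(tmp_path)
os.remove(tmp_path)
```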

fork

fork(new_name: str) -> Chroma

Fork this vector store.

Parameters:

Name Type Description Default
new_name str

New name for the forked store.

required

Returns:

Type Description
Chroma

A new Chroma store forked from this vector store.

add_images

add_images(
    uris: list[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[str]

Run more images through the embeddings and add to the vectorstore.

Parameters:

Name Type Description Default
uris list[str]

File paths to the images.

required
metadatas Optional[list[dict]]

Optional list of metadatas. When querying, you can filter on this metadata.

None
ids Optional[list[str]]

Optional list of IDs. (Items without IDs will be assigned UUIDs)

None

Returns:

Type Description
list[str]

List of IDs of the added images.

Raises:

Type Description
ValueError

When metadata is incorrect.

add_texts

add_texts(
    texts: Iterable[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
    **kwargs: Any
) -> list[str]

Run more texts through the embeddings and add to the vectorstore.

Parameters:

Name Type Description Default
texts Iterable[str]

Texts to add to the vectorstore.

required
metadatas Optional[list[dict]]

Optional list of metadatas. When querying, you can filter on this metadata.

None
ids Optional[list[str]]

Optional list of IDs. (Items without IDs will be assigned UUIDs)

None
kwargs Any

Additional keyword arguments.

{}

Returns:

Type Description
list[str]

List of IDs of the added texts.

Raises:

Type Description
ValueError

When metadata is incorrect.
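
The note that items without IDs are assigned UUIDs can be pictured as follows. This is a minimal sketch with a hypothetical helper name, `fill_missing_ids`; it is not the library's code.

```python
import uuid


def fill_missing_ids(n_items, ids=None):
    """Return one ID per item, generating a UUID wherever none was supplied."""
    if ids is None:
        return [str(uuid.uuid4()) for _ in range(n_items)]
    return [
        item_id if item_id is not None else str(uuid.uuid4())
        for item_id in ids
    ]


filled = fill_missing_ids(3, ids=["1", None, "3"])
generated = fill_missing_ids(2)
```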

similarity_search

similarity_search(
    query: str,
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]

Run similarity search with Chroma.

Parameters:

Name Type Description Default
query str

Query text to search for.

required
k int

Number of results to return. Defaults to 4.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[Document]

List of documents most similar to the query text.

similarity_search_by_vector

similarity_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]

Return docs most similar to embedding vector.

Parameters:

Name Type Description Default
embedding list[float]

Embedding to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
where_document Optional[dict[str, str]]

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[Document]

List of Documents most similar to the query vector.

similarity_search_by_vector_with_relevance_scores

similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]

Return docs most similar to embedding vector and similarity score.

Parameters:

Name Type Description Default
embedding list[float]

Embedding to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
where_document Optional[dict[str, str]]

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[tuple[Document, float]]

List of documents most similar to the query vector, each with a float score. A lower score represents more similarity.

similarity_search_with_score

similarity_search_with_score(
    query: str,
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]

Run similarity search with Chroma with distance.

Parameters:

Name Type Description Default
query str

Query text to search for.

required
k int

Number of results to return. Defaults to 4.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
where_document Optional[dict[str, str]]

dict used to filter by document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[tuple[Document, float]]

List of documents most similar to the query text, each with its distance as a float. A lower score represents more similarity.
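
Because the returned score is a distance, results read "smaller is closer". The sketch below ranks a few toy vectors by squared Euclidean distance to show the ordering. The metric is an assumption made for illustration (the actual metric depends on how the collection is configured), and `squared_l2` is a hypothetical helper.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1.0, 0.0]
stored = {"qux": [1.0, 0.0], "foo": [0.5, 0.5], "thud": [0.0, 1.0]}

# Sort ascending: the smallest distance is the best match.
ranked = sorted(
    ((name, squared_l2(query, vec)) for name, vec in stored.items()),
    key=lambda pair: pair[1],
)
```

An exact match has a distance of 0.0, which is why a query identical to a stored text can print a score of 0.000000.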

similarity_search_with_vectors

similarity_search_with_vectors(
    query: str,
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, ndarray]]

Run similarity search with Chroma with vectors.

Parameters:

Name Type Description Default
query str

Query text to search for.

required
k int

Number of results to return. Defaults to 4.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
where_document Optional[dict[str, str]]

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[tuple[Document, ndarray]]

List of documents most similar to the query text, each with its embedding vector.

similarity_search_by_image

similarity_search_by_image(
    uri: str,
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]

Search for similar images based on the given image URI.

Parameters:

Name Type Description Default
uri str

URI of the image to search for.

required
k int

Number of results to return. Defaults to DEFAULT_K.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata.

None
**kwargs Any

Additional arguments to pass to function.

{}

Returns:

Type Description
list[Document]

List of images most similar to the provided image. Each element in the list is a LangChain Document object. The page content is the base64-encoded image; metadata is default or as defined by the user.

Raises:

Type Description
ValueError

If the embedding function does not support image embeddings.

similarity_search_by_image_with_relevance_score

similarity_search_by_image_with_relevance_score(
    uri: str,
    k: int = DEFAULT_K,
    filter: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[tuple[Document, float]]

Search for similar images based on the given image URI.

Parameters:

Name Type Description Default
uri str

URI of the image to search for.

required
k int

Number of results to return.

DEFAULT_K
filter Optional[dict[str, str]]

Filter by metadata.

None
**kwargs Any

Additional arguments to pass to function.

{}

Returns:

Type Description
list[tuple[Document, float]]

List of tuples containing documents similar to the query image and their similarity scores. The 0th element in each tuple is a LangChain Document object. The page content is the base64-encoded image; metadata is default or as defined by the user.

Raises:

Type Description
ValueError

If the embedding function does not support image embeddings.

max_marginal_relevance_search_by_vector

max_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:

Name Type Description Default
embedding list[float]

Embedding to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

DEFAULT_K
fetch_k int

Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.

20
lambda_mult float

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

0.5
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
where_document Optional[dict[str, str]]

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[Document]

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search

max_marginal_relevance_search(
    query: str,
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[dict[str, str]] = None,
    where_document: Optional[dict[str, str]] = None,
    **kwargs: Any
) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:

Name Type Description Default
query str

Text to look up documents similar to.

required
k int

Number of Documents to return. Defaults to 4.

DEFAULT_K
fetch_k int

Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.

20
lambda_mult float

Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

0.5
filter Optional[dict[str, str]]

Filter by metadata. Defaults to None.

None
where_document Optional[dict[str, str]]

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

None
kwargs Any

Additional keyword arguments to pass to Chroma collection query.

{}

Returns:

Type Description
list[Document]

List of Documents selected by maximal marginal relevance.

Raises:

Type Description
ValueError

If the embedding function is not provided.
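To make the filter and where_document arguments more concrete, the following sketch evaluates simplified versions of both against in-memory records. It is only an illustration of the intended semantics, not how Chroma evaluates filters: `matches_where` and `matches_where_document` are hypothetical helpers, only equality, "$and", and "$contains" are handled, and multiple top-level keys are treated as an implicit AND for simplicity.

```python
from typing import Any


def matches_where(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Evaluate a simplified metadata filter: equality plus "$and"."""
    for key, expected in where.items():
        if key == "$and":
            # Every sub-clause must match.
            if not all(matches_where(metadata, clause) for clause in expected):
                return False
        elif key not in metadata or metadata[key] != expected:
            return False
    return True


def matches_where_document(text: str, where_document: dict[str, str]) -> bool:
    """Evaluate a {"$contains": substring} filter against document text."""
    return where_document["$contains"] in text


records = [
    ({"color": "red", "price": 4.20}, "hello world"),
    ({"color": "red", "price": 9.99}, "hello again"),
    ({"color": "blue", "price": 4.20}, "goodbye"),
]
where = {"$and": [{"color": "red"}, {"price": 4.20}]}
where_document = {"$contains": "hello"}

hits = [
    text
    for metadata, text in records
    if matches_where(metadata, where) and matches_where_document(text, where_document)
]
print(hits)  # ['hello world']
```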

delete_collection

delete_collection() -> None

Delete the collection.

reset_collection

reset_collection() -> None

Resets the collection.

Resets the collection by deleting the collection and recreating an empty one.

get

get(
    ids: Optional[Union[str, list[str]]] = None,
    where: Optional[Where] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    where_document: Optional[WhereDocument] = None,
    include: Optional[list[str]] = None,
) -> dict[str, Any]

Get items from the collection, optionally filtered by ID, metadata, or document content.

Parameters:

Name Type Description Default
ids Optional[Union[str, list[str]]]

The ids of the embeddings to get. Optional.

None
where Optional[Where]

A Where type dict used to filter results. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Optional.

None
limit Optional[int]

The number of documents to return. Optional.

None
offset Optional[int]

The offset to start returning results from. Useful for paging results with limit. Optional.

None
where_document Optional[WhereDocument]

A WhereDocument type dict used to filter by the documents. E.g. {"$contains": "hello"}. Optional.

None
include Optional[list[str]]

A list of what to include in the results. Can contain "embeddings", "metadatas", "documents". Ids are always included. Defaults to ["metadatas", "documents"]. Optional.

None

Returns:

Type Description
dict[str, Any]

A dict with the keys "ids", "embeddings", "metadatas", "documents".
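Paging with limit and offset can be wrapped in a small generator. The sketch below is a hypothetical helper, not part of the library: it accepts any callable that takes limit and offset keyword arguments and returns a dict with an "ids" key, so it is exercised here with an in-memory stand-in. It also assumes the store returns items in a stable order between calls.

```python
from typing import Any, Callable, Iterator


def iter_pages(
    get: Callable[..., dict[str, Any]],
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield successive result dicts until a page comes back empty."""
    offset = 0
    while True:
        page = get(limit=page_size, offset=offset)
        if not page["ids"]:
            return
        yield page
        offset += len(page["ids"])


# In-memory stand-in for a store's get(); used only to exercise the helper.
_IDS = [str(i) for i in range(1, 6)]


def fake_get(limit: int, offset: int) -> dict[str, Any]:
    chunk = _IDS[offset : offset + limit]
    return {"ids": chunk, "documents": [f"doc {i}" for i in chunk]}


pages = [page["ids"] for page in iter_pages(fake_get, page_size=2)]
print(pages)  # [['1', '2'], ['3', '4'], ['5']]
```

Under these assumptions, a bound method such as `vector_store.get` could be passed in place of `fake_get`.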

get_by_ids

get_by_ids(ids: Sequence[str]) -> list[Document]

Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

Parameters:

Name Type Description Default
ids Sequence[str]

List of ids to retrieve.

required

Returns:

Type Description
list[Document]

List of Documents.

Added in 0.2.1
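Because the order of the result is not guaranteed and missing IDs are silently skipped, callers that need one slot per requested ID can realign the result themselves. The sketch below shows one way to do that; `Doc` is a hypothetical stand-in exposing only the `id` and `page_content` attributes, and `align_to_ids` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; only id and page_content are used."""

    page_content: str
    id: Optional[str] = None


def align_to_ids(ids: Sequence[str], docs: Sequence[Doc]) -> list[Optional[Doc]]:
    """Return one slot per requested ID, with None where no document came back."""
    by_id = {doc.id: doc for doc in docs}
    return [by_id.get(doc_id) for doc_id in ids]


requested = ["3", "1", "missing", "1"]
# Simulated result: arbitrary order, no entry for "missing", duplicates collapsed.
returned = [Doc("foo", id="1"), Doc("thud", id="3")]

aligned = align_to_ids(requested, returned)
print([doc.page_content if doc else None for doc in aligned])
# ['thud', 'foo', None, 'foo']
```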

update_document

update_document(
    document_id: str, document: Document
) -> None

Update a document in the collection.

Parameters:

Name Type Description Default
document_id str

ID of the document to update.

required
document Document

Document to update.

required

update_documents

update_documents(
    ids: list[str], documents: list[Document]
) -> None

Update documents in the collection.

Parameters:

Name Type Description Default
ids list[str]

List of IDs of the documents to update.

required
documents list[Document]

List of documents to update.

required

Raises:

Type Description
ValueError

If the embedding function is not provided.

from_texts classmethod

from_texts(
    texts: list[str],
    embedding: Optional[Embeddings] = None,
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[Settings] = None,
    client: Optional[ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[
        CreateCollectionConfiguration
    ] = None,
    *,
    ssl: bool = False,
    **kwargs: Any
) -> Chroma

Create a Chroma vectorstore from raw documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

Parameters:

Name Type Description Default
texts list[str]

List of texts to add to the collection.

required
collection_name str

Name of the collection to create.

_LANGCHAIN_DEFAULT_COLLECTION_NAME
persist_directory Optional[str]

Directory to persist the collection.

None
host Optional[str]

Hostname of a deployed Chroma server.

None
port Optional[int]

Connection port for a deployed Chroma server. Default is 8000.

None
ssl bool

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

False
headers Optional[dict[str, str]]

HTTP headers to send to a deployed Chroma server.

None
chroma_cloud_api_key Optional[str]

Chroma Cloud API key.

None
tenant Optional[str]

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

None
database Optional[str]

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

None
embedding Optional[Embeddings]

Embedding function. Defaults to None.

None
metadatas Optional[list[dict]]

List of metadatas. Defaults to None.

None
ids Optional[list[str]]

List of document IDs. Defaults to None.

None
client_settings Optional[Settings]

Chroma client settings.

None
client Optional[ClientAPI]

Chroma client to use.

None
collection_metadata Optional[dict]

Metadata to attach to the collection. Defaults to None.

None
collection_configuration Optional[CreateCollectionConfiguration]

Index configuration for the collection. Defaults to None.

None
kwargs Any

Additional keyword arguments to initialize a Chroma client.

{}

Returns:

Name Type Description
Chroma Chroma

Chroma vectorstore.
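The connection parameters above fall into three groups: a local persist_directory, a deployed server (host, port, ssl), and Chroma Cloud (chroma_cloud_api_key, tenant, database). The sketch below is a hypothetical helper that builds keyword arguments for exactly one of these modes. The rule that only one mode may be chosen is a design choice of this sketch, not documented library behavior; the tenant and database check follows the parameter descriptions above.

```python
from typing import Any, Optional


def connection_kwargs(
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: int = 8000,
    ssl: bool = False,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for exactly one connection mode."""
    chosen = [v for v in (persist_directory, host, chroma_cloud_api_key) if v]
    if len(chosen) > 1:
        # Policy of this sketch: refuse ambiguous combinations.
        raise ValueError(
            "Pick one of persist_directory, host, or chroma_cloud_api_key."
        )
    if chroma_cloud_api_key:
        if not (tenant and database):
            raise ValueError("Chroma Cloud connections need tenant and database.")
        return {
            "chroma_cloud_api_key": chroma_cloud_api_key,
            "tenant": tenant,
            "database": database,
        }
    if host:
        return {"host": host, "port": port, "ssl": ssl}
    if persist_directory:
        return {"persist_directory": persist_directory}
    return {}  # nothing specified: ephemeral in-memory data


print(connection_kwargs(persist_directory="./chroma_db"))
print(connection_kwargs(host="localhost"))
```

The resulting dict is meant to be unpacked into a constructor call, for example `Chroma.from_texts(texts, embedding, **connection_kwargs(host="localhost"))`.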

from_documents classmethod

from_documents(
    documents: list[Document],
    embedding: Optional[Embeddings] = None,
    ids: Optional[list[str]] = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    headers: Optional[dict[str, str]] = None,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
    client_settings: Optional[Settings] = None,
    client: Optional[ClientAPI] = None,
    collection_metadata: Optional[dict] = None,
    collection_configuration: Optional[
        CreateCollectionConfiguration
    ] = None,
    *,
    ssl: bool = False,
    **kwargs: Any
) -> Chroma

Create a Chroma vectorstore from a list of documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

Parameters:

Name Type Description Default
collection_name str

Name of the collection to create.

_LANGCHAIN_DEFAULT_COLLECTION_NAME
persist_directory Optional[str]

Directory to persist the collection.

None
host Optional[str]

Hostname of a deployed Chroma server.

None
port Optional[int]

Connection port for a deployed Chroma server. Default is 8000.

None
ssl bool

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

False
headers Optional[dict[str, str]]

HTTP headers to send to a deployed Chroma server.

None
chroma_cloud_api_key Optional[str]

Chroma Cloud API key.

None
tenant Optional[str]

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

None
database Optional[str]

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

None
ids Optional[list[str]]

List of document IDs. Defaults to None.

None
documents list[Document]

List of documents to add to the vectorstore.

required
embedding Optional[Embeddings]

Embedding function. Defaults to None.

None
client_settings Optional[Settings]

Chroma client settings.

None
client Optional[ClientAPI]

Chroma client to use.

None
collection_metadata Optional[dict]

Metadata to attach to the collection. Defaults to None.

None
collection_configuration Optional[CreateCollectionConfiguration]

Index configuration for the collection. Defaults to None.

None
kwargs Any

Additional keyword arguments to initialize a Chroma client.

{}

Returns:

Name Type Description
Chroma Chroma

Chroma vectorstore.
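from_documents and from_texts share their connection arguments and differ only in how content is supplied. The sketch below shows how a list of documents maps onto the parallel texts and metadatas lists that from_texts accepts. It assumes that this unpacking is what happens conceptually; `Doc` and `to_text_args` are hypothetical names introduced for illustration, not the library's code.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def to_text_args(
    documents: list[Doc], ids: Optional[list[str]] = None
) -> dict[str, Any]:
    """Split documents into the parallel lists a from_texts-style call takes."""
    if ids is not None and len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length.")
    return {
        "texts": [doc.page_content for doc in documents],
        "metadatas": [doc.metadata for doc in documents],
        "ids": ids,
    }


documents = [
    Doc(page_content="foo", metadata={"baz": "bar"}),
    Doc(page_content="thud", metadata={"bar": "baz"}),
]
args = to_text_args(documents, ids=["1", "2"])
print(args["texts"])      # ['foo', 'thud']
print(args["metadatas"])  # [{'baz': 'bar'}, {'bar': 'baz'}]
```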

delete

delete(
    ids: Optional[list[str]] = None, **kwargs: Any
) -> None

Delete by vector IDs.

Parameters:

Name Type Description Default
ids Optional[list[str]]

List of ids to delete.

None
kwargs Any

Additional keyword arguments.

{}