langchain-chroma

Reference docs

This page contains reference documentation for Chroma. See the docs for conceptual guides, tutorials, and examples on using Chroma modules.

langchain_chroma

LangChain integration for the Chroma vector database.

Chroma

Bases: VectorStore

Chroma vector store integration.

Setup

Install the chromadb and langchain-chroma packages:

pip install -qU chromadb langchain-chroma

Key init args (indexing params):

  • collection_name: Name of the collection.
  • embedding_function: Embedding function to use.

Key init args (client params):

  • client: Chroma client to use.
  • client_settings: Chroma client settings.
  • persist_directory: Directory to persist the collection.
  • host: Hostname of a deployed Chroma server.
  • port: Connection port for a deployed Chroma server. Default is 8000.
  • ssl: Whether to establish an SSL connection with a deployed Chroma server. Default is False.
  • headers: HTTP headers to send to a deployed Chroma server.
  • chroma_cloud_api_key: Chroma Cloud API key.
  • tenant: Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.
  • database: Database name. Required for Chroma Cloud connections. Default is 'default_database'.

Instantiate
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)
Add Documents
from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
Update Documents
updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)

vector_store.update_documents(ids=["1"], documents=[updated_document])
Delete Documents
vector_store.delete(ids=["3"])
Search with filter

results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* foo [{'baz': 'bar'}]

Search with score

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}]

Async
# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.335463] foo [{'baz': 'bar'}]
Use as Retriever
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
[Document(metadata={"baz": "bar"}, page_content="thud")]
METHOD DESCRIPTION
aget_by_ids

Async get documents by their IDs.

adelete

Async delete by vector ID or other criteria.

aadd_texts

Async run more texts through the embeddings and add to the VectorStore.

add_documents

Add or update documents in the VectorStore.

aadd_documents

Async run more documents through the embeddings and add to the VectorStore.

search

Return docs most similar to query using a specified search type.

asearch

Async return docs most similar to query using a specified search type.

asimilarity_search_with_score

Async run similarity search with distance.

similarity_search_with_relevance_scores

Return docs and relevance scores in the range [0, 1].

asimilarity_search_with_relevance_scores

Async return docs and relevance scores in the range [0, 1].

asimilarity_search

Async return docs most similar to query.

asimilarity_search_by_vector

Async return docs most similar to embedding vector.

amax_marginal_relevance_search

Async return docs selected using the maximal marginal relevance.

amax_marginal_relevance_search_by_vector

Async return docs selected using the maximal marginal relevance.

afrom_documents

Async return VectorStore initialized from documents and embeddings.

afrom_texts

Async return VectorStore initialized from texts and embeddings.

as_retriever

Return VectorStoreRetriever initialized from this VectorStore.

__init__

Initialize with a Chroma client.

encode_image

Get base64 string from image URI.

fork

Fork this vector store.

add_images

Run more images through the embeddings and add to the VectorStore.

add_texts

Run more texts through the embeddings and add to the VectorStore.

similarity_search

Run similarity search with Chroma.

similarity_search_by_vector

Return docs most similar to embedding vector.

similarity_search_by_vector_with_relevance_scores

Return docs most similar to embedding vector and similarity score.

similarity_search_with_score

Run similarity search with Chroma with distance.

similarity_search_with_vectors

Run similarity search with Chroma with vectors.

similarity_search_by_image

Search for similar images based on the given image URI.

similarity_search_by_image_with_relevance_score

Search for similar images based on the given image URI and return them with their scores.

max_marginal_relevance_search_by_vector

Return docs selected using the maximal marginal relevance.

max_marginal_relevance_search

Return docs selected using the maximal marginal relevance.

delete_collection

Delete the collection.

reset_collection

Resets the collection.

get

Gets items from the collection.

get_by_ids

Get documents by their IDs.

update_document

Update a document in the collection.

update_documents

Update documents in the collection.

from_texts

Create a Chroma vectorstore from raw texts.

from_documents

Create a Chroma vectorstore from a list of documents.

delete

Delete by vector IDs.

embeddings property

embeddings: Embeddings | None

Access the query embedding object.

aget_by_ids async

aget_by_ids(ids: Sequence[str]) -> list[Document]

Async get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER DESCRIPTION
ids

List of IDs to retrieve.

TYPE: Sequence[str]

RETURNS DESCRIPTION
list[Document]

List of Document objects.

adelete async

adelete(ids: list[str] | None = None, **kwargs: Any) -> bool | None

Async delete by vector ID or other criteria.

PARAMETER DESCRIPTION
ids

List of IDs to delete. If None, delete all.

TYPE: list[str] | None DEFAULT: None

**kwargs

Other keyword arguments that subclasses might use.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
bool | None

True if deletion is successful, False otherwise, None if not implemented.

aadd_texts async

aadd_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]

Async run more texts through the embeddings and add to the VectorStore.

PARAMETER DESCRIPTION
texts

Iterable of strings to add to the VectorStore.

TYPE: Iterable[str]

metadatas

Optional list of metadatas associated with the texts.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs associated with the texts.

TYPE: list[str] | None DEFAULT: None

**kwargs

VectorStore-specific parameters.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs from adding the texts into the VectorStore.

RAISES DESCRIPTION
ValueError

If the number of metadatas does not match the number of texts.

ValueError

If the number of IDs does not match the number of texts.
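Both ValueError cases are length checks that can run before anything is embedded. The sketch below uses a hypothetical helper, prepare_batch (not part of langchain-chroma), to show those checks together with the UUID fallback that add_texts describes for items without IDs. Filling missing metadatas with empty dicts is an assumption of this sketch.

```python
import uuid
from typing import Iterable, List, Optional, Tuple


def prepare_batch(
    texts: Iterable[str],
    metadatas: Optional[List[dict]] = None,
    ids: Optional[List[str]] = None,
) -> Tuple[List[str], List[dict], List[str]]:
    """Hypothetical helper: validate lengths, then fill in defaults."""
    texts = list(texts)  # materialize so len() works for any iterable
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError(f"Got {len(metadatas)} metadatas for {len(texts)} texts.")
    if ids is not None and len(ids) != len(texts):
        raise ValueError(f"Got {len(ids)} ids for {len(texts)} texts.")
    # Assumption: items without IDs get random UUID strings.
    final_ids = ids if ids is not None else [str(uuid.uuid4()) for _ in texts]
    # Assumption: missing metadatas become empty dicts.
    final_metadatas = metadatas if metadatas is not None else [{} for _ in texts]
    return texts, final_metadatas, final_ids
```

For example, prepare_batch(["foo", "bar"], ids=["1"]) raises ValueError because two texts arrive with only one ID.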

add_documents

add_documents(documents: list[Document], **kwargs: Any) -> list[str]

Add or update documents in the VectorStore.

PARAMETER DESCRIPTION
documents

Documents to add to the VectorStore.

TYPE: list[Document]

**kwargs

Additional keyword arguments.

If kwargs contains ids and the documents also have IDs set, the IDs in kwargs take precedence.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs of the added texts.

aadd_documents async

aadd_documents(documents: list[Document], **kwargs: Any) -> list[str]

Async run more documents through the embeddings and add to the VectorStore.

PARAMETER DESCRIPTION
documents

Documents to add to the VectorStore.

TYPE: list[Document]

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs of the added texts.

search

search(query: str, search_type: str, **kwargs: Any) -> list[Document]

Return docs most similar to query using a specified search type.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

search_type

Type of search to perform. Can be 'similarity', 'mmr', or 'similarity_score_threshold'.

TYPE: str

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects most similar to the query.

RAISES DESCRIPTION
ValueError

If search_type is not one of 'similarity', 'mmr', or 'similarity_score_threshold'.
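The search_type check amounts to a dispatch table with a ValueError fallback. A minimal sketch, assuming hypothetical stub functions (_similarity, _mmr, _threshold) in place of the real search methods:

```python
from typing import Any, Callable, Dict, List


# Hypothetical stubs standing in for the real search methods.
def _similarity(query: str, **kwargs: Any) -> List[str]:
    return [f"similarity:{query}"]


def _mmr(query: str, **kwargs: Any) -> List[str]:
    return [f"mmr:{query}"]


def _threshold(query: str, **kwargs: Any) -> List[str]:
    return [f"similarity_score_threshold:{query}"]


SEARCH_METHODS: Dict[str, Callable[..., List[str]]] = {
    "similarity": _similarity,
    "mmr": _mmr,
    "similarity_score_threshold": _threshold,
}


def search(query: str, search_type: str, **kwargs: Any) -> List[str]:
    """Route the query to the method named by search_type."""
    method = SEARCH_METHODS.get(search_type)
    if method is None:
        raise ValueError(
            f"search_type of {search_type!r} not allowed. "
            f"Expected one of {sorted(SEARCH_METHODS)}."
        )
    return method(query, **kwargs)
```

Extra keyword arguments such as k are forwarded unchanged to whichever method is selected.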

asearch async

asearch(query: str, search_type: str, **kwargs: Any) -> list[Document]

Async return docs most similar to query using a specified search type.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

search_type

Type of search to perform. Can be 'similarity', 'mmr', or 'similarity_score_threshold'.

TYPE: str

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects most similar to the query.

RAISES DESCRIPTION
ValueError

If search_type is not one of 'similarity', 'mmr', or 'similarity_score_threshold'.

asimilarity_search_with_score async

asimilarity_search_with_score(
    *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]

Async run similarity search with distance.

PARAMETER DESCRIPTION
*args

Arguments to pass to the search method.

TYPE: Any DEFAULT: ()

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of tuples of (doc, score). With Chroma the score is a distance, so a lower score means a closer match.

similarity_search_with_relevance_scores

similarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

k

Number of Document objects to return.

TYPE: int DEFAULT: 4

**kwargs

kwargs to be passed to similarity search. May include score_threshold, an optional floating-point value between 0 and 1 used to filter the resulting set of retrieved docs.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of tuples of (doc, similarity_score).
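score_threshold acts as a post-filter on the (doc, relevance) pairs. A minimal sketch, with strings standing in for Document objects; the warning for scores outside [0, 1] reflects our understanding of LangChain's base-class behavior and should be treated as an assumption.

```python
import warnings
from typing import List, Optional, Tuple


def filter_by_threshold(
    docs_and_scores: List[Tuple[str, float]],
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Keep pairs whose relevance score is at least score_threshold."""
    if any(not 0.0 <= score <= 1.0 for _, score in docs_and_scores):
        warnings.warn("Relevance scores should be between 0 and 1.")
    if score_threshold is None:
        return docs_and_scores
    return [(doc, s) for doc, s in docs_and_scores if s >= score_threshold]
```

Because the threshold is applied after the top k results are retrieved, fewer than k documents may come back.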

asimilarity_search_with_relevance_scores async

asimilarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Async return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

k

Number of Document objects to return.

TYPE: int DEFAULT: 4

**kwargs

kwargs to be passed to similarity search. May include score_threshold, an optional floating-point value between 0 and 1 used to filter the resulting set of retrieved docs.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of tuples of (doc, similarity_score).

asimilarity_search async

asimilarity_search(query: str, k: int = 4, **kwargs: Any) -> list[Document]

Async return docs most similar to query.

PARAMETER DESCRIPTION
query

Input text.

TYPE: str

k

Number of Document objects to return.

TYPE: int DEFAULT: 4

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects most similar to the query.

asimilarity_search_by_vector async

asimilarity_search_by_vector(
    embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]

Async return docs most similar to embedding vector.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Document objects to return.

TYPE: int DEFAULT: 4

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects most similar to the query vector.

amax_marginal_relevance_search async

amax_marginal_relevance_search(
    query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
query

Text to look up documents similar to.

TYPE: str

k

Number of Document objects to return.

TYPE: int DEFAULT: 4

fetch_k

Number of Document objects to fetch to pass to MMR algorithm.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity.

TYPE: float DEFAULT: 0.5

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects selected by maximal marginal relevance.
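The trade-off controlled by lambda_mult can be sketched in plain Python. This follows the standard MMR formulation with cosine similarity: the candidates play the role of the fetch_k documents, and each step picks the candidate with the best lambda_mult * relevance - (1 - lambda_mult) * redundancy. The helper names are hypothetical, and the library's actual implementation may differ in detail.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: List[float],
    candidates: List[List[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    if not candidates or k <= 0:
        return []
    query_sims = [cosine_similarity(query, c) for c in candidates]
    # Start with the candidate most similar to the query.
    selected = [max(range(len(candidates)), key=query_sims.__getitem__)]
    while len(selected) < min(k, len(candidates)):
        best_idx, best_score = -1, -math.inf
        for i, cand in enumerate(candidates):
            if i in selected:
                continue
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(cosine_similarity(cand, candidates[j]) for j in selected)
            score = lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected
```

With candidates [[1, 0], [0.99, 0.1], [0.7, 0.7]] and query [1, 0], lambda_mult=1.0 picks the near-duplicate second, while lambda_mult=0.25 skips it in favor of the more distinct third vector.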

amax_marginal_relevance_search_by_vector async

amax_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    **kwargs: Any,
) -> list[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Document objects to return.

TYPE: int DEFAULT: 4

fetch_k

Number of Document objects to fetch to pass to MMR algorithm.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity.

TYPE: float DEFAULT: 0.5

**kwargs

Arguments to pass to the search method.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects selected by maximal marginal relevance.

afrom_documents async classmethod

afrom_documents(
    documents: list[Document], embedding: Embeddings, **kwargs: Any
) -> Self

Async return VectorStore initialized from documents and embeddings.

PARAMETER DESCRIPTION
documents

List of Document objects to add to the VectorStore.

TYPE: list[Document]

embedding

Embedding function to use.

TYPE: Embeddings

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Self

VectorStore initialized from documents and embeddings.

afrom_texts async classmethod

afrom_texts(
    texts: list[str],
    embedding: Embeddings,
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> Self

Async return VectorStore initialized from texts and embeddings.

PARAMETER DESCRIPTION
texts

Texts to add to the VectorStore.

TYPE: list[str]

embedding

Embedding function to use.

TYPE: Embeddings

metadatas

Optional list of metadatas associated with the texts.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs associated with the texts.

TYPE: list[str] | None DEFAULT: None

**kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Self

VectorStore initialized from texts and embeddings.

as_retriever

as_retriever(**kwargs: Any) -> VectorStoreRetriever

Return VectorStoreRetriever initialized from this VectorStore.

PARAMETER DESCRIPTION
**kwargs

Keyword arguments to pass to the search function. Can include:

  • search_type: Defines the type of search that the Retriever should perform. Can be 'similarity' (default), 'mmr', or 'similarity_score_threshold'.
  • search_kwargs: Keyword arguments to pass to the search function. Can include things like:

    • k: Number of documents to return (Default: 4)
    • score_threshold: Minimum relevance threshold for similarity_score_threshold
    • fetch_k: Number of documents to pass to the MMR algorithm (Default: 20)
    • lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum. (Default: 0.5)
    • filter: Filter by document metadata

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
VectorStoreRetriever

Retriever class for VectorStore.

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={"k": 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}
)

__init__

__init__(
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    client: ClientAPI | None = None,
    relevance_score_fn: Callable[[float], float] | None = None,
    create_collection_if_not_exists: bool | None = True,
    *,
    ssl: bool = False,
) -> None

Initialize with a Chroma client.

PARAMETER DESCRIPTION
collection_name

Name of the collection to create.

TYPE: str DEFAULT: _LANGCHAIN_DEFAULT_COLLECTION_NAME

embedding_function

Embedding class object. Used to embed texts.

TYPE: Embeddings | None DEFAULT: None

persist_directory

Directory to persist the collection.

TYPE: str | None DEFAULT: None

host

Hostname of a deployed Chroma server.

TYPE: str | None DEFAULT: None

port

Connection port for a deployed Chroma server. Default is 8000.

TYPE: int | None DEFAULT: None

ssl

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

TYPE: bool DEFAULT: False

headers

HTTP headers to send to a deployed Chroma server.

TYPE: dict[str, str] | None DEFAULT: None

chroma_cloud_api_key

Chroma Cloud API key.

TYPE: str | None DEFAULT: None

tenant

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

TYPE: str | None DEFAULT: None

database

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

TYPE: str | None DEFAULT: None

client_settings

Chroma client settings.

TYPE: Settings | None DEFAULT: None

collection_metadata

Collection configurations.

TYPE: dict | None DEFAULT: None

collection_configuration

Index configuration for the collection.

TYPE: CreateCollectionConfiguration | None DEFAULT: None

client

Chroma client to use.

TYPE: ClientAPI | None DEFAULT: None

relevance_score_fn

Function to calculate relevance score from distance. Used only in similarity_search_with_relevance_scores.

TYPE: Callable[[float], float] | None DEFAULT: None

create_collection_if_not_exists

Whether to create the collection if it doesn't exist. Defaults to True.

TYPE: bool | None DEFAULT: True
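relevance_score_fn converts a raw distance into a relevance score. The sketch below shows two conversions that, to our knowledge, LangChain's VectorStore base class uses for cosine and Euclidean distances (treat the exact formulas as an assumption), plus a hypothetical inverse_relevance function of the kind you could supply as relevance_score_fn.

```python
import math
from typing import Callable, List, Tuple


def euclidean_relevance(distance: float) -> float:
    # Maps a distance of 0 to 1.0 and sqrt(2) (orthogonal unit vectors) to 0.0.
    return 1.0 - distance / math.sqrt(2)


def cosine_relevance(distance: float) -> float:
    # Cosine distance is 1 - cosine similarity, so this recovers the similarity.
    return 1.0 - distance


def inverse_relevance(distance: float) -> float:
    # Hypothetical custom function: stays in (0, 1] for any non-negative distance.
    return 1.0 / (1.0 + distance)


def to_relevance(
    results: List[Tuple[str, float]], fn: Callable[[float], float]
) -> List[Tuple[str, float]]:
    """Convert (doc, distance) pairs into (doc, relevance) pairs."""
    return [(doc, fn(distance)) for doc, distance in results]
```

A custom function such as inverse_relevance would then be passed as relevance_score_fn=inverse_relevance when constructing Chroma.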

__ensure_collection

__ensure_collection() -> None

Ensure that the collection exists or create it.

__query_collection

__query_collection(
    query_texts: list[str] | None = None,
    query_embeddings: list[list[float]] | None = None,
    n_results: int = 4,
    where: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document] | QueryResult

Query the chroma collection.

PARAMETER DESCRIPTION
query_texts

List of query texts.

TYPE: list[str] | None DEFAULT: None

query_embeddings

List of query embeddings.

TYPE: list[list[float]] | None DEFAULT: None

n_results

Number of results to return.

TYPE: int DEFAULT: 4

where

dict used to filter results by metadata. E.g. {"color": "red"}.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document] | QueryResult

List of n_results nearest neighbor embeddings for the provided query_embeddings or query_texts.

See more: https://docs.trychroma.com/reference/py-collection#query
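To make the where semantics concrete, here is a hypothetical local evaluator for a subset of Chroma's metadata operators ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or). Chroma evaluates filters itself; this sketch only illustrates the matching logic. Treating a missing key as a non-match, and several top-level keys as an implicit AND, are assumptions (Chroma may instead require an explicit $and).

```python
from typing import Any, Callable, Dict

# Comparison operators supported by this sketch (a subset of Chroma's).
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies the where filter (hypothetical helper)."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches_where(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches_where(metadata, sub) for sub in condition):
                return False
        else:
            # Assumption: a missing key never matches.
            if key not in metadata:
                return False
            # A bare value such as {"color": "red"} is shorthand for $eq.
            ops = condition if isinstance(condition, dict) else {"$eq": condition}
            for op, operand in ops.items():
                if not _OPS[op](metadata[key], operand):
                    return False
    return True
```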

encode_image staticmethod

encode_image(uri: str) -> str

Get base64 string from image URI.
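A sketch of what this amounts to, assuming uri is a local file path (add_images likewise describes its uris as file paths). The real method may differ in details such as error handling.

```python
import base64


def encode_image(uri: str) -> str:
    """Read a local image file and return its bytes as a base64 text string."""
    with open(uri, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```

The resulting string is the kind of base64-encoded image that the image search methods place in page_content.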

fork

fork(new_name: str) -> Chroma

Fork this vector store.

PARAMETER DESCRIPTION
new_name

New name for the forked store.

TYPE: str

RETURNS DESCRIPTION
Chroma

A new Chroma store forked from this vector store.

add_images

add_images(
    uris: list[str], metadatas: list[dict] | None = None, ids: list[str] | None = None
) -> list[str]

Run more images through the embeddings and add to the VectorStore.

PARAMETER DESCRIPTION
uris

File paths to the images.

TYPE: list[str]

metadatas

Optional list of metadatas. When querying, you can filter on this metadata.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs. Items without IDs will be assigned UUIDs.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

List of IDs of the added images.

RAISES DESCRIPTION
ValueError

When metadata is incorrect.

add_texts

add_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]

Run more texts through the embeddings and add to the VectorStore.

PARAMETER DESCRIPTION
texts

Texts to add to the VectorStore.

TYPE: Iterable[str]

metadatas

Optional list of metadatas. When querying, you can filter on this metadata.

TYPE: list[dict] | None DEFAULT: None

ids

Optional list of IDs. Items without IDs will be assigned UUIDs.

TYPE: list[str] | None DEFAULT: None

kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[str]

List of IDs of the added texts.

RAISES DESCRIPTION
ValueError

When metadata is incorrect.

similarity_search

similarity_search(
    query: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[Document]

Run similarity search with Chroma.

PARAMETER DESCRIPTION
query

Query text to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of documents most similar to the query text.

similarity_search_by_vector

similarity_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

Return docs most similar to embedding vector.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects most similar to the query vector.
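where_document filters on the stored text rather than on metadata. A hypothetical local evaluator for $contains and $not_contains, optionally combined with $and / $or, illustrates the idea; treating the match as a plain, case-sensitive substring check is an assumption about Chroma's behavior.

```python
from typing import Any, Dict


def matches_where_document(text: str, where_document: Dict[str, Any]) -> bool:
    """Return True if text satisfies the where_document filter (hypothetical helper)."""
    if len(where_document) != 1:
        raise ValueError("Expected exactly one operator per filter dict.")
    op, operand = next(iter(where_document.items()))
    if op == "$contains":
        return operand in text  # plain, case-sensitive substring check
    if op == "$not_contains":
        return operand not in text
    if op == "$and":
        return all(matches_where_document(text, sub) for sub in operand)
    if op == "$or":
        return any(matches_where_document(text, sub) for sub in operand)
    raise ValueError(f"Unsupported operator: {op}")
```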

similarity_search_by_vector_with_relevance_scores

similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, float]]

Return docs most similar to embedding vector and similarity score.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Documents to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of documents most similar to the query vector, each paired with a relevance score as a float. Lower score represents more similarity.

similarity_search_with_score

similarity_search_with_score(
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, float]]

Run similarity search with Chroma with distance.

PARAMETER DESCRIPTION
query

Query text to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of documents most similar to the query text, each paired with its distance as a float. Lower score represents more similarity.
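Because the returned score is a distance, lower is better. As we understand Chroma's documentation, its three distance spaces are squared L2 (l2), cosine distance (cosine, 1 minus cosine similarity), and inner-product distance (ip, 1 minus the dot product). The sketch below computes them in plain Python so the ordering is easy to check; the query and stored vectors are made-up examples.

```python
import math
from typing import List


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance (assumed to match Chroma's "l2" space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity (assumed to match Chroma's "cosine" space)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def ip_distance(a: List[float], b: List[float]) -> float:
    """1 - inner product (assumed to match Chroma's "ip" space)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
stored = {"near": [0.8, 0.6], "far": [0.0, 1.0]}
# Rank stored vectors by ascending distance: the first entry is the best match.
ranking = sorted(stored, key=lambda name: squared_l2(query, stored[name]))
```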

similarity_search_with_vectors

similarity_search_with_vectors(
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, ndarray]]

Run similarity search with Chroma with vectors.

PARAMETER DESCRIPTION
query

Query text to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, ndarray]]

List of documents most similar to the query text, each paired with its embedding vector.

similarity_search_by_image

similarity_search_by_image(
    uri: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[Document]

Search for similar images based on the given image URI.

PARAMETER DESCRIPTION
uri

URI of the image to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

**kwargs

Additional arguments to pass to function.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of images most similar to the provided image. Each element in the list is a LangChain Document object. The page content is the base64-encoded image; metadata is default or as defined by the user.

RAISES DESCRIPTION
ValueError

If the embedding function does not support image embeddings.

similarity_search_by_image_with_relevance_score

similarity_search_by_image_with_relevance_score(
    uri: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[tuple[Document, float]]

Search for similar images based on the given image URI and return them with their scores.

PARAMETER DESCRIPTION
uri

URI of the image to search for.

TYPE: str

k

Number of results to return.

TYPE: int DEFAULT: DEFAULT_K

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

**kwargs

Additional arguments to pass to function.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[tuple[Document, float]]

List of tuples containing documents similar to the query image and their similarity scores. The 0th element in each tuple is a LangChain Document object whose page content is the base64-encoded image; metadata is default or as defined by the user.

RAISES DESCRIPTION
ValueError

If the embedding function does not support image embeddings.

max_marginal_relevance_search_by_vector

max_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
embedding

Embedding to look up documents similar to.

TYPE: list[float]

k

Number of Document objects to return.

TYPE: int DEFAULT: DEFAULT_K

fetch_k

Number of Document objects to fetch to pass to MMR algorithm.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity.

TYPE: float DEFAULT: 0.5

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects selected by maximal marginal relevance.

max_marginal_relevance_search

max_marginal_relevance_search(
    query: str,
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

PARAMETER DESCRIPTION
query

Text to look up documents similar to.

TYPE: str

k

Number of Documents to return.

TYPE: int DEFAULT: DEFAULT_K

fetch_k

Number of Documents to fetch to pass to MMR algorithm.

TYPE: int DEFAULT: 20

lambda_mult

Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity.

TYPE: float DEFAULT: 0.5

filter

Filter by metadata.

TYPE: dict[str, str] | None DEFAULT: None

where_document

dict used to filter by the document contents. E.g. {"$contains": "hello"}.

TYPE: dict[str, str] | None DEFAULT: None

kwargs

Additional keyword arguments to pass to Chroma collection query.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[Document]

List of Document objects selected by maximal marginal relevance.

RAISES DESCRIPTION
ValueError

If the embedding function is not provided.
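To make the interplay of k, fetch_k, and lambda_mult concrete, here is a minimal pure-Python sketch of the standard greedy MMR selection over precomputed vectors. It is not langchain-chroma's internal implementation: the helper names cosine and mmr_search are hypothetical, the sketch assumes cosine similarity, and it follows the common convention that the first pick is always the single most similar candidate.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_search(query, candidates, k=4, fetch_k=20, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by greedy maximal marginal relevance."""
    # 1. Keep only the fetch_k candidates most similar to the query.
    pool = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )[:fetch_k]
    if not pool or k <= 0:
        return []
    # 2. Seed the result with the single most relevant candidate.
    selected = [pool.pop(0)]
    # 3. Repeatedly add the candidate with the best relevance/redundancy trade-off.
    while pool and len(selected) < k:
        best, best_score = None, -math.inf
        for i in pool:
            relevance = cosine(query, candidates[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(cosine(candidates[i], candidates[j]) for j in selected)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        pool.remove(best)
        selected.append(best)
    return selected
```

With a query of [1.0, 0.0] and candidates [1.0, 0.0], [1.0, 0.1] (a near duplicate), and [0.5, 0.5], lambda_mult=1.0 ranks purely by relevance and keeps the near duplicate, while lambda_mult=0.0 skips it in favor of the more distinct third vector. Shrinking fetch_k to 2 removes the third vector from the pool before MMR runs, so it can no longer be chosen.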

delete_collection

delete_collection() -> None

Delete the collection.

reset_collection

reset_collection() -> None

Reset the collection.

Deletes the collection and recreates an empty one in its place.

get

get(
    ids: str | list[str] | None = None,
    where: Where | None = None,
    limit: int | None = None,
    offset: int | None = None,
    where_document: WhereDocument | None = None,
    include: list[str] | None = None,
) -> dict[str, Any]

Get items from the collection.

PARAMETER DESCRIPTION
ids

The ids of the embeddings to get. Optional.

TYPE: str | list[str] | None DEFAULT: None

where

A Where-type dict used to filter results, e.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Optional.

TYPE: Where | None DEFAULT: None

limit

The number of documents to return. Optional.

TYPE: int | None DEFAULT: None

offset

The offset to start returning results from. Useful for paging results with limit. Optional.

TYPE: int | None DEFAULT: None

where_document

A WhereDocument-type dict used to filter by document contents, e.g. {"$contains": "hello"}. Optional.

TYPE: WhereDocument | None DEFAULT: None

include

A list of fields to include in the results. Can contain "embeddings", "metadatas", and "documents". IDs are always included. Defaults to ["metadatas", "documents"]. Optional.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
dict[str, Any]

A dict with the keys "ids", "embeddings", "metadatas", "documents".
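The limit and offset parameters can be combined to walk a large collection page by page. A minimal sketch: iter_collection is a hypothetical helper, not part of langchain-chroma, that accepts any callable with the same keyword arguments as get, such as vector_store.get. The fake_get function below is only a stand-in so the sketch runs without Chroma. The sketch assumes the collection is not modified while you iterate, because concurrent writes could shift page boundaries.

```python
from typing import Any, Callable, Iterator


def iter_collection(
    get: Callable[..., dict[str, Any]], page_size: int = 100, **filters: Any
) -> Iterator[tuple[str, str, dict]]:
    """Yield (id, document, metadata) tuples by calling get() with limit/offset paging."""
    offset = 0
    while True:
        page = get(limit=page_size, offset=offset, include=["metadatas", "documents"], **filters)
        ids = page["ids"]
        for i, doc_id in enumerate(ids):
            yield doc_id, page["documents"][i], page["metadatas"][i]
        # A short page means there is nothing left to fetch.
        if len(ids) < page_size:
            break
        offset += page_size


# Stand-in for vector_store.get: equality-only `where`, list slicing for paging.
_RECORDS = [("1", "foo", {"baz": "bar"}), ("2", "thud", {"bar": "baz"}), ("3", "qux", {"baz": "bar"})]


def fake_get(limit=None, offset=None, include=None, where=None):
    rows = [r for r in _RECORDS if not where or all(r[2].get(k) == v for k, v in where.items())]
    start = offset or 0
    rows = rows[start:] if limit is None else rows[start:start + limit]
    # `include` is ignored here; the stand-in always returns documents and metadatas.
    return {
        "ids": [r[0] for r in rows],
        "documents": [r[1] for r in rows],
        "metadatas": [r[2] for r in rows],
    }
```

Against a real store the call would look like iter_collection(vector_store.get, page_size=500, where={"baz": "bar"}); any extra keyword arguments are forwarded to get unchanged.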

get_by_ids

get_by_ids(ids: Sequence[str]) -> list[Document]

Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

PARAMETER DESCRIPTION
ids

List of ids to retrieve.

TYPE: Sequence[str]

RETURNS DESCRIPTION
list[Document]

List of Document objects.

Added in 0.2.1
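Because the returned order is not guaranteed and unknown IDs are silently dropped, callers that need positional results can re-key on each document's id field. A minimal sketch: align_to_ids is a hypothetical helper, and SimpleNamespace objects stand in for the Document objects that vector_store.get_by_ids(ids) would return.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence


def align_to_ids(ids: Sequence[str], docs: Sequence[Any]) -> list[Optional[Any]]:
    """Return one entry per requested ID, in request order, with None for IDs not found."""
    # Key the returned documents by their ID field instead of trusting their order.
    by_id = {doc.id: doc for doc in docs}
    return [by_id.get(doc_id) for doc_id in ids]


# Stand-ins for a get_by_ids result: out of order, with "missing" absent.
returned = [
    SimpleNamespace(id="3", page_content="qux"),
    SimpleNamespace(id="1", page_content="foo"),
]
aligned = align_to_ids(["1", "missing", "3"], returned)
```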

update_document

update_document(document_id: str, document: Document) -> None

Update a document in the collection.

PARAMETER DESCRIPTION
document_id

ID of the document to update.

TYPE: str

document

Document containing the new content and metadata.

TYPE: Document

update_documents

update_documents(ids: list[str], documents: list[Document]) -> None

Update multiple documents in the collection.

PARAMETER DESCRIPTION
ids

List of IDs of the documents to update.

TYPE: list[str]

documents

List of updated documents, one per entry in ids.

TYPE: list[Document]

RAISES DESCRIPTION
ValueError

If the embedding function is not provided.

from_texts classmethod

from_texts(
    texts: list[str],
    embedding: Embeddings | None = None,
    metadatas: list[dict] | None = None,
    ids: list[str] | None = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    client: ClientAPI | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma

Create a Chroma vectorstore from a list of raw texts.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER DESCRIPTION
texts

List of texts to add to the collection.

TYPE: list[str]

collection_name

Name of the collection to create.

TYPE: str DEFAULT: _LANGCHAIN_DEFAULT_COLLECTION_NAME

persist_directory

Directory to persist the collection.

TYPE: str | None DEFAULT: None

host

Hostname of a deployed Chroma server.

TYPE: str | None DEFAULT: None

port

Connection port for a deployed Chroma server. Default is 8000.

TYPE: int | None DEFAULT: None

ssl

Whether to establish an SSL connection with a deployed Chroma server. Default is False.

TYPE: bool DEFAULT: False

headers

HTTP headers to send to a deployed Chroma server.

TYPE: dict[str, str] | None DEFAULT: None

chroma_cloud_api_key

Chroma Cloud API key.

TYPE: str | None DEFAULT: None

tenant

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

TYPE: str | None DEFAULT: None

database

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

TYPE: str | None DEFAULT: None

embedding

Embedding function.

TYPE: Embeddings | None DEFAULT: None

metadatas

List of metadata dicts, one per text.

TYPE: list[dict] | None DEFAULT: None

ids

List of document IDs.

TYPE: list[str] | None DEFAULT: None

client_settings

Chroma client settings.

TYPE: Settings | None DEFAULT: None

client

Chroma client to use.

TYPE: ClientAPI | None DEFAULT: None

collection_metadata

Metadata to attach to the collection.

TYPE: dict | None DEFAULT: None

collection_configuration

Index configuration for the collection.

TYPE: CreateCollectionConfiguration | None DEFAULT: None

kwargs

Additional keyword arguments to initialize a Chroma client.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Chroma

Chroma vectorstore.

TYPE: Chroma

from_documents classmethod

from_documents(
    documents: list[Document],
    embedding: Embeddings | None = None,
    ids: list[str] | None = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    client: ClientAPI | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma

Create a Chroma vectorstore from a list of documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

PARAMETER DESCRIPTION
collection_name

Name of the collection to create.

TYPE: str DEFAULT: _LANGCHAIN_DEFAULT_COLLECTION_NAME

persist_directory

Directory to persist the collection.

TYPE: str | None DEFAULT: None

host

Hostname of a deployed Chroma server.

TYPE: str | None DEFAULT: None

port

Connection port for a deployed Chroma server. Default is 8000.

TYPE: int | None DEFAULT: None

ssl

Whether to establish an SSL connection with a deployed Chroma server.

TYPE: bool DEFAULT: False

headers

HTTP headers to send to a deployed Chroma server.

TYPE: dict[str, str] | None DEFAULT: None

chroma_cloud_api_key

Chroma Cloud API key.

TYPE: str | None DEFAULT: None

tenant

Tenant ID. Required for Chroma Cloud connections. Default is 'default_tenant' for local Chroma servers.

TYPE: str | None DEFAULT: None

database

Database name. Required for Chroma Cloud connections. Default is 'default_database'.

TYPE: str | None DEFAULT: None

ids

List of document IDs.

TYPE: list[str] | None DEFAULT: None

documents

List of documents to add to the VectorStore.

TYPE: list[Document]

embedding

Embedding function.

TYPE: Embeddings | None DEFAULT: None

client_settings

Chroma client settings.

TYPE: Settings | None DEFAULT: None

client

Chroma client to use.

TYPE: ClientAPI | None DEFAULT: None

collection_metadata

Metadata to attach to the collection.

TYPE: dict | None DEFAULT: None

collection_configuration

Index configuration for the collection.

TYPE: CreateCollectionConfiguration | None DEFAULT: None

kwargs

Additional keyword arguments to initialize a Chroma client.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Chroma

Chroma vectorstore.

TYPE: Chroma

delete

delete(ids: list[str] | None = None, **kwargs: Any) -> None

Delete by vector IDs.

PARAMETER DESCRIPTION
ids

List of ids to delete.

TYPE: list[str] | None DEFAULT: None

kwargs

Additional keyword arguments.

TYPE: Any DEFAULT: {}