
OpenAIEmbeddings

Reference docs

This page contains reference documentation for OpenAIEmbeddings. See the docs for conceptual guides, tutorials, and examples on using OpenAIEmbeddings.

langchain_openai.embeddings.OpenAIEmbeddings

Bases: BaseModel, Embeddings

OpenAI embedding model integration.

Setup

Install langchain_openai and set the OPENAI_API_KEY environment variable.

pip install -U langchain_openai
export OPENAI_API_KEY="your-api-key"
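If you cannot export the variable before starting Python (for example in a notebook), it can be set from Python instead. The key is read when OpenAIEmbeddings is instantiated, so this must run first. A minimal stdlib sketch; `ensure_api_key` is a hypothetical helper, not part of langchain_openai:

```python
import os
from typing import Optional


def ensure_api_key(fallback: Optional[str] = None) -> str:
    """Return OPENAI_API_KEY, setting it from `fallback` when it is unset or empty."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        if fallback is None:
            raise RuntimeError("OPENAI_API_KEY is not set")
        # Setting os.environ affects only this process and its children.
        os.environ["OPENAI_API_KEY"] = fallback
        key = fallback
    return key
```

In a notebook you might pass a value read with `getpass.getpass()` as the fallback rather than hard-coding the key.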

Key init args (embedding params):
    model: Name of the OpenAI model to use.
    dimensions: The number of dimensions the resulting output embeddings should have. Only supported in 'text-embedding-3' and later models.

Key init args (client params):
    api_key: OpenAI API key. If not passed, it is read from the OPENAI_API_KEY env var.
    organization: OpenAI organization ID. If not passed, it is read from the OPENAI_ORG_ID env var.
    max_retries: Maximum number of retries to make when generating embeddings.
    request_timeout: Timeout for requests to the OpenAI API.

See full list of supported init args and their descriptions in the params section.

Instantiate
from langchain_openai import OpenAIEmbeddings

embed = OpenAIEmbeddings(
    model="text-embedding-3-large"
    # With the `text-embedding-3` class
    # of models, you can specify the size
    # of the embeddings you want returned.
    # dimensions=1024
)

Embed single text

input_text = "The meaning of life is 42"
vector = embed.embed_query("hello")
print(vector[:3])
# Output:
# [-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]

Embed multiple texts

vectors = embed.embed_documents(["hello", "goodbye"])
print(len(vectors))
# Showing only the first 3 coordinates
print(vectors[0][:3])
# Output:
# 2
# [-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
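Embeddings are commonly compared with cosine similarity, for example to rank documents against a query. A minimal pure-Python sketch, using toy 3-dimensional vectors in place of real `embed_query` / `embed_documents` output (real vectors are much longer); `cosine_similarity` and `rank_documents` are hypothetical helpers, not part of langchain_openai:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: list[float], doc_vecs: list[list[float]]
) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-d vectors; real ones come from embed_query / embed_documents.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
ranking = rank_documents(query, docs)
print(ranking[0][0])  # → 1
```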

Async

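# Top-level await works in a notebook or the asyncio REPL; elsewhere, run this inside an async function.
vector = await embed.aembed_query(input_text)
print(vector[:3])
# Output:
# [-0.009100092574954033, 0.005071679595857859, -0.0029193938244134188]

# Multiple texts:
# vectors = await embed.aembed_documents(input_texts)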
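To embed a query and a batch of documents at the same time, the two coroutines can be scheduled together with `asyncio.gather`. So that the sketch runs offline, it uses a hypothetical `StubEmbeddings` class that only mimics the `aembed_query` / `aembed_documents` method names; with a real OpenAIEmbeddings instance you would pass that to `main` instead.

```python
import asyncio


class StubEmbeddings:
    """Hypothetical offline stand-in exposing the same async method names."""

    async def aembed_query(self, text: str) -> list[float]:
        await asyncio.sleep(0)  # pretend to wait on the network
        return [float(len(text)), 0.0, 1.0]

    async def aembed_documents(self, texts: list[str]) -> list[list[float]]:
        return [await self.aembed_query(t) for t in texts]


async def main(embed) -> tuple[list[float], list[list[float]]]:
    # Run the query and document requests concurrently.
    query_vec, doc_vecs = await asyncio.gather(
        embed.aembed_query("The meaning of life is 42"),
        embed.aembed_documents(["hello", "goodbye"]),
    )
    return query_vec, doc_vecs


query_vec, doc_vecs = asyncio.run(main(StubEmbeddings()))
print(len(doc_vecs))  # → 2
```

In a notebook, where an event loop is already running, `await main(...)` directly instead of calling `asyncio.run`.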

METHOD DESCRIPTION
build_extra

Build extra kwargs from additional params that were passed in.

validate_environment

Validate that the API key and Python package exist in the environment.

embed_documents

Call out to OpenAI's embedding endpoint for embedding search docs.

aembed_documents

Asynchronously call out to OpenAI's embedding endpoint for embedding search docs.

embed_query

Call out to OpenAI's embedding endpoint for embedding query text.

aembed_query

Asynchronously call out to OpenAI's embedding endpoint for embedding query text.

dimensions class-attribute instance-attribute

dimensions: int | None = None

The number of dimensions the resulting output embeddings should have.

Only supported in text-embedding-3 and later models.
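Setting `dimensions` asks the API to return a shorter vector. OpenAI's embeddings guide notes that if you shorten an embedding yourself after generating it, you must re-normalize it; the sketch below illustrates that client-side truncate-and-normalize idea on toy data. It is an assumption-laden illustration, not a claim that its output matches what the API returns for a given `dimensions` value; `shorten_embedding` is hypothetical.

```python
import math


def shorten_embedding(vec: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale them to unit L2 norm."""
    if not 0 < dimensions <= len(vec):
        raise ValueError("dimensions must be between 1 and len(vec)")
    head = vec[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


full = [0.6, 0.0, 0.8, 0.0]  # toy unit-length vector
print(shorten_embedding(full, 2))  # → [1.0, 0.0]
```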

openai_api_version class-attribute instance-attribute

openai_api_version: str | None = Field(
    default_factory=from_env("OPENAI_API_VERSION", default=None), alias="api_version"
)

Automatically inferred from env var OPENAI_API_VERSION if not provided.

openai_api_base class-attribute instance-attribute

openai_api_base: str | None = Field(
    alias="base_url", default_factory=from_env("OPENAI_API_BASE", default=None)
)

Base URL for API requests. Automatically inferred from env var OPENAI_API_BASE if not provided. Leave blank if not using a proxy or service emulator.

embedding_ctx_length class-attribute instance-attribute

embedding_ctx_length: int = 8191

The maximum number of tokens to embed at once.

openai_api_key class-attribute instance-attribute

openai_api_key: SecretStr | None = Field(
    alias="api_key", default_factory=secret_from_env("OPENAI_API_KEY", default=None)
)

Automatically inferred from env var OPENAI_API_KEY if not provided.

openai_organization class-attribute instance-attribute

openai_organization: str | None = Field(
    alias="organization",
    default_factory=from_env(["OPENAI_ORG_ID", "OPENAI_ORGANIZATION"], default=None),
)

If not provided, automatically inferred from env var OPENAI_ORG_ID, falling back to OPENAI_ORGANIZATION.
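The organization lookup consults more than one environment variable. A minimal stdlib sketch of that lookup order, assuming the first variable that is present wins; `first_env` is a hypothetical helper, not langchain_core's `from_env`, and the organization value below is made up.

```python
import os
from typing import Optional, Sequence, Union


def first_env(
    names: Union[str, Sequence[str]], default: Optional[str] = None
) -> Optional[str]:
    """Return the value of the first variable in `names` that is set, else `default`."""
    if isinstance(names, str):
        names = [names]
    for name in names:
        if name in os.environ:
            return os.environ[name]
    return default


os.environ.pop("OPENAI_ORG_ID", None)
os.environ["OPENAI_ORGANIZATION"] = "org-example"  # hypothetical value
print(first_env(["OPENAI_ORG_ID", "OPENAI_ORGANIZATION"]))  # → org-example
```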

chunk_size class-attribute instance-attribute

chunk_size: int = 1000

Maximum number of texts to embed in each batch.
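As a rough illustration of batching, 2500 texts with the default chunk_size of 1000 would be sent as batches of 1000, 1000, and 500. A minimal sketch with a hypothetical `batch_texts` helper; as we understand it, the real class batches the tokenized pieces of each text when check_embedding_ctx_length is enabled, so very long texts can add batches.

```python
from typing import Iterator


def batch_texts(texts: list[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most chunk_size texts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


texts = [f"doc {i}" for i in range(2500)]
print([len(batch) for batch in batch_texts(texts, 1000)])  # → [1000, 1000, 500]
```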

max_retries class-attribute instance-attribute

max_retries: int = 2

Maximum number of retries to make when generating.

request_timeout class-attribute instance-attribute

request_timeout: float | tuple[float, float] | Any | None = Field(
    default=None, alias="timeout"
)

Timeout for requests to the OpenAI API. Can be a float, a tuple of floats, an httpx.Timeout, or None.

tiktoken_enabled class-attribute instance-attribute

tiktoken_enabled: bool = True

Set this to False for non-OpenAI implementations of the embeddings API, e.g. the --extensions openai extension for text-generation-webui.

tiktoken_model_name class-attribute instance-attribute

tiktoken_model_name: str | None = None

The model name to pass to tiktoken when using this class. Tiktoken is used to count the tokens in documents so they can be kept under a certain limit. By default (None), this is the same as the embedding model name. However, you may want to use this class with a model name that tiktoken does not support, for example when using Azure embeddings or one of the many model providers that expose an OpenAI-like API with different models. In those cases, specify a model name here to avoid errors when tiktoken is called.

show_progress_bar class-attribute instance-attribute

show_progress_bar: bool = False

Whether to show a progress bar when embedding.

model_kwargs class-attribute instance-attribute

model_kwargs: dict[str, Any] = Field(default_factory=dict)

Holds any model parameters valid for the `create` call that are not explicitly specified.

skip_empty class-attribute instance-attribute

skip_empty: bool = False

Whether to skip empty strings when embedding or raise an error.

retry_min_seconds class-attribute instance-attribute

retry_min_seconds: int = 4

Minimum number of seconds to wait between retries.

retry_max_seconds class-attribute instance-attribute

retry_max_seconds: int = 20

Maximum number of seconds to wait between retries.
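To see how retry_min_seconds and retry_max_seconds bound the waits, here is a hypothetical exponential schedule (1, 2, 4, ... seconds) clamped to that range, using the defaults of 4 and 20. Whether and exactly how this class applies such a schedule may vary by version, so treat `backoff_schedule` as an illustration only.

```python
def backoff_schedule(
    attempts: int, min_seconds: float = 4.0, max_seconds: float = 20.0
) -> list[float]:
    """Hypothetical exponential waits 1, 2, 4, ... clamped to [min_seconds, max_seconds]."""
    return [
        float(min(max(2.0 ** n, min_seconds), max_seconds)) for n in range(attempts)
    ]


print(backoff_schedule(6))  # → [4.0, 4.0, 4.0, 8.0, 16.0, 20.0]
```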

http_client class-attribute instance-attribute

http_client: Any | None = None

Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

http_async_client class-attribute instance-attribute

http_async_client: Any | None = None

Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

check_embedding_ctx_length class-attribute instance-attribute

check_embedding_ctx_length: bool = True

Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.
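A simplified sketch of the splitting idea: cut an over-long token sequence into pieces of at most embedding_ctx_length tokens, embed each piece, then combine them. It assumes the pieces are combined by a token-count-weighted average followed by L2 normalization, which is our understanding of the library's approach rather than a copy of it. The token IDs and `toy_embed` are hypothetical; the real class tokenizes with tiktoken and calls the API.

```python
import math
from typing import Callable


def chunk_tokens(tokens: list[int], ctx_length: int) -> list[list[int]]:
    """Split a token sequence into consecutive pieces of at most ctx_length tokens."""
    if ctx_length < 1:
        raise ValueError("ctx_length must be positive")
    return [tokens[i:i + ctx_length] for i in range(0, len(tokens), ctx_length)]


def embed_long_input(
    tokens: list[int],
    ctx_length: int,
    embed_chunk: Callable[[list[int]], list[float]],
) -> list[float]:
    """Embed each piece, average weighted by piece length, then scale to unit length."""
    chunks = chunk_tokens(tokens, ctx_length)
    if not chunks:
        raise ValueError("cannot embed an empty token sequence")
    vectors = [embed_chunk(c) for c in chunks]
    weights = [len(c) for c in chunks]
    total = sum(weights)
    dim = len(vectors[0])
    avg = [sum(w * v[j] for w, v in zip(weights, vectors)) / total for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in avg))
    if norm == 0:
        raise ValueError("piece embeddings cancelled out to a zero vector")
    return [x / norm for x in avg]


def toy_embed(chunk: list[int]) -> list[float]:
    """Hypothetical embedder: even first token -> x axis, odd -> y axis."""
    return [1.0, 0.0] if chunk[0] % 2 == 0 else [0.0, 1.0]


print(chunk_tokens([0, 1, 2, 3, 4], 2))  # → [[0, 1], [2, 3], [4]]
print(embed_long_input([1, 2, 4], 2, toy_embed))
```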

build_extra classmethod

build_extra(values: dict[str, Any]) -> Any

Build extra kwargs from additional params that were passed in.
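One reading of "build extra kwargs" is that init arguments which are not declared fields get moved into model_kwargs so they are forwarded to the API call. A minimal sketch under that assumption; `KNOWN_FIELDS` is a hypothetical subset of the real field list, `split_extra` is hypothetical, and the real validator may also emit warnings.

```python
from typing import Any

# Hypothetical subset of declared field names, for illustration only.
KNOWN_FIELDS = {"model", "dimensions", "chunk_size", "max_retries", "model_kwargs"}


def split_extra(values: dict[str, Any]) -> dict[str, Any]:
    """Move keys that are not declared fields into values["model_kwargs"]."""
    result = dict(values)
    extra = dict(result.get("model_kwargs") or {})
    for key in list(result):
        if key in KNOWN_FIELDS:
            continue
        if key in extra:
            raise ValueError(f"{key!r} supplied twice")
        extra[key] = result.pop(key)
    result["model_kwargs"] = extra
    return result


print(split_extra({"model": "text-embedding-3-large", "user": "alice"}))
# → {'model': 'text-embedding-3-large', 'model_kwargs': {'user': 'alice'}}
```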

validate_environment

validate_environment() -> Self

Validate that the API key and Python package exist in the environment.

embed_documents

embed_documents(
    texts: list[str], chunk_size: int | None = None, **kwargs: Any
) -> list[list[float]]

Call out to OpenAI's embedding endpoint for embedding search docs.

PARAMETER DESCRIPTION
texts

The list of texts to embed.

TYPE: list[str]

chunk_size

Maximum number of texts to embed per batch. If None, uses the chunk_size set on the class.

TYPE: int | None DEFAULT: None

kwargs

Additional keyword arguments to pass to the embedding API.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[list[float]]

List of embeddings, one for each text.

aembed_documents async

aembed_documents(
    texts: list[str], chunk_size: int | None = None, **kwargs: Any
) -> list[list[float]]

Asynchronously call out to OpenAI's embedding endpoint for embedding search docs.

PARAMETER DESCRIPTION
texts

The list of texts to embed.

TYPE: list[str]

chunk_size

Maximum number of texts to embed per batch. If None, uses the chunk_size set on the class.

TYPE: int | None DEFAULT: None

kwargs

Additional keyword arguments to pass to the embedding API.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[list[float]]

List of embeddings, one for each text.

embed_query

embed_query(text: str, **kwargs: Any) -> list[float]

Call out to OpenAI's embedding endpoint for embedding query text.

PARAMETER DESCRIPTION
text

The text to embed.

TYPE: str

kwargs

Additional keyword arguments to pass to the embedding API.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[float]

Embedding for the text.

aembed_query async

aembed_query(text: str, **kwargs: Any) -> list[float]

Asynchronously call out to OpenAI's embedding endpoint for embedding query text.

PARAMETER DESCRIPTION
text

The text to embed.

TYPE: str

kwargs

Additional keyword arguments to pass to the embedding API.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
list[float]

Embedding for the text.