OpenAIEmbeddings

Reference docs
This page contains reference documentation for OpenAIEmbeddings. See the docs for conceptual guides, tutorials, and examples on using OpenAIEmbeddings.
langchain_openai.embeddings.OpenAIEmbeddings

Bases: BaseModel, Embeddings

OpenAI embedding model integration.
Setup

Install langchain_openai and set environment variable OPENAI_API_KEY.
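For example, with pip and a POSIX shell (exact commands depend on your environment):

pip install -U langchain-openai
export OPENAI_API_KEY="your-api-key"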
Key init args — embedding params:

model: Name of OpenAI model to use.
dimensions: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
Key init args — client params:

api_key: OpenAI API key.
organization: OpenAI organization ID. If not passed in, will be read from env var OPENAI_ORG_ID.
max_retries: Maximum number of retries to make when generating.
request_timeout: Timeout for requests to OpenAI completion API.

See the full list of supported init args and their descriptions in the params section.
Instantiate
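A minimal sketch; the model name is illustrative and any supported embedding model can be used:

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    # With the text-embedding-3 family you can optionally request smaller
    # vectors, e.g. dimensions=1024.
)

vector = embeddings.embed_query("hello world")
print(vector[:3])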
Async
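A sketch of the async interface, reusing the embeddings object from the Instantiate example; it must run inside an event loop (e.g. via asyncio.run or a notebook):

vector = await embeddings.aembed_query("hello world")
print(vector[:3])

# Embed several documents in one call:
vectors = await embeddings.aembed_documents(["first document", "second document"])
print(len(vectors), len(vectors[0]))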
METHOD | DESCRIPTION
---|---
build_extra | Build extra kwargs from additional params that were passed in.
validate_environment | Validate that api key and python package exist in environment.
embed_documents | Call out to OpenAI's embedding endpoint for embedding search docs.
aembed_documents | Call out to OpenAI's embedding endpoint async for embedding search docs.
embed_query | Call out to OpenAI's embedding endpoint for embedding query text.
aembed_query | Call out to OpenAI's embedding endpoint async for embedding query text.
dimensions (class-attribute, instance-attribute)

dimensions: int | None = None

The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
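For illustration, a sketch that requests 256-dimensional vectors (assumes a text-embedding-3 model):

small_embeddings = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=256)
assert len(small_embeddings.embed_query("hello")) == 256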
openai_api_version (class-attribute, instance-attribute)

openai_api_version: str | None = Field(
    default_factory=from_env("OPENAI_API_VERSION", default=None), alias="api_version"
)

Automatically inferred from env var OPENAI_API_VERSION if not provided.
openai_api_base (class-attribute, instance-attribute)

openai_api_base: str | None = Field(
    alias="base_url", default_factory=from_env("OPENAI_API_BASE", default=None)
)

Base URL path for API requests. Leave blank if not using a proxy or service emulator.
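For example, an OpenAI-compatible server can be targeted via the base_url alias (the URL below is a placeholder, not a real endpoint):

local_embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    base_url="http://localhost:8000/v1",      # alias for openai_api_base
    api_key="unused-by-many-local-servers",   # some compatible servers ignore the key
)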
embedding_ctx_length (class-attribute, instance-attribute)

embedding_ctx_length: int = 8191

The maximum number of tokens to embed at once.
openai_api_key (class-attribute, instance-attribute)

openai_api_key: SecretStr | None = Field(
    alias="api_key", default_factory=secret_from_env("OPENAI_API_KEY", default=None)
)

Automatically inferred from env var OPENAI_API_KEY if not provided.
openai_organization (class-attribute, instance-attribute)

openai_organization: str | None = Field(
    alias="organization",
    default_factory=from_env(["OPENAI_ORG_ID", "OPENAI_ORGANIZATION"], default=None),
)

Automatically inferred from env var OPENAI_ORG_ID if not provided.
chunk_size (class-attribute, instance-attribute)

chunk_size: int = 1000

Maximum number of texts to embed in each batch.
max_retries (class-attribute, instance-attribute)

max_retries: int = 2

Maximum number of retries to make when generating.
request_timeout (class-attribute, instance-attribute)

Timeout for requests to OpenAI completion API. Can be a float, httpx.Timeout, or None.
tiktoken_enabled (class-attribute, instance-attribute)

tiktoken_enabled: bool = True

Set this to False for non-OpenAI implementations of the embeddings API, e.g. the --extensions openai extension for text-generation-webui.
tiktoken_model_name (class-attribute, instance-attribute)

tiktoken_model_name: str | None = None
The model name to pass to tiktoken when using this class. Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit. By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.
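For example (the model name and endpoint below are hypothetical), token counting can be pinned to a model tiktoken does recognize:

compat_embeddings = OpenAIEmbeddings(
    model="my-provider-embedding-model",            # hypothetical, unknown to tiktoken
    base_url="https://embeddings.example.com/v1",   # hypothetical OpenAI-compatible endpoint
    tiktoken_model_name="text-embedding-3-small",   # count tokens as if using this model
)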
show_progress_bar (class-attribute, instance-attribute)

show_progress_bar: bool = False

Whether to show a progress bar when embedding.
model_kwargs (class-attribute, instance-attribute)

Holds any model parameters valid for the create call that are not explicitly specified.
skip_empty (class-attribute, instance-attribute)

skip_empty: bool = False

Whether to skip empty strings when embedding or raise an error.
retry_min_seconds (class-attribute, instance-attribute)

retry_min_seconds: int = 4

Min number of seconds to wait between retries.
retry_max_seconds (class-attribute, instance-attribute)

retry_max_seconds: int = 20

Max number of seconds to wait between retries.
http_client (class-attribute, instance-attribute)

http_client: Any | None = None

Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.
http_async_client (class-attribute, instance-attribute)

http_async_client: Any | None = None

Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.
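A sketch of supplying custom httpx clients for both sync and async calls; the proxy URL is illustrative (recent httpx versions accept proxy=, older ones use proxies=):

import httpx
from langchain_openai import OpenAIEmbeddings

proxied_embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    http_client=httpx.Client(proxy="http://localhost:3128"),
    http_async_client=httpx.AsyncClient(proxy="http://localhost:3128"),
)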
check_embedding_ctx_length (class-attribute, instance-attribute)

check_embedding_ctx_length: bool = True

Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.
build_extra (classmethod)

Build extra kwargs from additional params that were passed in.
validate_environment

validate_environment() -> Self

Validate that api key and python package exist in environment.
embed_documents

embed_documents(
    texts: list[str], chunk_size: int | None = None, **kwargs: Any
) -> list[list[float]]

Call out to OpenAI's embedding endpoint for embedding search docs.
PARAMETER | DESCRIPTION
---|---
texts | The list of texts to embed. TYPE: list[str]
chunk_size | The chunk size of embeddings. If None, the chunk_size attribute of the class is used. TYPE: int or None
kwargs | Additional keyword arguments to pass to the embedding API. TYPE: Any
RETURNS | DESCRIPTION
---|---
list[list[float]] | List of embeddings, one for each text.
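A usage sketch, assuming the embeddings object from the Instantiate example:

docs = ["LangChain is a framework for LLM apps.", "Embeddings map text to vectors."]
vectors = embeddings.embed_documents(docs, chunk_size=500)  # chunk_size is optional
print(len(vectors), len(vectors[0]))  # one vector per input text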
aembed_documents (async)

aembed_documents(
    texts: list[str], chunk_size: int | None = None, **kwargs: Any
) -> list[list[float]]

Call out to OpenAI's embedding endpoint async for embedding search docs.
PARAMETER | DESCRIPTION
---|---
texts | The list of texts to embed. TYPE: list[str]
chunk_size | The chunk size of embeddings. If None, the chunk_size attribute of the class is used. TYPE: int or None
kwargs | Additional keyword arguments to pass to the embedding API. TYPE: Any
RETURNS | DESCRIPTION
---|---
list[list[float]] | List of embeddings, one for each text.
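A usage sketch for the async variant, assuming the same embeddings object:

import asyncio

async def embed_all() -> None:
    docs = ["first document", "second document"]
    vectors = await embeddings.aembed_documents(docs)
    print(len(vectors), len(vectors[0]))

asyncio.run(embed_all())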