OpenAIEmbeddings: OpenAI embedding model integration.
Setup:
Install langchain_openai and set the environment variable OPENAI_API_KEY.
pip install -U langchain_openai
export OPENAI_API_KEY="your-api-key"
Key init args — embedding params:
model:
Name of OpenAI model to use.
dimensions:
The number of dimensions the resulting output embeddings should have.
Only supported in 'text-embedding-3' and later models.
Key init args — client params:
api_key:
OpenAI API key. If not passed in, will be read from the env var OPENAI_API_KEY.
organization:
OpenAI organization ID. If not passed in, will be read from the env var OPENAI_ORG_ID.
max_retries:
Maximum number of retries to make when generating.
request_timeout:
Timeout for requests to the OpenAI embeddings API.
See full list of supported init args and their descriptions in the params section.
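As a quick illustration, the embedding and client params above can be combined at init time. This is only a sketch: the key, organization, and timeout values below are placeholders, and api_key/organization can instead be read from the OPENAI_API_KEY/OPENAI_ORG_ID env vars.

from langchain_openai import OpenAIEmbeddings

embed = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=256,              # only honored by 'text-embedding-3' and later models
    api_key="your-api-key",      # or rely on OPENAI_API_KEY
    organization="your-org-id",  # or rely on OPENAI_ORG_ID
    max_retries=2,
    request_timeout=30,
)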
Instantiate:
from langchain_openai import OpenAIEmbeddings
embed = OpenAIEmbeddings(
    model="text-embedding-3-large"
    # With the `text-embedding-3` class
    # of models, you can specify the size
    # of the embeddings you want returned.
    # dimensions=1024
)
Embed single text:
input_text = "The meaning of life is 42"
vector = embed.embed_query(input_text)
print(vector[:3])
[-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
Embed multiple texts:
vectors = embed.embed_documents(["hello", "goodbye"])
print(len(vectors))
# Showing only the first 3 coordinates of the first vector
print(vectors[0][:3])
2
[-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]
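The returned vectors are plain lists of floats, so they can be compared directly. A purely illustrative sketch (cosine_similarity below is a hypothetical helper, not part of OpenAIEmbeddings, and embed is the instance created above):

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = embed.embed_query("greeting")
doc_vecs = embed.embed_documents(["hello", "goodbye"])
# Rank the documents by similarity to the query.
print([cosine_similarity(query_vec, d) for d in doc_vecs])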
Async:
vector = await embed.aembed_query(input_text)
print(vector[:3])
# multiple:
# await embed.aembed_documents(input_texts)
[-0.009100092574954033, 0.005071679595857859, -0.0029193938244134188]
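In a plain script (outside a notebook or REPL, where top-level await is unavailable), the async calls need an event loop. A minimal sketch, assuming OPENAI_API_KEY is set in the environment:

import asyncio

from langchain_openai import OpenAIEmbeddings

async def main() -> None:
    embed = OpenAIEmbeddings(model="text-embedding-3-large")
    vector = await embed.aembed_query("The meaning of life is 42")
    print(vector[:3])
    vectors = await embed.aembed_documents(["hello", "goodbye"])
    print(len(vectors))

asyncio.run(main())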
When using a non-OpenAI provider, set
check_embedding_ctx_length=False to send raw text instead of tokens
(which many providers don't support), and optionally set
encoding_format to 'float' to avoid base64 encoding issues:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
    model="...",
    base_url="...",
    check_embedding_ctx_length=False,
)
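If the provider's response still fails to decode, encoding_format can be set explicitly, as the note above mentions. A sketch with placeholder model/base_url values:

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="...",
    base_url="...",
    check_embedding_ctx_length=False,
    # Request plain float arrays instead of base64-encoded embeddings.
    encoding_format="float",
)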