Name of the deployed OpenAI model, e.g. 'gpt-4o', 'gpt-35-turbo', etc.
What sampling temperature to use.
Holds any model parameters valid for create call not explicitly specified.
Automatically inferred from env var AZURE_OPENAI_API_KEY if not provided.
Base URL path for API requests, leave blank if not using a proxy or service emulator.
OpenAI organization ID to use for API calls.
Timeout for requests to OpenAI completion API.
Whether to include usage metadata in streaming output. If enabled, an additional message chunk will be generated during the stream including usage metadata.
Maximum number of retries to make when generating.
Penalizes repeated tokens.
Penalizes repeated tokens according to frequency.
Seed for generation.
Whether to return logprobs.
Number of most likely tokens to return at each token position, each with an associated log probability.
Modify the likelihood of specified tokens appearing in the completion.
Whether to stream the results or not.
Number of chat completions to generate for each prompt.
Total probability mass of tokens to consider at each step.
Constrains effort on reasoning for reasoning models. For use with the Chat Completions API.
Reasoning parameters for reasoning models. For use with the Responses API.
Controls the verbosity level of responses for reasoning models. For use with the Responses API.
The model name to pass to tiktoken when using this class.
Optional httpx.Client.
Optional httpx.AsyncClient.
Default stop sequences.
Optional additional JSON properties to include in the request parameters when making requests to OpenAI-compatible APIs, such as vLLM.
Whether to include response headers in the output message response_metadata.
Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.
Configuration for
Additional fields to include in generations from Responses API.
Latency tier for request.
If True, OpenAI may store response data for future use.
Truncation strategy (Responses API).
If True, always pass previous_response_id using the ID of the most recent response.
Whether to use the Responses API instead of the Chat API.
Version of AIMessage output format to use.
Build extra kwargs from additional params that were passed in.
Validate temperature parameter for different models.
Validate that api key and python package exists in environment.
Get the tokens present in the text with the tiktoken package.
Interface to OpenAI chat model APIs.
ChatOpenAI targets
official OpenAI API specifications
only. Non-standard response fields added by third-party providers (e.g.,
reasoning_content, reasoning_details) are not extracted or
preserved. If you are pointing base_url at a provider such as
OpenRouter, vLLM, or DeepSeek, use the corresponding provider-specific
LangChain package instead (e.g., ChatDeepSeek, ChatOpenRouter).
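For example, a DeepSeek endpoint is better served by its dedicated integration. A minimal sketch, assuming the langchain-deepseek package is installed and DEEPSEEK_API_KEY is set:

```python
# Prefer the provider-specific package over pointing ChatOpenAI's base_url at DeepSeek.
from langchain_deepseek import ChatDeepSeek

model = ChatDeepSeek(model="deepseek-reasoner")  # example model name
response = model.invoke("What is 3^3?")
# Provider-specific reasoning fields are surfaced by the integration
# instead of being dropped.
```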
Install langchain-openai and set environment variable OPENAI_API_KEY.
pip install -U langchain-openai
# or using uv
uv add langchain-openai
export OPENAI_API_KEY="your-api-key"

| Param | Type | Description |
|---|---|---|
| model | str | Name of OpenAI model to use. |
| temperature | float | Sampling temperature. |
| max_tokens | `int \| None` | Max number of tokens to generate. |
| logprobs | `bool \| None` | Whether to return logprobs. |
| stream_options | dict | Configure streaming outputs, like whether to return token usage when streaming ({"include_usage": True}). |
| use_responses_api | `bool \| None` | Whether to use the Responses API instead of the Chat API. |
See full list of supported init args and their descriptions below.
| Param | Type | Description |
|---|---|---|
| timeout | `float \| Tuple[float, float] \| None` | Timeout for requests to OpenAI completion API. |
| max_retries | `int \| None` | Maximum number of retries to make when generating. |
| api_key | `str \| None` | OpenAI API key. Automatically inferred from env var OPENAI_API_KEY if not provided. |
| base_url | `str \| None` | Base URL for API requests. Only specify if using a proxy or service emulator. |
| organization | `str \| None` | OpenAI organization ID to use for API calls. |
See full list of supported init args and their descriptions below.
Create a model instance with desired params. For example:
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
model="...",
temperature=0,
max_tokens=None,
timeout=None,
max_retries=2,
# api_key="...",
# base_url="...",
# organization="...",
# other params...
)
See all available params below.
Any param which is not explicitly supported will be passed directly to
openai.OpenAI.chat.completions.create(...)
every time the model is invoked. For example:
from langchain_openai import ChatOpenAI
import openai
ChatOpenAI(..., frequency_penalty=0.2).invoke(...)
# Results in underlying API call of:
openai.OpenAI(..).chat.completions.create(..., frequency_penalty=0.2)
# Which is also equivalent to:
ChatOpenAI(...).invoke(..., frequency_penalty=0.2)

Generate a response from the model:
messages = [
(
"system",
"You are a helpful translator. Translate the user sentence to French.",
),
("human", "I love programming."),
]
model.invoke(messages)
Results in an AIMessage response:
AIMessage(
content="J'adore la programmation.",
response_metadata={
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 31,
"total_tokens": 36,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_43dfabdef1",
"finish_reason": "stop",
"logprobs": None,
},
id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
usage_metadata={"input_tokens": 31, "output_tokens": 5, "total_tokens": 36},
)

Stream a response from the model:
for chunk in model.stream(messages):
print(chunk.text, end="")
Results in a sequence of AIMessageChunk objects with partial content:
AIMessageChunk(content="", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content="J", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content="'adore", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content=" la", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(
content=" programmation", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0"
)
AIMessageChunk(content=".", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(
content="",
response_metadata={"finish_reason": "stop"},
id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0",
)
To collect the full message, you can concatenate the chunks:
stream = model.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full = AIMessageChunk(
content="J'adore la programmation.",
response_metadata={"finish_reason": "stop"},
id="run-bf917526-7f58-4683-84f7-36a6b671d140",
)

Asynchronous equivalents of invoke, stream, and batch are also available:
# Invoke
await model.ainvoke(messages)
# Stream
async for chunk in model.astream(messages):
    print(chunk.text, end="")
# Batch
await model.abatch([messages])
Results in an AIMessage response:
AIMessage(
content="J'adore la programmation.",
response_metadata={
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 31,
"total_tokens": 36,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_43dfabdef1",
"finish_reason": "stop",
"logprobs": None,
},
id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
usage_metadata={
"input_tokens": 31,
"output_tokens": 5,
"total_tokens": 36,
},
)
For batched calls, results in a list[AIMessage].
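The synchronous batch method behaves the same way. A minimal sketch:

```python
# Batch over multiple message lists; returns list[AIMessage].
results = model.batch([messages, messages])
for msg in results:
    print(msg.text)
```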
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
model_with_tools = model.bind_tools(
[GetWeather, GetPopulation]
# strict = True # Enforce tool args schema is respected
)
ai_msg = model_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "call_6XswGD5Pqk8Tt5atYr7tfenU",
},
{
"name": "GetWeather",
"args": {"location": "New York, NY"},
"id": "call_ZVL15vA8Y7kXqOy3dtmQgeCi",
},
{
"name": "GetPopulation",
"args": {"location": "Los Angeles, CA"},
"id": "call_49CFW8zqC9W7mh7hbMLSIrXw",
},
{
"name": "GetPopulation",
"args": {"location": "New York, NY"},
"id": "call_6ghfKxV264jEfe1mRIkS3PE7",
},
]
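To complete the tool-calling loop, tool results can be passed back to the model as tool messages. A minimal sketch; the `lookup_weather` helper and its return value are hypothetical stand-ins for a real implementation:

```python
from langchain_core.messages import ToolMessage

messages = [("human", "What is the weather in LA?")]
ai_msg = model_with_tools.invoke(messages)

tool_results = []
for tool_call in ai_msg.tool_calls:
    result = lookup_weather(**tool_call["args"])  # hypothetical helper
    tool_results.append(
        ToolMessage(content=str(result), tool_call_id=tool_call["id"])
    )

# Send the original messages, the AI tool-call message, and the tool results back.
final = model_with_tools.invoke(messages + [ai_msg] + tool_results)
```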
openai >= 1.32 supports a
parallel_tool_calls parameter that defaults to True. This parameter can
be set to False to disable parallel tool calls:
ai_msg = model_with_tools.invoke(
"What is the weather in LA and NY?", parallel_tool_calls=False
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "call_4OoY0ZR99iEvC7fevsH8Uhtz",
}
]

Like other runtime parameters, parallel_tool_calls can be bound to a model
using model.bind(parallel_tool_calls=False) or during instantiation by
setting model_kwargs.
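A minimal sketch of both approaches; the model name is an example:

```python
# Bind at runtime on an existing tool-calling model...
model_no_parallel = model_with_tools.bind(parallel_tool_calls=False)

# ...or set it at instantiation via model_kwargs (tools are still bound separately).
model = ChatOpenAI(
    model="gpt-4o",  # example model name
    model_kwargs={"parallel_tool_calls": False},
).bind_tools([GetWeather, GetPopulation])
```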
See bind_tools for more.
You can access built-in tools supported by the OpenAI Responses API. See LangChain docs for more detail.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="...", output_version="responses/v1")
tool = {"type": "web_search"}
model_with_tools = model.bind_tools([tool])
response = model_with_tools.invoke("What was a positive news story from today?")
response.content
[
{
"type": "text",
"text": "Today, a heartwarming story emerged from ...",
"annotations": [
{
"end_index": 778,
"start_index": 682,
"title": "Title of story",
"type": "url_citation",
"url": "<url of story>",
}
],
}
]
langchain-openai 0.3.26: Updated AIMessage format

langchain-openai >= 0.3.26
allows users to opt in to an updated AIMessage format when using the
Responses API. Setting ChatOpenAI(..., output_version="responses/v1") will
format output from reasoning summaries, built-in tool invocations, and other
response items into the message's content field, rather than
additional_kwargs. We recommend this format for new applications.
OpenAI's Responses API supports management of conversation state. Passing in response IDs from previous messages will continue a conversational thread.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
model="...",
use_responses_api=True,
output_version="responses/v1",
)
response = model.invoke("Hi, I'm Bob.")
response.text
"Hi Bob! How can I assist you today?"
second_response = model.invoke(
"What is my name?",
previous_response_id=response.response_metadata["id"],
)
second_response.text
"Your name is Bob. How can I help you today, Bob?"
langchain-openai 0.3.26

You can also initialize ChatOpenAI with use_previous_response_id.
Input messages up to the most recent response will then be dropped from request
payloads, and previous_response_id will be set using the ID of the most
recent response.
model = ChatOpenAI(model="...", use_previous_response_id=True)

Some OpenAI-compatible providers/proxies may not support forwarding
reasoning blocks in request history. If you see request-format
errors while using reasoning + Responses API, prefer
use_previous_response_id=True (so the server keeps
conversation state).
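A minimal sketch of multi-turn use with use_previous_response_id; the model name is an example, and the integration fills in previous_response_id automatically:

```python
model = ChatOpenAI(
    model="gpt-4o-mini",  # example model name
    use_responses_api=True,
    use_previous_response_id=True,
)

first = model.invoke([("human", "Hi, I'm Bob.")])
# Messages up to the most recent response are dropped from the request payload;
# previous_response_id is set to the ID of `first` automatically.
second = model.invoke(
    [("human", "Hi, I'm Bob."), first, ("human", "What is my name?")]
)
```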
OpenAI's Responses API supports reasoning models that expose a summary of internal reasoning processes.
from langchain_openai import ChatOpenAI
reasoning = {
"effort": "medium", # 'low', 'medium', or 'high'
"summary": "auto", # 'detailed', 'auto', or None
}
model = ChatOpenAI(
model="...", reasoning=reasoning, output_version="responses/v1"
)
response = model.invoke("What is 3^3?")
# Response text
print(f"Output: {response.text}")
# Reasoning summaries
for block in response.content:
if block["type"] == "reasoning":
for summary in block["summary"]:
print(summary["text"])
Output: 3³ = 27
Reasoning: The user wants to know...
langchain-openai 0.3.26: Updated AIMessage format

langchain-openai >= 0.3.26
allows users to opt in to an updated AIMessage format when using the
Responses API. Setting ChatOpenAI(..., output_version="responses/v1") will
format output from reasoning summaries, built-in tool invocations, and other
response items into the message's content field, rather than
additional_kwargs. We recommend this format for new applications.
When using a non-OpenAI endpoint via base_url, request handling for
reasoning history can differ. If agent loops fail after tool calls, use:
ChatOpenAI(..., use_responses_api=True, use_previous_response_id=True).
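For example, a minimal sketch; the base URL and model name are placeholders for a Responses-compatible endpoint:

```python
model = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # example endpoint
    api_key="EMPTY",                      # many local servers accept any string
    model="custom-model",                 # example model name
    use_responses_api=True,
    use_previous_response_id=True,
)
```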
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: int | None = Field(
description="How funny the joke is, from 1 to 10"
)
structured_model = model.with_structured_output(Joke)
structured_model.invoke("Tell me a joke about cats")
Joke(
setup="Why was the cat sitting on the computer?",
punchline="To keep an eye on the mouse!",
rating=None,
)
See with_structured_output for more info.
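with_structured_output also accepts a method parameter. A minimal sketch using OpenAI's native Structured Outputs via JSON schema, reusing the Joke model above:

```python
# Uses OpenAI's Structured Outputs (strict JSON schema) under the hood.
structured_model = model.with_structured_output(Joke, method="json_schema")
structured_model.invoke("Tell me a joke about cats")
```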
json_model = model.bind(response_format={"type": "json_object"})
ai_msg = json_model.invoke(
"Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]"
)
ai_msg.content
'\\n{\\n "random_ints": [23, 87, 45, 12, 78, 34, 56, 90, 11, 67]\\n}'

import base64
import httpx
from langchain.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = model.invoke([message])
ai_msg.content
"The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions."ai_msg = model.invoke(messages)
ai_msg.usage_metadata
{"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}
When streaming, set the stream_usage kwarg:
stream = model.stream(messages, stream_usage=True)
full = next(stream)
for chunk in stream:
full += chunk
full.usage_metadata
{"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}logprobs_model = model.bind(logprobs=True)
ai_msg = logprobs_model.invoke(messages)
ai_msg.response_metadata["logprobs"]
{
"content": [
{
"token": "J",
"bytes": [74],
"logprob": -4.9617593e-06,
"top_logprobs": [],
},
{
"token": "'adore",
"bytes": [39, 97, 100, 111, 114, 101],
"logprob": -0.25202933,
"top_logprobs": [],
},
{
"token": " la",
"bytes": [32, 108, 97],
"logprob": -0.20141791,
"top_logprobs": [],
},
{
"token": " programmation",
"bytes": [
32,
112,
114,
111,
103,
114,
97,
109,
109,
97,
116,
105,
111,
110,
],
"logprob": -1.9361265e-07,
"top_logprobs": [],
},
{
"token": ".",
"bytes": [46],
"logprob": -1.2233183e-05,
"top_logprobs": [],
},
]
}

ai_msg = model.invoke(messages)
ai_msg.response_metadata
{
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 28,
"total_tokens": 33,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_319be4768e",
"finish_reason": "stop",
"logprobs": None,
}

OpenAI offers a variety of service tiers. The "flex" tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enhancement, or jobs that can be run asynchronously.
To use it, initialize the model with service_tier="flex":
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="...", service_tier="flex")
Note that this is a beta feature that is only available for a subset of models. See OpenAI flex processing docs for more detail.
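Because flex requests can take longer to complete, it may help to raise the request timeout. A sketch; the model name and timeout value are illustrative:

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="o4-mini",        # example of a flex-eligible model
    service_tier="flex",
    timeout=900.0,          # illustrative: allow longer responses
)
```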
ChatOpenAI can be used with OpenAI-compatible APIs like
LM Studio, vLLM,
Ollama, and others.
To use custom parameters specific to these providers, use the extra_body parameter.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
base_url="http://localhost:1234/v1",
api_key="lm-studio", # Can be any string
model="mlx-community/QwQ-32B-4bit",
temperature=0,
extra_body={
"ttl": 300
}, # Auto-evict model after 5 minutes of inactivity
)

model = ChatOpenAI(
base_url="http://localhost:8000/v1",
api_key="EMPTY",
model="meta-llama/Llama-2-7b-chat-hf",
extra_body={"use_beam_search": True, "best_of": 4},
)

model_kwargs vs extra_body

Use the correct parameter for different types of API arguments:
Use model_kwargs for:
Standard OpenAI API parameters, e.g. max_completion_tokens, stream_options, modalities, audio

# Standard OpenAI parameters
model = ChatOpenAI(
model="...",
model_kwargs={
"stream_options": {"include_usage": True},
"max_completion_tokens": 300,
"modalities": ["text", "audio"],
"audio": {"voice": "alloy", "format": "wav"},
},
)
Use extra_body for:
Custom provider-specific parameters that must be nested under extra_body in the request

# Custom provider parameters
model = ChatOpenAI(
base_url="http://localhost:8000/v1",
model="custom-model",
extra_body={
"use_beam_search": True, # vLLM parameter
"best_of": 4, # vLLM parameter
"ttl": 300, # LM Studio parameter
},
)
Key Differences:
model_kwargs: Parameters are merged into the top-level request payload.
extra_body: Parameters are nested under the extra_body key in the request.

Always use extra_body for custom parameters, not model_kwargs.
Using model_kwargs for non-OpenAI parameters will cause API errors.
For high-volume applications with repetitive prompts, use prompt_cache_key
per-invocation to improve cache hit rates and reduce costs:
model = ChatOpenAI(model="...")
response = model.invoke(
messages,
prompt_cache_key="example-key-a", # Routes to same machine for cache hits
)
customer_response = model.invoke(messages, prompt_cache_key="example-key-b")
support_response = model.invoke(messages, prompt_cache_key="example-key-c")
# Dynamic cache keys based on context
cache_key = f"example-key-{dynamic_suffix}"
response = model.invoke(messages, prompt_cache_key=cache_key)
Cache keys help ensure requests with the same prompt prefix are routed to machines with existing cache, providing cost reduction and latency improvement on cached tokens.
Maximum number of tokens to generate.
Mapping of secret environment variables.
Get the attributes of the LangChain object.
Get the namespace of the LangChain object.
Return whether this model can be serialized by LangChain.
Model wrapper that returns outputs formatted to match the given schema.