OpenAI chat wrapper.
ChatOpenAI targets
official OpenAI API specifications
only. Non-standard response fields added by third-party providers (e.g.,
reasoning_content, reasoning_details) are not extracted or
preserved. If you are pointing base_url at a provider such as
OpenRouter, vLLM, or DeepSeek, use the corresponding provider-specific
LangChain package instead (e.g., ChatDeepSeek, ChatOpenRouter).
BadRequestError is raised when input exceeds OpenAI's context limit.
APIError is raised when input exceeds OpenAI's context limit.
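A minimal sketch of handling such an error, assuming the installed openai SDK surfaces it as openai.BadRequestError (the model name and oversized prompt below are illustrative):

import openai
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")  # illustrative model name
oversized_prompt = "word " * 500_000  # deliberately larger than the context window

try:
    model.invoke(oversized_prompt)
except openai.BadRequestError as err:
    # The API rejects requests whose input exceeds the model's context limit
    print(f"Request rejected: {err}")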
Base wrapper around OpenAI large language models for chat.
This base class targets
official OpenAI API specifications
only. Non-standard response fields added by third-party providers (e.g.,
reasoning_content) are not extracted. Use a provider-specific subclass for
full provider support.
Interface to OpenAI chat model APIs.
ChatOpenAI targets
official OpenAI API specifications
only. Non-standard response fields added by third-party providers (e.g.,
reasoning_content, reasoning_details) are not extracted or
preserved. If you are pointing base_url at a provider such as
OpenRouter, vLLM, or DeepSeek, use the corresponding provider-specific
LangChain package instead (e.g., ChatDeepSeek, ChatOpenRouter).
Install langchain-openai and set environment variable OPENAI_API_KEY.
pip install -U langchain-openai
# or using uv
uv add langchain-openai
export OPENAI_API_KEY="your-api-key"

| Param | Type | Description |
|---|---|---|
| model | str | Name of OpenAI model to use. |
| temperature | float | Sampling temperature. |
| max_tokens | `int \| None` | Max number of tokens to generate. |
| logprobs | `bool \| None` | Whether to return logprobs. |
| stream_options | dict | Configure streaming outputs, like whether to return token usage when streaming ({"include_usage": True}). |
| use_responses_api | `bool \| None` | Whether to use the OpenAI Responses API instead of Chat Completions. |
See full list of supported init args and their descriptions below.
| Param | Type | Description |
|---|---|---|
| timeout | `float \| Tuple[float, float] \| None` | Timeout for requests. |
| max_retries | `int \| None` | Max number of retries. |
| api_key | `str \| None` | OpenAI API key. If not passed in, will be read from the env var OPENAI_API_KEY. |
| base_url | `str \| None` | Base URL for API requests. Only specify if using a proxy or service emulator. |
| organization | `str \| None` | OpenAI organization ID. If not passed in, will be read from the env var OPENAI_ORG_ID. |
See full list of supported init args and their descriptions below.
Create a model instance with desired params. For example:
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
model="...",
temperature=0,
max_tokens=None,
timeout=None,
max_retries=2,
# api_key="...",
# base_url="...",
# organization="...",
# other params...
)
See all available params below.
Any param which is not explicitly supported will be passed directly to
openai.OpenAI.chat.completions.create(...)
every time the model is invoked. For example:
from langchain_openai import ChatOpenAI
import openai
ChatOpenAI(..., frequency_penalty=0.2).invoke(...)
# Results in underlying API call of:
openai.OpenAI(..).chat.completions.create(..., frequency_penalty=0.2)
# Which is also equivalent to:
ChatOpenAI(...).invoke(..., frequency_penalty=0.2)

Generate a response from the model:
messages = [
(
"system",
"You are a helpful translator. Translate the user sentence to French.",
),
("human", "I love programming."),
]
model.invoke(messages)
Results in an AIMessage response:
AIMessage(
content="J'adore la programmation.",
response_metadata={
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 31,
"total_tokens": 36,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_43dfabdef1",
"finish_reason": "stop",
"logprobs": None,
},
id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
usage_metadata={"input_tokens": 31, "output_tokens": 5, "total_tokens": 36},
)

Stream a response from the model:
for chunk in model.stream(messages):
print(chunk.text, end="")
Results in a sequence of AIMessageChunk objects with partial content:
AIMessageChunk(content="", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content="J", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content="'adore", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content=" la", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(
content=" programmation", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0"
)
AIMessageChunk(content=".", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(
content="",
response_metadata={"finish_reason": "stop"},
id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0",
)
To collect the full message, you can concatenate the chunks:
stream = model.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full = AIMessageChunk(
content="J'adore la programmation.",
response_metadata={"finish_reason": "stop"},
id="run-bf917526-7f58-4683-84f7-36a6b671d140",
)

Asynchronous equivalents of invoke, stream, and batch are also available:
# Invoke
await model.ainvoke(messages)
# Stream
async for chunk in model.astream(messages):
    print(chunk.text, end="")
# Batch
await model.abatch([messages])
Results in an AIMessage response:
AIMessage(
content="J'adore la programmation.",
response_metadata={
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 31,
"total_tokens": 36,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_43dfabdef1",
"finish_reason": "stop",
"logprobs": None,
},
id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
usage_metadata={
"input_tokens": 31,
"output_tokens": 5,
"total_tokens": 36,
},
)
Batched calls return a list[AIMessage].
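A minimal sketch of a synchronous batch call; the second input list is illustrative:

results = model.batch(
    [
        messages,
        [("human", "I love building chatbots.")],  # illustrative second input
    ]
)
for msg in results:
    print(msg.content)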
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
model_with_tools = model.bind_tools(
[GetWeather, GetPopulation]
# strict = True # Enforce tool args schema is respected
)
ai_msg = model_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "call_6XswGD5Pqk8Tt5atYr7tfenU",
},
{
"name": "GetWeather",
"args": {"location": "New York, NY"},
"id": "call_ZVL15vA8Y7kXqOy3dtmQgeCi",
},
{
"name": "GetPopulation",
"args": {"location": "Los Angeles, CA"},
"id": "call_49CFW8zqC9W7mh7hbMLSIrXw",
},
{
"name": "GetPopulation",
"args": {"location": "New York, NY"},
"id": "call_6ghfKxV264jEfe1mRIkS3PE7",
},
]
openai >= 1.32 supports a
parallel_tool_calls parameter that defaults to True. This parameter can
be set to False to disable parallel tool calls:
ai_msg = model_with_tools.invoke(
"What is the weather in LA and NY?", parallel_tool_calls=False
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "call_4OoY0ZR99iEvC7fevsH8Uhtz",
}
]

Like other runtime parameters, parallel_tool_calls can be bound to a model
using model.bind(parallel_tool_calls=False) or during instantiation by
setting model_kwargs.
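A minimal sketch of both options (the model name is illustrative):

# Bind the setting so every invocation of this runnable disables parallel tool calls
sequential_tools_model = model_with_tools.bind(parallel_tool_calls=False)

# Or set it at instantiation time via model_kwargs
model = ChatOpenAI(
    model="gpt-4o",  # illustrative model name
    model_kwargs={"parallel_tool_calls": False},
)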
See bind_tools for more.
You can access built-in tools supported by the OpenAI Responses API. See LangChain docs for more detail.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="...", output_version="responses/v1")
tool = {"type": "web_search"}
model_with_tools = model.bind_tools([tool])
response = model_with_tools.invoke("What was a positive news story from today?")
response.content
[
{
"type": "text",
"text": "Today, a heartwarming story emerged from ...",
"annotations": [
{
"end_index": 778,
"start_index": 682,
"title": "Title of story",
"type": "url_citation",
"url": "<url of story>",
}
],
}
]
langchain-openai 0.3.26: Updated AIMessage format
langchain-openai >= 0.3.26
allows users to opt in to an updated AIMessage format when using the
Responses API. Setting ChatOpenAI(..., output_version="responses/v1") will
format output from reasoning summaries, built-in tool invocations, and other
response items into the message's content field, rather than
additional_kwargs. We recommend this format for new applications.
OpenAI's Responses API supports management of conversation state. Passing in response IDs from previous messages will continue a conversational thread.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
model="...",
use_responses_api=True,
output_version="responses/v1",
)
response = model.invoke("Hi, I'm Bob.")
response.text
"Hi Bob! How can I assist you today?"
second_response = model.invoke(
"What is my name?",
previous_response_id=response.response_metadata["id"],
)
second_response.text
"Your name is Bob. How can I help you today, Bob?"
langchain-openai 0.3.26
You can also initialize ChatOpenAI with use_previous_response_id.
Input messages up to the most recent response will then be dropped from request
payloads, and previous_response_id will be set using the ID of the most
recent response.
model = ChatOpenAI(model="...", use_previous_response_id=True)

Some OpenAI-compatible providers/proxies may not support forwarding
reasoning blocks in request history. If you see request-format
errors while using reasoning with the Responses API, prefer
use_previous_response_id=True (so the server keeps
conversation state).
OpenAI's Responses API supports reasoning models that expose a summary of internal reasoning processes.
from langchain_openai import ChatOpenAI
reasoning = {
"effort": "medium", # 'low', 'medium', or 'high'
"summary": "auto", # 'detailed', 'auto', or None
}
model = ChatOpenAI(
model="...", reasoning=reasoning, output_version="responses/v1"
)
response = model.invoke("What is 3^3?")
# Response text
print(f"Output: {response.text}")
# Reasoning summaries
for block in response.content:
if block["type"] == "reasoning":
for summary in block["summary"]:
print(summary["text"])
Output: 3³ = 27
Reasoning: The user wants to know...
langchain-openai 0.3.26: Updated AIMessage format
langchain-openai >= 0.3.26
allows users to opt in to an updated AIMessage format when using the
Responses API. Setting ChatOpenAI(..., output_version="responses/v1") will
format output from reasoning summaries, built-in tool invocations, and other
response items into the message's content field, rather than
additional_kwargs. We recommend this format for new applications.
When using a non-OpenAI endpoint via base_url, request handling for
reasoning history can differ. If agent loops fail after tool calls, use:
ChatOpenAI(..., use_responses_api=True, use_previous_response_id=True).
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: int | None = Field(
description="How funny the joke is, from 1 to 10"
)
structured_model = model.with_structured_output(Joke)
structured_model.invoke("Tell me a joke about cats")
Joke(
setup="Why was the cat sitting on the computer?",
punchline="To keep an eye on the mouse!",
rating=None,
)
See with_structured_output for more info.
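A minimal sketch of opting into OpenAI's native Structured Outputs via the json_schema method, reusing the Joke schema defined above; the method and strict keyword arguments here are assumptions about with_structured_output in your installed version:

structured_model = model.with_structured_output(
    Joke, method="json_schema", strict=True
)
structured_model.invoke("Tell me a joke about cats")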
json_model = model.bind(response_format={"type": "json_object"})
ai_msg = json_model.invoke(
"Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]"
)
ai_msg.content
'\n{\n "random_ints": [23, 87, 45, 12, 78, 34, 56, 90, 11, 67]\n}'

import base64
import httpx
from langchain.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = model.invoke([message])
ai_msg.content
"The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions."ai_msg = model.invoke(messages)
ai_msg.usage_metadata
{"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}
When streaming, set the stream_usage kwarg:
stream = model.stream(messages, stream_usage=True)
full = next(stream)
for chunk in stream:
full += chunk
full.usage_metadata
{"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}logprobs_model = model.bind(logprobs=True)
ai_msg = logprobs_model.invoke(messages)
ai_msg.response_metadata["logprobs"]
{
"content": [
{
"token": "J",
"bytes": [74],
"logprob": -4.9617593e-06,
"top_logprobs": [],
},
{
"token": "'adore",
"bytes": [39, 97, 100, 111, 114, 101],
"logprob": -0.25202933,
"top_logprobs": [],
},
{
"token": " la",
"bytes": [32, 108, 97],
"logprob": -0.20141791,
"top_logprobs": [],
},
{
"token": " programmation",
"bytes": [
32,
112,
114,
111,
103,
114,
97,
109,
109,
97,
116,
105,
111,
110,
],
"logprob": -1.9361265e-07,
"top_logprobs": [],
},
{
"token": ".",
"bytes": [46],
"logprob": -1.2233183e-05,
"top_logprobs": [],
},
]
}

ai_msg = model.invoke(messages)
ai_msg.response_metadata
{
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 28,
"total_tokens": 33,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_319be4768e",
"finish_reason": "stop",
"logprobs": None,
}

OpenAI offers a variety of service tiers. The "flex" tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enhancement, or jobs that can be run asynchronously.
To use it, initialize the model with service_tier="flex":
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="...", service_tier="flex")
Note that this is a beta feature that is only available for a subset of models. See OpenAI flex processing docs for more detail.
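Because flex requests can take longer to complete, you may also want a more generous timeout; the values below are illustrative, not recommendations:

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="...",
    service_tier="flex",
    timeout=900.0,  # illustrative: allow for slower flex responses
    max_retries=2,
)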
ChatOpenAI can be used with OpenAI-compatible APIs like
LM Studio, vLLM,
Ollama, and others.
To use custom parameters specific to these providers, use the extra_body parameter.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
base_url="http://localhost:1234/v1",
api_key="lm-studio", # Can be any string
model="mlx-community/QwQ-32B-4bit",
temperature=0,
extra_body={
"ttl": 300
}, # Auto-evict model after 5 minutes of inactivity
)

model = ChatOpenAI(
base_url="http://localhost:8000/v1",
api_key="EMPTY",
model="meta-llama/Llama-2-7b-chat-hf",
extra_body={"use_beam_search": True, "best_of": 4},
)

model_kwargs vs extra_body
Use the correct parameter for different types of API arguments:
Use model_kwargs for standard OpenAI API parameters that are not explicit class parameters, such as max_completion_tokens, stream_options, modalities, and audio:

# Standard OpenAI parameters
model = ChatOpenAI(
model="...",
model_kwargs={
"stream_options": {"include_usage": True},
"max_completion_tokens": 300,
"modalities": ["text", "audio"],
"audio": {"voice": "alloy", "format": "wav"},
},
)
Use extra_body for custom, provider-specific parameters (e.g., for vLLM or LM Studio) that the provider expects to receive via extra_body in the request:

# Custom provider parameters
model = ChatOpenAI(
base_url="http://localhost:8000/v1",
model="custom-model",
extra_body={
"use_beam_search": True, # vLLM parameter
"best_of": 4, # vLLM parameter
"ttl": 300, # LM Studio parameter
},
)
Key Differences:
model_kwargs: Parameters are merged into the top-level request payload.
extra_body: Parameters are nested under the extra_body key in the request.

Always use extra_body for custom parameters, not model_kwargs.
Using model_kwargs for non-OpenAI parameters will cause API errors.
For high-volume applications with repetitive prompts, use prompt_cache_key
per invocation to improve cache hit rates and reduce costs:
model = ChatOpenAI(model="...")
response = model.invoke(
messages,
prompt_cache_key="example-key-a", # Routes to same machine for cache hits
)
customer_response = model.invoke(messages, prompt_cache_key="example-key-b")
support_response = model.invoke(messages, prompt_cache_key="example-key-c")
# Dynamic cache keys based on context
cache_key = f"example-key-{dynamic_suffix}"
response = model.invoke(messages, prompt_cache_key=cache_key)
Cache keys help ensure requests with the same prompt prefix are routed to machines with existing cache, providing cost reduction and latency improvement on cached tokens.
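A minimal sketch of checking how many prompt tokens were served from cache; the usage-metadata key names below are assumptions and may vary by langchain-core version:

response = model.invoke(messages, prompt_cache_key="example-key-a")

# Key names are assumptions; inspect response.usage_metadata to confirm
details = (response.usage_metadata or {}).get("input_token_details", {})
print(details.get("cache_read", 0))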
Error raised when OpenAI Structured Outputs API returns a refusal.
When using OpenAI's Structured Outputs API with user-generated input, the model may occasionally refuse to fulfill the request for safety reasons.
See more on refusals.
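A minimal sketch of handling a refusal; the error class import path is an assumption and may differ across langchain-openai versions:

from pydantic import BaseModel
from langchain_openai import ChatOpenAI

# Import path is an assumption; check your installed langchain-openai version
from langchain_openai.chat_models.base import OpenAIRefusalError


class Answer(BaseModel):
    '''Structured answer to a user question.'''

    text: str


model = ChatOpenAI(model="gpt-4o")  # illustrative model name
structured_model = model.with_structured_output(Answer, method="json_schema")

user_supplied_text = "..."  # placeholder for untrusted user input
try:
    structured_model.invoke(user_supplied_text)
except OpenAIRefusalError as err:
    # The model declined to produce the requested structured output
    print(f"Refused: {err}")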