Name of the deployed OpenAI model, e.g. 'gpt-4o', 'gpt-35-turbo', etc.
What sampling temperature to use.
Holds any model parameters valid for create call not explicitly specified.
Automatically inferred from env var AZURE_OPENAI_API_KEY if not provided.
Base URL path for API requests, leave blank if not using a proxy or service emulator.
OpenAI organization ID to use for API calls.
Timeout for requests to OpenAI completion API.
Whether to include usage metadata in streaming output. If enabled, an additional message chunk will be generated during the stream including usage metadata.
Maximum number of retries to make when generating.
Penalizes repeated tokens.
Penalizes repeated tokens according to frequency.
Seed for generation.
Whether to return logprobs.
Number of most likely tokens to return at each token position, each with an associated log probability.
Modify the likelihood of specified tokens appearing in the completion.
Whether to stream the results or not.
Number of chat completions to generate for each prompt.
Total probability mass of tokens to consider at each step.
Constrains effort on reasoning for reasoning models. For use with the Chat Completions API.
Reasoning parameters for reasoning models. For use with the Responses API.
Controls the verbosity level of responses for reasoning models. For use with the Responses API.
The model name to pass to tiktoken when using this class.
Optional httpx.Client.
Optional httpx.AsyncClient.
Default stop sequences.
Optional additional JSON properties to include in the request parameters when making requests to OpenAI-compatible APIs, such as vLLM.
Whether to include response headers in the output message response_metadata.
Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.
Configuration for
Additional fields to include in generations from Responses API.
Latency tier for request.
If True, OpenAI may store response data for future use.
Truncation strategy (Responses API).
If True, always pass previous_response_id using the ID of the most recent response.
Whether to use the Responses API instead of the Chat API.
Version of AIMessage output format to use.
Build extra kwargs from additional params that were passed in.
Validate temperature parameter for different models.
Validate that api key and python package exists in environment.
Get the tokens present in the text with the tiktoken package.
Interface to OpenAI chat model APIs.
ChatOpenAI targets
official OpenAI API specifications
only. Non-standard response fields added by third-party providers (e.g.,
reasoning_content, reasoning_details) are not extracted or
preserved. If you are pointing base_url at a provider such as
OpenRouter, vLLM, or DeepSeek, use the corresponding provider-specific
LangChain package instead (e.g., ChatDeepSeek, ChatOpenRouter).
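For example, a DeepSeek endpoint is better served by its dedicated integration. A minimal sketch, assuming the langchain-deepseek package is installed and DEEPSEEK_API_KEY is set:

```python
# Prefer the provider-specific package over pointing ChatOpenAI's base_url at DeepSeek.
from langchain_deepseek import ChatDeepSeek

model = ChatDeepSeek(model="deepseek-reasoner")  # example model name
response = model.invoke("What is 3^3?")
# Provider-specific reasoning fields are surfaced by the integration
# instead of being dropped.
```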
Install langchain-openai and set environment variable OPENAI_API_KEY.
pip install -U langchain-openai
# or using uv
uv add langchain-openai
export OPENAI_API_KEY="your-api-key"

| Param | Type | Description |
|---|---|---|
| model | str | Name of OpenAI model to use. |
| temperature | float | Sampling temperature. |
| max_tokens | `int \| None` | Max number of tokens to generate. |
| logprobs | `bool \| None` | Whether to return logprobs. |
| stream_options | dict | Configure streaming outputs, like whether to return token usage when streaming ({"include_usage": True}). |
| use_responses_api | `bool \| None` | Whether to use the Responses API instead of the Chat API. |
See full list of supported init args and their descriptions below.
| Param | Type | Description |
|---|---|---|
| timeout | `float \| Tuple[float, float] \| None` | Timeout for requests to OpenAI completion API. |
| max_retries | `int \| None` | Maximum number of retries to make when generating. |
| api_key | `str \| None` | OpenAI API key. Automatically inferred from env var OPENAI_API_KEY if not provided. |
| base_url | `str \| None` | Base URL for API requests. Only specify if using a proxy or service emulator. |
| organization | `str \| None` | OpenAI organization ID to use for API calls. |
See full list of supported init args and their descriptions below.
Create a model instance with desired params. For example:
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
model="...",
temperature=0,
max_tokens=None,
timeout=None,
max_retries=2,
# api_key="...",
# base_url="...",
# organization="...",
# other params...
)
See all available params below.
Any param which is not explicitly supported will be passed directly to
openai.OpenAI.chat.completions.create(...)
every time the model is invoked. For example:
from langchain_openai import ChatOpenAI
import openai
ChatOpenAI(..., frequency_penalty=0.2).invoke(...)
# Results in underlying API call of:
openai.OpenAI(..).chat.completions.create(..., frequency_penalty=0.2)
# Which is also equivalent to:
ChatOpenAI(...).invoke(..., frequency_penalty=0.2)

Generate a response from the model:
messages = [
(
"system",
"You are a helpful translator. Translate the user sentence to French.",
),
("human", "I love programming."),
]
model.invoke(messages)
Results in an AIMessage response:
AIMessage(
content="J'adore la programmation.",
response_metadata={
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 31,
"total_tokens": 36,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_43dfabdef1",
"finish_reason": "stop",
"logprobs": None,
},
id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
usage_metadata={"input_tokens": 31, "output_tokens": 5, "total_tokens": 36},
)

Stream a response from the model:
for chunk in model.stream(messages):
print(chunk.text, end="")
Results in a sequence of AIMessageChunk objects with partial content:
AIMessageChunk(content="", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content="J", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content="'adore", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(content=" la", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(
content=" programmation", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0"
)
AIMessageChunk(content=".", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
AIMessageChunk(
content="",
response_metadata={"finish_reason": "stop"},
id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0",
)
To collect the full message, you can concatenate the chunks:
stream = model.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full = AIMessageChunk(
content="J'adore la programmation.",
response_metadata={"finish_reason": "stop"},
id="run-bf917526-7f58-4683-84f7-36a6b671d140",
)

Asynchronous equivalents of invoke, stream, and batch are also available:
# Invoke
await model.ainvoke(messages)
# Stream
async for chunk in model.astream(messages):
    print(chunk.text, end="")
# Batch
await model.abatch([messages])
Results in an AIMessage response:
AIMessage(
content="J'adore la programmation.",
response_metadata={
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 31,
"total_tokens": 36,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_43dfabdef1",
"finish_reason": "stop",
"logprobs": None,
},
id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
usage_metadata={
"input_tokens": 31,
"output_tokens": 5,
"total_tokens": 36,
},
)
For batched calls, results in a list[AIMessage].
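The synchronous batch method behaves the same way. A minimal sketch:

```python
# Batch over multiple message lists; returns list[AIMessage].
results = model.batch([messages, messages])
for msg in results:
    print(msg.text)
```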
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
model_with_tools = model.bind_tools(
[GetWeather, GetPopulation]
# strict = True # Enforce tool args schema is respected
)
ai_msg = model_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "call_6XswGD5Pqk8Tt5atYr7tfenU",
},
{
"name": "GetWeather",
"args": {"location": "New York, NY"},
"id": "call_ZVL15vA8Y7kXqOy3dtmQgeCi",
},
{
"name": "GetPopulation",
"args": {"location": "Los Angeles, CA"},
"id": "call_49CFW8zqC9W7mh7hbMLSIrXw",
},
{
"name": "GetPopulation",
"args": {"location": "New York, NY"},
"id": "call_6ghfKxV264jEfe1mRIkS3PE7",
},
]
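To complete the tool-calling loop, tool results can be passed back to the model as tool messages. A minimal sketch; the `lookup_weather` helper and its return value are hypothetical stand-ins for a real implementation:

```python
from langchain_core.messages import ToolMessage

messages = [("human", "What is the weather in LA?")]
ai_msg = model_with_tools.invoke(messages)

tool_results = []
for tool_call in ai_msg.tool_calls:
    result = lookup_weather(**tool_call["args"])  # hypothetical helper
    tool_results.append(
        ToolMessage(content=str(result), tool_call_id=tool_call["id"])
    )

# Send the original messages, the AI tool-call message, and the tool results back.
final = model_with_tools.invoke(messages + [ai_msg] + tool_results)
```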
openai >= 1.32 supports a
parallel_tool_calls parameter that defaults to True. This parameter can
be set to False to disable parallel tool calls:
ai_msg = model_with_tools.invoke(
"What is the weather in LA and NY?", parallel_tool_calls=False
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "call_4OoY0ZR99iEvC7fevsH8Uhtz",
}
]

Like other runtime parameters, parallel_tool_calls can be bound to a model
using model.bind(parallel_tool_calls=False) or during instantiation by
setting model_kwargs.
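A minimal sketch of both approaches; the model name is an example:

```python
# Bind at runtime on an existing tool-calling model...
model_no_parallel = model_with_tools.bind(parallel_tool_calls=False)

# ...or set it at instantiation via model_kwargs (tools are still bound separately).
model = ChatOpenAI(
    model="gpt-4o",  # example model name
    model_kwargs={"parallel_tool_calls": False},
).bind_tools([GetWeather, GetPopulation])
```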
See bind_tools for more.
You can access built-in tools supported by the OpenAI Responses API. See LangChain docs for more detail.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="...", output_version="responses/v1")
tool = {"type": "web_search"}
model_with_tools = model.bind_tools([tool])
response = model_with_tools.invoke("What was a positive news story from today?")
response.content
[
{
"type": "text",
"text": "Today, a heartwarming story emerged from ...",
"annotations": [
{
"end_index": 778,
"start_index": 682,
"title": "Title of story",
"type": "url_citation",
"url": "<url of story>",
}
],
}
]
langchain-openai 0.3.26: Updated AIMessage format

langchain-openai >= 0.3.26
allows users to opt in to an updated AIMessage format when using the
Responses API. Setting ChatOpenAI(..., output_version="responses/v1") will
format output from reasoning summaries, built-in tool invocations, and other
response items into the message's content field, rather than
additional_kwargs. We recommend this format for new applications.
OpenAI's Responses API supports management of conversation state. Passing in response IDs from previous messages will continue a conversational thread.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
model="...",
use_responses_api=True,
output_version="responses/v1",
)
response = model.invoke("Hi, I'm Bob.")
response.text
"Hi Bob! How can I assist you today?"
second_response = model.invoke(
"What is my name?",
previous_response_id=response.response_metadata["id"],
)
second_response.text
"Your name is Bob. How can I help you today, Bob?"
langchain-openai 0.3.26

You can also initialize ChatOpenAI with use_previous_response_id.
Input messages up to the most recent response will then be dropped from request
payloads, and previous_response_id will be set using the ID of the most
recent response.
model = ChatOpenAI(model="...", use_previous_response_id=True)

Some OpenAI-compatible providers/proxies may not support forwarding
reasoning blocks in request history. If you see request-format
errors while using reasoning + Responses API, prefer
use_previous_response_id=True (so the server keeps
conversation state).
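A minimal sketch of multi-turn use with use_previous_response_id; the model name is an example, and the integration fills in previous_response_id automatically:

```python
model = ChatOpenAI(
    model="gpt-4o-mini",  # example model name
    use_responses_api=True,
    use_previous_response_id=True,
)

first = model.invoke([("human", "Hi, I'm Bob.")])
# Messages up to the most recent response are dropped from the request payload;
# previous_response_id is set to the ID of `first` automatically.
second = model.invoke(
    [("human", "Hi, I'm Bob."), first, ("human", "What is my name?")]
)
```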
OpenAI's Responses API supports reasoning models that expose a summary of internal reasoning processes.
from langchain_openai import ChatOpenAI
reasoning = {
"effort": "medium", # 'low', 'medium', or 'high'
"summary": "auto", # 'detailed', 'auto', or None
}
model = ChatOpenAI(
model="...", reasoning=reasoning, output_version="responses/v1"
)
response = model.invoke("What is 3^3?")
# Response text
print(f"Output: {response.text}")
# Reasoning summaries
for block in response.content:
if block["type"] == "reasoning":
for summary in block["summary"]:
print(summary["text"])
Output: 3³ = 27
Reasoning: The user wants to know...
langchain-openai 0.3.26: Updated AIMessage format

langchain-openai >= 0.3.26
allows users to opt in to an updated AIMessage format when using the
Responses API. Setting ChatOpenAI(..., output_version="responses/v1") will
format output from reasoning summaries, built-in tool invocations, and other
response items into the message's content field, rather than
additional_kwargs. We recommend this format for new applications.
When using a non-OpenAI endpoint via base_url, request handling for
reasoning history can differ. If agent loops fail after tool calls, use:
ChatOpenAI(..., use_responses_api=True, use_previous_response_id=True).
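For example, a minimal sketch; the base URL and model name are placeholders for a Responses-compatible endpoint:

```python
model = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # example endpoint
    api_key="EMPTY",                      # many local servers accept any string
    model="custom-model",                 # example model name
    use_responses_api=True,
    use_previous_response_id=True,
)
```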
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: int | None = Field(
description="How funny the joke is, from 1 to 10"
)
structured_model = model.with_structured_output(Joke)
structured_model.invoke("Tell me a joke about cats")
Joke(
setup="Why was the cat sitting on the computer?",
punchline="To keep an eye on the mouse!",
rating=None,
)
See with_structured_output for more info.
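with_structured_output also accepts a method parameter. A minimal sketch using OpenAI's native Structured Outputs via JSON schema, reusing the Joke model above:

```python
# Uses OpenAI's Structured Outputs (strict JSON schema) under the hood.
structured_model = model.with_structured_output(Joke, method="json_schema")
structured_model.invoke("Tell me a joke about cats")
```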
json_model = model.bind(response_format={"type": "json_object"})
ai_msg = json_model.invoke(
"Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]"
)
ai_msg.content
'\\n{\\n "random_ints": [23, 87, 45, 12, 78, 34, 56, 90, 11, 67]\\n}'

import base64
import httpx
from langchain.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = model.invoke([message])
ai_msg.content
"The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions."ai_msg = model.invoke(messages)
ai_msg.usage_metadata
{"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}
When streaming, set the stream_usage kwarg:
stream = model.stream(messages, stream_usage=True)
full = next(stream)
for chunk in stream:
full += chunk
full.usage_metadata
{"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}logprobs_model = model.bind(logprobs=True)
ai_msg = logprobs_model.invoke(messages)
ai_msg.response_metadata["logprobs"]
{
"content": [
{
"token": "J",
"bytes": [74],
"logprob": -4.9617593e-06,
"top_logprobs": [],
},
{
"token": "'adore",
"bytes": [39, 97, 100, 111, 114, 101],
"logprob": -0.25202933,
"top_logprobs": [],
},
{
"token": " la",
"bytes": [32, 108, 97],
"logprob": -0.20141791,
"top_logprobs": [],
},
{
"token": " programmation",
"bytes": [
32,
112,
114,
111,
103,
114,
97,
109,
109,
97,
116,
105,
111,
110,
],
"logprob": -1.9361265e-07,
"top_logprobs": [],
},
{
"token": ".",
"bytes": [46],
"logprob": -1.2233183e-05,
"top_logprobs": [],
},
]
}

ai_msg = model.invoke(messages)
ai_msg.response_metadata
{
"token_usage": {
"completion_tokens": 5,
"prompt_tokens": 28,
"total_tokens": 33,
},
"model_name": "gpt-4o",
"system_fingerprint": "fp_319be4768e",
"finish_reason": "stop",
"logprobs": None,
}

OpenAI offers a variety of service tiers. The "flex" tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enhancement, or jobs that can be run asynchronously.
To use it, initialize the model with service_tier="flex":
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="...", service_tier="flex")
Note that this is a beta feature that is only available for a subset of models. See OpenAI flex processing docs for more detail.
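Because flex requests can take longer to complete, it may help to raise the request timeout. A sketch; the model name and timeout value are illustrative:

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="o4-mini",        # example of a flex-eligible model
    service_tier="flex",
    timeout=900.0,          # illustrative: allow longer responses
)
```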
ChatOpenAI can be used with OpenAI-compatible APIs like
LM Studio, vLLM,
Ollama, and others.
To use custom parameters specific to these providers, use the extra_body parameter.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
base_url="http://localhost:1234/v1",
api_key="lm-studio", # Can be any string
model="mlx-community/QwQ-32B-4bit",
temperature=0,
extra_body={
"ttl": 300
}, # Auto-evict model after 5 minutes of inactivity
)

model = ChatOpenAI(
base_url="http://localhost:8000/v1",
api_key="EMPTY",
model="meta-llama/Llama-2-7b-chat-hf",
extra_body={"use_beam_search": True, "best_of": 4},
)

model_kwargs vs extra_body

Use the correct parameter for different types of API arguments:
Use model_kwargs for:
Standard OpenAI API parameters, e.g. max_completion_tokens, stream_options, modalities, audio

# Standard OpenAI parameters
model = ChatOpenAI(
model="...",
model_kwargs={
"stream_options": {"include_usage": True},
"max_completion_tokens": 300,
"modalities": ["text", "audio"],
"audio": {"voice": "alloy", "format": "wav"},
},
)
Use extra_body for:
Custom provider-specific parameters that must be nested under extra_body in the request

# Custom provider parameters
model = ChatOpenAI(
base_url="http://localhost:8000/v1",
model="custom-model",
extra_body={
"use_beam_search": True, # vLLM parameter
"best_of": 4, # vLLM parameter
"ttl": 300, # LM Studio parameter
},
)
Key Differences:
model_kwargs: Parameters are merged into the top-level request payload.
extra_body: Parameters are nested under the extra_body key in the request.

Always use extra_body for custom parameters, not model_kwargs.
Using model_kwargs for non-OpenAI parameters will cause API errors.
For high-volume applications with repetitive prompts, use prompt_cache_key
per-invocation to improve cache hit rates and reduce costs:
model = ChatOpenAI(model="...")
response = model.invoke(
messages,
prompt_cache_key="example-key-a", # Routes to same machine for cache hits
)
customer_response = model.invoke(messages, prompt_cache_key="example-key-b")
support_response = model.invoke(messages, prompt_cache_key="example-key-c")
# Dynamic cache keys based on context
cache_key = f"example-key-{dynamic_suffix}"
response = model.invoke(messages, prompt_cache_key=cache_key)
Cache keys help ensure requests with the same prompt prefix are routed to machines with existing cache, providing cost reduction and latency improvement on cached tokens.
Maximum number of tokens to generate.
Mapping of secret environment variables.
Get the attributes of the LangChain object.
Get the namespace of the LangChain object.
Return whether this model can be serialized by LangChain.
Model wrapper that returns outputs formatted to match the given schema.