# ChatOpenAI

> **Class** in `langchain_openai`

📖 [View in docs](https://reference.langchain.com/python/langchain-openai/chat_models/base/ChatOpenAI)

Interface to OpenAI chat model APIs.

!!! warning "API scope"

    `ChatOpenAI` targets
    [official OpenAI API specifications](https://github.com/openai/openai-openapi)
    only. Non-standard response fields added by third-party providers (e.g.,
    `reasoning_content`, `reasoning_details`) are **not** extracted or
    preserved. If you are pointing `base_url` at a provider such as
    OpenRouter, vLLM, or DeepSeek, use the corresponding provider-specific
    LangChain package instead (e.g., `ChatDeepSeek`, `ChatOpenRouter`).

???+ info "Setup"

    Install `langchain-openai` and set environment variable `OPENAI_API_KEY`.

    ```bash
    pip install -U langchain-openai

    # or using uv
    uv add langchain-openai
    ```

    ```bash
    export OPENAI_API_KEY="your-api-key"
    ```

??? info "Key init args — completion params"

    | Param               | Type          | Description                                                                                                 |
    | ------------------- | ------------- | ----------------------------------------------------------------------------------------------------------- |
    | `model`             | `str`         | Name of OpenAI model to use.                                                                                |
    | `temperature`       | `float`       | Sampling temperature.                                                                                       |
    | `max_tokens`        | `int \| None`  | Max number of tokens to generate.                                                                            |
    | `logprobs`          | `bool \| None` | Whether to return logprobs.                                                                                  |
    | `stream_options`    | `dict`         | Configure streaming outputs, like whether to return token usage when streaming (`{"include_usage": True}`).  |
    | `use_responses_api` | `bool \| None` | Whether to use the Responses API.                                                                            |
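
    For example, a minimal sketch combining several of these params (the values are
    illustrative, not recommendations):

    ```python
    from langchain_openai import ChatOpenAI

    model = ChatOpenAI(
        model="...",
        temperature=0.2,
        max_tokens=256,
        logprobs=True,
        stream_options={"include_usage": True},  # include token usage when streaming
    )
    ```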

    See full list of supported init args and their descriptions below.

??? info "Key init args — client params"

    | Param          | Type                                       | Description                                                                         |
    | -------------- | ------------------------------------------ | ----------------------------------------------------------------------------------- |
    | `timeout`      | `float \| Tuple[float, float] \| Any \| None` | Timeout for requests.                                                                |
    | `max_retries`  | `int \| None`                                 | Max number of retries.                                                               |
    | `api_key`      | `str \| None`                                 | OpenAI API key. If not passed in, will be read from env var `OPENAI_API_KEY`.        |
    | `base_url`     | `str \| None`                                 | Base URL for API requests. Only specify if using a proxy or service emulator.        |
    | `organization` | `str \| None`                                 | OpenAI organization ID. If not passed in, will be read from env var `OPENAI_ORG_ID`. |

    See full list of supported init args and their descriptions below.

??? info "Instantiate"

    Create a model instance with desired params. For example:

    ```python
    from langchain_openai import ChatOpenAI

    model = ChatOpenAI(
        model="...",
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
        # api_key="...",
        # base_url="...",
        # organization="...",
        # other params...
    )
    ```

    See all available params below.

    !!! tip "Preserved params"
        Any param which is not explicitly supported will be passed directly to
        [`openai.OpenAI.chat.completions.create(...)`](https://platform.openai.com/docs/api-reference/chat/create)
        every time the model is invoked. For example:

        ```python
        from langchain_openai import ChatOpenAI
        import openai

        ChatOpenAI(..., frequency_penalty=0.2).invoke(...)

        # Results in underlying API call of:

        openai.OpenAI(...).chat.completions.create(..., frequency_penalty=0.2)

        # Which is also equivalent to:

        ChatOpenAI(...).invoke(..., frequency_penalty=0.2)
        ```

??? info "Invoke"

    Generate a response from the model:

    ```python
    messages = [
        (
            "system",
            "You are a helpful translator. Translate the user sentence to French.",
        ),
        ("human", "I love programming."),
    ]
    model.invoke(messages)
    ```

    Results in an `AIMessage` response:

    ```python
    AIMessage(
        content="J'adore la programmation.",
        response_metadata={
            "token_usage": {
                "completion_tokens": 5,
                "prompt_tokens": 31,
                "total_tokens": 36,
            },
            "model_name": "gpt-4o",
            "system_fingerprint": "fp_43dfabdef1",
            "finish_reason": "stop",
            "logprobs": None,
        },
        id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
        usage_metadata={"input_tokens": 31, "output_tokens": 5, "total_tokens": 36},
    )
    ```

??? info "Stream"

    Stream a response from the model:

    ```python
    for chunk in model.stream(messages):
        print(chunk.text, end="")
    ```

    Results in a sequence of `AIMessageChunk` objects with partial content:

    ```python
    AIMessageChunk(content="", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
    AIMessageChunk(content="J", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
    AIMessageChunk(content="'adore", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
    AIMessageChunk(content=" la", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
    AIMessageChunk(
        content=" programmation", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0"
    )
    AIMessageChunk(content=".", id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0")
    AIMessageChunk(
        content="",
        response_metadata={"finish_reason": "stop"},
        id="run-9e1517e3-12bf-48f2-bb1b-2e824f7cd7b0",
    )
    ```

    To collect the full message, you can concatenate the chunks:

    ```python
    stream = model.stream(messages)
    full = next(stream)
    for chunk in stream:
        full += chunk
    ```

    ```python
    full = AIMessageChunk(
        content="J'adore la programmation.",
        response_metadata={"finish_reason": "stop"},
        id="run-bf917526-7f58-4683-84f7-36a6b671d140",
    )
    ```

??? info "Async"

    Asynchronous equivalents of `invoke`, `stream`, and `batch` are also available:

    ```python
    # Invoke
    await model.ainvoke(messages)

    # Stream
    async for chunk in model.astream(messages):
        print(chunk.text, end="")

    # Batch
    await model.abatch([messages])
    ```

    Results in an `AIMessage` response:

    ```python
    AIMessage(
        content="J'adore la programmation.",
        response_metadata={
            "token_usage": {
                "completion_tokens": 5,
                "prompt_tokens": 31,
                "total_tokens": 36,
            },
            "model_name": "gpt-4o",
            "system_fingerprint": "fp_43dfabdef1",
            "finish_reason": "stop",
            "logprobs": None,
        },
        id="run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0",
        usage_metadata={
            "input_tokens": 31,
            "output_tokens": 5,
            "total_tokens": 36,
        },
    )
    ```

    For batched calls, results in a `list[AIMessage]`.

??? info "Tool calling"

    ```python
    from pydantic import BaseModel, Field

    class GetWeather(BaseModel):
        '''Get the current weather in a given location'''

        location: str = Field(
            ..., description="The city and state, e.g. San Francisco, CA"
        )

    class GetPopulation(BaseModel):
        '''Get the current population in a given location'''

        location: str = Field(
            ..., description="The city and state, e.g. San Francisco, CA"
        )

    model_with_tools = model.bind_tools(
        [GetWeather, GetPopulation],
        # strict=True,  # enforce that tool args respect the schema
    )
    ai_msg = model_with_tools.invoke(
        "Which city is hotter today and which is bigger: LA or NY?"
    )
    ai_msg.tool_calls
    ```

    ```python
    [
        {
            "name": "GetWeather",
            "args": {"location": "Los Angeles, CA"},
            "id": "call_6XswGD5Pqk8Tt5atYr7tfenU",
        },
        {
            "name": "GetWeather",
            "args": {"location": "New York, NY"},
            "id": "call_ZVL15vA8Y7kXqOy3dtmQgeCi",
        },
        {
            "name": "GetPopulation",
            "args": {"location": "Los Angeles, CA"},
            "id": "call_49CFW8zqC9W7mh7hbMLSIrXw",
        },
        {
            "name": "GetPopulation",
            "args": {"location": "New York, NY"},
            "id": "call_6ghfKxV264jEfe1mRIkS3PE7",
        },
    ]
    ```

    !!! note "Parallel tool calls"
        [`openai >= 1.32`](https://pypi.org/project/openai/) supports a
        `parallel_tool_calls` parameter that defaults to `True`. This parameter can
        be set to `False` to disable parallel tool calls:

        ```python
        ai_msg = model_with_tools.invoke(
            "What is the weather in LA and NY?", parallel_tool_calls=False
        )
        ai_msg.tool_calls
        ```

        ```python
        [
            {
                "name": "GetWeather",
                "args": {"location": "Los Angeles, CA"},
                "id": "call_4OoY0ZR99iEvC7fevsH8Uhtz",
            }
        ]
        ```

    Like other runtime parameters, `parallel_tool_calls` can be bound to a model
    using `model.bind(parallel_tool_calls=False)` or during instantiation by
    setting `model_kwargs`.
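
    A short sketch of both approaches (reusing `model_with_tools` from above):

    ```python
    # Disable parallel tool calls at runtime for a bound model:
    model_no_parallel = model_with_tools.bind(parallel_tool_calls=False)

    # Or disable at instantiation via model_kwargs:
    model = ChatOpenAI(
        model="...",
        model_kwargs={"parallel_tool_calls": False},
    )
    ```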

    See `bind_tools` for more.

??? info "Built-in (server-side) tools"

    You can access [built-in tools](https://platform.openai.com/docs/guides/tools?api-mode=responses)
    supported by the OpenAI Responses API. See [LangChain docs](https://docs.langchain.com/oss/python/integrations/chat/openai#responses-api)
    for more detail.

    ```python
    from langchain_openai import ChatOpenAI

    model = ChatOpenAI(model="...", output_version="responses/v1")

    tool = {"type": "web_search"}
    model_with_tools = model.bind_tools([tool])

    response = model_with_tools.invoke("What was a positive news story from today?")
    response.content
    ```

    ```python
    [
        {
            "type": "text",
            "text": "Today, a heartwarming story emerged from ...",
            "annotations": [
                {
                    "end_index": 778,
                    "start_index": 682,
                    "title": "Title of story",
                    "type": "url_citation",
                    "url": "<url of story>",
                }
            ],
        }
    ]
    ```

    !!! version-added "Added in `langchain-openai` 0.3.9"

    !!! version-added "Added in `langchain-openai` 0.3.26: Updated `AIMessage` format"
        [`langchain-openai >= 0.3.26`](https://pypi.org/project/langchain-openai/#history)
        allows users to opt-in to an updated `AIMessage` format when using the
        Responses API. Setting `ChatOpenAI(..., output_version="responses/v1")` will
        format output from reasoning summaries, built-in tool invocations, and other
        response items into the message's `content` field, rather than
        `additional_kwargs`. We recommend this format for new applications.

??? info "Managing conversation state"

    OpenAI's Responses API supports management of [conversation state](https://platform.openai.com/docs/guides/conversation-state?api-mode=responses).
    Passing in response IDs from previous messages will continue a conversational
    thread.

    ```python
    from langchain_openai import ChatOpenAI

    model = ChatOpenAI(
        model="...",
        use_responses_api=True,
        output_version="responses/v1",
    )
    response = model.invoke("Hi, I'm Bob.")
    response.text
    ```

    ```txt
    "Hi Bob! How can I assist you today?"
    ```

    ```python
    second_response = model.invoke(
        "What is my name?",
        previous_response_id=response.response_metadata["id"],
    )
    second_response.text
    ```

    ```txt
    "Your name is Bob. How can I help you today, Bob?"
    ```

    !!! version-added "Added in `langchain-openai` 0.3.9"

    !!! version-added "Added in `langchain-openai` 0.3.26"

        You can also initialize `ChatOpenAI` with `use_previous_response_id`.
        Input messages up to the most recent response will then be dropped from request
        payloads, and `previous_response_id` will be set using the ID of the most
        recent response.

        ```python
        model = ChatOpenAI(model="...", use_previous_response_id=True)
        ```

    !!! note "OpenAI-compatible endpoints"

        Some OpenAI-compatible providers/proxies may not support forwarding
        reasoning blocks in request history. If you see request-format
        errors while using reasoning + Responses API, prefer
        `use_previous_response_id=True` (so the server keeps
        conversation state).

??? info "Reasoning output"

    OpenAI's Responses API supports [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses)
    that expose a summary of internal reasoning processes.

    ```python
    from langchain_openai import ChatOpenAI

    reasoning = {
        "effort": "medium",  # 'low', 'medium', or 'high'
        "summary": "auto",  # 'detailed', 'auto', or None
    }

    model = ChatOpenAI(
        model="...", reasoning=reasoning, output_version="responses/v1"
    )
    response = model.invoke("What is 3^3?")

    # Response text
    print(f"Output: {response.text}")

    # Reasoning summaries
    for block in response.content:
        if block["type"] == "reasoning":
            for summary in block["summary"]:
                print(f"Reasoning: {summary['text']}")
    ```

    ```txt
    Output: 3³ = 27
    Reasoning: The user wants to know...
    ```

    !!! version-added "Added in `langchain-openai` 0.3.26: Updated `AIMessage` format"
        [`langchain-openai >= 0.3.26`](https://pypi.org/project/langchain-openai/#history)
        allows users to opt-in to an updated `AIMessage` format when using the
        Responses API. Setting `ChatOpenAI(..., output_version="responses/v1")` will
        format output from reasoning summaries, built-in tool invocations, and other
        response items into the message's `content` field, rather than
        `additional_kwargs`. We recommend this format for new applications.

    !!! note "Troubleshooting with non-OpenAI backends"
        When using a non-OpenAI endpoint via `base_url`, request handling for
        reasoning history can differ. If agent loops fail after tool calls, use:
        `ChatOpenAI(..., use_responses_api=True, use_previous_response_id=True)`.

??? info "Structured output"

    ```python
    from pydantic import BaseModel, Field

    class Joke(BaseModel):
        '''Joke to tell user.'''

        setup: str = Field(description="The setup of the joke")
        punchline: str = Field(description="The punchline to the joke")
        rating: int | None = Field(
            description="How funny the joke is, from 1 to 10"
        )

    structured_model = model.with_structured_output(Joke)
    structured_model.invoke("Tell me a joke about cats")
    ```

    ```python
    Joke(
        setup="Why was the cat sitting on the computer?",
        punchline="To keep an eye on the mouse!",
        rating=None,
    )
    ```
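
    Recent versions of `langchain-openai` also accept a `method` argument for
    selecting the underlying mechanism (e.g. native JSON schema instead of tool
    calling); a sketch, assuming your installed version supports it:

    ```python
    structured_model = model.with_structured_output(Joke, method="json_schema")
    structured_model.invoke("Tell me a joke about cats")
    ```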

    See `with_structured_output` for more info.

??? info "JSON mode"

    ```python
    json_model = model.bind(response_format={"type": "json_object"})
    ai_msg = json_model.invoke(
        "Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]"
    )
    ai_msg.content
    ```

    ```txt
    '\n{\n  "random_ints": [23, 87, 45, 12, 78, 34, 56, 90, 11, 67]\n}'
    ```
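
    The content is a JSON-formatted string, which can be parsed with the standard
    library:

    ```python
    import json

    json.loads(ai_msg.content)["random_ints"]
    ```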

??? info "Image input"

    ```python
    import base64
    import httpx
    from langchain.messages import HumanMessage

    image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
    message = HumanMessage(
        content=[
            {"type": "text", "text": "describe the weather in this image"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ]
    )

    ai_msg = model.invoke([message])
    ai_msg.content
    ```

    ```txt
    "The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions."
    ```
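
    Publicly accessible image URLs can also be passed directly, without
    base64-encoding the data:

    ```python
    message = HumanMessage(
        content=[
            {"type": "text", "text": "describe the weather in this image"},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    )
    model.invoke([message])
    ```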

??? info "Token usage"

    ```python
    ai_msg = model.invoke(messages)
    ai_msg.usage_metadata
    ```

    ```txt
    {"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}
    ```

    When streaming, set the `stream_usage` kwarg:

    ```python
    stream = model.stream(messages, stream_usage=True)
    full = next(stream)
    for chunk in stream:
        full += chunk
    full.usage_metadata
    ```

    ```txt
    {"input_tokens": 28, "output_tokens": 5, "total_tokens": 33}
    ```
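
    Alternatively, `stream_usage` can be set when instantiating the model so that
    every streamed response includes usage (a sketch, mirroring the per-call kwarg
    above):

    ```python
    model = ChatOpenAI(model="...", stream_usage=True)
    ```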

??? info "Logprobs"

    ```python
    logprobs_model = model.bind(logprobs=True)
    ai_msg = logprobs_model.invoke(messages)
    ai_msg.response_metadata["logprobs"]
    ```

    ```txt
    {
        "content": [
            {
                "token": "J",
                "bytes": [74],
                "logprob": -4.9617593e-06,
                "top_logprobs": [],
            },
            {
                "token": "'adore",
                "bytes": [39, 97, 100, 111, 114, 101],
                "logprob": -0.25202933,
                "top_logprobs": [],
            },
            {
                "token": " la",
                "bytes": [32, 108, 97],
                "logprob": -0.20141791,
                "top_logprobs": [],
            },
            {
                "token": " programmation",
                "bytes": [
                    32,
                    112,
                    114,
                    111,
                    103,
                    114,
                    97,
                    109,
                    109,
                    97,
                    116,
                    105,
                    111,
                    110,
                ],
                "logprob": -1.9361265e-07,
                "top_logprobs": [],
            },
            {
                "token": ".",
                "bytes": [46],
                "logprob": -1.2233183e-05,
                "top_logprobs": [],
            },
        ]
    }
    ```
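
    To also receive the most likely alternatives for each position, bind
    `top_logprobs` (a standard OpenAI API parameter) alongside `logprobs`; a
    sketch:

    ```python
    logprobs_model = model.bind(logprobs=True, top_logprobs=2)
    ```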

??? info "Response metadata"

    ```python
    ai_msg = model.invoke(messages)
    ai_msg.response_metadata
    ```

    ```txt
    {
        "token_usage": {
            "completion_tokens": 5,
            "prompt_tokens": 28,
            "total_tokens": 33,
        },
        "model_name": "gpt-4o",
        "system_fingerprint": "fp_319be4768e",
        "finish_reason": "stop",
        "logprobs": None,
    }
    ```

??? info "Flex processing"

    OpenAI offers a variety of [service tiers](https://platform.openai.com/docs/guides/flex-processing?api-mode=responses).
    The "flex" tier offers cheaper pricing for requests, with the trade-off that
    responses may take longer and resources might not always be available.
    This approach is best suited for non-critical tasks, including model testing,
    data enhancement, or jobs that can be run asynchronously.

    To use it, initialize the model with `service_tier="flex"`:

    ```python
    from langchain_openai import ChatOpenAI

    model = ChatOpenAI(model="...", service_tier="flex")
    ```

    Note that this is a beta feature that is only available for a subset of models.
    See OpenAI [flex processing docs](https://platform.openai.com/docs/guides/flex-processing?api-mode=responses)
    for more detail.
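
    Since flex requests may sit in a queue longer than standard-tier requests, it
    can help to raise the client timeout as well; a sketch (the value is
    illustrative):

    ```python
    model = ChatOpenAI(model="...", service_tier="flex", timeout=900.0)
    ```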

??? info "OpenAI-compatible APIs"

    `ChatOpenAI` can be used with OpenAI-compatible APIs like
    [LM Studio](https://lmstudio.ai/), [vLLM](https://github.com/vllm-project/vllm),
    [Ollama](https://ollama.com/), and others.

    To use custom parameters specific to these providers, use the `extra_body` parameter.

    !!! example "LM Studio example with TTL (auto-eviction)"

        ```python
        from langchain_openai import ChatOpenAI

        model = ChatOpenAI(
            base_url="http://localhost:1234/v1",
            api_key="lm-studio",  # Can be any string
            model="mlx-community/QwQ-32B-4bit",
            temperature=0,
            extra_body={
                "ttl": 300
            },  # Auto-evict model after 5 minutes of inactivity
        )
        ```

    !!! example "vLLM example with custom parameters"

        ```python
        model = ChatOpenAI(
            base_url="http://localhost:8000/v1",
            api_key="EMPTY",
            model="meta-llama/Llama-2-7b-chat-hf",
            extra_body={"use_beam_search": True, "best_of": 4},
        )
        ```

??? info "`model_kwargs` vs `extra_body`"

    Use the correct parameter for different types of API arguments:

    **Use `model_kwargs` for:**

    - Standard OpenAI API parameters not explicitly defined as class parameters
    - Parameters that should be flattened into the top-level request payload
    - Examples: `max_completion_tokens`, `stream_options`, `modalities`, `audio`

    ```python
    # Standard OpenAI parameters
    model = ChatOpenAI(
        model="...",
        model_kwargs={
            "stream_options": {"include_usage": True},
            "max_completion_tokens": 300,
            "modalities": ["text", "audio"],
            "audio": {"voice": "alloy", "format": "wav"},
        },
    )
    ```

    **Use `extra_body` for:**

    - Custom parameters specific to OpenAI-compatible providers (vLLM, LM Studio,
        OpenRouter, etc.)
    - Parameters that need to be nested under `extra_body` in the request
    - Any non-standard OpenAI API parameters

    ```python
    # Custom provider parameters
    model = ChatOpenAI(
        base_url="http://localhost:8000/v1",
        model="custom-model",
        extra_body={
            "use_beam_search": True,  # vLLM parameter
            "best_of": 4,  # vLLM parameter
            "ttl": 300,  # LM Studio parameter
        },
    )
    ```

    **Key Differences:**

    - `model_kwargs`: Parameters are **merged into top-level** request payload
    - `extra_body`: Parameters are **nested under `extra_body`** key in request

    !!! warning
        Always use `extra_body` for custom parameters, **not** `model_kwargs`.
        Using `model_kwargs` for non-OpenAI parameters will cause API errors.

??? info "Prompt caching optimization"

    For high-volume applications with repetitive prompts, use `prompt_cache_key`
    per invocation to improve cache hit rates and reduce costs:

    ```python
    model = ChatOpenAI(model="...")

    response = model.invoke(
        messages,
        prompt_cache_key="example-key-a",  # Routes to same machine for cache hits
    )

    customer_response = model.invoke(messages, prompt_cache_key="example-key-b")
    support_response = model.invoke(messages, prompt_cache_key="example-key-c")

    # Dynamic cache keys based on context
    cache_key = f"example-key-{dynamic_suffix}"
    response = model.invoke(messages, prompt_cache_key=cache_key)
    ```

    Cache keys help ensure that requests with the same prompt prefix are routed to
    machines with an existing cache, reducing both cost and latency for cached
    tokens.
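
    To check whether a call hit the cache, inspect the cached-token details in
    `usage_metadata` (a sketch; field availability may vary by model and version):

    ```python
    response = model.invoke(messages, prompt_cache_key="example-key-a")

    # Cached input tokens, if reported by the API:
    usage = response.usage_metadata or {}
    usage.get("input_token_details", {}).get("cache_read")
    ```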

## Signature

```python
ChatOpenAI()
```

## Extends

- `BaseChatOpenAI`

## Properties

- `max_tokens`
- `lc_secrets`
- `lc_attributes`

## Methods

- [`get_lc_namespace()`](https://reference.langchain.com/python/langchain-openai/chat_models/base/ChatOpenAI/get_lc_namespace)
- [`is_lc_serializable()`](https://reference.langchain.com/python/langchain-openai/chat_models/base/ChatOpenAI/is_lc_serializable)
- [`with_structured_output()`](https://reference.langchain.com/python/langchain-openai/chat_models/base/ChatOpenAI/with_structured_output)

---

[View source on GitHub](https://github.com/langchain-ai/langchain/blob/8fec4e7ceee2c368b068c49f9fed453276e210e7/libs/partners/openai/langchain_openai/chat_models/base.py#L2366)