IBM watsonx.ai chat models integration.
To use, install the langchain_ibm Python package and set the WATSONX_API_KEY environment variable to your API key, or pass the key to the constructor as the api_key named parameter.
pip install -U langchain-ibm
# or using uv
uv add langchain-ibm
export WATSONX_API_KEY="your-api-key"
The apikey parameter and the WATSONX_APIKEY environment variable are deprecated and will be removed in version 2.0.0. Use api_key and WATSONX_API_KEY instead.
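A minimal sketch of the two supported ways to provide the key (the values below are placeholders):
import os

from langchain_ibm import ChatWatsonx

# Either set the environment variable (replaces the deprecated WATSONX_APIKEY)...
os.environ["WATSONX_API_KEY"] = "your-api-key"

# ...or pass the key explicitly to the constructor (replaces the deprecated apikey parameter).
model = ChatWatsonx(
    model_id="meta-llama/llama-3-3-70b-instruct",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="*****",
    api_key="your-api-key",
)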
Create a model instance with the desired parameters. For example:
from langchain_ibm import ChatWatsonx
from ibm_watsonx_ai.foundation_models.schema import TextChatParameters
parameters = TextChatParameters(
top_p=1, temperature=0.5, max_completion_tokens=None
)
model = ChatWatsonx(
model_id="meta-llama/llama-3-3-70b-instruct",
url="https://us-south.ml.cloud.ibm.com",
project_id="*****",
params=parameters,
# api_key="*****"
)

Generate a response from the model:
messages = [
(
"system",
"You are a helpful translator. Translate the user sentence to French.",
),
("human", "I love programming."),
]
model.invoke(messages)
Results in an AIMessage response:
AIMessage(
content="J'adore programmer.",
additional_kwargs={},
response_metadata={
"token_usage": {
"completion_tokens": 7,
"prompt_tokens": 30,
"total_tokens": 37,
},
"model_name": "ibm/granite-3-3-8b-instruct",
"system_fingerprint": "",
"finish_reason": "stop",
},
id="chatcmpl-529352c4-93ba-4801-8f1d-a3b4e3935194---daed91fb74d0405f200db1e63da9a48a---7a3ef799-4413-47e4-b24c-85d267e37fa2",
usage_metadata={"input_tokens": 30, "output_tokens": 7, "total_tokens": 37},
)

Stream a response from the model:
for chunk in model.stream(messages):
print(chunk.text)
Results in a sequence of AIMessageChunk objects with partial content:
AIMessageChunk(content="", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775")
AIMessageChunk(content="J", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775")
AIMessageChunk(content="'", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775")
AIMessageChunk(content="ad", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775")
AIMessageChunk(content="or", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775")
AIMessageChunk(
content=" programmer", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775"
)
AIMessageChunk(content=".", id="run--e48a38d3-1500-4b5e-870c-6313e8cff775")
AIMessageChunk(
content="",
response_metadata={
"finish_reason": "stop",
"model_name": "ibm/granite-3-3-8b-instruct",
},
id="run--e48a38d3-1500-4b5e-870c-6313e8cff775",
)
AIMessageChunk(
content="",
id="run--e48a38d3-1500-4b5e-870c-6313e8cff775",
usage_metadata={"input_tokens": 30, "output_tokens": 7, "total_tokens": 37},
)
To collect the full message, you can concatenate the chunks:
stream = model.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full
AIMessageChunk(
content="J'adore programmer.",
response_metadata={
"finish_reason": "stop",
"model_name": "ibm/granite-3-3-8b-instruct",
},
id="chatcmpl-88a48b71-c149-4a0c-9c02-d6b97ca5dc6c---b7ba15879a8c5283b1e8a3b8db0229f0---0037ca4f-8a74-4f84-a46c-ab3fd1294f24",
usage_metadata={"input_tokens": 30, "output_tokens": 7, "total_tokens": 37},
)

Asynchronous equivalents of invoke, stream, and batch are also available:
# Invoke
await model.ainvoke(messages)
# Stream
async for chunk in model.astream(messages):
print(chunk.text)
# Batch
await model.abatch([messages])
Results in an AIMessage response:
AIMessage(
content="J'adore programmer.",
additional_kwargs={},
response_metadata={
"token_usage": {
"completion_tokens": 7,
"prompt_tokens": 30,
"total_tokens": 37,
},
"model_name": "ibm/granite-3-3-8b-instruct",
"system_fingerprint": "",
"finish_reason": "stop",
},
id="chatcmpl-5bef2d81-ef56-463b-a8fa-c2bcc2a3c348---821e7750d18925f2b36226db66667e26---6396c786-9da9-4468-883e-11ed90a05937",
usage_metadata={"input_tokens": 30, "output_tokens": 7, "total_tokens": 37},
)
Batched calls return a list[AIMessage].
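The synchronous batch method works the same way; a minimal sketch reusing the messages defined above:
responses = model.batch([messages, messages])
for response in responses:
    print(response.content)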

To use tool calling, define schemas for the tools and bind them to the model:

from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
model_with_tools = model.bind_tools(
[GetWeather, GetPopulation]
# strict = True # Enforce tool args schema is respected
)
ai_msg = model_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "chatcmpl-tool-59632abcee8f48a18a5f3a81422b917b",
"type": "tool_call",
},
{
"name": "GetWeather",
"args": {"location": "New York, NY"},
"id": "chatcmpl-tool-c6f3b033b4594918bb53f656525b0979",
"type": "tool_call",
},
{
"name": "GetPopulation",
"args": {"location": "Los Angeles, CA"},
"id": "chatcmpl-tool-175a23281e4747ea81cbe472b8e47012",
"type": "tool_call",
},
{
"name": "GetPopulation",
"args": {"location": "New York, NY"},
"id": "chatcmpl-tool-e1ccc534835945aebab708eb5e685bf7",
"type": "tool_call",
},
]
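To continue the conversation, you can execute the tools yourself and pass the results back as ToolMessage objects; a minimal sketch in which the two tool implementations are stand-ins written only for illustration:
from langchain_core.messages import ToolMessage

def get_weather(location: str) -> str:
    # Stand-in implementation used only for this example.
    return f"It is 75°F and sunny in {location}."

def get_population(location: str) -> str:
    # Stand-in implementation used only for this example.
    return f"The population of {location} is several million people."

tool_registry = {"GetWeather": get_weather, "GetPopulation": get_population}

conversation = [
    ("human", "Which city is hotter today and which is bigger: LA or NY?"),
    ai_msg,  # the AIMessage containing the tool_calls shown above
]
for tool_call in ai_msg.tool_calls:
    result = tool_registry[tool_call["name"]](**tool_call["args"])
    conversation.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))

final_response = model_with_tools.invoke(conversation)
print(final_response.content)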

To generate reasoning summaries with a reasoning-capable model, enable the reasoning parameters:

from langchain_ibm import ChatWatsonx
from ibm_watsonx_ai.foundation_models.schema import TextChatParameters
parameters = TextChatParameters(
include_reasoning=True, reasoning_effort="medium"
)
model = ChatWatsonx(
model_id="openai/gpt-oss-120b",
url="https://us-south.ml.cloud.ibm.com",
project_id="*****",
params=parameters,
# api_key="*****"
)
response = model.invoke("What is 3^3?")
# Response text
print(f"Output: {response.content}")
# Reasoning summaries
print(f"Reasoning: {response.additional_kwargs['reasoning_content']}")
Output: 3^3 = 27
Reasoning: The user asks "What is 3^3?" That's 27. Provide answer.
AIMessage format: langchain-ibm >= 0.3.19 allows users to set reasoning output parameters and formats the reasoning summaries into the additional_kwargs field of the returned AIMessage.

To return output that matches a given schema, use with_structured_output:

from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: int | None = Field(description="How funny the joke is, 1 to 10")
structured_model = model.with_structured_output(Joke)
structured_model.invoke("Tell me a joke about cats")
Joke(
setup="Why was the cat sitting on the computer?",
punchline="To keep an eye on the mouse!",
rating=None,
)
See with_structured_output for more info.
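If you also need the raw model message alongside the parsed object, the base with_structured_output API accepts include_raw=True; a minimal sketch assuming ChatWatsonx follows that standard behavior:
structured_model_raw = model.with_structured_output(Joke, include_raw=True)
result = structured_model_raw.invoke("Tell me a joke about cats")
result["raw"]            # the underlying AIMessage, including tool-call metadata
result["parsed"]         # the parsed Joke instance, or None if parsing failed
result["parsing_error"]  # the exception raised during parsing, if any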

To get raw JSON output without binding a schema, use JSON mode:

json_model = model.bind(response_format={"type": "json_object"})
ai_msg = json_model.invoke(
"Return JSON with 'random_ints': an array of 10 random integers from 0-99."
)
ai_msg.content
'{\n "random_ints": [12, 34, 56, 78, 10, 22, 44, 66, 88, 99]\n}'import base64
import httpx
from langchain.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = model.invoke([message])
ai_msg.content
"The weather in the image presents a clear, sunny day with good visibility
and no immediate signs of rain or strong winds. The vibrant blue sky with
scattered white clouds gives the impression of a tranquil, pleasant day
conducive to outdoor activities."

Token usage for a call is available on the response's usage_metadata attribute:

ai_msg = model.invoke(messages)
ai_msg.usage_metadata
{'input_tokens': 30, 'output_tokens': 7, 'total_tokens': 37}

When streaming, the aggregated message carries the same usage_metadata:

stream = model.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full.usage_metadata
{'input_tokens': 30, 'output_tokens': 7, 'total_tokens': 37}

To return log probabilities of the output tokens, bind logprobs=True:

logprobs_model = model.bind(logprobs=True)
ai_msg = logprobs_model.invoke(messages)
ai_msg.response_metadata["logprobs"]
{
'content': [
{
'token': 'J',
'logprob': -0.0017940393
},
{
'token': "'",
'logprob': -1.7523613e-05
},
{
'token': 'ad',
'logprob': -0.16112353
},
{
'token': 'ore',
'logprob': -0.0003091811
},
{
'token': ' programmer',
'logprob': -0.24849245
},
{
'token': '.',
'logprob': -2.5033638e-05
},
{
'token': '<|end_of_text|>',
'logprob': -7.080781e-05
}
]
}

Additional generation details are available in the response's response_metadata:

ai_msg = model.invoke(messages)
ai_msg.response_metadata
{
'token_usage': {
'completion_tokens': 7,
'prompt_tokens': 30,
'total_tokens': 37
},
'model_name': 'ibm/granite-3-3-8b-instruct',
'system_fingerprint': '',
'finish_reason': 'stop'
}

The constructor accepts the following parameters:

model_id: Type of model to use.
model: Name or alias of the foundation model to use. When using IBM's watsonx.ai Model Gateway (public preview), you can specify any supported third-party model (OpenAI, Anthropic, NVIDIA, Cerebras, or IBM's own Granite series) via a single, OpenAI-compatible interface. Models must be explicitly provisioned (opt-in) through the Gateway to ensure secure, vendor-agnostic access and easy switch-over without reconfiguration. For more details on configuration and usage, see the IBM watsonx Model Gateway docs.
deployment_id: Type of deployed model to use.
project_id: ID of the Watson Studio project.
space_id: ID of the Watson Studio space.
url: URL to the Watson Machine Learning or CPD instance.
api_key: API key to the Watson Machine Learning or CPD instance.
token: Token to the CPD instance.
password: Password to the CPD instance.
username: Username to the CPD instance.
instance_id: Instance ID of the CPD instance.
version: Version of the CPD instance.
params: Model parameters to use during request generation; see the example sketch at the end of this reference.
A ValueError is raised if the same chat generation parameter is provided both within the params attribute and as a keyword argument.

frequency_penalty: Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
logprobs: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token in the content of the message.
top_logprobs: An integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. The option logprobs must be set to true if this parameter is used.
max_tokens: The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. Deprecated in favor of the max_completion_tokens parameter.
max_completion_tokens: The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
n: How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
presence_penalty: Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
temperature: What sampling temperature to use. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
response_format: The chat response format parameters.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
time_limit: Time limit in milliseconds; if generation is not completed within this time, it will stop.
logit_bias: Increasing or decreasing the probability of tokens being selected during generation.
seed: Random number generator seed to use in sampling mode for experimental repeatability.
stop: Stop sequences are one or more strings which will cause the text generation to stop if/when they are produced as part of the output.
chat_template_kwargs: Additional chat template parameters.
reasoning_effort: A lower reasoning effort can result in faster responses, fewer tokens used, and shorter reasoning_content in the responses. Supported values are low, medium, and high.
include_reasoning: Whether to include reasoning_content in the response.
repetition_penalty: Represents the penalty for penalizing tokens that have already been generated or belong to the context.
length_penalty: Exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log likelihood of the sequence (i.e. negative), length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences.
verify: You can pass one of the following as verify: the path to a CA_BUNDLE file, the path of a directory with certificates of trusted CAs, True (the default path to the truststore will be used), or False (no verification will be made).
validate_model: Whether to validate the model ID.
streaming: Whether to stream the results or not.
The class also exposes the following attributes and methods:

lc_secrets: Mapping of secret environment variables.
is_lc_serializable: Whether this class is serializable by LangChain.
validate_environment: Validates that the credentials and the required Python package exist in the environment.
bind_tools: Bind tool-like objects to this chat model.
with_structured_output: Model wrapper that returns outputs formatted to match the given schema.
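As a closing sketch of how several of the generation parameters above fit together (the values are arbitrary and the parameter set is assumed from the reference above; parameters can be grouped in a TextChatParameters object passed as params, or passed directly as keyword arguments, but supplying the same parameter both ways raises a ValueError):
from langchain_ibm import ChatWatsonx
from ibm_watsonx_ai.foundation_models.schema import TextChatParameters

# Group generation parameters in a TextChatParameters object...
parameters = TextChatParameters(
    temperature=0.2,            # lower temperature for more deterministic output
    top_p=0.9,                  # nucleus sampling; tune this or temperature, not both
    max_completion_tokens=256,  # cap on generated tokens
    frequency_penalty=0.5,      # discourage verbatim repetition
    seed=42,                    # repeatable sampling
    stop=["\n\n"],              # stop generation at a blank line
    time_limit=10000,           # stop generation after 10 seconds (milliseconds)
)
model = ChatWatsonx(
    model_id="meta-llama/llama-3-3-70b-instruct",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="*****",
    params=parameters,
)

# ...or pass individual parameters directly as keyword arguments instead.
model = ChatWatsonx(
    model_id="meta-llama/llama-3-3-70b-instruct",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="*****",
    temperature=0.2,
    max_completion_tokens=256,
)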