langchain-cerebras
langchain_cerebras
ChatCerebras
Bases: BaseChatOpenAI
ChatCerebras chat model.
Setup
Install langchain-cerebras and set environment variable CEREBRAS_API_KEY.
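For example (the key value is a placeholder):

pip install -U langchain-cerebras
export CEREBRAS_API_KEY="your-api-key"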
Key init args — completion params:
model: str
Name of model to use.
temperature: Optional[float]
Sampling temperature.
max_tokens: Optional[int]
Max number of tokens to generate.
reasoning_effort: Optional[Literal["low", "medium", "high"]]
Level of reasoning effort for the gpt-oss-120b model.
disable_reasoning: Optional[bool]
Whether to disable reasoning for the zai-glm-4.6 model.
Key init args — client params:
timeout: Union[float, Tuple[float, float], Any, None]
Timeout for requests.
max_retries: Optional[int]
Max number of retries.
api_key: Optional[str]
Cerebras API key. If not passed in will be read from env var CEREBRAS_API_KEY.
Instantiate
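A minimal instantiation sketch using the init args documented above (the model name is illustrative):

from langchain_cerebras import ChatCerebras

llm = ChatCerebras(
    model="llama-3.3-70b",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",  # if not set via the CEREBRAS_API_KEY env var
)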
Invoke
messages = [
    (
        "system",
        "You are a helpful translator. Translate the user sentence to French.",
    ),
    ("human", "I love programming."),
]
llm.invoke(messages)
AIMessage(
    content='The translation of "I love programming" to French is:\n\n"J\'adore programmer."',
    response_metadata={
        'token_usage': {'completion_tokens': 20, 'prompt_tokens': 32, 'total_tokens': 52},
        'model_name': 'llama-3.3-70b',
        'system_fingerprint': 'fp_679dff74c0',
        'finish_reason': 'stop',
    },
    id='run-377c2887-30ef-417e-b0f5-83efc8844f12-0',
    usage_metadata={'input_tokens': 32, 'output_tokens': 20, 'total_tokens': 52})
Stream
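The chunks below come from streaming the same messages, for example:

for chunk in llm.stream(messages):
    print(chunk)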
content='' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='The' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' translation' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' of' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' "' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='I' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' love' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' programming' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='"' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' to' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' French' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' is' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=':\n\n' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='"' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='J' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content="'" id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='ad' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='ore' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content=' programmer' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='."' id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'llama-3.3-70b', 'system_fingerprint': 'fp_679dff74c0'} id='run-3f9dc84e-208f-48da-b15d-e552b6759c24'
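Chunks are AIMessageChunk objects and support + for aggregation into a single message, for example:

stream = llm.stream(messages)
full = next(stream)
for chunk in stream:
    full += chunk
full  # a single AIMessageChunk holding the complete content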
Async
await llm.ainvoke(messages)
# stream:
# async for chunk in llm.astream(messages):
#     print(chunk)

# batch:
# await llm.abatch([messages])
AIMessage(
    content='The translation of "I love programming" to French is:\n\n"J\'adore programmer."',
    response_metadata={
        'token_usage': {'completion_tokens': 20, 'prompt_tokens': 32, 'total_tokens': 52},
        'model_name': 'llama-3.3-70b',
        'system_fingerprint': 'fp_679dff74c0',
        'finish_reason': 'stop',
    },
    id='run-377c2887-30ef-417e-b0f5-83efc8844f12-0',
    usage_metadata={'input_tokens': 32, 'output_tokens': 20, 'total_tokens': 52})
Tool calling
from pydantic import BaseModel, Field

from langchain_cerebras import ChatCerebras

llm = ChatCerebras(model="llama-3.3-70b")

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(
        ..., description="The city and state, e.g. San Francisco, CA"
    )

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(
        ..., description="The city and state, e.g. San Francisco, CA"
    )

llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
    "Which city is bigger: LA or NY?"
)
ai_msg.tool_calls
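In LangChain's standard format, tool_calls is a list of dicts along these lines (illustrative; actual args and IDs depend on the model's response):

[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'call_...',  # provider-assigned call ID (truncated here)
        'type': 'tool_call'
    },
    # ...one entry per tool call
]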
Structured output
from typing import Optional

from pydantic import BaseModel, Field

class Joke(BaseModel):
    '''Joke to tell user.'''

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(description="How funny the joke is, from 1 to 10")

structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")
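The result is a Joke instance parsed from the model's structured output.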
JSON mode
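ChatCerebras builds on BaseChatOpenAI, so JSON mode can be requested with the OpenAI-style response_format parameter (a sketch; model support may vary):

json_llm = llm.bind(response_format={"type": "json_object"})
ai_msg = json_llm.invoke(
    "Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]"
)
ai_msg.content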
Token usage
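Token counts are surfaced on the message's usage_metadata attribute (values match the Invoke example above):

ai_msg = llm.invoke(messages)
ai_msg.usage_metadata

{'input_tokens': 32, 'output_tokens': 20, 'total_tokens': 52}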
Response metadata
ai_msg = llm.invoke(messages)
ai_msg.response_metadata
{
    'token_usage': {
        'completion_tokens': 20,
        'prompt_tokens': 32,
        'total_tokens': 52
    },
    'model_name': 'llama-3.3-70b',
    'system_fingerprint': 'fp_679dff74c0',
    'finish_reason': 'stop',
    'logprobs': None
}
Reasoning with gpt-oss-120b
llm = ChatCerebras(
    model="gpt-oss-120b",
    reasoning_effort="high",  # "low", "medium", or "high"
)
response = llm.invoke("What is the cube root of 50.653?")

# Reasoning is exposed as structured content blocks
for block in response.content:
    if isinstance(block, dict):
        if block["type"] == "reasoning_content":
            reasoning_text = block["reasoning_content"]["text"]
            print(f"Reasoning: {reasoning_text}")
        elif block["type"] == "text":
            print(f"Answer: {block['text']}")
Reasoning with zai-glm-4.6
llm = ChatCerebras(
    model="zai-glm-4.6",
    disable_reasoning=False,  # enable reasoning (False is the default)
)
response = llm.invoke("Explain quantum computing")

# Same access pattern for reasoning content
for block in response.content:
    if isinstance(block, dict):
        if block["type"] == "reasoning_content":
            print(f"Reasoning: {block['reasoning_content']['text']}")
        elif block["type"] == "text":
            print(f"Answer: {block['text']}")
Reasoning with streaming
llm = ChatCerebras(
    model="gpt-oss-120b",
    reasoning_effort="medium",
)

full_reasoning = ""
full_text = ""
for chunk in llm.stream("What is 2+2?"):
    # Reasoning tokens are in additional_kwargs during streaming
    if "reasoning" in chunk.additional_kwargs:
        full_reasoning += chunk.additional_kwargs["reasoning"]
    if isinstance(chunk.content, str):
        full_text += chunk.content

print(f"Reasoning: {full_reasoning}")
print(f"Answer: {full_text}")
| METHOD | DESCRIPTION |
|---|---|
| get_lc_namespace | Get the namespace of the langchain object. |
| validate_environment | Validate that the API key and Python package exist in the environment. |
lc_secrets
property
A map of constructor argument names to secret ids.
For example, {"cerebras_api_key": "CEREBRAS_API_KEY"}
lc_attributes
property
List of attribute names that should be included in the serialized kwargs.
These attributes must be accepted by the constructor.
model_name
class-attribute
instance-attribute
Model name to use.
cerebras_api_key
class-attribute
instance-attribute
cerebras_api_key: SecretStr | None = Field(
    alias="api_key", default_factory=secret_from_env("CEREBRAS_API_KEY", default=None)
)
Automatically inferred from the env var CEREBRAS_API_KEY if not provided.
reasoning_effort
class-attribute
instance-attribute
reasoning_effort: Literal["low", "medium", "high"] | None = Field(
    default=None,
    description="Level of reasoning effort for the gpt-oss-120b model. Options: 'low' (minimal reasoning, faster), 'medium' (moderate reasoning), or 'high' (extensive reasoning, more thorough analysis).",
)
Reasoning effort level for gpt-oss-120b model.
disable_reasoning
class-attribute
instance-attribute
disable_reasoning: bool | None = Field(
    default=None,
    description="Whether to disable reasoning for the zai-glm-4.6 model. Set to True to disable reasoning, False (default) to enable.",
)
Disable reasoning for zai-glm-4.6 model.
get_lc_namespace
classmethod
Get the namespace of the langchain object.