reasoning_format:
The format for reasoning output. Groq will default to raw if left undefined.
- `'parsed'`: Separates reasoning into a dedicated field while keeping the
  response concise. Reasoning will be returned in the
  `additional_kwargs.reasoning_content` field of the response.
- `'raw'`: Includes reasoning within think tags (e.g.
  `<think>{reasoning_content}</think>`).
- `'hidden'`: Returns only the final answer content. Note: this only suppresses
  reasoning content in the response; the model will still perform reasoning
  unless overridden in `reasoning_effort`.
See the Groq documentation for more details and a list of supported models.
reasoning_effort:
The level of effort the model will put into reasoning. Groq will default to
enabling reasoning if left undefined.
See the Groq documentation for more details and a list of options and models
that support setting a reasoning effort.
service_tier:
Optional parameter specifying the service tier to use for requests.
- `'on_demand'`: Default.
- `'flex'`: On-demand processing when capacity is available, with rapid
  timeouts if resources are constrained. Provides a balance between performance
  and reliability for workloads that don't require guaranteed processing.
- `'auto'`: Uses on-demand rate limits, then falls back to `'flex'` if those
  limits are exceeded.
See the Groq documentation for more details and a list of service tiers and
descriptions.
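A minimal sketch combining these parameters (the model name here is an
assumption; substitute any reasoning-capable Groq model):

from langchain_groq import ChatGroq

# Separate the reasoning trace from the final answer, and let Groq fall back
# to flex processing if on-demand rate limits are exceeded.
model = ChatGroq(
    model="qwen/qwen3-32b",  # assumed reasoning-capable model
    reasoning_format="parsed",
    service_tier="auto",
)
msg = model.invoke("What is 17 * 23?")
print(msg.content)  # final answer only
print(msg.additional_kwargs.get("reasoning_content"))  # parsed reasoning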
Groq Chat large language models API.
To use, you should have the
environment variable GROQ_API_KEY set with your API key.
Any parameters that are valid for the groq.create call can be passed in, even if not explicitly saved on this class.
Setup:
Install langchain-groq and set environment variable
GROQ_API_KEY.
pip install -U langchain-groq
export GROQ_API_KEY="your-api-key"
Key init args — completion params:
model:
Name of Groq model to use, e.g. llama-3.1-8b-instant.
temperature:
Sampling temperature. Ranges from 0.0 to 1.0.
max_tokens:
Max number of tokens to generate.
reasoning_format:
The format for reasoning output. Groq will default to raw if left
undefined.
- `'parsed'`: Separates reasoning into a dedicated field while keeping the
response concise. Reasoning will be returned in the
`additional_kwargs.reasoning_content` field of the response.
- `'raw'`: Includes reasoning within think tags (e.g.
`<think>{reasoning_content}</think>`).
- `'hidden'`: Returns only the final answer content. Note: this only
suppresses reasoning content in the response; the model will still perform
reasoning unless overridden in `reasoning_effort`.
See the [Groq documentation](https://console.groq.com/docs/reasoning#reasoning)
for more details and a list of supported models.
model_kwargs:
Holds any model parameters valid for the create call that are not explicitly
specified on this class (see the sketch after this list).
Key init args — client params:
timeout:
Timeout for requests.
max_retries:
Max number of retries.
api_key:
Groq API key. If not passed in will be read from env var GROQ_API_KEY.
base_url:
Base URL path for API requests, leave blank if not using a proxy
or service emulator.
custom_get_token_ids:
Optional encoder to use for counting tokens.
See full list of supported init args and their descriptions in the params section.
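A minimal sketch of model_kwargs (frequency_penalty and seed are assumed here
to be valid groq.create parameters):

from langchain_groq import ChatGroq

# Extra create-call parameters ride along via model_kwargs.
model = ChatGroq(
    model="llama-3.1-8b-instant",
    model_kwargs={"frequency_penalty": 0.5, "seed": 42},
)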
Instantiate:
from langchain_groq import ChatGroq
model = ChatGroq(
model="llama-3.1-8b-instant",
temperature=0.0,
max_retries=2,
# other params...
)
Invoke:
messages = [
("system", "You are a helpful translator. Translate the user sentence to French."),
("human", "I love programming."),
]
model.invoke(messages)
AIMessage(content='The English sentence "I love programming" can
be translated to French as "J\'aime programmer". The word
"programming" is translated as "programmer" in French.',
response_metadata={'token_usage': {'completion_tokens': 38,
'prompt_tokens': 28, 'total_tokens': 66, 'completion_time':
0.057975474, 'prompt_time': 0.005366091, 'queue_time': None,
'total_time': 0.063341565}, 'model_name': 'llama-3.1-8b-instant',
'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop',
'logprobs': None}, id='run-ecc71d70-e10c-4b69-8b8c-b8027d95d4b8-0')
Vision:
from langchain_groq import ChatGroq
from langchain_core.messages import HumanMessage
model = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct")
message = HumanMessage(
content=[
{"type": "text", "text": "Describe this image in detail"},
{"type": "image_url", "image_url": {"url": "example_url.jpg"}},
]
)
response = model.invoke([message])
print(response.content)
Vision-capable models accept a maximum image size of 20MB per request.
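For local images, a hedged sketch: assuming base64 data URLs are accepted (as
in the OpenAI-compatible content format), a file can be inlined directly; the
path below is hypothetical.

import base64

from langchain_core.messages import HumanMessage
from langchain_groq import ChatGroq

model = ChatGroq(model="meta-llama/llama-4-scout-17b-16e-instruct")

# Read and base64-encode a local image (hypothetical path).
with open("/path/to/image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image in detail"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ]
)
response = model.invoke([message])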
Stream:
# Streaming `text` for each content chunk received
for chunk in model.stream(messages):
print(chunk.text, end="")
content='' id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
content='The' id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
content=' English' id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
content=' sentence' id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
...
content=' program' id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
content='".' id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
content='' response_metadata={'finish_reason': 'stop'} id='run-4e9f926b-73f5-483b-8ef5-09533d925853'
# Reconstructing a full response
stream = model.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full
AIMessageChunk(content='The English sentence "I love programming"
can be translated to French as "J\'aime programmer". Here\'s the
breakdown of the sentence: "J\'aime" is the French equivalent of "
I love", and "programmer" is the French infinitive for "to program".
So, the literal translation is "I love to program". However, in
English we often omit the "to" when talking about activities we
love, and the same applies to French. Therefore, "J\'aime
programmer" is the correct and natural way to express "I love
programming" in French.', response_metadata={'finish_reason':
'stop'}, id='run-a3c35ac4-0750-4d08-ac55-bfc63805de76')
Async:
await model.ainvoke(messages)
AIMessage(content='The English sentence "I love programming" can
be translated to French as "J\'aime programmer". The word
"programming" is translated as "programmer" in French. I hope
this helps! Let me know if you have any other questions.',
response_metadata={'token_usage': {'completion_tokens': 53,
'prompt_tokens': 28, 'total_tokens': 81, 'completion_time':
0.083623752, 'prompt_time': 0.007365126, 'queue_time': None,
'total_time': 0.090988878}, 'model_name': 'llama-3.1-8b-instant',
'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop',
'logprobs': None}, id='run-897f3391-1bea-42e2-82e0-686e2367bcf8-0')
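Streaming has an async counterpart as well; a minimal sketch using astream:

# Stream chunks asynchronously, printing text as it arrives.
async for chunk in model.astream(messages):
    print(chunk.text, end="")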
Tool calling:
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
model_with_tools = model.bind_tools([GetWeather, GetPopulation])
ai_msg = model_with_tools.invoke("What is the population of NY?")
ai_msg.tool_calls
[
{
"name": "GetPopulation",
"args": {"location": "NY"},
"id": "call_bb8d",
}
]
See ChatGroq.bind_tools() method for more.
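To close the loop, run the tool yourself and return its output as a
ToolMessage so the model can compose a final answer. A minimal sketch (the
population figure is a hypothetical tool result):

from langchain_core.messages import ToolMessage

history = [("human", "What is the population of NY?"), ai_msg]
for tool_call in ai_msg.tool_calls:
    # Execute the requested tool (stubbed here) and echo back its call id.
    history.append(
        ToolMessage(content="8.3 million", tool_call_id=tool_call["id"])
    )
model_with_tools.invoke(history)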
Structured output:
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: int | None = Field(default=None, description="How funny the joke is, from 1 to 10")
structured_model = model.with_structured_output(Joke)
structured_model.invoke("Tell me a joke about cats")
Joke(
setup="Why don't cats play poker in the jungle?",
punchline="Too many cheetahs!",
rating=None,
)
See ChatGroq.with_structured_output() for more.
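If your installed version supports it, with_structured_output can also use
Groq's JSON mode via method="json_mode"; the desired schema then has to be
described in the prompt itself:

structured_model = model.with_structured_output(Joke, method="json_mode")
structured_model.invoke(
    "Tell me a joke about cats. Respond with JSON containing "
    "'setup', 'punchline', and 'rating' keys."
)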
Response metadata:
ai_msg = model.invoke(messages)
ai_msg.response_metadata
{
"token_usage": {
"completion_tokens": 70,
"prompt_tokens": 28,
"total_tokens": 98,
"completion_time": 0.111956391,
"prompt_time": 0.007518279,
"queue_time": None,
"total_time": 0.11947467,
},
"model_name": "llama-3.1-8b-instant",
"system_fingerprint": "fp_c5f20b5bb1",
"finish_reason": "stop",
"logprobs": None,
}
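Token counts are also exposed in a provider-agnostic form on the message
itself (assuming a recent langchain-core), mirroring the token_usage values
above:

ai_msg.usage_metadata
{'input_tokens': 28, 'output_tokens': 70, 'total_tokens': 98}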