Google Cloud Vertex AI chat model integration.
Setup:
You must either:
- Have credentials configured for your environment (gcloud, workload identity, etc.)
- Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable
This codebase uses the google.auth library, which first looks for the
application credentials variable mentioned above, and then looks for
system-level auth.
More information:
- google.auth API reference
Key init args — completion params:
model: str
Name of ChatVertexAI model to use, e.g. 'gemini-2.0-flash-001',
'gemini-2.5-pro', etc.
temperature: Optional[float]
Sampling temperature.
seed: Optional[int]
Random seed for sampling; set it for more reproducible outputs.
max_tokens: Optional[int]
Max number of tokens to generate.
stop: Optional[List[str]]
Default stop sequences.
safety_settings: Optional[Dict[vertexai.generative_models.HarmCategory, vertexai.generative_models.HarmBlockThreshold]]
The default safety settings to use for all generations.
Key init args — client params:
max_retries: int
Max number of retries.
wait_exponential_kwargs: Optional[dict[str, float]]
Optional dictionary with parameters for wait_exponential, used for retry backoff (see the sketch after this list):
- multiplier: Initial wait time multiplier (default: 1.0)
- min: Minimum wait time in seconds (default: 4.0)
- max: Maximum wait time in seconds (default: 10.0)
- exp_base: Exponent base to use (default: 2.0)
credentials: Optional[google.auth.credentials.Credentials]
The default custom credentials to use when making API calls. If not
provided, credentials will be ascertained from the environment.
project: Optional[str]
The default GCP project to use when making Vertex API calls.
location: str = "us-central1"
The default location to use when making API calls.
request_parallelism: int = 5
The amount of parallelism allowed for requests issued to Vertex AI models.
base_url: Optional[str]
Base URL for API requests.
See full list of supported init args and their descriptions in the params section.
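As a combined illustration of the client parameters above, here is a minimal sketch; the service-account path and project ID are placeholders to replace with your own:
from google.oauth2 import service_account
from langchain_google_vertexai import ChatVertexAI

creds = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json",  # placeholder path
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
llm = ChatVertexAI(
    model="gemini-2.5-flash",
    project="my-gcp-project",  # placeholder project ID
    location="us-central1",
    credentials=creds,
    max_retries=6,
    # Retry backoff: start at 4s, double each attempt, cap at 10s.
    wait_exponential_kwargs={
        "multiplier": 1.0,
        "min": 4.0,
        "max": 10.0,
        "exp_base": 2.0,
    },
)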
Instantiate:
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(
model="gemini-2.5-flash",
temperature=0,
max_tokens=None,
max_retries=6,
stop=None,
# other params...
)
Thinking:
For thinking models, you have the option to adjust the number of internal
thinking tokens used (thinking_budget) or to disable thinking altogether.
Note that not all models allow disabling thinking.
See the Gemini API docs for more details on thinking models.
To see a thinking model's thoughts, set include_thoughts=True to have the
model's reasoning summaries included in the response.
llm = ChatVertexAI(
model="gemini-2.5-flash",
include_thoughts=True,
)
ai_msg = llm.invoke("How many 'r's are in the word 'strawberry'?")
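Similarly, a sketch of capping the thinking budget (exact limits vary by model, and as noted above not all models allow disabling thinking):
llm = ChatVertexAI(
    model="gemini-2.5-flash",
    thinking_budget=1024,  # cap on internal thinking tokens; 0 disables where supported
)
ai_msg = llm.invoke("How many 'r's are in the word 'strawberry'?")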
Invoke:
messages = [
(
"system",
"You are a helpful translator. Translate the user sentence to French.",
),
("human", "I love programming."),
]
llm.invoke(messages)
AIMessage(
content="J'adore programmer. ",
response_metadata={
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"citation_metadata": None,
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
},
id="run-925ce305-2268-44c4-875f-dde9128520ad-0",
)
Stream:
for chunk in llm.stream(messages):
print(chunk)
AIMessageChunk(
content="J",
response_metadata={
"is_blocked": False,
"safety_ratings": [],
"citation_metadata": None,
},
id="run-9df01d73-84d9-42db-9d6b-b1466a019e89",
)
AIMessageChunk(
content="'adore programmer. ",
response_metadata={
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"citation_metadata": None,
},
id="run-9df01d73-84d9-42db-9d6b-b1466a019e89",
)
AIMessageChunk(
content="",
response_metadata={
"is_blocked": False,
"safety_ratings": [],
"citation_metadata": None,
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
},
id="run-9df01d73-84d9-42db-9d6b-b1466a019e89",
)
stream = llm.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full
AIMessageChunk(
content="J'adore programmer. ",
response_metadata={
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"citation_metadata": None,
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
},
id="run-b7f7492c-4cb5-42d0-8fc3-dce9b293b0fb",
)
Async invocation:
await llm.ainvoke(messages)
# stream (astream returns an async iterator directly; no await needed)
async for chunk in llm.astream(messages):
    print(chunk)
# batch
await llm.abatch([messages])
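The calls above assume a running event loop (e.g. a notebook). In a plain script, wrap them in asyncio.run; a minimal sketch reusing the llm and messages defined earlier:
import asyncio

async def main() -> None:
    # One-shot async call
    reply = await llm.ainvoke(messages)
    print(reply.content)
    # Async streaming, chunk by chunk
    async for chunk in llm.astream(messages):
        print(chunk.content, end="")

asyncio.run(main())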
Context Caching:
Context caching allows you to store and reuse content (e.g., PDFs, images) for faster processing.
The cached_content parameter accepts a cache name created via the Google
Generative AI API with Vertex AI.
The example below caches content stored in GCS and then queries it.
from google import genai
from google.genai.types import (
Content,
CreateCachedContentConfig,
HttpOptions,
Part,
)
from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage
client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))
contents = [
Content(
role="user",
parts=[
Part.from_uri(
file_uri="gs://your-bucket/file1",
mime_type="application/pdf",
),
Part.from_uri(
file_uri="gs://your-bucket/file2",
mime_type="image/jpeg",
),
],
)
]
cache = client.caches.create(
model="gemini-2.5-flash",
config=CreateCachedContentConfig(
contents=contents,
system_instruction="You are an expert content analyzer.",
display_name="content-cache",
ttl="300s",
),
)
llm = ChatVertexAI(
model_name="gemini-2.5-flash",
cached_content=cache.name,
)
message = HumanMessage(
content="Provide a summary of the key information across the content."
)
llm.invoke([message])
Tool calling:
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "2a2401fa-40db-470d-83ce-4e52de910d9e",
},
{
"name": "GetWeather",
"args": {"location": "New York City, NY"},
"id": "96761deb-ab7f-4ef9-b4b4-6d44562fc46e",
},
{
"name": "GetPopulation",
"args": {"location": "Los Angeles, CA"},
"id": "9147d532-abee-43a2-adb5-12f164300484",
},
{
"name": "GetPopulation",
"args": {"location": "New York City, NY"},
"id": "c43374ea-bde5-49ca-8487-5b83ebeea1e6",
},
]
See bind_tools for more.
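To complete a tool-calling loop, each requested call is typically answered with a ToolMessage matched by tool_call_id before re-invoking the model. A minimal sketch, where get_weather is a hypothetical stand-in for a real lookup:
from langchain_core.messages import HumanMessage, ToolMessage

def get_weather(location: str) -> str:
    # Hypothetical helper -- replace with a real weather lookup.
    return f"It is 72°F and sunny in {location}."

history = [HumanMessage("Which city is hotter today: LA or NY?")]
ai_msg = llm_with_tools.invoke(history)
history.append(ai_msg)
# Answer each GetWeather call the model requested, keyed by its id.
for call in ai_msg.tool_calls:
    if call["name"] == "GetWeather":
        history.append(
            ToolMessage(get_weather(**call["args"]), tool_call_id=call["id"])
        )
final = llm_with_tools.invoke(history)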
Built-in search:
from google.cloud.aiplatform_v1beta1.types import Tool as VertexTool
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(model="gemini-2.5-flash")
resp = llm.invoke(
"When is the next total solar eclipse in US?",
tools=[VertexTool(google_search={})],
)
Built-in code execution:
from google.cloud.aiplatform_v1beta1.types import Tool as VertexTool
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(model="gemini-2.5-flash")
resp = llm.invoke(
"What is 3^3?",
tools=[VertexTool(code_execution={})],
)
Structured output:
from typing import Optional
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: Optional[int] = Field(
default=None, description="How funny the joke is, from 1 to 10"
)
structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")
Joke(
setup="What do you call a cat that loves to bowl?",
punchline="An alley cat!",
rating=None,
)
See with_structured_output for more.
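To also keep the raw model message and surface parsing failures, with_structured_output accepts the standard LangChain include_raw=True option:
structured_llm = llm.with_structured_output(Joke, include_raw=True)
result = structured_llm.invoke("Tell me a joke about cats")
# result is a dict with "raw" (AIMessage), "parsed" (Joke or None),
# and "parsing_error" (None on success).
result["parsed"]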
Image input:
import base64
import httpx
from langchain_core.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
)
ai_msg = llm.invoke([message])
ai_msg.content
The weather in this image appears to be sunny and pleasant. The sky is a bright
blue with scattered white clouds, suggesting a clear and mild day. The lush
green grass indicates recent rainfall or sufficient moisture. The absence of...
You can also point to GCS files, which is faster and more efficient because the raw bytes do not need to be transferred back and forth with each request.
llm.invoke(
[
HumanMessage(
[
"What's in the image?",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
"mime_type": "image/jpeg",
},
]
)
]
).content
The image is of five blueberry scones arranged on a piece of baking paper. Here
is a list of what is in the picture:
* **Five blueberry scones:** They are scattered across the parchment paper, dusted with powdered sugar.
* **Two...
PDF input:
import base64
from langchain_core.messages import HumanMessage
with open("/path/to/your/test.pdf", "rb") as f:
    pdf_bytes = f.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the document in a sentence"},
{
"type": "file",
"mime_type": "application/pdf",
"base64": pdf_base64,
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
This research paper describes a system developed for SemEval-2025 Task 9, which
aims to automate the detection of food hazards from recall reports, addressing
the class imbalance problem by leveraging LLM-based data augmentation...
You can also point to GCS files.
llm.invoke(
[
HumanMessage(
[
"describe the document in a sentence",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/generative-ai/pdf/1706.03762v7.pdf",
"mime_type": "application/pdf",
},
]
)
]
).content
The article introduces Transformer, a new model architecture for sequence
transduction based solely on attention mechanisms, outperforming previous models
in machine translation tasks and demonstrating good generalization to English...
Video input:
import base64
from langchain_core.messages import HumanMessage
with open("/path/to/your/video.mp4", "rb") as f:
    video_bytes = f.read()
video_base64 = base64.b64encode(video_bytes).decode("utf-8")
message = HumanMessage(
content=[
{
"type": "text",
"text": "describe what's in this video in a sentence",
},
{
"type": "file",
"mime_type": "video/mp4",
"base64": video_base64,
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
Tom and Jerry, along with a turkey, engage in a chaotic Thanksgiving-themed
adventure involving a corn-on-the-cob chase, maze antics, and a disastrous
attempt to prepare a turkey dinner.
You can also pass YouTube URLs directly:
from langchain_core.messages import HumanMessage
message = HumanMessage(
content=[
{"type": "text", "text": "summarize the video in 3 sentences."},
{
"type": "media",
"file_uri": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
"mime_type": "video/mp4",
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
The video is a demo of multimodal live streaming in Gemini 2.0. The narrator is
sharing his screen in AI Studio and asks if the AI can see it. The AI then reads
text that is highlighted on the screen, defines the word “multimodal,” and...
You can also point to GCS files.
llm = ChatVertexAI(model="gemini-2.5-pro")
llm.invoke(
[
HumanMessage(
[
"What's in the video?",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/video/animals.mp4",
"mime_type": "video/mp4",
},
]
)
]
).content
The video is about a new feature in Google Photos called "Zoomable Selfies". The
feature allows users to take selfies with animals at the zoo. The video shows
several examples of people taking selfies with animals, including a tiger,...
Audio input:
import base64
from langchain_core.messages import HumanMessage
with open("/path/to/your/audio.mp3", "rb") as f:
    audio_bytes = f.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "summarize this audio in a sentence"},
{
"type": "file",
"mime_type": "audio/mp3",
"base64": audio_base64,
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
"In this episode of the Made by Google podcast, Stephen Johnson and Simon Tokumine discuss NotebookLM, a tool designed to help users understand complex material in various modalities, with a focus on its unexpected uses, the development of audio overviews, and the implementation of new features like mind maps and source discovery."
You can also point to GCS files.
from langchain_core.messages import HumanMessage
llm = ChatVertexAI(model="gemini-2.5-flash")
llm.invoke(
[
HumanMessage(
[
"What's this audio about?",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
"mime_type": "audio/mpeg",
},
]
)
]
).content
"This audio is an interview with two product managers from Google who work on Pixel feature drops. They discuss how feature drops are important for showcasing how Google devices are constantly improving and getting better. They also discuss some of the highlights of the January feature drop and the new features coming in the March drop for Pixel phones and Pixel watches. The interview concludes with discussion of how user feedback is extremely important to them in deciding which features to include in the feature drops."
Token usage:
ai_msg = llm.invoke(messages)
ai_msg.usage_metadata
{"input_tokens": 17, "output_tokens": 7, "total_tokens": 24}
Logprobs:
llm = ChatVertexAI(model="gemini-2.5-flash", logprobs=True)
ai_msg = llm.invoke(messages)
ai_msg.response_metadata["logprobs_result"]
[
{"token": "J", "logprob": -1.549651415189146e-06, "top_logprobs": []},
{"token": "'", "logprob": -1.549651415189146e-06, "top_logprobs": []},
{"token": "adore", "logprob": 0.0, "top_logprobs": []},
{
"token": " programmer",
"logprob": -1.1922384146600962e-07,
"top_logprobs": [],
},
{"token": ".", "logprob": -4.827636439586058e-05, "top_logprobs": []},
{"token": " ", "logprob": -0.018011733889579773, "top_logprobs": []},
{"token": "\\n", "logprob": -0.0008687592926435173, "top_logprobs": []},
]
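Assuming your installed version also accepts an integer for logprobs (interpreted as the number of top alternative tokens to return per position), a hedged sketch:
# Assumption: logprobs=3 requests the top 3 alternatives per token.
llm = ChatVertexAI(model="gemini-2.5-flash", logprobs=3)
ai_msg = llm.invoke(messages)
for entry in ai_msg.response_metadata["logprobs_result"]:
    print(entry["token"], entry["top_logprobs"])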
Response metadata:
ai_msg = llm.invoke(messages)
ai_msg.response_metadata
{
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
}
Safety settings:
from langchain_google_vertexai import HarmBlockThreshold, HarmCategory
llm = ChatVertexAI(
model="gemini-2.5-pro",
safety_settings={
HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
},
)
llm.invoke(messages).response_metadata
{
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
}