Google Cloud Vertex AI chat model integration.
Setup:
You must either:
- Have credentials configured for your environment (gcloud, workload identity, etc.)
- Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable
This codebase uses the google.auth library, which first looks for the
application credentials variable mentioned above, and then looks for
system-level auth.
More information:
- google.auth API reference
Key init args — completion params:
model: str
Name of ChatVertexAI model to use, e.g. 'gemini-2.0-flash-001',
'gemini-2.5-pro', etc.
temperature: Optional[float]
Sampling temperature.
seed: Optional[int]
Random seed for sampling; set it for more reproducible outputs.
max_tokens: Optional[int]
Max number of tokens to generate.
stop: Optional[List[str]]
Default stop sequences.
safety_settings: Optional[Dict[vertexai.generative_models.HarmCategory, vertexai.generative_models.HarmBlockThreshold]]
The default safety settings to use for all generations.
Key init args — client params:
max_retries: int
Max number of retries.
wait_exponential_kwargs: Optional[dict[str, float]]
Optional dictionary with parameters for wait_exponential, used for retry backoff (see the sketch after this list):
- multiplier: Initial wait time multiplier (default: 1.0)
- min: Minimum wait time in seconds (default: 4.0)
- max: Maximum wait time in seconds (default: 10.0)
- exp_base: Exponent base to use (default: 2.0)
credentials: Optional[google.auth.credentials.Credentials]
The default custom credentials to use when making API calls. If not
provided, credentials will be ascertained from the environment.
project: Optional[str]
The default GCP project to use when making Vertex API calls.
location: str = "us-central1"
The default location to use when making API calls.
request_parallelism: int = 5
The amount of parallelism allowed for requests issued to Vertex AI models.
base_url: Optional[str]
Base URL for API requests.
See full list of supported init args and their descriptions in the params section.
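As a combined illustration of the client parameters above, here is a minimal sketch; the service-account path and project ID are placeholders to replace with your own:
from google.oauth2 import service_account
from langchain_google_vertexai import ChatVertexAI

creds = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json",  # placeholder path
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
llm = ChatVertexAI(
    model="gemini-2.5-flash",
    project="my-gcp-project",  # placeholder project ID
    location="us-central1",
    credentials=creds,
    max_retries=6,
    # Retry backoff: start at 4s, double each attempt, cap at 10s.
    wait_exponential_kwargs={
        "multiplier": 1.0,
        "min": 4.0,
        "max": 10.0,
        "exp_base": 2.0,
    },
)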
Instantiate:
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(
model="gemini-2.5-flash",
temperature=0,
max_tokens=None,
max_retries=6,
stop=None,
# other params...
)
Thinking:
For thinking models, you have the option to adjust the number of internal
thinking tokens used (thinking_budget) or to disable thinking altogether.
Note that not all models allow disabling thinking.
See the Gemini API docs for more details on thinking models.
To see a thinking model's thoughts, set include_thoughts=True to have the
model's reasoning summaries included in the response.
llm = ChatVertexAI(
model="gemini-2.5-flash",
include_thoughts=True,
)
ai_msg = llm.invoke("How many 'r's are in the word 'strawberry'?")
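Similarly, a sketch of capping the thinking budget (exact limits vary by model, and as noted above not all models allow disabling thinking):
llm = ChatVertexAI(
    model="gemini-2.5-flash",
    thinking_budget=1024,  # cap on internal thinking tokens; 0 disables where supported
)
ai_msg = llm.invoke("How many 'r's are in the word 'strawberry'?")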
Invoke:
messages = [
(
"system",
"You are a helpful translator. Translate the user sentence to French.",
),
("human", "I love programming."),
]
llm.invoke(messages)
AIMessage(
content="J'adore programmer. ",
response_metadata={
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"citation_metadata": None,
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
},
id="run-925ce305-2268-44c4-875f-dde9128520ad-0",
)
Stream:
for chunk in llm.stream(messages):
print(chunk)
AIMessageChunk(
content="J",
response_metadata={
"is_blocked": False,
"safety_ratings": [],
"citation_metadata": None,
},
id="run-9df01d73-84d9-42db-9d6b-b1466a019e89",
)
AIMessageChunk(
content="'adore programmer. ",
response_metadata={
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"citation_metadata": None,
},
id="run-9df01d73-84d9-42db-9d6b-b1466a019e89",
)
AIMessageChunk(
content="",
response_metadata={
"is_blocked": False,
"safety_ratings": [],
"citation_metadata": None,
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
},
id="run-9df01d73-84d9-42db-9d6b-b1466a019e89",
)
stream = llm.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full
AIMessageChunk(
content="J'adore programmer. ",
response_metadata={
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"citation_metadata": None,
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
},
id="run-b7f7492c-4cb5-42d0-8fc3-dce9b293b0fb",
)
Async invocation:
await llm.ainvoke(messages)
# stream (astream returns an async iterator directly; no await needed)
async for chunk in llm.astream(messages):
    print(chunk)
# batch
await llm.abatch([messages])
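The calls above assume a running event loop (e.g. a notebook). In a plain script, wrap them in asyncio.run; a minimal sketch reusing the llm and messages defined earlier:
import asyncio

async def main() -> None:
    # One-shot async call
    reply = await llm.ainvoke(messages)
    print(reply.content)
    # Async streaming, chunk by chunk
    async for chunk in llm.astream(messages):
        print(chunk.content, end="")

asyncio.run(main())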
Context Caching:
Context caching allows you to store and reuse content (e.g., PDFs, images) for faster processing.
The cached_content parameter accepts a cache name created via the Google
Generative AI API with Vertex AI.
The example below caches content stored in GCS and then queries it.
from google import genai
from google.genai.types import (
Content,
CreateCachedContentConfig,
HttpOptions,
Part,
)
from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage
client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))
contents = [
Content(
role="user",
parts=[
Part.from_uri(
file_uri="gs://your-bucket/file1",
mime_type="application/pdf",
),
Part.from_uri(
file_uri="gs://your-bucket/file2",
mime_type="image/jpeg",
),
],
)
]
cache = client.caches.create(
model="gemini-2.5-flash",
config=CreateCachedContentConfig(
contents=contents,
system_instruction="You are an expert content analyzer.",
display_name="content-cache",
ttl="300s",
),
)
llm = ChatVertexAI(
model_name="gemini-2.5-flash",
cached_content=cache.name,
)
message = HumanMessage(
content="Provide a summary of the key information across the content."
)
llm.invoke([message])
Tool calling:
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(
..., description="The city and state, e.g. San Francisco, CA"
)
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
ai_msg.tool_calls
[
{
"name": "GetWeather",
"args": {"location": "Los Angeles, CA"},
"id": "2a2401fa-40db-470d-83ce-4e52de910d9e",
},
{
"name": "GetWeather",
"args": {"location": "New York City, NY"},
"id": "96761deb-ab7f-4ef9-b4b4-6d44562fc46e",
},
{
"name": "GetPopulation",
"args": {"location": "Los Angeles, CA"},
"id": "9147d532-abee-43a2-adb5-12f164300484",
},
{
"name": "GetPopulation",
"args": {"location": "New York City, NY"},
"id": "c43374ea-bde5-49ca-8487-5b83ebeea1e6",
},
]
See bind_tools for more.
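To complete a tool-calling loop, each requested call is typically answered with a ToolMessage matched by tool_call_id before re-invoking the model. A minimal sketch, where get_weather is a hypothetical stand-in for a real lookup:
from langchain_core.messages import HumanMessage, ToolMessage

def get_weather(location: str) -> str:
    # Hypothetical helper -- replace with a real weather lookup.
    return f"It is 72°F and sunny in {location}."

history = [HumanMessage("Which city is hotter today: LA or NY?")]
ai_msg = llm_with_tools.invoke(history)
history.append(ai_msg)
# Answer each GetWeather call the model requested, keyed by its id.
for call in ai_msg.tool_calls:
    if call["name"] == "GetWeather":
        history.append(
            ToolMessage(get_weather(**call["args"]), tool_call_id=call["id"])
        )
final = llm_with_tools.invoke(history)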
Built-in search:
from google.cloud.aiplatform_v1beta1.types import Tool as VertexTool
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(model="gemini-2.5-flash")
resp = llm.invoke(
"When is the next total solar eclipse in US?",
tools=[VertexTool(google_search={})],
)
Built-in code execution:
from google.cloud.aiplatform_v1beta1.types import Tool as VertexTool
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(model="gemini-2.5-flash")
resp = llm.invoke(
"What is 3^3?",
tools=[VertexTool(code_execution={})],
)
Structured output:
from typing import Optional
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: Optional[int] = Field(
default=None, description="How funny the joke is, from 1 to 10"
)
structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")
Joke(
setup="What do you call a cat that loves to bowl?",
punchline="An alley cat!",
rating=None,
)
See with_structured_output for more.
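To also keep the raw model message and surface parsing failures, with_structured_output accepts the standard LangChain include_raw=True option:
structured_llm = llm.with_structured_output(Joke, include_raw=True)
result = structured_llm.invoke("Tell me a joke about cats")
# result is a dict with "raw" (AIMessage), "parsed" (Joke or None),
# and "parsing_error" (None on success).
result["parsed"]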
Image input:
import base64
import httpx
from langchain_core.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
)
ai_msg = llm.invoke([message])
ai_msg.content
The weather in this image appears to be sunny and pleasant. The sky is a bright
blue with scattered white clouds, suggesting a clear and mild day. The lush
green grass indicates recent rainfall or sufficient moisture. The absence of...
You can also point to GCS files, which is faster and more efficient because the raw bytes do not need to be transferred back and forth with each request.
llm.invoke(
[
HumanMessage(
[
"What's in the image?",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
"mime_type": "image/jpeg",
},
]
)
]
).content
The image is of five blueberry scones arranged on a piece of baking paper. Here
is a list of what is in the picture:
* **Five blueberry scones:** They are scattered across the parchment paper, dusted with powdered sugar.
* **Two...
PDF input:
import base64
from langchain_core.messages import HumanMessage
with open("/path/to/your/test.pdf", "rb") as f:
    pdf_bytes = f.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "describe the document in a sentence"},
{
"type": "file",
"mime_type": "application/pdf",
"base64": pdf_base64,
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
This research paper describes a system developed for SemEval-2025 Task 9, which
aims to automate the detection of food hazards from recall reports, addressing
the class imbalance problem by leveraging LLM-based data augmentation...
You can also point to GCS files.
llm.invoke(
[
HumanMessage(
[
"describe the document in a sentence",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/generative-ai/pdf/1706.03762v7.pdf",
"mime_type": "application/pdf",
},
]
)
]
).content
The article introduces Transformer, a new model architecture for sequence
transduction based solely on attention mechanisms, outperforming previous models
in machine translation tasks and demonstrating good generalization to English...
Video input:
import base64
from langchain_core.messages import HumanMessage
with open("/path/to/your/video.mp4", "rb") as f:
    video_bytes = f.read()
video_base64 = base64.b64encode(video_bytes).decode("utf-8")
message = HumanMessage(
content=[
{
"type": "text",
"text": "describe what's in this video in a sentence",
},
{
"type": "file",
"mime_type": "video/mp4",
"base64": video_base64,
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
Tom and Jerry, along with a turkey, engage in a chaotic Thanksgiving-themed
adventure involving a corn-on-the-cob chase, maze antics, and a disastrous
attempt to prepare a turkey dinner.
You can also pass YouTube URLs directly:
from langchain_core.messages import HumanMessage
message = HumanMessage(
content=[
{"type": "text", "text": "summarize the video in 3 sentences."},
{
"type": "media",
"file_uri": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
"mime_type": "video/mp4",
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
The video is a demo of multimodal live streaming in Gemini 2.0. The narrator is
sharing his screen in AI Studio and asks if the AI can see it. The AI then reads
text that is highlighted on the screen, defines the word “multimodal,” and...
You can also point to GCS files.
llm = ChatVertexAI(model="gemini-2.5-pro")
llm.invoke(
[
HumanMessage(
[
"What's in the video?",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/video/animals.mp4",
"mime_type": "video/mp4",
},
]
)
]
).content
The video is about a new feature in Google Photos called "Zoomable Selfies". The
feature allows users to take selfies with animals at the zoo. The video shows
several examples of people taking selfies with animals, including a tiger,...
Audio input:
import base64
from langchain_core.messages import HumanMessage
with open("/path/to/your/audio.mp3", "rb") as f:
    audio_bytes = f.read()
audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")
message = HumanMessage(
content=[
{"type": "text", "text": "summarize this audio in a sentence"},
{
"type": "file",
"mime_type": "audio/mp3",
"base64": audio_base64,
},
]
)
ai_msg = llm.invoke([message])
ai_msg.content
"In this episode of the Made by Google podcast, Stephen Johnson and Simon Tokumine discuss NotebookLM, a tool designed to help users understand complex material in various modalities, with a focus on its unexpected uses, the development of audio overviews, and the implementation of new features like mind maps and source discovery."
You can also point to GCS files.
from langchain_core.messages import HumanMessage
llm = ChatVertexAI(model="gemini-2.5-flash")
llm.invoke(
[
HumanMessage(
[
"What's this audio about?",
{
"type": "media",
"file_uri": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
"mime_type": "audio/mpeg",
},
]
)
]
).content
"This audio is an interview with two product managers from Google who work on Pixel feature drops. They discuss how feature drops are important for showcasing how Google devices are constantly improving and getting better. They also discuss some of the highlights of the January feature drop and the new features coming in the March drop for Pixel phones and Pixel watches. The interview concludes with discussion of how user feedback is extremely important to them in deciding which features to include in the feature drops."
Token usage:
ai_msg = llm.invoke(messages)
ai_msg.usage_metadata
{"input_tokens": 17, "output_tokens": 7, "total_tokens": 24}
Logprobs:
llm = ChatVertexAI(model="gemini-2.5-flash", logprobs=True)
ai_msg = llm.invoke(messages)
ai_msg.response_metadata["logprobs_result"]
[
{"token": "J", "logprob": -1.549651415189146e-06, "top_logprobs": []},
{"token": "'", "logprob": -1.549651415189146e-06, "top_logprobs": []},
{"token": "adore", "logprob": 0.0, "top_logprobs": []},
{
"token": " programmer",
"logprob": -1.1922384146600962e-07,
"top_logprobs": [],
},
{"token": ".", "logprob": -4.827636439586058e-05, "top_logprobs": []},
{"token": " ", "logprob": -0.018011733889579773, "top_logprobs": []},
{"token": "\\n", "logprob": -0.0008687592926435173, "top_logprobs": []},
]
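Assuming your installed version also accepts an integer for logprobs (interpreted as the number of top alternative tokens to return per position), a hedged sketch:
# Assumption: logprobs=3 requests the top 3 alternatives per token.
llm = ChatVertexAI(model="gemini-2.5-flash", logprobs=3)
ai_msg = llm.invoke(messages)
for entry in ai_msg.response_metadata["logprobs_result"]:
    print(entry["token"], entry["top_logprobs"])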
Response metadata:
ai_msg = llm.invoke(messages)
ai_msg.response_metadata
{
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
}
Safety settings:
from langchain_google_vertexai import HarmBlockThreshold, HarmCategory
llm = ChatVertexAI(
model="gemini-2.5-pro",
safety_settings={
HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
},
)
llm.invoke(messages).response_metadata
{
"is_blocked": False,
"safety_ratings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability_label": "NEGLIGIBLE",
"probability_score": 0.1,
"blocked": False,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severity_score": 0.1,
},
],
"usage_metadata": {
"prompt_token_count": 17,
"candidates_token_count": 7,
"total_token_count": 24,
},
}