Build extra kwargs from additional params that were passed in.
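The kwarg-collection step above can be sketched as a small pure-Python helper. This is an illustrative stand-in, not the library's actual implementation; the function name and the idea of a `known_fields` set are assumptions for the sketch.

```python
def build_extra_kwargs(passed_kwargs, known_fields):
    """Collect params that are not declared model fields into extra kwargs.

    Hypothetical sketch: keyword arguments the model class does not
    declare are routed into a separate dict (e.g. forwarded to the API
    payload as-is) instead of raising an error.
    """
    extra = {}
    for name, value in passed_kwargs.items():
        if name in known_fields:
            continue  # declared fields are handled by the model itself
        extra[name] = value
    return extra
```

For example, `build_extra_kwargs({"temperature": 0.5, "seed": 7}, {"temperature"})` keeps `temperature` out of the extras and passes `seed` through.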
Get a list of available models that work with ChatNVIDIA.
Bind tools to the model.
Bind a structured output schema to the model.
Configure the model to use thinking mode.
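One way a thinking-mode toggle can translate into a request body is sketched below. The key names used here (`chat_template_kwargs`, `thinking`) are assumptions for illustration; the actual fields a given endpoint expects may differ.

```python
def with_thinking_mode(payload, enabled=True):
    """Return a copy of a request payload with thinking mode toggled.

    Hypothetical sketch: the exact payload keys ("chat_template_kwargs"
    and "thinking" here) are assumptions, not the library's real schema.
    """
    updated = dict(payload)  # shallow copy; do not mutate the caller's dict
    template_kwargs = dict(updated.get("chat_template_kwargs", {}))
    template_kwargs["thinking"] = enabled
    updated["chat_template_kwargs"] = template_kwargs
    return updated
```

Returning a modified copy (rather than mutating in place) keeps the base payload reusable across requests with and without thinking mode.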
ChatNVIDIA subclass that injects nvext.agent_hints into requests
for Dynamo KV cache routing optimization.
A unique prefix_id is auto-generated for every request.
Example:
    from langchain_nvidia_ai_endpoints import ChatNVIDIADynamo

    llm = ChatNVIDIADynamo(model="meta/llama3-8b-instruct")
    # override per-invocation:
    llm.invoke("Hello", osl=2048, iat=50)
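The injection described above can be sketched as a standalone helper. Only `nvext`, `agent_hints`, `prefix_id`, `osl`, and `iat` come from the docstring; the function name, signature, and defaults are assumptions for the sketch, not the class's real internals.

```python
import uuid


def inject_agent_hints(payload, osl=None, iat=None):
    """Attach nvext.agent_hints with a fresh prefix_id to a request body.

    Hypothetical sketch of the injection described in the docstring: a
    unique prefix_id is generated per request, and any per-invocation
    osl/iat overrides are included alongside it.
    """
    hints = {"prefix_id": str(uuid.uuid4())}  # unique for every request
    if osl is not None:
        hints["osl"] = osl
    if iat is not None:
        hints["iat"] = iat
    updated = dict(payload)  # shallow copy; keep the caller's dict intact
    nvext = dict(updated.get("nvext", {}))
    nvext["agent_hints"] = hints
    updated["nvext"] = nvext
    return updated
```

Merging into any existing `nvext` block (instead of overwriting it) preserves other `nvext` options the request may already carry.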