langchain-nvidia-ai-endpoints

Description

Classes

Model

Model information.

inference_priority

Set inference priority for all LLM calls within scope.

Ranking

NVIDIARerank

LangChain Document Compressor that uses the NVIDIA NeMo Retriever Reranking API.

NVIDIARAGError

Base exception for NVIDIA RAG retriever errors.

NVIDIARAGConnectionError

Raised when the RAG server endpoint is unreachable.

NVIDIARAGServerError

Raised when the RAG server returns an error response.

NVIDIARAGValidationError

Raised when the request payload is invalid (e.g. 422 Unprocessable Entity).

NVIDIARAGRetriever

LangChain retriever that queries the NVIDIA RAG Blueprint /v1/search endpoint.

ChatNVIDIADynamo

ChatNVIDIA subclass that injects nvext.agent_hints into requests.

NVIDIAEmbeddings

Client to NVIDIA embeddings models.

NVIDIA

LangChain LLM that uses the Completions API with NVIDIA NIMs.

UsageCallbackHandler

Callback Handler that tracks OpenAI info.

ChatNVIDIA

NVIDIA chat model.
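
The four NVIDIARAG* exceptions listed above form a small hierarchy with NVIDIARAGError at the base. The sketch below illustrates how a caller might rely on that layout. It is not the package's code: the classes are local stand-ins with shortened names, each is assumed to derive directly from the base, and error_for_status is a hypothetical helper whose status-code mapping is inferred only from the one-line descriptions on this page.

```python
from typing import Optional


# Local stand-ins mirroring the hierarchy described above. They are NOT
# imported from langchain-nvidia-ai-endpoints.
class RAGError(Exception):
    """Stands in for NVIDIARAGError (the base exception)."""


class RAGConnectionError(RAGError):
    """Stands in for NVIDIARAGConnectionError (endpoint unreachable)."""


class RAGServerError(RAGError):
    """Stands in for NVIDIARAGServerError (error response from server)."""


class RAGValidationError(RAGError):
    """Stands in for NVIDIARAGValidationError (invalid payload, e.g. 422)."""


def error_for_status(status_code: int) -> Optional[RAGError]:
    """Hypothetical mapping from an HTTP status code to an error instance."""
    if status_code == 422:
        return RAGValidationError(f"invalid payload (HTTP {status_code})")
    if status_code >= 400:
        return RAGServerError(f"server error (HTTP {status_code})")
    return None


def describe(status_code: int) -> str:
    """Handle the specific case first, then fall back to the base class."""
    try:
        err = error_for_status(status_code)
        if err is not None:
            raise err
        return "ok"
    except RAGValidationError:
        return "fix the request"
    except RAGError as exc:
        return f"retrieval failed: {exc}"
```

Catching the most specific class first and the base class last lets a caller treat a bad payload differently from other failures while still handling any error in the family.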

Functions

register_model

Register a model as a known model.

lookup_model

Look up a model by name, using only the table of known models.

determine_model

Determine the model to use based on a name, using only the table of known models.

get_inference_priority

Return the active inference priority, or None if unset.

convert_message_to_dict

Convert a LangChain message to a dictionary.

standardize_model_name

Standardize the model name to a format that can be used in the OpenAI API.

get_token_cost_for_model

Get the cost in USD for a given model and number of tokens.

get_usage_callback

Get the OpenAI callback handler in a context manager.

parse_thinking_content

Parse thinking content from text.
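
The inference_priority and get_inference_priority entries above describe a value that applies to every call within a scope and reads as None outside it. One common way to get that behavior in Python is a context variable. The sketch below shows the pattern under stated assumptions: priority_scope and current_priority are hypothetical names, the priority is assumed to be an integer, and nothing here reflects how the package actually stores or forwards the value.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, List, Optional

# Hypothetical storage for the active priority; defaults to None (unset).
_priority = contextvars.ContextVar("sketch_inference_priority", default=None)


@contextmanager
def priority_scope(value: int) -> Iterator[None]:
    """Set the priority for the enclosed block, then restore the old value."""
    token = _priority.set(value)
    try:
        yield
    finally:
        # reset() restores whatever was active before this scope was entered.
        _priority.reset(token)


def current_priority() -> Optional[int]:
    """Return the active priority, or None if no scope is active."""
    return _priority.get()


def demo() -> List[Optional[int]]:
    """Record the priority seen outside, inside, and in a nested scope."""
    seen = [current_priority()]
    with priority_scope(1):
        seen.append(current_priority())
        with priority_scope(5):
            seen.append(current_priority())
        seen.append(current_priority())
    seen.append(current_priority())
    return seen
```

Because the value lives in a context variable, nested scopes restore the outer value on exit, and concurrent asyncio tasks each see their own setting.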

Modules

langchain_nvidia_ai_endpoints

LangChain NVIDIA AI Foundation Model Playground Integration.

decorators

Inference-priority decorator for LangChain chat models.

reranking

retrievers

NVIDIARAGRetriever for NVIDIA RAG Blueprint /search endpoint.

chat_models_dynamo

ChatNVIDIA subclass with Dynamo KV cache optimization support.

embeddings

llm

callbacks

Callback Handler that prints to stdout.

chat_models

Chat Model Components Derived from ChatModel/NVIDIA.

data

Model profile data. All edits should be made in profile_augmentations.toml.