LangChain LLM that uses the Completions API with NVIDIA NIMs.
ChatNVIDIA subclass that injects nvext.agent_hints into requests.
Model information.
Client to NVIDIA embeddings models.
LangChain Document Compressor that uses the NVIDIA NeMo Retriever Reranking API.
Set inference priority for all LLM calls within scope.
Base exception for NVIDIA RAG retriever errors.
Raised when the RAG server endpoint is unreachable.
Raised when the RAG server returns an error response.
Raised when the request payload is invalid (e.g. 422 Unprocessable Entity).
LangChain retriever that queries the NVIDIA RAG Blueprint /v1/search endpoint.
Callback handler that tracks OpenAI info.
NVIDIA chat model.
Register a model as a known model.
Lookup a model by name, using only the table of known models.
Determine the model to use based on a name, using only the table of known models.
Return the active inference priority, or None if unset.
Standardize the model name to a format that can be used in the OpenAI API.
Get the cost in USD for a given model and number of tokens.
Get the OpenAI callback handler in a context manager.
Parse thinking content from text.
Convert a LangChain message to a dictionary.
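
The two inference-priority entries above (one sets a priority for all LLM calls within a scope, the other returns the active priority or None if unset) describe a scoped-setting pattern. The following is a minimal sketch of how such a pair could work, using only the standard library's contextvars module. The names inference_priority and get_inference_priority and the integer priority type are assumptions made for illustration; this list does not give the library's actual identifiers or value types.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical storage for the active priority; defaults to None when unset.
_priority = contextvars.ContextVar("inference_priority", default=None)


@contextmanager
def inference_priority(value):
    """Set the priority for everything executed inside the with-block.

    Hypothetical helper: the name and the integer value type are assumptions.
    """
    token = _priority.set(value)
    try:
        yield
    finally:
        # Restore whatever was active before this scope, including None.
        _priority.reset(token)


def get_inference_priority():
    """Return the active priority, or None if no scope is active."""
    return _priority.get()


if __name__ == "__main__":
    print(get_inference_priority())      # outside any scope
    with inference_priority(5):
        print(get_inference_priority())  # inside the outer scope
        with inference_priority(1):
            print(get_inference_priority())  # inner scope overrides
        print(get_inference_priority())  # outer value is restored
    print(get_inference_priority())      # unset again
```

A context variable, rather than a module-level global, keeps the setting isolated per thread and per asyncio task, so concurrent requests do not see each other's priority.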