Model information.
ChatNVIDIA subclass that injects nvext.agent_hints into requests.
LangChain Document Compressor that uses the NVIDIA NeMo Retriever Reranking API.
Base exception for NVIDIA RAG retriever errors.
Raised when the RAG server endpoint is unreachable.
Raised when the RAG server returns an error response.
Raised when the request payload is invalid (e.g. 422 Unprocessable Entity).
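The exception hierarchy summarized above could be sketched as follows. The class names and the constructor signature are illustrative assumptions, not the package's actual definitions; only the base/derived relationships are taken from the summaries.

```python
class NVIDIARAGRetrieverError(Exception):
    """Base exception for NVIDIA RAG retriever errors."""


class RAGServerUnreachableError(NVIDIARAGRetrieverError):
    """Raised when the RAG server endpoint is unreachable."""


class RAGServerResponseError(NVIDIARAGRetrieverError):
    """Raised when the RAG server returns an error response."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"RAG server returned {status_code}: {message}")
        self.status_code = status_code


class RAGInvalidPayloadError(RAGServerResponseError):
    """Raised when the request payload is invalid (e.g. 422 Unprocessable Entity)."""
```

Deriving the payload error from the response error lets callers catch either the specific 422 case or any server-side failure with a single `except` clause.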
LangChain retriever that queries the NVIDIA RAG Blueprint /v1/search endpoint.
Set the inference priority for all LLM calls within the scope.
Callback handler that tracks OpenAI token usage and cost.
NVIDIA chat model.
Client for NVIDIA embeddings models.
LangChain LLM that uses the Completions API with NVIDIA NIMs.
Register a model as a known model.
Look up a model by name, using only the table of known models.
Determine the model to use based on a name, using only the table of known models.
Return the active inference priority, or None if unset.
Convert a LangChain message to a dictionary.
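A sketch of the conversion, assuming the message exposes `.type` and `.content` as LangChain's `BaseMessage` does. The role mapping shown is the conventional LangChain-to-OpenAI one; extras such as tool-call fields are omitted.

```python
# Conventional mapping from LangChain message types to OpenAI roles.
_ROLE_MAP = {"human": "user", "ai": "assistant", "system": "system", "tool": "tool"}


def convert_message_to_dict(message) -> dict:
    """Convert a LangChain-style message to an OpenAI-style dict.

    Only the role and content are handled here; real implementations
    also carry tool calls, names, and other optional fields.
    """
    role = _ROLE_MAP.get(message.type, message.type)
    return {"role": role, "content": message.content}
```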
Standardize the model name to a format that can be used in the OpenAI API.
Get the cost in USD for a given model and number of tokens.
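The two helpers above (name standardization and cost lookup) might fit together as follows. The normalization rules and the per-token prices are placeholders for illustration, NOT real pricing or the package's actual logic.

```python
def standardize_model_name(model_name: str) -> str:
    """Normalize a model name for table lookup.

    Strips a provider prefix such as "openai/" and any ":..." suffix,
    and lowercases the result. These rules are illustrative.
    """
    name = model_name.split("/")[-1].lower()
    return name.split(":")[0]


# Placeholder USD-per-1k-token prices -- NOT real pricing.
_COST_PER_1K_TOKENS = {
    ("example-model", False): 0.001,  # prompt tokens
    ("example-model", True): 0.002,   # completion tokens
}


def get_token_cost(model_name: str, num_tokens: int,
                   is_completion: bool = False) -> float:
    """Get the cost in USD for a given model and number of tokens."""
    name = standardize_model_name(model_name)
    try:
        rate = _COST_PER_1K_TOKENS[(name, is_completion)]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name!r}")
    return rate * (num_tokens / 1000)
```

Distinguishing prompt from completion tokens matters because most providers price the two differently.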
Get the OpenAI callback handler in a context manager.
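The handler-plus-context-manager pair could look like this minimal stand-in: the attribute names mirror the common LangChain handler attributes (`total_tokens`, `total_cost`, ...), but the `on_llm_end` signature here is simplified for illustration and does not match the real callback protocol.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class OpenAICallbackHandler:
    """Accumulates token usage and cost across LLM calls in a scope."""
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_cost: float = 0.0

    def on_llm_end(self, prompt_tokens: int, completion_tokens: int,
                   cost: float = 0.0) -> None:
        # Simplified hook: real handlers parse usage out of the LLM result.
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.total_tokens += prompt_tokens + completion_tokens
        self.total_cost += cost


@contextmanager
def get_openai_callback():
    """Yield a fresh handler; callers read the totals after the block."""
    yield OpenAICallbackHandler()
```

Usage: `with get_openai_callback() as cb: ...` and then inspect `cb.total_tokens` and `cb.total_cost` once the block exits.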
Parse thinking content from text.