LangChain LLM that uses the Completions API with NVIDIA NIMs.
LangChain Document Compressor that uses the NVIDIA NeMo Retriever Reranking API.
Model information.
Base exception for NVIDIA RAG retriever errors.
Raised when the RAG server endpoint is unreachable.
Raised when the RAG server returns an error response.
Raised when the request payload is invalid (e.g. 422 Unprocessable Entity).
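The exception hierarchy described above can be sketched as follows. The class names and the `status_code` attribute are assumptions for illustration, not the package's actual identifiers:

```python
class NVIDIARAGRetrieverError(Exception):
    """Base exception for NVIDIA RAG retriever errors."""


class RAGServerUnreachableError(NVIDIARAGRetrieverError):
    """Raised when the RAG server endpoint is unreachable."""


class RAGServerResponseError(NVIDIARAGRetrieverError):
    """Raised when the RAG server returns an error response."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"RAG server returned {status_code}: {message}")
        self.status_code = status_code


class RAGInvalidPayloadError(NVIDIARAGRetrieverError):
    """Raised when the request payload is invalid (e.g. 422 Unprocessable Entity)."""
```

A common base class lets callers catch every retriever failure with one `except NVIDIARAGRetrieverError` while still distinguishing transport errors from server-side ones.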
LangChain retriever that queries the NVIDIA RAG Blueprint /v1/search endpoint.
ChatNVIDIA subclass that injects nvext.agent_hints into requests.
NVIDIA chat model.
Client to NVIDIA embeddings models.
Callback handler that tracks OpenAI token usage and cost information.
Set inference priority for all LLM calls within scope.
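A scoped setting like this is commonly built on `contextvars`. The sketch below shows one way it could work; the names `inference_priority` and `get_inference_priority` are hypothetical, not the module's real API:

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical context variable holding the current priority; None means unset.
_inference_priority: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "inference_priority", default=None
)


@contextmanager
def inference_priority(priority: str) -> Iterator[None]:
    """Set inference priority for all LLM calls within scope."""
    token = _inference_priority.set(priority)
    try:
        yield
    finally:
        # Restore the previous value even if the body raises.
        _inference_priority.reset(token)


def get_inference_priority() -> Optional[str]:
    """Return the active inference priority, or None if unset."""
    return _inference_priority.get()
```

Using `contextvars` rather than a module-level global keeps the setting correct under threads and asyncio tasks, since each context sees its own value.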
Register a model as a known model.
Lookup a model by name, using only the table of known models.
Determine the model to use based on a name, using only the table of known models.
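The register/lookup/determine trio above suggests a dict-backed table of known models. This is a minimal sketch under that assumption; the `Model` fields and function names are illustrative, not the library's actual definitions:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical model-information record; real field names may differ.
@dataclass(frozen=True)
class Model:
    """Model information."""
    id: str
    model_type: str = "chat"
    aliases: Tuple[str, ...] = ()


_KNOWN_MODELS: Dict[str, Model] = {}


def register_model(model: Model) -> None:
    """Register a model as a known model, indexed by id and aliases."""
    _KNOWN_MODELS[model.id] = model
    for alias in model.aliases:
        _KNOWN_MODELS[alias] = model


def lookup_model(name: str) -> Optional[Model]:
    """Lookup a model by name, using only the table of known models."""
    return _KNOWN_MODELS.get(name)


def determine_model(name: str) -> Optional[Model]:
    """Determine the model to use based on a name, using only the known-model table."""
    model = lookup_model(name)
    if model is None:
        # Also accept provider-prefixed names such as "provider/<id>".
        model = lookup_model(name.split("/")[-1])
    return model
```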
Parse thinking content from text.
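One way such a parser can work, assuming the reasoning content is wrapped in `<think>...</think>` tags as some reasoning models emit (the real parser's rules are not shown in the source):

```python
import re
from typing import Tuple

# Assumption: thinking content is delimited by <think>...</think> tags.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_thinking_content(text: str) -> Tuple[str, str]:
    """Split model output into (thinking, answer) parts.

    Returns an empty thinking string when no tags are present.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text
    thinking = match.group(1).strip()
    # Remove the matched block and return the remaining text as the answer.
    answer = _THINK_RE.sub("", text, count=1).strip()
    return thinking, answer
```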
Standardize the model name to a format that can be used in the OpenAI API.
Get the cost in USD for a given model and number of tokens.
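The two helpers above pair naturally: names are normalized first so dated snapshot names hit the same price-table entry. This sketch uses a placeholder price table; the dollar figures and mapping rules are illustrative only, not real pricing:

```python
# Placeholder per-1K-token prices (USD); NOT real OpenAI pricing.
_COST_PER_1K_TOKENS = {
    ("gpt-4o-mini", "input"): 0.00015,
    ("gpt-4o-mini", "output"): 0.0006,
}


def standardize_model_name(name: str) -> str:
    """Standardize the model name to a format usable as a price-table key."""
    name = name.lower()
    # Map dated snapshots like "gpt-4o-mini-2024-07-18" onto the base name.
    for known, _ in _COST_PER_1K_TOKENS:
        if name.startswith(known):
            return known
    return name


def get_token_cost(name: str, num_tokens: int, *, is_completion: bool = False) -> float:
    """Get the cost in USD for a given model and number of tokens."""
    key = (standardize_model_name(name), "output" if is_completion else "input")
    if key not in _COST_PER_1K_TOKENS:
        raise ValueError(f"Unknown model: {name!r}")
    return _COST_PER_1K_TOKENS[key] * (num_tokens / 1000)
```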
Get the OpenAI callback handler in a context manager.
Convert a LangChain message to a dictionary.
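The conversion typically maps LangChain message types onto OpenAI chat roles (`human` → `user`, `ai` → `assistant`). The sketch below uses a stand-in `Message` dataclass in place of LangChain's `BaseMessage` so it runs without the library installed; treat it as an approximation of the real helper:

```python
from dataclasses import dataclass

# Stand-in for langchain_core's BaseMessage, so the example is self-contained.
@dataclass
class Message:
    type: str      # "human", "ai", or "system"
    content: str


_ROLE_MAP = {"human": "user", "ai": "assistant", "system": "system"}


def convert_message_to_dict(message: Message) -> dict:
    """Convert a LangChain-style message to an OpenAI-format dictionary."""
    try:
        role = _ROLE_MAP[message.type]
    except KeyError:
        raise ValueError(f"Unsupported message type: {message.type!r}") from None
    return {"role": role, "content": message.content}
```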
Return the active inference priority, or None if unset.