Ask a question to get started
Enter to sendā¢Shift+Enter new line
ChatNVIDIA subclass with Dynamo KV cache optimization support.
NVIDIA chat model.
ChatNVIDIA subclass that injects nvext.agent_hints into requests for Dynamo KV cache routing optimization.
nvext.agent_hints
A unique prefix_id is auto-generated for every request.
prefix_id