Callback Handler that prints to std out.
For tracking model usage and costs, you can use the get_usage_callback context manager to
track token information, similar to get_openai_callback. Additionally, you can specify
custom price mappings as necessary (price_map argument) or provide a custom callback
handler for advanced use cases (callback argument).
This feature is currently not supported in streaming modes, but works fine
for non-streaming invoke/ainvoke queries.
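The full example below only exercises the default handler, so here is a separate minimal sketch of the callback argument for advanced use cases. The handler class name (UsageCallbackHandler) and its zero-argument constructor are assumptions; check the callbacks module in your installed version for the exact interface.

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_nvidia_ai_endpoints.callbacks import (
    UsageCallbackHandler,  # assumed name; verify against your installed version
    get_usage_callback,
)

price_map = {"mixtral_8x7b": 0.00060}
llm = ChatNVIDIA(model="mixtral_8x7b", temperature=0.1)
tracker = UsageCallbackHandler()  # one handler instance reused across scopes

with get_usage_callback(price_map=price_map, callback=tracker) as cb:
    llm.invoke("Tell me a joke")

with get_usage_callback(price_map=price_map, callback=tracker) as cb:
    llm.invoke("Tell me another joke")

print(tracker)  # statistics accumulated across both scopes

The walkthrough below uses the default handler and the price_map argument only.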
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langchain_nvidia_ai_endpoints.callbacks import get_usage_callback

## Assume a price map per 1K tokens for a particular deployment plan
price_map = {
    "mixtral_8x7b": 0.00060,
    "gemma_7b": 0.0002,
    "nvolveqa_40k": 0.000016,
}
llm_large = ChatNVIDIA(model="mixtral_8x7b", temperature=0.1)
llm_small = ChatNVIDIA(model="gemma_7b", temperature=0.1)
embedding = NVIDIAEmbeddings(model="nvolveqa_40k")
models = [llm_large, llm_small, embedding]
with get_usage_callback(price_map=price_map) as cb:
    ## Reset either at beginning or end. Statistics will run until cleared
    cb.reset()

    llm_large.invoke("Tell me a joke")
    print(cb, end="\n")

    # llm_large.invoke("Tell me a short joke")
    # print(cb, end="\n")

    # ## Tracking through streaming coming soon
    # [_ for _ in llm_small.stream("Tell me a joke")]
    # print(cb, end="\n[Should not change yet]\n")
    ## Tracking for embeddings supported
    embedding.embed_query("What a nice day :D")
    print(cb, end="\n")

    # ## Sanity check. Should still be tracked fine
    # llm_small.invoke("Tell me a long joke")
    # print(cb, end="\n")
## Out of scope. Will not be tracked
llm_small.invoke("Tell me a short joke")
print(cb, end="\n[Should not change ever]\n")
cb.model_usage

Standardize the model name to a format that can be used in the OpenAI API.
Get the cost in USD for a given model and number of tokens.
Get the OpenAI callback handler in a context manager.
Exposes token and cost information.
Callback Handler that tracks OpenAI info.
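For reference, a price map expressed per 1K tokens implies a cost calculation along the following lines. The helper below is a hypothetical illustration of that arithmetic, not the package's actual implementation.

def estimate_cost_usd(model: str, num_tokens: int, price_map: dict) -> float:
    """Hypothetical helper: price_map values are USD per 1K tokens."""
    price_per_1k = price_map.get(model, 0.0)  # models missing from the map cost nothing
    return (num_tokens / 1000) * price_per_1k

# e.g. 2,500 mixtral_8x7b tokens at $0.00060 per 1K tokens -> $0.0015
print(estimate_cost_usd("mixtral_8x7b", 2500, {"mixtral_8x7b": 0.00060}))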