Inference-priority decorator for LangChain chat models.
Provides a decorator / context manager that sets priority for every
:class:`~langchain_core.language_models.BaseChatModel` call in scope.
The mechanism is universal: any ``BaseChatModel`` subclass whose Pydantic
``model_fields`` include ``priority`` will automatically receive the value as
a keyword argument — no per-model integration required.
Lower number = higher priority (priority=1 is most urgent).
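The opt-in check above can be sketched as follows; ``FakeChatModel``, ``PlainChatModel``, and ``supports_priority`` are illustrative stand-ins, not names from the library:

```python
# Sketch of the universal opt-in: a chat model participates simply by
# declaring a ``priority`` field in its Pydantic schema, which the class
# exposes through the ``model_fields`` mapping (Pydantic v2 convention).
class FakeChatModel:
    # Stand-in for a Pydantic model class that declares ``priority``.
    model_fields = {"model": None, "priority": None}

class PlainChatModel:
    # No ``priority`` field declared, so no injection happens.
    model_fields = {"model": None}

def supports_priority(model_cls) -> bool:
    """True if the model's schema declares a ``priority`` field."""
    return "priority" in getattr(model_cls, "model_fields", {})
```

Models that do not declare the field are simply skipped, so the scope is safe to wrap around mixed model types.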
Example — decorator (deprioritize background work)::

    from langchain_nvidia_ai_endpoints import ChatNVIDIADynamo, inference_priority

    llm = ChatNVIDIADynamo(model="my-model", base_url="http://localhost:8099/v1")

    @inference_priority(priority=10)
    def background_research(query: str) -> str:
        return llm.invoke(query).content

Example — context manager::

    with inference_priority(priority=10):
        result = llm.invoke("background task")
Return the active inference priority, or None if unset.
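A minimal sketch of how the scoped value could be tracked, assuming a module-level ``contextvars.ContextVar`` backs both the getter and the ``inference_priority`` scope (the real implementation may differ; ``_priority_var`` is a hypothetical name):

```python
import contextvars
from contextlib import contextmanager

# Assumed backing store: a ContextVar keeps the value per-context, so
# concurrent threads and asyncio tasks each see their own active priority.
_priority_var: contextvars.ContextVar = contextvars.ContextVar(
    "inference_priority", default=None
)

def get_inference_priority():
    """Return the active inference priority, or None if unset."""
    return _priority_var.get()

@contextmanager
def inference_priority(priority: int):
    """Usable as a ``with`` block and, for sync functions, as a decorator."""
    token = _priority_var.set(priority)
    try:
        yield
    finally:
        _priority_var.reset(token)  # restore the enclosing scope's value
```

Because ``@contextmanager`` objects inherit ``ContextDecorator``, the same factory serves both usage styles shown in the examples.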
Set inference priority for all LLM calls within scope.
Lower number = higher priority (priority=1 is most urgent).
Works as both a decorator and a context manager::

    # decorator — deprioritize background work
    @inference_priority(priority=10)
    def background_research(query):
        return llm.invoke(query)

    # context manager
    with inference_priority(priority=1):
        result = llm.invoke(query)

    # async decorator
    @inference_priority(priority=10)
    async def background_async(query):
        return await llm.ainvoke(query)
Precedence (wins first → last):

1. ``inference_priority`` context
2. ``ChatNVIDIADynamo(priority=1)`` constructor value

Nesting: inner scopes fully replace outer scopes.
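The precedence and nesting rules can be illustrated with a small sketch; ``_priority_var`` and ``effective_priority`` are hypothetical names assuming a ``ContextVar`` backs the scope:

```python
import contextvars

# Hypothetical backing variable for the scoped priority.
_priority_var = contextvars.ContextVar("inference_priority", default=None)

def effective_priority(constructor_priority):
    """Scoped context value wins; the constructor value is only a fallback."""
    scoped = _priority_var.get()
    return scoped if scoped is not None else constructor_priority

# No active scope: the constructor value (e.g. priority=1) applies.
assert effective_priority(1) == 1

# Inside a scope the context value overrides the constructor value,
# and an inner scope fully replaces the outer one.
outer = _priority_var.set(10)
assert effective_priority(1) == 10  # context wins over constructor
inner = _priority_var.set(1)
assert effective_priority(1) == 1   # inner scope wins
_priority_var.reset(inner)
assert effective_priority(1) == 10  # outer scope restored
_priority_var.reset(outer)
```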