Set the inference priority for all LLM calls within the scope.

Lower number = higher priority (``priority=1`` is most urgent).
Works as both a decorator and a context manager::

    # decorator — deprioritize background work
    @inference_priority(priority=10)
    def background_research(query):
        return llm.invoke(query)

    # context manager
    with inference_priority(priority=1):
        result = llm.invoke(query)

    # async decorator
    @inference_priority(priority=10)
    async def background_async(query):
        return await llm.ainvoke(query)
Precedence (wins first → last):

1. ``inference_priority`` context
2. ``ChatNVIDIADynamo(priority=1)``

Nesting: inner scopes fully replace outer scopes.
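The replace-not-merge nesting rule can be sketched with a ``ContextVar``-backed scope. This is an illustrative implementation only, not the library's actual one; ``current_priority`` is a hypothetical helper introduced here to show what an LLM call would observe:

```python
import contextvars
from contextlib import contextmanager

# Hypothetical sketch of the scoping semantics: a ContextVar holds the
# active priority; entering a nested scope replaces the outer value
# outright, and exiting restores it.
_priority = contextvars.ContextVar("inference_priority", default=None)

@contextmanager
def inference_priority(*, priority: int):
    token = _priority.set(priority)  # inner scope fully replaces outer
    try:
        yield
    finally:
        _priority.reset(token)       # outer value restored on exit

def current_priority():
    """Hypothetical helper: the priority seen by calls in this scope."""
    return _priority.get()

with inference_priority(priority=10):
    outer = current_priority()       # 10
    with inference_priority(priority=1):
        inner = current_priority()   # 1, the inner scope wins outright
    restored = current_priority()    # 10 again after the inner scope exits
```

A ``ContextVar`` (rather than a plain global) keeps the scope correct under threads and ``asyncio`` tasks, which matters for the async-decorator form above.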
inference_priority(
    self,
    *,
    priority: int,
)

| Name | Type |
|---|---|
| priority | int |