# inference_priority

> **Class** in `langchain_nvidia_ai_endpoints`

📖 [View in docs](https://reference.langchain.com/python/langchain-nvidia-ai-endpoints/decorators/inference_priority)

Set the inference priority for all LLM calls within its scope.

Lower number = higher priority (`priority=1` is most urgent).

Works as **both** a decorator and a context manager:

```python
# decorator — deprioritize background work
@inference_priority(priority=10)
def background_research(query):
    return llm.invoke(query)

# context manager
with inference_priority(priority=1):
    result = llm.invoke(query)

# async decorator
@inference_priority(priority=10)
async def background_async(query):
    return await llm.ainvoke(query)
```

**Precedence** (first match wins):

1. Active `inference_priority` context
2. Instance default: `ChatNVIDIADynamo(priority=1)`
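
For illustration, a minimal sketch of this rule; the import path and model name are assumptions for the example, not taken from this reference:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIADynamo, inference_priority

# Instance default: calls on this instance run at priority 5
# whenever no inference_priority context is active.
llm = ChatNVIDIADynamo(model="meta/llama-3.1-8b-instruct", priority=5)

llm.invoke("summarize yesterday's logs")  # instance default: priority 5

with inference_priority(priority=1):
    llm.invoke("answer the live user")  # active context wins: priority 1
```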

Nesting: inner scopes fully replace outer scopes.
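
A sketch of the nesting behavior (`llm` and the queries are placeholders):

```python
with inference_priority(priority=10):
    llm.invoke(background_query)  # priority 10

    with inference_priority(priority=1):
        llm.invoke(urgent_query)  # inner scope replaces outer: priority 1

    llm.invoke(background_query)  # outer scope restored: priority 10
```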

## Signature

```python
inference_priority(
    *,
    priority: int,
)
```

## Constructors

```python
__init__(
    self,
    *,
    priority: int,
) -> None
```

| Name | Type | Description |
|------|------|-------------|
| `priority` | `int` | Priority level; lower values are more urgent (`1` is highest). |


---

[View source on GitHub](https://github.com/langchain-ai/langchain-nvidia/blob/5bfb68d5b10aa0330a6b79a36375b9bc0c6acef7/libs/ai-endpoints/langchain_nvidia_ai_endpoints/decorators.py#L142)