# Xinference

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/llms/xinference/Xinference)

`Xinference` is an LLM wrapper for the Xorbits Inference (Xinference) large-scale model inference service.

To use, you should have the xinference library installed:

```bash
pip install "xinference[all]"
```

If you are only consuming services provided by a running Xinference server, you can install the lighter xinference_client package instead:

```bash
pip install xinference_client
```
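
Once a server is running, you can connect with the client and inspect the models it hosts. A minimal sketch, assuming an endpoint at `http://127.0.0.1:9997`:

```python
from xinference_client import RESTfulClient

# Connect to a running Xinference endpoint (the address is an assumption).
client = RESTfulClient("http://127.0.0.1:9997")

# List the models currently launched on the server.
print(client.list_models())
```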

Check out: https://github.com/xorbitsai/inference

To run Xinference in a distributed cluster, start a Xinference supervisor on one server and Xinference workers on the other servers.

## Signature

```python
Xinference(
    self,
    server_url: Optional[str] = None,
    model_uid: Optional[str] = None,
    api_key: Optional[str] = None,
    **model_kwargs: Any,
)
```

## Description

**Example:**

To start a local instance of Xinference, run:

```bash
xinference
```

You can also deploy Xinference in a distributed cluster. Here are the steps:

Starting the supervisor:

```bash
xinference-supervisor
```

Starting the worker:

```bash
xinference-worker
```

Then, launch a model using the command-line interface (CLI). For example:

```bash
xinference launch -n orca -s 3 -q q4_0
```

It will return a model UID. Then, you can use Xinference with LangChain.
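
Alternatively, you can launch a model programmatically with the client. A hedged sketch (parameter names follow the xinference client API and may vary across versions):

```python
from xinference_client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")  # assumed endpoint

# Launch the same model as the CLI example; the call returns a model UID.
model_uid = client.launch_model(
    model_name="orca",
    model_size_in_billions=3,
    quantization="q4_0",
)
print(model_uid)
```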

Example:

```python
from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid="<model_uid>",  # replace with the model UID returned when launching the model
)

llm.invoke(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
```
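
Since `Xinference` extends the standard `LLM` interface, you can also stream tokens with LangChain's `.stream()` method. A minimal sketch, assuming the same server and model UID as above:

```python
# Iterate over chunks as they arrive instead of waiting for the full reply.
for chunk in llm.stream(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
):
    print(chunk, end="", flush=True)
```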

Example:

```python
from langchain_community.llms import Xinference
from langchain_core.prompts import PromptTemplate

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid="<model_uid>",  # replace with the model UID returned when launching the model
    stream=True,
)
prompt = PromptTemplate(
    input_variables=["country"],
    template="Q: where can we visit in the capital of {country}? A:",
)
chain = prompt | llm
for chunk in chain.stream({"country": "France"}):
    print(chunk, end="", flush=True)
```

To view all the supported built-in models, run:

```bash
xinference list --all
```

## Extends

- `LLM`

## Constructors

```python
__init__(
    self,
    server_url: Optional[str] = None,
    model_uid: Optional[str] = None,
    api_key: Optional[str] = None,
    **model_kwargs: Any,
)
```

| Name | Type |
|------|------|
| `server_url` | `Optional[str]` |
| `model_uid` | `Optional[str]` |
| `api_key` | `Optional[str]` |
| `**model_kwargs` | `Any` |
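
Any extra keyword arguments are collected into `model_kwargs` and merged into the `generate_config` sent with each request, so generation defaults can be set once at construction time. A hedged sketch (assuming a running server and a valid model UID):

```python
from langchain_community.llms import Xinference

# Default generation settings passed as **model_kwargs; the wrapper merges
# them into generate_config on every call. The server URL and model UID
# below are placeholders.
llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid="<model_uid>",
    temperature=0.7,
    max_tokens=512,
)

llm.invoke("Q: where can we visit in the capital of France? A:")
```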


## Properties

- `client`
- `server_url`
- `model_uid`
- `model_kwargs`

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/llms/xinference.py#L31)