Xinference large-scale model inference service.
To use, you should have the xinference library installed:
.. code-block:: bash
pip install "xinference[all]"
If you only need to connect to an existing Xinference service as a client, the lighter xinference_client package is sufficient:
.. code-block:: bash
pip install xinference_client
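With a service already running, the client can also launch a model and return its UID programmatically. The following is a minimal sketch assuming the ``RESTfulClient`` class shipped with xinference_client; exact launch parameters may vary between Xinference versions:
.. code-block:: python
from xinference_client import RESTfulClient

# Connect to a running Xinference service
client = RESTfulClient("http://0.0.0.0:9997")

# Launch a built-in model; the returned value is the model UID
model_uid = client.launch_model(
    model_name="orca",
    model_size_in_billions=3,
    quantization="q4_0",
)
print(model_uid)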
Check out: https://github.com/xorbitsai/inference. For a distributed deployment, you need to start the Xinference supervisor on one server and Xinference workers on the other servers.
Example:
To start a local instance of Xinference, run
.. code-block:: bash
$ xinference
You can also deploy Xinference in a distributed cluster. Here are the steps:
Starting the supervisor:
.. code-block:: bash
$ xinference-supervisor
Starting the worker:
.. code-block:: bash
$ xinference-worker
Then, launch a model using the command-line interface (CLI).
Example:
.. code-block:: bash
$ xinference launch -n orca -s 3 -q q4_0
It will return a model UID. Then, you can use Xinference with LangChain.
Example:
.. code-block:: python
from langchain_community.llms import Xinference
llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the model UID returned when launching the model
)
llm.invoke(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)
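Note that ``invoke`` returns the completion as a single string. To consume output incrementally, you can iterate over the standard ``stream`` method instead; a minimal sketch (the server URL and model UID are placeholders, and the output may arrive in one or more chunks depending on the Xinference version):
.. code-block:: python
from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the model UID returned when launching the model
)

# Iterate over the streamed output and print each chunk as it arrives
for chunk in llm.stream("Q: where can we visit in the capital of France? A:"):
    print(chunk, end="", flush=True)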
Example:
.. code-block:: python
from langchain_community.llms import Xinference
from langchain_classic.prompts import PromptTemplate
llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the model UID returned when launching the model
    stream=True,
)
prompt = PromptTemplate(
    input_variables=["country"],
    template="Q: where can we visit in the capital of {country}? A:",
)
chain = prompt | llm
# stream() returns a generator; iterate to consume the output chunks
for chunk in chain.stream({"country": "France"}):
    print(chunk, end="", flush=True)
To view all the supported built-in models, run:
.. code-block:: bash
$ xinference list --all