Chat model for OCI Data Science model deployment endpoint.
Base class for LLM deployed on OCI Data Science Model Deployment.
OCI Data Science Model Deployment chat model integration.
Prerequisite
The OCI Model Deployment plugins are installable only on
Python 3.9 and above. If you're working inside a notebook session,
install a Python 3.10 based conda pack and run the
following setup.
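A quick sanity check for the interpreter floor (a minimal sketch; the 3.9 minimum comes from the prerequisite above):

```python
import sys

# Fail fast if the interpreter is older than the 3.9 minimum noted above.
assert sys.version_info >= (3, 9), "Requires Python 3.9 or above"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK")
```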
Setup:
Install ``oracle-ads`` and ``langchain-openai``.
.. code-block:: bash
pip install -U oracle-ads langchain-openai
Use `ads.set_auth()` to configure authentication.
For example, to use OCI resource_principal for authentication:
.. code-block:: python
import ads
ads.set_auth("resource_principal")
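Other ADS auth modes follow the same pattern. For example, API-key based authentication from a local OCI config file (a sketch; the config path and profile name shown are the common defaults, adjust as needed):

```python
import ads

# API-key authentication from a local OCI config file
# (path and profile shown are illustrative defaults).
ads.set_auth(
    auth="api_key",
    oci_config_location="~/.oci/config",
    profile="DEFAULT",
)
```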
For more details on authentication, see:
https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html
Make sure to have the required policies to access the OCI Data
Science Model Deployment endpoint. See:
https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm
Key init args - completion params:
endpoint: str
The OCI model deployment endpoint.
temperature: float
Sampling temperature.
max_tokens: Optional[int]
Max number of tokens to generate.
Key init args - client params:
auth: dict
ADS auth dictionary for OCI authentication.
default_headers: Optional[Dict]
The headers to be added to the Model Deployment request.
Instantiate:
.. code-block:: python
from langchain_community.chat_models import ChatOCIModelDeployment
chat = ChatOCIModelDeployment(
endpoint="https://modeldeployment.<region>.oci.customer-oci.com/<ocid>/predict",
model="odsc-llm", # this is the default model name if deployed with AQUA
streaming=True,
max_retries=3,
model_kwargs={
"max_tokens": 512,
"temperature": 0.2,
# other model parameters ...
},
default_headers={
"route": "/v1/chat/completions",
# other request headers ...
},
)
Invocation:
.. code-block:: python
messages = [
("system", "Translate the user sentence to French."),
("human", "Hello World!"),
]
chat.invoke(messages)
.. code-block:: python
AIMessage(
content='Bonjour le monde!',
response_metadata={
'token_usage': {
'prompt_tokens': 40,
'total_tokens': 50,
'completion_tokens': 10
},
'model_name': 'odsc-llm',
'system_fingerprint': '',
'finish_reason': 'stop'
},
id='run-cbed62da-e1b3-4abd-9df3-ec89d69ca012-0'
)
Streaming:
.. code-block:: python
for chunk in chat.stream(messages):
print(chunk)
.. code-block:: python
content='' id='run-02c6-c43f-42de'
content='\n' id='run-02c6-c43f-42de'
content='B' id='run-02c6-c43f-42de'
content='on' id='run-02c6-c43f-42de'
content='j' id='run-02c6-c43f-42de'
content='our' id='run-02c6-c43f-42de'
content=' le' id='run-02c6-c43f-42de'
content=' monde' id='run-02c6-c43f-42de'
content='!' id='run-02c6-c43f-42de'
content='' response_metadata={'finish_reason': 'stop'} id='run-02c6-c43f-42de'
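The streamed pieces can be concatenated in arrival order to rebuild the full reply. A minimal sketch, using plain strings standing in for each chunk's ``content`` (LangChain message chunks also support ``+`` for the same purpose):

```python
# Stand-in contents mirroring the streamed chunks shown above.
chunk_contents = ["", "\n", "B", "on", "j", "our", " le", " monde", "!"]

# Concatenate in arrival order to recover the complete message text.
reply = "".join(chunk_contents).strip()
print(reply)  # Bonjour le monde!
```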
Async:
.. code-block:: python
await chat.ainvoke(messages)
# stream:
# async for chunk in chat.astream(messages):
#     print(chunk)
.. code-block:: python
AIMessage(
content='Bonjour le monde!',
response_metadata={'finish_reason': 'stop'},
id='run-8657a105-96b7-4bb6-b98e-b69ca420e5d1-0'
)
Structured output:
.. code-block:: python
from typing import Optional
from pydantic import BaseModel, Field
class Joke(BaseModel):
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
structured_llm = chat.with_structured_output(Joke, method="json_mode")
structured_llm.invoke(
"Tell me a joke about cats, "
"respond in JSON with `setup` and `punchline` keys"
)
.. code-block:: python
Joke(
setup='Why did the cat get stuck in the tree?',
punchline='Because it was chasing its tail!'
)
See ``ChatOCIModelDeployment.with_structured_output()`` for more.
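With ``method="json_mode"``, the model is prompted to reply with raw JSON, which is then parsed into the target schema. A sketch of that parsing step in isolation (the raw reply below is a hypothetical stand-in for the deployed model's actual output):

```python
import json

# Hypothetical raw JSON reply, standing in for the deployed model's output.
raw_reply = (
    '{"setup": "Why did the cat get stuck in the tree?", '
    '"punchline": "Because it was chasing its tail!"}'
)

# json_mode parses the reply before validating it against the schema.
data = json.loads(raw_reply)
print(data["setup"])
```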
Customized Usage:
You can inherit from base class and overwrite the `_process_response`,
`_process_stream_response`, `_construct_json_body` for customized usage.
.. code-block:: python
class MyChatModel(ChatOCIModelDeployment):
def _process_stream_response(self, response_json: dict) -> ChatGenerationChunk:
print("My customized streaming result handler.")
return ChatGenerationChunk(...)
def _process_response(self, response_json: dict) -> ChatResult:
print("My customized output handler.")
return ChatResult(...)
def _construct_json_body(self, messages: list, params: dict) -> dict:
print("My customized payload handler.")
return {
"messages": messages,
**params,
}
chat = MyChatModel(
endpoint="https://modeldeployment.<region>.oci.customer-oci.com/<ocid>/predict",
model="odsc-llm",
)
chat.invoke("tell me a joke")
Response metadata:
.. code-block:: python
ai_msg = chat.invoke(messages)
ai_msg.response_metadata
.. code-block:: python
{
'token_usage': {
'prompt_tokens': 40,
'total_tokens': 50,
'completion_tokens': 10
},
'model_name': 'odsc-llm',
'system_fingerprint': '',
'finish_reason': 'stop'
}
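Since ``response_metadata`` is a plain dict, the usage numbers can be pulled out directly. A small helper sketch (the function name is illustrative, not part of the integration):

```python
def summarize_usage(response_metadata: dict) -> str:
    """Format the token accounting from a response_metadata dict."""
    usage = response_metadata.get("token_usage", {})
    return (
        f"{response_metadata.get('model_name', 'unknown')}: "
        f"{usage.get('prompt_tokens', 0)} prompt + "
        f"{usage.get('completion_tokens', 0)} completion = "
        f"{usage.get('total_tokens', 0)} total tokens"
    )

metadata = {
    "token_usage": {"prompt_tokens": 40, "total_tokens": 50, "completion_tokens": 10},
    "model_name": "odsc-llm",
    "system_fingerprint": "",
    "finish_reason": "stop",
}
print(summarize_usage(metadata))  # odsc-llm: 40 prompt + 10 completion = 50 total tokens
```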
OCI large language chat models deployed with vLLM.
To use, you must provide the model HTTP endpoint from your deployed
model, e.g. https://modeldeployment.us-ashburn-1.oci.customer-oci.com/
For authentication, oracle-ads is used to automatically load
credentials: https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html
Make sure to have the required policies to access the OCI Data Science Model Deployment endpoint. See: https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm#model_dep_policies_auth__predict-endpoint
Example:
.. code-block:: python
from langchain_community.chat_models import ChatOCIModelDeploymentVLLM
chat = ChatOCIModelDeploymentVLLM(
endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<ocid>/predict",
frequency_penalty=0.1,
max_tokens=512,
temperature=0.2,
top_p=1.0,
# other model parameters...
)
OCI large language chat models deployed with Text Generation Inference.
To use, you must provide the model HTTP endpoint from your deployed
model, e.g. https://modeldeployment.us-ashburn-1.oci.customer-oci.com/
For authentication, oracle-ads is used to automatically load
credentials: https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html
Make sure to have the required policies to access the OCI Data Science Model Deployment endpoint. See: https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm#model_dep_policies_auth__predict-endpoint
Example:
.. code-block:: python
from langchain_community.chat_models import ChatOCIModelDeploymentTGI
chat = ChatOCIModelDeploymentTGI(
endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<ocid>/predict",
max_tokens=512,
temperature=0.2,
frequency_penalty=0.1,
seed=42,
# other model parameters...
)