OCIModelDeploymentLLM()

LLM deployed on OCI Data Science Model Deployment.
To use, you must provide the model HTTP endpoint from your deployed
model, e.g. https://modeldeployment.
For authentication, `oracle-ads` is used to automatically load credentials: https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html

Make sure you have the required policies to access the OCI Data Science Model Deployment endpoint. See: https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-policies-auth.htm#model_dep_policies_auth__predict-endpoint
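For example, when running inside an OCI Data Science notebook session, resource principal authentication can be enabled with `ads.set_auth` before constructing the LLM (a minimal sketch; choose the auth method that matches your environment):

.. code-block:: python

    import ads

    # Load credentials via resource principal (e.g. inside an OCI Data Science
    # notebook session); other options include "api_key" and "security_token".
    ads.set_auth(auth="resource_principal")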
Example:

.. code-block:: python

    from langchain_community.llms import OCIModelDeploymentLLM

    llm = OCIModelDeploymentLLM(
        endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<ocid>/predict",
        model="odsc-llm",
        streaming=True,
        model_kwargs={"frequency_penalty": 1.0},
        headers={
            "route": "/v1/completions",
            # other request headers ...
        },
    )
    llm.invoke("tell me a joke.")
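With `streaming=True`, tokens can also be consumed incrementally through the standard LangChain streaming interface (a minimal sketch, reusing the `llm` instance from the example above):

.. code-block:: python

    # Stream the completion chunk by chunk instead of waiting for the full response.
    for chunk in llm.stream("tell me a joke."):
        print(chunk, end="", flush=True)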
Customized Usage:

Users can inherit from the base class and override `_process_response`, `_process_stream_response`, and `_construct_json_body` to satisfy customized needs.
.. code-block:: python

    from typing import List

    from langchain_community.llms import OCIModelDeploymentLLM
    from langchain_core.outputs import Generation, GenerationChunk

    class MyCustomizedModel(OCIModelDeploymentLLM):
        def _process_stream_response(self, response_json: dict) -> GenerationChunk:
            print("My customized output stream handler.")
            return GenerationChunk(text="")

        def _process_response(self, response_json: dict) -> List[Generation]:
            print("My customized output handler.")
            return [Generation(text="")]

        def _construct_json_body(self, prompt: str, param: dict) -> dict:
            print("My customized input handler.")
            return {}

    llm = MyCustomizedModel(
        endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<ocid>/predict",
        model="<model_name>",
    )

    llm.invoke("tell me a joke.")
Attributes:

    model
        The name of the model.
    max_tokens
        Denotes the number of tokens to predict per generation.
    temperature
        A non-negative float that tunes the degree of randomness in generation.
    k
        Number of most likely tokens to consider at each step.
    p
        Total probability mass of tokens to consider at each step.
    best_of
        Generates `best_of` completions server-side and returns the "best"
        (the one with the highest log probability per token).
    stop
        Stop words to use when generating. Model output is cut off at the first
        occurrence of any of these substrings.
    model_kwargs
        Keyword arguments to pass to the model.
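For reference, a minimal sketch that sets these generation parameters at construction time (the attribute names follow the list above and should be checked against your installed version; the values are illustrative):

.. code-block:: python

    from langchain_community.llms import OCIModelDeploymentLLM

    llm = OCIModelDeploymentLLM(
        endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/<ocid>/predict",
        model="odsc-llm",
        max_tokens=256,   # number of tokens to predict per generation
        temperature=0.2,  # degree of randomness
        k=50,             # top-k candidates considered at each step
        p=0.9,            # nucleus (top-p) probability mass
        stop=["\n\n"],    # cut output at the first stop sequence
    )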