Databricks serving endpoint or a cluster driver proxy app for LLM.
It supports two endpoint types:
Serving endpoint (recommended for both production and development).
We assume that an LLM was deployed to a serving endpoint.
To wrap it as an LLM you must have "Can Query" permission to the endpoint.
Set endpoint_name accordingly and do not set cluster_id or
cluster_driver_port.
If the underlying model is a model registered by MLflow, the expected model signature is:
inputs::

    [{"name": "prompt", "type": "string"},
     {"name": "stop", "type": "list[string]"}]

outputs: [{"type": "string"}]
If the underlying model is an external or foundation model, the response from the
endpoint is automatically transformed to the expected format unless
transform_output_fn is provided.
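For example, a minimal wrapper for a serving endpoint; the endpoint name
"dolly" is a placeholder for your own deployment::

    from langchain_community.llms import Databricks

    # host and api_token fall back to the DATABRICKS_HOST / DATABRICKS_TOKEN
    # environment variables, or are auto-detected inside a Databricks notebook.
    llm = Databricks(endpoint_name="dolly")
    llm.invoke("How are you?")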
Cluster driver proxy app (recommended for interactive development).
One can load an LLM on a Databricks interactive cluster and start a local HTTP
server on the driver node to serve the model at "/" using the HTTP POST method
with JSON input/output.
Use a port number in the range [3000, 8000] and have the server listen on the
driver IP address, or simply 0.0.0.0, rather than localhost only.
To wrap it as an LLM you must have "Can Attach To" permission to the cluster.
Set cluster_id and cluster_driver_port and do not set endpoint_name.
The expected server schema (using JSON schema) is:
inputs::

    {"type": "object",
     "properties": {
        "prompt": {"type": "string"},
        "stop": {"type": "array", "items": {"type": "string"}}},
     "required": ["prompt"]}

outputs: {"type": "string"}
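A minimal sketch of such a proxy app, assuming Flask as the HTTP framework and
a placeholder generate_text function standing in for the real model::

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def generate_text(prompt, stop):
        # Placeholder: call your actual model here.
        return "generated text"

    @app.route("/", methods=["POST"])
    def serve():
        data = request.get_json()
        # Matches the input schema above: "prompt" required, "stop" optional.
        text = generate_text(data["prompt"], data.get("stop") or [])
        return jsonify(text)  # response body is a JSON string

    # Listen on 0.0.0.0 with a port in [3000, 8000] so the wrapper can connect.
    app.run(host="0.0.0.0", port=7777)

With the server running, wrapping it only requires the port when the code runs
in a notebook attached to the same cluster (cluster_id is then inferred)::

    llm = Databricks(cluster_driver_port="7777")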
If the endpoint model signature is different or you want to set extra params,
you can use transform_input_fn and transform_output_fn to apply necessary
transformations before and after the query.
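For example, a hypothetical transform_input_fn/transform_output_fn pair that
applies a prompt template and trims the response; the template text is
illustrative::

    def transform_input(**request):
        # Wrap the raw prompt in an instruction-style template.
        request["prompt"] = (
            "Below is an instruction. Write a response that completes it.\n\n"
            f"Instruction: {request['prompt']}\n\nResponse:\n"
        )
        return request

    def transform_output(response):
        # Trim surrounding whitespace from whatever the endpoint returns.
        return response.strip()

    llm = Databricks(
        cluster_driver_port="7777",
        transform_input_fn=transform_input,
        transform_output_fn=transform_output,
    )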
host: Databricks workspace hostname. If not provided, the default value is
determined by the DATABRICKS_HOST environment variable if present, or the
hostname of the current Databricks workspace when running inside a Databricks
notebook attached to an interactive cluster in "single user" or "no isolation
shared" mode.

api_token: Databricks personal access token. If not provided, the default value
is determined by the DATABRICKS_TOKEN environment variable if present, or an
automatically generated temporary token when running inside a Databricks
notebook attached to an interactive cluster in "single user" or "no isolation
shared" mode.

endpoint_name: Name of the model serving endpoint.
You must specify the endpoint name to connect to a model serving endpoint.
You must not set both endpoint_name and cluster_id.
cluster_id: ID of the cluster if connecting to a cluster driver proxy app.
If neither endpoint_name nor cluster_id is provided and the code runs inside
a Databricks notebook attached to an interactive cluster in "single user" or
"no isolation shared" mode, the current cluster ID is used as the default.
You must not set both endpoint_name and cluster_id.
cluster_driver_port: The port number used by the HTTP server running on the
cluster driver node. The server must listen on the driver IP address, or simply
0.0.0.0, to accept connections. We recommend using a port number in the range
[3000, 8000].
model_kwargs: Deprecated. Please use extra_params instead. Extra parameters to
pass to the endpoint.

transform_input_fn: A function that transforms {prompt, stop, **kwargs} into a
JSON-compatible request object that the endpoint accepts.
For example, you can apply a prompt template to the input prompt.

transform_output_fn: A function that transforms the output from the endpoint
into the generated text.

databricks_uri: The Databricks URI. Only used when using a serving endpoint.
temperature: The sampling temperature.

n: The number of completion choices to generate.

stop: The stop sequences.

max_tokens: The maximum number of tokens to generate.

extra_params: Any extra parameters to pass to the endpoint.

task: The task of the endpoint. Only used when using a serving endpoint. If not
provided, the task is automatically inferred from the endpoint.

allow_dangerous_deserialization: Whether to allow dangerous deserialization of
the data, which involves loading data using pickle. If the data has been
modified by a malicious actor, it can deliver a malicious payload that results
in execution of arbitrary code on the target machine.
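A usage sketch combining the common parameters; the endpoint name and parameter
values below are placeholders, not recommendations::

    from langchain_community.llms import Databricks

    llm = Databricks(
        endpoint_name="databricks-mixtral-8x7b-instruct",  # placeholder name
        temperature=0.1,
        max_tokens=256,
        stop=["\n\n"],
        extra_params={"top_p": 0.95},  # passed through to the endpoint
    )
    print(llm.invoke("Write a haiku about data lakes."))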