Wraps a Databricks serving endpoint or a cluster driver proxy app as an LLM.
It supports two endpoint types:

- Serving endpoint (recommended for both production and development).
  We assume that an LLM has been deployed to a serving endpoint.
  To wrap it as an LLM you must have "Can Query" permission to the endpoint.
  Set endpoint_name accordingly and do not set cluster_id or
  cluster_driver_port.

  If the underlying model is a model registered by MLflow, the expected model
  signature is:

  - inputs::

      [{"name": "prompt", "type": "string"},
       {"name": "stop", "type": "list[string]"}]

  - outputs: [{"type": "string"}]

  If the underlying model is an external or foundation model, the response from
  the endpoint is automatically transformed to the expected format unless
  transform_output_fn is provided.
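
  A minimal sketch of wrapping such an endpoint, assuming the class is
  importable from langchain_community.llms and that credentials are available
  from the Databricks notebook environment or the DATABRICKS_HOST and
  DATABRICKS_TOKEN environment variables (the endpoint name "dolly" is a
  placeholder)::

      from langchain_community.llms import Databricks

      llm = Databricks(endpoint_name="dolly")  # placeholder endpoint name
      print(llm.invoke("How are you?"))
      print(llm.invoke("How are you?", stop=["."]))
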
- Cluster driver proxy app (recommended for interactive development).
  You can load an LLM on a Databricks interactive cluster and start a local HTTP
  server on the driver node that serves the model at / via HTTP POST with JSON
  input/output (a minimal example follows the schema below).
  Use a port number in the range [3000, 8000], and let the server listen on the
  driver IP address, or simply 0.0.0.0, rather than localhost only.
  To wrap it as an LLM you must have "Can Attach To" permission to the cluster.
  Set cluster_id and cluster_driver_port and do not set endpoint_name.
  The expected server schema (in JSON Schema) is:

  - inputs::

      {"type": "object",
       "properties": {
           "prompt": {"type": "string"},
           "stop": {"type": "array", "items": {"type": "string"}}},
       "required": ["prompt"]}

  - outputs: {"type": "string"}
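
  A minimal sketch of such a proxy app, using Flask here as one possible HTTP
  framework, with a stub generate() function standing in for the actual model
  call::

      from flask import Flask, jsonify, request

      app = Flask(__name__)

      def generate(prompt, stop=None):
          # Stub: replace with your model's inference call.
          return f"echo: {prompt}"

      @app.route("/", methods=["POST"])
      def serve():
          payload = request.get_json()
          prompt = payload["prompt"]   # required by the input schema above
          stop = payload.get("stop")   # optional list of stop sequences
          # The output schema is a bare JSON string.
          return jsonify(generate(prompt, stop=stop))

      # Bind to 0.0.0.0 on a port in [3000, 8000] so the wrapper can reach it.
      app.run(host="0.0.0.0", port=7777)

  With the server running, wrap it by setting cluster_id to the cluster's ID
  and cluster_driver_port=7777.
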
If the endpoint model signature is different or you want to set extra params,
you can use transform_input_fn and transform_output_fn to apply necessary
transformations before and after the query.
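
For instance, a sketch that injects an extra temperature parameter and unpacks a
hypothetical response shape (the parameter and field names below are
illustrative, not a real endpoint's schema)::

    def transform_input(**request):
        # Receives {prompt, stop, **kwargs}; must return the JSON-compatible
        # request object the endpoint actually accepts.
        request["temperature"] = 0.1
        request["prompt"] = f"Be concise.\n{request['prompt']}"
        return request

    def transform_output(response):
        # Reduce the endpoint's raw response to the generated text.
        return response["candidates"][0]["text"]  # hypothetical shape

    llm = Databricks(
        endpoint_name="my-endpoint",  # placeholder endpoint name
        transform_input_fn=transform_input,
        transform_output_fn=transform_output,
    )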