provider: Name of the provider to use for inference with the model specified in
    repo_id, e.g. "cerebras". If not specified, defaults to "auto", i.e. the first
    of the providers available for the model, sorted by the user's order at
    https://hf.co/settings/inference-providers. The available providers are listed
    in the huggingface_hub documentation.
typical_p: Typical decoding mass. See [Typical Decoding for Natural Language Generation](https://arxiv.org/abs/2202.00666) for more information.
repetition_penalty: The parameter for repetition penalty. 1.0 means no penalty. See the CTRL paper (https://arxiv.org/abs/1909.05858) for more details.
watermark: Watermarking with [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226).
Hugging Face Endpoint. Works with any model that supports the text-generation
(i.e. text-completion) task.

To use this class, you should have the huggingface_hub package installed and the
HUGGINGFACEHUB_API_TOKEN environment variable set with your API token, or pass
the token as a named parameter to the constructor.
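For example, the token can be set from Python before the model is constructed; a minimal sketch, where "hf_..." is a placeholder for your actual API token:

    import os

    # Set the token in the environment instead of passing it to the
    # constructor. "hf_..." is a placeholder, not a real token.
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."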
Example:

    # Basic example (no streaming)
    from langchain_huggingface import HuggingFaceEndpoint

    model = HuggingFaceEndpoint(
        endpoint_url="http://localhost:8010/",
        max_new_tokens=512,
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        temperature=0.01,
        repetition_penalty=1.03,
        huggingfacehub_api_token="my-api-key",
    )
    print(model.invoke("What is Deep Learning?"))
    # Streaming response example
    from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    callbacks = [StreamingStdOutCallbackHandler()]
    model = HuggingFaceEndpoint(
        endpoint_url="http://localhost:8010/",
        max_new_tokens=512,
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        temperature=0.01,
        repetition_penalty=1.03,
        callbacks=callbacks,
        streaming=True,
        huggingfacehub_api_token="my-api-key",
    )
    print(model.invoke("What is Deep Learning?"))
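As an alternative to callback-based streaming, LangChain runnables also expose a stream method; a minimal sketch, assuming the same model configured above:

    # Iterate over text chunks as they are generated,
    # without a callback handler.
    for chunk in model.stream("What is Deep Learning?"):
        print(chunk, end="", flush=True)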
    # Basic example (no streaming) with the Mistral-Nemo-Base-2407 model,
    # served by a third-party inference provider (Novita)
    model = HuggingFaceEndpoint(
        repo_id="mistralai/Mistral-Nemo-Base-2407",
        provider="novita",
        max_new_tokens=100,
        do_sample=False,
        huggingfacehub_api_token="my-api-key",
    )
    print(model.invoke("What is Deep Learning?"))
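The model also supports LangChain's standard async interface; a minimal sketch using ainvoke, assuming the model configured above:

    import asyncio

    # Invoke the endpoint asynchronously; useful when issuing
    # many requests concurrently.
    async def main() -> None:
        answer = await model.ainvoke("What is Deep Learning?")
        print(answer)

    asyncio.run(main())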