HuggingFaceTextGenInference
===========================

- ``typical_p``: Typical Decoding mass. See *Typical Decoding for Natural
  Language Generation* for more information.
- ``repetition_penalty``: The parameter for repetition penalty. 1.0 means
  no penalty. See this paper for more details.
- ``watermark``: Watermarking with `A Watermark for Large Language Models
  <https://arxiv.org/abs/2301.10226>`_.
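As a rough sketch of what two of these knobs do (illustrative only, not the server's actual implementation; the helper names below are made up): typical decoding keeps the tokens whose surprisal is closest to the distribution's entropy until ``typical_p`` probability mass is covered, and a CTRL-style repetition penalty rescales the logits of tokens that have already been generated.

```python
import math

def typical_filter(logprobs, typical_p):
    """Sketch of typical decoding: keep the tokens whose surprisal
    (-log p) is closest to the entropy H of the distribution, until
    their cumulative probability mass reaches typical_p."""
    probs = [math.exp(lp) for lp in logprobs]
    entropy = -sum(p * lp for p, lp in zip(probs, logprobs))
    # Rank tokens by |surprisal - entropy|, most "typical" first.
    order = sorted(range(len(probs)),
                   key=lambda i: abs(-logprobs[i] - entropy))
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= typical_p:
            break
    return sorted(kept)

def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style repetition penalty: divide positive logits of
    already-generated tokens by `penalty`, multiply negative ones
    by it. penalty == 1.0 leaves the logits unchanged."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```

With a distribution of (0.6, 0.3, 0.1) over three tokens and ``typical_p=0.8``, only the first two tokens survive the filter; a penalty of 2.0 halves a positive logit and doubles the magnitude of a negative one.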
HuggingFace text generation API.

.. warning::
   This class is deprecated; use ``HuggingFaceEndpoint`` instead.
To use, you should have the ``text-generation`` python package installed and
a text-generation server running.
Example:
    .. code-block:: python

        llm = HuggingFaceTextGenInference(
            inference_server_url="http://localhost:8010/",
            max_new_tokens=512,
            top_k=10,
            top_p=0.95,
            typical_p=0.95,
            temperature=0.01,
            repetition_penalty=1.03,
        )
        print(llm.invoke("What is Deep Learning?"))  # noqa: T201
Streaming example:
    .. code-block:: python

        from langchain_community.callbacks import streaming_stdout

        callbacks = [streaming_stdout.StreamingStdOutCallbackHandler()]
        llm = HuggingFaceTextGenInference(
            inference_server_url="http://localhost:8010/",
            max_new_tokens=512,
            top_k=10,
            top_p=0.95,
            typical_p=0.95,
            temperature=0.01,
            repetition_penalty=1.03,
            callbacks=callbacks,
            streaming=True,
        )
        print(llm.invoke("What is Deep Learning?"))  # noqa: T201