SelfHostedHuggingFaceLLM(
self,
**kwargs: Any = {},
)Allow deserialization using pickle which can be dangerous if
HuggingFace Pipeline API to run on self-hosted remote hardware.
Supported hardware includes auto-launched instances on AWS, GCP, Azure, and Lambda, as well as servers specified by IP address and SSH credentials (such as on-prem, or another cloud like Paperspace, Coreweave, etc.).
To use, you should have the runhouse python package installed.
Only supports text-generation, text2text-generation and summarization for now.
Example using from_model_id:
.. code-block:: python
from langchain_community.llms import SelfHostedHuggingFaceLLM import runhouse as rh gpu = rh.cluster(name="rh-a10x", instance_type="A100:1") hf = SelfHostedHuggingFaceLLM( model_id="google/flan-t5-large", task="text2text-generation", hardware=gpu )
Example passing fn that generates a pipeline (bc the pipeline is not serializable): .. code-block:: python
from langchain_community.llms import SelfHostedHuggingFaceLLM
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import runhouse as rh
def get_pipeline():
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline(
"text-generation", model=model, tokenizer=tokenizer
)
return pipe
hf = SelfHostedHuggingFaceLLM(
model_load_fn=get_pipeline, model_id="gpt2", hardware=gpu)