SelfHostedEmbeddings(
self,
**kwargs: Any = {},
)Function to load the model remotely on the server.
Custom embedding models on self-hosted remote hardware.
Supported hardware includes auto-launched instances on AWS, GCP, Azure, and Lambda, as well as servers specified by IP address and SSH credentials (such as on-prem, or another cloud like Paperspace, Coreweave, etc.).
To use, you should have the runhouse python package installed.
Example using a model load function:
.. code-block:: python
from langchain_community.embeddings import SelfHostedEmbeddings from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline import runhouse as rh
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1") def get_pipeline(): model_id = "facebook/bart-large" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id) return pipeline("feature-extraction", model=model, tokenizer=tokenizer) embeddings = SelfHostedEmbeddings( model_load_fn=get_pipeline, hardware=gpu model_reqs=["./", "torch", "transformers"], )
Example passing in a pipeline path: .. code-block:: python
from langchain_community.embeddings import SelfHostedHFEmbeddings
import runhouse as rh
from transformers import pipeline
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
pipeline = pipeline(model="bert-base-uncased", task="feature-extraction")
rh.blob(pickle.dumps(pipeline),
path="models/pipeline.pkl").save().to(gpu, path="models")
embeddings = SelfHostedHFEmbeddings.from_pipeline(
pipeline="models/pipeline.pkl",
hardware=gpu,
model_reqs=["./", "torch", "transformers"],
)