Class ● Since v0.3

SelfHostedPipeline

SelfHostedPipeline(
    self,
    **kwargs: Any = {},
)

Bases

LLM

Model inference on self-hosted remote hardware.

Supported hardware includes auto-launched instances on AWS, GCP, Azure, and Lambda, as well as servers specified by IP address and SSH credentials (such as on-prem, or another cloud like Paperspace, Coreweave, etc.).

To use this class, install the runhouse Python package.

Example for custom pipeline and inference functions:

.. code-block:: python

    from langchain_community.llms import SelfHostedPipeline
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    import runhouse as rh

    def load_pipeline():
        # Runs on the remote hardware: build the tokenizer, model, and pipeline there.
        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")
        return pipeline(
            "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
        )

    def inference_fn(pipeline, prompt, stop=None):
        # Return only the generated string from the pipeline output.
        return pipeline(prompt)[0]["generated_text"]

    gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
    llm = SelfHostedPipeline(
        model_load_fn=load_pipeline,
        hardware=gpu,
        # Requirements to install on the remote hardware (same list as the other examples).
        model_reqs=["./", "torch", "transformers"],
        inference_fn=inference_fn,
    )
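The ``inference_fn`` in the example above accepts a ``stop`` argument but does not use it. The following is a minimal sketch of one way such a function could honor stop sequences, assuming the pipeline returns a list of dicts with a ``generated_text`` key as in that example. ``truncate_at_stop`` and ``fake_pipeline`` are hypothetical names introduced here for illustration; the fake pipeline only stands in for a real transformers pipeline so the sketch runs locally without runhouse or a model download.

```python
from typing import Any, Callable, List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]]) -> str:
    """Cut text at the earliest occurrence of any non-empty stop sequence."""
    if not stop:
        return text
    cut = len(text)
    for seq in stop:
        idx = text.find(seq) if seq else -1  # ignore empty stop strings
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def inference_fn(
    pipeline: Callable[[str], Any],
    prompt: str,
    stop: Optional[List[str]] = None,
) -> str:
    # Same call shape as the docstring example: take the first result's
    # "generated_text", then trim it at the first stop sequence, if any.
    text = pipeline(prompt)[0]["generated_text"]
    return truncate_at_stop(text, stop)


def fake_pipeline(prompt: str) -> list:
    # Hypothetical stand-in for a text-generation pipeline (assumption:
    # the output echoes the prompt followed by the completion).
    return [{"generated_text": prompt + " world.\nNext line"}]


print(inference_fn(fake_pipeline, "Hello", stop=["\n"]))  # prints "Hello world."
```

A function with this signature could be passed as ``inference_fn`` in place of the simpler one; whether the class applies its own stop handling on top is not stated in this section.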

Example for <2GB model (can be serialized and sent directly to the server):

.. code-block:: python

    from langchain_community.llms import SelfHostedPipeline
    import runhouse as rh
    gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
    my_model = ...
    llm = SelfHostedPipeline.from_pipeline(
        pipeline=my_model,
        hardware=gpu,
        model_reqs=["./", "torch", "transformers"],
    )

Example passing model path for larger models:

.. code-block:: python

    from langchain_community.llms import SelfHostedPipeline
    import runhouse as rh
    import pickle
    from transformers import pipeline

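    # Launch (or reuse) the cluster that will host the model.
    gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
    generator = pipeline(model="gpt2")
    # Pickle the pipeline and copy the blob to the cluster's filesystem.
    rh.blob(pickle.dumps(generator), path="models/pipeline.pkl").save().to(
        gpu, path="models"
    )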
    llm = SelfHostedPipeline.from_pipeline(
        pipeline="models/pipeline.pkl",
        hardware=gpu,
        model_reqs=["./", "torch", "transformers"],
    )
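The two ``from_pipeline`` examples differ in whether the model object is sent directly or written to a path first. As a rough sketch of how one might choose between them, the snippet below measures the pickled size of an object against the 2 GB figure mentioned above and round-trips it through a file. ``fits_direct_transfer``, ``save_pickled``, and ``small_model`` are hypothetical names for illustration, and treating 2 GB as ``2 * 1024**3`` bytes is an assumption; only the standard library is used, so nothing here touches runhouse.

```python
import pickle
import tempfile
from pathlib import Path

# Assumption: the "<2GB" figure from the text, interpreted as binary gigabytes.
SIZE_LIMIT_BYTES = 2 * 1024**3


def fits_direct_transfer(obj, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the pickled object is smaller than the limit."""
    return len(pickle.dumps(obj)) < limit


def save_pickled(obj, path: Path) -> Path:
    """Pickle obj to path, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(obj))
    return path


# Hypothetical stand-in for a real pipeline object.
small_model = {"weights": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp:
    target = save_pickled(small_model, Path(tmp) / "models" / "pipeline.pkl")
    restored = pickle.loads(target.read_bytes())

print(fits_direct_transfer(small_model))
print(restored == small_model)
```

Pickling a large model only to measure it holds a full serialized copy in memory, so for models known to be large it may be simpler to go straight to the path-based approach.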
