# SelfHostedPipeline

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/llms/self_hosted/SelfHostedPipeline)

Model inference on self-hosted remote hardware.

Supported hardware includes auto-launched instances on AWS, GCP, Azure,
and Lambda, as well as servers specified by IP address and SSH
credentials (such as on-prem machines, or another cloud like
Paperspace, CoreWeave, etc.).

To use, you should have the `runhouse` Python package installed.
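The package is available from PyPI and can be installed with pip:

```shell
pip install runhouse
```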

## Signature

```python
SelfHostedPipeline(**kwargs: Any)
```

## Description

**Example for a custom pipeline and inference function:**

```python
from langchain_community.llms import SelfHostedPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import runhouse as rh

def load_pipeline():
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    return pipeline(
        "text-generation", model=model, tokenizer=tokenizer,
        max_new_tokens=10
    )

def inference_fn(pipeline, prompt, stop=None):
    return pipeline(prompt)[0]["generated_text"]

gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
llm = SelfHostedPipeline(
    model_load_fn=load_pipeline,
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],
    inference_fn=inference_fn,
)
```

**Example for a <2GB model (can be serialized and sent directly to the server):**

```python
from langchain_community.llms import SelfHostedPipeline
import runhouse as rh

gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
my_model = ...
llm = SelfHostedPipeline.from_pipeline(
    pipeline=my_model,
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],
)
```
**Example passing a model path for larger models:**

```python
from langchain_community.llms import SelfHostedPipeline
import runhouse as rh
import pickle
from transformers import pipeline

gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
generator = pipeline(model="gpt2")
rh.blob(pickle.dumps(generator), path="models/pipeline.pkl").save().to(
    gpu, path="models"
)
llm = SelfHostedPipeline.from_pipeline(
    pipeline="models/pipeline.pkl",
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],
)
```

## Extends

- `LLM`

## Constructors

```python
__init__(self, **kwargs: Any)
```


## Properties

- `pipeline_ref`
- `client`
- `inference_fn`
- `hardware`
- `model_load_fn`
- `load_fn_kwargs`
- `model_reqs`
- `allow_dangerous_deserialization`
- `model_config`

## Methods

- [`from_pipeline()`](https://reference.langchain.com/python/langchain-community/llms/self_hosted/SelfHostedPipeline/from_pipeline)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/llms/self_hosted.py#L66)