# WeightOnlyQuantPipeline

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/llms/weight_only_quantization/WeightOnlyQuantPipeline)

Weight-only quantized model.

To use, you should have the `intel-extension-for-transformers` package and the
`transformers` package installed.

intel-extension-for-transformers: https://github.com/intel/intel-extension-for-transformers
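
As a rough illustration of what "weight-only" quantization means, the sketch below stores weights as low-bit integers with a per-row scale while activations stay in floating point. It is a pure-Python toy under assumed choices (symmetric int8, per-row scales). The helper names `quantize_row`, `dequantize_row`, and `linear` are hypothetical and are not part of `langchain_community` or `intel-extension-for-transformers`, whose actual algorithms and data types may differ.

```python
# Illustrative only: symmetric per-row int8 weight quantization in pure Python.
# This is NOT the intel-extension-for-transformers implementation.

def quantize_row(row, bits=8):
    """Map one row of float weights to signed integers plus a float scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


def linear(weights_q, scales, x):
    """Compute y = W x with W stored as integers and activations x in float."""
    out = []
    for q, scale in zip(weights_q, scales):
        row = dequantize_row(q, scale)
        out.append(sum(w * a for w, a in zip(row, x)))
    return out


# Hypothetical 2x3 weight matrix and input vector.
weights = [[0.50, -1.00, 0.25], [2.00, 0.10, -0.40]]
quantized = [quantize_row(r) for r in weights]
weights_q = [q for q, _ in quantized]
scales = [s for _, s in quantized]

x = [1.0, 2.0, 3.0]
y_quant = linear(weights_q, scales, x)
y_float = [sum(w * a for w, a in zip(r, x)) for r in weights]
```

Only the weights are approximated here: each one is off by at most half a quantization step, so `y_quant` stays close to `y_float` while the stored weights need far fewer bits.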

## Signature

```python
WeightOnlyQuantPipeline()
```

## Description

**Example using `from_model_id`:**

```python
from langchain_community.llms import WeightOnlyQuantPipeline
from intel_extension_for_transformers.transformers import (
    WeightOnlyQuantConfig,
)

# Create a config instance (library defaults) rather than passing the class.
config = WeightOnlyQuantConfig()
hf = WeightOnlyQuantPipeline.from_model_id(
    model_id="google/flan-t5-large",
    task="text2text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
    quantization_config=config,
)
```

**Example passing a pipeline in directly:**

```python
from langchain_community.llms import WeightOnlyQuantPipeline
from intel_extension_for_transformers.transformers import (
    AutoModelForSeq2SeqLM,
    WeightOnlyQuantConfig,
)
from transformers import AutoTokenizer, pipeline

model_id = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Create a config instance (library defaults) rather than passing the class.
config = WeightOnlyQuantConfig()
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=config,
)

# flan-t5 is a sequence-to-sequence model, so use the text2text task.
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
)
hf = WeightOnlyQuantPipeline(pipeline=pipe)
```

## Extends

- `LLM`

## Properties

- `pipeline`
- `model_id`
- `model_kwargs`
- `pipeline_kwargs`
- `model_config`

## Methods

- [`from_model_id()`](https://reference.langchain.com/python/langchain-community/llms/weight_only_quantization/WeightOnlyQuantPipeline/from_model_id)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/llms/weight_only_quantization.py#L15)