# UpstageDocumentParseLoader

> **Class** in `langchain_upstage`

📖 [View in docs](https://reference.langchain.com/python/langchain-upstage/document_parse/UpstageDocumentParseLoader)

Upstage Document Parse Loader.

To use it, set the environment variable `UPSTAGE_API_KEY` to your API key,
or pass the key to the constructor as the named parameter `api_key`.
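
The key lookup described above can be sketched without calling the API. The helper below is hypothetical (it is not part of `langchain_upstage`) and assumes that a key passed explicitly is used as-is, with the environment variable consulted only when no key is given.

```python
import os
from typing import Optional


def resolve_upstage_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper that mirrors the documented key lookup.

    Returns the explicit key if one is given, otherwise the value of the
    UPSTAGE_API_KEY environment variable. Raises ValueError if neither is set.
    """
    if api_key:
        return api_key
    env_key = os.environ.get("UPSTAGE_API_KEY")
    if not env_key:
        raise ValueError(
            "Set UPSTAGE_API_KEY or pass api_key to the constructor."
        )
    return env_key
```

The returned value could then be passed as `api_key=...` when constructing the loader.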

## Signature

```python
UpstageDocumentParseLoader(
    file_path: Union[str, Path, List[str], List[Path]],
    split: SplitType = 'none',
    api_key: Optional[str] = None,
    base_url: str = DOCUMENT_PARSE_BASE_URL,
    model: str = DOCUMENT_PARSE_DEFAULT_MODEL,
    chart_recognition: bool = True,
    ocr: OCR = 'auto',
    output_format: OutputFormat = 'html',
    coordinates: bool = True,
    base64_encoding: Optional[List[Category]] = None,
)
```

## Description

**Example:**

```python
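from langchain_upstage import UpstageDocumentParseLoader

# Path to the document to parse.
file_path = "/PATH/TO/YOUR/FILE.pdf"

# Split the result by page and request plain-text output.
loader = UpstageDocumentParseLoader(
    file_path, split="page", output_format="text"
)

# Parse the document into a list of Document objects.
docs = loader.load()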
```
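
The loader can also be consumed as an iterator through `lazy_load()`. The following is a minimal sketch, not a definitive usage pattern: the function name `iter_parsed_pages` is hypothetical, and the import sits inside the function only so that the sketch can be defined without the package installed.

```python
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple, Union


def iter_parsed_pages(
    file_path: Union[str, Path],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (text, metadata) for each Document produced by the loader."""
    # Imported lazily; requires the langchain-upstage package at call time.
    from langchain_upstage import UpstageDocumentParseLoader

    loader = UpstageDocumentParseLoader(
        file_path, split="page", output_format="text"
    )
    # lazy_load() returns an iterator of Document objects.
    for doc in loader.lazy_load():
        yield doc.page_content, doc.metadata
```

Iterating over the result needs a valid `UPSTAGE_API_KEY` and a real input file, since the documents come from the Upstage API.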

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `file_path` | `Union[str, Path, List[str], List[Path]]` | Yes | The path, or list of paths, to the document(s) to be loaded. |
| `split` | `SplitType` | No | The type of splitting to be applied. (default: `'none'`) |
| `api_key` | `Optional[str]` | No | The API key for accessing the Upstage API. If `None`, the key is read from the environment variable `UPSTAGE_API_KEY`. (default: `None`) |
| `base_url` | `str` | No | The base URL for accessing the Upstage API. (default: `DOCUMENT_PARSE_BASE_URL`) |
| `model` | `str` | No | The model to use for document parsing. (default: `DOCUMENT_PARSE_DEFAULT_MODEL`, which is `'document-parse'`) |
| `chart_recognition` | `bool` | No | Whether to enable chart recognition. (default: `True`) |
| `ocr` | `OCR` | No | Controls how OCR is used to extract text from images in the document. With `'force'`, OCR is used to extract text from images. With `'auto'`, text is extracted from the PDF; an error occurs if the value is `'auto'` and the input is not a PDF. (default: `'auto'`) |
| `output_format` | `OutputFormat` | No | Format of the inference results. (default: `'html'`) |
| `coordinates` | `bool` | No | Whether to include the coordinates of the OCR in the output. (default: `True`) |
| `base64_encoding` | `Optional[List[Category]]` | No | The categories of elements to be encoded in base64. (default: `None`) |
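
Because `ocr='auto'` is documented to fail for non-PDF input, a caller may want to pick the mode before constructing the loader. The helper below is a hypothetical sketch (it is not part of `langchain_upstage`) and assumes the file extension reflects the actual file format.

```python
from pathlib import Path
from typing import Union


def choose_ocr_mode(file_path: Union[str, Path]) -> str:
    """Hypothetical helper: pick an `ocr` value from the file extension."""
    is_pdf = Path(file_path).suffix.lower() == ".pdf"
    # 'auto' is documented to raise an error for non-PDF input,
    # so fall back to 'force' for everything else.
    return "auto" if is_pdf else "force"
```

The result could be passed as `ocr=choose_ocr_mode(file_path)` to the constructor.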

## Extends

- `BaseLoader`

## Constructors

```python
__init__(
    self,
    file_path: Union[str, Path, List[str], List[Path]],
    split: SplitType = 'none',
    api_key: Optional[str] = None,
    base_url: str = DOCUMENT_PARSE_BASE_URL,
    model: str = DOCUMENT_PARSE_DEFAULT_MODEL,
    chart_recognition: bool = True,
    ocr: OCR = 'auto',
    output_format: OutputFormat = 'html',
    coordinates: bool = True,
    base64_encoding: Optional[List[Category]] = None,
)
```

| Name | Type |
|------|------|
| `file_path` | `Union[str, Path, List[str], List[Path]]` |
| `split` | `SplitType` |
| `api_key` | `Optional[str]` |
| `base_url` | `str` |
| `model` | `str` |
| `chart_recognition` | `bool` |
| `ocr` | `OCR` |
| `output_format` | `OutputFormat` |
| `coordinates` | `bool` |
| `base64_encoding` | `Optional[List[Category]]` |
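
Since `file_path` also accepts a list of paths, several documents can be handed to one loader. The sketch below shows one way to build such a list; `collect_pdf_paths` is a hypothetical helper, not part of `langchain_upstage`, and it only looks at files directly inside the given directory.

```python
from pathlib import Path
from typing import List, Union


def collect_pdf_paths(directory: Union[str, Path]) -> List[Path]:
    """Hypothetical helper: return the PDF files in `directory`, sorted."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

The resulting list matches the `List[Path]` form of `file_path`, so it could be passed as the first constructor argument.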


## Properties

- `file_path`
- `split`
- `api_key`
- `base_url`
- `model`
- `chart_recognition`
- `ocr`
- `output_format`
- `coordinates`
- `base64_encoding`
- `parser`

## Methods

- [`load()`](https://reference.langchain.com/python/langchain-upstage/document_parse/UpstageDocumentParseLoader/load)
- [`lazy_load()`](https://reference.langchain.com/python/langchain-upstage/document_parse/UpstageDocumentParseLoader/lazy_load)
- [`merge_and_split()`](https://reference.langchain.com/python/langchain-upstage/document_parse/UpstageDocumentParseLoader/merge_and_split)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-upstage/blob/8a74237ef34a625f463371e63a8c406096042f28/libs/upstage/langchain_upstage/document_parse.py#L41)