# UnstructuredLoader

> **Class** in `langchain_unstructured`

📖 [View in docs](https://reference.langchain.com/python/langchain-unstructured/document_loaders/UnstructuredLoader)

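Document loader that uses Unstructured to partition local files, file-like objects, or web pages into LangChain `Document` objects, either locally or through the Unstructured API.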

## Signature

```python
UnstructuredLoader(
    file_path: Optional[str | Path | list[str] | list[Path]] = None,
    *,
    file: Optional[IO[bytes] | list[IO[bytes]]] = None,
    partition_via_api: bool = False,
    post_processors: Optional[list[Callable[[str], str]]] = None,
    api_key: Optional[str] = None,
    client: Optional[UnstructuredClient] = None,
    url: Optional[str] = None,
    web_url: Optional[str] = None,
    **kwargs: Any,
)
```

## Description

**Setup:**

Install `langchain-unstructured` and set the `UNSTRUCTURED_API_KEY` environment variable.

```bash
pip install -U langchain-unstructured
export UNSTRUCTURED_API_KEY="your-api-key"
```

**Instantiate:**

```python
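import os

from langchain_unstructured import UnstructuredLoader

loader = UnstructuredLoader(
    file_path=["example.pdf", "fake.pdf"],
    # Read the key exported during setup.
    api_key=os.environ["UNSTRUCTURED_API_KEY"],
    partition_via_api=True,
    # Extra keyword arguments are collected in **kwargs.
    chunking_strategy="by_title",
    strategy="fast",
)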
```
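
The signature types `post_processors` as `list[Callable[[str], str]]`, so each entry is a plain function from text to text. The following is a minimal sketch of two such functions. The helper names, the `Page N of M` pattern, and the `apply_all` stand-in are hypothetical, and the sketch assumes the processors are applied in list order to each element's text.

```python
import re
from typing import Callable


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def strip_page_markers(text: str) -> str:
    """Remove 'Page N of M' footers (an illustrative pattern, not a library rule)."""
    return re.sub(r"Page \d+ of \d+", "", text)


# Order matters if processors run in sequence (an assumption in this sketch).
post_processors: list[Callable[[str], str]] = [strip_page_markers, collapse_whitespace]


def apply_all(text: str, processors: list[Callable[[str], str]]) -> str:
    """Local stand-in that applies the processors one after another."""
    for process in processors:
        text = process(text)
    return text


cleaned = apply_all("Layout   Parser \n Page 1 of 12", post_processors)
print(cleaned)
```

A list like this could then be passed as `UnstructuredLoader(..., post_processors=post_processors)`.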

**Lazy load:**

```python
docs = []
docs_lazy = loader.lazy_load()

# Async variant (inside a coroutine):
# async for doc in loader.alazy_load():
#     docs.append(doc)

for doc in docs_lazy:
    docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)
```

```
1 2 0 2
{'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}
```
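
Because `lazy_load()` yields documents one at a time, downstream work such as embedding or indexing can proceed in fixed-size batches without holding every document in memory. The sketch below shows one way to do that. The `in_batches` helper is hypothetical, and a plain generator of strings stands in for the loader's iterator.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


# Stand-in for `loader.lazy_load()`: any iterator of documents behaves the same.
fake_docs = (f"doc-{i}" for i in range(5))
batches = list(in_batches(fake_docs, 2))
print(batches)
```

With a real loader, the call would look like `for batch in in_batches(loader.lazy_load(), 50): ...`.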

**Async load:**

```python
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)
```

```
1 2 0 2
{'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}
```
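
A bare `await` at the top level, as in the snippet above, runs in a notebook or async REPL but not in a regular script. The sketch below shows the usual pattern for scripts: wrap the calls in a coroutine and start it with `asyncio.run`. `StubLoader` is a hypothetical stand-in that only mimics the `alazy_load()` method name so the example runs without the library or an API key.

```python
import asyncio
from typing import AsyncIterator


class StubLoader:
    """Hypothetical stand-in exposing an async iterator like a loader would."""

    async def alazy_load(self) -> AsyncIterator[str]:
        for text in ("first", "second", "third"):
            await asyncio.sleep(0)  # hand control back to the event loop
            yield text


async def collect(loader: StubLoader) -> list[str]:
    """Consume the async iterator with `async for`, not with `await`."""
    return [doc async for doc in loader.alazy_load()]


# asyncio.run creates the event loop, runs the coroutine, and closes the loop.
collected = asyncio.run(collect(StubLoader()))
print(collected)
```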

**Load URL:**

```python
loader = UnstructuredLoader(web_url="https://www.example.com/")
docs = loader.load()
print(docs[0])
```

```
page_content='Example Domain' metadata={'category_depth': 0, 'languages': ['eng'], 'filetype': 'text/html', 'url': 'https://www.example.com/', 'category': 'Title', 'element_id': 'fdaa78d856f9d143aeeed85bf23f58f8'}
```

```python
print(docs[1])
```

```
page_content='This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.' metadata={'languages': ['eng'], 'parent_id': 'fdaa78d856f9d143aeeed85bf23f58f8', 'filetype': 'text/html', 'url': 'https://www.example.com/', 'category': 'NarrativeText', 'element_id': '3652b8458b0688639f973fe36253c992'}
```
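
Each document in the output above carries a `category` entry in its metadata (`Title`, `NarrativeText`), which makes it straightforward to keep only certain element types. The following is a minimal sketch of such a filter. The `Doc` dataclass is a hypothetical stand-in for a LangChain `Document`, and `by_category` is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def by_category(docs: list[Doc], *categories: str) -> list[Doc]:
    """Keep documents whose metadata 'category' is one of `categories`."""
    wanted = set(categories)
    return [d for d in docs if d.metadata.get("category") in wanted]


# Sample data shaped like the two elements printed above.
sample = [
    Doc("Example Domain", {"category": "Title"}),
    Doc(
        "This domain is for use in illustrative examples in documents.",
        {"category": "NarrativeText"},
    ),
]
titles = by_category(sample, "Title")
print([d.page_content for d in titles])
```

The same function should work on real `Document` objects, since it only reads the `metadata` and `page_content` attributes.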

**References:**

- https://docs.unstructured.io/api-reference/api-services/sdk
- https://docs.unstructured.io/api-reference/api-services/overview
- https://docs.unstructured.io/open-source/core-functionality/partitioning
- https://docs.unstructured.io/open-source/core-functionality/chunking

## Extends

- `BaseLoader`

## Constructors

```python
__init__(
    self,
    file_path: Optional[str | Path | list[str] | list[Path]] = None,
    *,
    file: Optional[IO[bytes] | list[IO[bytes]]] = None,
    partition_via_api: bool = False,
    post_processors: Optional[list[Callable[[str], str]]] = None,
    api_key: Optional[str] = None,
    client: Optional[UnstructuredClient] = None,
    url: Optional[str] = None,
    web_url: Optional[str] = None,
    **kwargs: Any,
)
```

| Name | Type |
|------|------|
| `file_path` | `Optional[str \| Path \| list[str] \| list[Path]]` |
| `file` | `Optional[IO[bytes] \| list[IO[bytes]]]` |
| `partition_via_api` | `bool` |
| `post_processors` | `Optional[list[Callable[[str], str]]]` |
| `api_key` | `Optional[str]` |
| `client` | `Optional[UnstructuredClient]` |
| `url` | `Optional[str]` |
| `web_url` | `Optional[str]` |


## Properties

- `client`
- `file_path`
- `file`
- `partition_via_api`
- `post_processors`
- `unstructured_kwargs`

## Methods

- [`lazy_load()`](https://reference.langchain.com/python/langchain-unstructured/document_loaders/UnstructuredLoader/lazy_load)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-unstructured/blob/4a33c925e9b6a1326a11f3bedd1f7a7c3102b94d/libs/unstructured/langchain_unstructured/document_loaders.py#L24)