# DedocAPIFileLoader

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/document_loaders/dedoc/DedocAPIFileLoader)

Load files using the `dedoc` API.
The loader detects the file type automatically, even when the file extension is wrong.
By default, the loader calls a locally hosted `dedoc` API.
More information about the `dedoc` API is available in the `dedoc` documentation:
    https://dedoc.readthedocs.io/en/latest/dedoc_api_usage/api.html

See the documentation of `DedocBaseLoader` for more details.

## Signature

```python
DedocAPIFileLoader(
    self,
    file_path: str,
    *,
    url: str = 'http://0.0.0.0:1231',
    split: str = 'document',
    with_tables: bool = True,
    with_attachments: Union[str, bool] = False,
    recursion_deep_attachments: int = 10,
    pdf_with_text_layer: str = 'auto_tabby',
    language: str = 'rus+eng',
    pages: str = ':',
    is_one_column_document: str = 'auto',
    document_orientation: str = 'auto',
    need_header_footer_analysis: Union[str, bool] = False,
    need_binarization: Union[str, bool] = False,
    need_pdf_table_analysis: Union[str, bool] = True,
    delimiter: Optional[str] = None,
    encoding: Optional[str] = None,
)
```

## Description

**Setup:**

You do not need to install the `dedoc` library to use this loader.
Instead, the `dedoc` API must be running.
You can use a Docker container for this purpose.
See the `dedoc` documentation for more details:
    https://dedoc.readthedocs.io/en/latest/getting_started/installation.html#install-and-run-dedoc-using-docker

.. code-block:: bash

    docker pull dedocproject/dedoc
    docker run -p 1231:1231 dedocproject/dedoc
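
Before instantiating the loader, it can help to confirm that something is listening at the API address. The sketch below is a plain TCP check using only the standard library; the helper name `dedoc_api_reachable` is hypothetical and is not part of `langchain_community` or `dedoc`. It only shows that the port accepts connections, not that the service behind it is a working `dedoc` API.

```python
import socket
from urllib.parse import urlparse


def dedoc_api_reachable(url: str = "http://0.0.0.0:1231", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in `url` succeeds.

    Hypothetical helper for illustration; it does not call any dedoc endpoint.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's usual port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A caller could run `dedoc_api_reachable()` with the default URL and start the container only when it returns `False`.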

**Instantiate:**

.. code-block:: python

    from langchain_community.document_loaders import DedocAPIFileLoader

    loader = DedocAPIFileLoader(
        file_path="example.pdf",
        # url=...,
        # split=...,
        # with_tables=...,
        # pdf_with_text_layer=...,
        # pages=...,
        # ...
    )

**Load:**

.. code-block:: python

    docs = loader.load()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    Some text
    {
        'file_name': 'example.pdf',
        'file_type': 'application/pdf',
        # ...
    }

**Lazy load:**

.. code-block:: python

    docs = []
    docs_lazy = loader.lazy_load()

    for doc in docs_lazy:
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    Some text
    {
        'file_name': 'example.pdf',
        'file_type': 'application/pdf',
        # ...
    }

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `file_path` | `str` | Yes | path to the file for processing |
| `url` | `str` | No | URL to call `dedoc` API (default: `'http://0.0.0.0:1231'`) |
| `split` | `str` | No | type of document splitting into parts (each part is returned separately). Options: `"document"`: the document is returned as a single langchain `Document` object (no splitting); `"page"`: split the document into pages (works for PDF, DJVU, PPTX, PPT, ODP); `"node"`: split the document into tree nodes (title nodes, list item nodes, raw text nodes); `"line"`: split the document into lines (default: `'document'`) |
| `with_tables` | `bool` | No | add tables to the result - each table is returned as a single langchain Document object (default: `True`) |
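
The table above lists four accepted values for `split`. A minimal sketch of checking that value before building the loader follows. The names `VALID_SPLITS`, `dedoc_loader_kwargs`, and `make_loader` are hypothetical and not part of `langchain_community`; whether the loader itself validates `split` is not stated on this page, so the check here is an assumption about what a caller might want.

```python
from typing import Any, Dict

# Values for `split` as listed in the parameter table.
VALID_SPLITS = ("document", "page", "node", "line")


def dedoc_loader_kwargs(
    split: str = "document",
    with_tables: bool = True,
    url: str = "http://0.0.0.0:1231",
) -> Dict[str, Any]:
    """Build keyword arguments for the loader, rejecting unknown `split` values."""
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    return {"url": url, "split": split, "with_tables": with_tables}


def make_loader(file_path: str, **options: Any) -> Any:
    """Create a DedocAPIFileLoader from validated options."""
    # Imported lazily so the helpers above work without langchain_community installed.
    from langchain_community.document_loaders import DedocAPIFileLoader

    return DedocAPIFileLoader(file_path, **dedoc_loader_kwargs(**options))
```

For example, `make_loader("example.pdf", split="page")` would request one `Document` per page, while `split="paragraph"` would raise `ValueError` before any API call is made.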

## Extends

- `DedocBaseLoader`

## Constructors

```python
__init__(
    self,
    file_path: str,
    *,
    url: str = 'http://0.0.0.0:1231',
    split: str = 'document',
    with_tables: bool = True,
    with_attachments: Union[str, bool] = False,
    recursion_deep_attachments: int = 10,
    pdf_with_text_layer: str = 'auto_tabby',
    language: str = 'rus+eng',
    pages: str = ':',
    is_one_column_document: str = 'auto',
    document_orientation: str = 'auto',
    need_header_footer_analysis: Union[str, bool] = False,
    need_binarization: Union[str, bool] = False,
    need_pdf_table_analysis: Union[str, bool] = True,
    delimiter: Optional[str] = None,
    encoding: Optional[str] = None,
) -> None
```

| Name | Type |
|------|------|
| `file_path` | `str` |
| `url` | `str` |
| `split` | `str` |
| `with_tables` | `bool` |
| `with_attachments` | `Union[str, bool]` |
| `recursion_deep_attachments` | `int` |
| `pdf_with_text_layer` | `str` |
| `language` | `str` |
| `pages` | `str` |
| `is_one_column_document` | `str` |
| `document_orientation` | `str` |
| `need_header_footer_analysis` | `Union[str, bool]` |
| `need_binarization` | `Union[str, bool]` |
| `need_pdf_table_analysis` | `Union[str, bool]` |
| `delimiter` | `Optional[str]` |
| `encoding` | `Optional[str]` |


## Properties

- `url`

## Methods

- [`lazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/dedoc/DedocAPIFileLoader/lazy_load)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/document_loaders/dedoc.py#L362)