# AmazonTextractPDFLoader

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/document_loaders/pdf/AmazonTextractPDFLoader)

Load PDF files from the local file system, an HTTP(S) URL, or Amazon S3 using Amazon Textract.
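The three kinds of `file_path` differ only in their URL scheme. The sketch below uses a hypothetical helper, `source_kind`, to classify a path by scheme. It is an illustration, not part of `langchain_community`. A hypothetical `load_from` wrapper shows that every kind is passed to the loader the same way. The bucket, key, and URL are placeholders.

```python
from pathlib import PurePath
from typing import Union
from urllib.parse import urlparse


def source_kind(file_path: Union[str, PurePath]) -> str:
    """Hypothetical helper: classify a path as 's3', 'http', or 'local'."""
    scheme = urlparse(str(file_path)).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    return "local"  # no scheme (or e.g. a Windows drive letter): treat as a file


def load_from(file_path: Union[str, PurePath]) -> list:
    """Hypothetical wrapper; needs langchain-community and AWS credentials."""
    # Imported lazily so source_kind() works without the package installed.
    from langchain_community.document_loaders import AmazonTextractPDFLoader

    return AmazonTextractPDFLoader(file_path).load()
```

For example, `source_kind("s3://pdfs/myfile.pdf")` returns `"s3"`, and `source_kind("report.pdf")` returns `"local"`.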

To authenticate, the AWS client loads credentials automatically through boto3's standard credential resolution chain, which is described here:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
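Credentials come from boto3's default chain, so a quick diagnostic is to check which of the well-known credential environment variables are set. The sketch below does only that. The chain also consults the shared config files and container or instance roles, and this check does not cover those. The helper names and the `us-east-1` default are hypothetical. The second function builds a Textract client from the same chain and passes it to the loader through the `client` parameter.

```python
import os
from typing import Any, List, Mapping, Optional

# A subset of the environment variables boto3 reads while resolving credentials.
CREDENTIAL_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
)


def credential_env_vars_set(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: list which credential-related variables are non-empty."""
    env = os.environ if env is None else env
    return [name for name in CREDENTIAL_ENV_VARS if env.get(name)]


def loader_with_default_chain(file_path: str, region_name: str = "us-east-1") -> Any:
    """Hypothetical: build the client from the default chain and pass it in."""
    import boto3  # lazy imports: only needed when a loader is actually created
    from langchain_community.document_loaders import AmazonTextractPDFLoader

    client = boto3.Session().client("textract", region_name=region_name)
    return AmazonTextractPDFLoader(file_path, client=client)
```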

To use a specific credential profile, pass its name from the `~/.aws/credentials` file as `credentials_profile_name`.
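The profile names are the section headers of the shared credentials file. The sketch below reads them with `configparser` and fails early if the requested profile is missing. Both functions are hypothetical helpers. The check only looks at `~/.aws/credentials`, so it would reject a profile defined only in `~/.aws/config`.

```python
import configparser
import os
from typing import Any, List, Optional


def list_credential_profiles(path: Optional[str] = None) -> List[str]:
    """Hypothetical helper: profile names defined in an AWS credentials file."""
    path = path or os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped, giving no sections
    return parser.sections()


def loader_for_profile(
    file_path: str, profile: str, region_name: Optional[str] = None
) -> Any:
    """Hypothetical: create a loader bound to a named credentials profile."""
    from langchain_community.document_loaders import AmazonTextractPDFLoader

    if profile not in list_credential_profiles():
        raise ValueError(f"profile {profile!r} not found in ~/.aws/credentials")
    return AmazonTextractPDFLoader(
        file_path,
        credentials_profile_name=profile,
        region_name=region_name,
    )
```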

Make sure the credentials or role in use have the IAM policies required to access the Amazon Textract service.
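A minimal sketch of such a policy, written as a Python dict, is shown below. The action list is an assumption. It covers the synchronous Textract calls and the asynchronous `Start*`/`Get*` pairs that Textract uses for multi-page PDFs stored in S3. The sketch also adds `s3:GetObject` on one bucket. Check the exact set your workload needs against the AWS documentation before using it. The `textract_policy` helper is hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed action set: synchronous calls plus the asynchronous Start*/Get*
# operations used for multi-page documents stored in S3.
TEXTRACT_ACTIONS: List[str] = [
    "textract:DetectDocumentText",
    "textract:AnalyzeDocument",
    "textract:StartDocumentTextDetection",
    "textract:GetDocumentTextDetection",
    "textract:StartDocumentAnalysis",
    "textract:GetDocumentAnalysis",
]


def textract_policy(bucket: str) -> Dict[str, Any]:
    """Hypothetical helper: IAM policy allowing Textract calls and reads from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Textract actions are granted on all resources.
            {"Effect": "Allow", "Action": TEXTRACT_ACTIONS, "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(textract_policy("pdfs"), indent=2))
```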

## Signature

```python
AmazonTextractPDFLoader(
    self,
    file_path: Union[str, PurePath],
    textract_features: Optional[Sequence[str]] = None,
    client: Optional[Any] = None,
    credentials_profile_name: Optional[str] = None,
    region_name: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    headers: Optional[dict] = None,
    *,
    linearization_config: Optional[TextLinearizationConfig] = None,
)
```

## Description

**Example:**

```python
from langchain_community.document_loaders import AmazonTextractPDFLoader

loader = AmazonTextractPDFLoader(file_path="s3://pdfs/myfile.pdf")
documents = loader.load()  # list of Document objects
```

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `file_path` | `Union[str, PurePath]` | Yes | Local file path, URL, or S3 path of the input file |
| `textract_features` | `Optional[Sequence[str]]` | No | Features to use for extraction. Each feature is passed as a string that matches a member of the `Textract_Features` enum from the `amazon-textract-caller` package (default: `None`) |
| `client` | `Optional[Any]` | No | Pre-configured boto3 Textract client (default: `None`) |
| `credentials_profile_name` | `Optional[str]` | No | Name of the AWS profile to use, if not the default (default: `None`) |
| `region_name` | `Optional[str]` | No | AWS region, e.g. `us-east-1` (default: `None`) |
| `endpoint_url` | `Optional[str]` | No | Endpoint URL for the Textract service (default: `None`) |
| `headers` | `Optional[dict]` | No | HTTP headers used when downloading `file_path` from a web URL (default: `None`) |
| `linearization_config` | `Optional[TextLinearizationConfig]` | No | Configuration for linearizing the output; must be an instance of `TextLinearizationConfig` from the `textractor` package (default: `None`) |
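The sketch below validates `textract_features` and combines them with a `linearization_config`. `KNOWN_FEATURES` is an assumed subset of the `Textract_Features` member names, so check it against your installed `amazon-textract-caller` version. `QUERIES` is left out because Textract requires a separate queries configuration for it. Both helpers are hypothetical. `hide_figure_layout` is a `TextLinearizationConfig` option from `textractor`.

```python
from typing import Any, Iterable, List, Optional

# Assumed subset of textractcaller's Textract_Features member names.
KNOWN_FEATURES = {"TABLES", "FORMS", "LAYOUT", "SIGNATURES"}


def normalize_features(features: Iterable[str]) -> List[str]:
    """Hypothetical helper: upper-case names and reject unknown ones early."""
    normalized = [f.strip().upper() for f in features]
    unknown = [f for f in normalized if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"unknown Textract features: {unknown}")
    return normalized


def tables_and_forms_loader(file_path: str, region_name: Optional[str] = None) -> Any:
    """Hypothetical: request TABLES and FORMS and hide figure layout blocks."""
    from langchain_community.document_loaders import AmazonTextractPDFLoader
    from textractor.data.text_linearization_config import TextLinearizationConfig

    config = TextLinearizationConfig(hide_figure_layout=True)
    return AmazonTextractPDFLoader(
        file_path,
        textract_features=normalize_features(["tables", "forms"]),
        region_name=region_name,
        linearization_config=config,
    )
```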

## Extends

- `BasePDFLoader`

## Constructors

```python
__init__(
    self,
    file_path: Union[str, PurePath],
    textract_features: Optional[Sequence[str]] = None,
    client: Optional[Any] = None,
    credentials_profile_name: Optional[str] = None,
    region_name: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    headers: Optional[dict] = None,
    *,
    linearization_config: Optional[TextLinearizationConfig] = None,
) -> None
```

| Name | Type |
|------|------|
| `file_path` | `Union[str, PurePath]` |
| `textract_features` | `Optional[Sequence[str]]` |
| `client` | `Optional[Any]` |
| `credentials_profile_name` | `Optional[str]` |
| `region_name` | `Optional[str]` |
| `endpoint_url` | `Optional[str]` |
| `headers` | `Optional[dict]` |
| `linearization_config` | `Optional[TextLinearizationConfig]` |


## Properties

- `parser`

## Methods

- [`load()`](https://reference.langchain.com/python/langchain-community/document_loaders/pdf/AmazonTextractPDFLoader/load)
- [`lazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/pdf/AmazonTextractPDFLoader/lazy_load)
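`load()` returns a full list, while `lazy_load()` yields documents one at a time. The sketch below consumes `lazy_load()` in a streaming way and keeps only running totals. It assumes one `Document` per page, so the document count is reported as pages. `summarize_pages` accepts any object with a `page_content` attribute, which lets it run without AWS. Both helpers are hypothetical.

```python
from typing import Any, Dict, Iterable


def summarize_pages(docs: Iterable[Any]) -> Dict[str, int]:
    """Hypothetical helper: count documents and characters without a list."""
    pages = 0
    chars = 0
    for doc in docs:  # consumes the iterator one item at a time
        pages += 1
        chars += len(doc.page_content)
    return {"pages": pages, "chars": chars}


def summarize_pdf(file_path: str) -> Dict[str, int]:
    """Hypothetical: stream Textract results for one PDF into running totals."""
    from langchain_community.document_loaders import AmazonTextractPDFLoader

    loader = AmazonTextractPDFLoader(file_path)
    return summarize_pages(loader.lazy_load())
```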

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/document_loaders/pdf.py#L1049)