| Name | Type | Description |
|---|---|---|
| file_path* | Union[str, PurePath] | A local file path, URL, or S3 path for the input file |
| textract_features | Optional[Sequence[str]] | Default: None |
| client | Optional[Any] | Default: None |
| credentials_profile_name | Optional[str] | Default: None |
| region_name | Optional[str] | Default: None |
| endpoint_url | Optional[str] | Default: None |
| linearization_config | Optional[TextLinearizationConfig] | Default: None |
Load PDF files from a local file system, HTTP or S3.
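The three kinds of input can be told apart by the scheme of ``file_path``. The following is a minimal sketch of that distinction using only the standard library; ``classify_source`` is a hypothetical helper for illustration and is not how the loader itself is implemented.

```python
from urllib.parse import urlparse


def classify_source(file_path: str) -> str:
    """Return 's3', 'http', or 'local' for a path (illustrative helper only)."""
    scheme = urlparse(file_path).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    # Anything else (no scheme, or a Windows drive letter) is treated as local.
    return "local"


for path in ("s3://pdfs/myfile.pdf", "https://example.com/a.pdf", "/tmp/a.pdf"):
    print(path, "->", classify_source(path))
```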
To authenticate, the AWS client uses the following methods to automatically load credentials: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
To use a specific credential profile, pass the name of that profile as it appears in the ~/.aws/credentials file.
Make sure the credentials / roles used have the required policies to access the Amazon Textract service.
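As a rough sketch of selecting a named profile, the keyword arguments can be assembled first and handed to the loader afterwards. The helper names, the profile name ``textract-dev``, and the region below are illustrative assumptions; only ``file_path``, ``credentials_profile_name``, and ``region_name`` come from the parameter table. The loader import is deferred so the snippet can be read and run without AWS access.

```python
from typing import Any, Dict, Optional


def textract_loader_kwargs(
    file_path: str,
    profile: Optional[str] = None,
    region: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the arguments that were set (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"file_path": file_path}
    if profile is not None:
        kwargs["credentials_profile_name"] = profile
    if region is not None:
        kwargs["region_name"] = region
    return kwargs


def make_loader(**kwargs: Any):
    """Build the loader; requires langchain_community and AWS credentials."""
    from langchain_community.document_loaders import AmazonTextractPDFLoader

    return AmazonTextractPDFLoader(**kwargs)


# "textract-dev" is an assumed profile name from ~/.aws/credentials.
kwargs = textract_loader_kwargs(
    "s3://pdfs/myfile.pdf", profile="textract-dev", region="us-east-1"
)
# loader = make_loader(**kwargs)  # uncomment when credentials are available
```

Leaving ``profile`` unset keeps the default credential lookup described above.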
Example:
.. code-block:: python

    from langchain_community.document_loaders import AmazonTextractPDFLoader

    loader = AmazonTextractPDFLoader(
        file_path="s3://pdfs/myfile.pdf"
    )
    document = loader.load()
textract_features: Features to use for extraction. Pass each feature as a str that conforms to the Textract_Features enum; see the amazon-textract-caller package.
client: boto3 Textract client (optional).
credentials_profile_name: AWS profile name, if not the default (optional).
region_name: AWS region, e.g. us-east-1 (optional).
endpoint_url: Endpoint URL for the Textract service (optional).
linearization_config: Config to use for linearization of the output. Should be an instance of TextLinearizationConfig from the textractor package.
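The sketch below shows one way the feature strings and a linearization config might be combined. The set of feature names and the ``hide_*`` options are assumptions about the amazon-textract-caller and textractor packages that should be checked against their documentation. ``normalize_features`` and ``make_layout_loader`` are hypothetical helpers, and the third-party imports are deferred so the validation part runs on its own.

```python
from typing import List, Sequence

# Assumed members of the Textract_Features enum (amazon-textract-caller).
KNOWN_FEATURES = ("FORMS", "TABLES", "QUERIES", "SIGNATURES", "LAYOUT")


def normalize_features(features: Sequence[str]) -> List[str]:
    """Upper-case feature names and reject any that are not recognised."""
    normalized = [f.strip().upper() for f in features]
    unknown = [f for f in normalized if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"Unknown Textract features: {unknown}")
    return normalized


def make_layout_loader(file_path: str, features: Sequence[str]):
    """Build a loader; requires langchain_community, textractor, and AWS access."""
    from langchain_community.document_loaders import AmazonTextractPDFLoader
    from textractor.data.text_linearization_config import TextLinearizationConfig

    # Assumed options: drop headers, footers, and figures from the text output.
    config = TextLinearizationConfig(
        hide_header_layout=True,
        hide_footer_layout=True,
        hide_figure_layout=True,
    )
    return AmazonTextractPDFLoader(
        file_path=file_path,
        textract_features=normalize_features(features),
        linearization_config=config,
    )


print(normalize_features(["tables", " layout "]))
```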