AmazonTextractPDFLoader

AmazonTextractPDFLoader(
  self,
  file_path: Union[str, PurePath],
  textract_features: Optional[

Bases

BasePDFLoader

Constructors

Attributes

Methods

Inherited from BasePDFLoader

Attributes

file_path: path
web_path: web_path
headers: headers
temp_dir

credentials_profile_name

linearization_config

Inherited from BaseLoader (langchain_core)

Methods

aload
load_and_split
alazy_load

Parameters

Name	Type	Description
`file_path`*	`Union[str, PurePath]`	A local file path, URL, or S3 path for the input file
`textract_features`	`Optional[Sequence[str]]`	Default:`None`
`client`	`Optional[Any]`	Default:`None`
`credentials_profile_name`	`Optional[str]`	Default:`None`
`region_name`	`Optional[str]`	Default:`None`
`endpoint_url`	`Optional[str]`	Default:`None`
`linearization_config`	`Optional[TextLinearizationConfig]`	Default:`None`

constructor

__init__

Name	Type
file_path	Union[str, PurePath]
textract_features	Optional[Sequence[str]]
client	Optional[Any]
credentials_profile_name	Optional[str]
region_name	Optional[str]
endpoint_url	Optional[str]
headers	Optional[dict]
linearization_config	Optional[TextLinearizationConfig]

Load PDF files from a local file system, HTTP or S3.
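The three source kinds can be told apart from the `file_path` string alone. The following is a minimal sketch of that idea, not the loader's own implementation; `classify_source` is a hypothetical helper introduced here for illustration.

```python
from urllib.parse import urlparse


def classify_source(file_path: str) -> str:
    """Return "s3", "http", or "local" for a file_path string (hypothetical helper)."""
    scheme = urlparse(file_path).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    # No scheme, or anything unrecognized (such as a Windows drive letter),
    # is treated as a path on the local file system.
    return "local"


for candidate in (
    "s3://pdfs/myfile.pdf",
    "https://example.com/myfile.pdf",
    "/tmp/myfile.pdf",
):
    print(candidate, "->", classify_source(candidate))
```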

To authenticate, the AWS client uses the following methods to automatically load credentials: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

To use a specific credential profile, pass its name as it appears in the ~/.aws/credentials file.

Make sure the credentials / roles used have the required policies to access the Amazon Textract service.
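Before passing a profile name, it can help to confirm that the profile exists. The sketch below assumes the credentials file uses its usual INI layout and parses it with the standard library; `list_aws_profiles`, the sample content, and the profile name `textract-dev` are all hypothetical.

```python
import configparser
from typing import List


def list_aws_profiles(credentials_text: str) -> List[str]:
    """Return the profile names found in AWS credentials file content."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return parser.sections()


# Hypothetical content; a real file would be read from ~/.aws/credentials.
sample_credentials = """
[default]
aws_access_key_id = EXAMPLEKEY
aws_secret_access_key = EXAMPLESECRET

[textract-dev]
aws_access_key_id = EXAMPLEKEY2
aws_secret_access_key = EXAMPLESECRET2
"""

profiles = list_aws_profiles(sample_credentials)
profile_name = "textract-dev"
if profile_name not in profiles:
    raise ValueError(f"Profile {profile_name!r} not found in credentials file")

# The checked name would then be passed to the loader, for example:
# AmazonTextractPDFLoader(
#     file_path="s3://pdfs/myfile.pdf",
#     credentials_profile_name=profile_name,
# )
```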

Example:

.. code-block:: python

    from langchain_community.document_loaders import AmazonTextractPDFLoader

    loader = AmazonTextractPDFLoader(
        file_path="s3://pdfs/myfile.pdf"
    )
    document = loader.load()

`textract_features`: Features to use for extraction. Pass each feature as a str that matches a member of the Textract_Features enum; see the amazon-textract-caller package.

`client`: boto3 Textract client (optional).

`credentials_profile_name`: AWS profile name, if not the default (optional).

`region_name`: AWS region, e.g. us-east-1 (optional).

`endpoint_url`: Endpoint URL for the Textract service (optional).

`linearization_config`: Config used to linearize the output. It should be an instance of TextLinearizationConfig from the textractor package.
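Because `textract_features` takes plain strings that must match enum member names, a typo is easier to catch before the loader is built. The sketch below uses a local stand-in enum; its member names are an assumption about what Textract_Features contains and should be checked against the amazon-textract-caller package. `FeatureName` and `check_features` are hypothetical.

```python
from enum import Enum
from typing import List, Sequence


class FeatureName(Enum):
    """Local stand-in; member names are assumed to mirror Textract_Features."""

    FORMS = 1
    TABLES = 2
    QUERIES = 3
    SIGNATURES = 4
    LAYOUT = 5


def check_features(features: Sequence[str]) -> List[str]:
    """Return the features unchanged, or raise if any name is not a member."""
    valid = FeatureName.__members__
    unknown = [name for name in features if name not in valid]
    if unknown:
        raise ValueError(f"Unknown Textract features: {unknown}")
    return list(features)


textract_features = check_features(["TABLES", "FORMS"])

# The validated list would then be passed to the loader, for example:
# AmazonTextractPDFLoader(
#     file_path="s3://pdfs/myfile.pdf",
#     textract_features=textract_features,
# )
```

Member lookup is case-sensitive, so `"tables"` would be rejected by this check.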
