Load and parse a directory of PDF files using the `pypdf` library.
This class loads and parses multiple PDF documents in a directory. It supports recursive search, password-protected files, image extraction, and configurable extraction modes. PDF processing is handled by `pypdf`, and documents are loaded synchronously.

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Union[str, PurePath]` | The path to the directory containing the PDF files to load. | *required* |
| `glob` | `str` | The glob pattern used to match files in the directory. | `'**/[!.]*.pdf'` |
| `silent_errors` | `bool` | Whether to log errors instead of raising them. | `False` |
| `load_hidden` | `bool` | Whether to include hidden files in the search. | `False` |
| `recursive` | `bool` | Whether to search subdirectories recursively. | `False` |
| `extract_images` | `bool` | Whether to extract images from the PDFs. | `False` |
| `password` | `Optional[str]` | Optional password for opening encrypted PDFs. | `None` |
| `mode` | `Literal['single', 'page']` | The extraction mode: `'single'` extracts the entire document as one unit, `'page'` extracts it page by page. | `'page'` |
| `images_parser` | `Optional[BaseImageBlobParser]` | Optional image blob parser. | `None` |
| `headers` | `Optional[dict]` | Optional headers to use for the GET request when downloading a file from a web path. | `None` |
| `extraction_mode` | `Literal['plain', 'layout']` | `'plain'` for the legacy behavior, `'layout'` for the experimental layout mode. | `'plain'` |
| `extraction_kwargs` | `Optional[dict]` | Optional additional parameters for the extraction process. | `None` |
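The interaction between `glob`, `recursive`, and the hidden-file option is easiest to see on a small directory tree. The following is a minimal sketch using only `pathlib`, and it is a hypothetical approximation rather than the loader's actual code: it assumes that `recursive=True` switches from `Path.glob` to `Path.rglob`, and that any path component starting with `.` counts as hidden. The helper name `find_pdfs` is made up for illustration.

```python
from pathlib import Path, PurePath
from typing import List, Union


def find_pdfs(
    path: Union[str, PurePath],
    glob: str = "**/[!.]*.pdf",
    recursive: bool = False,
    load_hidden: bool = False,
) -> List[Path]:
    """Hypothetical helper: list the files a directory loader might pick up.

    Assumptions (not taken from the library): recursive=True uses
    Path.rglob instead of Path.glob, and a path is "hidden" if any
    component relative to the root starts with a dot.
    """
    root = Path(path)
    matches = root.rglob(glob) if recursive else root.glob(glob)
    selected = set()  # a set guards against duplicate matches
    for candidate in matches:
        if not candidate.is_file():
            continue
        parts = candidate.relative_to(root).parts
        # Skip hidden files and files inside hidden directories.
        if not load_hidden and any(part.startswith(".") for part in parts):
            continue
        selected.add(candidate)
    return sorted(selected)
```

Under these assumptions, the default pattern already descends into subdirectories because of its leading `**/`, so `recursive` only changes the result for a non-recursive pattern such as `'*.pdf'`.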
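The `silent_errors` flag corresponds to a common pattern: each file is parsed inside a `try` block, and a failure is either re-raised or logged so that the remaining files still load. The sketch below illustrates that pattern with the standard `logging` module. `load_all` and the `parse` callback are hypothetical stand-ins for the loader's internals, not its real API.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def load_all(
    paths: Iterable[str],
    parse: Callable[[str], List[T]],
    silent_errors: bool = False,
) -> List[T]:
    """Hypothetical loop: parse every path and collect the results.

    With silent_errors=True a failing file is logged and skipped;
    otherwise the first exception propagates to the caller.
    """
    docs: List[T] = []
    for path in paths:
        try:
            docs.extend(parse(path))
        except Exception as exc:
            if silent_errors:
                # Log and move on to the next file.
                logger.warning("Error loading %s: %s", path, exc)
            else:
                raise
    return docs
```

The trade-off is visibility: with `silent_errors=True` a corrupt or password-protected file no longer stops the batch, but it also silently drops out of the result unless the logs are checked.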