PyPDFDirectoryLoader

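PyPDFDirectoryLoader(
  self,
  path: Union[str, PurePath],
  glob: str = '**/[!.]*.pdf',
  silent_errors: bool = False,
  load_hidden: bool = False,
  recursive: bool = False,
  extract_images: bool = False,
  password: Optional[str] = None,
  mode: Literal['single', 'page'] = 'page',
  images_parser: Optional[BaseImageBlobParser] = None,
  headers: Optional[dict] = None,
  extraction_mode: Literal['plain', 'layout'] = 'plain',
  extraction_kwargs: Optional[dict] = None
)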

Bases

BaseLoader

Inherited from BaseLoader (langchain_core)

Methods

`aload`, `load_and_split`, `lazy_load`, `alazy_load`
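The inherited `lazy_load` method returns an iterator of documents, which can help when a directory is large and only a few documents are needed. The sketch below is a minimal illustration rather than the library's own code: `first_n_documents` and `preview_directory` are hypothetical helpers, and the import path `langchain_community.document_loaders` is an assumption that this page does not state.

```python
from itertools import islice
from typing import Any, List


def first_n_documents(loader: Any, n: int) -> List[Any]:
    """Return at most the first n documents from a loader (hypothetical helper)."""
    # lazy_load() returns an iterator, so islice stops after n items
    # instead of consuming everything the loader could produce.
    return list(islice(loader.lazy_load(), n))


def preview_directory(directory: str, n: int = 3) -> List[Any]:
    """Preview a PDF directory (hypothetical helper, assumed import path)."""
    # Imported lazily so first_n_documents has no third-party dependency.
    from langchain_community.document_loaders import PyPDFDirectoryLoader

    return first_n_documents(PyPDFDirectoryLoader(directory), n)
```

Any object exposing `lazy_load()` works with `first_n_documents`, so the helper is not tied to this loader.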

Parameters

Name	Type	Description
`path`*	`Union[str, PurePath]`	The path to the directory containing PDF files to be loaded.
`glob`	`str`	Default: `'**/[!.]*.pdf'` The glob pattern to match files in the directory.
`silent_errors`	`bool`	Default: `False`
`load_hidden`	`bool`	Default: `False`
`recursive`	`bool`	Default: `False`
`extract_images`	`bool`	Default: `False`
`password`	`Optional[str]`	Default: `None`
`mode`	`Literal['single', 'page']`	Default: `'page'`
`images_parser`	`Optional[BaseImageBlobParser]`	Default: `None`
`headers`	`Optional[dict]`	Default: `None`
`extraction_mode`	`Literal['plain', 'layout']`	Default: `'plain'`
`extraction_kwargs`	`Optional[dict]`	Default: `None`
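To see what a pattern such as `**/[!.]*.pdf` selects, it can be tried with the standard library alone. This is a sketch under the assumption that the loader's `glob` follows `pathlib` glob semantics. `match_pdfs` is a hypothetical helper, not part of the loader, and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed default pattern: any depth, file name not starting with ".", ending in ".pdf".
DEFAULT_GLOB = "**/[!.]*.pdf"


def match_pdfs(root: Path, pattern: str = DEFAULT_GLOB) -> List[str]:
    """Return matching files as sorted POSIX-style paths relative to root."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # One visible PDF, one dot-prefixed PDF, one non-PDF, one nested PDF.
    for name in ("a.pdf", ".hidden.pdf", "notes.txt", "sub/b.pdf"):
        (root / name).touch()
    matched = match_pdfs(root)

print(matched)  # ['a.pdf', 'sub/b.pdf']
```

The `[!.]` character class is what skips the dot-prefixed file name, and `**` is what reaches `sub/b.pdf`.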

constructor

__init__

Name	Type
path	Union[str, PurePath]
glob	str
silent_errors	bool
load_hidden	bool
recursive	bool
extract_images	bool
password	Optional[str]
mode	Literal['single', 'page']
images_parser	Optional[BaseImageBlobParser]
headers	Optional[dict]
extraction_mode	Literal['plain', 'layout']
extraction_kwargs	Optional[dict]

Load and parse a directory of PDF files using the `pypdf` library.

This class loads and parses multiple PDF documents from a directory. It supports recursive search, password-protected files, image extraction, and a choice of extraction modes. It uses the pypdf library for PDF processing and offers synchronous document loading.
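A minimal usage sketch follows. It assumes the class is importable from `langchain_community.document_loaders` and that each returned document carries a `"source"` entry in its `metadata`. Neither detail is stated on this page, and `pages_per_source` and `load_and_count` are hypothetical helpers.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def pages_per_source(docs: Iterable[Any]) -> Dict[str, int]:
    """Count documents per originating file (hypothetical helper)."""
    # Assumption: metadata["source"] names the file a document came from.
    return dict(Counter(doc.metadata.get("source", "<unknown>") for doc in docs))


def load_and_count(directory: str) -> Dict[str, int]:
    """Load every PDF in a directory and count documents per file."""
    # Assumed import path; imported lazily so pages_per_source stays dependency-free.
    from langchain_community.document_loaders import PyPDFDirectoryLoader

    loader = PyPDFDirectoryLoader(directory)
    return pages_per_source(loader.load())
```

With the default `mode='page'`, the counts would be expected to reflect pages per file. With `mode='single'`, each file would be expected to contribute one document.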

`silent_errors`: Whether to log errors instead of raising them.

`load_hidden`: Whether to include hidden files in the search.

`recursive`: Whether to search subdirectories recursively.

`extract_images`: Whether to extract images from PDFs.

`password`: Optional password for opening encrypted PDFs.

`mode`: The extraction mode, either "single" for extracting the entire document or "page" for page-wise extraction.

`images_parser`: Optional image blob parser.

`headers`: Optional headers to use for the GET request that downloads a file from a web path.

`extraction_mode`: "plain" for legacy functionality, "layout" for experimental layout mode functionality.

`extraction_kwargs`: Optional additional parameters for the extraction process.
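The options above can be combined as keyword arguments. The sketch below uses only parameter names listed on this page. `loader_options` and `build_loader` are hypothetical helpers, and the import path is an assumption.

```python
from typing import Any, Dict, Optional


def loader_options(password: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments from documented parameter names (hypothetical helper)."""
    options: Dict[str, Any] = {
        "silent_errors": True,        # log errors instead of raising them
        "mode": "single",             # extract each document as a whole, not page-wise
        "extraction_mode": "layout",  # experimental layout mode
    }
    if password is not None:
        options["password"] = password  # only needed for encrypted PDFs
    return options


def build_loader(directory: str, password: Optional[str] = None):
    """Construct a configured loader (assumed import path)."""
    from langchain_community.document_loaders import PyPDFDirectoryLoader

    return PyPDFDirectoryLoader(directory, **loader_options(password))
```

Passing the options by keyword keeps the call independent of the positional order of the constructor parameters.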
