GitbookLoader

GitbookLoader(
  self,
  web_page: str,
  load_all_paths: bool = False,
  base_url: Optional

Bases

BaseLoader

Inherited from BaseLoader (langchain_core)

Methods

load, aload, load_and_split

Parameters

Name	Type	Description
`web_page`*	`str`	The web page to load or the starting point from where relative paths are discovered.
`load_all_paths`	`bool`	Default:`False` If set to True, all relative paths in the navbar are loaded instead of only `web_page`. Requires `lxml` package.
`base_url`	`Optional[str]`	Default:`None`
`content_selector`	`str`	Default:`'main'`
`continue_on_failure`	`bool`	Default:`False`
`show_progress`	`bool`	Default:`True`
`sitemap_url`	`Optional[str]`	Default:`None`
`allowed_domains`	`Optional[Set[str]]`	Default:`None`

constructor

__init__

Name	Type
web_page	str
load_all_paths	bool
base_url	Optional[str]
content_selector	str
continue_on_failure	bool
show_progress	bool
sitemap_url	Optional[str]
allowed_domains	Optional[Set[str]]

Load GitBook data.

Loads from either a single page or all (relative) paths in the sitemap, handling nested sitemap indexes.

When load_all_paths=True, the loader parses XML sitemaps and requires the lxml package to be installed (pip install lxml).
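As a rough illustration of the two modes, the sketch below builds the constructor arguments and checks that lxml is importable before enabling sitemap loading. The helper `gitbook_loader_kwargs` and the example URL are hypothetical, and the import path `langchain_community.document_loaders` is an assumption about where the class is installed from. `demo()` performs network requests, so it is defined but not called.

```python
import importlib.util
from typing import Any, Dict


def gitbook_loader_kwargs(web_page: str, crawl: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for GitbookLoader."""
    if crawl and importlib.util.find_spec("lxml") is None:
        # Sitemap parsing needs lxml, so fail early with an install hint.
        raise ImportError("load_all_paths=True requires lxml (pip install lxml)")
    return {"web_page": web_page, "load_all_paths": crawl}


def demo() -> None:
    """Load a single page; needs network access and the loader package."""
    # Assumed import path; adjust to wherever GitbookLoader lives in your setup.
    from langchain_community.document_loaders import GitbookLoader

    loader = GitbookLoader(**gitbook_loader_kwargs("https://docs.gitbook.com"))
    docs = loader.load()  # load() is inherited from BaseLoader
    print(f"loaded {len(docs)} document(s)")
```

Passing `crawl=True` to the helper switches to sitemap mode, which is where the lxml check matters.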

base_url: If load_all_paths is True, the relative paths are appended to this base URL. Defaults to web_page.

content_selector: The CSS selector for the content to load. Defaults to "main".

continue_on_failure: Whether to continue loading the sitemap if an error occurs while loading a URL, emitting a warning instead of raising an exception. Setting this to True makes the loader more robust but may result in missing data. Defaults to False.
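The trade-off between the two settings can be sketched with a small stand-alone loop. This is not the loader's own code: `fetch_all` and its `fetch` callback are hypothetical names used only to show the warn-and-skip pattern against the default fail-fast behaviour.

```python
import warnings
from typing import Callable, Iterable, List


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    continue_on_failure: bool = False,
) -> List[str]:
    """Hypothetical sketch of the warn-and-skip pattern."""
    results: List[str] = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            if not continue_on_failure:
                raise  # default: the first failure aborts the whole load
            # Tolerant mode: warn, skip this URL, and keep going.
            warnings.warn(f"Error fetching {url}: {exc}")
    return results
```

With the tolerant setting, the result list silently omits any URL that failed, which is the "missing data" risk mentioned above.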

show_progress: Whether to show a progress bar while loading. Defaults to True.

sitemap_url: Custom sitemap URL to use when load_all_paths is True. Defaults to "{base_url}/sitemap.xml".
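A minimal sketch of how the default sitemap location and nested sitemap indexes might be handled. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages. The function names, and the stripping of a trailing slash from the base URL, are assumptions made for illustration, not the loader's implementation.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def default_sitemap_url(base_url: str, sitemap_url: Optional[str] = None) -> str:
    """Return the custom sitemap URL if given, else "{base_url}/sitemap.xml"."""
    if sitemap_url:
        return sitemap_url
    # Assumption: drop trailing slashes so the result has no double slash.
    return f"{base_url.rstrip('/')}/sitemap.xml"


def parse_sitemap(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (page_urls, nested_sitemap_urls) found in one sitemap document."""
    root = ET.fromstring(xml_text)
    # <urlset> documents list pages under <url><loc>.
    pages = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # <sitemapindex> documents list further sitemaps under <sitemap><loc>.
    nested = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return pages, nested
```

A crawler built on this would fetch each nested sitemap URL and call `parse_sitemap` again until only page URLs remain.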

allowed_domains: Optional set of allowed domains to fetch from. If None (default), the loader restricts crawling to the domain of the web_page URL to prevent potential SSRF vulnerabilities. Provide an explicit set (e.g., {"example.com", "docs.example.com"}) to allow crawling across multiple domains. Use with caution in server environments where users might control the input URLs.
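The domain restriction can be pictured as an allow-list check applied to every URL before it is fetched. The sketch below is an assumption about how such a check could look, not the loader's actual code. `resolve_allowed_domains` and `is_url_allowed` are hypothetical names, and rejecting non-HTTP schemes is an extra precaution added here.

```python
from typing import Optional, Set
from urllib.parse import urlparse


def resolve_allowed_domains(
    web_page: str, allowed_domains: Optional[Set[str]] = None
) -> Set[str]:
    """Default to the domain of web_page when no explicit set is given."""
    if allowed_domains is not None:
        return set(allowed_domains)
    return {urlparse(web_page).netloc}


def is_url_allowed(url: str, allowed: Set[str]) -> bool:
    """Accept only http(s) URLs whose host is in the allow-list."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # e.g. file:// or ftp:// URLs are rejected outright
    return parsed.netloc in allowed
```

Matching on the exact host means "example.com" does not implicitly allow "docs.example.com", which is why the description above lists both in its example set.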
