# GitbookLoader

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/document_loaders/gitbook/GitbookLoader)

Load `GitBook` data. The loader works in one of two modes:

1. load a single page, or
2. load all (relative) paths in the sitemap, handling nested sitemap indexes.

When `load_all_paths=True`, the loader parses XML sitemaps and requires the
`lxml` package to be installed (`pip install lxml`).
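The nested-index handling described above can be pictured with a small sketch. It is not the loader's code: it uses the standard library's `xml.etree.ElementTree` in place of `lxml`, and `collect_page_urls` and the `fetch` callable are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def collect_page_urls(sitemap_url, fetch, seen=None):
    """Return page URLs reachable from `sitemap_url`, following nested indexes.

    `fetch` is a hypothetical callable that maps a URL to its XML text.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against sitemap indexes that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        # A sitemap index lists further sitemaps; recurse into each one.
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            if loc.text:
                urls.extend(collect_page_urls(loc.text.strip(), fetch, seen))
        return urls
    # A plain urlset lists the pages themselves.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


# In-memory stand-in for HTTP responses (placeholder URLs).
SAMPLE = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-pages.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/intro</loc></url>"
        "<url><loc>https://example.com/setup</loc></url>"
        "</urlset>"
    ),
}

print(collect_page_urls("https://example.com/sitemap.xml", SAMPLE.__getitem__))
```

Passing `fetch` as a parameter keeps the traversal logic separate from network access, so the sketch can be exercised with in-memory XML.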

## Signature

```python
GitbookLoader(
    web_page: str,
    load_all_paths: bool = False,
    base_url: Optional[str] = None,
    content_selector: str = 'main',
    continue_on_failure: bool = False,
    show_progress: bool = True,
    *,
    sitemap_url: Optional[str] = None,
    allowed_domains: Optional[Set[str]] = None,
)
```

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `web_page` | `str` | Yes | The web page to load or the starting point from where relative paths are discovered. |
| `load_all_paths` | `bool` | No | If set to `True`, all relative paths in the sitemap are loaded instead of only `web_page`. Requires the `lxml` package. (default: `False`) |
| `base_url` | `Optional[str]` | No | If `load_all_paths` is `True`, relative paths are appended to this base URL. When `None`, `web_page` is used. (default: `None`) |
| `content_selector` | `str` | No | The CSS selector for the content to load. (default: `'main'`) |
| `continue_on_failure` | `bool` | No | Whether to continue loading the sitemap if an error occurs while loading a URL, emitting a warning instead of raising an exception. Setting this to `True` makes the loader more robust but may result in missing data. (default: `False`) |
| `show_progress` | `bool` | No | Whether to show a progress bar while loading. (default: `True`) |
| `sitemap_url` | `Optional[str]` | No | Custom sitemap URL to use when `load_all_paths` is `True`. When `None`, `{base_url}/sitemap.xml` is used. (default: `None`) |
| `allowed_domains` | `Optional[Set[str]]` | No | Optional set of domains the loader may fetch from. If `None`, crawling is restricted to the domain of the `web_page` URL to prevent potential SSRF vulnerabilities. Provide an explicit set (e.g., `{"example.com", "docs.example.com"}`) to allow crawling across multiple domains. Use with caution in server environments where users might control the input URLs. (default: `None`) |
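As a rough illustration of the defaults described in the table, the sketch below shows one way the sitemap URL and the domain allow-list could be resolved. `resolve_sitemap_url` and `is_url_allowed` are hypothetical helpers written for this page; the trailing-slash handling and the scheme check are assumptions, and none of this is the loader's actual code.

```python
from typing import Optional, Set
from urllib.parse import urlparse


def resolve_sitemap_url(
    web_page: str,
    base_url: Optional[str] = None,
    sitemap_url: Optional[str] = None,
) -> str:
    """Fall back to "{base_url}/sitemap.xml", with base_url defaulting to web_page."""
    if sitemap_url is not None:
        return sitemap_url
    base = (base_url or web_page).rstrip("/")  # assumption: avoid a double slash
    return f"{base}/sitemap.xml"


def is_url_allowed(
    url: str,
    web_page: str,
    allowed_domains: Optional[Set[str]] = None,
) -> bool:
    """Accept only http(s) URLs whose host is in the allow-list."""
    if allowed_domains is None:
        # Default described in the table: only the domain of web_page.
        allowed_domains = {urlparse(web_page).netloc}
    parsed = urlparse(url)
    # Assumption: non-HTTP schemes such as file:// are rejected outright.
    return parsed.scheme in ("http", "https") and parsed.netloc in allowed_domains
```

With these helpers, a URL on another host is rejected unless that host is listed explicitly, which is the behavior the `allowed_domains` description aims for.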

## Extends

- `BaseLoader`

## Constructors

```python
__init__(
    self,
    web_page: str,
    load_all_paths: bool = False,
    base_url: Optional[str] = None,
    content_selector: str = 'main',
    continue_on_failure: bool = False,
    show_progress: bool = True,
    *,
    sitemap_url: Optional[str] = None,
    allowed_domains: Optional[Set[str]] = None,
)
```

| Name | Type |
|------|------|
| `web_page` | `str` |
| `load_all_paths` | `bool` |
| `base_url` | `Optional[str]` |
| `content_selector` | `str` |
| `continue_on_failure` | `bool` |
| `show_progress` | `bool` |
| `sitemap_url` | `Optional[str]` |
| `allowed_domains` | `Optional[Set[str]]` |


## Properties

- `base_url`
- `web_page`
- `load_all_paths`
- `content_selector`
- `continue_on_failure`
- `show_progress`
- `allowed_domains`
- `start_url`

## Methods

- [`lazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/gitbook/GitbookLoader/lazy_load)
- [`alazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/gitbook/GitbookLoader/alazy_load)
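A minimal usage sketch, assuming `langchain_community` is installed and the target site is reachable. `build_loader` and `first_documents` are hypothetical helper names, and the keyword arguments are taken from the signature above.

```python
from itertools import islice


def build_loader(web_page, crawl=False):
    """Construct a GitbookLoader.

    The import is deferred so this module can be imported even where
    langchain_community is not installed.
    """
    from langchain_community.document_loaders import GitbookLoader

    return GitbookLoader(
        web_page,
        load_all_paths=crawl,      # True requires lxml and reads the sitemap
        continue_on_failure=True,  # warn and skip pages that fail to load
        show_progress=False,
    )


def first_documents(loader, limit=3):
    """Return at most `limit` documents from any object exposing lazy_load()."""
    return list(islice(loader.lazy_load(), limit))
```

For instance, `first_documents(build_loader("https://docs.gitbook.com", crawl=True))` would return up to three documents from that site; the URL is only a placeholder. Whether the remaining pages are still fetched in the background depends on the loader's internals, which this sketch does not assume.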

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/a6a6079511ac8a5c1293337f88096b8641562e77/libs/community/langchain_community/document_loaders/gitbook.py#L12)