# WebBaseLoader

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader)

A document loader integration that loads web pages from one or more URLs and parses their HTML with BeautifulSoup.

## Signature

```python
WebBaseLoader(
    self,
    web_path: Union[str, Sequence[str]] = '',
    header_template: Optional[dict] = None,
    verify_ssl: bool = True,
    proxies: Optional[dict] = None,
    continue_on_failure: bool = False,
    autoset_encoding: bool = True,
    encoding: Optional[str] = None,
    web_paths: Sequence[str] = (),
    requests_per_second: int = 2,
    default_parser: str = 'html.parser',
    requests_kwargs: Optional[Dict[str, Any]] = None,
    raise_for_status: bool = False,
    bs_get_text_kwargs: Optional[Dict[str, Any]] = None,
    bs_kwargs: Optional[Dict[str, Any]] = None,
    session: Any = None,
    *,
    show_progress: bool = True,
    trust_env: bool = False,
)
```

## Description

**Setup:**

Install ``langchain_community``.

.. code-block:: bash

    pip install -U langchain_community

**Instantiate:**

.. code-block:: python

    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(
        web_path="https://www.espn.com/",
        # header_template=None,
        # verify_ssl=True,
        # proxies=None,
        # continue_on_failure=False,
        # autoset_encoding=True,
        # encoding=None,
        # web_paths=(),
        # requests_per_second=2,
        # default_parser="html.parser",
        # requests_kwargs=None,
        # raise_for_status=False,
        # bs_get_text_kwargs=None,
        # bs_kwargs=None,
        # session=None,
        # show_progress=True,
        # trust_env=False,
    )

**Lazy load:**

.. code-block:: python

    docs = []
    for doc in loader.lazy_load():
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: text

    ESPN - Serving Sports Fans. Anytime. Anywhere.

    {'source': 'https://www.espn.com/', 'title': 'ESPN - Serving Sports Fans. Anytime. Anywhere.', 'description': 'Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.', 'language': 'en'}
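
The metadata printed above carries a ``source`` URL plus ``title``, ``description`` and ``language`` fields. The following is a minimal sketch of how such fields can be read out of an HTML page, assuming they come from the ``<title>`` tag, the ``<meta name="description">`` tag and the ``lang`` attribute of ``<html>``. It uses the standard-library ``html.parser`` rather than BeautifulSoup, and ``MetadataParser`` and ``build_metadata`` are hypothetical names, not part of ``langchain_community``.

```python
from html.parser import HTMLParser
from typing import Dict


class MetadataParser(HTMLParser):
    """Hypothetical helper: collects title, description and language."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "html" and attr_map.get("lang"):
            self.metadata["language"] = attr_map["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name") == "description":
            self.metadata["description"] = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> may arrive in several chunks, so append.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def build_metadata(html: str, url: str) -> Dict[str, str]:
    """Return a metadata dict with the page URL stored under 'source'."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"source": url, **parser.metadata}


sample_html = (
    '<html lang="en"><head><title>Example</title>'
    '<meta name="description" content="A sample page.">'
    "</head><body>Hello</body></html>"
)
metadata = build_metadata(sample_html, "https://example.com/")
```

For the sample page, ``metadata`` holds the same four keys as the loader output shown earlier.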

**Async load:**

.. code-block:: python

    docs = []
    async for doc in loader.alazy_load():
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: text

    ESPN - Serving Sports Fans. Anytime. Anywhere.

    {'source': 'https://www.espn.com/', 'title': 'ESPN - Serving Sports Fans. Anytime. Anywhere.', 'description': 'Visit ESPN for live scores, highlights and sports news. Stream exclusive games on ESPN+ and play fantasy sports.', 'language': 'en'}
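
The ``async for`` loop above needs a running event loop, as in a notebook or inside a coroutine. In a plain script, one option is to wrap the loop in a coroutine and run it with ``asyncio.run``. The sketch below uses a hypothetical ``StandInLoader`` so that it runs without network access. A real ``WebBaseLoader`` instance is expected to work in its place, since only ``alazy_load`` is called.

```python
import asyncio
from typing import Any, AsyncIterator, List


class StandInLoader:
    """Hypothetical stand-in exposing an ``alazy_load`` async generator."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages

    async def alazy_load(self) -> AsyncIterator[str]:
        for page in self.pages:
            await asyncio.sleep(0)  # yield control, as network I/O would
            yield page


async def collect(loader: Any) -> List[Any]:
    """Drain ``loader.alazy_load()`` into a list."""
    return [doc async for doc in loader.alazy_load()]


def load_all(loader: Any) -> List[Any]:
    """Synchronous entry point: run the collection on a fresh event loop."""
    return asyncio.run(collect(loader))


docs = load_all(StandInLoader(["page one", "page two"]))
```

``asyncio.run`` raises an error when called from code that is already running inside an event loop, so in a notebook the bare ``async for`` form remains the simpler choice.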

.. versionchanged:: 0.3.14

    Deprecated ``aload`` (which was not async) and implemented a native async
    ``alazy_load``. Expand below for more details.

.. dropdown:: How to update ``aload``

    Instead of using ``aload``, you can use ``load`` for synchronous loading or
    ``alazy_load`` for asynchronous lazy loading.

    Example using ``load`` (synchronous):

    .. code-block:: python

        docs: List[Document] = loader.load()

    Example using ``alazy_load`` (asynchronous):

    .. code-block:: python

        docs: List[Document] = []
        async for doc in loader.alazy_load():
            docs.append(doc)

    This change prepares for a truly asynchronous ``aload`` in the
    future:

    .. code-block:: python

        docs: List[Document] = await loader.aload()

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `web_paths` | `Sequence[str]` | No | Web paths to load from. (default: `()`) |
| `requests_per_second` | `int` | No | Max number of concurrent requests to make. (default: `2`) |
| `default_parser` | `str` | No | Default parser to use for BeautifulSoup. (default: `'html.parser'`) |
| `requests_kwargs` | `Optional[Dict[str, Any]]` | No | Keyword arguments passed to `requests`. (default: `None`) |
| `raise_for_status` | `bool` | No | Raise an exception if http status code denotes an error. (default: `False`) |
| `bs_get_text_kwargs` | `Optional[Dict[str, Any]]` | No | Keyword arguments for BeautifulSoup4 `get_text`. (default: `None`) |
| `bs_kwargs` | `Optional[Dict[str, Any]]` | No | Keyword arguments for BeautifulSoup4 web page parsing. (default: `None`) |
| `show_progress` | `bool` | No | Show progress bar when loading pages. (default: `True`) |
| `trust_env` | `bool` | No | Set to `True` if using a proxy to make web requests, for example via the `http(s)_proxy` environment variables. (default: `False`) |
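
The table describes `requests_per_second` as a cap on concurrent requests rather than a time-based rate. One common way to enforce such a cap is an `asyncio.Semaphore`. The sketch below illustrates that idea only. Whether the loader uses this exact mechanism is an assumption, `fetch_with_limit` is a hypothetical name, and the network call is simulated with a short sleep.

```python
import asyncio
from typing import List, Tuple


async def fetch_with_limit(urls: List[str], limit: int) -> Tuple[List[str], int]:
    """Simulate fetching ``urls`` with at most ``limit`` requests in flight.

    Returns the fake page bodies and the peak number of concurrent requests.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fetch(url: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once ``limit`` slots are taken
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            in_flight -= 1
            return f"<html>{url}</html>"

    pages = await asyncio.gather(*(fetch(url) for url in urls))
    return list(pages), peak


urls = [f"https://example.com/{i}" for i in range(6)]
pages, peak = asyncio.run(fetch_with_limit(urls, limit=2))
```

With six URLs and `limit=2`, all six pages are returned in input order while no more than two simulated requests overlap at any moment.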

## Extends

- `BaseLoader`

## Constructors

```python
__init__(
    self,
    web_path: Union[str, Sequence[str]] = '',
    header_template: Optional[dict] = None,
    verify_ssl: bool = True,
    proxies: Optional[dict] = None,
    continue_on_failure: bool = False,
    autoset_encoding: bool = True,
    encoding: Optional[str] = None,
    web_paths: Sequence[str] = (),
    requests_per_second: int = 2,
    default_parser: str = 'html.parser',
    requests_kwargs: Optional[Dict[str, Any]] = None,
    raise_for_status: bool = False,
    bs_get_text_kwargs: Optional[Dict[str, Any]] = None,
    bs_kwargs: Optional[Dict[str, Any]] = None,
    session: Any = None,
    *,
    show_progress: bool = True,
    trust_env: bool = False,
) -> None
```

| Name | Type |
|------|------|
| `web_path` | `Union[str, Sequence[str]]` |
| `header_template` | `Optional[dict]` |
| `verify_ssl` | `bool` |
| `proxies` | `Optional[dict]` |
| `continue_on_failure` | `bool` |
| `autoset_encoding` | `bool` |
| `encoding` | `Optional[str]` |
| `web_paths` | `Sequence[str]` |
| `requests_per_second` | `int` |
| `default_parser` | `str` |
| `requests_kwargs` | `Optional[Dict[str, Any]]` |
| `raise_for_status` | `bool` |
| `bs_get_text_kwargs` | `Optional[Dict[str, Any]]` |
| `bs_kwargs` | `Optional[Dict[str, Any]]` |
| `session` | `Any` |
| `show_progress` | `bool` |
| `trust_env` | `bool` |


## Properties

- `web_paths`
- `requests_per_second`
- `default_parser`
- `requests_kwargs`
- `raise_for_status`
- `show_progress`
- `bs_get_text_kwargs`
- `bs_kwargs`
- `session`
- `continue_on_failure`
- `autoset_encoding`
- `encoding`
- `trust_env`
- `web_path`

## Methods

- [`fetch_all()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/fetch_all)
- [`scrape_all()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/scrape_all)
- [`ascrape_all()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/ascrape_all)
- [`scrape()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/scrape)
- [`lazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/lazy_load)
- [`alazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/alazy_load)
- [`aload()`](https://reference.langchain.com/python/langchain-community/document_loaders/web_base/WebBaseLoader/aload)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/d5ea8358933260ad48dd31f7f8076555c7b4885a/libs/community/langchain_community/document_loaders/web_base.py#L42)