# ArxivLoader

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/document_loaders/arxiv/ArxivLoader)

Load the results of a query against `Arxiv`.
The loader converts each paper from its original PDF format into plain text.

## Signature

```python
ArxivLoader(
    self,
    query: str,
    doc_content_chars_max: Optional[int] = None,
    **kwargs: Any,
)
```

## Description

**Setup:**

Install the ``arxiv`` and ``PyMuPDF`` packages.
``PyMuPDF`` converts the PDF files downloaded from arxiv.org
into text.

.. code-block:: bash

    pip install -U arxiv pymupdf

**Instantiate:**

.. code-block:: python

    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning",
        # load_max_docs=2,
        # load_all_available_meta=False
    )

**Load:**

.. code-block:: python

    docs = loader.load()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    Understanding the Reasoning Ability of Language Models
    From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the
                Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan,
                Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning
                without explicit fine-tuning...'
    }

**Lazy load:**

.. code-block:: python

    docs = []
    docs_lazy = loader.lazy_load()

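    # async variant: alazy_load() returns an async iterator, so it is
    # consumed with `async for` rather than awaited:
    # async for doc in loader.alazy_load():
    #     docs.append(doc)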

    for doc in docs_lazy:
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    Understanding the Reasoning Ability of Language Models
    From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the
                Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan,
                Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning
                without explicit fine-tuning...'
    }
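
The benefit of lazy loading is that a consumer can stop early without producing every document. The sketch below illustrates this with a hypothetical stand-in generator (`stand_in_lazy_load` and the `fetched` list are made up for illustration and are not part of `langchain_community`); it needs no network access.

```python
from itertools import islice
from typing import Iterator, List

fetched: List[int] = []  # records which items were actually produced


def stand_in_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(): yields one item at a time."""
    for i in range(1000):
        fetched.append(i)  # simulates the per-document work (download, parse)
        yield f"document {i}"


# Take only the first two items; the generator is not advanced any further.
first_two = list(islice(stand_in_lazy_load(), 2))
print(first_two)  # ['document 0', 'document 1']
print(fetched)    # [0, 1]
```

With a real loader the same pattern would read `islice(loader.lazy_load(), 2)`. How much work this actually saves depends on how the loader fetches results internally.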

**Async load:**

.. code-block:: python

    docs = await loader.aload()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    Understanding the Reasoning Ability of Language Models
    From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the
                Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan,
                Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning
                without explicit fine-tuning...'
    }
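
A bare `await` as shown above only works where an event loop is already running, such as a notebook or an async function. In a plain script the call can be wrapped with `asyncio.run`. The following is a minimal sketch using a hypothetical `StandInLoader` that mimics the method names `aload` and `alazy_load`; it is not the real loader and returns made-up strings instead of documents.

```python
import asyncio
from typing import AsyncIterator, List


class StandInLoader:
    """Hypothetical stand-in exposing the same async method names as the loader."""

    async def alazy_load(self) -> AsyncIterator[str]:
        for title in ("paper A", "paper B"):
            await asyncio.sleep(0)  # yield control, as real I/O would
            yield title

    async def aload(self) -> List[str]:
        return [doc async for doc in self.alazy_load()]


async def main() -> List[str]:
    loader = StandInLoader()
    docs = await loader.aload()            # coroutine: awaited
    streamed = []
    async for doc in loader.alazy_load():  # async iterator: iterated
        streamed.append(doc)
    assert docs == streamed
    return docs


# In a plain script there is no running event loop, so start one explicitly.
docs = asyncio.run(main())
print(docs)  # ['paper A', 'paper B']
```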

**Use summaries of articles as docs:**

.. code-block:: python

    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning"
    )

    docs = loader.get_summaries_as_docs()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

.. code-block:: python

    Pre-trained language models (LMs) are able to perform complex reasoning
    without explicit fine-tuning
    {
        'Entry ID': 'http://arxiv.org/abs/2402.03268v2',
        'Published': datetime.date(2024, 2, 29),
        'Title': 'Understanding the Reasoning Ability of Language Models From the
                Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan,
                Wenhu Chen, William Yang Wang'
    }

## Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `query` | `str` | Yes | Free text used to find documents on Arxiv |
| `doc_content_chars_max` | `Optional[int]` | No | Maximum number of characters kept from a document's content (default: `None`) |
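
As a rough illustration of what a character cut limit means, the sketch below assumes the limit is applied as a plain prefix slice, which this page does not state explicitly. `cut_content` is a hypothetical helper written for this example, not a function of `langchain_community`.

```python
from typing import Optional


def cut_content(text: str, doc_content_chars_max: Optional[int] = None) -> str:
    """Illustrative stand-in: keep at most `doc_content_chars_max` characters."""
    # Slicing with None as the upper bound returns the whole string,
    # so under this assumption None means "no limit".
    return text[:doc_content_chars_max]


full_text = "Pre-trained language models (LMs) are able to perform complex reasoning"
print(cut_content(full_text, 11))        # Pre-trained
print(cut_content(full_text) == full_text)  # True
```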

## Extends

- `BaseLoader`

## Constructors

```python
__init__(
    self,
    query: str,
    doc_content_chars_max: Optional[int] = None,
    **kwargs: Any,
)
```

| Name | Type |
|------|------|
| `query` | `str` |
| `doc_content_chars_max` | `Optional[int]` |


## Properties

- `query`
- `client`

## Methods

- [`lazy_load()`](https://reference.langchain.com/python/langchain-community/document_loaders/arxiv/ArxivLoader/lazy_load)
- [`get_summaries_as_docs()`](https://reference.langchain.com/python/langchain-community/document_loaders/arxiv/ArxivLoader/get_summaries_as_docs)

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/4b280287bd55b99b44db2dd849f02d66c89534d5/libs/community/langchain_community/document_loaders/arxiv.py#L9)