# ArxivRetriever

> **Class** in `langchain_community`

📖 [View in docs](https://reference.langchain.com/python/langchain-community/retrievers/arxiv/ArxivRetriever)

`Arxiv` retriever.

## Signature

```python
ArxivRetriever()
```

## Description

**Setup:**

Install ``arxiv``:

.. code-block:: bash

    pip install -U arxiv

**Key init args:**

load_max_docs: int
    maximum number of documents to load
get_full_documents: bool
    whether to return full document text or snippets

**Instantiate:**

.. code-block:: python

    from langchain_community.retrievers import ArxivRetriever

    retriever = ArxivRetriever(
        load_max_docs=2,
        get_full_documents=True,
    )

**Usage:**

.. code-block:: python

    docs = retriever.invoke("What is the ImageBind model?")
    docs[0].metadata

.. code-block:: none

    {'Entry ID': 'http://arxiv.org/abs/2305.05665v2',
     'Published': datetime.date(2023, 5, 31),
     'Title': 'ImageBind: One Embedding Space To Bind Them All',
     'Authors': 'Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra'}

**Use within a chain:**

.. code-block:: python

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_template(
        """Answer the question based only on the context provided.

    Context: {context}

    Question: {question}"""
    )

    llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    chain.invoke("What is the ImageBind model?")

.. code-block:: none

    'The ImageBind model is an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data...'
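
**Check the formatting step offline:**

The `format_docs` helper in the chain above only reads the `page_content` attribute of each document, so it can be exercised without calling arXiv or an LLM. The following is a minimal sketch under that assumption; `FakeDoc` is a hypothetical stand-in written for this example and is not part of LangChain.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document (not a LangChain class)."""

    page_content: str


def format_docs(docs):
    # Same helper as in the chain example: join page contents with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


# Two placeholder documents standing in for retriever output.
docs = [FakeDoc("First abstract."), FakeDoc("Second abstract.")]
context = format_docs(docs)
print(context)
```

The resulting string is what the chain passes to the prompt as `{context}`.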

## Extends

- `BaseRetriever`
- `ArxivAPIWrapper`

## Properties

- `get_full_documents`

---

[View source on GitHub](https://github.com/langchain-ai/langchain-community/blob/a6a6079511ac8a5c1293337f88096b8641562e77/libs/community/langchain_community/retrievers/arxiv.py#L10)