DedocFileLoader document loader integration to load files using dedoc.
The file loader automatically detects the file type (with the correct extension). The list of supported file types is gives at https://dedoc.readthedocs.io/en/latest/index.html#id1. Please see the documentation of DedocBaseLoader to get more details.
Setup:
Install dedoc package.
.. code-block:: bash
pip install -U dedoc
Instantiate:
.. code-block:: python
from langchain_community.document_loaders import DedocFileLoader
loader = DedocFileLoader( file_path="example.pdf", # split=..., # with_tables=..., # pdf_with_text_layer=..., # pages=..., # ... )
Load:
.. code-block:: python
docs = loader.load()
print(docs[0].page_content[:100])
print(docs[0].metadata)
.. code-block:: python
Some text
{
'file_name': 'example.pdf',
'file_type': 'application/pdf',
# ...
}
Lazy load:
.. code-block:: python
docs = []
docs_lazy = loader.lazy_load()
for doc in docs_lazy:
docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)
.. code-block:: python
Some text
{
'file_name': 'example.pdf',
'file_type': 'application/pdf',
# ...
}