ModuleName document loader integration
Setup:
Install langchain-community and bs4.
.. code-block:: bash
pip install -U langchain-community bs4
Instantiate:
.. code-block:: python
from langchain_community.document_loaders import BSHTMLLoader
loader = BSHTMLLoader( file_path="./example_data/fake-content.html", )
Lazy load:
.. code-block:: python
docs = []
docs_lazy = loader.lazy_load()
# async variant:
# docs_lazy = await loader.alazy_load()
for doc in docs_lazy:
docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)
.. code-block:: python
Test Title
My First Heading
My first paragraph.
{'source': './example_data/fake-content.html', 'title': 'Test Title'}
Async load:
.. code-block:: python
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)
.. code-block:: python
Test Title
My First Heading
My first paragraph.
{'source': './example_data/fake-content.html', 'title': 'Test Title'}
Any kwargs to pass to the BeautifulSoup object.
The separator to use when calling get_text on the soup.