Load Documents using LLMSherpa.
LLMSherpaFileLoader uses LayoutPDFReader, which is part of the LLMSherpa library. LayoutPDFReader parses PDFs while preserving their layout information (such as sections, paragraphs, lists, and tables), which most PDF-to-text parsers discard.
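from langchain_community.document_loaders.llmsherpa import LLMSherpaFileLoader

# llmsherpa_api_url must point at a running LLMSherpa parsing service (here assumed local, port 5010).
loader = LLMSherpaFileLoader(
    "example.pdf",
    strategy="chunks",  # one Document per layout-aware chunk
    llmsherpa_api_url="http://localhost:5010/api/parseDocument?renderFormat=all",
)
docs = loader.load()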
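After load() returns, it helps to check what the chunks look like before passing them downstream. The sketch below assumes only that each returned item exposes a page_content string, as LangChain Document objects do. The helper preview_docs and the FakeDocument stand-in are hypothetical names introduced here so the example runs without a parsing server; they are not part of LangChain or LLMSherpa.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document, used only for this demo."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def preview_docs(docs: Iterable[Any], width: int = 60) -> list[str]:
    """Return one summary line per document: index, character count, short preview.

    Works with any object that has a `page_content` string attribute.
    """
    lines = []
    for i, doc in enumerate(docs):
        # Collapse newlines and repeated spaces so each preview fits on one line.
        text = " ".join(doc.page_content.split())
        snippet = text if len(text) <= width else text[: width - 3] + "..."
        lines.append(f"[{i}] {len(doc.page_content)} chars: {snippet}")
    return lines


if __name__ == "__main__":
    demo = [FakeDocument("Introduction\nThis paper studies..."), FakeDocument("x" * 100)]
    for line in preview_docs(demo, width=20):
        print(line)
```

With the real loader, the same helper can be called on the list returned by loader.load(), for example `for line in preview_docs(docs): print(line)`.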