Load blobs in the local file system.
Example:
.. code-block:: python
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
loader = FileSystemBlobLoader("/path/to/directory")
for blob in loader.yield_blobs():
print(blob) # noqa: T201
Generic Document Loader.
A generic document loader that allows combining an arbitrary blob loader with a blob parser.
Examples:
Parse a specific PDF file:
.. code-block:: python
from langchain_community.document_loaders import GenericLoader
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem(
"my_lovely_pdf.pdf",
parser=PyPDFParser()
)
.. code-block:: python
from langchain_community.document_loaders import GenericLoader
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
loader = GenericLoader.from_filesystem(
path="path/to/directory",
glob="**/[!.]*",
suffixes=[".pdf"],
show_progress=True,
)
docs = loader.lazy_load()
next(docs)
Example instantiations to change which files are loaded:
.. code-block:: python
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")
# Recursively load all non-hidden files in a directory.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")
# Load all files in a directory without recursion.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")
Example instantiations to change which parser is used:
.. code-block:: python
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem(
"/path/to/dir",
glob="**/*.pdf",
parser=PyPDFParser()
)
Load and pars Documents concurrently.