Load from a Huawei OBS file.
Parse Oracle doc metadata...
Read a file.
Load Jupyter notebook (.ipynb) files.
Load from Amazon AWS S3 directory.
Load documents from TiDB.
Load local Airbyte JSON files.
Load a sitemap and its URLs.
Load from TensorFlow Dataset.
Load CHM files using Unstructured.
Microsoft Compiled HTML Help (CHM) Parser.
Load Org-Mode files using Unstructured.
Load from Hugging Face Hub datasets.
Load Roam files from a directory.
Load Pandas DataFrame.
Load files from Dropbox.
Load pages from OneNote notebooks.
Load from Telegram chat dump.
Load Telegram chat JSON directory dump.
Load Documents using LLMSherpa.
Pebblo Safe Loader class is a wrapper around document loaders enabling the data
Loader for text data.
Load iFixit repair guides, device wikis and answers.
Load a DOCX file using docx2txt and chunk it at the character level.
Load Microsoft Word file using Unstructured.
Load from Huawei OBS directory.
Load from Snowflake API.
Load model information from Hugging Face Hub, including README content.
Load elements from a blockchain smart contract.
Load RTF files using Unstructured.
Load PDF files using Unstructured.
Base Loader class for PDF files.
Load online PDF.
Load and parse a PDF file using 'pypdf' library.
Load and parse a PDF file using the pypdfium2 library.
Load and parse a directory of PDF files using 'pypdf' library.
Load and parse a PDF file using 'pdfminer.six' library.
Load PDF files as HTML content using PDFMiner.
Load and parse a PDF file using 'PyMuPDF' library.
Load PDF files using Mathpix service.
Load PDF files using pdfplumber.
Load PDF files from a local file system, HTTP or S3.
DedocPDFLoader document loader integration to load PDF files using dedoc.
Load a PDF with Azure Document Intelligence.
Document loader utilizing Zerox library:
Load documents from Yuque.
Load from Open City.
Load Xorbits DataFrame.
Client for lakeFS.
Load from lakeFS.
Load from lakeFS as unstructured data.
Load documents from AWS Athena.
Load documents from Microsoft OneDrive.
Load from Baidu Cloud BOS file.
Load EPub files using Unstructured.
Load conversations from exported ChatGPT data.
Load webpages with Browserless /content endpoint.
Scrape HTML pages from URLs using a
Load HTML asynchronously.
Load CoNLL-U files.
Load files from remote URLs using Unstructured.
Load image captions.
Load Notion directory dump.
Load from IUGU.
Load from Azure AI Data.
Load from FaunaDB.
Load MongoDB documents.
WebBaseLoader document loader integration
Load from a directory.
Load records from an ArcGIS FeatureLayer.
Load Quip pages.
Load and parse Documents concurrently.
Transcript format to use for the document loader.
Load AssemblyAI audio transcripts.
Load AssemblyAI audio transcripts.
Load TOML files.
Load Airtable tables.
Load College Confidential webpages.
Load Polars DataFrame.
Load GeoPandas DataFrame.
Generic Document Loader.
Load SurrealDB documents.
Load a query result from Arxiv.
Load a bibtex file.
Generic Google API Client.
Output formats of transcripts from YoutubeLoader.
Load YouTube video transcripts.
Load all videos from a YouTube channel.
Load news articles from RSS feeds using Unstructured.
Load Cube semantic layer metadata.
Load from LarkSuite (FeiShu).
Load from LarkSuite (FeiShu) wiki.
Load notes from Joplin.
Load from Alibaba Cloud MaxCompute table.
Load Twitter tweets.
Load Datadog logs.
Load documents from Couchbase.
Load from Spreedly API.
Load documents by querying database tables supported by SQLAlchemy.
Load IMSDb webpages.
Load Figma file.
Base class for all loaders that use the O365 package.
Enumerator of the content formats of Confluence page.
Load Confluence pages.
Load with an Airbyte source connector implemented using the CDK.
A wrapper around the CDK integration.
Load from Hubspot using an Airbyte source connector.
Load from Stripe using an Airbyte source connector.
Load from Typeform using an Airbyte source connector.
Load from Zendesk Support using an Airbyte source connector.
Load from Shopify using an Airbyte source connector.
Load from Salesforce using an Airbyte source connector.
Load from Gong using an Airbyte source connector.
Load ReadTheDocs documentation directory.
Load from a Slack directory dump.
Load AZLyrics webpages.
Load from Kinetica API.
Load a PDF with Azure Document Intelligence.
Load Obsidian files from directory.
Document loader for EverNote ENEX export files.
Load Python files, respecting any non-default encoding if specified.
Load Hacker News data.
Load Markdown files using Unstructured.
Load weather data with Open Weather Map API.
File encoding as a NamedTuple.
NeedleLoader is a document loader for managing documents stored in a collection.
Load from SharePoint.
Load from any file type using Nuclia Understanding API.
Load Microsoft PowerPoint files using Unstructured.
Base Loader that uses dedoc (https://dedoc.readthedocs.io).
DedocFileLoader document loader integration to load files using dedoc.
Load files using dedoc API.
Load .srt (subtitle) files.
Load Diffbot JSON file.
Load from Tencent Cloud COS directory.
Load PySpark DataFrames.
Column not found error.
Load from a Rockset database.
Turn a URL into LLM-accessible Markdown with Scrapfly.io.
Load from DuckDB.
Load GitBook data.
Load a CSV file into a list of Document objects.
Load CSV files using Unstructured.
Load a Blackboard course.
Load from Gutenberg.org.
Load acreom vault from a directory.
Load from Stripe API.
Load XML file using Unstructured.
Merge documents from a list of loaders.
Load from Baidu BOS directory.
Load Facebook Chat messages directory dump.
ModuleName document loader integration
Load TSV files using Unstructured.
Load from Amazon AWS S3 file.
Load PNG and JPG files using Unstructured.
Load a JSON file using a jq schema.
Abstract base class for all evaluators.
Evaluate the page HTML content using the unstructured library.
Load HTML pages with Playwright and parse with Unstructured.
Load HTML using 2markdown API.
Enumerator of the supported blockchains.
Load elements from a blockchain smart contract.
Load from Docusaurus Documentation.
Load a file from Microsoft OneDrive.
Load MediaWiki dump from an XML file.
Load RST files using Unstructured.
Load the Mastodon 'toots'.
Recursively load all child links from a root URL.
Load text file.
Parse MHTML files with BeautifulSoup.
Load Git repository files.
Load from Wikipedia.
Load OpenOffice ODT files using Unstructured.
FireCrawlLoader document loader integration
Load news articles from URLs using Unstructured.
Load Reddit posts.
Load HTML pages with Selenium and parse with Unstructured.
Load cards from a Trello board.
Load from Modern Treasury.
Load from the PubMed biomedical library.
Base Loader that uses Unstructured.
Load transactions from Ethereum mainnet.
Load HTML files using Unstructured.
Load WhatsApp messages text file.
Load email files using Unstructured.
Loads Outlook Message files using extract_msg.
Load table schemas from AWS Glue.
Load content from RSpace notebooks, folders, documents or PDF Gallery files.
Load with Brave Search engine.
Load from Notion DB.
Load from Tencent Cloud COS file.
Load Discord chat logs.
Load from Psychic.dev.
Turn a URL into LLM-accessible Markdown with ScrapingAnt.
Load GitHub repository Issues.
Load issues of a GitHub repository.
Load a GitHub file.
Load transcripts from BiliBili videos.
Load pre-rendered web pages using a headless browser hosted on Browserbase.
Load web pages as Documents using Spider AI.
Load Microsoft Excel files using Unstructured.
Load blobs from cloud URL or file:.
Load YouTube URLs as audio file(s).
Load blobs in the local file system.
Parse the Microsoft Word documents from a blob.
Parse a blob from a PDF using pypdf library.
Parse a blob from a PDF using pdfminer.six library.
Parse a blob from a PDF using PyMuPDF library.
Parse a blob from a PDF using PyPDFium2 library.
Parse PDF with PDFPlumber.
Send PDF files to Amazon Textract and parse them.
Loads a PDF with Azure Document Intelligence.
Transcribe and parse audio files using Azure OpenAI Whisper.
Transcribe and parse audio files.
Transcribe and parse audio files with OpenAI Whisper model.
Transcribe and parse audio files.
Transcribe and parse audio files with faster-whisper.
A wrapper class that adapts a document loader to function as a parser.
Parser that uses mime-types to parse a blob.
Dataclass to store Document AI parsing results.
Loads a PDF with Azure Document Intelligence.
Parser for text blobs.
Parser for vsdx files.
Abstract base class for parsing image blobs into text.
Parser for extracting text from images using the RapidOCR library.
Parser for extracting text from images using the Tesseract OCR library.
Parser for analyzing images using a language model (LLM).
Exception raised when the Grobid server is unavailable.
Load article PDF files using Grobid.
Code segmenter for Go.
Code segmenter for PHP.
Parse using the respective programming language syntax.
Code segmenter for C.
Code segmenter for Lua.
Code segmenter for Scala.
Code segmenter for Ruby.
Code segmenter for TypeScript.
Code segmenter for SQL.
Code segmenter for Python.
Code segmenter for C#.
Code segmenter for COBOL.
Abstract class for the code segmenter.
Code segmenter for Java.
Code segmenter for Elixir.
Code segmenter for JavaScript.
Abstract class for CodeSegmenters that use the tree-sitter library.
Code segmenter for Perl.
Code segmenter for Kotlin.
Code segmenter for Rust.
Code segmenter for C++.
Parse HTML files using Beautiful Soup.
Document compressor that uses Volcengine Rerank API.
Document compressor using Flashrank interface.
Document compressor that uses Jina Rerank API.
Request for reranking.
OpenVINO rerank models.
Document compressor using Flashrank interface.
Compress using LLMLingua Project.
Document compressor that uses Infinity Rerank API.
Document compressor that uses DashScope Rerank API.
Transform HTML content by extracting specific tags and removing unwanted ones.
Nuclia Text Transformer.
Replace occurrences of a particular search pattern with a replacement string
Reorder long context.
Extract metadata tags from document contents using OpenAI functions.
Extract properties from text documents using doctran.
Translate text documents using doctran.
Converts HTML documents to Markdown format with customizable options for handling
Extract QA from text documents using doctran.
Filter that drops redundant documents by comparing their embeddings.
Perform K-means clustering on document vectors.
Load Telegram conversations to LangChain chat messages.
Load chat sessions from a list of LangSmith "llm" runs.
Load chat sessions from a LangSmith dataset with the "chat" data type.
Load chat sessions from the iMessage chat.db SQLite file.
Load Slack conversations from a dump zip file.
Load WhatsApp conversations from a dump zip file or directory.
Load Facebook Messenger chat data from a single file.
Load Facebook Messenger chat data from a folder.
Load chat sessions from Gmail.
Load from Oracle ADB.
Read documents using OracleDocLoader.
Split text using the Oracle chunker.
Load from Azure Blob Storage container.
Load from the Google Cloud Platform BigQuery.
Load from GCS file.
Load Google Docs from Google Drive.
Load from GCS directory.
Loader for Google Cloud Speech-to-Text audio transcripts.
Load from Azure Blob Storage files.
Load datasets from Apify web scraping, crawling, and data extraction platform.
Load files using Unstructured.
Load files using Unstructured API.
Load file-like objects opened in read mode using Unstructured.
Send file-like objects with unstructured-client sdk to the Unstructured API.
Load from Docugami.
Google Cloud Document AI parser.
Translate text documents using Google Cloud Translation.