Cache that stores things in memory.
SQLite table for full LLM Cache (all generations).
Cache that uses SQLAlchemy as a backend.
Cache that uses SQLite as a backend.
Cache that uses Upstash Redis as a backend.
Cache that uses Redis as a backend. Allows use of a synchronous redis.Redis client.
Cache that uses Redis as a backend. Allows to use an
Cache that uses Redis as a vector-store backend.
Cache that uses GPTCache as a backend.
Cache that uses Momento as a backend. See https://gomomento.com/
Cache that uses Cassandra / Astra DB as a backend.
Cache that uses Cassandra as a vector-store backend for semantic
SQLite table for full LLM Cache (all generations).
Cache that uses SQLAlchemy as a backend.
Cache that uses a Cosmos DB Mongo vCore vector-store backend.
Cache that uses a Cosmos DB NoSQL backend.
Cache that uses an OpenSearch vector-store backend.
Cache that uses a Memcached backend through the pymemcache client library.
Load from the Huawei OBS file.
Parse Oracle doc metadata...
Read a file
Load Jupyter notebook (.ipynb) files.
Load from Amazon AWS S3 directory.
Load documents from TiDB.
Load local Airbyte json files.
Load a sitemap and its URLs.
Load from TensorFlow Dataset.
Load CHM files using Unstructured.
Microsoft Compiled HTML Help (CHM) Parser.
Load Org-Mode files using Unstructured.
Load from Hugging Face Hub datasets.
Load Roam files from a directory.
Load Pandas DataFrame.
Load files from Dropbox.
Load pages from OneNote notebooks.
Load from Telegram chat dump.
Load Telegram chat json directory dump.
Load Documents using LLMSherpa.
Pebblo Safe Loader class is a wrapper around document loaders enabling the data
Loader for text data.
Load iFixit repair guides, device wikis and answers.
Load a DOCX file using docx2txt and chunk it at the character level.
Load Microsoft Word file using Unstructured.
Load from Huawei OBS directory.
Load from Snowflake API.
Load model information from Hugging Face Hub, including README content.
Load elements from a blockchain smart contract.
Load RTF files using Unstructured.
Load PDF files using Unstructured.
Base Loader class for PDF files.
Load online PDF.
Load and parse a PDF file using 'pypdf' library.
Load and parse a PDF file using the pypdfium2 library.
Load and parse a directory of PDF files using 'pypdf' library.
Load and parse a PDF file using 'pdfminer.six' library.
Load PDF files as HTML content using PDFMiner.
Load and parse a PDF file using 'PyMuPDF' library.
Load PDF files using Mathpix service.
Load PDF files using pdfplumber.
Load PDF files from a local file system, HTTP or S3.
DedocPDFLoader document loader integration to load PDF files using dedoc.
Load a PDF with Azure Document Intelligence.
Document loader utilizing Zerox library:
Load documents from Yuque.
Load from Open City.
Load Xorbits DataFrame.
Client for lakeFS.
Load from lakeFS.
Load from lakeFS as unstructured data.
Load documents from AWS Athena.
Load documents from Microsoft OneDrive.
Load from Baidu Cloud BOS file.
Load EPub files using Unstructured.
Load conversations from exported ChatGPT data.
Load webpages with Browserless /content endpoint.
Scrape HTML pages from URLs using a
Load HTML asynchronously.
Load CoNLL-U files.
Load files from remote URLs using Unstructured.
Load image captions.
Load Notion directory dump.
Load from IUGU.
Load from Azure AI Data.
Load from FaunaDB.
Load MongoDB documents.
WebBaseLoader document loader integration
Load from a directory.
Load records from an ArcGIS FeatureLayer.
Load Quip pages.
Load and parse documents concurrently.
Transcript format to use for the document loader.
Load AssemblyAI audio transcripts.
Load AssemblyAI audio transcripts.
Load TOML files.
Load the Airtable tables.
Load College Confidential webpages.
Load Polars DataFrame.
Load GeoPandas DataFrame.
Generic Document Loader.
Load SurrealDB documents.
Load a query result from Arxiv.
Load a bibtex file.
Generic Google API Client.
Output formats of transcripts from YoutubeLoader.
Load YouTube video transcripts.
Load all videos from a YouTube channel.
Load news articles from RSS feeds using Unstructured.
Load Cube semantic layer metadata.
Load from LarkSuite (FeiShu).
Load from LarkSuite (FeiShu) wiki.
Load notes from Joplin.
Load from Alibaba Cloud MaxCompute table.
Load Twitter tweets.
Load Datadog logs.
Load documents from Couchbase.
Load from Spreedly API.
Load documents by querying database tables supported by SQLAlchemy.
Load IMSDb webpages.
Load Figma file.
Base class for all loaders that use the O365 package.
Enumerator of the content formats of a Confluence page.
Load Confluence pages.
Load with an Airbyte source connector implemented using the CDK.
A wrapper around the CDK integration.
Load from Hubspot using an Airbyte source connector.
Load from Stripe using an Airbyte source connector.
Load from Typeform using an Airbyte source connector.
Load from Zendesk Support using an Airbyte source connector.
Load from Shopify using an Airbyte source connector.
Load from Salesforce using an Airbyte source connector.
Load from Gong using an Airbyte source connector.
Load ReadTheDocs documentation directory.
Load from a Slack directory dump.
Load AZLyrics webpages.
Load from Kinetica API.
Load a PDF with Azure Document Intelligence.
Load Obsidian files from directory.
Document loader for EverNote ENEX export files.
Load Python files, respecting any non-default encoding if specified.
Load Hacker News data.
Load Markdown files using Unstructured.
Load weather data with Open Weather Map API.
File encoding as a NamedTuple.
NeedleLoader is a document loader for managing documents stored in a collection.
Load from SharePoint.
Load from any file type using Nuclia Understanding API.
Load Microsoft PowerPoint files using Unstructured.
Base Loader that uses dedoc (https://dedoc.readthedocs.io).
DedocFileLoader document loader integration to load files using dedoc.
Load files using dedoc API.
Load .srt (subtitle) files.
Load Diffbot json file.
Load from Tencent Cloud COS directory.
Load PySpark DataFrames.
Column not found error.
Load from a Rockset database.
Turn a URL into LLM-accessible Markdown with Scrapfly.io.
Load from DuckDB.
Load GitBook data.
Load a CSV file into a list of Document objects.
Load CSV files using Unstructured.
Load a Blackboard course.
Load from Gutenberg.org.
Load acreom vault from a directory.
Load from Stripe API.
Load XML file using Unstructured.
Merge documents from a list of loaders.
Load from Baidu BOS directory.
Load Facebook Chat messages directory dump.
ModuleName document loader integration
Load TSV files using Unstructured.
Load from Amazon AWS S3 file.
Load PNG and JPG files using Unstructured.
Load a JSON file using a jq schema.
Abstract base class for all evaluators.
Evaluate the page HTML content using the unstructured library.
Load HTML pages with Playwright and parse with Unstructured.
Load HTML using 2markdown API.
Enumerator of the supported blockchains.
Load elements from a blockchain smart contract.
Load from Docusaurus Documentation.
Load a file from Microsoft OneDrive.
Load MediaWiki dump from an XML file.
Load RST files using Unstructured.
Load the Mastodon 'toots'.
Recursively load all child links from a root URL.
Load text file.
Parse MHTML files with BeautifulSoup.
Load Git repository files.
Load from Wikipedia.
Load OpenOffice ODT files using Unstructured.
FireCrawlLoader document loader integration
Load news articles from URLs using Unstructured.
Load Reddit posts.
Load HTML pages with Selenium and parse with Unstructured.
Load cards from a Trello board.
Load from Modern Treasury.
Load from the PubMed biomedical library.
Base Loader that uses Unstructured.
Load transactions from Ethereum mainnet.
Load HTML files using Unstructured.
Load WhatsApp messages text file.
Load email files using Unstructured.
Load Outlook Message files using extract_msg.
Load table schemas from AWS Glue.
Load content from RSpace notebooks, folders, documents or PDF Gallery files.
Load with Brave Search engine.
Load from Notion DB.
Load from Tencent Cloud COS file.
Load Discord chat logs.
Load from Psychic.dev.
Turn a URL into LLM-accessible Markdown with ScrapingAnt.
Load GitHub repository Issues.
Load issues of a GitHub repository.
Load a GitHub file.
Load transcripts from BiliBili videos.
Load pre-rendered web pages using a headless browser hosted on Browserbase.
Load web pages as Documents using Spider AI.
Load Microsoft Excel files using Unstructured.
Load blobs from cloud URL or file:.
Load YouTube URLs as audio file(s).
Load blobs in the local file system.
Parse the Microsoft Word documents from a blob.
Parse a blob from a PDF using pypdf library.
Parse a blob from a PDF using pdfminer.six library.
Parse a blob from a PDF using PyMuPDF library.
Parse a blob from a PDF using PyPDFium2 library.
Parse PDF with PDFPlumber.
Send PDF files to Amazon Textract and parse them.
Load a PDF with Azure Document Intelligence.
Transcribe and parse audio files using Azure OpenAI Whisper.
Transcribe and parse audio files.
Transcribe and parse audio files with OpenAI Whisper model.
Transcribe and parse audio files.
Transcribe and parse audio files with faster-whisper.
A wrapper class that adapts a document loader to function as a parser.
Parser that uses mime-types to parse a blob.
Dataclass to store Document AI parsing results.
Load a PDF with Azure Document Intelligence.
Parser for text blobs.
Parser for vsdx files.
Abstract base class for parsing image blobs into text.
Parser for extracting text from images using the RapidOCR library.
Parser for extracting text from images using the Tesseract OCR library.
Parser for analyzing images using a language model (LLM).
Exception raised when the Grobid server is unavailable.
Load article PDF files using Grobid.
Code segmenter for Go.
Code segmenter for PHP.
Parse using the respective programming language syntax.
Code segmenter for C.
Code segmenter for Lua.
Code segmenter for Scala.
Code segmenter for Ruby.
Code segmenter for TypeScript.
Code segmenter for SQL.
Code segmenter for Python.
Code segmenter for C#.
Code segmenter for COBOL.
Abstract class for the code segmenter.
Code segmenter for Java.
Code segmenter for Elixir.
Code segmenter for JavaScript.
Abstract class for CodeSegmenters that use the tree-sitter library.
Code segmenter for Perl.
Code segmenter for Kotlin.
Code segmenter for Rust.
Code segmenter for C++.
Parse HTML files using Beautiful Soup.
Representation of a callable function to the Ernie API.
Representation of a callable function to the Ernie API.
LocalAI embedding models.
Javelin AI Gateway embeddings.
Fake embedding model.
Fake embedding model that always returns
Custom exception for interfacing with Takeoff Embedding class.
Exception raised when no consumer group is provided on initialization of
Device to use for inference, cuda or cpu.
Configuration for the reader to be deployed in Takeoff.
Interface with Takeoff Inference API for embedding models.
Content handler for LLM class.
Custom Sagemaker Inference Endpoints.
Google's PaLM Embeddings APIs.
Tencent Hunyuan embedding models API by Tencent.
NCP ClovaStudio Embedding API.
OCI authentication types as enumerator.
OCI embedding models.
Payload for the Embaas embeddings API.
Embaas's embedding service.
MiniMax embedding model integration.
JohnSnowLabs embedding models.
Baichuan Text Embedding models.
URL class for parsing the URL.
SparkLLM embedding model integration.
Exception raised for errors in the header assembly.
MLflow AI Gateway embeddings.
EdenAI embedding.
NLP Cloud embedding models.
MosaicML embedding service.
TensorflowHub embedding models.
Embedding LLMs in MLflow.
Cohere embedding LLMs in MLflow.
Embeddings by spaCy models.
llama.cpp embedding models.
Anyscale Embeddings API.
Prem's Embedding APIs.
Volcengine embedding models.
OctoAI Compute Service embedding models.
OpenVINO embedding models.
OpenVINO BGE embedding models.
Embedding documents and queries with Awa DB.
ModelScopeHub embedding models.
Ascend NPU-accelerated embedding model.
text2vec embedding models.
Jina embedding models.
HuggingFace embedding models on self-hosted remote hardware.
HuggingFace InstructEmbedding models on self-hosted remote hardware.
Custom embedding models on self-hosted remote hardware.
Bookend AI sentence_transformers embedding models.
ZhipuAI embedding model integration.
LLMRails embedding models.
YandexGPT embedding models.
GPT4All embedding models.
Gradient.ai Embedding models.
Deprecated, TinyAsyncGradientEmbeddingClient was removed.
Llamafile lets you distribute and run large language models with a
OVHcloud AI Endpoints Embeddings.
Symmetric version of Aleph Alpha's semantic embeddings.
NeedleRetriever retrieves relevant documents or context from a Needle collection
Wrapper for Jira API. You can connect to Jira with either an API token or OAuth2.
Callback Handler that logs evaluation results to uptrain and the console.
Azure ML endpoints API types. Use dedicated for models deployed in hosted
Recursively remove newlines, no matter the data structure they are stored in.
Import the textstat python package and raise an error if it is not installed.
Convert a dictionary to a YAML-like string without using external libraries.
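
The two utility descriptions above (recursive newline removal and dictionary-to-YAML-like conversion) can be illustrated with a minimal standard-library sketch. The function names `strip_newlines` and `dict_to_yaml_like` are hypothetical, chosen for illustration only; they are not the library's own helpers, and the exact output format of the real conversion may differ.

```python
from typing import Any


def strip_newlines(value: Any) -> Any:
    """Hypothetical helper: recursively remove newlines from nested strings."""
    if isinstance(value, str):
        return value.replace("\n", "")
    if isinstance(value, list):
        return [strip_newlines(item) for item in value]
    if isinstance(value, dict):
        return {key: strip_newlines(item) for key, item in value.items()}
    # Any other type (numbers, None, ...) is returned unchanged.
    return value


def dict_to_yaml_like(data: dict, indent: int = 0) -> str:
    """Hypothetical helper: render a dict as an indented, YAML-like string."""
    pad = " " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dicts go on following lines, indented two more spaces.
            lines.append(f"{pad}{key}:")
            lines.append(dict_to_yaml_like(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    record = {"title": "First\nline", "meta": {"tags": ["a\n", "b"], "pages": 2}}
    cleaned = strip_newlines(record)
    print(dict_to_yaml_like(cleaned))
```

Both helpers recurse on the container type rather than assuming a fixed shape, which is what lets them handle arbitrarily nested input.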