Transcribe and parse audio files with faster-whisper.
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
It can automatically detect the following 14 languages and transcribe speech in each of them into text in the same language: en, zh, fr, de, ja, ko, ru, es, th, it, pt, vi, ar, tr.
The GitHub repository for faster-whisper is: https://github.com/SYSTRAN/faster-whisper
Load a YouTube video and transcribe its speech into a document:
.. code-block:: python

    from langchain_classic.document_loaders.generic import GenericLoader
    from langchain_community.document_loaders.parsers.audio import FasterWhisperParser
    from langchain_classic.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

    url = "https://www.youtube.com/watch?v=your_video"
    save_dir = "your_dir/"
    loader = GenericLoader(
        YoutubeAudioLoader([url], save_dir),
        FasterWhisperParser(),
    )
    docs = loader.load()

The device can be "cuda" or "cpu", depending on the available hardware.
There are four model sizes to choose from: "base", "small", "medium", and "large-v3". Pick one based on the available GPU memory.
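The choice of device and model size described above can be sketched as a small helper. This is only an illustration: the function name ``choose_whisper_settings`` and the memory thresholds are hypothetical and do not come from faster-whisper or LangChain, so tune them for your own hardware.

```python
from typing import Optional, Tuple

# Hypothetical minimum free GPU memory (GiB) for each model size.
# These numbers are illustrative assumptions, not documented requirements.
_ILLUSTRATIVE_THRESHOLDS = [
    (10.0, "large-v3"),
    (5.0, "medium"),
    (2.0, "small"),
]


def choose_whisper_settings(gpu_memory_gib: Optional[float]) -> Tuple[str, str]:
    """Return a (device, model_size) pair for the given free GPU memory.

    Pass None when no GPU is available; the helper then falls back to
    the CPU with the smallest model.
    """
    if gpu_memory_gib is None:
        return "cpu", "base"
    # Walk from the largest model down and take the first one that fits.
    for minimum_gib, size in _ILLUSTRATIVE_THRESHOLDS:
        if gpu_memory_gib >= minimum_gib:
            return "cuda", size
    return "cuda", "base"


device, model_size = choose_whisper_settings(6.0)
print(device, model_size)
```

The resulting values could then be handed to the parser, for example ``FasterWhisperParser(device=device, model_size=model_size)``, assuming the constructor accepts these two keyword arguments.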
Lazily parse the blob.
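As a rough illustration of what "lazily" means here, the sketch below uses a plain Python generator as a stand-in. It is not the parser's actual implementation: ``lazy_transcripts`` and the dict layout are hypothetical, and no real transcription takes place. The point is only that work happens when a result is requested, not when the generator is created.

```python
from typing import Dict, Iterator, List


def lazy_transcripts(
    audio_chunks: List[bytes], log: List[str]
) -> Iterator[Dict[str, object]]:
    """Yield one document-like dict per chunk, doing work only on demand."""
    for index, chunk in enumerate(audio_chunks):
        # Stand-in for the expensive transcription step.
        log.append(f"processed chunk {index}")
        yield {
            "page_content": f"<{len(chunk)} bytes transcribed>",
            "metadata": {"chunk": index},
        }


work_log: List[str] = []
documents = lazy_transcripts([b"abc", b"defgh"], work_log)
print(work_log)  # nothing has been processed yet

first = next(documents)  # only now is the first chunk processed
print(first["metadata"], work_log)
```

Consuming results one at a time like this keeps memory use bounded for long recordings, because later chunks are not processed until they are needed.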