Llamafile lets you distribute and run large language models with a single file.
To get started, see: https://github.com/Mozilla-Ocho/llamafile
To use this class, you will first need to:

1. Download a llamafile.
2. Make the downloaded file executable: chmod +x path/to/model.llamafile
3. Start the llamafile in server mode with embeddings enabled:

   ./path/to/model.llamafile --server --nobrowser --embedding
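
Once the server is running, the class can be pointed at it. Below is a minimal usage sketch; it assumes this class is LlamafileEmbeddings from langchain_community, that it exposes the standard embed_documents/embed_query interface with a base_url parameter, and that the llamafile server is listening on its default address, http://localhost:8080.

    from langchain_community.embeddings import LlamafileEmbeddings

    # Assumes the llamafile server was started as shown above and is
    # listening on the default address (http://localhost:8080).
    embedder = LlamafileEmbeddings(base_url="http://localhost:8080")

    # Embed a batch of documents and a single query string.
    doc_vectors = embedder.embed_documents(["Alpha", "Beta"])
    query_vector = embedder.embed_query("What comes after alpha?")

If the server was started on a different host or port, pass that address via base_url instead of relying on the default.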