Note that modelPath is the only required parameter. For testing, you can supply it through the LLAMA_PATH environment variable.
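As a minimal sketch, assuming the ChatLlamaCpp chat model from @langchain/community is what consumes these inputs (recent versions construct it with the static initialize factory; older versions use new ChatLlamaCpp(...) instead):

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

// LLAMA_PATH should point at a local GGUF model file, e.g.:
//   export LLAMA_PATH="/path/to/model.gguf"
const llamaPath = process.env.LLAMA_PATH;
if (!llamaPath) throw new Error("Set LLAMA_PATH to a local model file.");

// modelPath is the only required input; everything else uses defaults.
const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });

const response = await model.invoke("Say hello in one short sentence.");
console.log(response.content);
```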
interface LlamaCppInputs extends LlamaBaseCppInputs, BaseChatModelParams

batchSize: Prompt processing batch size.
contextSize: Text context size.
embedding: Embedding mode only.
f16Kv: Use fp16 for the KV cache.
gbnf: GBNF string to be used to format output, also known as a grammar (see the constrained-output sketch below).
gpuLayers: Number of layers to store in VRAM.
jsonSchema: JSON schema to be used to format output, also applied as a grammar.
logitsAll: The llama_eval() call computes all logits, not just the last one.
maxConcurrency: The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
maxRetries: The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
modelPath: Path to the model on the filesystem. Required.
onFailedAttempt: Custom handler for failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
prependBos: Add the beginning-of-sentence token.
temperature: Amount of randomness injected into the response. Ranges from 0 to 1 (0 is not included). Use a temperature closer to 0 for analytical and multiple-choice tasks, and closer to 1 for creative and generative tasks. Defaults to 0.95 (see the configuration sketch below).
threads: Number of threads to use to evaluate tokens.
topP: Total probability mass of tokens to consider at each step. Ranges from 0 to 1.0. Defaults to 0.8.
useMlock: Force the system to keep the model in RAM.
useMmap: Use mmap if possible.
verbose: Whether to print out response text.
vocabOnly: Only load the vocabulary, no weights.
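The sampling and resource parameters above can be combined in a single inputs object. A configuration sketch with illustrative values, not recommendations (the ChatLlamaCpp.initialize factory from the earlier example is assumed):

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const analyticalModel = await ChatLlamaCpp.initialize({
  modelPath: process.env.LLAMA_PATH!,
  temperature: 0.1, // near 0 for analytical / multiple-choice tasks
  topP: 0.8,        // probability mass considered at each step (default)
  contextSize: 2048, // text context size
  batchSize: 512,    // prompt processing batch size
  gpuLayers: 32,     // layers offloaded to VRAM; depends on your hardware
  threads: 8,        // CPU threads used to evaluate tokens
  maxRetries: 2,     // fewer retries than the default of 6
});
```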
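With gbnf or jsonSchema set, generation is constrained so the output matches the supplied grammar or schema. A constrained-output sketch using a hypothetical schema (the exact schema-to-grammar conversion depends on the underlying node-llama-cpp version):

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

// Constrain the reply so it parses as { "name": ..., "age": ... }.
// Use gbnf instead if you want to pass a raw GBNF grammar string.
const structuredModel = await ChatLlamaCpp.initialize({
  modelPath: process.env.LLAMA_PATH!,
  jsonSchema: {
    type: "object",
    properties: {
      name: { type: "string" },
      age: { type: "number" },
    },
  },
});

const reply = await structuredModel.invoke("Describe a fictional llama.");
console.log(JSON.parse(reply.content as string));
```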