langchain.js

    Note that modelPath is the only required parameter. For testing, you can set it via the LLAMA_PATH environment variable.

    interface LlamaCppInputs {
        batchSize?: number;
        contextSize?: number;
        embedding?: boolean;
        f16Kv?: boolean;
        gbnf?: string;
        gpuLayers?: number;
        jsonSchema?: object;
        logitsAll?: boolean;
        maxTokens?: number;
        modelPath: string;
        prependBos?: boolean;
        seed?: null | number;
        temperature?: number;
        threads?: number;
        topK?: number;
        topP?: number;
        trimWhitespaceSuffix?: boolean;
        useMlock?: boolean;
        useMmap?: boolean;
        vocabOnly?: boolean;
    }
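
    A minimal usage sketch follows. Note that the construction API varies between versions of @langchain/community: recent releases use the static LlamaCpp.initialize factory shown here, while older ones use new LlamaCpp({...}).

    import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

    // modelPath is the only required field; here it is read from the
    // LLAMA_PATH environment variable mentioned above.
    const model = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
    });

    const response = await model.invoke("Where do llamas come from?");
    console.log(response);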

    Properties

    batchSize?: number

    Prompt processing batch size.

    contextSize?: number

    Text context size.

    embedding?: boolean

    Embedding mode only.
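
    In practice, embedding mode is usually enabled for you by the companion LlamaCppEmbeddings class rather than set directly. A brief sketch, assuming the same initialize-style factory as above:

    import { LlamaCppEmbeddings } from "@langchain/community/embeddings/llama_cpp";

    // Loads the model in embedding mode and produces a vector for a query.
    const embeddings = await LlamaCppEmbeddings.initialize({
        modelPath: process.env.LLAMA_PATH!,
    });
    const vector = await embeddings.embedQuery("Where do llamas come from?");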

    f16Kv?: boolean

    Use fp16 for KV cache.

    gbnf?: string

    GBNF string used to constrain the format of the output; also known as a grammar.
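
    For example, a grammar restricting the model to a literal yes/no answer (a sketch continuing the example above; any valid GBNF string works here):

    // Constrain generation to exactly "yes" or "no".
    const yesNoModel = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
        gbnf: `root ::= "yes" | "no"`,
    });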

    gpuLayers?: number

    Number of layers to store in VRAM.

    jsonSchema?: object

    JSON schema used to constrain the format of the output; also known as a grammar.
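
    For example, constraining output to an object with a required name field (a sketch continuing the example above; gbnf and jsonSchema both define a grammar, so presumably only one should be set at a time):

    const structuredModel = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
        // Generated text must parse as JSON matching this schema.
        jsonSchema: {
            type: "object",
            properties: {
                name: { type: "string" },
                age: { type: "number" },
            },
            required: ["name"],
        },
    });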

    logitsAll?: boolean

    The llama_eval() call computes all logits, not just the last one.

    maxTokens?: number

    The maximum number of tokens to generate.

    modelPath: string

    Path to the model on the filesystem.

    prependBos?: boolean

    Add the beginning-of-sentence (BOS) token.

    seed?: null | number

    If null, a random seed will be used.

    temperature?: number

    Controls the randomness of responses: 0 disables sampling (greedy decoding), 0.1 is near-deterministic, 0.8 is balanced, and 1.5 is highly creative.

    threads?: number

    Number of threads to use to evaluate tokens.

    topK?: number

    Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables top-k (the full vocabulary is considered). Note: only applies when temperature > 0.

    topP?: number

    Selects the smallest set of tokens whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables top-p. Note: only applies when temperature > 0.
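
    The sampling knobs are typically tuned together. A sketch continuing the example above; the values are illustrative, not recommendations:

    const focusedModel = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
        temperature: 0.2, // low randomness: near-deterministic output
        topK: 40,         // consider at most the 40 most likely tokens...
        topP: 0.9,        // ...within the smallest set whose cumulative probability exceeds 0.9
        maxTokens: 256,   // cap the length of the generated response
    });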

    trimWhitespaceSuffix?: boolean

    Trim whitespace from the end of the generated text. Disabled by default.

    useMlock?: boolean

    Force the system to keep the model in RAM.

    useMmap?: boolean

    Use mmap if possible.

    vocabOnly?: boolean

    Only load the vocabulary, no weights.