Note that modelPath is the only required parameter. For testing, you can supply it through the LLAMA_PATH environment variable.
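As a minimal sketch, assuming the ChatLlamaCpp chat model from @langchain/community is what consumes these inputs (recent versions construct it with the static initialize factory; older versions use new ChatLlamaCpp(...) instead):

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

// LLAMA_PATH should point at a local GGUF model file, e.g.:
//   export LLAMA_PATH="/path/to/model.gguf"
const llamaPath = process.env.LLAMA_PATH;
if (!llamaPath) throw new Error("Set LLAMA_PATH to a local model file.");

// modelPath is the only required input; everything else uses defaults.
const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });

const response = await model.invoke("Say hello in one short sentence.");
console.log(response.content);
```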
interface LlamaCppInputs extends LlamaBaseCppInputs, BaseChatModelParams

batchSize: Prompt processing batch size.
contextSize: Text context size.
embedding: Embedding mode only.
f16Kv: Use fp16 for the KV cache.
gbnf: GBNF string to be used to format output, also known as a grammar (see the constrained-output sketch below).
gpuLayers: Number of layers to store in VRAM.
jsonSchema: JSON schema to be used to format output, also applied as a grammar.
logitsAll: The llama_eval() call computes all logits, not just the last one.
maxConcurrency: The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
maxRetries: The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
modelPath: Path to the model on the filesystem. Required.
onFailedAttempt: Custom handler for failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
prependBos: Add the beginning-of-sentence token.
temperature: Amount of randomness injected into the response. Ranges from 0 to 1 (0 is not included). Use a temperature closer to 0 for analytical and multiple-choice tasks, and closer to 1 for creative and generative tasks. Defaults to 0.95 (see the configuration sketch below).
threads: Number of threads to use to evaluate tokens.
topP: Total probability mass of tokens to consider at each step. Ranges from 0 to 1.0. Defaults to 0.8.
useMlock: Force the system to keep the model in RAM.
useMmap: Use mmap if possible.
verbose: Whether to print out response text.
vocabOnly: Only load the vocabulary, no weights.
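The sampling and resource parameters above can be combined in a single inputs object. A configuration sketch with illustrative values, not recommendations (the ChatLlamaCpp.initialize factory from the earlier example is assumed):

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const analyticalModel = await ChatLlamaCpp.initialize({
  modelPath: process.env.LLAMA_PATH!,
  temperature: 0.1, // near 0 for analytical / multiple-choice tasks
  topP: 0.8,        // probability mass considered at each step (default)
  contextSize: 2048, // text context size
  batchSize: 512,    // prompt processing batch size
  gpuLayers: 32,     // layers offloaded to VRAM; depends on your hardware
  threads: 8,        // CPU threads used to evaluate tokens
  maxRetries: 2,     // fewer retries than the default of 6
});
```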
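With gbnf or jsonSchema set, generation is constrained so the output matches the supplied grammar or schema. A constrained-output sketch using a hypothetical schema (the exact schema-to-grammar conversion depends on the underlying node-llama-cpp version):

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

// Constrain the reply so it parses as { "name": ..., "age": ... }.
// Use gbnf instead if you want to pass a raw GBNF grammar string.
const structuredModel = await ChatLlamaCpp.initialize({
  modelPath: process.env.LLAMA_PATH!,
  jsonSchema: {
    type: "object",
    properties: {
      name: { type: "string" },
      age: { type: "number" },
    },
  },
});

const reply = await structuredModel.invoke("Describe a fictional llama.");
console.log(JSON.parse(reply.content as string));
```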