langchain.js

    Note that modelPath is the only required parameter. For testing, you can set it via the LLAMA_PATH environment variable.

    interface LlamaCppInputs {
        batchSize?: number;
        contextSize?: number;
        embedding?: boolean;
        f16Kv?: boolean;
        gbnf?: string;
        gpuLayers?: number;
        jsonSchema?: object;
        logitsAll?: boolean;
        maxTokens?: number;
        modelPath: string;
        prependBos?: boolean;
        seed?: null | number;
        temperature?: number;
        threads?: number;
        topK?: number;
        topP?: number;
        trimWhitespaceSuffix?: boolean;
        useMlock?: boolean;
        useMmap?: boolean;
        vocabOnly?: boolean;
    }
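
    A minimal usage sketch follows. Note that the construction API varies between versions of @langchain/community: recent releases use the static LlamaCpp.initialize factory shown here, while older ones use new LlamaCpp({...}).

    import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

    // modelPath is the only required field; here it is read from the
    // LLAMA_PATH environment variable mentioned above.
    const model = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
    });

    const response = await model.invoke("Where do llamas come from?");
    console.log(response);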

    Properties

    batchSize?: number

    Prompt processing batch size.

    contextSize?: number

    Text context size.

    embedding?: boolean

    Embedding mode only.
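
    In practice, embedding mode is usually enabled for you by the companion LlamaCppEmbeddings class rather than set directly. A brief sketch, assuming the same initialize-style factory as above:

    import { LlamaCppEmbeddings } from "@langchain/community/embeddings/llama_cpp";

    // Loads the model in embedding mode and produces a vector for a query.
    const embeddings = await LlamaCppEmbeddings.initialize({
        modelPath: process.env.LLAMA_PATH!,
    });
    const vector = await embeddings.embedQuery("Where do llamas come from?");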

    f16Kv?: boolean

    Use fp16 for KV cache.

    gbnf?: string

    GBNF string used to constrain the format of the output; also known as a grammar.
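
    For example, a grammar restricting the model to a literal yes/no answer (a sketch continuing the example above; any valid GBNF string works here):

    // Constrain generation to exactly "yes" or "no".
    const yesNoModel = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
        gbnf: `root ::= "yes" | "no"`,
    });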

    gpuLayers?: number

    Number of layers to store in VRAM.

    jsonSchema?: object

    JSON schema used to constrain the format of the output; also known as a grammar.
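
    For example, constraining output to an object with a required name field (a sketch continuing the example above; gbnf and jsonSchema both define a grammar, so presumably only one should be set at a time):

    const structuredModel = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
        // Generated text must parse as JSON matching this schema.
        jsonSchema: {
            type: "object",
            properties: {
                name: { type: "string" },
                age: { type: "number" },
            },
            required: ["name"],
        },
    });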

    logitsAll?: boolean

    The llama_eval() call computes all logits, not just the last one.

    maxTokens?: number

    The maximum number of tokens to generate.

    modelPath: string

    Path to the model on the filesystem.

    prependBos?: boolean

    Add the beginning-of-sentence (BOS) token.

    seed?: null | number

    If null, a random seed will be used.

    temperature?: number

    Controls the randomness of responses: 0 disables sampling (greedy decoding), 0.1 is near-deterministic, 0.8 is balanced, and 1.5 is highly creative.

    threads?: number

    Number of threads to use to evaluate tokens.

    topK?: number

    Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables top-k (the full vocabulary is considered). Note: only applies when temperature > 0.

    topP?: number

    Selects the smallest set of tokens whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables top-p. Note: only applies when temperature > 0.
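
    The sampling knobs are typically tuned together. A sketch continuing the example above; the values are illustrative, not recommendations:

    const focusedModel = await LlamaCpp.initialize({
        modelPath: process.env.LLAMA_PATH!,
        temperature: 0.2, // low randomness: near-deterministic output
        topK: 40,         // consider at most the 40 most likely tokens...
        topP: 0.9,        // ...within the smallest set whose cumulative probability exceeds 0.9
        maxTokens: 256,   // cap the length of the generated response
    });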

    trimWhitespaceSuffix?: boolean

    Trim whitespace from the end of the generated text. Disabled by default.

    useMlock?: boolean

    Force the system to keep the model in RAM.

    useMmap?: boolean

    Use mmap if possible.

    vocabOnly?: boolean

    Only load the vocabulary, no weights.