    langchain_classic.smith.evaluation.config.RunEvalConfig
    Class · Since v1.0

    RunEvalConfig

    Configuration for a run evaluation.

    RunEvalConfig()

    Bases

    BaseModel
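
    A minimal usage sketch: build a config and pass it to the legacy run_on_dataset helper via its evaluation parameter. The import paths are assumed from this package's layout, and the dataset name and model factory are hypothetical placeholders.

        from langsmith import Client

        from langchain_classic.smith import RunEvalConfig, run_on_dataset  # path assumed from the module above

        eval_config = RunEvalConfig(evaluators=["qa"])  # evaluator type string

        def construct_model():
            # Hypothetical target under evaluation; replace with your own chain or runnable.
            from langchain_openai import ChatOpenAI
            return ChatOpenAI(model="gpt-4o-mini")

        results = run_on_dataset(
            client=Client(),
            dataset_name="my-eval-dataset",  # hypothetical dataset
            llm_or_chain_factory=construct_model,
            evaluation=eval_config,
        )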

    Attributes

    attribute
    evaluators: list[SINGLE_EVAL_CONFIG_TYPE | CUSTOM_EVALUATOR_TYPE]

    Configurations for which evaluators to apply to the dataset run. Each entry can be an EvaluatorType member (langchain.evaluation.schema.EvaluatorType), such as EvaluatorType.QA; the corresponding evaluator type string (e.g., "qa"); or a configuration for a given evaluator, such as RunEvalConfig.QA.
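
    A sketch showing all three accepted forms in one list (import paths assumed from this package's layout):

        from langchain_classic.evaluation import EvaluatorType  # path assumed
        from langchain_classic.smith import RunEvalConfig       # path assumed

        config = RunEvalConfig(
            evaluators=[
                EvaluatorType.QA,                       # enum member
                "embedding_distance",                   # evaluator type string
                RunEvalConfig.Criteria("conciseness"),  # per-evaluator configuration
            ],
        )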

    attribute
    custom_evaluators: list[CUSTOM_EVALUATOR_TYPE] | None

    Custom evaluators to apply to the dataset run.
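
    A custom evaluator can be a plain callable that takes the traced run and its dataset example and returns a feedback dict. A minimal sketch; the "output" keys are hypothetical and depend on your trace and dataset schemas:

        from typing import Optional

        from langsmith.schemas import Example, Run

        from langchain_classic.smith import RunEvalConfig  # path assumed

        def exact_match(run: Run, example: Optional[Example]) -> dict:
            # Hypothetical "output" keys; use whatever your traces and dataset actually contain.
            prediction = (run.outputs or {}).get("output")
            reference = (example.outputs or {}).get("output") if example else None
            return {"key": "exact_match", "score": int(prediction == reference)}

        config = RunEvalConfig(custom_evaluators=[exact_match])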

    attribute
    batch_evaluators: list[BATCH_EVALUATOR_LIKE] | None

    Evaluators that run on an aggregate/batch level.

    These generate one or more metrics that are assigned to the full test run. As a result, they are not associated with individual traces.
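
    A batch evaluator receives every run (and example) in the test run at once and returns a single aggregate metric. A minimal sketch with a hypothetical metric name:

        from typing import Optional, Sequence

        from langsmith.schemas import Example, Run

        from langchain_classic.smith import RunEvalConfig  # path assumed

        def nonempty_rate(runs: Sequence[Run], examples: Optional[Sequence[Example]]) -> dict:
            # One score attached to the whole test run, not to individual traces.
            scored = sum(1 for run in runs if run.outputs)
            return {"key": "nonempty_rate", "score": scored / max(len(runs), 1)}

        config = RunEvalConfig(evaluators=["qa"], batch_evaluators=[nonempty_rate])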

    attribute
    reference_key: str | None

    The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.

    attribute
    prediction_key: str | None

    The key from the traced run's outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.

    attribute
    input_key: str | None

    The key from the traced run's inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
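
    The three key overrides above are only needed when automatic inference is not possible, e.g. when runs have multiple input or output keys. A sketch with hypothetical key names:

        from langchain_classic.smith import RunEvalConfig  # path assumed

        config = RunEvalConfig(
            evaluators=["qa"],
            input_key="question",     # key in the traced run's inputs dict
            prediction_key="answer",  # key in the traced run's outputs dict
            reference_key="answer",   # key in the dataset example holding the ground truth
        )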

    attribute
    eval_llm: BaseLanguageModel | None

    The language model to pass to any evaluators that require one.
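
    Evaluators such as QA and criteria grading call a judge model; eval_llm sets it once for all of them. The model choice here is illustrative:

        from langchain_openai import ChatOpenAI

        from langchain_classic.smith import RunEvalConfig  # path assumed

        config = RunEvalConfig(
            evaluators=["cot_qa", "qa"],
            eval_llm=ChatOpenAI(model="gpt-4o", temperature=0),  # any BaseLanguageModel works
        )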

    attribute
    model_config

    Classes

    class
    Criteria

    Configuration for a reference-free criteria evaluator.

    class
    LabeledCriteria

    Configuration for a labeled (with references) criteria evaluator.
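
    The two criteria configurations above differ only in whether the grader sees the dataset reference. A sketch using one hypothetical custom criterion and one built-in criterion:

        from langchain_classic.smith import RunEvalConfig  # path assumed

        config = RunEvalConfig(
            evaluators=[
                # Reference-free: judges the prediction on its own.
                RunEvalConfig.Criteria({"brevity": "Is the answer three sentences or fewer?"}),
                # Labeled: the grader also receives the reference answer.
                RunEvalConfig.LabeledCriteria("correctness"),
            ],
        )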

    class
    EmbeddingDistance

    Configuration for an embedding distance evaluator.

    class
    StringDistance

    Configuration for a string distance evaluator.

    class
    QA

    Configuration for a QA evaluator.

    class
    ContextQA

    Configuration for a context-based QA evaluator.

    class
    CoTQA

    Configuration for a chain-of-thought ("CoT") QA evaluator.

    class
    JsonValidity

    Configuration for a JSON validity evaluator.

    class
    JsonEqualityEvaluator

    Configuration for a JSON equality evaluator.

    class
    ExactMatch

    Configuration for an exact match string evaluator.

    class
    RegexMatch

    Configuration for a regex match string evaluator.

    class
    ScoreString

    Configuration for a score string evaluator.

    This is like the criteria evaluator, but it is configured by default to return a score on a scale from 1 to 10.

    It is recommended to normalize these scores by setting normalize_by to 10.
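
    A sketch of the recommended normalization, using a built-in criterion:

        from langchain_classic.smith import RunEvalConfig  # path assumed

        config = RunEvalConfig(
            evaluators=[
                # Scores 1-10 by default; normalize_by=10 rescales the feedback to 0-1.
                RunEvalConfig.ScoreString(criteria="helpfulness", normalize_by=10),
            ],
        )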

    class
    LabeledScoreString

    Configuration for a labeled score string evaluator.
