RunEvalConfig()

Configuration for a run evaluation.

evaluators: Configurations for which evaluators to apply to the dataset run. Each can be an EvaluatorType (langchain.evaluation.schema.EvaluatorType) such as EvaluatorType.QA, the corresponding evaluator type string ("qa"), or a configuration object for a given evaluator (e.g., RunEvalConfig.QA from langchain.smith.evaluation.config).
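For example, a minimal sketch mixing the three accepted forms: an EvaluatorType member, a type string, and a per-evaluator configuration. The particular evaluators and the "helpfulness" criterion are illustrative only:

```python
from langchain.evaluation import EvaluatorType
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=[
        EvaluatorType.QA,                       # enum member
        "context_qa",                           # evaluator type string
        RunEvalConfig.Criteria("helpfulness"),  # per-evaluator configuration
    ]
)
```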
custom_evaluators: Custom evaluators to apply to the dataset run.
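A custom evaluator can be an instance of the LangSmith RunEvaluator interface. A minimal sketch, assuming the traced run's output of interest lives under an "output" key; the evaluator name and scoring rule are invented for illustration:

```python
from typing import Optional

from langchain.smith import RunEvalConfig
from langsmith.evaluation import EvaluationResult, RunEvaluator
from langsmith.schemas import Example, Run


class NonEmptyOutput(RunEvaluator):
    """Toy evaluator: scores 1 when the traced run produced a non-empty output."""

    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult:
        prediction = (run.outputs or {}).get("output", "")
        return EvaluationResult(key="non_empty_output", score=int(bool(prediction)))


eval_config = RunEvalConfig(custom_evaluators=[NonEmptyOutput()])
```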
batch_evaluators: Evaluators that run on an aggregate/batch level. These generate one or more metrics that are assigned to the full test run and are therefore not associated with individual traces.
reference_key: The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.

prediction_key: The key in the traced run's outputs dictionary to use as the prediction. If not provided, it will be inferred automatically.

input_key: The key in the traced run's inputs dictionary to use as the input. If not provided, it will be inferred automatically.

eval_llm: The language model to pass to any evaluators that require one.
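These keys and the evaluation model only need to be set when automatic inference would be ambiguous. A sketch, assuming a dataset whose examples use "question"/"answer" keys and a chain whose trace exposes an "output" key; the model choice is likewise illustrative:

```python
from langchain.chat_models import ChatOpenAI
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=["qa"],
    input_key="question",     # key in the traced run's inputs
    prediction_key="output",  # key in the traced run's outputs
    reference_key="answer",   # key holding the reference string
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0),  # model used by evaluators that need one
)
```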
RunEvalConfig.Criteria: Configuration for a reference-free criteria evaluator.

RunEvalConfig.LabeledCriteria: Configuration for a labeled (with references) criteria evaluator.
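The two are configured the same way; the labeled variant additionally grades against the dataset's reference output. A sketch using one built-in criterion and one custom criterion, where the custom criterion's name and description are invented for the example:

```python
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=[
        # Reference-free: judged from the input and prediction alone.
        RunEvalConfig.Criteria("conciseness"),
        # Labeled: judged against the reference answer in the dataset.
        RunEvalConfig.LabeledCriteria(
            {"completeness": "Does the submission cover every point in the reference answer?"}
        ),
    ]
)
```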
RunEvalConfig.EmbeddingDistance: Configuration for an embedding distance evaluator.

RunEvalConfig.StringDistance: Configuration for a string distance evaluator.

RunEvalConfig.QA: Configuration for a QA evaluator.

RunEvalConfig.ContextQA: Configuration for a context-based QA evaluator.
RunEvalConfig.CoTQA: Configuration for a chain-of-thought ("cot_qa") context-based QA evaluator.
RunEvalConfig.JsonValidity: Configuration for a JSON validity evaluator.

RunEvalConfig.JsonEqualityEvaluator: Configuration for a JSON equality evaluator.
RunEvalConfig.ExactMatch: Configuration for an exact match string evaluator.

RunEvalConfig.RegexMatch: Configuration for a regex match string evaluator.
RunEvalConfig.ScoreString: Configuration for a score string evaluator. This is like the criteria evaluator, but it is configured by default to return a score on a 1-10 scale. It is recommended to normalize these scores by setting normalize_by to 10.
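For instance, a sketch that grades on the default 1-10 scale and rescales the reported score to 0-1 via normalize_by; the "conciseness" criterion is just an example:

```python
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=[
        # Scores 1-10 by default; normalize_by=10 rescales the reported score to 0-1.
        RunEvalConfig.ScoreString(criteria="conciseness", normalize_by=10),
    ]
)
```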
RunEvalConfig.LabeledScoreString: Configuration for a labeled score string evaluator.