RunEvalConfig()

Configuration for a run evaluation.

evaluators: Configurations for which evaluators to apply to the dataset run. Each can be an EvaluatorType (langchain.evaluation.schema.EvaluatorType) such as EvaluatorType.QA, the corresponding evaluator type string ("qa"), or a configuration object for a given evaluator (e.g., RunEvalConfig.QA from langchain.smith.evaluation.config).
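For example, a minimal sketch mixing the three accepted forms: an EvaluatorType member, a type string, and a per-evaluator configuration. The particular evaluators and the "helpfulness" criterion are illustrative only:

```python
from langchain.evaluation import EvaluatorType
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=[
        EvaluatorType.QA,                       # enum member
        "context_qa",                           # evaluator type string
        RunEvalConfig.Criteria("helpfulness"),  # per-evaluator configuration
    ]
)
```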
custom_evaluators: Custom evaluators to apply to the dataset run.
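A custom evaluator can be an instance of the LangSmith RunEvaluator interface. A minimal sketch, assuming the traced run's output of interest lives under an "output" key; the evaluator name and scoring rule are invented for illustration:

```python
from typing import Optional

from langchain.smith import RunEvalConfig
from langsmith.evaluation import EvaluationResult, RunEvaluator
from langsmith.schemas import Example, Run


class NonEmptyOutput(RunEvaluator):
    """Toy evaluator: scores 1 when the traced run produced a non-empty output."""

    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult:
        prediction = (run.outputs or {}).get("output", "")
        return EvaluationResult(key="non_empty_output", score=int(bool(prediction)))


eval_config = RunEvalConfig(custom_evaluators=[NonEmptyOutput()])
```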
batch_evaluators: Evaluators that run on an aggregate/batch level. These generate one or more metrics that are assigned to the full test run and are therefore not associated with individual traces.
reference_key: The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.

prediction_key: The key in the traced run's outputs dictionary to use as the prediction. If not provided, it will be inferred automatically.

input_key: The key in the traced run's inputs dictionary to use as the input. If not provided, it will be inferred automatically.

eval_llm: The language model to pass to any evaluators that require one.
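These keys and the evaluation model only need to be set when automatic inference would be ambiguous. A sketch, assuming a dataset whose examples use "question"/"answer" keys and a chain whose trace exposes an "output" key; the model choice is likewise illustrative:

```python
from langchain.chat_models import ChatOpenAI
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=["qa"],
    input_key="question",     # key in the traced run's inputs
    prediction_key="output",  # key in the traced run's outputs
    reference_key="answer",   # key holding the reference string
    eval_llm=ChatOpenAI(model="gpt-4", temperature=0),  # model used by evaluators that need one
)
```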
RunEvalConfig.Criteria: Configuration for a reference-free criteria evaluator.

RunEvalConfig.LabeledCriteria: Configuration for a labeled (with references) criteria evaluator.
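The two are configured the same way; the labeled variant additionally grades against the dataset's reference output. A sketch using one built-in criterion and one custom criterion, where the custom criterion's name and description are invented for the example:

```python
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=[
        # Reference-free: judged from the input and prediction alone.
        RunEvalConfig.Criteria("conciseness"),
        # Labeled: judged against the reference answer in the dataset.
        RunEvalConfig.LabeledCriteria(
            {"completeness": "Does the submission cover every point in the reference answer?"}
        ),
    ]
)
```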
RunEvalConfig.EmbeddingDistance: Configuration for an embedding distance evaluator.

RunEvalConfig.StringDistance: Configuration for a string distance evaluator.

RunEvalConfig.QA: Configuration for a QA evaluator.

RunEvalConfig.ContextQA: Configuration for a context-based QA evaluator.
RunEvalConfig.CoTQA: Configuration for a chain-of-thought ("cot_qa") context-based QA evaluator.
RunEvalConfig.JsonValidity: Configuration for a JSON validity evaluator.

RunEvalConfig.JsonEqualityEvaluator: Configuration for a JSON equality evaluator.
RunEvalConfig.ExactMatch: Configuration for an exact match string evaluator.

RunEvalConfig.RegexMatch: Configuration for a regex match string evaluator.
RunEvalConfig.ScoreString: Configuration for a score string evaluator. This is like the criteria evaluator, but it is configured by default to return a score on a 1-10 scale. It is recommended to normalize these scores by setting normalize_by to 10.
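For instance, a sketch that grades on the default 1-10 scale and rescales the reported score to 0-1 via normalize_by; the "conciseness" criterion is just an example:

```python
from langchain.smith import RunEvalConfig

eval_config = RunEvalConfig(
    evaluators=[
        # Scores 1-10 by default; normalize_by=10 rescales the reported score to 0-1.
        RunEvalConfig.ScoreString(criteria="conciseness", normalize_by=10),
    ]
)
```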
RunEvalConfig.LabeledScoreString: Configuration for a labeled score string evaluator.