evaluate_comparative(
    experiments: tuple[EXPERIMENT_T, EXPERIMENT_T],
    /,
    evaluators: Sequence[COMPARATIVE_EVALUATOR_T],
    experiment_prefix: Optional[str] = None,
    description: Optional[str] = None,
    max_concurrency: int = 5,
    client: Optional[langsmith.Client] = None,
    metadata: Optional[dict] = None,
    load_nested: bool = False,
    randomize_order: bool = False,
)

| Name | Type | Default |
|---|---|---|
| experiments* | Tuple[Union[str, uuid.UUID], Union[str, uuid.UUID]] | |
| evaluators* | Sequence[COMPARATIVE_EVALUATOR_T] | |
| experiment_prefix | Optional[str] | None |
| description | Optional[str] | None |
| max_concurrency | int | 5 |
| client | Optional[langsmith.Client] | None |
| metadata | Optional[dict] | None |
| load_nested | bool | False |
| randomize_order | bool | False |

Evaluate existing experiment runs against each other.
This lets you use pairwise preference scoring to generate more reliable feedback in your experiments.
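
To make pairwise preference scoring concrete, here is a minimal sketch of what a comparative evaluator might look like. The function name `prefer_shorter`, the `"output"` key, and the scoring rule are hypothetical, chosen only for illustration. The sketch assumes each run exposes `.id` and `.outputs`, and that an evaluator reports its preference as a dict with a `key` and a `scores` mapping from run ID to score; check the SDK reference for the exact contract before relying on this.

```python
from types import SimpleNamespace
from uuid import uuid4


def prefer_shorter(runs, example):
    """Hypothetical pairwise evaluator: prefer the run with the shorter answer.

    Assumes each run has `.id` and `.outputs` (a dict with an "output" key).
    """
    lengths = [len(str((run.outputs or {}).get("output", ""))) for run in runs]
    winner = lengths.index(min(lengths))  # ties go to the first run
    return {
        "key": "prefer_shorter",
        # 1 for the preferred run, 0 for the other.
        "scores": {run.id: int(i == winner) for i, run in enumerate(runs)},
    }


# Stand-in objects so the sketch runs without a LangSmith account.
run_a = SimpleNamespace(id=uuid4(), outputs={"output": "Paris."})
run_b = SimpleNamespace(
    id=uuid4(), outputs={"output": "The capital of France is Paris."}
)
result = prefer_shorter([run_a, run_b], example=None)
print(result["scores"][run_a.id], result["scores"][run_b.id])  # prints "1 0"
```

A real evaluator would more likely ask an LLM judge or apply a task-specific heuristic; the length rule here only keeps the sketch deterministic.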
experiments: The identifiers of the experiments to compare.
evaluators: A list of evaluators to run on each example.
experiment_prefix: A prefix to provide for your experiment name.
description: A free-form text description for the experiment.
max_concurrency: The maximum number of concurrent evaluations to run.
client: The LangSmith client to use.
metadata: Metadata to attach to the experiment.
load_nested: Whether to load all child runs for the experiment. Default is to only load the top-level root runs.
randomize_order: Whether to randomize the order of the outputs for each evaluation.
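
The parameters above could be combined in a call along the following lines. This is a sketch under stated assumptions, not a definitive recipe: the wrapper name `compare_experiments`, the prefix, the description, and the metadata values are hypothetical, and an actual call needs the `langsmith` package, valid credentials, and two existing experiments.

```python
def compare_experiments(experiment_a, experiment_b, evaluators):
    """Hypothetical wrapper showing one way to call evaluate_comparative."""
    # Imported lazily so this module loads even where langsmith is not installed.
    from langsmith.evaluation import evaluate_comparative

    return evaluate_comparative(
        (experiment_a, experiment_b),  # pair of experiment names or IDs
        evaluators=evaluators,
        experiment_prefix="pairwise-demo",  # illustrative value
        description="Pairwise comparison of two existing experiments.",
        max_concurrency=5,
        metadata={"purpose": "demo"},  # illustrative value
        load_nested=False,  # only load top-level root runs
        randomize_order=True,  # shuffle output order for each evaluation
    )
```

Setting `randomize_order=True` is a reasonable choice when the evaluator is an LLM judge, since such judges can favor whichever answer they see first.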