Function · Since v0.1

evaluate_comparative

Evaluate existing experiment runs against each other.

This lets you use pairwise preference scoring to generate more reliable feedback in your experiments.

evaluate_comparative(
  experiments: tuple[EXPERIMENT_T, EXPERIMENT_T],
  /,
  evaluators: Sequence[COMPARATIVE_EVALUATOR_T],
  experiment_prefix: Optional[str] = None,
  description: Optional[str] = None,
  max_concurrency: int = 5,
  client: Optional[langsmith.Client] = None,
  metadata: Optional[dict] = None,
  load_nested: bool = False,
  randomize_order: bool = False
) -> ComparativeExperimentResults
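
As a minimal sketch of how the signature above might be used: the two experiments are passed together as one tuple, and each evaluator receives the runs being compared. The evaluator name `prefer_shorter`, the `"output"` key in the run outputs, the experiment prefix, and the returned dict shape (a feedback `key` plus `scores` keyed by run id) are assumptions made for illustration, so check them against your installed `langsmith` version. The actual call is wrapped in a function and not executed here, because it needs the `langsmith` package and API credentials.

```python
from types import SimpleNamespace
from typing import Optional, Sequence


def prefer_shorter(runs: Sequence, example: Optional[object] = None) -> dict:
    """Hypothetical pairwise evaluator: prefer the run with the shorter answer.

    Assumes each run exposes `.id` and an `.outputs` dict with an "output" key.
    """
    lengths = [len(str((run.outputs or {}).get("output", ""))) for run in runs]
    best = min(range(len(runs)), key=lengths.__getitem__)
    scores = {run.id: (1 if i == best else 0) for i, run in enumerate(runs)}
    return {"key": "prefers_shorter", "scores": scores}


def compare(experiment_a: str, experiment_b: str):
    """Run the comparison; requires the langsmith package and API credentials."""
    from langsmith.evaluation import evaluate_comparative  # imported lazily

    return evaluate_comparative(
        (experiment_a, experiment_b),  # names or IDs of two existing experiments
        evaluators=[prefer_shorter],
        experiment_prefix="pairwise-demo",  # hypothetical prefix
        description="Prefer the shorter answer.",
        max_concurrency=5,
    )


# Local check with stand-in run objects (no network access needed).
run_a = SimpleNamespace(id="run-a", outputs={"output": "Paris"})
run_b = SimpleNamespace(id="run-b", outputs={"output": "The capital of France is Paris."})
result = prefer_shorter([run_a, run_b])
print(result)
```

Keeping the evaluator a plain function makes it easy to exercise with stand-in objects before pointing it at real experiments.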

Parameters

experiments (required)
Type: Tuple[Union[str, uuid.UUID], Union[str, uuid.UUID]]

The identifiers of the experiments to compare.

evaluators (required)
Type: Sequence[COMPARATIVE_EVALUATOR_T]

A list of evaluators to run on each example.

experiment_prefix
Type: Optional[str]
Default: None

A prefix to provide for your experiment name.

description
Type: Optional[str]
Default: None

A free-form text description for the experiment.

max_concurrency
Type: int
Default: 5

The maximum number of concurrent evaluations to run.

client
Type: Optional[langsmith.Client]
Default: None

The LangSmith client to use.

metadata
Type: Optional[dict]
Default: None

Metadata to attach to the experiment.

load_nested
Type: bool
Default: False

Whether to load all child runs for the experiment. By default, only the top-level root runs are loaded.

randomize_order
Type: bool
Default: False

Whether to randomize the order of the outputs for each evaluation.
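
Randomizing the presentation order matters most when an evaluator (for example, an LLM judge) might favor whichever output it sees first. The sketch below illustrates the idea only and is not the library's implementation: `prefer_longer`, `evaluate_in_random_order`, and the stand-in run objects are hypothetical. It shows that an evaluator which keys its scores by run id, rather than by position, produces the same per-run scores regardless of the order it receives.

```python
import random
from types import SimpleNamespace


def prefer_longer(runs, example=None) -> dict:
    """Hypothetical evaluator that scores by run id, not by position."""
    winner = max(runs, key=lambda run: len(run.outputs["output"]))
    return {
        "key": "prefers_longer",
        "scores": {run.id: int(run is winner) for run in runs},
    }


def evaluate_in_random_order(runs, evaluator, rng: random.Random) -> dict:
    """Shuffle a copy of the runs before evaluating (illustrative stand-in
    for what randomize_order=True is described as doing)."""
    shuffled = list(runs)
    rng.shuffle(shuffled)
    return evaluator(shuffled)


run_a = SimpleNamespace(id="run-a", outputs={"output": "Paris"})
run_b = SimpleNamespace(id="run-b", outputs={"output": "The capital is Paris."})

# Whatever order the evaluator sees, the per-run scores stay the same.
rng = random.Random(0)
results = [
    evaluate_in_random_order([run_a, run_b], prefer_longer, rng) for _ in range(10)
]
consistent = all(r["scores"] == {"run-a": 0, "run-b": 1} for r in results)
print(consistent)
```

An evaluator that returned scores by list position instead would attribute feedback to the wrong run whenever the order was shuffled.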
