Feedback scores for the results of comparative evaluations.
These are generated by functions that compare two or more runs, returning a ranking or other feedback.
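To illustrate how a comparison function can turn a ranking into per-run scores, here is a minimal sketch. The helper name `rank_to_scores`, the `quality` callable, and the linear rank-to-score mapping (best run gets 1.0, worst gets 0.0) are all assumptions for illustration, not part of the API described here.

```python
from typing import Callable, Dict


def rank_to_scores(
    outputs: Dict[str, str],
    quality: Callable[[str], float],
) -> Dict[str, float]:
    """Hypothetical helper: rank runs by `quality` and map ranks to scores in [0, 1].

    The best run gets 1.0, the worst gets 0.0, and the rest are spaced evenly.
    """
    if len(outputs) < 2:
        # A comparative evaluation only makes sense for two or more runs.
        raise ValueError("a comparative evaluation needs at least two runs")
    # Order run IDs from best to worst; sorted() stays stable with reverse=True,
    # so tied runs keep their original order.
    ranked = sorted(outputs, key=lambda run_id: quality(outputs[run_id]), reverse=True)
    last = len(ranked) - 1
    return {run_id: 1.0 - i / last for i, run_id in enumerate(ranked)}
```

For example, `rank_to_scores({"run-a": "Paris", "run-b": "The capital of France is Paris."}, lambda text: -len(text))` prefers the shorter answer and returns `{"run-a": 1.0, "run-b": 0.0}`. Returning a dict keyed by run ID matches the idea of one score per run in the comparison.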
ComparisonEvaluationResult()

The aspect, metric name, or label for this evaluation.
The scores for each run in the comparison.
The ID of the trace of the evaluator itself.
Comment for the scores. If a string, it's shared across all target runs.
If a dict, it maps run IDs to individual comments.
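The fields above, and in particular the string-or-dict rule for comments, can be sketched with a small stand-in dataclass. This is a hypothetical illustration: the class name `ComparisonResultSketch`, the field names `key`, `scores`, `source_run_id`, and `comment`, and the `comment_for` method are assumptions, not names confirmed by this text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class ComparisonResultSketch:
    # Hypothetical stand-in; field names are assumptions for illustration.
    key: str                          # aspect, metric name, or label
    scores: Dict[str, float]          # one score per run in the comparison
    source_run_id: Optional[str] = None  # ID of the evaluator's own trace
    comment: Optional[Union[str, Dict[str, str]]] = None

    def comment_for(self, run_id: str) -> Optional[str]:
        """Return the comment that applies to a single target run."""
        if self.comment is None:
            return None
        if isinstance(self.comment, str):
            # A single string is shared across all target runs.
            return self.comment
        # A dict maps run IDs to individual comments; runs without an entry get None.
        return self.comment.get(run_id)
```

A shared comment such as `comment="A is more concise."` is returned for every run, while `comment={"a": "Concise.", "b": "Too verbose."}` gives each run its own note.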