Response for grouped comparison view of dataset examples.
Returns dataset examples grouped by a run metadata value (e.g., model='gpt-4'). Optional filters are applied to all runs before grouping.
Shows:
Used for comparing how different sessions performed on the same set of examples.
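The grouping described above can be sketched as follows. This is a minimal illustration, not the service's implementation: the run fields `metadata` and `status`, the helper `group_runs_by_metadata`, and the sample data are all assumptions made for the example. It only shows the ordering the description states, with filters applied to all runs first and grouping by a metadata value second.

```python
from collections import defaultdict


def group_runs_by_metadata(runs, key, predicate=None):
    """Filter runs, then bucket them by the value of metadata[key].

    `runs` is a list of dicts; the field names are hypothetical.
    """
    # Apply the optional filter to every run before any grouping happens.
    kept = [r for r in runs if predicate is None or predicate(r)]

    groups = defaultdict(list)
    for run in kept:
        # Runs that lack the metadata key fall into the None group.
        groups[run.get("metadata", {}).get(key)].append(run)
    return dict(groups)


# Hypothetical sample data: three runs over two dataset examples.
sample_runs = [
    {"id": "r1", "example_id": "e1", "status": "success", "metadata": {"model": "gpt-4"}},
    {"id": "r2", "example_id": "e1", "status": "success", "metadata": {"model": "gpt-3.5"}},
    {"id": "r3", "example_id": "e2", "status": "error", "metadata": {"model": "gpt-4"}},
]

grouped = group_runs_by_metadata(
    sample_runs, "model", predicate=lambda r: r["status"] == "success"
)
```

Because the filter runs first, the failed run `r3` never reaches the `gpt-4` group.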
Example schema with list of runs from ClickHouse.
For the non-grouped endpoint (/datasets/{dataset_id}/runs): runs from a single session. For the grouped endpoint (/datasets/{dataset_id}/group/runs): a flat array of runs from all sessions, where each run has a session_id field that the frontend uses to determine column placement.
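As a rough sketch of how a consumer might turn the grouped endpoint's flat array into comparison columns, the snippet below buckets runs by example and then by `session_id`. The `session_id` field comes from the description above; the `id` and `example_id` fields, the helper `runs_to_columns`, and the sample data are assumptions for illustration only.

```python
from collections import defaultdict


def runs_to_columns(flat_runs):
    """Map example_id -> session_id -> list of run ids.

    Each session_id corresponds to one column in the comparison view.
    """
    table = defaultdict(lambda: defaultdict(list))
    for run in flat_runs:
        table[run["example_id"]][run["session_id"]].append(run["id"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {example: dict(cols) for example, cols in table.items()}


# Hypothetical flat array, as the grouped endpoint is described to return.
flat_runs = [
    {"id": "r1", "example_id": "e1", "session_id": "sA"},
    {"id": "r2", "example_id": "e1", "session_id": "sB"},
    {"id": "r3", "example_id": "e2", "session_id": "sA"},
]

columns = runs_to_columns(flat_runs)
```

An example with no run in a given session has no entry for that session, so a renderer would need to treat a missing key as an empty cell.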
List of feedback keys with number of improvements and regressions for each.
ComparativeExperiment schema.
Dataset diff schema.