Group of examples with a specific metadata value across multiple sessions.
Extends RunGroupBase with:
- group_key: metadata value that defines this group
- sessions: per-session stats for runs matching this metadata value
- examples: shared examples across all sessions (intersection logic) with flat
array of runs (each run has session_id field for frontend to determine column)
- example_count: unique example count (pagination-aware, same across all
sessions due to intersection)
Inherited from RunGroupBase:
- filter: metadata filter for this group (e.g., "and(eq(is_root, true),
and(eq(metadata_key, 'model'), eq(metadata_value, 'gpt-4')))")
- count: total run count across all sessions (includes duplicate runs)
- total_tokens, total_cost: aggregate across sessions
- min_start_time, max_start_time: time range across sessions
- latency_p50, latency_p99: aggregate latency stats across sessions
- feedback_stats: weighted average feedback across sessions
Additional aggregate stats (from ExampleWithRunsGroup):
- prompt_tokens, completion_tokens: separate token counts
- prompt_cost, completion_cost: separate costs
- error_rate: average error rate