The Compare Runs functionality lets you compare up to three experiment runs side by side.
See how different prompts, models, or parameters affect results, including how well responses match your expected results, and track cost, latency, token usage, and evaluation metrics across runs.
How to Compare with Arato
In the header of the relevant prompt, click Compare Runs.
Select up to three runs you want to compare, then click Compare Results to generate a detailed comparison. Because the comparison shows a side-by-side view of each input row from your dataset, you can only compare runs that share the same dataset.
Analyze the Comparison Overview. At the top of the comparison report, you can compare aggregated performance metrics for the selected runs (see the sketch after these steps for an illustration of this kind of aggregation).
Scroll down to the Results Comparison breakdown to see individual responses for each event, making it easier to gain deeper insights and assess how different models or prompt versions affect the results.
You can change your selection or add more runs to your current comparison by clicking the "+" button on the right.
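Conceptually, the Comparison Overview aggregates the per-row metrics of each selected run so they can be read side by side. The sketch below is purely illustrative; the Run and RowResult structures, metric names, and sample values are hypothetical and do not represent Arato's actual data model or API.

```python
# Hypothetical sketch: aggregating per-row run metrics into a side-by-side overview.
# All names and values here are illustrative assumptions, not Arato's data model.
from dataclasses import dataclass
from statistics import mean


@dataclass
class RowResult:
    latency_ms: float   # time to generate this row's response
    cost_usd: float     # provider cost for this row
    tokens: int         # tokens used for this row
    eval_score: float   # evaluation metric for this row (0-1)


@dataclass
class Run:
    name: str
    rows: list[RowResult]   # one result per dataset row (runs share the same dataset)


def overview(runs: list[Run]) -> None:
    """Print aggregated metrics for each run, one row per run."""
    print(f"{'run':<26}{'avg latency (ms)':>18}{'total cost ($)':>16}{'total tokens':>14}{'avg eval':>10}")
    for run in runs:
        print(
            f"{run.name:<26}"
            f"{mean(r.latency_ms for r in run.rows):>18.0f}"
            f"{sum(r.cost_usd for r in run.rows):>16.4f}"
            f"{sum(r.tokens for r in run.rows):>14}"
            f"{mean(r.eval_score for r in run.rows):>10.2f}"
        )


# Example: two runs over the same two-row dataset (made-up numbers).
run_a = Run("gpt-4o / prompt v1", [RowResult(820, 0.0031, 410, 0.90), RowResult(760, 0.0028, 380, 0.80)])
run_b = Run("gpt-4o-mini / prompt v2", [RowResult(510, 0.0007, 430, 0.70), RowResult(470, 0.0006, 390, 0.75)])
overview([run_a, run_b])
```

The Results Comparison breakdown applies the same idea at the row level: for each dataset row, the responses from every selected run are placed next to each other so you can judge quality differences directly.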