Comparing Run Results

Side-by-side comparison of different experiment runs.

Compare Runs lets you compare multiple experiment runs side by side.

You can include up to three runs in a single comparison.

See how different prompts, models, or parameters affect results: check how outputs match your expected results, track cost, latency, and token usage, and compare your evaluation metrics across runs.
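To make the aggregated view concrete, here is a minimal, hypothetical sketch of how per-run cost, latency, and token totals could be rolled up for a side-by-side comparison. This is not Arato's implementation, and field names such as cost_usd, latency_ms, and tokens are assumptions for illustration only.

```python
# Illustrative sketch only: Arato computes these aggregates for you in the
# Comparison Overview. Field names and values below are hypothetical.
from statistics import mean

runs = {
    "run_a": [
        {"cost_usd": 0.0021, "latency_ms": 640, "tokens": 812},
        {"cost_usd": 0.0019, "latency_ms": 580, "tokens": 760},
    ],
    "run_b": [
        {"cost_usd": 0.0034, "latency_ms": 910, "tokens": 1104},
        {"cost_usd": 0.0031, "latency_ms": 870, "tokens": 1052},
    ],
}

# Roll up each run's rows into the kind of per-run aggregates a
# side-by-side comparison would surface.
for name, rows in runs.items():
    print(
        f"{name}: total cost ${sum(r['cost_usd'] for r in rows):.4f}, "
        f"avg latency {mean(r['latency_ms'] for r in rows):.0f} ms, "
        f"total tokens {sum(r['tokens'] for r in rows)}"
    )
```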

How to Compare with Arato

  • On the relevant prompt's header, click Compare Runs.

  • Select up to three runs you want to compare, then click Compare Results to generate a detailed comparison. Because the comparison shows a side-by-side view of each input row from your dataset, you can only compare runs that share the same dataset.

  • Analyze the Comparison Overview. At the top of the comparison report, you can review and compare aggregated performance metrics for the selected runs.

  • Scroll down to the Results Comparison breakdown to see individual responses for each event, making it easier to gain deeper insights and assess how different models or prompt versions affect the results.

  • You can change your selection or add more runs to your current comparison by clicking the "+" button on the right.
