
How to Use Evals

Updated this week

When and How to Use Evals?

Evals should be incorporated throughout the lifecycle of a GenAI application, from initial development to production deployment. Key use cases include:

  • Prompt Optimization: Testing different prompt structures to determine the most effective phrasing.

  • Model Comparison: Benchmarking different AI models to select the best-performing option.

  • Regression Testing: Ensuring new changes do not degrade existing performance.

  • Bias and Safety Checks: Detecting unwanted biases or potentially harmful outputs.

  • Input Sanitization: Verifying that only valid, permitted inputs are processed.
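As a concrete illustration of the first use case, here is a minimal sketch of a prompt-optimization eval: two prompt variants are scored against the same small golden set, and the variant with the higher pass rate wins. The function `call_model`, the golden set, and the prompt templates are all hypothetical placeholders, not part of any specific framework.

```python
# Hedged sketch: comparing two prompt variants against a golden set.
# `call_model` is a hypothetical stand-in for your real model call.

def call_model(prompt: str) -> str:
    # Fake deterministic output for illustration; replace with a real call.
    return "4" if "2 + 2" in prompt else "unsure"

# Small golden set: (question, expected answer) pairs.
GOLDEN = [("What is 2 + 2?", "4"), ("What is 3 + 5?", "8")]

def score(prompt_template: str) -> float:
    # Pass rate: fraction of golden answers found in the model output.
    hits = 0
    for question, expected in GOLDEN:
        output = call_model(prompt_template.format(question=question))
        hits += expected in output
    return hits / len(GOLDEN)

variant_a = "Answer briefly: {question}"
variant_b = "Think step by step, then answer: {question}"
print({"A": score(variant_a), "B": score(variant_b)})
```

The same pass-rate scoring generalizes to model comparison (swap the model instead of the prompt) and regression testing (rerun the golden set after every change).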

Using Evals effectively involves:

  • Defining the evaluation criteria and desired outcomes.

  • Selecting the appropriate type of Eval (Automated, HITL, or LLM-as-a-Judge).

  • Running evaluations on sample queries and reviewing the results.

  • Iterating on prompts and model configurations based on findings.
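The four steps above can be sketched as a single eval loop. This is an assumed, minimal automated eval: `call_model`, the sample queries, and the keyword-match criterion are illustrative placeholders you would replace with your own application, dataset, and grading logic (or an LLM-as-a-Judge call).

```python
# Minimal sketch of the eval workflow: define a criterion, run it over
# sample queries, and report a pass rate to guide iteration.
# All names here are hypothetical, not from a specific eval framework.

def call_model(query: str) -> str:
    # Stand-in for your real model call.
    return "Paris is the capital of France."

# Step: sample queries with desired outcomes.
SAMPLES = [
    {"query": "What is the capital of France?", "expected": "paris"},
]

def criterion(output: str, expected: str) -> bool:
    # Step: evaluation criterion (here, a simple keyword match;
    # an automated Eval could use any programmatic check).
    return expected in output.lower()

def run_eval(samples) -> float:
    # Step: run the eval on each sample and compute the pass rate.
    results = [criterion(call_model(s["query"]), s["expected"]) for s in samples]
    return sum(results) / len(results)

pass_rate = run_eval(SAMPLES)
# Step: iterate on prompts or model configuration until the
# pass rate meets your target, then rerun.
print(f"pass rate: {pass_rate:.0%}")
```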
