Overview

The Eval Module is a platform for testing and evaluating AI agents using predefined test cases and customizable metrics. This testing framework helps verify that agents perform as expected across a range of scenarios and use cases.

Core Components

The platform is built on three fundamental pillars that work together to provide thorough agent evaluation:


1. Datasets

Datasets are collections of test cases used to run agents and analyze their outputs. These predefined test cases serve as benchmarks for agent performance. The platform supports two distinct types of test cases:

  • LLM Test Cases: Single input, single output scenarios ideal for straightforward response validation

  • Conversational Test Cases: Multi-turn interactions that test complete user journeys and complex workflows (see the sketch after this list)

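As a rough sketch of what these two test-case shapes might look like in code, the snippet below models them as plain Python structures. The class and field names are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class LLMTestCase:
    # Single input, single output: one prompt and the expected response.
    input: str
    expected_output: str

@dataclass
class ConversationalTestCase:
    # Multi-turn: each turn pairs a user message with the expected reply.
    turns: list[tuple[str, str]] = field(default_factory=list)

# A dataset is simply a collection of test cases that serves as a benchmark.
dataset = [
    LLMTestCase(
        input="What is your refund policy?",
        expected_output="Refunds are available within 30 days of purchase.",
    ),
    ConversationalTestCase(
        turns=[
            ("I want to return my order.", "Sure, could you share your order ID?"),
            ("It's 12345.", "Thanks, a return for order 12345 has been opened."),
        ]
    ),
]
```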

2. Metrics

Metrics define the evaluation criteria used to assess agent performance. The platform offers flexible evaluation methods:

  • LLM-based evaluation: Using AI models as judges to assess response quality (see the sketch after this list)

  • Automated evaluation: Systematic testing using predefined rules and criteria

  • Manual evaluation: Human review for nuanced assessment

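To make the first two methods concrete, here is a minimal sketch of an LLM-as-judge metric alongside a simple rule-based (automated) one. The judge_model object and its complete() method are hypothetical stand-ins, not the platform's API.

```python
JUDGE_PROMPT = """Rate how well the response matches the expected answer on a
scale of 1 (completely wrong) to 5 (fully correct). Reply with a single digit.

Question: {question}
Expected answer: {expected}
Actual response: {actual}
"""

def llm_judge_score(judge_model, question: str, expected: str, actual: str) -> int:
    # LLM-based evaluation: a judge model grades the agent's response.
    prompt = JUDGE_PROMPT.format(question=question, expected=expected, actual=actual)
    reply = judge_model.complete(prompt)  # hypothetical LLM client call
    return int(reply.strip()[0])          # parse the leading digit as the score

def exact_match(expected: str, actual: str) -> bool:
    # Automated evaluation: a predefined rule, here simple string equality.
    return expected.strip().lower() == actual.strip().lower()
```

Manual evaluation has no code path here by design: a human reviewer records a judgment directly, which is why it suits nuanced assessments the other two methods miss.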

3. Experiments

Experiments bring datasets and metrics together to perform comprehensive evaluations (see the sketch after this list). Users can:

  • Select specific agents for testing

  • Choose appropriate metrics for evaluation

  • Apply relevant datasets to test various scenarios

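Putting the pieces together, a hypothetical experiment loop might run every test case in a dataset through an agent and score the output with each chosen metric. The agent.respond() call and the result shape are assumptions for illustration; this sketch covers LLM test cases, while conversational cases would iterate over turns instead.

```python
def run_experiment(agent, dataset, metrics):
    # Run each test case through the agent and score it with every metric.
    results = []
    for case in dataset:
        actual = agent.respond(case.input)  # hypothetical agent interface
        scores = {
            name: metric(case.expected_output, actual)
            for name, metric in metrics.items()
        }
        results.append({"input": case.input, "output": actual, "scores": scores})
    return results

# Example: evaluate one agent against LLM test cases with the exact-match rule.
# results = run_experiment(my_agent, llm_cases, {"exact_match": exact_match})
```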

This integrated approach supports thorough testing across different dimensions of agent performance, providing actionable insights for optimization and improvement.