New Evaluation Run
Suite
All suites
Mode
Full Pipeline
Context Only
Agent Isolated
Model Override
(default)
Run
Stop
Compare Two Runs (A/B)
Run A (Baseline)
Run B (Variant)
Compare
Model Sweep
Run the same test suite across multiple LLMs to compare their accuracy.
Mode
Full Pipeline
Context Only
Models to Sweep
Sweep
Stop