Puller Eval

Suite

Mode

Model Override

Run A (Baseline)

Run B (Variant)

Run the same test suite across multiple LLMs to compare their accuracy.

Mode

Models to Sweep
