evals
Evaluation setups, benchmarks, metrics, rubrics, and result analysis used to measure model quality, regressions, or capability gains.