HumanEval benchmark results, regressions, and analysis for code generation models and post-training runs.