
posttraining

Post-training experiments, reward shaping, RL continuation, and benchmark deltas after base model release.
