Skip to content
energyMEDIUM2026-05-08 02:38 UTC

From -9.15pp to +0.61pp: An engineering journey through four DPO iteration failures

Over 36 hours we ran four DPO training iterations against Qwen2.5-Coder-7B-Instruct, trying to push HumanEval pass@1 above the base model's 87.20%. The first three iterations failed in different ways (-9.15pp, -1.22pp, two NO-GO calls). The fourth recovered to +0.61pp. Each failure revealed a differ

ADVERTISEMENT
⚡ STAY AHEAD

Events like this, convergence-verified across 689 sources, land in your inbox every Sunday. Free.

GET THE SUNDAY BRIEFING →

RELATED · energy