How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard
TL;DR: We added architectural context to AI coding agents via MCP and tested on SWE-bench Verified (500 real bugs). MiniMax M2.5, a model that costs $0.02 per call, scored 78.2%, surpassing every model on the official mini-SWE-agent leaderboard, including Claude Opus 4.5 (76.8%), which costs 37x more.
Original source: Dev.to