News
SWE-bench Verified Benchmark 2026: 44 LLM scores
3+ week, 4+ day ago (267+ words) As of May 1, 2026, Claude Mythos Preview leads the SWE-bench Verified leaderboard with 93.9%, followed by Claude Opus 4.7 (Adaptive) (87.6%) and GPT-5.3 Codex (85%). According to BenchLM.ai, Claude Mythos Preview leads the SWE-bench Verified benchmark with a score of 93.9%, followed…...
SWE-bench & LiveCodeBench Leaderboard (March 2026) - AI Coding Benchmarks
8+ mon, 2+ week ago (285+ words) BenchLM summaries for coding plus the practical tradeoffs users check next: open weights, price, speed, latency, and context. As of March 2026, GPT-5.4 Pro leads the coding leaderboard with a weighted score of 88.3%, followed…...