CursorBench 3.1: Cursor Benchmarks 36 AI Models on Real Coding Tasks — Fable 5 Max Leads at 72.9%
What's new
- CursorBench 3.1: The new version shifts its tasks toward codebase understanding, bug diagnosis, planning, and code review, whereas 3.0 centered on edits, refactors, and bug fixes.
- 36 models tested: Includes frontier and open-source models.
- Improved grading: New criteria for edit tasks.
Results (top models)
- Fable 5 Max: 72.9% / $18.02 per task
- GPT-5.5 Extra High: 64.3% / $4.37 per task
- Composer 2.5: 63.2% / $0.55 per task (best value)
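One way to read the "best value" label is score per dollar spent. The sketch below is only an illustration built on the three figures listed above. The source does not say how value was judged, so the points-per-dollar metric and the helper names (`points_per_dollar`, `best_within_budget`) are assumptions.

```python
# Score (%) and cost per task (USD), copied from the results listed above.
results = {
    "Fable 5 Max": (72.9, 18.02),
    "GPT-5.5 Extra High": (64.3, 4.37),
    "Composer 2.5": (63.2, 0.55),
}


def points_per_dollar(score, cost):
    """Assumed value metric: benchmark points earned per dollar per task."""
    return score / cost


def best_within_budget(models, max_cost):
    """Return the highest-scoring model whose per-task cost fits the budget.

    Returns None when no listed model is affordable.
    """
    affordable = [name for name, (_, cost) in models.items() if cost <= max_cost]
    if not affordable:
        return None
    return max(affordable, key=lambda name: models[name][0])


# Rank models from best to worst value under the assumed metric.
ranked = sorted(
    results,
    key=lambda name: points_per_dollar(*results[name]),
    reverse=True,
)

for name in ranked:
    score, cost = results[name]
    print(f"{name}: {points_per_dollar(score, cost):.1f} points per dollar")
```

Under this metric Composer 2.5 ranks first by a wide margin, which matches the label in the list. Calling `best_within_budget(results, 5.00)` picks GPT-5.5 Extra High, showing how a per-task spending cap changes the choice.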
Why it matters
The benchmark draws on real Cursor sessions rather than synthetic examples, so it gives a more practical view of how models perform in day-to-day coding. The per-task cost figures also help developers choose a model that fits their budget.
Source: Cursor