Anysphere

CursorBench 3.1: Cursor Benchmarks 36 AI Models on Real Coding Tasks — Fable 5 Max Leads at 72.9%

Thursday, July 2, 2026 Source: Cursor

What's new

  • CursorBench 3.1: The new version focuses its tasks on codebase understanding, bug diagnosis, planning, and code review (3.0 focused on edit, refactor, and bugfix tasks).
  • 36 models tested: Includes frontier and open-source models.
  • Improved grading: New criteria for edit tasks.

Results (top models)

  • Fable 5 Max: 72.9% / $18.02 per task
  • GPT-5.5 Extra High: 64.3% / $4.37 per task
  • Composer 2.5: 63.2% / $0.55 per task (best value)
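
The "best value" label can be checked with simple arithmetic. The sketch below assumes that value means score points per dollar spent on a task; the source does not define the metric, so `points_per_dollar` and `rank_by_value` are hypothetical helpers for illustration, and the figures are only the three results listed above.

```python
# Top-three results as listed above: (model, score in %, cost per task in USD).
RESULTS = [
    ("Fable 5 Max", 72.9, 18.02),
    ("GPT-5.5 Extra High", 64.3, 4.37),
    ("Composer 2.5", 63.2, 0.55),
]


def points_per_dollar(score_pct, cost_usd):
    """Score points per dollar per task (an assumed metric, not Cursor's)."""
    return score_pct / cost_usd


def rank_by_value(results):
    """Return (model, points per dollar) pairs, best value first."""
    return sorted(
        ((name, points_per_dollar(score, cost)) for name, score, cost in results),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    for name, value in rank_by_value(RESULTS):
        print(f"{name}: {value:.1f} points per dollar")
```

Under this assumption Composer 2.5 comes out at roughly 115 points per dollar, against about 15 for GPT-5.5 Extra High and about 4 for Fable 5 Max, which is consistent with the "best value" label even though Fable 5 Max has the highest raw score.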

Why it matters

The benchmark draws from real Cursor sessions rather than synthetic examples, providing a more practical view of model performance in daily coding and helping developers choose models by budget.