
CursorBench 3.1: Cursor Benchmarks 36 AI Models on Real Coding Tasks — Fable 5 Max Leads at 72.9%

Thursday, July 2, 2026 • Source: Cursor

What's new

CursorBench 3.1: The new version shifts its tasks toward codebase understanding, bug diagnosis, planning, and code review, away from the editing, refactoring, and bug-fixing tasks of version 3.0.
36 models tested: The lineup includes both frontier and open-source models.
Improved grading: New criteria for edit tasks.

Results (top models)

Fable 5 Max: 72.9% / $18.02 per task
GPT-5.5 Extra High: 64.3% / $4.37 per task
Composer 2.5: 63.2% / $0.55 per task (best value)
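The article labels Composer 2.5 the best value but does not define the metric. One simple reading, assumed here rather than taken from Cursor, is benchmark percentage points per dollar of average per-task cost. The sketch below computes that ratio for the three reported results; `Result` and `points_per_dollar` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score_pct: float      # CursorBench 3.1 score, in percent
    cost_per_task: float  # reported average cost in USD per task


# The three top results reported for CursorBench 3.1.
RESULTS = [
    Result("Fable 5 Max", 72.9, 18.02),
    Result("GPT-5.5 Extra High", 64.3, 4.37),
    Result("Composer 2.5", 63.2, 0.55),
]


def points_per_dollar(r: Result) -> float:
    """Score points per USD of per-task cost (an assumed value metric, not Cursor's definition)."""
    return r.score_pct / r.cost_per_task


if __name__ == "__main__":
    # Rank models from best to worst value under this assumed metric.
    for r in sorted(RESULTS, key=points_per_dollar, reverse=True):
        print(f"{r.model:<20} {points_per_dollar(r):7.2f} points per $")
```

Under this metric Composer 2.5 comes out far ahead at roughly 115 points per dollar, against about 15 for GPT-5.5 Extra High and about 4 for Fable 5 Max, which is consistent with the "best value" label. A different metric, such as cost per solved task, would weight the trade-off differently.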

Why it matters

Because the benchmark draws its tasks from real Cursor sessions rather than synthetic examples, it gives a more practical view of how models perform in day-to-day coding, and its per-task costs help developers choose a model that fits their budget.
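To make choosing by budget concrete, the following sketch treats a budget as a cap on average cost per task (an assumption; the article does not say how budgets should be applied) and picks the highest-scoring model under that cap. It only knows the three models listed above, not the full field of 36, and `best_within_budget` is a hypothetical helper.

```python
from typing import Optional

# (model, score in %, average USD per task) for the top three CursorBench 3.1 results.
RESULTS = [
    ("Fable 5 Max", 72.9, 18.02),
    ("GPT-5.5 Extra High", 64.3, 4.37),
    ("Composer 2.5", 63.2, 0.55),
]


def best_within_budget(max_cost_per_task: float) -> Optional[str]:
    """Return the highest-scoring model whose per-task cost fits the cap, or None if none fit."""
    affordable = [r for r in RESULTS if r[2] <= max_cost_per_task]
    if not affordable:
        return None
    return max(affordable, key=lambda r: r[1])[0]


if __name__ == "__main__":
    for budget in (0.50, 1.00, 5.00, 20.00):
        print(f"${budget:>5.2f}/task -> {best_within_budget(budget)}")
```

With a $1.00 cap it returns Composer 2.5, with $5.00 GPT-5.5 Extra High, and only at $20.00 does Fable 5 Max become affordable; below $0.55 nothing qualifies. Costs on your own prompts and codebase will differ, so the benchmark figures are a starting point rather than a price quote.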
