Research ⭐ Notable

OpenAI Introduces GeneBench-Pro, a Computational Biology Benchmark for AI Agents

Friday, July 3, 2026 Source: OpenAI

What happened

OpenAI released GeneBench-Pro, a new benchmark evaluating AI agents on real computational biology research tasks. It comprises 129 problems spanning genomics, quantitative biology, and translational medicine, each involving noisy real-world datasets and high-stakes judgment calls.

Context and impact

Human experts estimate that each task takes 20–40 hours to complete. The benchmark shows that even the top-scoring model, GPT-5.6 Sol at 31.5%, falls far short of expert-level performance in this domain. OpenAI projects that the benchmark will be saturated by the end of 2026 if current improvement rates hold.

Details

  • 129 tasks across genomics, quantitative biology, and translational medicine
  • GPT-5.6 Sol: 31.5% (top performer)
  • Claude Opus 4.8: 16%
  • Gemini 3.5 Flash: 8.1%
  • Each task estimated at 20–40 hours of human expert work
  • OpenAI projects benchmark saturation by the end of 2026