AI Research ⭐ Notable

OpenAI Introduces GeneBench-Pro, a Computational Biology Benchmark for AI Agents

Friday, July 3, 2026 • Source: OpenAI

What happened

OpenAI released GeneBench-Pro, a new benchmark that evaluates AI agents on real computational biology research tasks. It comprises 129 problems spanning genomics, quantitative biology, and translational medicine, each involving noisy real-world datasets and high-stakes judgment calls.

Context and impact

Human experts estimate that each task takes 20–40 hours to complete. The benchmark shows that even the top-scoring model, GPT-5.6 Sol at 31.5%, falls far short of expert-level performance in this domain. OpenAI projects that, at current improvement rates, the benchmark will be saturated by the end of 2026.

Details

129 tasks across genomics, quantitative biology, and translational medicine
GPT-5.6 Sol: 31.5% (top performer)
Claude Opus 4.8: 16%
Gemini 3.5 Flash: 8.1%
Each task estimated at 20–40 hours of human expert work
OpenAI projects benchmark saturation by the end of 2026
