AI Models ⭐ Notable

Better Models: Worse Tools

Sunday, July 5, 2026 • Source: lucumr.pocoo.org

What happened

Armin Ronacher (author of Flask and Werkzeug) published an analysis on July 4, 2026, showing that newer Claude models, specifically Opus 4.8 and Sonnet 5, follow non-standard tool call schemas less reliably than older Claude models and hallucinate fields that are not present in the schema.
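
The failure mode described above, a tool call carrying fields the schema never declared, is something a custom harness can detect before executing the call. The following is a minimal sketch under stated assumptions, not code from Ronacher's post: the schema `EDIT_TOOL_SCHEMA`, its field names, and the helper `check_tool_call` are all hypothetical, and the check only compares top-level keys rather than performing full JSON Schema validation.

```python
from typing import Any

# Hypothetical custom tool schema in JSON-Schema style (an assumption for
# illustration; it does not come from the post or from any real harness).
EDIT_TOOL_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "target": {"type": "string"},
        "patch": {"type": "string"},
    },
    "required": ["target", "patch"],
}


def check_tool_call(schema: dict[str, Any], arguments: dict[str, Any]) -> dict[str, list[str]]:
    """Compare the top-level keys of a tool call against the declared schema.

    Returns the fields the model supplied that the schema does not declare
    ("unexpected") and the required fields it left out ("missing").
    """
    declared = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    supplied = set(arguments)
    return {
        "unexpected": sorted(supplied - declared),
        "missing": sorted(required - supplied),
    }


# A call that uses a field name the schema never declared.
report = check_tool_call(
    EDIT_TOOL_SCHEMA,
    {"target": "app.py", "patch": "...", "file_path": "app.py"},
)
print(report)
```

A harness could log such reports per model version to measure how often undeclared fields appear, or reject the call and ask the model to retry.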

Context and impact

Ronacher argues that RL post-training focused on Claude Code's native harness creates lock-in to the Anthropic ecosystem: projects with custom tool schemas see degraded reliability compared to older Claude models. The post gathered 72 points on Hacker News and was linked by Simon Willison.

Details

Post published at lucumr.pocoo.org, July 4, 2026
Conclusion: 'tool schemas are not neutral' on Anthropic models
Regression documented in Opus 4.8 and Sonnet 5 vs. older Claude versions
Root cause: RL training optimized primarily on the Claude Code harness schema format
Consequence: third-party harnesses become dependent on Anthropic-native schemas
HN score: 72 points; linked by Simon Willison
