Models ⭐ Notable

Better Models: Worse Tools

Sunday, July 5, 2026 · Source: lucumr.pocoo.org

What happened

Armin Ronacher (author of Flask and Werkzeug) published an analysis on July 4, 2026, showing that newer Claude models, specifically Opus 4.8 and Sonnet 5, follow non-standard tool call schemas less reliably and hallucinate fields that the schema does not define.
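
To make the failure mode concrete, here is a minimal sketch of how a third-party harness might detect fields that a schema does not define. It is not taken from Ronacher's post: the schema, the field names, and the `unexpected_fields` helper are all hypothetical, and the check only looks at top-level `properties` of a JSON-Schema-style dictionary.

```python
def unexpected_fields(schema: dict, arguments: dict) -> list[str]:
    """Return argument keys that the schema's top-level `properties` do not declare."""
    declared = set(schema.get("properties", {}))
    return sorted(key for key in arguments if key not in declared)


# Hypothetical custom tool schema, invented for illustration.
EDIT_FILE_SCHEMA = {
    "type": "object",
    "properties": {
        "target": {"type": "string"},
        "patch": {"type": "string"},
    },
    "required": ["target", "patch"],
}

# Hypothetical tool-call arguments carrying two undeclared fields.
call_arguments = {
    "target": "app.py",
    "patch": "...",
    "filename": "app.py",
    "overwrite": True,
}

extra = unexpected_fields(EDIT_FILE_SCHEMA, call_arguments)
print(extra)  # ['filename', 'overwrite']
```

A harness could log such calls or reject them and ask the model to retry; which response is appropriate depends on the harness, and nothing here reflects how any particular one behaves.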

Context and impact

Ronacher argues that RL post-training focused on Claude Code's native harness creates lock-in to the Anthropic ecosystem: projects with custom tool schemas see degraded reliability compared with older Claude models. The post gathered 72 HN points and was linked by Simon Willison.

Details

  • Post published at lucumr.pocoo.org, July 4, 2026
  • Conclusion: 'tool schemas are not neutral' on Anthropic models
  • Regression documented in Opus 4.8 and Sonnet 5 vs. older Claude versions
  • Root cause: RL training optimized primarily on the Claude Code harness schema format
  • Consequence: third-party harnesses become dependent on Anthropic-native schemas
  • HN score: 72 points; linked by Simon Willison