Models ⭐ Notable

GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

Saturday, 20 June 2026 Source: Hacker News

What happened

On 19 June 2026, a blog post comparing GPT-5.5 with GLM-5.2 (Z.ai's MIT-licensed open-weight model) on the AA-Omniscience hallucination benchmark reached Hacker News (score 149).

Context and impact

The gap is unusually wide: open-weight GLM-5.2 (28% hallucination rate) clearly beats GPT-5.5 (86%). The community cautions against overreading the result. It is a single benchmark, and AA-Omniscience scores what a model does on questions where it lacks the fact, so refusing is the desired behavior. GLM-5.2 is more cautious and refuses more often, which lowers its rate. Still, for RAG and agent builders, the signal is that the default caution of open models can be a feature, not a bug.

Details

  • GPT-5.5: 86% hallucination rate on AA-Omniscience
  • GLM-5.2: 28% (roughly one third of GPT-5.5's rate)
  • Identical setup for both models: high reasoning, temperature 1, coding-assistant prompt, via OpenRouter
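
To make the "3x" figure concrete, here is a minimal sketch of the arithmetic. It assumes the hallucination rate is the share of wrong answers among the questions a model did not answer correctly, so that each such question ends in either a wrong answer or a refusal. The post summarized here does not spell out the formula. The function name and the counts are hypothetical, chosen only to reproduce the two reported percentages.

```python
def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Share of not-correct responses that were wrong answers rather than refusals.

    Assumed definition for illustration; not taken from the benchmark's documentation.
    """
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 0.0  # nothing to hallucinate on
    return incorrect / not_correct


# Hypothetical counts per 100 not-correct questions, matching the reported rates.
gpt_rate = hallucination_rate(incorrect=86, abstained=14)
glm_rate = hallucination_rate(incorrect=28, abstained=72)

ratio = gpt_rate / glm_rate
print(f"GPT-5.5 {gpt_rate:.0%} vs GLM-5.2 {glm_rate:.0%}: ratio {ratio:.2f}x")
```

Under this definition, a model that refuses more often lowers its rate without knowing any more facts, which is why the figure says more about caution than about knowledge.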