GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2
What happened
On 19 June 2026, a blog post benchmarking GPT-5.5 against GLM-5.2 (Z.ai's MIT-licensed open-weight model) on the AA-Omniscience hallucination benchmark reached Hacker News, where it scored 149 points.
Context and impact
The gap is unusually wide: open-weight GLM-5.2 (28% hallucination rate) clearly beats GPT-5.5 (86%). The community cautions against overreading the result. It is a single benchmark, and the hallucination rate on AA-Omniscience measures how often a model answers anyway when it lacks the fact, where refusing is the desired behavior. GLM-5.2 is more cautious and refuses more often, which lowers its rate. Still, for RAG and agent builders, the signal is that an open model's default caution can be a feature, not a bug.
Details
- GPT-5.5: 86% hallucination rate on AA-Omniscience
- GLM-5.2: 28% (roughly one third of GPT-5.5's rate)
- Identical setup for both models: high reasoning effort, temperature 1, a coding-assistant prompt, served via OpenRouter
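The arithmetic behind the figures above can be checked with a minimal Python sketch. The two percentages come from the post. The `hallucination_rate` helper and its example counts are assumptions for illustration only: it treats the rate as wrong answers divided by wrong answers plus refusals, which may not match the benchmark's exact definition, and the counts are not taken from the benchmark.

```python
# Hallucination rates reported in the post, in percent.
rates = {"GPT-5.5": 86.0, "GLM-5.2": 28.0}


def relative_rate(a: float, b: float) -> float:
    """Return how many times larger rate a is than rate b."""
    if b <= 0:
        raise ValueError("baseline rate must be positive")
    return a / b


def hallucination_rate(wrong: int, refused: int) -> float:
    """Hypothetical simplified definition (an assumption, not the
    benchmark's official formula): among questions the model could not
    answer correctly, the share where it answered anyway."""
    total = wrong + refused
    if total == 0:
        raise ValueError("need at least one wrong answer or refusal")
    return wrong / total


ratio = relative_rate(rates["GPT-5.5"], rates["GLM-5.2"])
gap_points = rates["GPT-5.5"] - rates["GLM-5.2"]
print(f"ratio: {ratio:.2f}x, gap: {gap_points:.0f} percentage points")

# Illustrative counts only: the same 100 unknown-fact questions,
# answered by a model that rarely refuses and one that often refuses.
eager = hallucination_rate(wrong=86, refused=14)
cautious = hallucination_rate(wrong=28, refused=72)
print(f"eager: {eager:.0%}, cautious: {cautious:.0%}")
```

Under this simplified definition, a model lowers its rate purely by refusing more often, which is why the result says more about caution than about how much each model knows.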