Models 🔥 Top

We Have Mythos at Home: GLM 5.2 Beats Claude in Our Cyber Benchmarks

Monday, June 29, 2026 • Source: Semgrep

What happened

Semgrep published a detailed evaluation of Zhipu's open-weight GLM-5.2 model on offensive cybersecurity tasks. The post hit the top of Hacker News with 336 points.

Context and impact

GLM-5.2 reached 39% F1 on IDOR (Insecure Direct Object Reference) detection at $0.17 per vulnerability, ahead of both Claude Opus 4.8 (28%) and Opus 4.6 (37%) on the same minimal harness. Semgrep argues that once an MIT-licensed Chinese model reaches parity on cyber benchmarks, US export controls on frontier models lose their meaning: the security gains come from the open-weight ecosystem, not from geopolitical gating.
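For readers unfamiliar with the two metrics, here is a minimal Python sketch of how an F1 score and a cost-per-vulnerability figure can be computed. The counts and the total spend are hypothetical, picked only so that the arithmetic lands near the reported 39% and $0.17; they are not Semgrep's data. The sketch also assumes that "per vulnerability" means per correctly detected vulnerability, which the summary does not confirm.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def cost_per_vulnerability(total_cost_usd: float, true_positives: int) -> float:
    """Total spend divided by the number of correctly detected vulnerabilities.

    Assumption: the denominator is true positives only.
    """
    if true_positives == 0:
        raise ValueError("no true positives, so cost per vulnerability is undefined")
    return total_cost_usd / true_positives


# Hypothetical counts and spend, for illustration only (not from the Semgrep post).
tp, fp, fn = 30, 50, 45
score = f1_score(tp, fp, fn)
cost = cost_per_vulnerability(5.10, tp)
print(f"F1 = {score:.2f}, cost per vulnerability = ${cost:.2f}")
```

With these made-up inputs, precision is 30/80 and recall is 30/75, which gives an F1 of about 0.39. The point is that F1 penalizes both false alarms and missed findings, so a model cannot score well by flagging everything.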

Details

  • GLM-5.2: 39% F1 IDOR, $0.17/vuln
  • Claude Opus 4.8: 28% F1 IDOR
  • Claude Opus 4.6: 37% F1 IDOR
  • MIT license, fully local deployment possible
  • 336 points on Hacker News, wide reach in the security research community