We Have Mythos at Home: GLM 5.2 Beats Claude in Our Cyber Benchmarks
What happened
Semgrep published a detailed evaluation of Zhipu's open-weight GLM-5.2 model on offensive cybersecurity tasks. The post hit the top of Hacker News with 336 points.
Context and impact
GLM-5.2 reached a 39% F1 score on IDOR (Insecure Direct Object Reference) detection at $0.17 per vulnerability, ahead of both Claude Opus 4.8 (28%) and Claude Opus 4.6 (37%) on the same minimal harness. Semgrep argues that once an MIT-licensed Chinese model reaches parity on cyber benchmarks, US export controls on frontier models lose their meaning: the security gains come from the open-weight ecosystem, not from geopolitical gating.
Details
- GLM-5.2: 39% F1 IDOR, $0.17/vuln
- Claude Opus 4.8: 28% F1 IDOR
- Claude Opus 4.6: 37% F1 IDOR
- MIT license, fully local deployment possible
- 336 points on Hacker News, wide reach in the security research community
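
For readers unfamiliar with the two metrics quoted above, here is a minimal Python sketch of how an F1 score and a cost-per-vulnerability figure are typically derived. The counts and the dollar total are hypothetical placeholders, not Semgrep's data, and the assumption that cost per vulnerability means total spend divided by true positives is ours; the original post may define it differently.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall.

    Equivalent closed form: 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no findings and no known vulnerabilities.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cost_per_vuln(total_cost_usd: float, tp: int) -> float:
    """Assumed definition: total spend divided by correctly found vulnerabilities."""
    if tp == 0:
        return float("inf")  # nothing found, so the per-finding cost is unbounded
    return total_cost_usd / tp


if __name__ == "__main__":
    # Hypothetical run: 10 real IDORs found, 20 false alarms, 10 real IDORs missed.
    tp, fp, fn = 10, 20, 10
    total_cost_usd = 2.00  # hypothetical total API or compute spend

    print(f"F1: {f1_score(tp, fp, fn):.0%}")
    print(f"Cost per vulnerability: ${cost_per_vuln(total_cost_usd, tp):.2f}")
```

Because F1 penalizes both false alarms and missed findings, a model can post a modest F1 while still surfacing real bugs cheaply, which is why the two numbers are reported together.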
Source: Semgrep