Vercel AI Gateway adds GLM 5.2 Fast via Wafer at 170+ tokens/s
What's new
- GLM 5.2 Fast by Z.ai is now in Vercel AI Gateway through the Wafer inference stack.
- Average throughput is 170+ tokens/sec, ranging 120-250 TPS.
- 2x decode throughput versus other serverless providers.
- Retains GLM 5.2 strengths: strong coding, usable 1M-token context, long-horizon tasks.
- No markup, BYOK support, ZDR mode and per-key budgets.
Why it matters
Since its mid-June launch, GLM 5.2 has been climbing into the world-class open-agentic tier. Wafer adds the inference speed that makes the model competitive for real-time code agents — not just a cheap alternative, but a fast inference option.
How to try it
Available in Vercel AI Gateway under model glm-5.2-fast. Routing through Wafer is the default.
Open original source
Vercel