Back to section
Vercel ⭐ Notable

Vercel AI Gateway adds GLM 5.2 Fast via Wafer at 170+ tokens/s

Štvrtok 25. júna 2026 Source: Vercel

What's new

  • GLM 5.2 Fast by Z.ai is now in Vercel AI Gateway through the Wafer inference stack.
  • Average throughput is 170+ tokens/sec, ranging 120-250 TPS.
  • 2x decode throughput versus other serverless providers.
  • Retains GLM 5.2 strengths: strong coding, usable 1M-token context, long-horizon tasks.
  • No markup, BYOK support, ZDR mode and per-key budgets.

Why it matters

Since its mid-June launch, GLM 5.2 has been climbing into the world-class open-agentic tier. Wafer adds the inference speed that makes the model competitive for real-time code agents — not just a cheap alternative, but a fast inference option.

How to try it

Available in Vercel AI Gateway under model glm-5.2-fast. Routing through Wafer is the default.