Realtime voice, speech, and transcription now supported on AI Gateway
What's new
- Realtime voice agents: low-latency listen-and-respond loops that can call tools mid-conversation
- Text-to-speech: convert text into spoken audio with selectable voices and MP3 output
- Speech-to-text: transcribe audio supplied as files, base64-encoded strings, or URLs
- OpenAI gpt-realtime-2: the first realtime model available through the Gateway
- Same governance: observability, spend caps, and BYOK (bring your own key) work exactly as they do for text, image, and video models
- Available in AI SDK 7: a drop-in addition for existing Vercel AI SDK apps
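The changelog does not show the SDK call for transcription. The sketch below only illustrates what accepting "files, base64 strings, or URLs" involves: it normalizes the three forms into raw audio bytes. It uses only standard runtime APIs (`atob`, `fetch`). The `AudioInput` type and the `toAudioBytes` and `base64ToBytes` functions are hypothetical and are not part of the AI SDK or the Gateway, which may accept these forms directly.

```typescript
// Hypothetical input type for illustration; not the AI SDK's own type.
type AudioInput =
  | { kind: "bytes"; data: Uint8Array } // e.g. contents of a local file
  | { kind: "base64"; data: string }
  | { kind: "url"; url: string };

// Decode a base64 string into bytes with the standard atob() global.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}

// Normalize any of the three accepted forms into raw audio bytes.
async function toAudioBytes(input: AudioInput): Promise<Uint8Array> {
  switch (input.kind) {
    case "bytes":
      return input.data; // already raw bytes
    case "base64":
      return base64ToBytes(input.data);
    case "url": {
      // Download the audio; needs a runtime with global fetch (Node 18+, browsers).
      const res = await fetch(input.url);
      if (!res.ok) throw new Error(`failed to fetch audio: HTTP ${res.status}`);
      return new Uint8Array(await res.arrayBuffer());
    }
    default: {
      // Compile-time exhaustiveness check over the AudioInput union.
      const unsupported: never = input;
      throw new Error(`unsupported audio input: ${JSON.stringify(unsupported)}`);
    }
  }
}
```

For instance, `toAudioBytes({ kind: "base64", data: "AAEC" })` resolves to the bytes `[0, 1, 2]`. The discriminated union keeps the three input forms explicit, and the compiler flags any case that is not handled.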
Why it matters
Voice agents are now a first-class primitive on Vercel, removing the need for bespoke realtime pipelines. Unified governance matters for teams that need usage limits, key control, and tracing across every modality.
How to try
Upgrade to AI SDK 7 and call a supported model (e.g. openai/gpt-realtime-2) through the AI Gateway, or try it directly in the Gateway playground.
Source: Vercel Changelog