xAI Launches Grok Voice Agent Builder Beta: No-Code Voice AI at $0.05/min with Sub-Second Response
What's new
- No-code agent creation: describe the call flow in plain language → a working voice agent in about 2 minutes
- Single speech-to-speech model: one model replaces the usual chain of three APIs (STT → LLM → TTS), enabling sub-second response times
- Pricing: $0.05/min for audio, plus $0.01/min for telephony on a provisioned phone number ($0.06/min total for phone calls)
- Voices: 80+ voices, plus voice cloning from about 2 minutes of recorded audio
- Languages: 25+ languages with mid-conversation switching
- Built-in infrastructure: telephony, knowledge retrieval, tools, guardrails, MCP integrations, and observability in one package
- Benchmark: xAI reports a #1 ranking on Big Bench Audio
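To make the pricing concrete, here is a minimal cost-estimate sketch. It assumes the two per-minute rates listed above simply add together and that usage is billed linearly per fractional minute; the announcement does not state billing granularity (per second, rounded up per minute, etc.), so treat `estimate_call_cost` as a hypothetical helper, not xAI's billing logic.

```python
from decimal import Decimal

# Rates from the announcement, in USD per minute (assumed; verify against current pricing).
AUDIO_RATE = Decimal("0.05")
TELEPHONY_RATE = Decimal("0.01")  # only charged when the call uses a provisioned phone number


def estimate_call_cost(minutes, uses_phone_number=True):
    """Hypothetical estimate of USD cost for `minutes` of agent audio.

    Assumes linear billing per fractional minute; real billing may round differently.
    """
    m = Decimal(str(minutes))  # go through str() to avoid binary-float artifacts
    if m < 0:
        raise ValueError("minutes must be non-negative")
    rate = AUDIO_RATE + (TELEPHONY_RATE if uses_phone_number else Decimal("0"))
    # Decimal keeps money exact; keep 4 places so short calls are not rounded to zero.
    return (m * rate).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    print(estimate_call_cost(10))                           # 10-minute phone call
    print(estimate_call_cost(10, uses_phone_number=False))  # audio only, no phone number
    print(estimate_call_cost(10_000))                       # a month of phone traffic
```

Under these assumptions, 10,000 phone minutes come to $600 (10,000 × $0.06), versus $500 for audio alone. `Decimal` is used instead of `float` so that adding $0.05 and $0.01 yields exactly $0.06.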
Why it matters
Grok Voice Agent Builder competes directly with Vapi, Bland, and ElevenLabs Conversational AI, offering more aggressive pricing and native integration with Grok models. Because a single speech-to-speech model handles each conversational turn, it avoids the latency that builds up when audio passes through separate STT, LLM, and TTS services in a chained pipeline.
How to try it
The beta is available at x.ai/news/grok-voice-agent-builder; developer registration is required.