Executive Summary
xAI's new audio models for real-time voice, text-to-speech (TTS), and speech-to-text (STT) are now available through AI Gateway. Developers access them with AI SDK 7, which provides a unified interface for managing these models alongside others. The integration centralizes routing, observability, and cost control for applications that use these audio features.
Key Takeaways
* New Model Availability: Three new xAI audio models are now live:
  * `xai/grok-voice-think-fast-1.0`: For building real-time voice agents.
  * `xai/grok-tts`: For generating spoken audio from text.
  * `xai/grok-stt`: For transcribing audio files into text.
* Developer Access: The models are accessible via AI SDK 7 using dedicated functions such as `generateSpeech` and `transcribe`, plus the `experimental_useRealtime` hook for voice agents.
* Unified Platform: The models are managed through AI Gateway, giving developers the same routing, observability, and spend controls available for their other integrated AI models.
* Playground Testing: Developers can experiment with the new xAI audio models directly in the AI Gateway playground without writing code.
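
As a rough illustration of how the three model IDs relate to the SDK entry points named above, the following TypeScript sketch collects them into a lookup table. It does not call the AI SDK or AI Gateway, since the source does not show the call signatures. The `AudioTask` type, the `AudioRoute` shape, and the `resolveAudioRoute` helper are hypothetical names introduced here, and the pairing of `generateSpeech` with TTS and `transcribe` with STT is an assumption inferred from the function names.

```typescript
// Tasks covered by the three xAI audio models (hypothetical type for this sketch).
type AudioTask = "realtime-voice" | "text-to-speech" | "speech-to-text";

// Model IDs exactly as listed in the summary above.
const XAI_AUDIO_MODELS: Record<AudioTask, string> = {
  "realtime-voice": "xai/grok-voice-think-fast-1.0",
  "text-to-speech": "xai/grok-tts",
  "speech-to-text": "xai/grok-stt",
};

// AI SDK entry point assumed to serve each task (pairing inferred from the names).
const SDK_ENTRY_POINTS: Record<AudioTask, string> = {
  "realtime-voice": "experimental_useRealtime",
  "text-to-speech": "generateSpeech",
  "speech-to-text": "transcribe",
};

// Hypothetical descriptor: which model and which SDK entry point a task maps to.
interface AudioRoute {
  task: AudioTask;
  modelId: string;
  provider: string;
  entryPoint: string;
}

// Resolve a task to its model ID, provider prefix, and assumed SDK entry point.
function resolveAudioRoute(task: AudioTask): AudioRoute {
  const modelId = XAI_AUDIO_MODELS[task];
  // Model IDs use a "provider/model" form; the prefix is the provider.
  const provider = modelId.split("/")[0] ?? "";
  return { task, modelId, provider, entryPoint: SDK_ENTRY_POINTS[task] };
}
```

A caller could use `resolveAudioRoute("speech-to-text")` to find that transcription maps to `xai/grok-stt`, then pass that ID to the corresponding SDK function. Keeping the IDs in one table means a model rename touches a single place.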
Strategic Importance
This integration extends AI Gateway beyond text-based models, positioning it as a broader hub for developers building multimodal AI applications. It simplifies the developer workflow by providing a single, managed access point for both text and audio capabilities.