Vercel

xAI Audio Models Now Available on AI Gateway via AI SDK


Executive Summary

xAI's new audio models for real-time voice, text-to-speech (TTS), and speech-to-text (STT) are now available through AI Gateway. Developers can call them with the AI SDK 7 release, which exposes them through the same interface used for other models. Because requests go through AI Gateway, audio workloads get the same centralized routing, observability, and cost controls as other models.

Key Takeaways

* New Model Availability: Three new xAI audio models are now live:

  * `xai/grok-voice-think-fast-1.0`: For building real-time voice agents.

  * `xai/grok-tts`: For generating spoken audio from text.

  * `xai/grok-stt`: For transcribing audio files into text.

* Developer Access: The models are accessible via the AI SDK 7 release using `generateSpeech` for text-to-speech, `transcribe` for speech-to-text, and the `experimental_useRealtime` hook for voice agents.

* Unified Platform: The models are managed through AI Gateway, giving developers the same routing, observability, and spend controls available for their other integrated AI models.

* Playground Testing: Developers can experiment with the new xAI audio models directly in the AI Gateway playground without writing code.
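
As a rough illustration of how the TTS and STT models might be combined, the sketch below synthesizes speech from text and then transcribes it back. The model IDs come from the announcement. Everything else is an assumption: `SpeechFn`, `TranscribeFn`, and `speakThenTranscribe` are hypothetical names, and the option and result shapes are simplified stand-ins rather than the confirmed AI SDK 7 signatures of `generateSpeech` and `transcribe`. The SDK calls are passed in as arguments so the sketch runs without network access or credentials.

```typescript
// Model IDs as listed in the announcement.
const XAI_AUDIO_MODELS = {
  realtime: "xai/grok-voice-think-fast-1.0",
  tts: "xai/grok-tts",
  stt: "xai/grok-stt",
} as const;

// Hypothetical, simplified call shapes (assumptions, not the SDK's real types).
type SpeechFn = (opts: { model: string; text: string }) => Promise<{ audio: Uint8Array }>;
type TranscribeFn = (opts: { model: string; audio: Uint8Array }) => Promise<{ text: string }>;

interface AudioDeps {
  generateSpeech: SpeechFn;
  transcribe: TranscribeFn;
}

// Round trip: text -> audio via the TTS model, then audio -> text via the STT model.
async function speakThenTranscribe(text: string, deps: AudioDeps): Promise<string> {
  const { audio } = await deps.generateSpeech({ model: XAI_AUDIO_MODELS.tts, text });
  const { text: transcript } = await deps.transcribe({ model: XAI_AUDIO_MODELS.stt, audio });
  return transcript;
}
```

In an application, thin adapters around the SDK's own `generateSpeech` and `transcribe` functions would be supplied as `deps`; their exact options and return types should be taken from the AI SDK documentation rather than from this sketch.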

Strategic Importance

This integration expands the AI Gateway's functionality beyond text-based models, positioning it as a more comprehensive hub for developers building multi-modal AI applications. It simplifies the developer workflow by providing a single, managed access point for both text and audio AI capabilities.
