Google Meet Launches AI-Powered Real-Time Spoken Language Translation
Executive Summary
Google has launched "Speech Translation," a new Google Meet feature developed in collaboration with DeepMind and other internal teams. It provides real-time spoken translation during video calls, converting a user's speech directly into translated audio in another language. By using advanced AI models that translate audio directly to audio, the system cuts latency to roughly 2-3 seconds, making multilingual conversations feel more natural and seamless.
Key Takeaways
* Product: Speech Translation for Google Meet.
* Core Technology: The feature uses advanced AI models for "one-shot" audio-to-audio translation, replacing older methods that chained separate transcription, translation, and speech-synthesis steps. This cuts latency from 10-20 seconds to a conversational 2-3 seconds (see the illustrative sketch after this list).
* Primary Function: It automatically translates a speaker's words in near real-time, outputting the translation in a voice that resembles the original speaker's.
* Availability: The feature is now available for translation into Italian, Portuguese, German, and French.
* Development: The project was a cross-functional effort between Google Meet, DeepMind, and other Google teams (Pixel, Cloud, Chrome), cutting the estimated development timeline from five years to two.
* Future Improvements: Google expects future updates, powered by more advanced LLMs, to better handle nuances such as idioms, tone, and irony, which the current system translates literally.
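
To make the latency comparison concrete, the minimal sketch below contrasts a cascaded transcription-translation-synthesis pipeline with a single direct audio-to-audio step. It is not Google's implementation: the stage names, the `Stage` class, and the per-stage latency figures are illustrative assumptions, chosen only so that the totals fall within the 10-20 second and 2-3 second ranges cited above.

```python
# Illustrative sketch (not Google's implementation) contrasting a cascaded
# speech-translation pipeline with a direct audio-to-audio model.
# Stage names and latency figures are assumptions for demonstration only.

from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_s: float  # assumed per-utterance processing time, in seconds


def cascaded_pipeline_latency() -> float:
    """Older approach: transcribe speech, translate the text, then synthesize audio."""
    stages = [
        Stage("speech-to-text", 4.0),     # wait for a stable transcript
        Stage("text translation", 2.0),
        Stage("text-to-speech", 4.0),
    ]
    return sum(s.latency_s for s in stages)


def direct_audio_to_audio_latency() -> float:
    """"One-shot" approach: a single model maps source audio to translated audio."""
    return Stage("audio-to-audio model", 2.5).latency_s


if __name__ == "__main__":
    print(f"Cascaded pipeline : ~{cascaded_pipeline_latency():.1f} s")
    print(f"Direct audio model: ~{direct_audio_to_audio_latency():.1f} s")
```

Under these assumed figures, the script reports roughly 10 seconds for the cascaded path versus about 2.5 seconds for the direct path, mirroring the gap described in the takeaways: removing the intermediate text steps is what brings translation into conversational range.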
Strategic Importance
This feature directly integrates Google's cutting-edge AI research from DeepMind into a core communication product, creating a significant differentiator in the competitive video conferencing market. It addresses a major pain point for global businesses and users, reinforcing Google's position as an AI-first company by turning advanced research into a tangible, high-impact user feature.