Meta has unveiled SeamlessM4T, an AI model that can translate speech from 101 languages, a significant step toward real-time, universal communication. The idea resembles the Babel fish from The Hitchhiker’s Guide to the Galaxy: nearly instant conversation across languages.
Traditional translation systems rely on a multi-step process: converting speech to text, translating the text, and converting the result back to speech. Each step can add delay and introduce errors. SeamlessM4T shortens this chain by translating speech from one language to another more directly, which makes it faster and more accurate.
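To make the cost of a multi-step pipeline concrete, here is a toy calculation in Python. It assumes, purely for illustration, that each stage succeeds independently with some probability; the per-stage figures are hypothetical and are not measurements of any real system.

```python
def pipeline_accuracy(stage_accuracies):
    """Chance that every stage succeeds, assuming stages fail independently."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result

# Hypothetical figures: speech-to-text, text translation, text-to-speech.
cascaded = pipeline_accuracy([0.95, 0.90, 0.95])

# Hypothetical single-step model, for comparison only.
direct = pipeline_accuracy([0.90])

print(f"cascaded: {cascaded:.3f}, direct: {direct:.3f}")
```

Under these assumed numbers, the three-stage chain comes out less reliable than any single one of its stages, which is the intuition behind translating in fewer steps.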
According to a study published in Nature, SeamlessM4T translates with 23% greater accuracy than existing models. Google’s AudioPaLM accepts more input languages, 113 to SeamlessM4T’s 101, but it can translate them only into English. SeamlessM4T, in contrast, can translate into 36 languages, making it a more versatile tool for global communication.
The model relies on parallel data mining, a technique that searches crawled web data for audio that matches a subtitle in another language, producing paired examples to train on. Pre-training on millions of hours of audio also helps it handle less common languages.
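The matching step in parallel data mining can be sketched in a few lines of Python. This is a simplified illustration, not Meta's actual pipeline: it assumes audio clips and subtitles have already been converted to vectors in a shared space, and the vectors, identifiers, `mine_pairs` helper, and 0.8 threshold are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mine_pairs(audio_embs, subtitle_embs, threshold=0.8):
    """Pair each audio clip with its most similar subtitle, if similar enough."""
    pairs = []
    for audio_id, audio_vec in audio_embs.items():
        best_id, best_score = None, threshold
        for sub_id, sub_vec in subtitle_embs.items():
            score = cosine(audio_vec, sub_vec)
            if score >= best_score:
                best_id, best_score = sub_id, score
        if best_id is not None:
            pairs.append((audio_id, best_id, best_score))
    return pairs

# Hypothetical embeddings; a real system would compute these with trained encoders.
audio = {"clip_1": [0.9, 0.1, 0.0], "clip_2": [0.0, 0.2, 0.9]}
subtitles = {
    "sub_a": [1.0, 0.0, 0.0],
    "sub_b": [0.0, 0.0, 1.0],
    "sub_c": [0.5, 0.5, 0.5],
}

matches = mine_pairs(audio, subtitles)
for audio_id, sub_id, score in matches:
    print(audio_id, "->", sub_id, round(score, 3))
```

The threshold discards weak matches, so a clip with no close subtitle yields no training pair rather than a noisy one.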
Reference: MIT Technology Review