Meta’s SeamlessM4T AI can transcribe and translate up to 100 languages
With the help of the SeamlessM4T AI, content shared by users across Meta’s social media space will be more accurately translated, allowing creators to reach audiences beyond their borders
In an attempt to build the world's first universal speech translator, Meta AI has developed a new multimodal, multilingual AI model that can transcribe and translate speech and text in up to 100 different languages.
Bundled with a new open-source translation dataset containing 443,000 hours of speech paired with text and 29,000 hours of speech-to-speech alignments, the all-in-one SeamlessM4T transcription and translation model can take input in both spoken and written form.
This multimodal processing allows it to transcribe speech in nearly 100 languages and output translated text in any of them. For translated speech output, however, whether from speech or text input, the model is limited to 36 languages, including English.
This means the model can take speech in any of the 100 languages, transcribe it, translate it into the desired language, and return the translated text. Or it can go one step further and produce speech in the translated language. It works both ways between text and speech, allowing text-to-text, text-to-speech, speech-to-text, and even speech-to-speech translation with a single AI model.
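The four mode combinations described above can be sketched as a simple dispatcher. Everything below is purely illustrative: the `translate` wrapper and its stub helpers are hypothetical names, not Meta's actual API, and the stubs only stand in for the model's real transcription, translation, and synthesis stages.

```python
# Illustrative sketch of SeamlessM4T's four translation modes.
# All names (translate, transcribe, translate_text, synthesize) are
# hypothetical stubs mirroring the text-to-text, text-to-speech,
# speech-to-text, and speech-to-speech paths described in the article.

def translate(source, src_mode, tgt_mode, tgt_lang):
    """Route a request through one of the four supported mode pairs."""
    if src_mode == "speech":
        text = transcribe(source)        # speech -> source-language text
    else:
        text = source                    # already text
    translated = translate_text(text, tgt_lang)  # text -> translated text
    if tgt_mode == "speech":
        return synthesize(translated, tgt_lang)  # text -> translated speech
    return translated

# Stub helpers standing in for the real model components:
def transcribe(audio):
    return f"[transcript of {audio}]"

def translate_text(text, lang):
    return f"[{lang} translation of {text}]"

def synthesize(text, lang):
    return f"[audio in {lang}: {text}]"

print(translate("hello", "text", "text", "fr"))
# -> [fr translation of hello]
```

In the real model a shared encoder feeds either a text decoder or a speech synthesis stage, which is why one model covers all four paths; the dispatcher above only mirrors that routing logic.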
SeamlessM4T, short for Seamless Massively Multilingual and Multimodal Machine Translation, is, in spirit, a successor to last year's No Language Left Behind (NLLB) text-to-text machine translation model, which supported 200 languages.
The first direct speech-to-speech translator, however, came a few months later as a demo of the Universal Speech Translator from the Meta AI team, built to translate Hokkien, a language that does not even have a widely used writing standard.
All of these models, combined with the Massively Multilingual Speech model released earlier this year, which offers speech recognition and synthesis across more than 1,100 languages, laid the foundation for newer Meta AI models like Voicebox and the most recent SeamlessM4T.
With the help of the SeamlessM4T AI, content shared by users across Meta's social media space, including Facebook, Instagram, Threads, and the Metaverse, will be more accurately translated, allowing creators to reach audiences beyond their borders.
NPCs in the Metaverse could also benefit from this multilingual model, enabling seamless conversations in any language.
If the VR craze takes off again and the Metaverse gains traction, the model could also enable real-time translation between users interacting within it, acting as a real-life universal translator, one that not only breaks down the language barrier but also makes content shared online more universal and accessible.