Meta uses AI to preserve the world's language diversity

MMS models expand text-to-speech, speech-to-text technology from around 100 languages to over 1,100

(Web Desk) – Meta has announced a series of artificial intelligence (AI) models to help preserve the world’s languages that are in danger of disappearing.

The tech giant warned that the limitations of current speech recognition and generation technology will only accelerate this trend. “We want to make it easier for people to access information and use devices in their preferred language, and today we’re announcing a series of artificial intelligence (AI) models that could help them do just that,” it said.

“Massively Multilingual Speech (MMS) models expand text-to-speech and speech-to-text technology from around 100 languages to more than 1,100 — more than 10 times as many as before — and can also identify more than 4,000 spoken languages, 40 times more than before,” reads the Meta blog.

There are also many use cases for speech technology — from virtual and augmented reality technology to messaging services — that can be used in a person’s preferred language and can understand everyone’s voice.

Meta has open-sourced its models and code so that others in the research community can build on its work, help preserve the world’s languages and bring the world closer together.

“Collecting audio data for thousands of languages was our first challenge because the largest existing speech datasets cover 100 languages at most. To overcome this, we turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research,” it said.

“These translations have publicly available audio recordings of people reading these texts in different languages. As part of the MMS project, we created a dataset of readings of the New Testament in more than 1,100 languages, which provided on average 32 hours of data per language,” it added.