Voicebox: Meta introduces versatile AI for speech generation


Voicebox can produce high-quality audio clips and edit pre-recorded audio

20 June 2023, 09:04 am

(Web Desk) - In a breakthrough in generative AI for speech, Meta has developed Voicebox, a state-of-the-art AI model that, through in-context learning, can perform speech generation tasks such as editing, sampling and stylizing that it was not specifically trained to do.

Voicebox can produce high-quality audio clips and edit pre-recorded audio (for example, removing car horns or a dog barking) while preserving the content and style of the original recording. The model is also multilingual and can produce speech in six languages.

In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could allow visually impaired people to hear written messages from friends read aloud by AI in those friends' own voices, give creators new tools to easily create and edit audio tracks for videos, and much more.

The versatility of Voicebox enables a variety of tasks, including:

In-context text-to-speech synthesis: Using an audio sample as short as two seconds, Voicebox can match the audio style and use it for text-to-speech generation.

Speech editing and noise reduction: Voicebox can recreate a portion of speech that is interrupted by noise, or replace misspoken words without the entire speech having to be re-recorded. For example, you can identify a segment of speech that is interrupted by a dog barking, crop it, and instruct Voicebox to regenerate that segment, much like an eraser for audio editing.

Cross-lingual style transfer: When given a sample of someone's speech and a passage of text in English, French, German, Spanish, Polish or Portuguese, Voicebox can produce a reading of the text in any of those languages, even when the sample speech and the text are in different languages. In the future, this capability could help people communicate in a natural, authentic way even if they do not speak the same language.

Diverse speech sampling: Having learned from diverse data, Voicebox can generate speech that is more representative of how people actually talk in the real world, across all six languages listed above.

According to Meta, "Voicebox is an important step forward in our generative AI research," and the company says it looks forward to continuing its exploration in the audio space and seeing how other researchers build on its work.
