On Friday, Meta announced the development of Voicebox, a generative artificial intelligence (AI) tool for creating realistic spoken dialogue. Given a text input and a brief audio clip, Voicebox can generate new speech that sounds strikingly similar to the voice in the source clip. Unlike traditional AI speech generators, which require specialized training for each task, Voicebox takes a distinctive approach and learns directly from raw audio and its transcription.
This breakthrough generative speech system is built on Flow Matching technology and can synthesize speech in six languages. Potential applications include smoothing cross-language communication through technology and delivering realistic video game character dialogue. However, Voicebox’s unique abilities also raise concerns about misuse in creating deceptive “deepfake” dialogue that imitates public figures or celebrities saying things they never actually said.
To address this risk, Meta AI has developed classifiers capable of distinguishing Voicebox-generated speech from human speech; a classifier sorts data into categories, in this case human versus AI-generated. While Meta aims to be transparent and open with the research community, it also acknowledges the need to balance openness with responsibility. Consequently, Meta currently has no plans to release Voicebox’s model or code to the public, citing the potential risks.
By sharing audio samples and a research paper instead of the functional tool, Meta hopes to give researchers an understanding of Voicebox’s potential without jeopardizing safety. This cautious approach reflects growing global concern about the misuse of rapidly advancing AI technologies. United Nations (UN) Secretary-General António Guterres has emphasized the importance of addressing generative AI’s potential dangers, calling it an existential threat to humanity on par with the risk of nuclear war.
While large-scale threats like nuclear war remain hypothetical, more immediate risks of generative AI abuse lie in scams targeting individuals. Deepfake images and voices have been used to extort money from victims and to spread misinformation online. In one case reported by CNN, scammers used AI to clone a woman’s 15-year-old daughter’s voice in a fake kidnapping and ransom scheme.
While Voicebox holds immense promise for speech generation and AI development, its potential for misuse underscores the importance of responsible innovation. Striking the right balance between advancing AI technology and ensuring its ethical use is crucial to harnessing AI’s benefits while minimizing harm.
Source: Decrypt