The tech giant Meta, parent company of Facebook and Instagram, has recently announced the development of AI-powered speech technology capable of identifying over 4,000 languages. To help preserve the world’s languages, Meta has turned to religious texts, namely the Bible, as a source of data for the project.
For the technology to work effectively, Meta needed extensive audio data from thousands of languages. That posed a challenge, because existing speech datasets covered about 100 languages at most. To address this, Meta used translations of religious texts like the Bible, which are available in numerous languages and have been widely studied in text-based translation research.
Meta’s AI core team collected data from the Bible, including original text and audio recordings from sources like FaithComesByHearing.com, GoTo.Bible, and Bible.com. The project encompassed Bible stories, evangelistic messages, scripture readings, and songs in over 6,255 languages and dialects. Although most of the recordings were made by male readers, Meta said that its models work equally well for female voices.
Meta’s dataset provided an average of 32 hours of data per language, drawn from readings of the New Testament in more than 1,100 languages. Broward College’s Lingua Language Center counts over 7,100 living languages worldwide, so covering as many of them as possible is vital to the preservation goal.
However, using religious texts as the foundation for AI raised some concerns. Meta consulted Christian ethicists, who determined that most Christians would not consider the New Testament translations too sacred for machine learning purposes. The same conclusion may not apply to all religious texts, though. There is also a risk that religious training data could bias the models toward a particular worldview. Even so, Meta AI found that the generated language showed only slight bias compared to baseline models trained on other domains.
Following the setback in its metaverse plans, Meta appears to be concentrating on artificial intelligence, developing various AI tools, such as an AI-powered tool for brands to target users on Facebook and Instagram.
Although the AI-powered speech technology is still in development, Meta is open-sourcing its data and code so that others can build on and improve the platform. The company wants to address both the potential disappearance of many world languages and the limitations of current speech recognition and generation technology by making it easier for people to access information and use devices in their preferred language. To that end, Meta aims to create a series of artificial intelligence models.