OpenAI's Voice Engine is scary: artificial intelligence clones your voice in just 15 seconds

OpenAI, the pioneering company in the development of generative artificial intelligence software (such as ChatGPT and Sora), has announced Voice Engine, a new speech synthesis tool that can clone a voice from an audio sample of just 15 seconds. The AI model's ability to generate synthetic voices from text input is demonstrated by several examples published by OpenAI on its site. The company reported that it already uses the model in its text-to-speech API and in ChatGPT's “Read Aloud” feature. In an interview with TechCrunch, one of its developers said that the model was trained on “a mix of licensed and publicly available data.” The sample audio tracks available on the OpenAI blog show how well the tool clones voices, reproducing the intonation, timbre and “warmth” that synthetic voices usually lack.

For the moment Voice Engine will be distributed to around ten developers, and priority will be given to uses considered “low risk” and “socially advantageous”. Among those testing Voice Engine is Spotify, which has been using it since the beginning of September to dub the podcasts of top-level hosts, such as Lex Fridman, into various languages.

What are the possible uses of Voice Engine?

A tool like Voice Engine can be very useful in several industries. Among other things, it can be used to provide reading assistance to children and to people who have not had access to an adequate level of education.

Voice Engine can also be used to translate content (such as videos and podcasts) to reach a global audience. Interestingly, when used for this purpose, the tool preserves the native accent of the “original” voice: if it has to generate speech in English starting from an audio sample in French, it produces English audio with a French inflection.

The tool can also be used for medical purposes, for example to give a voice back to those who have lost it or to those who, due to some disability, have never had one. The Norman Prince Neurosciences Institute, for example, is exploring the uses of AI in complex clinical contexts of this kind. As part of a pilot project that offers Voice Engine to patients with speech impairments caused by oncological or neurological conditions, it has already obtained some positive results. One case involves doctors Fatima Mirza, Rohaid Ali and Konstantina Svokos, who, starting from an audio sample of a young patient who had lost the ability to speak due to a vascular brain tumor, were able to restore her voice using Voice Engine.

What are the risks of OpenAI's voice-cloning AI?

In the official press release announcing its new tool, in addition to listing the noble purposes for which it can be used, OpenAI also addressed the possible risks of misuse of Voice Engine, saying:

We are aware that generating speech that resembles people's voices carries serious risks, which are particularly important in an election year (the 2024 US presidential elections, ed.). We are working with U.S. and international partners from across government, media, entertainment, education, civil society and more to ensure we incorporate their feedback as we develop the tool.

Cloning a politician's voice to discredit them is not the only danger. Even an ordinary WhatsApp voice message from any user could be used to clone that person's voice, which could then be turned against them to discredit them, to perpetrate scams or even to “bypass” security systems based on voice recognition.

In addition to involving partners of various kinds in the development of the tool, what else is OpenAI doing to mitigate all these potential risks? The official press release reads:

Partners testing Voice Engine today have agreed to our usage policies, which prohibit impersonation of another individual or organization without consent or legal right. Additionally, our terms with these partners require the explicit and informed consent of the original speaker, and we do not allow developers to create ways for individual users to create their own voices. Partners must also clearly communicate to their audience that the voices they hear are generated by artificial intelligence. Finally, we have implemented a number of security measures, including watermarking to trace the origin of any audio generated by Voice Engine, in addition to proactive monitoring of how it is used.

Will OpenAI release Voice Engine to the public?

At the moment OpenAI has not decided when, or whether, Voice Engine will be released to the public. The conditional is a must, given that in its official statement the company led by Sam Altman said: “It is important that people around the world understand where this technology is headed, whether we ultimately deploy it on a large scale or not.” The reasons for such caution are the aforementioned risks of misuse of a tool so powerful and effective at cloning human voices.

Sources

OpenAI