Artificial intelligence can clone our voice: the new tool for defending ourselves

Artificial intelligence continues its unstoppable advance, touching more and more sectors and improving its capabilities at a surprising pace. Among these capabilities is the ability to clone the human voice almost perfectly. Since a cloned voice can be misused to generate deepfake audio files for scams and misinformation, several researchers are working on solutions that can mitigate the problem. Among them is a tool, aptly named AntiFake, which is designed to make life harder for cybercriminals trying to acquire voice data with which to clone their victims' voices.

How much material does AI need to clone a voice

The development of tools that counteract the cloning of the human voice has become fundamentally important because, unlike a few years ago, modern AI systems can clone a voice from audio samples just a few seconds long. A clear example is Voice Engine from OpenAI (the company that develops ChatGPT), a tool capable of cloning a voice from just 15 seconds of audio. OpenAI recently came under fire over the alleged cloning of actress Scarlett Johansson's voice without her permission, and, given its potential for misuse, Voice Engine has not yet been released publicly.

Until recently, artificial intelligence could only clone a voice from rather long audio samples (at least 30 minutes), which had to meet specific quality and expressiveness standards for the algorithm to synthesize the voice faithfully. Today things are clearly different: a simple WhatsApp voice message, fed to an AI tool, can potentially be enough to copy another person's voice.

What tools exist to prevent AI from cloning your voice

Anyone who sends voice messages (even only occasionally) or leaves recorded messages on an answering machine has in fact already provided more than enough material to have their voice cloned, so the advice simply not to send this kind of material to anyone is of little practical value. And let's face it: our voice could easily be recorded without our knowledge by any smartphone.

This is why it is necessary to counter technology with other technology: it is the only way to “fight on equal terms”. Speaking of “technological weapons”, the one developed by computer scientist and engineer Ning Zhang of the McKelvey School of Engineering at Washington University in St. Louis looks quite promising. AntiFake, the tool Zhang developed and presented at an Association for Computing Machinery conference on cybersecurity held in Copenhagen, Denmark, on November 27 last year, takes a proactive approach, unlike conventional methods for detecting deepfakes, which only take effect when the damage has already been done.

AntiFake, in fact, uses techniques similar to those cybercriminals use for voice cloning in order to protect voices from piracy and counterfeiting. Explaining how his “creation” works, Zhang said:

The tool uses an adversarial AI technique that was originally part of the tools of cyber criminals, but now we use it to defend ourselves against them. We slightly muddy the recorded audio signal, distort or perturb it just enough so that it still sounds good to human listeners.

Once this is done, the audio remains of good quality for human listeners, but at the same time becomes unusable for training a voice clone.
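
To make the idea concrete, here is a minimal sketch of what a gradient-based adversarial perturbation of this kind could look like. This is not AntiFake's actual implementation: the `speaker_encoder` model, the function name `protect_voice`, and all numeric parameters are illustrative assumptions, standing in for whatever embedding model and settings a real tool would use.

```python
# Illustrative sketch only: NOT AntiFake's real code. `speaker_encoder` is a
# hypothetical stand-in for any model that maps audio to a speaker embedding,
# and epsilon/alpha/steps are assumed values chosen purely for illustration.
import torch
import torch.nn.functional as F

def protect_voice(waveform: torch.Tensor,
                  speaker_encoder: torch.nn.Module,
                  epsilon: float = 0.002,  # max per-sample distortion
                  alpha: float = 0.0005,   # step size per iteration
                  steps: int = 50) -> torch.Tensor:
    """Add a small adversarial perturbation that pushes the audio's speaker
    embedding away from the original, within an L-infinity budget."""
    original = speaker_encoder(waveform).detach()
    delta = torch.zeros_like(waveform, requires_grad=True)

    for _ in range(steps):
        adv = speaker_encoder(waveform + delta)
        # We want the perturbed embedding far from the true one, i.e. low
        # cosine similarity; defining loss = -similarity lets each ascent
        # step below increase the loss and thus decrease the similarity.
        loss = -F.cosine_similarity(adv, original, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # signed-gradient ascent step
            delta.clamp_(-epsilon, epsilon)     # keep distortion barely audible
            delta.grad.zero_()

    return (waveform + delta).detach()

# Hypothetical usage: `wav` is a mono waveform tensor in [-1, 1] and
# `encoder` is any pretrained speaker-embedding network.
# protected_wav = protect_voice(wav, encoder)
```

The key design choice mirrored here is the tight clamp on the perturbation: it stays small enough that human listeners barely notice it, while a cloning model trained on the protected audio latches onto a misleading speaker identity.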


Clearly, AntiFake is not a cure-all. As Ben Zhao of the University of Chicago, who was not involved in the development of AntiFake, points out, it is true that tools of this type can “raise the bar and limit the attack to a smaller group of highly motivated individuals with significant resources”, but it is also true that, like all digital security systems, it will never provide complete protection and will be continually challenged by the ingenuity of scammers. The fight against malicious actors, in IT as elsewhere, is an ongoing process.