Once upon a time, if you wanted to prove that someone said something, the easiest way was to record their voice and play back the audio. Well, it appears those days are now behind us.
Baidu (one of the Chinese tech giants) now needs only a 3.7-second recording of your voice to be able to fake you saying anything. A year ago the company’s voice-cloning tool, Deep Voice, needed 30 minutes of recorded speech to achieve the same result. That is how fast the technology is improving.
Listen here to several cloning examples that show the breadth of what this technology can do: it can switch the gender of the speaker, add accents and change the speaker’s style.
Google recently unveiled Tacotron 2, a text-to-speech tool that leverages the company’s neural network and speech-generation method, WaveNet. Tacotron 2 converts text into a spectrogram, a visual representation of audio, and WaveNet then turns that spectrogram into speech. WaveNet is used to generate the voice for Google Assistant, and the quality is so high that it is now difficult to tell machine-generated speech from a human voice.
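To get a feel for why a spectrogram is enough to regenerate audio, here is a minimal Python sketch using the librosa library. To be clear, this is not Google’s pipeline: it swaps WaveNet for the much simpler classical Griffin-Lim algorithm, and the demo clip and parameters are just illustrative.

```python
# A minimal sketch (not Google's actual pipeline) showing that a
# spectrogram carries enough information to rebuild audible sound.
# Griffin-Lim is used here as a simple classical stand-in for a
# neural vocoder like WaveNet; the parameters are illustrative.
import librosa
import numpy as np
import soundfile as sf

# Load a short demo clip that ships with librosa.
y, sr = librosa.load(librosa.example('trumpet'))

# Forward step: waveform -> magnitude spectrogram
# (the "visual representation of audio" mentioned above).
spectrogram = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Inverse step: estimate the lost phase and reconstruct a waveform
# purely from the spectrogram's magnitudes.
y_rebuilt = librosa.griffinlim(spectrogram, n_iter=32,
                               n_fft=1024, hop_length=256)

sf.write('rebuilt.wav', y_rebuilt, sr)
```

A neural vocoder performs this same inversion far more convincingly, which is why the gap between machine-generated and human speech has all but closed.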
What could this be used for? Well, someone created a recording of Jordan Peterson, the author of 12 Rules for Life, performing a version of Eminem’s “Lose Yourself”.
The creator in this instance used six hours of Peterson speaking to create the audio (as Peterson is a public speaker, more than enough samples could be found on the internet). Baidu has now shown that this kind of substitution can take only seconds of source material.
So what are the commercial applications for this type of technology? Lyrebird is an AI company that will create voices for you. Chatbots, audiobooks, hotlines, video games and text readers are all places where you might like to use a newly created voice.
Who knows what the future products might be? Your car’s navigation system may use your partner’s voice to direct you. When you buy an audiobook you might be able to choose who reads it to you, and breaking news on the internet could be read to you by your choice of celebrity.
The potential for misuse of this technology is vast. It is more difficult for humans to detect fake voices than to detect fake images. Think about how fake voices might be used in interviews or news conferences to make people think that they are listening to an authority figure or the CEO of a company.
Our ability to critically assess a situation, evaluate the source of information and verify its validity will become increasingly important. Don’t believe everything that you hear!
Paying it Forward
If you have a start-up, or know of one, that has a product ready for market, please let me know. I would be happy to take a look and give the start-up a shout-out to my readers if it is something I think they could use. If you have any questions or comments, please email me via my website, craigcarlyon.com.
Till next week.