Alphabet Inc's (NASDAQ:GOOGL) artificial intelligence arm, DeepMind, says it has achieved a milestone in creating machine-generated speech that sounds more natural.
DeepMind calls its new speech technology WaveNet and says it produces synthetic speech that sounds more human. According to the company, WaveNet narrows the gap between existing technologies and human speech by about 50%, making it more likely you would think a human being is responding to your queries.
English and Chinese
Alphabet’s DeepMind ran blind tests of WaveNet in U.S. English and Mandarin Chinese, and human listeners rated its speech as more natural than the machine-generated speech of existing text-to-speech (TTS) programs.
WaveNet is designed to mimic human speech by learning to form sound waves like those a human voice would create.
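To give a rough feel for what building a sound wave piece by piece can look like, here is a toy Python sketch. It is not DeepMind's code: the function names, the 16,000-samples-per-second rate, and the fixed two-term rule that stands in for a trained network are all assumptions made for illustration. The rule simply continues a pure tone, whereas a system like WaveNet would learn its prediction from recorded speech.

```python
import math

SAMPLE_RATE = 16000  # samples per second; an assumed value for this toy


def predict_next(history, freq_hz):
    """Toy stand-in for a trained network: next sample from the last two.

    This fixed recurrence continues a pure tone. A real system would
    learn its prediction rule from recorded speech instead.
    """
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    return 2 * math.cos(w) * history[-1] - history[-2]


def generate(n_samples, freq_hz=440.0):
    """Build a waveform one sample at a time, feeding outputs back in."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    wave = [0.0, math.sin(w)]  # two seed samples to start the loop
    while len(wave) < n_samples:
        wave.append(predict_next(wave, freq_hz))
    return wave[:n_samples]


if __name__ == "__main__":
    wave = generate(SAMPLE_RATE)  # one second of audio samples
    print(len(wave))
```

The point of the sketch is the loop: every new sample is computed from the samples already produced, so the wave is grown step by step rather than stitched together from prerecorded fragments.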
WaveNet has several advantages over existing machine-speech technologies. The one that stands out is that the sound of the voice is easy to modify. That flexibility is part of why WaveNet can achieve more natural speech than typical TTS programs.
A challenging task
DeepMind said that developing WaveNet was a challenging task. The system had to be trained, much as a human brain learns, and reaching a high standard that way takes time and hard work. WaveNet belongs to the class of AI known as neural networks, which are technologies that get better at what they do as they receive more training.
Commercial application
Though there is a wide range of potential real-life applications for Alphabet’s WaveNet, the company has no immediate commercial application for the technology in mind. That may suggest DeepMind intends to refine the technology further before bringing it to market.
Google, now part of Alphabet, acquired DeepMind in 2014 for a reported $533 million.