
WellSaid aims to make natural-sounding synthetic speech a credible alternative to real humans




SOURCE: https://techcrunch.com/2019/03/07/wellsaid-aims-to-make-natural-sounding-synthetic-speech-a-credible-alternative-to-real-humans/
Summary

But Google, Apple and Amazon seem unwilling to make their great voice tech available for anything but chirps from your phone or home hub.

As soon as I heard about WaveNet, and later Tacotron, I tried to contact the team at Google to ask when they’d get to work producing natural-sounding audiobooks for everything on Google Books, or offering it as part of AMP, or making it an accessibility service, and so on.

“We took research like Tacotron and pushed it even further, but we’re not trying to control speech and enforce this arbitrary structure on it.”

“When you think about the human voice, what makes it natural, kind of, is the inconsistencies,” said Hocking.

And where better to find inconsistencies than in humans?

But the two founders cautioned that’s a ways off for several reasons, even though it’s very much a possibility.

“Right now we’re using about 20 hours of data per person, but we see a future where we can get it down to one or two hours while maintaining a premium lifelike quality to the voice,” said Petrochuk.

“And we can build off existing data sets, like where someone has a back catalog of content,” added Hocking.

The trouble is that the content may not be exactly right for training the deep learning model, which, advanced as it is, can no doubt be finicky. It’s more efficient to demonstrate for them: “say it like this.”

Even so, getting the quality just right with limited and imperfect training data is a challenge that will take some serious work if and when the team decides to take it on.

But as some of you may have noticed, there are also some parallels to the unsavory world of “deepfakes.” Download a dozen podcasts or speeches and you’ve got enough material to make a passable replica of someone’s voice, perhaps a public figure.

As said here by Devin Coldewey