Adobe tech intern Zeyu Jin shows off his Adobe Sneaks pitch, Project VoCo, to Sneaks hosts Kimberley F. Chambers, a community engagement manager with Adobe, and comedian Jordan Peele. (Courtesy Adobe)

Published: November 4th, 2016

SAN DIEGO, Calif. – Do you have a 20-minute file of someone talking? If so, you could use a future Adobe product to put words into their mouth, literally, if this year’s Adobe Sneaks is anything to go on.

Developed by Zeyu Jin, a tech intern with Adobe Systems Inc.'s creative technologies division, Project VoCo appears to have captured the most attention among the features showcased at the company's annual event, and with good reason: It brings many a paranoiac's worst fear to life by making fake speech sound extremely convincing.

“We’ve made a lot of breakthroughs in the past decade with photo editing, right?” developer Jin told the media during a pre-competition conference. “So why not do the same with speech?”

To demonstrate Project VoCo’s capabilities, Jin played an audio clip of comedian Keegan-Michael Key (not coincidentally the frequent partner of Jordan Peele, who hosted this year’s Adobe Sneaks event) delivering a joke mid-conversation regarding the day he won an award:

“I jumped out of bed, and I kissed my dogs, and my wife, in that order – ”

Here Jin stopped the recording and demonstrated how, simply by typing into the Project VoCo interface, which displays a transcript of the audio file as editable text, he could make Key appear to say:

“…and I kissed my wife, and my wife – ”

But that hadn’t been Jin’s intent. Key doesn’t have two wives; Jin had wanted Key to acknowledge his wife first, then his dogs:

“I kissed my wife, and my dogs.”

Then, to showcase the new feature further, Jin replaced the words "my wife" with something else:

“I kissed Jordan, and my dogs,” Key now apparently said.

Then, simply because he could, Jin changed the message once again:

"…and I kissed Jordan three times –"

It must be emphasized – and believe us, we're trying to be objective here – that each time Jin changed what Key was saying, the new audio sounded natural, as if Jin had recruited Key to record it in advance to hoodwink some unsuspecting journalists. But no – Jin insisted it really was an electronic version of Key speaking his words.

When asked how Adobe would address the inevitable security concerns behind such a product, Jin admitted that to an extent he and the other developers were counting on users to refrain from putting this sort of feature to nefarious ends, noting that they believed it would mainly be useful for recorded media such as audiobooks and podcasts.

However, he also said that as the development team strove to improve the feature, making it sound as natural as possible, security would be a leading concern as well.

“We hope that people will have the personal constraint of not using it in a bad way,” Jin said. “At the same time… if [people are] really going to use it in a bad way, we’ll have to find a way to protect the feature and detect its use.”

He also said that to effectively replicate a person’s speaking voice, Project VoCo would need to build a library of phonemes (vocal noises) from a recording of adequate length – around 20 minutes.
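The idea of a phoneme library can be made concrete with a minimal sketch. To be clear, this is not Adobe's actual pipeline, which has not been published; the data structures and function names below are illustrative assumptions. Only the phoneme-based approach and the roughly 20-minute figure come from Jin's remarks. The sketch checks whether a set of labeled speech snippets (e.g., the output of a forced aligner run on a source recording) covers a full English phoneme inventory and meets the duration threshold:

```python
# A hypothetical adequacy check for a phoneme library: has the source
# recording covered every phoneme in an inventory, and is it long enough?

ARPABET = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER",
    "EY", "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG",
    "OW", "OY", "P", "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W",
    "Y", "Z", "ZH",
}  # the 39-phoneme ARPAbet inventory commonly used for English

MIN_SECONDS = 20 * 60  # the roughly 20 minutes of audio Jin mentioned

def library_is_adequate(segments):
    """segments: iterable of (phoneme_label, duration_seconds) pairs.

    Returns (is_adequate, missing_phonemes).
    """
    seen = set()
    total = 0.0
    for phoneme, duration in segments:
        seen.add(phoneme)
        total += duration
    missing = ARPABET - seen
    return total >= MIN_SECONDS and not missing, missing

# Example: a recording that is long enough but never contains "ZH"
segments = [(p, 40.0) for p in sorted(ARPABET - {"ZH"})]
ok, missing = library_is_adequate(segments)
print(ok, missing)  # long enough in total, but coverage is incomplete
```

In a real system, each phoneme would also carry its audio samples so that new words can be synthesized by stitching and smoothing them; this sketch only captures the coverage-and-length requirement Jin described.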

Other highlights from this year’s Adobe Sneaks, the Adobe Max conference’s annual showcase of untested, unexpected, and often useful features, included Project Quick Layout, an Illustrator patch that would make objects on a poster automatically accommodate new additions in an attractive way; and Project Clover, a VR video editor that can be used within a VR interface.


  • SavageNarce

    Where to begin with the list of ways this is open to abuse?

    Let’s start with the election. Not only could Trump call Clinton “Crooked Hillary”, he could actually get her to admit she broke the law.

    And let’s not forget the potential for embarrassing quotes. If you think Trump has made some verbal mis-steps in his campaign, just imagine what kind of gaffes could be attributed to him using this software.

    Then there’s the “voice recognition security” question – if you think you recognize someone’s voice when they call you on the phone, how do you know if it’s actually them on the other end of the line? And in that case, the data isn’t transferred, so you can’t look for a watermark or other indication that it has been processed by VoCo.

    And consider the potential for crime. Kidnapping ransom messages coming from Tom Cruise. 911 calls coming from Barack Obama. Bomb threats made by Queen Elizabeth. School shootings because someone got a message from God (Charlton Heston? George Burns?) telling them to do so. Incitements to riot made by Martin Luther King – and no, it doesn’t matter if someone is alive or dead.

    Adobe is a responsible company and won’t let this kind of thing happen, you say? Just do a web search for, say, Nude Michelle Obama and see how many Photoshopped images show up.