To cleanse the palate, my headline’s misleading. This isn’t quite yet the ultimate fake news. It is, however, a major advance towards producing it.
News consumers know that you can’t believe everything you read, but you can still believe everything you see or hear, more or less. That age is ending, though, and sooner than you think. What will replace it?
The clip comes from researchers at the University of Washington, who developed an algorithm that takes audio of someone talking and turns it into a realistic video of someone speaking those words. In the video below, you can see a side-by-side comparison of the source of the original audio, which came from actual Obama remarks, and the generated video.
Obama was a natural subject for this kind of experiment because there are so many readily available, high-quality video clips of him speaking. In order to make a photo-realistic mouth texture, researchers had to input many, many examples of Obama speaking, layering that data atop a more basic mouth shape. The researchers used what’s called a recurrent neural network to synthesize the mouth shape from the audio. (This kind of system, loosely modeled on the human brain, can take in huge piles of data and find patterns. Neural networks are also used for facial recognition and speech recognition.) They trained their system using millions of existing video frames. Finally, they smoothed out the footage using compositing techniques applied to real footage of Obama’s head and torso.
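For the curious, here is a rough sketch of what "recurrent" means in that description. It is not the researchers' code. The weights are random rather than trained, and the function name, layer sizes, and the idea of a four-number "mouth shape" are all assumptions made up for illustration. The point is only that each audio frame is processed together with a memory of the frames before it.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a real system would learn these from video."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def audio_to_mouth_shapes(audio_frames, hidden_size=8, shape_size=4, seed=0):
    """Run an untrained recurrent network over per-frame audio features.

    audio_frames: list of equal-length feature vectors, one per time step.
    Returns one hypothetical mouth-shape vector per audio frame.
    """
    rng = random.Random(seed)
    feature_size = len(audio_frames[0])
    w_in = make_matrix(hidden_size, feature_size, rng)
    w_rec = make_matrix(hidden_size, hidden_size, rng)
    w_out = make_matrix(shape_size, hidden_size, rng)

    hidden = [0.0] * hidden_size
    shapes = []
    for frame in audio_frames:
        # The hidden state carries context from earlier audio frames forward,
        # which is what makes the network "recurrent".
        combined = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(x) for x in combined]
        shapes.append(matvec(w_out, hidden))
    return shapes
```

Feed it the same audio frame twice in a row and the two outputs differ, because the second one also depends on what came before. That memory is what lets a trained version of this kind of network produce mouth movement that flows rather than jumping from sound to sound.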
The component parts of the clip below are real. The audio comes from an actual interview with Obama and most of the footage in the video on the right comes from an actual clip of his presidential weekly address. But Obama never said those words in that setting; the lip movements in the second clip are computer-generated to sync up with the audio from the first to make it appear that he did. A computer would be able to detect that the footage on the right is fake by matching that image of Obama to the original footage, where he said no such words, and by closely scrutinizing the blurring in the movement of his lips. Watch closely and you can detect a certain unnaturalness to his speech that might tip you off that something’s amiss. But the researchers are already close enough to perfection here that actual perfection in fooling the viewer can’t be far away. I’d compare it to the flaw in the hands of the robots in the original “Westworld.” There’s still a “tell” that what you’re seeing isn’t real, but it’s impressively small. And you can imagine how soon it’ll go away for good.
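To make the "matching to the original footage" idea concrete, here is a toy sketch of how a computer might do it. It assumes a lot: that you have the original frame, that the two frames are already perfectly aligned, that each frame is a simple grid of grayscale brightness values, and that you know where the mouth is. The function names and the threshold of 10 are invented for illustration, not taken from any real forensic tool.

```python
def mean_abs_diff(frame_a, frame_b, box, inside=True):
    """Average brightness difference inside (or outside) a rectangular box.

    box is (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    total, count = 0.0, 0
    for r, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            in_box = top <= r < bottom and left <= c < right
            if in_box == inside:
                total += abs(a - b)
                count += 1
    return total / count if count else 0.0


def looks_resynthesized(suspect, original, mouth_box, threshold=10.0):
    """Flag a frame whose mouth region differs far more than the rest.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    inside = mean_abs_diff(suspect, original, mouth_box, inside=True)
    outside = mean_abs_diff(suspect, original, mouth_box, inside=False)
    return inside - outside > threshold
```

If the head and torso match the weekly address pixel for pixel while the mouth region alone has changed, that lopsided difference is the giveaway. The catch, of course, is that it only works when someone can find the original footage to compare against.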
Perfect the lip movements, add a computer-generated vocal track that mimics a known speaker’s voice closely enough to be indistinguishable from the real thing, and suddenly you really do have the ultimate fake news at your fingertips. An F/X specialist could punch up a video of Trump confessing to collusion with Russia and there’d be no way for the average person to tell it’s bogus. Computers probably could still tell, but then we’d be trapped in a second iteration of not knowing what to believe: do we believe the experts who tell us the footage is fake or the experts who inevitably disagree? By 2024, President Trump could be tweeting that the bombshell video of him on CNN admitting to whatever is nothing more than a computer simulation cooked up by someone on the Internet to enable a hit piece. And he might be right.