The Art of Shapeshifting: How Digital Villains Can Disguise Themselves as Anybody

A familiar voice at the other end of the line, or a face on screen that you know and trust, can be a digital clone. Let’s dive into the world of online fakery.

Is This the Real Life?

Deep learning has noticeably transformed our lives. Thanks to it, cars can find a parking space on their own (so many nerve endings spared!), dangerous diseases can be detected early, stocks can be traded automatically, and your robot vacuum can safely navigate around sofas, chairs and cats.

The scarier side of deep learning is its ability to learn and imitate. It can mimic virtually anything it can see, hear or read. For instance, if you feed an AI a pile of classical paintings, from Bosch to Matisse, it will eventually learn to emulate their iconic styles (still pretty vaguely, so far). It can even let you paint a picture with words: check out Dream by Wombo.

This is the magic of a Generative Adversarial Network (GAN) in action. It can generate almost anything: from a Boschian fresco to a believable copy of a person’s face or even voice. In fact, a GAN’s finished product can be so realistic that we had to devise a method of exposing these macabre falsifications: antispoofing.
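To make the adversarial idea concrete, here is a minimal sketch of a GAN in plain NumPy: a one-line "generator" learns to imitate a 1-D Gaussian while a one-line "discriminator" tries to tell real samples from fakes. Everything here (the toy data, the learning rate, the tiny linear "networks") is invented for illustration; real deepfake GANs use deep convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MU, REAL_SIGMA = 4.0, 1.25      # the "real data" the generator must imitate

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator G(z) = wg*z + bg; Discriminator D(x) = sigmoid(wd*x + bd)
wg, bg = 1.0, 0.0
wd, bd = 0.1, 0.0
lr = 0.02

for step in range(3000):
    # --- train the discriminator on one real and one fake sample ---
    x_real = rng.normal(REAL_MU, REAL_SIGMA)
    z = rng.normal()
    x_fake = wg * z + bg

    p_real = sigmoid(wd * x_real + bd)   # D should push this toward 1
    p_fake = sigmoid(wd * x_fake + bd)   # ...and this toward 0
    # gradients of -log D(real) - log(1 - D(fake)) w.r.t. wd, bd
    gwd = -(1 - p_real) * x_real + p_fake * x_fake
    gbd = -(1 - p_real) + p_fake
    wd -= lr * gwd
    bd -= lr * gbd

    # --- train the generator to fool D (non-saturating loss) ---
    z = rng.normal()
    x_fake = wg * z + bg
    p_fake = sigmoid(wd * x_fake + bd)
    gx = -(1 - p_fake) * wd              # d(-log D(fake)) / d x_fake
    wg -= lr * gx * z
    bg -= lr * gx

# After training, generated samples should cluster near the real mean (4.0).
samples = wg * rng.normal(size=1000) + bg
print(round(float(samples.mean()), 1))
```

The tug-of-war is visible in the two loops: whenever the fakes drift away from the real distribution, the discriminator's gradient pushes the generator back toward it.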

The Alchemy of Spoofing

The principle behind deepfakes is simple, but its implementation is brain-bending. Commonly, it involves just three stages:

  1. Culprits assemble a training dataset, which can consist of your photos, TikTok videos, voice messages from Telegram and so on, depending on which part of you they want to replicate: face or voice.
  2. Then a nefarious duo, an encoder and a decoder, is trained to sculpt the deepfake. The encoder studies the faces of various people, along with poses, facial expressions, illumination and so on. The decoder trains on the target’s face, the person whose likeness they want to steal. Vocal deepfakes follow a similar principle.
  3. Finally, it comes to face swapping. The neural network surgically extracts your face and slaps it onto someone else’s body. What’s really bloodcurdling is that some apps, like Face2Face or FaceRig, let an attacker wear your lovely visage like a mask in real time. Voice conversion achieves the same effect, only with your voice.
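The encoder/decoder duo from the steps above can be sketched in miniature. Below, a fixed random projection stands in for the shared encoder, and each person's "decoder" is fitted by least squares on that person's data; the swap is simply encode-with-the-shared-encoder, decode-with-the-other-decoder. All sizes and data are invented for illustration; real pipelines use deep convolutional autoencoders trained jointly.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, LATENT, N = 64, 16, 200   # hypothetical flattened-face, latent, dataset sizes

# Stand-ins for two people's face datasets: noise around a person-specific mean.
faces_a = rng.normal(size=(N, DIM)) + rng.normal(size=DIM)
faces_b = rng.normal(size=(N, DIM)) + rng.normal(size=DIM)

# Shared "encoder": a fixed random projection into a small latent space.
encoder = rng.normal(size=(DIM, LATENT)) / np.sqrt(DIM)

def fit_decoder(faces):
    """Train one person's 'decoder' by least squares: latent code -> face."""
    codes = faces @ encoder
    dec, *_ = np.linalg.lstsq(codes, faces, rcond=None)
    return dec

decoder_a = fit_decoder(faces_a)   # learns to draw person A
decoder_b = fit_decoder(faces_b)   # learns to draw person B

# The swap: encode a new image of A, then decode it with B's decoder,
# so A's pose/expression gets rendered with B's identity.
new_a = faces_a[0]
swapped = (new_a @ encoder) @ decoder_b
print(swapped.shape)   # (64,)
```

The key design point survives even in this toy: the encoder is shared (so the latent code captures pose and expression), while identity lives entirely in the per-person decoder.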

If you’d like, you can run an experiment and Skype yourself. For that you will need a volunteer, a neural network from GitHub, a fairly powerful computer, and some time to spare.

Liveness on Guard

Luckily, there’s a way to expose digital charlatans and restore justice: liveness detection. The definition of liveness, proposed by Dorothy E. Denning, describes it as a set of characteristics native to a living person. Based on them, a recognition system can confirm that it’s really you, and not some doppelgänger, trying to access your valuables. There are numerous ways to spot liveness.

For example, remote photoplethysmography (rPPG) appears to be a promising remedy. This technique remotely detects the heart activity of a person shown in a video. The trick is that with every heartbeat, our facial skin slightly changes color as blood pulses beneath it. We can’t see that, but a computer can! If the person is real and alive, their skin will keep subtly shifting in color in time with their pulse. If that doesn’t happen, we’re likely dealing with a deepfake.
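The rPPG idea can be sketched with synthetic data: a per-frame green-channel average with a faint pulse-synchronous wobble stands in for a real face crop, and a Fourier transform recovers the heart rate. The frame rate, pulse frequency and noise level below are all invented for illustration.

```python
import numpy as np

FPS, SECONDS = 30, 10
HEART_HZ = 1.2                      # 72 bpm, used only to fake the input video

# Stand-in for a face crop per frame: mean green-channel intensity with a
# tiny pulse-synchronous wobble buried in sensor noise (what rPPG looks for).
t = np.arange(FPS * SECONDS) / FPS
rng = np.random.default_rng(2)
green = 120 + 0.5 * np.sin(2 * np.pi * HEART_HZ * t) + rng.normal(0, 0.2, t.size)

# rPPG core idea: the pulse shows up as a periodic component of skin color.
signal = green - green.mean()
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / FPS)

# Only search the plausible human heart-rate band (0.7-4 Hz, i.e. 42-240 bpm).
band = (freqs > 0.7) & (freqs < 4.0)
peak_hz = freqs[band][np.argmax(spectrum[band])]
print(round(peak_hz * 60))          # estimated heart rate in bpm -> 72
```

A deepfake rendered frame by frame has no blood flowing under its "skin", so this spectrum shows no clean peak in the heart-rate band.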

And if it’s voice liveness we’re talking about, then articulatory gestures can be analyzed. This clever technique emits a 20 kHz tone from the phone’s speaker. You would need to be a bat to hear this tone, but you don’t need to hear it at all.

The signal lets the phone make sure that it’s really you uttering the passphrase that unlocks the gizmo. It captures both the passphrase and the reflections of the tone, runs some analysis lickety-split, and decides whether to open like the magical Sesame cave or stay locked.
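A back-of-the-envelope sketch of how such an ultrasonic check could work: if the lips are moving while the passphrase is spoken, the reflected tone comes back slightly Doppler-shifted, so energy appears in sidebands around the 20 kHz carrier; a replayed recording with no lip motion leaves those sidebands empty. The echo amplitude and the ~12 Hz shift below are invented for illustration (roughly what 10 cm/s lip motion would produce).

```python
import numpy as np

FS, TONE_HZ = 48_000, 20_000     # phone sample rate and the probe tone
t = np.arange(FS) / FS           # one second of audio

def received(mouth_moving):
    """Toy microphone signal: the emitted tone plus, if lips move, a weak
    Doppler-shifted echo (~12 Hz offset, a rough figure for lip speed)."""
    sig = np.sin(2 * np.pi * TONE_HZ * t)
    if mouth_moving:
        sig += 0.05 * np.sin(2 * np.pi * (TONE_HZ + 12) * t)
    return sig

def sideband_peak(sig):
    """Strongest component near the carrier, carrier bin itself excluded."""
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1 / FS)
    band = (np.abs(freqs - TONE_HZ) > 2) & (np.abs(freqs - TONE_HZ) < 50)
    return spectrum[band].max()

live = sideband_peak(received(mouth_moving=True))
replay = sideband_peak(received(mouth_moving=False))
print(live > 10 * replay)        # True: lip motion leaves a Doppler trace
```

The liveness decision then reduces to checking that the Doppler trace lines up in time with the syllables of the spoken passphrase.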

There are numerous ways to combat deepfakes, even the most convincing ones. Stay tuned to learn more about falsification techniques and how we can tackle them!
