Imaginary Worlds - Stuck in the Uncanny Valley

Episode Date: March 22, 2018

The holy grail for many animators is to create digital humans that can pass for the real thing -- in other words, to cross the "uncanny valley." The problem is that the closer they get to realism, the more those almost-real humans repulse us. Blame evolution for that. I talk with Hal Hickel from ILM, who brought Peter Cushing to life in Rogue One; Marianne Hayden, who worked on games like The Last of Us and Uncharted for Naughty Dog studios; Vladimir Mastilovic from 3Lateral studios, who worked on Hellblade: Senua's Sacrifice; and SVA instructor Terrence Masson about what it takes to cross that valley.

Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:53 You're listening to Imaginary Worlds, a show about how we create them and why we suspend our disbelief. I'm Eric Molinsky. So, way back in the 20th century, when I was studying animation, our instructors used to tell us to avoid animating realistic human beings. Because what's the point? I mean, you can animate anything you want.
Starting point is 00:01:22 And more importantly, you can't do it. No one can cross the Uncanny Valley. Now, in case you don't know what that means, the Uncanny Valley was first proposed in 1970 by a roboticist named Masahiro Mori. He predicted that in the future, as robots look more and more like humans, they would actually repulse us
Starting point is 00:01:40 because they're so close to human beings, but not quite there. The reason why this uncanny valley exists is because we are biologically wired to read human faces. If anything is slightly off, we will notice it right away. And it's because we have a primal instinct to avoid people that may appear psychotic or diseased. That's why to me some of the scariest zombie movies are not the ones with a lot of makeup, but just the ones where the people are just slightly off. Now, robotics are still not there yet. I don't think anyone will mistake the animatronics at a Disney World ride for a real
Starting point is 00:02:17 person. But computer animation, I mean, nobody saw that coming. I mean, I remember how shocked people were by the digital dinosaurs in Jurassic Park. And those special effects are ancient by today's standards. And speaking of Steven Spielberg, his new movie Ready Player One is coming out, and it's based on the novel by Ernest Cline, which takes place in the future, where the Internet has evolved into a virtual reality universe, where everybody interacts with each other through their digital avatars. This is the Oasis, a whole virtual universe.
Starting point is 00:02:54 You can do anything, be anyone, without going anywhere at all. And when I read the book, I imagined these avatars to be completely realistic. But when I watched the trailers, the avatars were so digital-looking. It completely triggered the Uncanny Valley in me, to the point where I'm just not going to see the movie. I mean, I'm just sticking with the version of Ready Player One that was in my imagination when I read the book. Which made me wonder, why is it so hard to cross the uncanny valley? And why is it still the holy grail of computer animation? Are we getting closer? What's holding us back? And what happens when we cross it? So strap on your VR headsets.
Starting point is 00:03:41 That's after the break. Now, this episode is going to get very technical, but it's actually really personal for me, because even though I left animation, I've often wondered: if I'd stayed, what would I be working on? And how would my job evolve with the technology? Because, you know, when I studied animation, I mean, we were filming pencil drawings with these clunky video cameras, you know, in rooms surrounded by thick black curtains. So I was curious, how is the Uncanny Valley being taught today? Well, I visited Terrence Masson,
Starting point is 00:04:16 who runs the computer animation department at the School of Visual Arts in New York. And when I showed up on a Friday afternoon, he was lecturing to about a dozen students at their computer terminals. I have this example of why CG characters so often look bad and why it pisses me off. It really
Starting point is 00:04:34 does. Now Terrence has worked in the industry for decades. He was an animator on the Star Wars prequels. He worked at major video game companies. He even helped Trey Parker and Matt Stone build the software for South Park. Thanks, everybody. Thank you.
Starting point is 00:04:50 So after the lecture, we sat in his office, and he explained that the big breakthrough in digital humans came about 10 years ago with a technology called subsurface scattering. Subsurface scattering basically simulates all the inconsistent splotches, freckles, colors, and hairs in the human skin. But it can also simulate something much more subtle and important. Everybody has done this, including my five-year-old as a kid. You put a flashlight behind your fingers, and your fingers kind of glow red, right? Because the light enters the skin, it scatters,
Starting point is 00:05:27 it bounces around, and then it reflects back out having picked up some of that color of blood. So to be able to do that in CG was basically a huge necessary leap forward to accurately render flesh, skin. And before that, it just looked plastic. So this is sort of, you're talking about like the, almost the internal light that a person has that's coming from inside of you as a living person, as opposed to being a corpse that makes your skin luminescent. Is that what you're saying? Yeah, exactly right.
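To picture what Terrence is describing, here's a toy Beer-Lambert sketch of why backlit flesh glows red: light decays exponentially per color channel as it passes through tissue, and blood absorbs green and blue far more than red. The coefficients here are invented for illustration; production subsurface scattering uses measured scattering profiles, not this simple falloff.

```python
import math

# Illustrative (made-up) absorption coefficients per millimeter of skin:
# blood and tissue absorb green and blue far more strongly than red.
ABSORPTION = {"r": 0.3, "g": 1.2, "b": 1.8}

def transmitted(light_rgb, thickness_mm):
    """Beer-Lambert falloff: each channel decays exponentially with the
    distance the light travels through tissue."""
    return {
        ch: light_rgb[ch] * math.exp(-ABSORPTION[ch] * thickness_mm)
        for ch in ("r", "g", "b")
    }

white = {"r": 1.0, "g": 1.0, "b": 1.0}
out = transmitted(white, thickness_mm=2.0)
# Red survives far better than green or blue, so the finger glows red
# instead of looking like gray plastic.
print({ch: round(v, 3) for ch, v in out.items()})
```

Real renderers do this in three dimensions, with scattering as well as absorption, but that red-shifted falloff is the core effect that made digital skin stop looking plastic.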
Starting point is 00:05:54 It's the thing that makes us look healthy. Yeah. But subsurface scattering can make the uncanny valley worse because if a digital human looks photorealistic, then our expectations are sky high once it starts moving. So I asked Terrence, what are the basic problems with the uncanny valley? And he just started ticking them off. First of all, no two people form their words in the same way. Everyone's mouth shapes are idiosyncratic, and that requires a lot of extra work from animators who are sometimes crunched for time. Secondly, if a digital human is seen in a close-up or a medium shot,
Starting point is 00:06:29 very often the animators will only animate what we see on screen. But the human body never stops moving. Otherwise, it's just somebody on a stick, you know, a severed torso on a stick. But he thinks the big problem are the eyes. In fact, his favorite term is eye darts, which is when your eyes dart back and forth while you're thinking. Animators often forget to put these in.
Starting point is 00:06:52 And then they'll do, you can tell someone said, well, you better have the eyes move. So they'll just, they'll kind of look at something else in the room and then come back to the center position. And without that kind of animated thought going on within the character's mind, it just goes dead. It's funny, once you pointed this out, I started noticing this all the time.
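Those eye darts can even be roughed in procedurally. This is a hypothetical sketch, not any studio's actual tool: hold each fixation for a fraction of a second, then snap a few degrees to a new point, rather than drifting smoothly.

```python
import random

def eye_dart_track(duration_s, fps=24, seed=1):
    """Generate per-frame gaze offsets (in degrees) that mimic eye darts:
    the eye holds a fixation briefly, then jumps to a new point."""
    rng = random.Random(seed)
    frames = int(duration_s * fps)
    track, gaze, next_dart = [], (0.0, 0.0), 0
    for f in range(frames):
        if f >= next_dart:
            # Saccade: a small, fast jump to a nearby fixation point.
            gaze = (rng.uniform(-3, 3), rng.uniform(-2, 2))
            # Hold that fixation for roughly 0.2 to 0.8 seconds.
            next_dart = f + rng.randint(int(0.2 * fps), int(0.8 * fps))
        track.append(gaze)
    return track

track = eye_dart_track(2.0)
print(len(track), "frames,", len(set(track)), "distinct fixations")
```

The key is that the gaze holds and then jumps; a character whose eyes glide smoothly from point to point is exactly the dead look Terrence complains about.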
Starting point is 00:07:16 I mean, as people are speaking, their eyes are darting back and forth, almost like micro movements. And to demonstrate this point, Terrence showed his students footage of the most eye-darty person he could think of: the director Martin Scorsese. That's the thing, if the actor, he or she is moving and they're on a level... When he talks about working with someone on set, about a film, or about something he's recalling from 50 years ago, you can watch his eyes. He's actually imagining what he's talking about in his mind, and his eyes are darting around, following those ideas and tracing what he's recalling. It brings it all alive. He's going to raise that gun. He's going to say, I saw you coming. And I really knew he had something when he said, I know you're talking to me, because I'm the only one here. Now, we tend to think of the uncanny valley with movies, but
Starting point is 00:08:00 there is a much bigger imperative to get them right in video games. Now, if you haven't played a video game since Atari, they've gotten a lot more complex. In fact, the way that a lot of the major games work, the big budget games, is you control a character who is running, shooting, jumping, kicking, punching, you know, video game stuff. And then when you reach your goal, you can kind of put down your game controller because a cinematic or a cutscene will come up, which is like a computer animated movie that will illustrate the next story beat. And that's where you see the characters up close as we get all the subtle acting. And for a long time, that's where you'd be really trapped in the uncanny valley. So animators today are not just trying to move game animation out of the uncanny valley, but they're also trying to blend those two.
Starting point is 00:08:46 So you don't just put down your game controller and watch this cut scene. Everything is blended together. So you really feel like you're controlling a character in a movie. And that puts a lot of pressure on animators like Vladimir Mastilovic. Yeah, I mean, in movies, it's much easier. He runs a game studio in Serbia called 3Lateral. And he says the big difference in a movie is that whatever you see is what the director wants you to see. But in a lot of video games...
Starting point is 00:09:13 You can't control the camera. You can't control, you know, from which angle the player will view the characters. So the challenge is orders of magnitude greater. Now, his company specializes in scanning real actors to create digital doubles. And his company was part of a team that collaborated on a game called Hellblade: Senua's Sacrifice. Senua is the name of the main character. She's an eighth-century Celtic warrior who goes on an epic journey. But she's also mentally ill. She experiences her psychosis as supernatural voices. In the game, these voices manifest themselves as mythical
Starting point is 00:09:52 characters or doppelgangers of her. It's a trippy game with a very spooky atmosphere, and the attention to detail is gritty. All of her suffering will have been for nothing. Now until Vladimir explained this to me, I hadn't really thought about it because when you watch a movie, you just put it on and it plays. But a game is recreated every time you play it. So the more acting a character
Starting point is 00:10:18 does, the more memory and processing power it requires from the game console. It's usually the high and intense emotion, which is also hardware-intense, because that's where you need the most, the biggest number of facial shapes and the code that runs the faces. And so if you want really subtle, complicated acting on a human face in a video game, you also have to figure out how to compress all that data.
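One way to picture that trade-off is a level-of-detail table over the facial rig. This is a toy sketch with invented numbers, not 3Lateral's actual system: the farther a character is from the camera, the fewer facial blend shapes the engine evaluates each frame.

```python
# Hypothetical LOD table: how many facial blend shapes to evaluate at
# each camera distance. Close-ups need the full rig; background
# characters can drop most corrective shapes without anyone noticing.
LOD_LEVELS = [
    (2.0, 600),   # within 2 m: full facial rig for cinematic close-ups
    (8.0, 150),   # mid-range: broad emotion shapes only
    (30.0, 40),   # far: jaw, blinks, and little else
]

def shapes_for_distance(distance_m):
    """Pick a blend-shape budget for this frame based on camera distance."""
    for max_dist, shape_count in LOD_LEVELS:
        if distance_m <= max_dist:
            return shape_count
    return 10  # barely visible: a minimal placeholder rig

print(shapes_for_distance(1.0))   # close-up budget
print(shapes_for_distance(25.0))  # background-character budget
```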
Starting point is 00:10:44 That's why we have these very elaborate level-of-detail systems, which, completely invisibly to the player, shed off a lot of weight when it comes to the asset itself. Now, of course, they don't create these digital humans from scratch. They're usually based on actors wearing motion capture suits, otherwise known as mocap for short. And I'm sure you've seen the behind-the-scenes footage where, you know, the actor is wearing like a full-body black leotard and they've got ping pong balls all over their body and dots on their faces. And there's a kind of a rig on their
Starting point is 00:11:15 heads that has a camera mounted that's pointed directly back at their faces. And then that performance gets transferred onto the digital character. And Vladimir's company actually invented a technology that allows that transfer to happen instantly. So the actor performs and in real time, the digital character is reflecting that performance. And he says they need to do all this because the animators would never think to add things like... The specific way of how the skin on the lips gets stuck with the upper lip or the lower lip or how the eyelid unfolds against the skin that covers it or the jiggle of the iris. So, you know, you would say that that amount of detail is crazy. But for some reason, we perceive all that. So another game studio that's doing really incredible hyper-realistic human beings is Naughty Dog in Santa Monica, California.
Starting point is 00:12:22 Naughty Dog's two big franchises are the Uncharted games, which are like Indiana Jones adventures, and The Last of Us, which is a zombie-type adventure, but the zombies are really monstrous. Now, there are two main characters in The Last of Us. There's a young girl named Ellie and this older guy that's mentoring and protecting her named Joel. And the trailer for the second game in the series just came out recently. It takes place a few years later. Ellie is now a teenager, and we see her in a room. There's blood on the floor, blood dripping down her face. We don't know what's going on, what happened, but she's calmly playing the guitar.
Starting point is 00:12:49 And the close-up on her hands is incredibly realistic. And we cut to her face, blood dripping down, and she says to Joel, I'm going to find and I'm going to kill every last one of them. And it's weird because she doesn't look like a real person. She looks like a hyper-realistic digital person. But she felt more real and alive than really any video game character I've seen.
Starting point is 00:13:24 I talked with Marianne Hayden, who's an animator at Naughty Dog Studios, and I asked her, why are digital humans looking so much better? She says it's because the motion capture has gotten so much better. You know, the first couple games of Uncharted, you can look at the motion capture data. It kind of looks jagged around the edges, even though it's been touched by an animator. It's just much more precise now, picking up a lot of the nuances that maybe we didn't see before. Now, Marianne went to the same animation school that I went to, CalArts. And, you know, our program really stressed creative freedom. And I asked her, does she feel less creative freedom working with motion capture? She said, no. You know, first of all, the actors
Starting point is 00:14:00 never look exactly like the characters. So there's a lot of tweaking there. Also, they have to caricature the movements that the actors do to make them feel more real, which is a weird kind of trick of animation that shouldn't make sense, and yet it does. And sometimes they'll sort of stitch together different performances the actor gave to create an original acting moment that was invented by the animator. Some people feel like we're losing animation when we have the motion capture. I just think it's such a base to start at as an animator, and having the eye of an artist takes it one step further.
Starting point is 00:14:36 But what amazes me about the characters in the Naughty Dog games are the eyes. Like this scene between the characters Drake and Elena from the Uncharted video games. Those are the Indiana Jones-type adventures. And I've seen previous versions of Drake and Elena, which are very plastic-looking. But in the latest Uncharted game, they're hyper-real, and their acting is so subtle. Like in this scene, they're having a difficult conversation. They're shifting their weight. They're having trouble making eye contact. And their eyes are doing those kind of micro eye darts. Come on, wait. Elena, wait!
Starting point is 00:15:10 I don't get you. Look, I wanted to tell you. You know what? Enough. No, I wanted to, but how could I? I don't know. Just say it. I had to protect you. That is bullshit, Nate. You just didn't have the nerve to face me again.
Starting point is 00:15:22 And Marianne says the motion capture suits do track the eyes of the actors. But there still needs to be an extra layer of love put into the eye movement. And it's not just the eyes moving, because when the eyes move, your eyebrows move, and like your cheeks move, and your nose moves sometimes. If you're smiling and looking around, like your whole face is alive. And if part of that isn't captured in the data, then we have to go back and add that in. And I think that part, getting it as perfect as it can be, taking that original performance and then amplifying it so that it
Starting point is 00:16:06 crosses the uncanny valley, that's the tricky part. That's what our job is. And of course, telling a good story. I think the more immersed you are, hopefully the less you pay attention to the fact that these aren't really real people. But they feel like they're real because you're emotionally invested and visually invested. If it looks really great, it plays really great, and you're enjoying it, then I think you're not in that valley anymore.
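The amplifying Marianne describes can be pictured as scaling the capture's offsets from a neutral pose. A toy sketch under that assumption, not Naughty Dog's actual pipeline:

```python
def amplify_performance(frames, neutral, gain=1.15):
    """Exaggerate captured facial motion by scaling each frame's offset
    from the neutral pose. Raw mocap often reads slightly flat on a
    stylized character, so animators push it a little past real."""
    return [
        [n + gain * (v - n) for v, n in zip(frame, neutral)]
        for frame in frames
    ]

# Hypothetical channels: brow raise, jaw open, lip-corner pull.
neutral = [0.0, 0.0, 0.0]
captured = [[0.1, 0.4, 0.0], [0.2, 0.5, 0.1]]
print(amplify_performance(captured, neutral))
```

A gain of 1.0 would play the capture back untouched; anything above it is the caricature she's talking about, applied uniformly here where a real animator would push each channel by hand.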
Starting point is 00:16:37 But a game studio is still limited by the processing power of the game console, time, money, the number of people you can hire. You know who isn't limited by very much? Industrial Light & Magic. In 2016, they took a big leap into the uncanny valley with the movie Rogue One. And if you haven't seen Rogue One, spoilers ahead.
Starting point is 00:17:01 The movie takes place right before the original Star Wars, A New Hope, and it was about how the Rebels stole the plans to blow up the Death Star. And the filmmakers wanted to bring back characters from the 1977 film. Darth Vader was easy. They just got James Earl Jones to do the voice, and they have a new guy in the suit. But they also needed Darth Vader's right-hand man, Grand Moff Tarkin. The actor who played him, Peter Cushing, died in 1994. So they had an actor on the set, Guy Henry, who played Tarkin in a motion capture suit. And then they animated a digital version of Peter Cushing on top of Guy Henry's performance.
Starting point is 00:17:39 But to make things even more challenging, this digital version of Tarkin was sharing the screen with real flesh-and-blood actors. We've heard word of rumors circulating through the city. Apparently, you've lost a rather talkative cargo pilot. If the Senate gets wind of our project, countless systems will flock to the rebellion. When the battle station is finished, Governor Tarkin, the Senate will be of little concern. Hal Hickel was the animation supervisor on that character. You know, I think we had shots of Tarkin in Rogue that were great and totally convincing.
Starting point is 00:18:15 And then there were others that were less so. And he mentioned all the issues we've talked about before. Eye darts that feel motivated, mouth shapes that were specific to Peter Cushing and not Guy Henry, animating the full body, even if it's not in the shot. But they had another problem. We've only seen Tarkin in the original Star Wars, which had this harsh 1970s-style lighting. The lighting in Rogue One was more subtle and modern.
Starting point is 00:18:39 And so we would put our CG Tarkin into these shots in that lighting, the Rogue One lighting. And we'd work on it, we'd work on it, we'd work on it. And we'd feel like, boy, we're so close. But it just doesn't, it feels like Peter Cushing's cousin or something. What's the problem? And so we did an experiment where we took the same animation, same texture, same model that we'd been working with. And all we changed was the lighting.
Starting point is 00:19:03 And we lit it like one of these shots we've been staring at from 1977. And boom, it was an instant improvement in likeness. It was like, oh, that's Tarkin. That looks great. That's Peter Cushing. But the problem was when you light him that way, then he doesn't match with anything else in the shot and it looks pasted on. They eventually managed to find a balance between the two styles of lighting. But the other thing that's difficult about digital humans is the sort of swamp of opinions that you find yourself in. Because, you know, you assemble the best team you possibly can, and you are all together every day reviewing the work and looking at it, but
Starting point is 00:19:39 what you find often happening is, you know, you're all in there looking at it, and somebody goes, you know what it is? The forehead's too high. And then somebody else goes, no, no, the forehead's fine, we checked it. You know what it is? I think the nose is just not quite long enough, or whatever, you know. And then somebody else is like, no, no, the nose is fine. Look at the cheeks. I swear. And, you know, it was rare that everyone would walk in and there'd be this consensus about exactly what the issue is. And identifying what those final little percents of believability and realism are is just the very hardest thing. Now, I have nothing but reverence for the skills of the crew at ILM. I
Starting point is 00:20:21 mean, they are some of the best animators in the world. But I have to admit this, Tarkin didn't quite work for me. I mean, he looked great, but maybe this is my animation training, but I could feel the decisions that the animators were making. Like I could see when they decided his eyebrows should crinkle right here, or he should blink and turn his head. Now Hal did get that kind of feedback from some of his colleagues, and he also got a lot of praise from people in the industry. But amongst general populace moviegoers, like I've given a bunch of talks since the movie came out,
Starting point is 00:20:54 and some of them have been to completely non-industry people. And I would say the vast majority of those folks, I've had tons of them come up to me and say, oh, I thought you recast it or something, you know. So that tells me that we got most of the way there. But if Grand Moff Tarkin was hard enough to animate, recreating the young Princess Leia was even harder, and she was only in one scene. The architecture of Tarkin's face, it's kind of hard to describe, but it kind of just gave us more to work with. Whereas her face is just, especially at that age, 18 or 19, I think, and it's just this perfect form with, like, flawless skin.
Starting point is 00:21:35 And as soon as you started moving things and lighting it, just, like, anything that was even the tiniest bit off was glaring. Your Highness, the transmission we received. What is it they've sent us? Help. Now, while they were making the film, they tried to schedule time with Carrie Fisher to do motion capture,
Starting point is 00:22:01 but she was not available. Shortly beforehand, the producer Kathleen Kennedy did show her the footage of the young Princess Leia. You know, Kathy showed her the shot when it was done, before the movie came out. Hal was really nervous about it. Finally, word came back. She loved it. That really made us all feel good. That was like the thing we were biting our nails about the most, to be frank. I mean, we knew it had to be the capper on the film, and that was enough pressure. But honestly, the thing we cared about the most was how Carrie would feel about it. So I was curious, are there any digital humans that completely blew him away? And he did not hesitate for a second. He said, Blade Runner 2049.
Starting point is 00:22:42 And if you have not seen Blade Runner 2049, sorry, this is going to be another spoiler. So that movie, of course, is a sequel to Blade Runner from 1982. And they wanted to recreate the character of Rachel, who is a replicant, which means she could be reborn at any moment. In this case, the animators were able to get the original actress, Sean Young, to do the motion capture performance. Don't you love me? But that doesn't necessarily mean it's going to work, because Jeff Bridges, Robert Downey Jr., Kurt Russell, and Michael Douglas have all played younger versions of
Starting point is 00:23:17 themselves in flashbacks in movies. And those were cool special effects, but when you're watching those scenes, you could tell it was a special effect. I asked Hal, in his professional opinion, what did they do differently in recreating the young Rachel? And he says when you're 99%, almost all the way there, with the uncanny valley, even he can't put his finger on what they did right. It was the absence of, you know, she comes on screen and you go, oh, that's cool, but. It was the absence of the but. It was just like, wow, it's her. That's amazing.
Starting point is 00:23:54 This whole experience has left him feeling kind of frustrated. You kind of get to a point of, what's the point? And he says, you know, his company, ILM, gets contacted all the time to resurrect actors who are no longer alive. I'm very squeamish about that. I mean, people have asked me in some of the talks I've given some pretty pointed questions about the morality of even, you know, what we did with Tarkin. And with Tarkin, I felt really assured because he only did in that film the same things he did in A New Hope, which is to, you know, stand around on the Death Star and bark at people about, you know, firing the Death Star laser. On the other hand, if someone came in and said, you know, we want to hire you guys to do a TV commercial and we want to put Jimmy Stewart in it, I'd have to just decline.
Starting point is 00:24:41 But recreating celebrities is not just for the pros anymore. There's a new app called Deep Fake where you can try this at home. And in fact, a bunch of people used the app to reanimate that scene of Princess Leia from Rogue One. There are all these articles saying, look, these people did it just as good. But they really, really didn't. I'm sorry. It just does not even look close. But Deep Fake is being used to create fake sex tapes of celebrities. And even more disturbing, it's being used to create digital versions of politicians saying things they never said.
Starting point is 00:25:16 That is a whole new scary level of fake news. But Terrence Masson is still optimistic about where this technology is going and how it could be used. Eventually it'll become so much more automatic and so cheap. That's going to be the endgame, is that it'll be available in real-time augmented reality and virtual reality, and photorealistic and hyperreal. In other words, he's talking about the world of Ready Player One. Yeah, yeah, but really cheap and just everywhere. And yeah, everywhere and everything. You know, when I studied animation,
Starting point is 00:25:52 our teachers would often use a phrase called the illusion of life, which came from these two Disney animators, Frank Thomas and Ollie Johnston, who said when you animate your first character, even if it's just a bouncing ball, you're going to be so amazed at the illusion of life that you created. You're going to be hooked on animation forever. And I remember that feeling 20 years ago, thinking, oh my god, I created something that looks like it's
Starting point is 00:26:17 alive. I'm going to do this forever. And I didn't do it forever, but I remember that feeling. And in fact, talking with these animators reminded me how great that was. And it made me miss it again. And this, you know, this desire to create life, it's what makes us human, not just for the survival of the species. And whether we think this technology is good or scary or both, we will never stop wanting to create the illusion of life. Even if the life we create is only good at convincing us it's human. Well, that's it for this week. Thank you for listening. Special thanks to Terrence Masson, Vladimir Mastilovic, Marianne Hayden, Hal Hickel, and all the other experts that I talked with that I just didn't have room to include. Imaginary Worlds is part of the Panoply Network.
Starting point is 00:27:10 Stephanie Billman is my assistant producer. You can like the show on Facebook. I tweet at emolinski and Imagine Worlds pod. My website is imaginaryworldspodcast.org. Panoply.
