Front Burner - AI video’s groundbreaking, controversial leap forward

Episode Date: February 20, 2024

OpenAI has just introduced a new tool, Sora, which turns text prompts into short, shockingly realistic videos. Sora hasn’t been released to the public yet, but it’s already sparking controversy about its potential implications for industries like animation and video games, as well as for deepfake videos — and for democracy as a whole. Today, Gary Marcus — a cognitive scientist, AI researcher and entrepreneur, and author of the forthcoming book Taming Silicon Valley — talks to us about the promise and potential consequences of Sora and other generative AI video tools.

Transcript
Starting point is 00:00:00 In the Dragon's Den, a simple pitch can lead to a life-changing connection. Watch new episodes of Dragon's Den free on CBC Gem. Brought to you in part by National Angel Capital Organization, empowering Canada's entrepreneurs through angel investment and industry connections. This is a CBC Podcast. Hi, I'm Jayme Poisson. So on Thursday afternoon, sitting around the office, everyone has one eye on a script, the other on Twitter or X, whatever. And these videos, they start popping up. A hamster riding a half duck, half dragon as the sun sets.
Starting point is 00:00:50 Two golden retrievers sitting on a blanket on a mountain podcasting. A train zooming through a futuristic city. They were impressive, weird, but impressive. And the product of OpenAI's new tool, a text-to-video product called Sora. Watching these videos, we were all like, wow. But then, as more and more of them kept getting posted, the wow kind of turned to, oh no. There was a realistic food influencer making gnocchi, a grandmother blowing out her birthday cake candles. And while they weren't 100% accurate, compared to where the tech was a year ago,
Starting point is 00:01:29 they are definitely getting closer to mirroring reality. Today, I'm speaking with Professor Gary Marcus. He's a cognitive scientist, an AI researcher, an entrepreneur, and author of the forthcoming book, Taming Silicon Valley. And we're going to talk about Sora, what we know and don't know about how it works, and its potential implications for creative industries, deepfakes, and even democracy as a whole. Gary, thank you so much for coming on to Front Burner. It's great to be back.
Starting point is 00:02:11 So what is Sora and what do we know about how it works? Sora is something where you type in a piece of text, a description of something, and it generates a short video for you. So you can say, make me a point of view video of some ants walking in an ant colony. And it'll actually do that. It may not work perfectly, but it'll give you something that looks at least a little bit like what you're talking about. Often something that looks a lot like what you're talking about. And often something that, if you look really carefully, isn't quite right. Last week, the OpenAI CEO, Sam Altman, he got people to tweet at him with, like, prompts, those text prompts that you're talking about.
Starting point is 00:02:47 And the videos that came out right away, I mean, there were some that were pretty amazing, like two golden retrievers podcasting on the top of a mountain. Or one with dolphins and other animals riding bicycles on top of the ocean. And they were fantastical scenarios, but these ones, they did look incredibly realistic. I'm looking at one right now. It's a woman in a leather jacket walking down a Tokyo street filled with warm sort of glowing neon signs. And it is very impressive. I have to say. It looks almost completely real. It looks almost completely real. Most of them have problems. So for example, the woman walking on the street, there are people in the background, if that's the one I remember, where it looks like
Starting point is 00:03:36 they're almost like zombies kind of floating around. The woman actually takes two left steps at about 28 seconds in, which is not biologically possible. So when you start to look carefully at them, and of course, that's this generation, and we can ask what will happen in future generations. But if you start to look carefully at these videos, there are often a lot of violations of physical laws. The one that's maybe most disconcerting to me is that objects pop in and out of existence, which six-month-old babies realize can't actually happen. So there's one, for example, of wolf pups. And if you look at it carefully, the number of wolf pups changes from one frame to the
Starting point is 00:04:13 next. There's another one where, I don't know if they're archaeologists or what they're supposed to be, but they dig a chair out of the ground. The chair starts to levitate at some point. And one of the people walks behind another. And when the camera kind of shifts around, the first person, who I think is in a tan shirt, has just disappeared altogether. So there are violations of physical laws. Another video showed, I mentioned already, the ant walking through a colony. And if you look carefully at it, the ant only has four legs. And
Starting point is 00:04:42 most normal, well, all normal ants have six legs. It'd be very weird to see an ant with four legs. And so somebody posted, wow, I can't believe they even got the dynamics of the legs right. But they didn't get the dynamics of the legs right. It wasn't even the right number of legs. And if you watch that whole video, there's like this weird two-headed ant that pops up and so forth. So there are a lot of what you might call glitches. And from the perspective of cognitive science,
Starting point is 00:05:09 this matters because you want to know, does this thing really understand the world? And I think the answer is no. I think something else is going on here. But I would say they look fantastic. They look photorealistic, detailed, sharp graphics. So there's something absolutely stunning about them. But there's also something, if you look carefully, for many of them, maybe not all, but quite a lot of
Starting point is 00:05:30 them, there are little glitches that don't quite match reality. But having said all that, and that was fascinating listening to you go through all those examples. I could, like, see the glitches in the woman in Tokyo video in real time as you were describing it. But having said all that, the speed at which this technology can improve is pretty breathtaking, right? Like if we look at another example, the OpenAI tool ChatGPT: when it was first released in the fall of 2022 versus now, it's gotten so much better. And what could that tell us about what Sora might be able to do even a few months from now?
Starting point is 00:06:12 Well, it has and it hasn't. The prominent idea in the field for the last few years has been sometimes paraphrased as scale is all you need. The idea is if we just make them bigger and bigger, they'll solve all the problems. And I've been arguing against that, saying that although they get better and better in many dimensions, there are still certain basic problems that they have. So here's what I think we will see for Sora itself in two years, because they will keep trying to make it better. Although there's not so much headroom given how much data and money I think
Starting point is 00:06:43 they already put into it, but there's surely some headroom. It will continue to improve for a while. But I think that these physical errors are going to continue. Another one was a glass that turns on its side and starts floating in air, and liquid falls through the glass. The lack of understanding of the everyday physical world we have seen in GPT-2, GPT-3, GPT-4. We've seen it in Sora. I think we will still continue to see it. We might see fewer errors as there's more data; they might reduce. But I think inherently this approach is not about representing things in the
Starting point is 00:07:19 world, saying I see an X here, a Y there. It's really about pixels and predicting patterns of pixels over time. And that's why we're getting some of these quirks. And I would actually be surprised if the quirks are entirely eliminated. I had a conversation today on Twitter with somebody about this, and they said, well, I think these errors will reduce by 80% in two years. And my reply was, well, if it reduces by 80%, but you're getting something like one per five seconds of video, that's still actually a lot. That's still enough that it wouldn't be good enough for a high-production-value film. It might be enough for an advertisement. It might be enough for a misinformation campaign where people aren't going to look that close.
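[Editor's note: to put numbers on that estimate, one glitch per five seconds is 12 glitches per minute, so an 80% reduction still leaves about 2.4 per minute, or more than 200 across a 90-minute film. And for readers who want the "predicting patterns of pixels over time" idea made concrete, below is a deliberately toy Python sketch. It is not OpenAI's method, which has not been published in detail; it only shows how a model can do well on a pure next-frame pixel objective while representing nothing about objects or physics.]

```python
# Toy illustration of a pure "predict the next frame's pixels" objective.
# Not Sora's actual architecture; it only shows that a model can fit
# pixel patterns over time without ever representing objects, so nothing
# in the objective enforces object permanence or correct leg counts.
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 100, 8, 8                        # frames, height, width
frames = rng.random((T, H * W))            # stand-in "video", flattened pixels

X, Y = frames[:-1], frames[1:]             # input: frame t; target: frame t+1
M, *_ = np.linalg.lstsq(X, Y, rcond=None)  # linear map: next frame ~ X @ M

mse = np.mean((X @ M - Y) ** 2)
print(f"pixel MSE: {mse:.4f}")             # low pixel error is the entire
                                           # objective; physics never appears
```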
Starting point is 00:08:03 I mean, certainly I didn't see some of what you pointed out in those examples, right? Like I didn't see, you know, those glitches in the woman. A lot of people didn't see the four legs on the ant. Yeah. Yeah.
Starting point is 00:08:14 I also didn't see the four legs on the ant. Most people didn't. Yeah. Let's say that is correct, right? With these models, you hear people say that the technology is never going to get worse than it is right now. But let's say it gets only a little bit better, and then it kind of plateaus. It could get more dangerous, but it won't get technically worse. So maybe then, yeah, let's talk about that. Let's talk about how it could be used in the real world, I guess, either for good or for evil. Could we start with good?
Starting point is 00:09:01 Sure. I mean, I guess the positive use case for Sora is it enables a lot of people to be creative in ways that they couldn't before. So I couldn't make a 30-second video or wouldn't have the patience for it. And now, once it becomes available to the public and assuming it's a reasonable price, lots of people will be able to make short videos and that will be fun. It's going to be empowering in something like the way that Photoshop is. Not exactly, because you have more control over Photoshop, but you can do things faster in this system. So it's kind of a trade-off. So it will be of some use to creative people.
Starting point is 00:09:41 A great use case might be if you want to make a movie, you used to often draw storyboards one scene at a time. And now even if you don't have drawing skills, you could kind of do that with Sora. They wouldn't be perfect. You would still need to remake them. So, you know, there are limits. But I think as a prototyping tool, it's already, you know, looks like it's pretty good. I haven't actually used it. They haven't released it to the general public, nor to the scientific community in any broad way. But if it's as good as it looks in the demos, then it's a very cool prototyping tool. And, you know, undeniably fun. Yeah, I was thinking, you know, if I had written like a short story, right, I could probably use this tool to try and create a short video
Starting point is 00:10:27 about that story or something like that, right? Even though you say that it has these limitations. If you wanted a little 90-second or, let's say, 30-second video that was kind of like an advertisement for your story, I think you could easily do that. I think if you wanted a 10-minute video, it would actually turn out to be really frustrating. So like, it would probably make the protagonist look different in each scene, the lighting would vary and so forth. And I suspect, although this is really just speculation from the 12 or so videos that I've seen, I suspect that it would get very frustrating if you wanted to do anything longer than a single clip.
Starting point is 00:11:06 So in many ways, this system is similar to DALL-E, or DALL-E 3, or Midjourney. And those are really good. Which was images, right? Like, yeah, still images. Exactly. They make still images. But people who have played around with them a lot often get frustrated. So for amateurs, again, it's a fantastic thing.
Starting point is 00:11:24 But if you want something precise, you'll have, for example, text that isn't really right, and there's no way to fix that text, or you want it from a slightly different angle, and it can be hard to get it to do exactly what you want. So in the near term, anyway, if you write a short story, and you really want to turn that into a movie, I think you're going to be frustrated. You know, if I'm one of these people in these creative industries, in the long term, what might the implications be here? Like, should I be really concerned, in your opinion? It's a double-edged sword.
Starting point is 00:11:59 So one thing we haven't talked about is copyright. And for working artists, this is a real problem. You know, if your living is making concept art to help a movie designer or director figure out what they want, you're in trouble because this stuff can do a bunch of that. If your job is to do set design throughout a film, it's not really going to do that. But to some extent, you know, Hollywood artists are already threatened.
Starting point is 00:12:24 Some film studios should be really upset. You probably know about the New York Times lawsuit against OpenAI, which showed that OpenAI could essentially plagiarize some of their work. It was obviously trading on their data. Yeah, just for our listeners, the language model behind ChatGPT had used an enormous amount of New York Times articles to help train it. So that's why they're suing. That's right. And in the lawsuit, they have a hundred examples where things are almost word for word identical over a space of paragraphs
Starting point is 00:12:58 between what ChatGPT would do with a prompt that was the first few words of a story, and then it would basically regurgitate the story. So Reid Southen, who's an artist who's worked with places like Marvel, and I did some experiments in December, which we published in January in the IEEE Spectrum, showing that the visual models do the same thing. So for example, you can say something like, draw me a picture of an Italian plumber, and you're probably going to get Nintendo's Mario character back. Well, Nintendo's not going to like that. So on the one hand, the film studios, I would suspect, are going to be quite upset about this. On the other hand, they're like, hmm, can I save money if I use this? Yeah, I was just going to say that.
Starting point is 00:13:33 So I think some of the film studios are hanging back trying to decide what to do. I think a lot of people are watching that New York Times lawsuit. If it actually goes to trial, it could set a huge precedent either way. It could almost shut down the whole AI industry, or well, at least this part of it, the generative AI industry, or it could give them license to use things. Most likely, it'll be a settlement. So I've been joking that 2023 was the year of generative AI, and 2024 is the year of generative AI litigation. There's going to be so many lawsuits filed. In the Dragon's Den, a simple pitch can lead to a life-changing connection. Watch new episodes of Dragon's Den free on CBC Gem.
Starting point is 00:14:27 Brought to you in part by National Angel Capital Organization, empowering Canada's entrepreneurs through angel investment and industry connections. Hi, it's Ramit Sethi here. You may have seen my money show on Netflix. I've been talking about money for 20 years. I've talked to millions of people and I have some startling numbers to share with you. Did you know that of the people I speak to, 50% of them do not know their own household income? That's not a typo, 50%. That's because money is confusing. In my new book and podcast, Money for Couples, I help you and your partner create a financial vision together.
Starting point is 00:15:05 To listen to this podcast, just search for Money for Couples. So I think this is a great spot for us to talk about misinformation because it seems like the big one here. And it's particularly concerning to me as a journalist. It's the potential implications for deepfakes, for misinformation, for videos that look real with real famous people, for example, doing and saying things that they never did. And talk to me a bit about your concerns there, what it could mean for elections, for democracy, for our understanding of what's even real. All of those things. I'm frankly terrified.
Starting point is 00:15:47 So I actually posted about the four-legged ant as an example of this. So there are the obvious cases with elections, right? We've already seen this. There was at least one election that may have turned on a deepfake. This is Michal Šimečka. He is the leader of the main opposition party here in Slovakia. And on the eve of this country's elections last year, he was the target of a deepfake. Just two days before voting began in that high-stakes election, this audio tape began circulating online. It purported to be a recording of a conversation in which Šimečka talks about stealing the election. His party, Progressive Slovakia, went on to lose the election by a few points.
Starting point is 00:16:25 You know, there are like 70 elections around the globe this year. The technology is improving. People are using it more and more. So there's definitely a serious chance of having an impact on elections. You can expect that, you know, in October 2024, we'll see, for example, deepfake footage of, you know, one of the U.S. presidential candidates falling down the steps in order to make them look infirm. That kind of stuff seems inevitable. And then there's another problem, which I would call the pollution of the information ecosphere, which is scammers like to make fake stuff and monetize it.
Starting point is 00:17:00 And we've already seen this in different ways. Last year, people had fake websites saying that, um, Mayim Bialik, if I've got her name right, was selling CBD gummies. Well, she wasn't, but they sold ads off of it. They didn't care that it wasn't true. Um, now, the New York Times had an article yesterday about fake books, and some of my friends have been hit by this, where they will write a book and then somebody writes a book with a slightly different author name, slightly different title. It takes them like five minutes with ChatGPT. They put it on Amazon, and then the real authors get hurt.
Starting point is 00:17:32 And there are cases where people are putting out books about how to eat mushrooms, and they're probably filled with mistakes because we know that these systems are not accurate. And so there's going to start to be increasing risk that people are going to get bad information of that sort. With the four-legged ant, somebody said to me, well, what's the big deal about, you know, having a video with a four-legged ant? Well, the problem is that we are soon not going to trust any videos because there are going to be so many of these fake videos put out in order to try to make money on YouTube and so forth. And we're going to reach a point where we just have no idea what to believe or not. And also, like politicians could say that something was fake when it wasn't actually fake.
Starting point is 00:18:11 Maybe they actually said that thing and got caught saying it on camera. That's going to be a very common thing, right? Essentially, the evidential value of video is going to drop to zero. There are some efforts; like, Adobe is leading a very nice effort to try to watermark videos. So if you have the right camera, it will actually give some kind of authentication signal. So there are some ways to deal with this, but I don't think they're enough. I think they're like, you know, fingers in dikes, and there's just going to be an enormous amount of this stuff. And, you know, the only solutions I see that work even a little bit are to have governments demand that any AI-generated content is labeled as that.
Starting point is 00:18:51 And maybe we will get laws passed that do that. I think the EU AI Act is somewhat in that direction. It is adopted. Congratulations. The European Union's parliament passing a draft law restricting AI, limiting the use of facial recognition software, and requiring AI companies to disclose more about the data behind their programs. Even that's not going to be enough because a lot of cheaters are going to try to evade whatever protections we have. So it's going to be like gun laws where if you actually catch somebody, you can do something about it, but it doesn't mean that it's literally impossible for somebody to secure a gun, somebody that shouldn't have one. We are going to be in a kind of cat and mouse chase forever on this stuff.
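[Editor's note: the camera-side "authentication signal" Marcus mentions refers to provenance efforts such as the Adobe-led Content Credentials work built on the C2PA standard: the capture device cryptographically signs a manifest that binds a hash of the footage to metadata about its origin, and any later edit breaks the signature. Below is a minimal Python sketch of that verify-or-reject flow. It is not the real C2PA format; real systems use certificate-based signatures, while this toy uses an HMAC shared secret, and every name in it is hypothetical.]

```python
# Miniature sketch of signed media provenance (verify-or-reject flow).
# Not the real C2PA format: actual systems sign with device certificates;
# this toy uses an HMAC shared secret, and every name is hypothetical.
import hashlib
import hmac
import json

CAMERA_KEY = b"secret-burned-into-camera"   # hypothetical device key

def sign_video(video: bytes, metadata: dict) -> dict:
    # Bind a hash of the footage to its provenance metadata, then sign.
    manifest = {"sha256": hashlib.sha256(video).hexdigest(),
                "metadata": metadata}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(CAMERA_KEY, payload,
                                     hashlib.sha256).hexdigest()
    return manifest

def verify_video(video: bytes, manifest: dict) -> bool:
    claims = dict(manifest)
    signature = claims.pop("signature")
    payload = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(CAMERA_KEY, payload, hashlib.sha256).hexdigest()
    # Any change to the pixels or the metadata invalidates the manifest.
    return (hmac.compare_digest(signature, expected)
            and claims["sha256"] == hashlib.sha256(video).hexdigest())

clip = b"raw video bytes"
m = sign_video(clip, {"device": "ExampleCam", "captured": "2024-02-20"})
print(verify_video(clip, m))                # True: untouched footage
print(verify_video(clip + b"edit", m))      # False: tampering detected
```

[As Marcus notes, this only authenticates honestly captured footage; nothing forces a bad actor to sign anything, which is why he calls such measures fingers in dikes.]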
Starting point is 00:19:54 This idea that governments might be able to step in and regulate here and label everything generated by AI, though. Even if they took a real run at that, do you even have confidence that they can do that? I keep thinking about that moment several years back when Mark Zuckerberg was in front of, you know, lawmakers and, you know, one of the lawmakers asked him, like, essentially, how does Facebook make its money? We believe that we need to offer
Starting point is 00:20:18 a service that everyone can afford and we're committed to doing that. Well, if so, how do you sustain a business model in which users don't pay for your service? Senator, we run ads. I see.
Starting point is 00:20:32 And you're like, oh, my God, they don't even understand how it works. And so I guess the thing with AI is that it strikes me, though, that the people who made the AI don't even understand how it works. So first of all, nobody fully understands current AI. We can do kind of empirical science to say, in this kind of circumstance, what does this system do? But nobody understands it in the way that we understand a simple set of equations where we can just plug in the numbers. We have to actually run these models, which is expensive.
Starting point is 00:21:07 It takes millions or even billions of dollars to train this, and then it takes a lot of money to run it. And you can run experiments on it, like it's an alien from another planet, but it's not like we really fully understand exactly how it works. Nobody does, not at the companies, not in the government, not outside scientists like myself,
Starting point is 00:21:25 which is cause for concern, right? We have the engineering behind bridges and airplanes and so forth well under control. We don't have the engineering around AI well under control. Then there's a sort of milder sense, which is like how many people in the Senate understand the basic notion that you train on a large set of data, that the quality of your results depends on the amount of that data, the quality of the data, even basic stuff like that. And probably, it's a lot less than 100%, but it's a lot higher than zero. I don't know if that gives me... Gives you great hopes?
Starting point is 00:22:01 If I find that reassuring. You should be only slightly reassured. You should have that "well, I guess it could be worse" reaction. Yeah. Oh, great. That's always good. Yeah. And I guess also, too, you know, this stuff is coming from all over the world.
Starting point is 00:22:18 So it's hard. How do you kind of regulate a group of people who are operating out of Moscow too, right? There are those problems too. So, you know, the U.S. has to get its own house in order, but then what do you do globally? I have been advocating for some kind of global governance. And there are questions about, like, would any country give up its sovereignty in order to be part of some global thing looking out for the dangers of AI? I think the answer is actually yes. You know, we've done that, for example, for nuclear weapons.
Starting point is 00:22:48 It's hard to negotiate these treaties. They take like decades, not weeks, or a decade, I should say, you know, several years to negotiate treaties like this. But I think every country, I mean, Russia is on its own little axis, so I don't even know what to say about Russia. But China, I've actually talked to a bunch of people from China, including fairly high academics that are in close contact with the government. China, you know, wants the same things that we do, which is they want an orderly universe.
Starting point is 00:23:15 They may have a different definition of that than we do, but they want an orderly universe. They want their citizens to be safe and so forth. So like, you know, China doesn't want its citizens to all be robbed by cyber criminals either, right? Right. I mean, right. Well, I imagine too that technology out of control could threaten the leadership of the Chinese government, right? Like, every government should actually be worried about what Ian Bremmer calls a technopolar world, which would be a world in which, you know, most of the power is in the hands of a
Starting point is 00:23:45 few unelected tech companies and not the governments themselves. We're already drifting towards that, right? Well, talking about, yeah, talking about those tech companies, you know, what about what they're doing, like the companies that are kind of leading the way here. So OpenAI and Sam Altman, the CEO of OpenAI, would say that they're putting up guardrails and that they're doing this really carefully. And what do you think? So it's very difficult with current technology to build guardrails that really work. So you have two problems. One is that they can be too restrictive.
Starting point is 00:24:22 So people start complaining that the systems are too politically correct or too lazy and so forth, as they won't engage in perfectly reasonable requests. Like I asked the system, what would be the religion of the first Jewish president? And the guardrail intervened and said, well, it's impossible to tell what the religion of the first Jewish president would be. That's just absurd, right? So there you have a guardrail that's too restrictive, and sometimes they're too loose. So I testified in front of the U.S. Senate last May. For my Senate testimony, I made, or had a friend actually make, an illustration in which there was a news story purporting that U.S. senators were involved in a conspiracy with space aliens to keep human beings a one-planet species. And the guardrails were not enough
Starting point is 00:25:05 to stop that whatsoever. So the truth is that current AI does not really understand humans, does not really understand the world. And so the guardrails are hit or miss. Gary, just to end this conversation, you've been working in AI for decades now. And I know that you have said that this kind of use of AI didn't even occur to you. And I'm just curious to know, what did you imagine might be the uses for this technology? Like, where did you think we would be right now? You know, in some ways we're behind. I thought maybe we would have the Star Trek computer by 2024. And then the other side is I thought it would be used for good, like it would solve medicine. You know, like I thought by now we would have solved Alzheimer's.
Starting point is 00:25:59 Come on. Like, you know, many, many neuroscientists have worked on Alzheimer's, many companies and so forth. We can't seem to get those questions right. You'd think, well, maybe AI could help us with that. But instead, the number one, you know, sexiest application of AI right now is making videos that are kind of like screwing artists whose materials are being used to train them. Like, that's not such a pro-social application. And I didn't go into AI to, like, make some money by ripping off artists.
Starting point is 00:26:26 That's not why I'm in this field. Too much of it right now has been about making a quick buck without, I think, enough concern for the ethical consequences of what these systems are used for. Gary, thank you so much for this. That was really eye-opening. Thank you for giving me a chance to say it. And I really appreciate being back. Thanks for having me again. All right. That is all for today. I'm Jayme Poisson. Thanks so much for listening.
Starting point is 00:27:00 Talk to you tomorrow.
