Better Offline - Monologue: Don't Be Scared Of Sora

Starting point is 00:00:00 This is an iHeart podcast. Guaranteed human. Run a business and not thinking about podcasting? Think again. More Americans listen to podcasts than ad-supported streaming music from Spotify and Pandora. And as the number one podcaster, iHeart's twice as large as the next two combined. Learn how podcasting can help your business. Call 844-844-iHeart.

Starting point is 00:00:19 Another podcast from some SNL late-night comedy guy? Not quite. On Humor Me with Robert Smigel and Friends, me and hilarious guests from Bob Odenkirk to David Letterman help make you funnier. This week, my guests, SNL's Mikey Day and head writer Streeter Seidell, help an a cappella band with their between-songs banter.

Starting point is 00:00:38 Where does your group perform? We do some retirement homes. Those people are starving for banter. Listen to Humor Me with Robert Smigel and Friends on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. Life is full of hurdles. So how do you keep going?

Starting point is 00:00:53 On Hurdle with Emily Abbate, we're talking with the most inspiring women in sports and wellness, from professional athletes, coaches, and Olympic champions, about the challenges that shape them and the mindset that keeps them moving forward. At our level, at this scale, being able to fail in front of the entire world.

Starting point is 00:01:09 Like, I can do anything. I can do anything. Listen to Hurdle with Emily Abbate on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts. Presented by Capital One, founding partner of iHeart Women's Sports. Hey, I'm Diana Maria Riva, and on my new podcast, How Hard Can It Be?

Starting point is 00:01:25 I call on my GenX squad from Ohio to Hollywood as we navigate midlife's most fantastic BS. Unfiltered conversations from night sweats to futas to scheduling sex. Wait, what sex? Is it just me or does every woman my age want to look at Pinterest instead of having sex sometimes? They say we can't polish a turd, but we're sure going to try. So let's get blunt with laughs, tears, or tears of laughter. Listen to How Hard Can It Be with Diana Maria Riva on the IHeart Radio app, Apple Podcasts,

Starting point is 00:01:54 or wherever you get your podcasts. All right, Matt, I've read the YouTube comments, and this time I want it so you do not cut me off with the music too fast, okay? Good, right, right. Let's go. This is this week's Better Offline monologue, and I'm Ed Zitron. A lot of you have been saying you want me to do something about Sora, and if I'm honest, I haven't wanted to because I find the whole thing so utterly pathetic. A few weeks ago, OpenAI launched a half-baked social networking app attached to a compute-intensive video and audio generator, and people immediately began to do two things: freak out, and generate as many copyright violations as humanly possible. All because OpenAI's original plan was to ask copyright holders to opt out of having their content presented in these videos. Sora spent several days covered in Nazi SpongeBobs

Starting point is 00:02:53 and Pikachus with guns before multiple Hollywood talent agencies, along with the estate of Martin Luther King Jr., intervened and complained, leading to OpenAI creating, to quote NPR, an opt-in policy allowing all artists, performers and individuals the right to determine how and whether they can be simulated, with OpenAI blocking the generation of well-known characters on its public feed and offering to take down material not in compliance. It's unclear what happened with Nintendo, but I imagine one of their 70 million lawyers attacked. And now we've got that out of the way.

Starting point is 00:03:22 Let's talk about Sora itself. I understand a lot of the people who listen are in film and TV and they're kind of scared, and I understand that you've seen a few clips that look kind of, sort of realistic, and that this, especially for those of you in the creative arts, is quite terrifying because your mind naturally assumes that these clips can be strung together into some sort of coherent whole. This isn't the case. Every single good, and I use the term loosely, Sora video is

Starting point is 00:03:45 cherry-picked from many, many, many terrible generations. Every time you use Sora, it's random. It doesn't matter how specific your prompt is or however many times you've used it, Sora is effectively a giant video and audio slot machine. You can never, ever guarantee that Sora will generate something useful, and as a result can never really budget for using it. The human eye is remarkably demanding, and little visual inconsistencies between scenes will make people feel weird and uncomfortable. Imagine that extrapolated to 10 or 15 seconds at a time and how difficult it will be to get something that makes visual sense before you have to think about things like, does this connect to the rest of the footage I'm using? Okay. So the majority of actual professionals

Starting point is 00:04:24 who would use Sora would not be using the app. They'll be connecting directly to the model on OpenAI's API. It's just, it's not done via a classical app interface. Now, then there's the problem of cost. This is where you really need to start worrying if you're building things with Sora. So let's start off with the first problem. Cost. So OpenAI offers two different Sora models. Sora 2, which they say is designed for speed and flexibility and is ideal for the exploration phase. And that costs 10 cents per second. And then there's Sora 2 Pro, which is either 30 cents or 50 cents a second depending on resolution. And I quote, it's the thing you go to for production quality outputs. So you're either spending one, three or five dollars for every 10 seconds of footage. And

Starting point is 00:05:09 with every generative model, the longer you generate, the higher the likelihood of hallucinations, which in the case of Sora means bizarre animations, inconsistent details or just flat-out useless crap. Then there's the problem of time. OpenAI's own documentation says that a single render may take several minutes. At the end of those several minutes, out pops a video that may or may not be of any use. OpenAI allows you to remix using more prompts, which allows some iterative development, but these remixes also cost money and also take several minutes. So let me walk you through a scenario. You're making a short film.

Starting point is 00:05:43 Let's just say it's 15 minutes long, which is 900 seconds. You ask Sora to generate a man putting on a hat. Your first eight generations, each taking four minutes and five dollars apiece, take about 32 minutes and $40. They don't really do the job. So you do two more, taking another four minutes apiece and ten more dollars. You finally, on the next try, get something kind of useful, which cost you another $5,

Starting point is 00:06:06 and then you realize you wanted him to wear a specific kind of hat. This happens all the time when directing stuff. There are minor changes you make that you realize when you're finally in the moment would look or sound or be better. So yeah, that doesn't go so well with probabilistic models. So shit, fuck, you've got to do something. So you remix it. Another four minutes, another five dollars. Fuck, wrong hat.

Starting point is 00:06:28 Four minutes, five dollars. Right hat, his hand blends through it for some reason. Okay, four minutes, five dollars. The hat's right, he puts it on, his eye blinks. One of his eyes just blinks three times for some reason. So you can't really use that. Okay, four minutes, five dollars, looks kind of good, different hat again. Four minutes, five dollars, hmm. You've now spent eighty dollars and over an hour generating a man trying to put on a hat. You're not really much closer to having useful footage, and as you remix it again and again, Sora keeps making these little errors, because that's how these models go.
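
The tally in this scenario can be checked with a few lines of Python. This is only a sketch of the arithmetic described above: it assumes every attempt, whether a fresh generation or a remix, costs $5 (10 seconds at the 50-cents-a-second Sora 2 Pro rate) and takes four minutes. Those are the episode's illustrative figures, not measured values, and the names in the code are made up for the example.

```python
# Illustrative figures from the monologue, not measured values.
PRICE_PER_SECOND = 0.50   # Sora 2 Pro at the higher resolution, in dollars
CLIP_SECONDS = 10         # length of each generated clip
MINUTES_PER_ATTEMPT = 4   # assumed stand-in for "several minutes"

# Every attempt described in the scenario, in order.
attempts = [
    ("first eight generations", 8),
    ("two more generations", 2),
    ("one kind-of-useful generation", 1),
    ("remixes chasing the right hat", 5),
]

def tally(attempts):
    """Return (total_attempts, total_dollars, total_minutes)."""
    count = sum(n for _, n in attempts)
    dollars = count * PRICE_PER_SECOND * CLIP_SECONDS
    minutes = count * MINUTES_PER_ATTEMPT
    return count, dollars, minutes

count, dollars, minutes = tally(attempts)
print(f"{count} attempts, ${dollars:.0f}, {minutes} minutes")
```

Under those assumptions the sixteen attempts add up to the $80 and just over an hour quoted in the scenario, with nothing usable guaranteed at the end.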

Starting point is 00:07:01 It's impossible to tell whether the next generation will be the one that works or whether Sora will spit out some new little fuck up. So the more intricate something is, the more expensive it gets. But you know what? You can find money places. You can't find more goddamn time. I guess you could have a separate computer running more, but that's still going to cost a bunch of money. How many of these slot machines are you going to run at once? How many times are you going to allow them to edit? How can you have a coherent vision when you've got multiple people generating things? You can't. But you know what? Perhaps the next generation will be great. Or perhaps it will be dogshit.

Starting point is 00:07:37 You have no way to know because that's the magic of generative AI. Yet these problems compound aggressively once you need any kind of visual consistency. The man now has to put the hat on and leave the house. How does the house look? Is the hat the same? Does he have wallpaper on his walls? Is there anyone else in the house? What kind of table?

Starting point is 00:07:55 Two chairs? One chair? Five chairs? How do you possibly keep all of these things consistent? You don't. You can't. That's part of what makes Sora so goddamn awful. It's built specifically to make you scared of it. To create superficially

Starting point is 00:08:09 impressive clips so that brain-dead Hollywood executives can claim it's the future, yet in a practical sense it's impossible to budget or plan or guarantee anything about what Sora might do, and this is pretty much across the board for these generative models for making video and audio. Now, I've heard from a few people that Sora is cheaper because it doesn't involve labour, which is something you could say only if you believed Sora would give consistent output. And really, the only thing that a probabilistic model like Sora can do is guarantee inconsistency. Even by Hollywood accounting standards, a generative tool that will cost hundreds or thousands of dollars to generate 10 seconds of shitty footage that is impossible to coherently connect to more footage is a really terrible idea and also very inconsistent in its costs too. And like I said earlier, there's the issue of time.

Starting point is 00:08:58 Every single entertainment product requires some sort of time budgeting, and it's impossible to say how long it will take Sora to generate something. OpenAI doesn't even specify what several minutes means, meaning you can't really plan a production using it. Sora isn't cheaper, Sora isn't easier, and Sora certainly isn't more efficient. But you need to remember also that generative video models have been around for over a year, and they're not really seeing mass use. Now, if this thing were capable of making anything truly useful, you'd see it everywhere right now, but you are seeing a little bit of it, and I do want to address that. You probably saw Kalshi's ad and heard that it cost $2,000 to make and took only a few days, but I really encourage

Starting point is 00:09:38 you to look at the actual commercial itself. It's completely incoherent nonsense. Each shot is completely disconnected, with weird glitches and animations in the crowds. At one point towards the end, a woman is meant to say OKC, but the C part does not map to her mouth. It looks really bad. And the only way you could get away with something like this is having these quick hit shots. And also, please go and view the comments about this. People just rip the fuck out of this thing. Nevertheless, it was made using Veo 3, Google's generative video model, and it apparently took 300 to 400 clips to get 15 usable shots stitched together using traditional editing tools.
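
The clip counts quoted for the Kalshi ad imply a rough hit rate. A minimal sketch, assuming the reported 300 to 400 generated clips and 15 usable shots are accurate; the function names are invented for illustration, and averaging over all clips is a simplification rather than anything the episode claims:

```python
USABLE_SHOTS = 15  # usable shots reported for the finished ad

def hit_rate(clips_generated, usable=USABLE_SHOTS):
    """Fraction of generated clips that ended up in the final cut."""
    return usable / clips_generated

def clips_per_usable_shot(clips_generated, usable=USABLE_SHOTS):
    """Average number of generations spent per usable shot."""
    return clips_generated / usable

# Low and high ends of the reported range.
for clips in (300, 400):
    print(f"{clips} clips: {hit_rate(clips):.2%} usable, "
          f"about {clips_per_usable_shot(clips):.1f} generations per shot")
```

On those numbers, somewhere between one clip in twenty and one clip in twenty-seven made it into the ad.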

Starting point is 00:10:14 Now, the reason this cost 2 grand is that it sucked, and the reason you're not seeing more advertisers do this is because it's impossible to make a coherent video out of this footage. I realize most commercials you see on TV may feel chaotic or kind of bland, but they're remarkably precise, and the generative shots used for the Kalshi commercial are chaotic and fail to convey any real meaning beyond a person yelling Indiana or OKC. The only reason it cost so little was one guy put several days of prompting into it,

Starting point is 00:10:42 and the end result was shitty, and Kalshi didn't mind because this was a publicity move. Kalshi put out the commercial specifically so the media would write it up, and they succeeded because the media loves to feed on scary stories like AI is going to replace human actors. Since the Kalshi ad, PJ Ace, who made it, has made a few others, including a Popeyes wrap one where, again, go and look at the comments. I'm not linking to it, by the way. I don't want to send them any fucking traffic. But the Popeyes one, people are just responding saying,

Starting point is 00:11:08 this looks like shit, what is this? It's incoherent, it's inconsistent. But the funniest one I found was David Beckham's IM8 health supplement ad, which ends with a shot of the bottle of the product with a bunch of garbled generative text. It does not appear that PJ Ace has got a ton more work than this, probably because the outputs kind of suck,

Starting point is 00:11:27 and brands really do not like inconsistent things. And also, a fucking health supplement from David Beckham. Jesus Christ. Just say it's a private equity firm. Anyway, to conclude, I also want to be clear that the rates for these videos are heavily subsidized by big tech, just like every other generative AI product. While Sora might cost 30 or 50 cents a second right now, once the AI bubble bursts, these prices will either skyrocket or these models will cease to exist

Starting point is 00:11:53 for public consumption. The biggest clue I can give you is that Google only allows you to generate four or five Veo 3 videos a day on the $250 a month Gemini Ultra plan. That suggests that Google's video costs are brutal and that OpenAI is burning money by the bucketful to let you fuck around on the Sora app. I don't recommend you do that, but if you have, just know you're burning a hole in Clammy Sammy's pocket. I will add that you may worry about these models getting better. While they might be more nuanced in their ability to generate video in five or ten second bursts, generating longer or consistent videos is inherently impossible due to the probabilistic nature of transformer-based

Starting point is 00:12:28 models. In simple terms, these things are rolling the dice every time. The way you prompt them is what makes them generate, and they don't have minds or thoughts. They're just rolling the dice every time on whatever you say and trying to interpret what you mean. Human beings, by the way, are extremely magical. I think you really underestimate how amazing people are. When we direct someone on a film set, even like an assistant director, that person keeps the production moving and makes sure everyone gets what they need and pushes back on a director when something might be impractical. A director is a visionary, but also an actor is someone that takes interpretation and then is directed to do different things. But that direction is not a fucking prompt. Move your elbow.

Starting point is 00:13:12 Look this way. Look that way. The things that operate on a film or TV set are inherently different to just plugging words into a fucking model. And I get it. I get everyone in Hollywood who's scared right now. I get everyone in the creative arts, even, who is scared right now. I feel for you. These people are losing. These people are losing. This stuff does not work.

Starting point is 00:13:37 It's inconsistent. It's incredibly expensive on subsidized rates. And in the end, I really, really believe that once the bubble pops, these things are going away. Thank you so much for listening. Reach out if you have any thoughts. I always love to hear from people. EZ at betteroffline.com.

Starting point is 00:13:54 I love getting your emails, your weird little missives on Reddit. I'm truly blessed and I love you all. I love how many of you listen. I love how communicative you are. It's been a big week with the Anthropic exclusive. And yeah, I'm going to have a Radio Better Offline next week as well. Crap, I've got to go do an episode. Shit. Damn. Oh, well, I have the best job in the world anyway. Thank you for listening.

Starting point is 00:15:58 There are times when the mind becomes a difficult place to live. This is David Eagleman with the Inner Cosmos podcast, and for Mental Health Awareness Month, we'll talk with singer-songwriter Jewel about anxiety. I started living in my car, and then my car got stolen. I was having panic attacks.

Starting point is 00:16:21 I was agoraphobic. This is a month of deeply personal and honest conversations about what happens when the brain goes off course. Listen to Inner Cosmos on the IHeart Radio app, Apple Podcasts, or wherever you get your podcasts. This is an IHeart podcast. Guaranteed human.
