Limitless: An AI Podcast - OpenAI Released the Sora 2 Video Platform: Here's How it Works

Starting point is 00:00:03 Just yesterday, OpenAI released the most cutting-edge, brilliant video generation model that has ever existed in history. The videos are unbelievably realistic. They sound real. For the first time there is dialogue, and you can even copy yourself and clone yourself into this new application as an AI version of you. It's weird, it's creepy, it's a little bizarre, but it's very effective. And the way that they packaged it is really interesting. We're going to get into all that, but just before we do, I want you to kind of introduce everyone. What is Sora 2? What is this new software, this new model all about? I think the videos speak for themselves. I'm playing a little recording here from OpenAI

Starting point is 00:00:41 themselves, which shows some of the videos. So you can see a mixture of different crazy things, like a man riding two horses at once, a dog in space eating tennis balls, crazy stuff, but also some super realistic stuff as well, Josh. Like, as you said, you could feature yourself or your friends in it, scaling a mountain if you've never climbed before, or skating with a camera above your head, as you're seeing on the screen right now. Just crazy things like this.

Starting point is 00:01:06 So this is OpenAI's latest and greatest text-to-video model. Now, if that sounds familiar, it's because we've spoken about Google's Veo 3 in the past. This is OpenAI's punchback and response to that. And Josh, I wanted to hate it, but it's really, really good. It's impressive, man. It's really impressive. And it's impressive for a few different reasons. First and foremost, the graphics are insane.

Starting point is 00:01:33 They're like super realistic. It reminds me of, you know, do you remember the Will Smith spaghetti meme, or the test rather? Where we would run it through like the early versions of Midjourney and it would just look so ridiculous. Now I actually did the spaghetti test myself. It looks super realistic. So the graphics are insane. But Josh, the physics. Have you tested out the physics for this thing?

Starting point is 00:01:57 I actually have a demo that I want to demonstrate here. Look at this. This is not a real person. Yeah, this is one of the coolest parts for me. I think we're going to get into a lot of reasons why most people actually hate this release. But I think one of the reasons why I love this release, and one of the things I think this model is kind of disguised as, is this really high-end physics engine, where it really has this deep understanding of the world around us and as a result is able to generate this unbelievable content. I mean, this is the textbook glass test for people who are listening.

Starting point is 00:02:27 When you pour water in a glass and an arrow is behind it facing one direction, upon the water reflecting against the arrow, it should flip directions, and it passes the glass reflection test. So what this shows is there actually is this very deep understanding of physics. And I think that's one of the biggest driving forces to making this model feel so real. I think as humans we're kind of just used to these expectations we have of how the world works. And when they break, that's when the video starts to feel fake. This very much does not break. It looks really good through and through. So in addition to this amazing physics engine, there was one additional feature that was equally as exciting, for me at least, which is audio and dialogue.

Starting point is 00:03:01 Ejaaz, please, walk us through. Yeah, so in the opening trailer that OpenAI released to demonstrate this product, which you can see on your screens now if you're watching, it says everything you are about to see and hear was generated by Sora 2. Let's watch a quick clip. One year ago, Sora 1 redefined what was possible with moving images. Today, we're announcing the Sora app. This isn't Sam, by the way.

Starting point is 00:03:28 But it sounds like him. And it looks like, yeah. It's the most powerful imagination engine. So the point is, if you wanted to get a character to speak and say something, you can type up the dialogue. If you wanted sounds and effects that you can hear that match the environment that you've described or the video that you've generated, they automatically slot in. And I've tried this in a few different ways since I started.

Starting point is 00:03:53 I asked it to put me in a comedy skit, and it added me into a comedy skit where there was an audience laughing, jeering, or booing me offstage. I got it to get me to scale a mountain, and you could hear like the rocks kind of crumbling. So the audio is also super cool. But Josh, I would actually say

Starting point is 00:04:12 the craziest feature that I saw or that I think has made this thing go viral is this thing called cameo. Have you heard about it? Cameo is my favorite part. In testing yesterday, after all the time that I spent with it, the one thing that I uniquely

Starting point is 00:04:27 took away was, oh my God, it looks just like me. It looks just like Sam. It looks just like you. I made this like funny collaborative video with you just to check it out. And it looks amazing. And I think this is the first time where a company has done what Google was able to do with Nano Banana, and that's create character continuity, where you can actually inject yourself into the AI content. And it looks good. It's not perfect, but it's close. And I think that's what was so interesting about this release as a whole. And that's what's making it kind of go viral over the last 24 hours: the fact that you can inject yourself into the video itself. And we have some funny examples of this, right, that we're showing here? Yeah. I mean, just to kind of summarize, what this does is

Starting point is 00:05:05 you can basically add yourself into any video that you want to create. So you could be the protagonist, you could be the side character or support character. It's whatever you want. But what's cool is you could also feature your friends or anyone that you follow, maybe your heroes or the influencers that you like. And that's where it gets really crazy. The way I was thinking about this, Josh, is it kind of breaks the barrier of knowing someone. Whether you know them directly or you don't know them, you can feel closer to that person that you follow, or whatever that might be. So in this example that I have on screen, this guy who has never met Sam Altman in his life decided to create a video where he goes on an adventure with Sam. And it looks super realistic. Sam follows him all around.

Starting point is 00:05:53 He interacts with OpenAI employees, which, again, he's never met. And what that resulted in was Sam retweeting it and saying, ha, Gabriel, this is hilarious. Like, we should hang out at some point in real life, right? So it's just this kind of like weird interaction or medium that I've never seen before. But there's also some questionable examples of this. In this video that I'm showing on screen right now, someone cameoed Sam Altman shoplifting in Target. Right.

Starting point is 00:06:20 Oh, can you please turn the audio on for this? This is so good. Oh yeah, absolutely, yes. I want people to hear. Please, I really need this for Sora inference. This video is too good. That's great. And for those listening, it's Sam Altman.

Starting point is 00:06:33 He's in a Target, and it's like CCTV footage of him stealing a GPU and trying to run out the front door. It looks real. You guys, if they would have led their promo efforts with this video, if they just would have dropped this without context, that would have been amazing, because it's so good, it's questionable. Like, if I were to see this without understanding that they had a new video model, it would take me a second to figure out it's not real.

Starting point is 00:06:51 It looks really good. I think you also touch upon an important point, Josh, which is you found this video funny, right? You know it's AI, but it also looks super realistic. And so you're like, Sam shoplifting in Target, that's something I would never expect to see. And the point is, memes are so viral, and OpenAI realized that. We're going to talk a bit more about the social app that they just created, but I think the point around them allowing you to cameo anyone, including yourself, in any video means that they instantly have this viral network effect

Starting point is 00:07:24 where people want to watch the content that is on their feed because it's created by their friends, the people that they know or that they follow, and their friends are doing the same thing. So it has this kind of viral effect where you just kind of want to see more and more content. And Ejaaz, this isn't the first time we've heard the word cameo used in this way. Are you familiar with the Cameo platform?

Starting point is 00:07:45 Oh, yeah. So the way it works, for the people who aren't familiar, is you pay influencers or famous people or celebrities a certain amount of money and they will record a video of themselves saying something nice to someone. So they'll be like, hey, happy birthday, whoever, and it's normally a funny gift, or you'll see funny memes about it. But this is the AI version of that Cameo, where if you're a celebrity, it's probably very effective, and lucrative even, to insert yourself into the platform. Not that you'll make money, but just use your likeness. I mean, I have seen Sam Altman on my timeline more in the last 24 hours than I have in my entire life.

Starting point is 00:08:18 It's funny to see him doing things that are out of character. And if you want publicity, I mean, this is a great opportunity. In the settings, when you create your cameo, your digital avatar, you're able to set the privacy settings. So you can choose that no one can collaborate with you, that only mutuals can collaborate, or that anyone can openly collaborate. And for the ones that have open collaboration like Sam, it's really fun. I've kind of loved watching it, because you see this person who's normally very, very stoic, very proper in his portrayal around the company, doing goofy things like stealing GPUs from Target. So yeah, I think cameo is pretty cool. Yeah, and of course there's also the doomer takes, which I've seen a lot of over the last 24 hours, which is like you're stealing someone's IP, you can put them in a precarious situation or spread misinformation, all of that being correct. And I think it's going to come onto OpenAI's shoulders to basically moderate and curate a lot of the content and make sure there is no copyright infringement. Fun fact, actually, OpenAI announced when they launched Sora 2 that they're probably using a lot of copyrighted material.

Starting point is 00:09:18 And if someone that owns the IP of something that they're seeing, for example, Super Mario, wants to sue them, they have the option to opt out. They just need to reach out to them. But very aggressive stance that Sam is taking here. But Josh, the other headline news about this Sora 2 launch isn't about the video model itself, but how they surface it to users. They created a brand new social media app. This is where things get a little weird because, you remember, just like two days ago, we were just like, hey, this new Meta feature that creates AI content that kind of looks like TikTok, we don't really like. Unfortunately, or fortunately, I guess,

Starting point is 00:09:56 depending who you're asking, OpenAI did the same for this release, where in order to access Sora 2, you actually need to download a new app, get a beta code, sign up, and then scroll an algorithmic feed that surfaces these videos that we've been showing. And it is amazing tech, but an interesting way of delivering it. Now, I do want to give OpenAI credit. Their advantage is that they almost always close the product loop. So like they had GPT and it kind of crystallized into ChatGPT. Now they have video gen and it's kind of crystallizing into Sora. So they're taking the tech and they're doing what Google has kind of failed to do, which is create good products around it to like lock it in its place. But this product seems a little questionable. I think we've kind of notoriously

Starting point is 00:10:37 been against the AI slop. This is an AI-exclusive platform. Basically you sign up and you are only allowed to post AI-generated videos, whether it be with your face or without. The entire algorithm is just designed to get you to scroll this feed. And we actually have an example of this right now, but there's a cool additional thing that we saw on top of this. I don't know if this is the first time, but one of the earlier times, which is sign in with ChatGPT. I haven't seen this out in the wild just yet. I think they offered it to a few third-party developers, but this is the first instance where we see a really curated sign-up process. For those of you listening, when you log on to the app,

Starting point is 00:11:13 you sign on with ChatGPT or Gmail or whatever that might be. And I would say it takes under 90 seconds to sign up. The coolest part is what I would consider a five-second facial scan and voice recognition. You're seeing it on screen right now, where Josh is looking at a bunch of numbers. He's reading them out. And then it's asking him to direct his head in a particular direction.

Starting point is 00:11:35 So it gets all kinds of angles of your face so it knows how to portray you. And after that, you're done. It can basically put you in any video and make it sound like you and look like you. Everything from you jumping up high on a trampoline to you speaking and it mimicking your lips and accent perfectly. It's pretty insane. So Josh is like retaking it now. I think a lot of people actually retake this scan because it affects how high fidelity it is.

Starting point is 00:12:04 And then once you're in the app, you'll see the screen here where you can basically define where you want your content to be spread. whether it's to only yourself, people that you approve, or everyone. And I've kind of gone rogue, Josh, and I've gone with everyone. But that's the onboarding process, super simple and easy. And they've done the viral thing where I think each person that logs on gets like five invites and you just send it to five more people and then they get five more invites. So I don't know how many users they've taken on board, but it's a pretty slick process. The most amazing part about the onboarding process was how easy it is to clone yourself.

Starting point is 00:12:38 So that process that you saw where you scan your face, it has two purposes. One is to actually verify that it's you. So they're checking that you're actually a real person and you're not trying to clone someone else that isn't you. And then the other is, as you're saying these three numbers, which serve as verification, they also serve as voice identification. And using just three numbers that you say out loud, they are able to generate a pretty accurate version of your voice to then use in the videos. So I think the onboarding process was very clean, very slick, very impressive. Normally when you're feeding AI models data to emulate your voice, you need to give them like quite a few words, at least a couple of sentences. This was three

Starting point is 00:13:14 numbers. So whatever magic they're doing, it's working. It works really well. And I guess now we can kind of get into takes, right, of what people are saying about this because it's been a mixed bag of reviews from people all over the internet, right? Before we get into takes, I just want to say I wanted to hate this product, Josh. To your point, we spoke about Meta's version of this that they announced a few days ago. And we were like, this is the end of entertainment. Like everyone's going to read or watch garbage slop and our attention spans are going to dwindle to zero. But after I started using the product, I was like, I can see why I would want to engage with my friends more with this. I can see how this could potentially be a productive thing, a very

Starting point is 00:13:55 creative thing. And I think Justine Moore summarizes the difference between whether this new social media app is competing with Meta or whether it's competing with TikTok. She goes, OpenAI is building a social network, like the OG Instagram, and not a content network, like TikTok. They're letting users generate video memes starring themselves, their friends and their pets, and it sounds like your feed will be heavily weighted to show content from friends. This feels like a more promising approach. You're not competing against the other video generator players because you're allowing people to create a new type of content. And the videos are inherently more interesting, funny, engaging, because they star the people you know.

Starting point is 00:14:40 And I can't help but agree with this. I don't know whether featuring myself in a video makes me like it more because it's me. Maybe that's egocentric and I need to discuss that with my therapist later. Or maybe it's because it makes it feel more personal and at home and I can share it with friends because they know me. And I think it would be funny to kind of joke about it in some kind of way. Josh, do you agree or disagree with this take? I kind of disagree. I don't think this is sticky.

Starting point is 00:15:06 I don't think this is durable. I think it's interesting because of the novelty, because this is the first time you've been able to do this stuff. As this becomes normalized, as in like 24 hours later, I find myself being decreasingly excited about it. In fact, I haven't even opened up the app today, even though I probably spent like three hours on it yesterday. So I feel the drop off hitting very hard,

Starting point is 00:15:27 the novelty wearing off. I hope that they're able to figure out some sort of durable solution. But at the same time, Sam, he says here, in response to criticism, where the person who he's responding to says, Sam Altman two weeks ago said, we need $7 trillion and 10 gigawatts to cure cancer. And then Sam Altman today is saying, we are launching AI slop videos marketed as personalized ads. So we're getting mixed signals from Sam. And Sam's response to this, which I appreciate the fact that he responded. He said, I get the vibe here, but we do mostly need the capital to build AI that can do science. And for sure,

Starting point is 00:16:00 we are focused on AGI with almost all of our research effort. It is also nice to show people cool new tech and products along the way, make them smile, and hopefully make some money given all that compute need. I think he is kind of thinking about this in the sense that he needs to make a product. He wants to try to go viral. They need to raise money. They want more users. And this is a good attempt at that. Ejaaz, if you remember the companions from Grok and how viral that went when they launched it, it was a different strategy, but it was a viral strategy in order to get Grok into more people's hands, get more daily active users, get more people paying. And I think this is probably a similar strategy to that, that they're pursuing in parallel with this 10 gigawatts

Starting point is 00:16:39 and $7 trillion to cure cancer. Do you have similar takes, different takes? Yeah, I just think it's a necessary evil. I want to be aware of my biases when it comes to this, because, you know, in the example of Grok companions, I think Elon's an amazing builder. And he's doing so many other cool things, right? He's helping us get to space. He's helping beam 5G anywhere in the world, and many other things. But he's also building, you know, these AI companions that can kind of take over your attention. And the question then becomes, like, why is he doing this?

Starting point is 00:17:13 I think probably part of the reason is, you know, he needs to appease shareholders. He needs to bring in money somehow. And one of the main ways to do it is attention. Attention pays for everything, right? You get a subscription and off you go. But I do think that there is part of Sam Altman, Elon Musk.

Starting point is 00:17:33 and now Zuckerberg launching these slop machines, as people like to call them, that are trying to create a new social media. I mean, like, Josh, do you honestly believe that social media is going to look the same as it does right now in five to ten years' time? I don't know about you, but my answer is no, right? We're going to be in a world where there's going to be a lot of AR, VR, and AI-generated things. We're going to exist in that world.

Starting point is 00:17:57 So the question then becomes, what does that world look like, and what types of content are served to people? I think the other more nefarious take here is we may not be the ultimate audience that they're designing for, Josh. We've spoken a lot about how these big companies that control social media networks have teams specifically engineered around how to hook you on things. I think the nefarious take on this is these Gen Zers and younger generations that are growing up are used to short-form content. They've grown up on it. They don't watch long-form content. I don't know the last time a kid under 15 has said, like, you know, I've watched a movie that's longer than an hour and a half.

Starting point is 00:18:38 Probably not, right? So I think they're capitalizing on this trend. I'm not saying it's good, but I think from a business perspective, they're probably like, this is the social media content that people want to see versus a black and white biopic that is three and a half hours long. Yeah, that sounds about right. I think at worst, this is another TikTok, Instagram, Facebook feed. I think at best this is a really high fidelity physics engine in disguise that is just being wrapped in this wrapper so they can make some money, get some users. But we have some memes and some other takes, right?

Starting point is 00:19:12 What is this on screen here? Yeah, this is the new doom cycle for everyone if they weren't aware. So you start off with Meta introducing Vibes, an AI video feed. Now you have OpenAI introducing Sora 2, an AI video feed, and it says you are here. And then next it has, Google's probably going to release Vyotube shorts, an AI video feed,

Starting point is 00:19:35 and X finally is going to release Vine 2, an AI video feed. The point being, it seems like every company is trending towards some kind of new social media feed that capitalizes on AI-generated video specifically, because it's so easily digestible and can go so viral and can get more users on board,

Starting point is 00:19:53 as we've said. So yeah, I just thought it was a funny take. So some people hate it, some people love it, but there are some cool mediums that can be explored. You have a couple examples that you've prepared for us. Can you walk through this first one? Yeah, so I've seen some really creative ways that people have used Sora 2. In one of these examples, someone basically added browser-rendered HTML code as the prompt, basically adding a bunch of code to create a website.

Starting point is 00:20:19 And the video model ended up not only creating the website, but scrolling through the website and some click-throughs for that user. So kind of shifting the use of this tool from just purely entertainment into something that's quite productive. You can create not just static mock-ups, but real-life mock-ups that you can interact with, which I thought was super cool. This other video, I need to use the sound for this one, was someone exploring whether they could play a copyrighted song through a generated video. And the answer is, hell yes. I don't know how OpenAI is going to deal with all the... Yeah, I don't know how OpenAI is going to deal with all the lawsuits, which I definitely see coming that way, but I'm glad that he took a bold approach. So it sounds like it's trying to emulate the sound and the vibe and the cadence of the words, but those lyrics are all wrong.

Starting point is 00:21:12 That's not actually the lyrics of the song. So while I recognize the song it's trying to make, it's not the real lyrics. So I wonder how that's going to do. Is it the same tune, Josh, or is this something completely different? No, it's the same chords, same melody, same cadence in the singing. It's just the words. The words are all wrong.

Starting point is 00:21:27 So that's just like an interesting observation having known the song that like, hmm, okay, they're close. They're not quite copying it, but like immediately I know what song they were trying to copy. So that's just a funny side note. I think we also have one more example. Yeah, we have one more. Bob Ross, maybe. So we have the style of a classic Bob Ross episode, except this was never filmed and this

Starting point is 00:21:51 dialogue is not real at all. In fact, he references at the end the infamous, could 100 men beat a gorilla? Could a thousand men beat a gorilla? It's pretty funny. You can really get creative with it. And I think what we're starting to see today, day two, is people getting creative with it and pressing the boundaries of what's possible. Infringing, not infringing on copyright, but you know, using references that are popular in culture. I've seen a lot of Pokémon examples as well where Pikachu is just everywhere, like infiltrating D-Day, causing havoc, robbing banks, a whole bunch of stuff. So it's been funny to see the examples. But Ejaaz, you had a real-life

Starting point is 00:22:23 example that happened last night, right? Yeah. So yesterday when this released, we got invites and I was super excited to use it. And my girlfriend heard all this noise and was like, you know, what the hell are you excited about? And I showed her some Sora videos and she was like, I need to use this immediately, because she leads marketing at a big company and she was like, I could do so much with this tool. And so I sent her an invite and she played around with it for about two to three hours, Josh. And then we went our separate ways because we had to go to different events. And she went to this networking event, which had, I think, 200 of the top CMOs at some crazy brands and companies ranging across fashion, consumer products, all that kind of stuff.

Starting point is 00:23:04 And they went around the table, each explaining a bit of content that they watched recently that they enjoyed. And because Sora had just released, she was the only one to speak about Sora. And the only way I could describe the reaction was she was inundated with people asking for invite codes and for her to generate video prompts that they had come up with as she explained what they could do. And the point I want to make around this is I think we're underestimating how much this kind of a tool is desired by people that are in marketing, PR or promotional effects. And I think that whilst we view Sora 2 as purely an entertainment platform, I think there are wider, more enterprise or business-like effects that could end up creating quite a lot

Starting point is 00:23:50 of value for OpenAI. I mean, you could even use this for sound design. You don't even need to use the video. Create a song. You could copy a song without actually infringing on copyright. There's a lot of utility to the model, which is why I'm like, I'm not sure how to feel, because the model is fantastic. It is so good. It's just wrapped up in this kind of gross-looking wrapper, which is the AI slop factory. There's one interesting thing about the application, the Sora app, that we didn't mention, which is that for the first time that I'm aware of, you can actually type to the AI what you want your feed to look like, and then the feed will algorithmically readjust based on your prompt. So they are doing something novel in that sense where you can choose

Starting point is 00:24:30 your own feed, you can kind of curate the experience you want. But whether that experience is something I want to lean into and fully support, TBD. Like, I don't know. I don't know if it's going to be a serious problem because I don't know how long this will last, but I think the actual physical like digital model is fantastic. And I hope that people figure out ways to creatively extract value out of that. It's something cool. It's something novel. But I think they need to introduce a few new sticky loops before this becomes like a

Starting point is 00:25:00 truly viral thing. But that is all on the agenda today. Well, thank you for listening. I hope you guys enjoyed today's episode. As usual, Josh and I, we feel the vibes on things. When there's a new product launch that we get super excited about, especially ones that we can use in real time, we go hammer and tongs to give you the best content and updates

Starting point is 00:25:19 and our views on it as soon as we can. But you know what's more valuable than our views? Your views. The feedback that we've seen and heard from you guys via comments, likes, DMs, sharing has been invaluable. And I just want to encourage you guys to please keep doing it. The feedback is good. The feedback is bad.

Starting point is 00:25:37 Let us know. DM us and share it with all your friends. And we will see you on the next one.
