Everyday AI Podcast – An AI and ChatGPT Podcast - EP 211: OpenAI's Sora - The larger impact that no one's talking about

Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. I think most people have OpenAI's Sora all wrong.

Starting point is 00:00:52 So if you don't know much about it, OpenAI last week just released a new text-to-video model called Sora. And it is extremely impressive. And everyone has been collectively talking about just that, how impressive the video is. But there's something larger at work here that I don't really see anyone talking about. We're going to talk about that today and more on Everyday AI. Welcome, what's going on, everyone. My name's Jordan Wilson, and I'm the host of Everyday AI. Everyday AI, it's for you.

Starting point is 00:01:32 It's your daily live stream, podcast, and free daily newsletter helping everyday people learn and leverage generative AI. So if you haven't already, make sure to go to youreverydayai.com and sign up for the free daily newsletter. So every single day, we not only go over what's happening in the world of AI news, we cover kind of new tools, fresh finds from across the internet, but we also break down each and every podcast conversation in pretty great detail by me, a human writing this or other humans, right? So we tell you not just what's going on in the world of AI,

Starting point is 00:02:04 but a topic each and every day on how you can learn and leverage what we're talking about. I tell people it is a free generative AI university. No matter what you care about, you can go on our website. You can go listen to and watch now more than 210 different podcast episodes, videos, et cetera, and go read every single newsletter we've ever written. So it is, I still don't know any other source, any other single source that has more free generative AI education across every single medium. It's actually wild how much content we have there.

Starting point is 00:02:36 So before we get to that topic on what's actually the larger impact of OpenAI's Sora, let's start as we do every single day by going over the AI news. All right, so let's start with Scale AI and the U.S. Department of Defense joining forces. So Scale AI and the U.S. Department of Defense are partnering to develop a comprehensive test and evaluation framework for the responsible use of large language models within the Department of Defense. So this framework will allow the DoD to deploy AI safely and accurately in military applications. This partnership between Scale AI and the DoD will create benchmark tests tailored specifically to

Starting point is 00:03:22 DoD use cases to measure LLM performance and provide real-time feedback for war use. So pretty big news there worth taking a look at. Next, Google has released a new AI feature. And just like Gemini Ultra 1.5, this one is also a limited release. So Google is introducing a new feature called Shop with Google AI that allows users to search for products and generate AI-inspired outfit variations in a single click. That's interesting. So this will enhance the user experience by providing personalized and effortless shopping options. And it will obviously be driving a lot more business and growth for retailers,

Starting point is 00:04:05 partnering with Google. So right now, this is just slowly rolling out to a select few users inside the Google search app. So, hey, holler at us if you have access. I kind of want to know, right? Can I use AI to just make myself dress better? That's what it sounds like. All right.

Starting point is 00:04:22 And last but not least, the internet has all at once fallen in love with a large language model that is not new. So this could be a Valentine's Day hangover, but internet users have all seemingly fallen in love with an older large language model named Groq at the same time. So let me just say this out loud. This is Grok with a K, or sorry, not the Grok with a K. See, it's so confusing. So Twitter has their Grok with a K.

Starting point is 00:04:51 This is Groq with a Q. Okay. So Groq is actually a California-based semiconductor company. So Groq with a Q is a generative AI solutions company that has developed a unique technology called LPU, which is language processing unit. And this is their inference engine. And it's designed to accelerate the processing of large language models. So the reason why I think everyone is all of a sudden in the last 24 hours sharing about this

Starting point is 00:05:23 on social media is because last week, Groq with a Q won a large language model benchmarking competition. So now the internet is, all at the same time, going wild over it. And yes, you should check it out. It is very impressive. And essentially by using this kind of LPU language processing unit technology versus the general GPU technique that all other large language models are using, it can generate text-based output

Starting point is 00:05:54 much faster, right? Is the quality there? You know, maybe. You know, it just runs other, it runs other local models, such as, you know, models from Meta. So, you know, the quality is really just obviously dependent on what model you're using, but the speed is wild. So yeah, it's worth checking out. So we'll be sending a link and probably a video of that so you can just see it in our newsletter. So make sure if you haven't already, go to youreverydayai.com and sign up for that free daily newsletter.

Starting point is 00:06:26 All right. That's the intro. Now, I'm excited for today. I'm very excited for today to talk about OpenAI's Sora and the larger impact that no one is really talking about. And hey, to our live stream audience, thank you so much. If you're listening to it on the podcast, come join us live. It's a good time.

Starting point is 00:06:43 You know, you can network. That's the good thing. There's a lot of networking happening in our comments here at Everyday AI with our daily live stream. So you can come and connect with, you know, Tara, who's joining us here from Nashville, and Dr. Harvey Castro, a top voice in AI. Rolando, thanks for joining us, Nancy and Jay Douglas, everyone. I'm excited to have you all on here.

Starting point is 00:07:05 But hey, I want to know right now, what are your thoughts? You know, before I get too deep, and I'm going to tell you right off the bat, all right? So if you're on a limited, you know, time schedule here and you're like, all right, Jordan, get to the point. I'm going to tell you right away. Don't worry. I'm not going to drag you on for 20 or 30 minutes. But I want to know from our live stream audience, specifically, what do you think of OpenAI's Sora? All right.

Starting point is 00:07:27 So let me just, before we get there, ask one more question. And I want to know from everyone who's joining us live. Give me a yes or a no. Or if you're joining us on the podcast, I always leave in the show notes, you know, in the episode description. You can email me. You can connect with me on LinkedIn. I read all your emails, all your messages.

Starting point is 00:07:44 Sometimes it just takes a minute. But let me know. Should hot take Tuesday? This is hot take Tuesday. Right? Every Tuesday, I come with a hot take, you know, something that's outside of the normal AI news or bringing on guests, which is what we do the rest of the week. Should this be a call-in show?

Starting point is 00:07:58 Right? So the software we use to stream is called StreamYard. There's, there's a feature where we can kind of have a little waiting room and you can call in, just like this was, you know, an AM radio show, right? And you can call in with your opinion. So shout out to Nisi Adi, who dropped this recommendation yesterday. Should we do that for Tuesday? Should you come on and, you know, you can, you can join and, you know, we'll have someone kind of, quote unquote, take your call, you know, in the waiting room. And you can wait in line to come in and drop your hot take.

Starting point is 00:08:31 Let me know, yes or no. Does that sound fun? Or is it too early in the morning for you to come with some spicy hot takes? I want to know. Let me know, yes or no. Should this be a call-in? Or should this just be a random rant from me? All right.

Starting point is 00:08:49 Woozy says, do it. Old school radio style. I love some just call-in radio. AM, right? All right. Let's get cooking. And it's hot take Tuesday. I promise you we're going to deliver.

Starting point is 00:09:04 So let's just go over the facts and then I'm getting straight to the end point. All right. So Sora is a new text-to-video model from OpenAI. And it is very impressive. All right. It is very impressive. But let's just go ahead and tell you the three things that you need to know that are about the larger impact. And two of them are things that no one is talking about.

Starting point is 00:09:29 I'm sure maybe someone is somewhere. I can't find anyone. Ready? Here's three things that you need to know. And two that no one is talking about. So one, Sora is light years ahead of any other text-to-video platform right now. So including Runway, Pika Labs, and also Meta and Google have previewed their new text-to-video, right? So Meta's Emu Video and Google's Lumiere.

Starting point is 00:09:57 All right. It's also worth noting right now, out of the five that I just mentioned, so OpenAI's Sora, Runway Gen-2, Pika Labs 1.0, Meta's Emu, and Google's Lumiere, only two of them are publicly available right now. So only Runway and Pika Labs, right? So OpenAI's Sora is not publicly available. However, people do have access. So kind of red teamers, or those people putting it through safety precautions, and a select

Starting point is 00:10:22 group of visual artists. So yes, it is technically kind of out there, but not really. The only ones publicly available for all of us are Runway and Pika. But OpenAI's Sora is, it's in its own. I would say even to say it's in its own category is doing it an injustice. It is in its own sphere. You can't compare, at least early results. You literally cannot compare the quality of these products.

Starting point is 00:10:52 All right. So that's number one. And we're going to talk about that more. Number two, the timing of Sora, with all that's been going on in February at OpenAI, means that AGI is near. Artificial general intelligence is closer than we all think. Hot take Tuesday. We're bringing some fire.

Starting point is 00:11:15 And you know I got receipts. Stick around. Right. And then number three, I don't think anyone will catch the combination of OpenAI and Microsoft. All right. And I'm going to lay out, you know, a couple scenarios. But I think that they, that Sora and what Sora means specifically as it comes to artificial general intelligence, AGI, right?

Starting point is 00:11:40 If you don't know much about AGI, I'm going to get into it here in a minute. But that's essentially when, you know, when AI systems become smarter than the average human or the smartest humans at general things, right? You already have narrow kind of AGI, or narrow AI, that's outperforming humans in specific tasks. But AGI is when, okay, when these AI systems are better than most humans at general tasks. I think we're closer than we think. And I don't think anyone right now is able to catch the combination of OpenAI and Microsoft, right? Microsoft reportedly owns 49% of OpenAI with a large, many, many billion dollar investment

Starting point is 00:12:24 in the company. I don't think anyone's going to catch them. I really don't. I'm going to lay down a couple scenarios in which it could happen, but it's an unfair advantage right now. All right. So let's get to those three things. I cut them down.

Starting point is 00:12:40 Sora's light years ahead, number one. Number two, the timing of this means AGI is near. And number three, I don't think anyone will catch OpenAI and Microsoft. So let's look at number one. Let's look at number one first, right? And if you're joining us on the podcast, I apologize. You're not going to be able to see this very well. But I'm going to go ahead and I'm going to show some examples. All right.

Starting point is 00:13:01 So make sure to check the show notes. You can come see this. I'm sure other people have done this, but I took four different clips from OpenAI's Sora. Okay. The good thing that I liked in their research paper, OpenAI allows you to download their results, as well as see the prompt that they used to generate these results. But there's a lot more that goes on behind the scenes than just that, right? We don't know how many attempts, right?

Starting point is 00:13:29 Because if you've used something like Runway or Pika Labs like I have, sometimes, you know, the 10th attempt is better than the first, right? Or sometimes after 20 attempts, you'll get something you're like, oh, this is much better than the first attempt. So we don't know kind of the process of how OpenAI generated this, or it could have just been, yeah, here's literally one prompt, copy and paste, one shot. It could be. All right.

Starting point is 00:13:54 But I'm going to go ahead for our live stream audience anyways. And I'm going to share. All right. I'm going to share this video that I did. I took the exact same prompt, downloaded the videos from OpenAI. So we have OpenAI first, and then we have Runway Gen-2. All right. And I want you all to see it and I want to hear all of your thoughts as well.

Starting point is 00:14:15 I'm going to do some light narration here just to tell you what's going on. All right. So our first one here, we have woolly mammoths walking through the snow. And again, first we have OpenAI and second, we have Runway. So with OpenAI's, it is fantastic. So here we go. The video is playing. It looks somewhat real, right?

Starting point is 00:14:34 No one knows what a woolly mammoth actually looks like. And then we have Runway version two. It doesn't look bad, Runway Gen-2, but you'll see some of these woolly mammoths are kind of walking backwards, some are missing limbs, where the one from OpenAI is pretty sound, it's pretty solid. All right, and then we have OpenAI's Sora model with this kind of astronaut, theatrical, you know, wearing these red hats on another planet. The OpenAI one is fantastic.

Starting point is 00:15:03 Also, I'm going to pause here, and I'm going to say one thing that's worth noting, that I should have started with at the top of the show, is what you can produce, allegedly, right? A minute, a minute of video in OpenAI with a single prompt. And you'll see in this example here, this kind of astronaut red hat example from Sora, it actually cuts to multiple shots, which is something that right now you cannot do in Runway Gen-2 without regenerating, right? So you have a minute, right? So OpenAI's Sora creates a minute of video, splicing in, at times, multiple shots together.

Starting point is 00:15:36 Sometimes it doesn't. I'm sure there's ways to control that. And with Runway Gen-2 and all the other AI video models right now, it's essentially four seconds. You can kind of stitch them together and extend it. You know, there's kind of some workarounds so you can get maybe up to 16 seconds, but it's all kind of the same. So OpenAI's Sora model, great. Runway Gen-2 didn't do that good, right? It's just a random, random guy here in a motorcycle helmet, not really moving, nothing very theatrical. All right. Then we have our Sora. This is supposed to be a drone shot over a kind of like a gold mining town from

Starting point is 00:16:09 back in the day. And Sora looks pretty nice. It's pretty smooth. It does just that. Runway Gen-2, you know, it has a similar look and feel. But if you look at the subjects of the image, it looks like maybe two horses and one person. But if you look, kind of the person or the horse morphs into something else at the end, not super cinematic.

Starting point is 00:16:30 Doesn't really seem like a drone shot. All right. And then our last one here, OpenAI's Sora. It is a close-up. It is supposed to be a close-up of a woman's eye, kind of her stare. The detail on this one is mind-boggling. I wish I had more words this early in the morning to describe this one from OpenAI's Sora. But you can see the, oh gosh, I mean, the details of the eyelashes, the details of the skin, the lighting right from the pupil. There is a reflection in the

Starting point is 00:17:03 pupil, and even the reflection in the pupil seemingly has correct composition. This is wild. This video is wild. Right. And then in Runway Gen-2, you know, to Runway's credit here, this doesn't look real. But it's actually a pretty decent shot on this last one from Runway, right? You don't get the detail, but you do get a woman. Her skin tone is very, looks kind of AI generated, I guess. It looks, like, very AI generated.

Starting point is 00:17:33 The skin tones are too smooth. But there are some nice shadows. There's some nice lighting going on with this last one from Runway. So it's actually not too bad from Runway. All right. But long story short, if you're not going to watch this one, the comparison, that's apples and basketballs, right? We're not comparing apples and apples. We're not comparing apples and banana chips. We're not comparing apples and fruit. Sora is in another sphere. It's not even in the same category, right? I have a little bit of a background. I talk about this some

Starting point is 00:18:09 in martech communications, but a lot of times that involved creating videos, right? Created a lot of video in my day. So I have a decent eye, you know, more than, more than the average person. I've spent probably more than a thousand hours, safely, I would say multiple thousands of hours, either shooting or editing or, you know, video production. It's not close. It's not even fair how far ahead, for a first-generation model, right, for a first-generation model, which is what OpenAI's Sora is. It should not be this good, right? And that's going to lead us to our second point. You know, even if you want to go back and look when ChatGPT came out, right? So when

Starting point is 00:18:58 ChatGPT came out, GPT-3.5, the world was shook. I was not impressed, if I'm being honest, because our team had been using the GPT technology for three years. And I thought, at the time of its release, if you wanted to use the GPT technology to create text from text, right? So a text-to-text prompt, I didn't think ChatGPT was even a top two option. At the time, I thought what was then called Jarvis, what is now called Jasper, was better. I thought Copy.ai was better. So even OpenAI, right, which they're known for their now, I think they're really well known for ChatGPT and kind of text-to-text and multimodal with text. It wasn't even a, I don't think, a top use of its own technology at the time, right? The same thing. When DALL-E, you know, if you look at DALL-E 3,

Starting point is 00:19:46 DALL-E 3, I'm being honest, is not, I would say, in the top three. Maybe it's in the top three, right? But Midjourney is so far ahead of everyone else. Then you have Stability's, you know, image generation. You have Leonardo, you have some other great products. Is DALL-E top three? I don't know, maybe. Maybe they're number three or number four, but they're not number one. Right. So when you look at OpenAI's first iteration or first attempt at something, it's usually not mind-boggling. The Sora model is mind-boggling. I don't understand. What this means for the future of creative, we're going to have another show on that because

Starting point is 00:20:27 that's not what this is about. That's what everyone else is talking about. We'll talk about this. Let me just say this. It is going to be extremely difficult for the average human to understand what is real and what's not, right? Because we always thought, oh, okay, well, you know, on photos, you know, if you look at a photo long enough, I think it's actually kind of easier on

Starting point is 00:20:46 photos because it's still, you can inspect it, right? You can look at the fingers. Oh, are there six fingers, right? Or, oh, is the arm bent in a weird way? With video, I mean, some of this video looks so smooth, right? And how your brain processes things, it is much harder. If the quality passes a bar, if it's a yes or no, if the quality passes a bar, it is much harder to detect AI video than it is in AI images. And because we've had this long period over the last probably year now, where AI images have been good enough to almost pass for real. So, you know, people out there on social media, your average consumer, et cetera, we've,

Starting point is 00:21:27 we've been exposed to this on a large scale. And we've had time for our brains to kind of rewire things and to first see something and be like, oh, okay, is that an AI image? Is that an AI image? Maybe. There's no warning sign. There's no ramp-up here with OpenAI. It is, with Sora,

Starting point is 00:21:44 It is entirely too powerful. Yes, there's still some instances where, you know, you can clearly tell, right? But if you can, in one prompt, create a minute video and maybe you generate it two or three times, I can guarantee the, at least from samples we're seeing so far, the majority of that one minute, if you cut it up, no one's going to be able to tell. This is coming from someone, again, who's spent thousands of hours in video and photo production. You're not going to be able to tell.

Starting point is 00:22:15 And this is the first model. My gosh. Wild times, y'all. Yeah, what Juan is saying here, Juan, thanks for joining us. Juan says, wow, that's a huge difference between Sora and Runway Gen-2. Definitely a lot more realistic for Sora. Yeah, absolutely. Yeah, mind-blown emoji.

Starting point is 00:22:34 A lot of mind-blown emoji. If you're listening on the podcast, make sure to check the show notes today, the episode description. We'll leave a link. You got to watch this side-by-side comparison. All right. It's pretty telling. Hey, here we go. Nancy, am I paying you today?

Starting point is 00:22:52 Nancy says something, she says, this will have to be a premium add-on because tokens. Yes. All right. We're going to get to tokens and compute because that's a big piece. So let's now talk about number two, which I think is the hottest topic for today's hot top, hot take Tuesday.

Starting point is 00:24:15 I didn't ask for the normal flame emojis, but I saw some. Tara wanted me to burn this down. We'll try to keep it real. That's what we do here. But we bring receipts.

Starting point is 00:24:42 All right. So Sora signals that OpenAI is close to AGI, or maybe it has already achieved it. All right. Y'all, I don't say this lightly. Don't worry. I bring receipts. I bring a paper trail.

Starting point is 00:25:00 The fact that this is a first model and for some of the facts that I'm about to lay out, we have to wonder, are we at that point of AGI? If you read a lot about AGI like I do or maybe you don't, a common train of thought around artificial general intelligence. And again, and I'm oversimplifying it here because our audience is the everyday people, right? We're not talking to, you know, an entire audience of people who, you know, build machine learning applications on a daily basis. So I'm trying to make this simple to understand. Essentially, artificial general intelligence is different than artificial intelligence. So AGI, this is what OpenAI has been openly working toward

Starting point is 00:25:42 for years, right? Meta and Mark Zuckerberg have recently said that they are openly working toward AGI, right? So AGI is kind of scary. It's unknown, right? And that's essentially what happens, to oversimplify things. It's when the AI becomes smarter than us across the board. It's when the AI doesn't necessarily need us. It's when the AI can fix itself and improve itself, right? Essentially it's when it displays intelligence across the general kind of general field of expertise or a general field of skills, right? So you have something that's called kind of like narrow artificial intelligence and general, right? So narrow is back, or sorry, kind of more skill set based, right? And in those instances, and I'm going to have a chart here from

Starting point is 00:26:28 DeepMind, a very famous chart at the end to explain it. But essentially, in certain tasks, obviously, AI is already way outperforming humans. But when we talk about artificial general intelligence, AGI, that's when AI can outperform humans, the average human, on a variety of tests, all at once, not 10 specific individual tests, but one AI system, can it outperform the average human on general tests? Let's get to receipts here. Let's look at the timeline. The last month, and I've been talking about this a lot.

Starting point is 00:27:11 If you follow the show every day, first of all, thank you. I appreciate you guys. Second of all, you'll know I've kind of hinted at this before. The writing's been on the wall for a while now, and I saved it for a hot take Tuesday. All right. But let's just take a look in the last month. So Sam Altman has talked multiple times in multiple interviews about the next version of GPT, you know, presumably GPT-5 or, you know, who knows, we might see a 4.5 first. But the next version of GPT-4 or GPT-5, Sam Altman has talked multiple times about the increased ability to reason, which leads to AGI implications, right? We've talked about agents here on the show, right?

Starting point is 00:27:58 We've known for a long time that OpenAI has been working on agents, but we just saw the first kind of official reports, I believe it was from The Information. We'll link it in the newsletter. But we saw the first official report on OpenAI's agents, right? And I talk about it here on the show, essentially two different kinds, one that can control your device, whether that's your computer or your phone, we'll see, and another agent that can perform actions on your behalf, right? So it can perform actions on a website or an app.

Starting point is 00:28:30 All right? You see how that leads to AGI? Yeah. You can't have AGI without a system being able to perform like a human. All right. So it has to be able to control the device. And it has to be able to perform actions on your behalf.

Starting point is 00:28:46 And it has to have a better model. Right. Okay. So we've crossed those two things out. Number one, there's a new model. Number two, it can perform actions like a human. Number three, compute. Compute.

Starting point is 00:29:01 Sam Altman has, you know, it's been rumored and widely reported. And he even joked about it on Twitter. I still can't call Twitter X, y'all. It just sounds weird. So Sam Altman joked, but, you know, it's been reported. He's trying to raise $7 trillion, you know, for essentially compute, for energy, for these GPU chips or for who knows, whatever a chip is after a GPU. He's really doubling, tripling, 10xing down on compute, on chips, on powering generative AI. All right?

Starting point is 00:29:39 So combine those three things. Next model being better at reasoning, number one. Number two, OpenAI's agents. Number three, raising $7 trillion for more compute. And then last but not least, OpenAI previewing Sora. All right. So I get what you might be saying. Hey, Jordan, you're just a nerd.

Starting point is 00:30:02 Sora has nothing to do with AGI. It's video. No, it's not. No, it's not. That's topical. This is not video. Is it video on the surface? Yes.

Starting point is 00:30:21 You have to understand what that means. All right? Let's unwrap that, shall we? Because I think most people are so blown away by the output. They're not looking beneath the surface. And y'all, guess what? It is literally right in front of us. All right.

Starting point is 00:30:44 So OpenAI released quite a few things all at once when this Sora announcement came out. We've now seen that there's been a small team working on this for more than a year. All right. But reportedly a lot of internal employees at OpenAI just found out about this right before its release last week. So there's been a small team going stealth on this, investing presumably a lot of time, energy, and resources into this project. So most people just stopped and they only

Starting point is 00:31:13 looked at the OpenAI page. And they looked at all the videos, right? I was talking about those videos I downloaded. But there was something, I wouldn't say hidden, but you had to really care to read the research paper. It was a completely separate piece, right? Everyone just went and, you know, oh, you know, all our billy boys out there that are just trying to, you know, trick you into, you know, reading their newsletter or whatever that's

Starting point is 00:31:40 written by AI and they're just selling you crap products. They didn't care to look at this. All right, we do. I'm sure someone else on the internet has, but, you know, this isn't where the conversation is. But you gotta read the research paper, y'all. I think I read the research paper before I even downloaded any of the videos. And I said, okay, here's our hot take Tuesday.

Starting point is 00:32:02 All right, so let me just, let me just read excerpts of this. Okay, so in the research paper, which I recommend you all read, it doesn't take a lot, it says "video generation models as world simulators." All right. And then I'm going to read the last sentence and one more sentence of this little excerpt from the OpenAI research paper on Sora. So they said, "our results suggest that scaling video generation models is a promising

Starting point is 00:32:37 path towards building general purpose simulators of the physical world." One more sentence. "Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI." Let me highlight those things one more time in case you're not watching them and you can see them highlighted on my screen. OpenAI is calling this world simulators, building general purpose simulators of the physical world. And they're saying Sora serves as a foundation model for models, sorry, Sora serves as a foundation for models that can understand and simulate the real world. A capability, we believe, will be an important milestone for achieving AGI.

Starting point is 00:33:33 Okay. Guess what, y'all? No other OpenAI releases were this direct about their correlation to AGI. Okay. Not the new version of GPT-4 Turbo, not custom GPTs, not the new memory feature, right, that was released last week, not DALL-E. None of these things.

Starting point is 00:33:58 For none of these things did OpenAI draw this strong of a correlation between a new product or a new release and AGI. All right. Let's continue to unpack and talk about this. So Sora is a world simulator. I'm going to try to remember this. And, you know, actually shout out. I mean, you do have to give a shout out to Runway here as well, right?

Starting point is 00:34:26 Because Runway quite a while ago, this is many months ago, they kind of introduced what they called general world models. And I do believe that in a similar vein, Sora is kind of positioning, or sorry, OpenAI is kind of positioning Sora as such, a world simulator. So this is more about understanding how the world works. So we're talking about real world intelligence in three-dimensional physics.

Starting point is 00:34:55 Okay? Will it mainly just be used for, look at this cool video I can create with text? Yes. But OpenAI is using it for a much larger purpose. And we're going to get a little bit into the technical aspect here as well.

Starting point is 00:35:11 But in short, this is helping OpenAI understand and predict the real world. Let me say that one more time and tie it back to AGI. Sora helps OpenAI understand and predict the real world. What does AGI need? It needs to be able to understand how the world works. Physically, spatially, physics, relationships between things, etc.

Starting point is 00:35:40 You can't do that right now. Can you do that with Sora? Maybe. That's what it seems like, where we're going. All right. So again, let's go into more receipts. All right. Let's look again at Sora's research paper for an example of what this real world understanding is. All right.

Starting point is 00:36:01 So if you're joining on the podcast, I'm going to do my best here. But OpenAI shared three different examples, three different videos, applying different tiers of compute. Again, if you're a machine learning researcher, I mean, yeah, you can email me and tell me how I'm wrong. Again, when I'm talking here on Everyday AI, I try to oversimplify things, okay, so people can understand that. But essentially, think of compute as layers of technology that are applied to something.

Starting point is 00:36:34 Okay? So in this example, there's three different videos of a cute little dog playing in a snow, playing in snow, right? So your base level of compute, right? Let's just say if you apply this technology behind SORA on one layer, you can't really tell. You can't really tell. Is it a dog? Is it the snow morphing into something? You can't tell.

Starting point is 00:36:57 It doesn't even look like a dog. You can barely tell anything on a base level of compute. All right. Then they're showing an example of 4x compute, right? So let's just say, so we can easily visualize it, the technology behind Sora, let's just say you stack it times four, or you run it through it four times, or you apply four times the compute, four times the technology toward it. At this point, in the middle here, you have what, you know, looks like a dog, looks like an AI generated dog. You know, this is kind of

Starting point is 00:37:28 what we kind of have now, I think, with the current models. You know, this, you know, you can go look at it, that video looks a little closer. Maybe Runway and Pika are a little better, but it looks a little closer to our current day text-to-video. So who knows what Pika has cooked up for 2.0 or what Runway has cooked up for Gen-3. But at least right now, and when you look at 32x compute, that is what is on the right on my screen, or if you go look at those three videos, that's what it is.

Starting point is 00:37:59 So now think of applying that technology 32 times or applying 32 times the compute power behind OpenAI's Sora technology. Then you get a video that looks pretty real, right? We'll share this one in the newsletter as well, but there's a very famous, you know, famous video. One of the first AI videos that kind of went viral on the internet was Will Smith eating spaghetti, right? And Will Smith just keeps on morphing and, you know, even in every single frame, everything changes. You know, the spaghetti is morphing with his face and his mouth is morphing with his eyes, right? Like, it's a mess, right? And that's kind of on the left hand side here.

Starting point is 00:38:44 That's when you look at something like a base compute or a 4x compute in this one specific example. Right. But this 32x compute, it looks like real life. But again, think: this is not just for allowing us all to generate hyper-personalized, realistic-looking one-minute videos. This model, and as we use it, right, and as millions of people use it, right, when you use a model, people don't know this. Sometimes you get a split screen and it says, which one's better? You know, there's always a thumbs up or a thumbs down, right? We are training these models.
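To make the split-screen idea concrete, here is a minimal sketch of how "which one's better?" votes could be tallied into a simple win rate per generation. It assumes a hypothetical feedback log; the record format, the names `gen_a`, `gen_b`, `gen_c`, and the `win_rates` helper are all made up for illustration. OpenAI has not described how, or whether, Sora would use this kind of feedback.

```python
from collections import Counter

# Hypothetical feedback log (an assumption, not a real OpenAI format):
# each record is (option_shown_a, option_shown_b, option_the_user_picked).
comparisons = [
    ("gen_a", "gen_b", "gen_a"),
    ("gen_a", "gen_c", "gen_c"),
    ("gen_b", "gen_c", "gen_c"),
    ("gen_a", "gen_b", "gen_a"),
]


def win_rates(records):
    """Return, for each option, the fraction of its comparisons that it won."""
    wins = Counter()
    appearances = Counter()
    for a, b, winner in records:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} was not one of the options shown")
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {name: wins[name] / appearances[name] for name in appearances}


rates = win_rates(comparisons)
for name in sorted(rates):
    print(f"{name}: won {rates[name]:.0%} of its comparisons")
```

A real preference-learning pipeline would be far more involved than a win-rate table; the point is only that each click is a small labeled data point.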

Starting point is 00:39:22 So we are telling OpenAI and we are telling Sora, this is how the world works. This is not correct. This is correct. And as millions of people use this technology, OpenAI gains a better understanding of the world beyond what models currently can comprehend. And it starts with more compute.

Starting point is 00:39:49 We need more compute. Here's another thing to keep in mind. Again, I'm speaking in generalities here. But generally, current large language models, not just ChatGPT, but most large language models, they try to use as few tokens as possible while still giving a good answer, right? But it could use more if it wanted to, right? And people, you know, people much smarter than me, you know, have examples of this online. You can get much better responses from current large language models if you can fine-tune them to not care about tokens,

Starting point is 00:40:23 to not care about memory, right? So the reason why all the big companies do this, it makes sense, right? Because you have to balance cost with quality, right? But what I'm saying is current models and future models are obviously capable of much, much more than what we get out of them on a daily basis, right, for most of us, for everyday use, for everyday people, right? I'm not talking about, you know, your LLM hackers out there and your machine learning

Starting point is 00:40:55 experts, but for the majority of us, what we see out of large language models is a balance that does not show us the full capabilities of models. These models have to balance compute. They have to balance cost with quality. They're capable of obviously much, much more. So, you know, earlier, we had a great comment from Nancy, right? She said something about, you know, tokens and cost. And yes, this will be, this will have to be a premium add-on, I would assume, right? I would assume, but I also wouldn't be surprised if this does get released. And Open AI is just for a while taking a bath on this. It's going to be very expensive.

Starting point is 00:41:37 Even if we have to pay $20 a month, additionally to use this and we get, you know, a couple generations an hour, maybe we get one or two. I would assume that Open AI will still be losing money on this. I will assume that they will not care early on because they want us all to use it. They want us all to help train their model. You know, we might get two different variations and you might choose which one's better. They want to see those thumbs up and thumbs down. They need real world humans to train a model that is for the future, for real world

Starting point is 00:42:10 AGI. Right? We have to talk about compute for today versus compute for tomorrow. Have any of you, right, like we talk about on the show all the time on Everyday AI. We always talk about, oh, you know, more companies, and Nvidia, and, you know, Microsoft creating their own chips, Amazon creating their own chips, you know, Sam Altman trying to raise $7 trillion, which is an asinine amount, right? Why?

Starting point is 00:42:42 Because all of us right now, are we suffering from this lack of compute power? No, we're not. We can go on and use these models, right? Pretty much, yeah, there's limits, there's throttling, there's caps. But we can use them essentially fairly well. We're not crippled by today's lack of compute. This is for tomorrow. This is for AGI, right?

Starting point is 00:43:05 There's a reason. The two leading voices in the push for AGI. Obviously, it's Sam Altman and OpenAI, so far ahead of everyone else in terms of they've been on the AGI kind of bandwagon for longer. And recently, you know, Mark Zuckerberg has become more vocal. So, you know, you have Zuckerberg and Meta being very vocal and saying, yeah, we're investing billions of dollars in chips, right? Here's Sam Altman who says, okay, I see your billions.

Starting point is 00:43:30 I say seven trillion or joking around on Twitter, eight trillion. It's compute for tomorrow. So let's combine all of that. I know we're still on point two here. We're going to wrap up. Point three is pretty straightforward. But combine all of that, everything that we just talked about with AGI. And a lot of people don't understand what Open AI can currently do with its different

Starting point is 00:43:55 products, right? You can't use them all in the same interface. A lot of them you can. Ready? I'm going to run them down. You have GPT Vision, so OpenAI can see. You have GPT Voice. OpenAI can talk.

Starting point is 00:44:12 You have Whisper. OpenAI can listen. Jukebox, which they've been sitting on for years, and it's actually pretty impressive. OpenAI can make music and sing. Data analysis. OpenAI can write code and understand data. Just the GPT-4 technology.

Starting point is 00:44:37 Open AI can understand. That is your general use case. Then you have SORA where it can understand relationships in the real world now. And then in the blue, you have two things in the future, right? Which I would presume are in the works. You have agents which can perform tasks. And you have that $7 trillion of compute, which is essentially unlimited resources. Right.

Starting point is 00:45:07 Yeah, that could take many months or multiple years to achieve. Are you getting it now? Are you understanding what's happening? Yes, Sora is text-to-video, but OpenAI told us. OpenAI told us, just most people don't want to read. This is a step toward AGI. Y'all, do you not see it? Receipts on the board.

Starting point is 00:45:41 So right now, OpenAI can see. It can talk. It can listen and understand voices. It can sing and create music. It can write code and analyze data. It can actually understand, and future models will be better with human reasoning. It is starting to understand relationships and it will soon reportedly be able to perform tasks like a human. You wanted the hot take. This is a mild take,

Starting point is 00:46:13 right? Because I don't want to be, you know, too much hyperbole. I don't want to be, you know, too much sensationalized clickbait. The writing is literally on my wall right here. And they are telling us, they are telling us if you bother to read the research paper and to look past whatever is going viral on Twitter or LinkedIn, you will see. This is about AGI. I referenced this earlier. So it's important to, you know, this is Google DeepMind a couple months ago. And also, if you don't know much about Google DeepMind, I'd say they are by far

Starting point is 00:46:56 the leading AI research team in the world. All right. So they released, a couple months ago, this little chart. I'm not going to spend a lot of time on it. I'm going to give you a high level overview. But essentially, it's different levels of AI and different levels of AGI. So you have narrow, which are certain tasks, right? And right now, AI can already perform, when you look at specific tasks, better than any human, right? You have your level two, which is, you know, average. You have your level three, which is expert, let's just say 90th percentile. And then you have your level four,

Starting point is 00:47:28 your virtuoso, which is 99th percentile, which is better than almost anyone. So on narrow tasks, individual tasks, AI already wipes the floor, right? And there's already all these studies that have already been done. General is a different story. That's what all these big companies are working toward. Artificial general intelligence, AGI, that's just better at everything than humans. Are we there yet? Maybe.

Starting point is 00:47:53 Maybe not. Probably we're close. Will we know when we get there? Not necessarily. It could be happening before our very eyes. People who follow AI and AGI, and have been following it for much longer than I have, say it's just going to happen. Right. There's not going to be a warning.

Starting point is 00:48:08 All of a sudden, it's going to be, oh, okay. Yeah, we have AGI. Right. So level one, emerging AGI, it's already there. Right. So that's, is AI better than essentially unskilled humans, right? There's not a nice way to say that, but, you know, people who are unskilled, uneducated, is AI in general better than those people yet.
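As a rough illustration of how those tiers stack up, here is a minimal sketch that maps a percentile score to the level names used in the chart. The thresholds follow the figures given in this episode (average at the 50th percentile, which the chart labels "competent", expert at the 90th, virtuoso at the 99th). Treating everything below the 50th percentile as "emerging" is a simplifying assumption, and the `agi_level` function is hypothetical, not something DeepMind published.

```python
# Tiers as described in the episode, highest threshold first.
# Each entry is (minimum percentile of humans outperformed, label).
LEVELS = [
    (99, "Level 4: Virtuoso"),
    (90, "Level 3: Expert"),
    (50, "Level 2: Competent"),
    (0, "Level 1: Emerging"),  # simplification: anything below the 50th percentile
]


def agi_level(percentile: float) -> str:
    """Return the tier label for a score given as a percentile from 0 to 100."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    for threshold, label in LEVELS:
        if percentile >= threshold:
            return label
    return LEVELS[-1][1]  # unreachable, since the last threshold is 0


for score in (10, 50, 95, 99):
    print(score, "->", agi_level(score))
```

The same percentile ladder applies to both columns of the chart; what changes is whether the score is measured on one narrow task or across general tasks.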

Starting point is 00:48:31 But you don't really start talking about AGI until you get to level two, which is competent, which is when at general tasks, AI is better than the average human. Are we there today? No. Could we be there soon? Yes.

Starting point is 00:48:45 You know, these predictions, go back and look, five years ago. Five years ago, I'll try to find these studies. I have so many studies floating around in my head. Five years ago,

Starting point is 00:48:54 I believe they said, oh, we'll have AGI by 2060. All right. In 2019, they said 2060. You know, they said we were decades out. Today, they're saying, oh, it could be a year. It could be a year or less, you know? Because all of these, all of this new technology in this race right now, people did not, five

Starting point is 00:49:19 years ago understand how important generative AI would be to the U.S. economy. They did not understand that every single one of the largest companies in the U.S. would be investing billions of dollars into AI. We did not foresee that five years ago. Most people did not. That is why the developments are coming way faster than even the smartest researchers five, six, seven years ago could have ever predicted. They said decades. And then three years ago, they said, oh, maybe a decade.

Starting point is 00:49:48 And now today, people are saying, ah, it might be a year. It could be as quick as a year. That's enough on AGI. Let's get to point three. I don't think anyone will catch OpenAI and Microsoft. It'll take what I'm calling a double acquisition to get close. And if you did listen to my 2024 bold predictions, which, hey, you never would have, you never would have thought this.

Starting point is 00:50:14 A show from two months ago, so many of them have already come true. People were like, no, Jordan, these won't come true. Anyways, I said, two or three months ago, I said there is going to be a very large acquisition in 2024, an acquisition that most people aren't expecting. And here's kind of the rationale or the reasoning behind it. Companies are now, like now what you see with Sora and when the general public understands that Sora is about more than video, it's about more than text-to-video. Once the rest of the world and the tech world starts to understand that, you're going to

Starting point is 00:50:47 see pressure. Big companies and their stocks, once the analysts and the investors and the general public understand what this means from OpenAI and Microsoft. Again, Microsoft owns 49% of OpenAI. So you've got to talk about them in tandem. Once the rest of the world catches up, whether that's in two weeks, two months or two years, an acquisition is the only way out for these other companies, period. Whether that happens, like I said, tomorrow, who knows, maybe an acquisition is close. Maybe it's going to still be a couple months, but a big acquisition is going to happen. There is no other way around it in 2024.

Starting point is 00:51:23 All right. So I'm going to categorize these. We have our tech titans and what I call startups that can burst or startups that can ignite. I will say highly flammable startups, right? Startups that once they combine with a tech titan, they can do something big. So our tech titans, we essentially have Amazon, Meta, Alphabet, which is Google, Apple, and Nvidia.

Starting point is 00:51:45 Right. So aside from Microsoft, I'd say, and you know, you can throw Tesla in there as well, but they're kind of competing in a different space. So you have Amazon, Meta, Google, Apple, and Nvidia. And then you have your startups that can burst or your startups that can ignite. You have Anthropic, you have Midjourney, you have Cohere, you have Stability AI, you have Hugging Face, you have Runway, you have Pika. Yeah, there's probably one or two more.

Starting point is 00:52:12 You might be able to throw in there. But I'm not talking, you know, companies that are just like wrappers, like GPT wrappers. I'm not talking about this. Yeah, I know there's companies that are valued at billions of dollars that are essentially GPT wrappers. I'm talking about companies that have unique technology that they've built in-house, right? So, Anthropic, Midjourney, Cohere, Stability AI, Hugging Face, Runway, Pika. Yeah, there's probably one or two more.

Starting point is 00:52:38 It is going to take either two tech titans combining forces. You know, I think you also have to throw in probably, you know, IBM in the, you know, IBM in there, you know, some of those more that are in hardware as well, right? But then I think they're, you're going to have to acquire multiple, right? If you're Amazon, you might have to acquire Anthropic and Midjourney. They've already invested heavily into Anthropic, right? If you're Meta, I would, I would keep my eyes on Meta, right? Meta, I think from a multimodality standpoint, is the one closest to where OpenAI will be soon, if that makes sense, right?

Starting point is 00:53:24 Meta might have to acquire, you know, as an example, a Hugging Face and a Runway. Or, you know, Apple might have to acquire a Pika and a Stability AI. I don't know. But it is going to take either multiple tech titans combining forces, which I don't see that happening. You know, I should have thrown IBM and some others in the mix here. Or it will take one of these tech titans acquiring multiple igniting startups to compete with the combination of OpenAI and Microsoft. They are so far.

Starting point is 00:53:58 They are so far ahead of everyone else. Nvidia, not, you know, I guess Nvidia is also kind of in its own category, right? Because, yeah, they just released their Chat with RTX, but technically, you know, most of these big tech titans are clients or they pay Nvidia for their chips. So I know Nvidia is kind of on the outside. Tesla is kind of on the outside. IBM's kind of on the outside. But even specifically, if we're looking at Amazon, Meta, Google, Apple, they're going to have to acquire multiple of these companies, I think, to compete. Because, geez, look at Gemini 1.5.

Starting point is 00:54:38 I'm not super impressed. Look at these video models, right, even the ones that Meta and Google have previewed. Oh, gosh, I feel bad for those very smart people who have spent a lot of time building what are very impressive models. And then you see Sora, right? It's in its own hemisphere. It is in its own stratosphere.

Starting point is 00:55:03 It is not close. I want to hear from you. That's all I got for today. It is about so much more. To recap, right? It is about so much more than just text-to-video. You have to look at the bigger picture. All right.

Starting point is 00:55:26 Number one, Sora is light years ahead of everyone else with text-to-video. So if you're just looking at that at face value. Number two, the timing. You've got to look at the timing. That has to mean AGI is much closer than we think. And number three, I don't think anyone right now, without doing something drastic, can catch OpenAI and Microsoft. I hope this was helpful, y'all.

Starting point is 00:55:48 We spent so much time doing this. People always ask me, hey, Jordan, how can we support everyday AI? Well, we're going to be officially launching some consulting services soon. But right now, hey, if this was helpful, please share this episode. You know, if you're listening on LinkedIn, repost it.

Starting point is 00:56:04 If you're on Twitter, retweet it or re-X it or whatever that's called. Share this with friends. Tag them in the comments here. You know, text them, talk to them about it. We are trying to be your best friend in AI. We are trying to cut through the noise, cut through the Billy Boys. They're just trying to make a buck off you. They're just trying to lead you down the wrong road.

Starting point is 00:56:23 We bring receipts. We bring you the facts. Yeah, every Tuesday, we spice in some hot takes and we give you some opinions. But I think that generative AI education is essential for all of us to grow our companies and to grow our careers. I'd appreciate it if you let others know. And join us tomorrow. Your voice and your contacts.

Starting point is 00:56:43 to scale a content engine with AI. I'm excited for tomorrow's conversation. I'm excited. Thank you for joining us. Make sure to go to youreverydayai.com. This is going to be a big newsletter. So make sure you check it out. Thank you for joining us.

Starting point is 00:56:55 We'll see you back tomorrow and every day for more Everyday AI. Thanks, y'all.

Starting point is 00:57:15 And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating.

Starting point is 00:57:48 It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
