Limitless Podcast - Grok-4 Is Now The Smartest AI Model In The World | Everything You Need To Know

Episode Date: July 10, 2025

Grok-4 just leap-frogged every frontier model, topping AGI and coding benchmarks, crushing PhDs across disciplines, and even selling vending machines better than humans, all barely 28 months after XAI was born. We unpack the launch event, new voice mode, gaming-first tools, Grok-Heavy’s multi-agent powerhouse, and why Elon’s Tesla integration could make AI your on-the-go co-pilot. Stick around for what this acceleration means as GPT-5, Claude 4.5, and Gemini 3.0 line up next.

💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT

TIMESTAMPS
00:00 Grok 4 Is Now The Smartest AI In The World
05:19 How Did They Do It?
08:00 Humanity's Last Exam
13:12 The AGI Test
18:50 Grok Gaming
23:12 Video Generation
26:01 Grok Heavy Is Insane
32:02 Grok For Tesla
35:39 What's Next

RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:03 We have a new top model in town. A new king has been crowned. Grok 4 is announced. It is now the smartest model that's ever existed in the history of all time. According to all the benchmarks that were shared last night, it was pretty amazing. I stayed up late last night watching the event. It went according to Elon time. It was very late.
Starting point is 00:00:21 I stayed up probably until like a little after one in the morning. But we have all the notes and it's amazing. This model is smarter than you. It is smarter than your PhD friend. It is smarter than any PhD in the world at any category that you can imagine. It's incredible. And one thing that I wanted to highlight before we start this episode is just how impressive the rate of acceleration is from the XAI team, because now they're sitting above OpenAI, they're sitting above Claude, they're even sitting above Google, and they haven't
Starting point is 00:00:48 been around for that long. So in this chart, we're showing kind of like each one of these bullet points as a model that has been released. So you'll notice GROC has, what is that, two, four, six points. GROC has released six models over the course of the last 24 months compared to Open AI that's been doing it since well before 2018. We have Anthropic that's released many more models than XAI. And the rate of acceleration is incredibly impressive. So before we get into exactly how everything works, what is in this? I want you to just kind of share first impressions because to me, this is like home run. They did it. No one thought they would do it. They did it. They now hold the crown for the best model, at least in terms of benchmarks in the world. I'm honestly shocked,
Starting point is 00:01:25 to be honest. I'm a massive fan of Elon, but something about starting a company 28 months ago when you've had all the anthropics and the open AIs in this world, just kind of hammer and tonguing it for years on end. I just didn't think it was possible, but he's not only come through at creating the best generalized model. So that's feature number one. It's better than chat GPT, which I know the viewer and listener listening to this uses on a daily basis.
Starting point is 00:01:53 So you now have a new model, which is arguably better than the experience that you have on your favorite model, right? So I'm using Grok4 now more than I use, chat chibati, and it's only been like 11 hours since it got released, right? The number two feature was something really unexpected, Josh. So for a number of episodes now, we've always heralded Anthropics model, Claude, as the number one coding model. It's been displaced. It's done. It's GROC 4 now. I hate to say it, but now GROC4 has somehow managed to do what Open AI has done and also matched the coding level, which is something open AI themselves have failed to do. But I have a third feature
Starting point is 00:02:32 which I'm super pumped about, which is, you know, some AI model producers like to compete at the same categories. You know, they like to compare themselves at the same features. Okay. Grop decided to create a completely new feature category, and that's in gaming. They announced, and they spent, I think, like 10 minutes in the live stream, Josh, talking about how Grock 4 is going to be really amazing at helping you create games.
Starting point is 00:02:56 So think about, like, vibe coding and how products like Cursor were really good for coding up. any kind of generalized app, but it was never specialized onto anything. GrogFo is specialized for creating games. So now you can create like a Minecraft level game or a high fidelity racing game or something as simple as Tic Tactoe or Tetris in a matter of seconds. And if you remember actually, and we can get into this later, but this is something that we predicted in yesterday's episode where we were like,
Starting point is 00:03:22 I think GrogFo is going to come out with something gaming related because Elon is such a major gamer. So super cool to see this. And then the final thing, which sounds the nerdiest, but I think is super important to focus on is it is smarter, not smarter than just any PhD, but any PhD in any kind of sector. So you may have a PhD in science,
Starting point is 00:03:44 specifically like physics or maths, or you could be a PhD in kind of art and philosophy, and this new model is now better than that. And the final feature, which I just remembered, because there's so many features, this new model is kind of like topping, is the video and audio side of things. Josh, I know you've been playing around with the voice mode quite a bit.
Starting point is 00:04:04 Actually, maybe you want to talk about the video side of things. Yeah, so some of the stuff is here. Some of the stuff is coming. So the game stuff, the coding engine, the video generation, that is coming soon. So before the end of the year, we'll get this. It's built on top of the Grock model. They're kind of iterating. But in terms of things that they have today, they do have a new advanced voice mode,
Starting point is 00:04:20 and the new advanced voice is excellent. One of the things that I noticed when I was playing around with it this morning is not only just the voice sound great, but the latency between the request and the answer is so short. It feels like you're actually conversing with the person. You say something. Wow. It's fit something back with you. And you could also control the speed at which it replies to you. So the way you listen to a podcast, maybe at 1.5 times speed, you could actually just change the speed that the AI speaks back to you. So if you get a little impatient like me, this is a very nice feature. I toggled it up to like 1.4. We're going to try that, see how it goes. But yeah, the news that they announced is amazing.
Starting point is 00:04:53 So I think people are probably wondering like what exactly makes this so good. Where's the proof that this is good? How does this all work? How do they accomplish this? this. I mean, going from zero to number one in 28 months is no easy feat, especially because GROC 2 has been out. When GROC 2 was released, it was less than 12 months ago. So the amount of progress they've made over the course of the last years is pretty incredible. And we have it here on this, this visual you just pulled up. GROC 4 is smarter that pretty much all grad students at everything. And what was interesting about GROC 4 is that they did this thing called reinforcement learning training, where they applied 10 times the amount of compute that they did in the previous
Starting point is 00:05:27 model towards reasoning. And reasoning basically is taking these facts, but applying realistic knowledge to them. So it's like if you could imagine GROC 3 was a student in school that learned a lot of textbooks, but never actually went out and got a real job. GROC 4 is the person in the workforce who's applying this knowledge to the real world. And reinforcement training, it's been debated whether or not it actually works at scale. This, I think, proves that it is. And basically what happens is you feed it a bunch of problems and you say, hey, this answer is correct or the answer is wrong and it iterates through that over and over and over again until it learns how to apply this knowledge to a broad base. So it's incredibly smart at that. It's something
Starting point is 00:06:04 that's pretty novel in terms of AI training. No one's ever applied this much compute to reasoning. And I think it shows in this model, then that's part of the reason why it's so smart is because it's been trained on all this data, but then iterated through all of these questions until it is brilliant, highly skillful model. Got it. So if I were to summarize what you just said, Josh, it sounds like, okay, you know the people that spend their entire time in academics, right? They're getting degree after degree. They're getting their master's degree. They're getting their PhD degree. Now, that's a lot of intelligence and knowledge that they're absorbing in that whatever five to 10 year period that they're studying, right? But it's all kind of theoretical to an extent.
Starting point is 00:06:44 You know, and there's certain disciplines where you go out, you do an internship, you get some practical work experience, but it's not really real life. You're not really on the job, right? You're not really at the edge. And what you're saying is, you're not really on the job. And what you're saying here is pretty much the equivalent amount of knowledge that is gained from the academics and studying and kind of like school period is equaled with the real-time work experience that someone has, right, for a model. And that's really where this model kind of like separates itself from all the other models that are out there. It has real world practical knowledge. It understands all the different terms that you're referencing maybe in social media culture or any
Starting point is 00:07:24 kind of work terms that you're mentioning that you're currently experiencing in your job, it just kind of overall gets you better and it understands where you are at the edge of your learning and what you're trying to achieve at your task. Is that, is that right, Josh? Is that, is that fair? Yeah, it's applied knowledge. You could imagine it. Now it's like, imagine Grock was, was a million people that learned in college and then went out into the workforce. And it's accumulated millions and millions of years worth of work experience. And it's now applying that to the answers that it gives you. So, yeah, that's the benefit that they found from. this. Actually, another thing on this topic, Josh, was actually a concept I kept on seeing,
Starting point is 00:07:58 which was Humanity's Last Exam and how GROC4 had basically achieved the highest score. It was actually almost double what the previous model had achieved. And I kind of want to set the context as to why this is so cool. Humanity's Last Exam is basically AI researchers bet on AI models getting to human intelligence. That means AGI level, as smart as humans or even smarter than us. So as you can imagine it's a really, really tough exam, and it's hard for AI models that have currently existed today to crack. But Grokfall kind of like came in, and they were kind of expecting it for it to surpass the previous score, which I think was about 24.9% achieved by Open AI model. And they were kind of like, yeah, it'll probably hit like 30 or something. It almost doubled it. It's almost at
Starting point is 00:08:44 50%. And the way I kind of look at that is that like, if it's improving at such a quick rate, How long has this company been around? 28 months? Where is it going to be in the next 28 months? Because this is like an exponential curve. We just looked at a graph that you showed us where after six models, Open AI, sorry, Croc has already kind of reached frontier level model. It's beaten every single benchmark.
Starting point is 00:09:06 I can't help but think that this exam is going to be blown out of the water in a matter of, I don't know, a couple of years at this point, which is shocking for me because I assume this AGI thing is still a number of years out, despite, you know, all these papers opining about it being, you know, ready in 2027. Do you have any takes on this, Josh? Like, I'm freaked out about this. Yeah. Well, again, we're getting to this point where, like, is this AGI? It depends on the definition. But what we're seeing happening is, I mean, we have Humanities last exam, which it reached 40-something percent in. But there are a lot of other benchmark tests that are actually fully saturated, meaning it scored a 100 percent on these benchmarks. I mean, there's actually no room for improvements.
Starting point is 00:09:46 in any of these. And I think that was something that was interesting to me is like, okay, how are we going to continue to measure the success, measure the improvement of these models in an objective way? Because we kind of are, and we have this, yeah, we have the base the bench which is like, okay, first of all, number one across the board. So congratulations. But also, we have a 88.9% and 98.4%, 90%. These are like really, really high numbers where we're probably just one more iteration away from just fully saturating all of them. And that was what was interesting to me is like, we really need to re-measure or re-index how we even classify these models because we're very much running out of time. And then I guess the AGI definition, we've kind of said this
Starting point is 00:10:31 in the last few episodes, but it's, I mean, I don't really know. Like, are we there? Is this it? Because if you asked someone a few years ago, like, sure, this would totally be AGI. But Today, it's like, eh, probably not. It doesn't feel like it. But, man, it's really smart. It's just about anything a human can do. Yeah, it's pretty insane. One thing I actually wanted to point out in this tweet, Josh, is it has a, it has something
Starting point is 00:10:51 called a 256,000 context. Now, I kind of want to, pun intended, set that into context on this show, which is that that's like two novels worth of information that you can just chuck into a single prompt with Grot 4. Now, think about what kind of practical context you can put that into. That means you can put a bunch of research papers of which you have no clue or understand nothing about and ask GROC to summarize it and relate to you in a way that you can understand. That is the difference between typing out simple algebraic formula and kind of like learning
Starting point is 00:11:28 how that builds into a massive scientific problem to just copy pasting the entire thing. And I think something like that is just super cool, but it's not just the context, it's how much it costs as well, if you look at this, it's, it's $3 per million input, $15 per million output for tokens. That is, for context here, just incredibly cheap for what this model is achieving and for the benchmarks that it just broke. So I just thought that was super cool to point out. And another part also in terms of cost is this is now a free product. You are actually able to use GROC4 right now, even if you don't pay for an account. You can go and actually access the GROC4 model. So I'd encourage you if you're listening. Even if you don't have an account
Starting point is 00:12:06 with GROC, try it out. It is a amazing. amazing. It is really smart. And one of the things that also stood out is when comparing it to 2003, which I use a lot, or comparing it to Gemini 2.5, which is Google's offering, is that time to the first token feels significantly faster. So with 03, a lot of the complaints that I have and that other people have is it just kind of takes a little bit to get to you, like to get where you want to go. You ask a question, it thinks for a little bit. Sometimes it'll think for a minute. Sometimes it'll think for two minutes. Grogfour really spits out answers fairly quickly. So I think if you're building an app experience,
Starting point is 00:12:38 if you are using this as a day-to-day model, just trying to query things against that, the timed token, that time to the first token is a really big deal, and it's noticeably different in this new model. And then there's another benchmark. You have a pulled up here, which I really want you to introduce and share,
Starting point is 00:12:52 because there was one line in this in particular that kind of freaked me out. And I'd love for you, just walk us through what's happening here on screen. All right, okay. So we have Greg Kamrat, I think that's how you pronounce his name, who is basically the guy that manages this
Starting point is 00:13:07 benchmark call Arc AGI. For simplistic terms, this is the AI AGI benchmark. So it's kind of measuring how close these AI models are to artificial general intelligence, which is like, you know, the precipice of where we want to get to with this entire AI trend. And he says, we got a call from XAI 24 hours ago. And he puts in quotation marks, we want to test GROC for an Arc AGI. We heard the rumors. We knew it would be good.
Starting point is 00:13:34 We didn't know it would become the number one public bottle on Arc AGI. KGI, though. Here's the testing story. And then he goes on to explain how he spoke to the XAI team. He kind of explained the rules and he said, hey, guys, like, we're going to set the rules here. You can't manipulate it in any way. And the reason why I say that is a lot of AI model providers have been rumored to manipulate score results to kind of like make them seem like the models are much better than they are. But here we have a kind of authentic case of the model team coming to the benchmark provider and saying, hey, we're good to go. Throw us anything you've got and let's see how well our model does. We back it. We know it's going to do very well.
Starting point is 00:14:12 And so he goes, exactly. And so he goes, they were on board. So we got started. And he goes, there was some initial kind of errors in terms of like setting it up. But once it got going, it absolutely blew it out the water. And he goes, the previous top score was around 8% set by Opus 4. And he says below 10% is kind of noisy. And then he goes here, Josh, take the, take the sentence. This is the one that you're right. So, getting 15.9% breaks through that noise barrier. GROC 4 is showing non-zero levels of fluid intelligence. And if you're not familiar with what fluid intelligence is, it's basically it's the capacity to reason abstractly. So like, it's kind of the ability to solve novel problems and adapt new situations without relying on prior knowledge or experience.
Starting point is 00:14:57 So this was the most interesting thing to me where I'm like, hmm, okay, this is the first time where it's actually able to to solve novel problems. which gets me to a point that Elon actually mentioned later in the show or later in the presentation, which was like, hey, we are actually really close to solving unique technical research unlocks through AGI. And he said, I think the first new technology unlocks that will be learned through the GROC model will come next year. And then the first new physics breakthroughs will come the following year. So I think this is kind of the first step towards what Sam Altman often alludes to in the
Starting point is 00:15:34 world of bioengineering, where he frequently says the thing he's most excited about is new bioengineering breakthroughs that are generated through an AI model. Well, GROC is now a contender in this as well, where I think we can very well expect to see genuinely novel technology breakthroughs and physics breakthroughs over the next 24 months. And particularly at this rate of acceleration that they have, that seems really exciting to me. And that was the thing that stood out of this whole thing is like, okay, we're actually at the point where we're right on the cusp of novel unlocks due to these large language models, which was really cool. And then in addition to all of this, we had our episode yesterday where we shared our predictions.
Starting point is 00:16:11 And I'm pretty happy with our predictions. I think we did pretty well. I don't want to say we fully knocked it out of the park, but we got like almost everything we said came true, which is some high signal. Listen, if you're listening here, you're in a right place. Two out of three or three out of four, I would say. So not bad. And some of the predictions were kind of out there. Some of them were technically moonshot predictions and we kind of nailed it.
Starting point is 00:16:31 So I'm going to start with one of my Moonshot predictions, which was Grokfall was going to excel at gaming. So not just cursor or vibe coding for any general application, but Grokful was specifically going to focus on letting anyone create the funest, most engaging games. And from there, sprout some kind of like an app store plethora for gaming, where anyone and everyone can share games, interact with each other. And the reason why I said that was nothing novel, but like Elon was a massive gamer. That was literally my thesis. We were saying on yesterday's episode, he is the number one ranked playing
Starting point is 00:17:08 and I think Dota or whatever the game is, which is a highly strategic, pretty intensive game. And it just kind of like was well attuned with his characteristics. So I was like, I bet you he's going to make a model that is super good at gaming. And in this post that I have pulled up here, that's pretty much what they spent 10 minutes
Starting point is 00:17:24 on the live stream talking about. Grock will develop and play, play 3D games. So not just, we're not talking about Tetris here. We're not talking about tick-tac-toe. We're talking about real-life, 3-D games that, you know, you and I grew up loving that kids nowadays love Minecraft-type Roblox type games. You can now spin up in a matter of seconds or minutes. Not just that, but Grock will have good taste for fun games, meaning it'll understand what you're trying to pitch it instead of like giving you some kind of like black and white game with boxes or whatever. It kind of like senses your taste. It senses your vibe. It says
Starting point is 00:17:56 that it'll have excellent video understanding, improved tool use, a gaming foundational model. That's super exciting because that's something that we haven't really seen being pitched by the major model makers. You know, we had this like niche indie gaming companies that are like, hey, we're integrating AI. We've had this popular gaming coding engine called Unity kind of spin up their own thing. But we haven't really seen the big boys kind of lean into gaming. X is doing that now. Grock 4 is doing that. This isn't out yet. Do we know when this is coming out, Josh? Yeah, so I mean, Elon's prediction, the first real AI video game in 2026, I want to add some commentary to the video game thing because I think it's actually more impressive than what people realize. When you're designing and developing games, the actual code to generate the game is not the hardest part.
Starting point is 00:18:43 You could kind of ask a game engine or an AI model to generate you a copy of Flappy Bird, generate you a racing game, generate you kind of whatever generic game you want, even a first person shooter. And there were some examples that people used a first person shooters. The difficult part of building a good game is the environment around you. It's nailing the physics. It's nailing the textures. It's nailing the actual design of the visual elements because by all means, games are reinventing the physical world in a digital space. And it's really difficult to emulate the physics, the design, the lighting, the texture,
Starting point is 00:19:13 everything that kind of makes base reality look real. So one of the interesting things that they're doing with this new gaming model, whenever it gets released, whenever the capabilities really come into full form, is they are going to allow it to work together with existing game engines like Unity. And EJaz, we actually talked about this a week or two ago, where you asked the difference between like a V-O-3 versus a Unity engine in terms of generating content. And VO3 is very much trained on the perception of physics, meaning it's seen a lot of videos and it can kind of guess how physics work based on its perception.
Starting point is 00:19:46 But a game engine like Unity, it's actually hard-coded with a physics engine, with a lighting engine with all of the things that make games look real because it's been taught how to use, like how to recreate this reality. And you kind of see it with the new GTA trailers. The world's now look incredible. So what Grock is doing is it's pairing these tools together. So it's pairing the generative part of it with the like hard coded super high quality part of it. And those two things when combined together can make for some really amazing experiences because it takes the hardest part of gaming out of the equation, which is designing the world around you. And it just gives this model a real-life physics engine. And that's going to be
Starting point is 00:20:25 freaking awesome. It's a really strategic move from Elon and the XAI team as well, isn't it? So from a infrastructure level, what you're basically saying is it's not trying to own the entire stack. It's just trying to own the brain. And it's welcome to inviting or integrating other tools like Unity or any other coding generators that are really good at nailing the physics, as you say, within its tool stack, right? It seems like its goal is just to make it easiest to make the coolest games. And I can't help but think that,
Starting point is 00:20:57 you know, Elon's original vision when he kind of renamed Twitter to X was I wanted to be the Everything app. And we said this on yesterday's episode. The Everything app right now is WeChat that operates in Asia where people can do all their finances, they socialize, they play a lot of games.
Starting point is 00:21:14 And we haven't really had that app in the West. And it seems like X might end up being that app. I'm convinced now that the next step is surfacing these games to anyone and everyone. And so you can kind of like go on to an old school mini clip or Apple App Store like experience and browse the top games that are trending at that moment and interact with them in real time, maybe even with your friends as well. But Josh, I also want to mention these other two sneaky points that he's mentioned down here, which is first half-hour watchable TV 2025.
Starting point is 00:21:45 So what he's saying here is, like you watch these regular sitcom. that appear on Netflix or Apple TV every day, where they're kind of like half an hour episodes, you can now have fully AI generated episodes. So what he's implying here is, I'm guessing it's going to be super easy to create these kind of narratives and directed scenes similar to a Hollywood-style VFX studio,
Starting point is 00:22:08 but for nothing, straight from your X account. So he's kind of like not only taking on the gaming sector, but he's taking on the Hollywood sector, all with one single model, which is just insane. And then he says here, first watchable AI movie, 2026. I've got a bunch to say on this, but Josh, please take the mic. You go for it. Yeah, so they have like this, this very clear roadmap of everything they want to destroy.
Starting point is 00:22:30 It's like, okay, Grock 4 is released today. They have the coding model coming in August. They have the multimodal agent in September. They have video generation, which is what we're discussing now, in October. And every single one of those is going to sequentially, and like in a way that compounds get better and better and better. He says, I'm curious why you think the AI video generation is so impressive because we've kind of seen this with V-O-3. That's the first version that we had that had audio, really, so the characters that you were making could talk. It had spatial awareness. So if you were to cut something or interact with something, it would emulate the perceived sound.
Starting point is 00:23:04 So what do you think the impact of GROC for doing this? I mean, presumably better, we'll have on the world of entertainment. I think GROC4 is going to nail the AI episodes, the AI movies, better than anyone else, not necessarily because it's a better model, but because it's going to copy all the best traits of all the other video models, Josh. Okay? And this is not something that is uncommon with other AI model providers, right? We've seen the likes of OpenAI copy some of the coding training methods that Anthropic did with Claude, and now it's become like a really good coding model. We've seen Anthropic do vice versa with Open AI. We've seen Meta Lama do the same thing. So there's a history of, you know, mimicry is the highest form of flattery, blah, blah, blah.
Starting point is 00:23:48 I think Elon has looked at Google's V-O-3 and said, huh, the visuals are really, really accurate. It's really high fidelity, but there's no character consistency. And then he looks over at Mid Journey and their recent model, and he's like, huh, their video aesthetics isn't as good as V-O-3, but their character continuity is really good. Wow, look at that anime episode that I've just watched. So I think he's picking and choosing all these different things, Josh, and he's bunging it into Grog4. I think that's what he's going to launch. He's not necessarily going to launch a higher aesthetic model than V-O-3, but he's going to launch a model that has all these, that combines all these different characteristics such that you can
Starting point is 00:24:27 go on it and say, hey, I've generated this really cool anime character using Mid-Journey or whatever, and I'm going to copy and paste it into my GROC4 model on my X account, and I want it to now direct a scene for me, using this one character. That's kind of where I see this going. What do you think? I'm all for it. I think that's great. I think, well, if they're going to have a TV show by next year or the end of this year, it needs to have character continuity.
Starting point is 00:24:52 So all of these things that we are lacking right now, it must accomplish in order to have that. So in that sense, yeah, I totally think that's going to happen. And I'm really, really excited because I think XAI has access to a lot of visual data that the rest of the world doesn't. And I'm not sure how valuable it is, but in the sense of like the test, the network, I'm sure that data is available for training, which is a lot of real world data. They have a lot of factories. They have a lot of robots. They just have a lot of this weird real world data that is kind of proprietary to them. And I'm hopeful we'll make a difference in understanding the world. I think that's, yeah, it's going to be interesting. We'll see.
Starting point is 00:25:30 There also is one other thing that I wanted to mention before we wrap up, which I think is notable. And it's what they're offering because they're not just offering GROC 4, right? There's another model here. It's called GROC4. Heavy. And Grog4 Heavy is really impressive because Grog4 Heavy doesn't just rely on a single model. It relies on a series of agents that are kind of working together to give you the ANSEL. Yeah. So multi-agent, multi-modality, multi-every thing. It is really impressive. It takes a ton of compute, actually. So the cost of Grock Heavy is very expensive. It's what, I think, $300 a month, $3,000 a year. So we're talking about a good amount. This is probably the most.
Starting point is 00:26:11 expensive subscription that exists in an AI model right now. But the outcome is the best in the world. And when we showed those benchmarks a little bit earlier, it shows that GROC heavy, when it has multi-agent models, will produce the single best answer in the world. So if you're doing research, if you're doing any hard problem solving, this will solve that. And the way it works is it basically takes a version of GROC4. It clones itself into a series of these agents. And they all search for the answer to the same question that you asked. And then what they do is after they've come to a conclusion, they look at each other and they compare notes. And then they form consensus on what the best answer is and then push that
Starting point is 00:26:51 best answer forward. So what you'll oftentimes find if you're using a language model is that you'll get a slightly different answer every time you ask a question. So you could ask the same prompt and you'll get a different answer twice. And sometimes one will be better than the other. And what this does is it provides the redundancy to guarantee that each answer is as close to the best answer as possible. And that was super interesting to me. So I don't have the Grok Heavy account. We're not paying $300 a month yet, but we might have to try this out for a demo, because I'm really fascinated at how... Dude, we're going to end up paying our entire rent on AI models. I'm paying, like, I think, what, 200 bucks on OpenAI's premium tier plan
Starting point is 00:27:32 or whatever it is. And it gives me access to all their cool features, their video models and agent thing. GROC's now like, GROC-heavy, you just say. said was 300 bucks. That's insane. Okay, so my take on this is there's been a few experiments that were talked about recently, and I say the word experiment, because that's literally what they were, to see how these different models would interact with each other on real-life scenarios. So, we spoke about it in, I don't know, like five episodes ago, one research group which put Anthropics Claude model, Open Airs model, GROC, all in a room, and said, hey, I want you to raise money for charity. Go, figure it out. You're going to have access to any tool that you want.
Starting point is 00:28:14 And what was funny about that little segment that we did was it described how some models were lazy, some models were super practical, and some models worked really well together. And on that last point, the models that worked really well together often gave a way better, and I'm not talking about marginal, I'm talking about a way better response and output to the original query. They raised way more money for charity,
Starting point is 00:28:43 they were way more entertaining, and they were way more strategic. And most importantly, they would call each other out for the mistakes that they would make, right? So all these traits were specific to agents that work together, or models that work together. That's why Grok Heavy is going to win. They've seen that pattern happen, Josh. So imagine you don't just work
Starting point is 00:28:58 with one singular terminal saying, hey, can you figure out this research problem for me? It takes your research problem and in the back end speaks to million replications of that exact model, which runs off and does one part of the query, which runs off and does research on another part. It comes back with answers. You have an orchestrator agent which evaluates the answers and responses. And all of that happens in milliseconds, or seconds, rather, and gives you the best answer that you could have possibly get given
Starting point is 00:29:27 that would have taken you days or months to figure out. Just insane. It's amazing. And it's funny you use that example. I actually just shared a post. with you. You wouldn't mind pulling it up. Yeah. And it is an example that they used from the presentation last night, which was using AI to make money. And the example that they used was a vending machine. And they showed the benchmarks here where Grock, when tasked with the problem of solving, how can I make money with vending machines? They rolled this out virtually and it actually made a lot of money. They sold 4,569 of these units. And more than double. That's more than double the second. More than triple the second, which is Claude Opus 4. So that begs the question,
Starting point is 00:30:10 like, I mean, you look at the net worth over time and it's much higher than the other models. Hang on a second, mate. This is a crazy chart. Isn't that cool? What is that? That's insane. Yeah. So there's this world in which like, hey, it's now smart enough where I could actually conduct business on your behalf and kind of ideate and apply these ideas to the real world to generate money. It did really good. And you could see where the human falls in this. It's pretty disappointing. So the net worth of a human is $844. The next up is clawed at just over $2,000. And then we have GROC at $4,700. GROC sold $4,000 of these units, while a human sold $344. So in this particular example, rock four is already an order of magnitude plus better than a human at selling vending
Starting point is 00:31:01 machines at least. That's our benchmark. So it's just another example of how these things are just getting more aware. They have more context. They have more capability. And again, because of the reinforcement training that we talked about earlier in the show, they just have the practical knowledge to apply these ideas to the real world. And I think that's kind of what you're seeing highlighted in this chart is like, damn, it's pretty good. Like it's doing things in the real world and it's making a difference. All right, Josh, I want to get back to the predictions that we nailed because I just remember that you made a banging one, which couldn't be more on point. It's coming to test with us.
Starting point is 00:31:36 Let's go. This is so exciting. Yeah, so yesterday we mentioned like, hey, I'd really love to see GROC in a Tesla. I did cheat a little bit because there's an account that I follow that shares the change logs within the apps. And it showed there were some mentions of GROC last week. There was no guarantee that it was going to be announced. And then Elon just this morning posted, GROC is coming to Tesla vehicles very soon next week at the latest. This is very exciting.
Starting point is 00:31:58 I am very hopeful that it has the things that we mentioned yesterday, which is multimodality awareness. It can read from the cameras. It can hear you from the microphone. You can have a conversation with it. You can talk about things that you're seeing. It has access to your GPS and navigational data. So it can kind of interact with you, perhaps as you're driving around, give you a tour of a neighborhood. It can tell you of interesting places nearby. It can converse with you about whatever you'd like. It can teach you things. It can entertain you by telling stories. You just have this AI superpowered assistant inside of these cars. And I think that's a really fun application of it, particularly when you think about robotaxis, because if you're getting into a robotaxi, you have this screen, which is a fun entertainment system, and you can watch pre-created content. You can go on YouTube, you can go on Netflix. But now you also have this superpowered assistant inside that you can kind of converse with about anything. And the idea, I would assume, is, if people aren't familiar, when you get into a Tesla, even if it's another person's Tesla, you have a profile on your account. And that profile will automatically sync to the car when you get in it. So it will automatically adjust the seat. It'll
Starting point is 00:33:02 log you into the correct accounts. It will change your temperature preferences to the way that you like. And that also probably gets paired with your GROC memory profile. So it knows all of the memory about you. And when you get into a robo taxi that even if it doesn't belong to you, it still has all the context of your past experiences. And that's going to be really fun because you just now have this hyper-personalized profile that travels around with you everywhere you go when you're in a car. So that was a fun prediction that is seemingly coming in the next seven days. I mean, I said this on yesterday's episode, but the multimodality point is a really important one because it means that your AI is going to be everywhere that you go.
Starting point is 00:33:42 And that's ultimately where we're heading, right? Like we went from desktop computers to smaller computers called laptops that were portable, but you still had to open up to these tiny, you know, metal slabs that you can kind of like, use wherever you are, right, and interact and socialize and all the likes, but it's still clunky, you know, I need to pick it up, I need to open up apps and stuff, and then AI just kind of like spun,
Starting point is 00:34:06 blown all of that out the water, but the thing with AI is you need to tell it stuff. You know, you need to tell it about yourself, you need to explain the context of things, and now you have this kind of like all in one AI model that not only sits on your social media feed and sees all the things that you like, sees all the people that you follow,
Starting point is 00:34:25 sees all the things that you search, but it's also your personal assistant, it's also your therapist, and now it can also be your eyes, right? So if it jumps in your Tesla car, it's seeing everything that you see. It might even point out different kinds of shops or historical sites that it knows you might like
Starting point is 00:34:42 and say, hey, you should take it right down here and you'll have a more scenic route or whatever that might be. And I'm not going to bother to try and opine on what kinds of new experiences that's going to unlock right now because I need to think more deeply about it, but tremendously excited about what this is going to become. Yeah, it's going to be really cool. I think GROC 4, the announcement we got last night,
Starting point is 00:35:02 is very much the starting point. And it kind of laid out the roadmap for where we want to go. So next week when Tesla gets GROC, it's probably not going to have the multi-modality. In fact, they said they were going to try to roll that out sometime in September. We have the coding model in August. We have the video generation in October. But I think it's safe to say that by the end of this year,
Starting point is 00:35:20 this form of GROC, this version of GROC will be feature complete. and that's going to be a very different world than we're living in today. I mean, we saw what happened when V-O-3 came to the market, how quickly video content changed, and how it's now, like, even this morning, I saw this viral video from Popeyes. It was generated by a guest that we had on the show a few weeks ago. And now they're in direct competition with McDonald's, and it was generated for like a couple hundred bucks from a dude in his office. And it's like that was not even possible to do two months ago. Like, we're talking a matter of weeks.
Starting point is 00:35:55 So as these tools roll out, as we get new game generation, as we get this new coding model, this new video generation that understands the world and can apply the Unity engine, the unreal gaming engines that we used to see AAA video games. Yeah. Like, we're going to have some pretty amazing new stuff to be entertained by to create ourselves. It's going to get really crazy, really quick. And I think that was kind of the idea that Elon opened up the presentation with is like, hey, we are very much in the big bang, the big bang like time of the intelligence boom.
Starting point is 00:36:28 And we are like very, very early stages. And to go back to the chart that we started with, the rate of acceleration, the velocity at which these things get better is so fast. And if you imagine, I mean, yeah, here's the chart. If you imagine we were at GROC 2 less than 12 months ago. GROC 2 by today, like you couldn't even pay me to use it. It's so bad. So if we continue that rate of acceleration, the rate of velocity, and just extrapolated out 12 more months. I mean, the world's totally different place, because Grock 4 will then be this like kind of dumb model that's stupid that like probably fits on your phone, but even though it does, you don't even want it anymore. And it's like, it's getting really good. And this is where we start to get those
Starting point is 00:37:07 second order effects occurring where it's like, hey, you start to get novel technology breakthroughs, novel physics breakthroughs, novel bioengineering breakthroughs. And all of those things are seemingly coming at a rate that I think is going to be surprising to a lot of people. I mean, I couldn't agree more. I think the general theme of these AI developments over the last two years that I've been kind of like heads down studying this, Josh, is we are in the Wild West. And every time I think one model has ended all the others, like it'll never be beaten, i.e. my own words literally within the week. And so, and I thought that we'd reach that point about two months ago where they would talk about how the new compute clusters would require. billions and billions, potentially trillions of dollars of money, so they had to raise funds,
Starting point is 00:37:56 where we were running out of data. Do you remember that, Josh? And everyone was like, ah, these models are all going to reach a certain level of intelligence and it's all going to become a commodity. And I just keep eating my words. The graph just keeps going up. And I'm waiting for it to stop. I'm waiting for Nvidia's market cap to flatten. It's just not. It's worth more than the UK's entire economy right now. It's above $4 trillion. It was which is 14% more than the British economy, my home where I'm from, which is just a first world country,
Starting point is 00:38:26 an insane thing to even say on this show. So the general theme is, I just need to keep setting the bar higher, basically. I think that's the trend is, if you're listening to this and you are following AI closely and you're here for the day to day, expect things to continue to move faster. And as fast as they are today,
Starting point is 00:38:44 again, you need to re-index this. They're going to move faster. So for the people who are still listening, thank you. very much appreciate you sticking with us being here for the ride. There's a lot of stuff to look forward to, and I just kind of want to take a second to highlight what we are going to be talking about, and that's coming down the pipeline soon. So we have chat GPT5, which is confirmed. That's coming this summer. That is going to probably beat Rock 4.4. It's probably going to be better. It is going to be
Starting point is 00:39:07 incredible. Then next week, Open AI is actually open sourcing a model. So we have that to look forward to. New Claude has been spotted, Cloud 4.5, possibly. It's been out in the wild. It's been rumored. And then we have Gemini 3.0, which has also been spotted in the wild. And these are a lot of really big models. So I think for the past few months, we had this breather where it was like, okay, nothing really has come in terms of frontier models. We've been using 03 for quite some time now. I think that's all about to change in the next few months. So if you're listening to us, buckle up. There's a lot of acceleration, a lot of AI, a lot of intelligence to come. Again, thank you for the comments yesterday about sharing preferences. Some people liked the daily show. Some people didn't. We're just going to continue to iterate. I think, Today, the episode works perfect. By the afternoon, you should have all the news. So thank you for listening. Thank you for sharing. Thank you just for making it here, rocking with us the whole time.
Starting point is 00:39:57 And we will be back soon with another episode. See you guys next time.
