Limitless: An AI Podcast - Grok-4 Is Now The Smartest AI Model In The World | Everything You Need To Know

Episode Date: July 10, 2025

Grok-4 just leap-frogged every frontier model, topping AGI and coding benchmarks, crushing PhDs across disciplines, and even selling vending machines better than humans, all barely 28 months after XAI was born. We unpack the launch event, new voice mode, gaming-first tools, Grok-Heavy’s multi-agent powerhouse, and why Elon’s Tesla integration could make AI your on-the-go co-pilot. Stick around for what this acceleration means as GPT-5, Claude 4.5, and Gemini 3.0 line up next.
-----
💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT
-----
TIMESTAMPS
00:00 Grok4 Is Now The Smartest AI In The World
05:19 How Did They Do It?
08:00 Humanities Last Exam
13:12 The AGI Test
18:50 Grok Gaming
23:12 Video Generation
26:01 Grok Heavy Is Insane
32:02 Grok For Tesla
35:39 What's Next
------
RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here: https://www.bankless.com/disclosures

Transcript
Starting point is 00:00:03 We have a new top model in town. A new king has been crowned. Grok 4 has been announced. It is now the smartest model that's ever existed in the history of all time, according to all the benchmarks that were shared last night. It was pretty amazing. I stayed up late last night watching the event. It ran on Elon time. It was very late.
Starting point is 00:00:21 I stayed up probably until like a little after one in the morning. But we have all the notes and it's amazing. This model is smarter than you. It is smarter than your PhD friend. It is smarter than any PhD in the world in any category that you can imagine. It's incredible. And one thing that I wanted to highlight before we start this episode is just how impressive the rate of acceleration is from the xAI team, because now they're sitting above OpenAI, they're sitting above Claude, they're even sitting above Google. And they haven't
Starting point is 00:00:48 been around for that long. So in this chart, we're showing kind of like each one of these bullet points is a model that has been released. So you'll notice Grok has, what is that, two, four, six points. Grok has released six models over the course of the last 24 months compared to OpenAI, which has been doing it since well before 2018. We have Anthropic, which has released many more models than xAI, and the rate of acceleration is incredibly impressive. So before we get into exactly how everything works and what is in this, I want you to just kind of share first impressions, because to me, this is like a home run. They did it. No one thought they would do it. They did it. They now hold the crown for the best model, at least in terms of benchmarks, in the world. I'm honestly shocked,
Starting point is 00:01:25 to be honest. I'm a massive fan of Elon, but something about starting a company 28 months ago when you've had all the Anthropics and the OpenAIs of this world just kind of going at it hammer and tongs for years on end. I just didn't think it was possible, but he's not only come through at creating the best generalized model, so that's feature number one. It's better than ChatGPT, which I know the viewer and listener listening to this uses on a daily basis.
Starting point is 00:01:53 So you now have a new model, which is arguably better than the experience that you have on your favorite model, right? So I'm using Grok 4 now more than I use ChatGPT, and it's only been like 11 hours since it got released, right? The number two feature was something really unexpected, Josh. So for a number of episodes now, we've always heralded Anthropic's model, Claude, as the number one coding model. It's been displaced. It's done. It's Grok 4 now. I hate to say it, but now Grok 4 has somehow managed to do what OpenAI has done and also match that coding level, which is something OpenAI themselves have failed to do. But I have a third feature
Starting point is 00:02:32 which I'm super pumped about, which is, you know, some AI model producers like to compete in the same categories. You know, they like to compare themselves on the same features. Grok decided to create a completely new feature category, and that's in gaming. They announced, and they spent, I think, like 10 minutes in the live stream, Josh, talking about how Grok 4 is going to be really amazing at helping you create games.
Starting point is 00:02:56 So think about, like, vibe coding and how products like Cursor were really good for coding up any kind of generalized app, but it was never specialized for anything. Grok 4 is specialized for creating games. So now you can create like a Minecraft-level game or a high-fidelity racing game or something as simple as Tic-Tac-Toe or Tetris in a matter of seconds. And if you remember actually, and we can get into this later, but this is something that we predicted in yesterday's episode where we were like,
Starting point is 00:03:22 I think Grok 4 is going to come out with something gaming related because Elon is such a major gamer. So super cool to see this. And then the final thing, which sounds the nerdiest, but I think is super important to focus on, is that it is smarter than not just any one PhD, but any PhD in any kind of field. So you may have a PhD in science, specifically like physics or maths,
Starting point is 00:03:47 or you could be a PhD in kind of art and philosophy, and this new model is now better than that. And the final feature, which I just remembered, because there are so many features this new model is kind of like topping, is the video and audio side of things. Josh, I know you've been playing around with the voice mode quite a bit. Actually, maybe you want to talk about the video side of things.
Starting point is 00:04:06 Yeah, so some of the stuff is here. Some of the stuff is coming. So the game stuff, the coding engine, the video generation, that is coming soon. So before the end of the year, we'll get this. It's built on top of the Grok model. They're kind of iterating. But in terms of things that they have today, they do have a new advanced voice mode, and the new advanced voice is excellent.
Starting point is 00:04:22 One of the things that I noticed when I was playing around with it this morning is that not only does the voice sound great, but the latency between the request and the answer is so short. It feels like you're actually conversing with a person. You say something. It spits something right back at you. And you can also control the speed at which it replies to you. So the way you listen to a podcast, maybe at 1.5 times speed, you can actually just change the speed that the AI speaks back to you. So if you get a little impatient like me, this is a very nice feature. I toggled it up to like 1.4. We're going to try that, see how it goes. But yeah, the news that they announced
Starting point is 00:04:52 is amazing. So I think people are probably wondering like what exactly makes this so good. Where's the proof that this is good? How does this all work? How did they accomplish this? I mean, going from zero to number one in 28 months is no easy feat, especially because when Grok 2 was released, it was less than 12 months ago. So the amount of progress they've made over the course of the last year is pretty incredible. And we have it here on this visual you just pulled up. Grok 4 is smarter than pretty much all grad students at everything. And what was interesting about Grok 4 is that they did this thing called reinforcement learning training, where they applied 10 times the amount of compute that they did in the previous model towards reasoning. And reasoning
Starting point is 00:05:30 basically is taking these facts, but applying realistic knowledge to them. So it's like if you could imagine Grok 3 was a student in school that learned a lot of textbooks, but never actually went out and got a real job. Grok 4 is the person in the workforce who's applying this knowledge to the real world. And reinforcement training, it's been debated whether or not it actually works at scale. This, I think, proves that it does. And basically what happens is you feed it a bunch of problems and you say, hey, this answer is correct or this answer is wrong, and it iterates through that over and over and over again until it learns how to apply this knowledge to a broad base. So it's incredibly smart at that. It's something that's pretty novel in terms of AI training. No one's ever
Starting point is 00:06:06 applied this much compute to reasoning. And I think it shows in this model, and that's part of the reason why it's so smart: it's been trained on all this data, but then iterated through all of these questions until it is the brilliant, highly skillful model. Got it. So if I were to summarize what you just said, Josh, it sounds like, okay, you know the people that spend their entire time in academics, right? They're getting degree after degree. They're getting their master's degree. They're getting their PhD degree. Now, that's a lot of intelligence and knowledge that they're absorbing in that whatever five to 10 year period that they're studying, right? But it's all kind of theoretical to an extent. You know, and there are certain disciplines where you go out, you do an internship, you get some practical work experience. But it's not really real life. You're not really on the job, right? You're not really at the edge. And what you're saying here is pretty much the equivalent amount of knowledge that is gained from the academics and studying and kind of like school period is equaled with the real-time work experience that someone has, right, for a model. And that's really where this model kind of like separates itself from all the other
Starting point is 00:07:14 models that are out there. It has real-world practical knowledge. It understands all the different terms that you're referencing maybe in social media culture or any kind of work terms that you're mentioning that you're currently experiencing in your job. It just kind of overall gets you better and it understands where you are at the edge of your learning and what you're trying to achieve in your task. Is that right, Josh? Is that fair? Yeah, it's applied knowledge. You could imagine it. Now it's like, imagine Grok was a million people that learned in college and then went out into the workforce. And it's accumulated millions and millions of years' worth of work experience. And it's now applying that to
Starting point is 00:07:50 the answers that it gives you. So yeah, that's the benefit that they found from this. Actually, another thing on this topic, Josh, was actually a concept I kept on seeing, which was Humanity's Last Exam and how Grok 4 had basically achieved the highest score. It was actually almost double what the previous model had achieved. And I kind of want to set the context as to why this is so cool. Humanity's Last Exam is basically AI researchers' bet on AI models getting to human intelligence. That means AGI level, as smart as humans or even smarter than us. So as you can imagine, it's a really, really tough exam. And it's hard for the AI models that exist today to crack. But Grok 4 kind of like came in
Starting point is 00:08:31 and they were kind of expecting it to surpass the previous score, which I think was about 24.9% achieved by an OpenAI model. And they were kind of like, yeah, it'll probably hit like 30 or something. It almost doubled it. It's almost at 50%. And the way I kind of look at that is that like if it's improving at such a quick rate, how long has this company been around? 28 months? Where is it going to be in the next 28 months? Because this is like an exponential curve.
Starting point is 00:08:57 We just looked at a graph that you showed us where after six models, OpenAI, sorry, Grok has already kind of reached a frontier-level model. It's beaten every single benchmark. I can't help but think that this exam is going to be blown out of the water in a matter of, I don't know,
Starting point is 00:09:11 a couple of years at this point, which is shocking for me because I assumed this AGI thing was still a number of years out, despite, you know, all these papers opining about it being, you know, ready in 2027. Do you have any takes on this, Josh? Like, I'm freaked out about this. Yeah. Well, again, we're getting to this point where, like, is this AGI? It depends on the definition. But what we're seeing happening is, I mean, we have Humanity's Last Exam, which it reached 40-something percent in. But there are a lot of other benchmark tests that are actually
Starting point is 00:09:41 fully saturated, meaning it scored 100 percent on these benchmarks. I mean, there's actually no room for improvement in any of these. And I think that was something that was interesting to me, like, okay, how are we going to continue to measure the success, measure the improvement of these models in an objective way? And we have this, yeah, which is like, okay, first of all, number one across the board. So congratulations. But also, we have 88.9%, 98.4%, and 90%. These are like really, really high numbers where we're probably just one more iteration away from just fully saturating all of them. And that was what was interesting to me, like, we really need to re-measure or re-index how we even
Starting point is 00:10:25 classify these models because we're very much running out of time. And then I guess the AGI definition, we've kind of said this in the last few episodes, but it's, I mean, I don't really know. Like, are we there? Is this it? Because if you asked someone a few years ago, like, sure, this would totally be AGI. But today, it's like, eh, probably not. It doesn't feel like it.
Starting point is 00:10:42 But, man, it's really smart. It can do just about anything a human can do. Yeah, it's pretty insane. One thing I actually wanted to point out in this tweet, Josh, is it has something called a 256,000-token context. Now, I kind of want to, pun intended, set that into context on this show, which is that that's like two novels' worth of information that you can just chuck into a single prompt with Grok 4. Now, think about what kind of practical context you can put that into. That means you can put in a bunch of research papers which you have no clue about or understand nothing about and ask Grok to summarize them and relate them to you in a way that you can understand. That is the difference between typing out a simple algebraic formula and kind of like learning how that builds into a massive scientific problem versus just copy-pasting the entire thing. And I think something like that is just super cool. But it's not just the context, it's how much it costs as well. If you look at this, it's $3 per million input,
Starting point is 00:11:41 $15 per million output tokens. For context, that's just incredibly cheap for what this model is achieving and for the benchmarks that it just broke. So I just thought that was super cool to point out. And another part also in terms of cost is this is now a free product. You are actually able to use Grok 4 right now. Even if you don't pay for an account, you can go and actually access the Grok 4 model. So I'd encourage you, if you're listening, even if you don't have an account with Grok, try it out. It is amazing. It is really smart. And one of the things that also stood out when comparing it to o3, which I use a lot, or comparing it to Gemini 2.5, which is Google's offering, is that time to the first token
Starting point is 00:12:18 feels significantly faster. So with o3, a lot of the complaints that I have and that other people have is it just kind of takes a little bit to get to you, like to get where you want to go. You ask a question, it thinks for a little bit, sometimes it'll think for a minute, sometimes it'll think for two minutes. Grok 4 really spits out answers fairly quickly. So I think if you're building an app experience, if you are using this as a day-to-day model, just trying to query things against it, that time to the first token is a really big deal, and it's noticeably different in this new model. And then there's another benchmark. You have it pulled up here, which I really want you to introduce and share because there was one line in this in particular that kind of
Starting point is 00:12:54 freaked me out. And I'd love for you to just walk us through what's happening here on screen. All right. Okay. So we have Greg Kamradt, I think that's how you pronounce his name, who is basically the guy that manages this benchmark called ARC-AGI. In simple terms, this is the AI AGI benchmark. So it's kind of measuring how close these AI models are to artificial general intelligence, which is like, you know, the precipice of where we want to get to
Starting point is 00:13:21 with this entire AI trend. And he says, we got a call from xAI 24 hours ago. And he puts in quotation marks, we want to test Grok 4 on ARC-AGI. We heard the rumors. We knew it would be good. We didn't know it would become the number one public model on ARC-AGI
Starting point is 00:13:37 though. Here's the testing story. And then he goes on to explain how he spoke to the xAI team. He kind of explained the rules and he said, hey, guys, like, we're going to set the rules here. You can't manipulate it in any way. And the reason why I say that is a lot of AI model providers have been rumored to manipulate score results to kind of make it seem like the models are much better than they are. But here we have a kind of authentic case of the model team coming to the benchmark provider and saying, hey, we're good to go. Throw us anything you've got,
Starting point is 00:14:07 and let's see how well our model does. We back it. We know it's going to do very well. And so he goes, exactly. And so he goes, they were on board, so we got started. And he goes, there were some initial kind of errors in terms of like setting it up. But once it got going, it absolutely blew it out of the water. And he goes, the previous top score was around 8% set by Opus 4.
Starting point is 00:14:29 And he says below 10% is kind of noisy. And then he goes here, Josh, take the sentence. This is the one that you're right. So getting 15.9% breaks through that noise barrier. Grok 4 is showing non-zero levels of fluid intelligence. And if you're not familiar with what fluid intelligence is, it's basically the capacity to reason abstractly. So like, it's kind of the ability to solve novel problems
Starting point is 00:14:53 and adapt to new situations without relying on prior knowledge or experience. So this was the most interesting thing to me where I'm like, hmm, okay, this is the first time where it's actually able to solve novel problems, which gets me to a point that Elon actually mentioned later in the show, or later in the presentation, which was like, hey, we are actually really close to solving unique technical research unlocks through AGI. And he said, I think the first new technology unlocks that will be learned through the Grok model will come next year. And then the first new physics breakthroughs will come the following year.
Starting point is 00:15:29 So I think this is kind of the first step towards what Sam Altman often alludes to in the world of bioengineering, where he frequently says the thing he's most excited about is new bioengineering breakthroughs that are generated through an AI model. Well, Grok is now a contender in this as well, where I think we can very well expect to see genuinely novel technology breakthroughs and physics breakthroughs over the next 24 months. And particularly at this rate of acceleration that they have, that seems really exciting to me. And that was the thing that stood out from this whole thing, like, okay, we're actually
Starting point is 00:16:01 at the point where we're right on the cusp of novel unlocks due to these large language models, which was really cool. And then in addition to all of this, we had our episode yesterday where we shared our predictions. And I'm pretty happy with our predictions. I think we did pretty well. I don't want to say we fully knocked it out of the park, but like almost everything we said came true, which is to say, if you're listening here, you're in the right place. Two out of three or three out of four, I would say. So not bad. And some of the predictions were kind of out there. Some of them were technically moonshot predictions and we kind of nailed it. So I'm going to start with one of my
Starting point is 00:16:34 predictions, which was that Grok 4 was going to excel at gaming. So not just Cursor or vibe coding for any general application, but Grok 4 was specifically going to focus on letting anyone create the most fun, most engaging games. And from there, sprout some kind of like an app store for gaming, where anyone and everyone can share games and interact with each other. And the reason why I said that was nothing novel, but like, Elon was a massive gamer. That was literally my thesis. We were saying on yesterday's episode,
Starting point is 00:17:06 he is the number one ranked player in, I think, Dota or whatever the game is, which is a highly strategic, pretty intensive game. And it just kind of like was well attuned with his characteristics. I was like, I bet you he's going to make a model that is super good at gaming.
Starting point is 00:17:20 And in this post that I have pulled up here, that's pretty much what they spent 10 minutes on the live stream talking about. Grok will develop and play 3D games. So we're not talking about Tetris here. We're not talking about Tic-Tac-Toe. We're talking about real-life 3D games that you and I grew up loving, that kids nowadays love, Minecraft-type, Roblox-type games.
Starting point is 00:17:40 You can now spin those up in a matter of seconds or minutes. Not just that, but Grok will have good taste for fun games, meaning it'll understand what you're trying to pitch it instead of like giving you some kind of like black and white game with boxes or whatever. It kind of like senses your taste. It senses your vibe. It says that it'll have excellent video understanding, improved tool use, a gaming foundational model. That's super exciting because that's something that we haven't really seen
Starting point is 00:18:05 being pitched by the major model makers. You know, we had these like niche indie gaming companies that are like, hey, we're integrating AI. We've had this popular game engine called Unity kind of spin up their own thing. But we haven't really seen the big boys kind of lean into gaming. X is doing that now. Grok 4 is doing that. This isn't out yet. Do we know when this is coming out, Josh? Yeah. So, I mean, Elon's prediction is the first real AI video game in 2026. I want to add some commentary to the video game thing because I think it's actually more impressive than what people realize. When you're designing and developing games, the actual code to generate the game is not the hardest part. You could kind of ask a game engine or
Starting point is 00:18:45 an AI model to generate you a copy of Flappy Bird, generate you a racing game, generate you whatever generic game you want, even a first-person shooter. And there were some examples that people showed of first-person shooters. The difficult part of building a good game is the environment around you. It's nailing the physics. It's nailing the textures. It's nailing the actual design of the visual elements because by all means, games are reinventing the physical world in a digital space. And it's really difficult to emulate the physics, the design, the lighting, the texture, everything that kind of makes base reality look real. So one of the interesting things that they're doing with this new gaming model, whenever it gets released, whenever the capabilities
Starting point is 00:19:22 really come into full form, is they are going to allow it to work together with existing game engines like Unity. And Ejaaz, we actually talked about this a week or two ago where you asked the difference between like a Veo 3 versus a Unity engine in terms of generating content. And Veo 3 is very much trained on the perception of physics, meaning it's seen a lot of videos and it can kind of guess how physics works based on its perception. But a game engine like Unity, it's actually hard-coded with a physics engine, with a lighting engine, with all the things that make games look real, because it's been taught how to recreate this reality.
Starting point is 00:19:58 And you kind of see it with the new GTA trailers. The worlds are now looking incredible. So what Grok is doing is it's pairing these tools together. So it's pairing the generative part of it with the like hard-coded, super high quality part of it. And those two things when combined together can make for some really amazing experiences because it takes the hardest part of gaming out of the equation,
Starting point is 00:20:19 which is designing the like world around you. And it just gives this model a real-life physics engine, and that's going to be freaking awesome. It's a really strategic move from Elon and the xAI team as well, isn't it? So from an infrastructure level, what you're basically saying is it's not trying to own the entire stack. It's just trying to own the brain, and it's open to inviting or integrating other tools like Unity or any other coding generators that are really good at nailing the physics, as you say, within its tool stack, right? It seems like its goal is just to make it easiest to make the coolest games. And I can't help but think that, you know, Elon's original
Starting point is 00:20:58 vision when he kind of renamed Twitter to X was, I want it to be the everything app. And we said this on yesterday's episode. The everything app right now is WeChat, which operates in Asia, where people can do all their finances, they socialize, they play a lot of games. And we haven't really had that app in the West. And it seems like X might end up being that app because I'm convinced now that the next step is surfacing these games to anyone and everyone. And so you can kind of like go on to an old-school Miniclip or Apple App Store-like experience and browse the top games that are trending at that moment and interact with them in real time, maybe even with your friends as well. But Josh, I also want to mention these other two sneaky points that he's mentioned down here, which is
Starting point is 00:21:41 first half-hour watchable TV, 2025. So what he's saying here is, like, you watch these regular sitcoms that appear on Netflix or Apple TV every day, where they're kind of like half-an-hour episodes. You can now have fully AI-generated episodes. So what he's implying here is, I'm guessing it's going to be super easy to create these kinds of narratives and directed scenes similar to a Hollywood-style VFX studio,
Starting point is 00:22:08 but for nothing, straight from your X account. So he's kind of like not only taking on the gaming sector, but he's taking on the Hollywood sector, all with one single model, which is just insane. And then he says here, first watchable AI movie, 2026. I've got a bunch to say on this, but Josh, please take the mic. You go first. Yeah, so they have like this very clear roadmap of everything they want to destroy. It's like, okay, Grok 4 is released today. They have the coding model coming in August. They have the multimodal agent in September. They have video generation,
Starting point is 00:22:37 which is what we're discussing now, in October. And every single one of those is going to sequentially, and like in a way that compounds, get better and better and better. I'm curious why you think the AI video generation is so impressive, because we've kind of seen this with Veo 3. That's the first version that we had that had audio, really. So the characters that you were making could talk. It had spatial awareness. So if you were to like cut something or interact with something, it would emulate the perceived sound. So what do you think the impact of Grok 4 doing this,
Starting point is 00:23:07 I mean, presumably better, will have on the world of entertainment? I think Grok 4 is going to nail AI episodes and AI movies better than anyone else, not necessarily because it's a better model, but because it's going to copy all the best traits of all the other video models, Josh. Okay? And this is not something that is uncommon with other AI model providers, right? We've seen the likes of OpenAI copy some of the coding training methods that Anthropic used with Claude, and now it's become like a really good coding model. We've seen Anthropic do vice versa with OpenAI. We've seen Meta's Llama do the same thing. So there's a history of, you know, mimicry is the highest form of flattery, blah, blah, blah.
Starting point is 00:23:48 I think Elon has looked at Google's Veo 3 and said, huh, the visuals are really, really accurate. It's really high fidelity, but there's no character consistency. And then he looks over at Midjourney and their recent model, and he's like, huh, their video aesthetics aren't as good as Veo 3, but their character continuity is really good. Wow, look at that anime episode that I've just watched. So I think he's picking and choosing all these different things, Josh, and he's bunging them into Grok 4. I think that's what he's going to launch. He's not necessarily going to launch a higher-aesthetic model than Veo 3,
Starting point is 00:24:23 but he's going to launch a model that combines all these different characteristics such that you can go on it and say, hey, I've generated this really cool anime character using Midjourney or whatever, and I'm going to copy and paste it into my Grok 4 model on my X account, and I want it to now direct a scene for me using this one character. That's kind of where I see this going. What do you think? I'm all for it.
Starting point is 00:24:45 I think that's great. I think, well, if they're going to have a TV show by next year or the end of this year, it needs to have character continuity. So all of these things that we are lacking right now, it must accomplish in order to have that. So in that sense, yeah, I totally think that's going to happen. And I'm really, really excited because I think xAI has access to a lot of visual data that the rest of the world doesn't.
Starting point is 00:25:07 And I'm not sure how valuable it is, but in the sense of like the Tesla network, I'm sure that data is available for training, which is a lot of real-world data. They have a lot of factories. They have a lot of robots. They just have a lot of this weird real-world data that is kind of proprietary to them. And I'm hopeful it'll make a difference in understanding the world. I think that's, yeah, it's going to be interesting. We'll see. There also is one other thing that I wanted to mention before we wrap up, which I think is notable. And it's what they're offering, because they're not just offering Grok 4, right? There's another model here. It's called Grok 4 Heavy. And Grok 4 Heavy is really impressive because Grok 4 Heavy doesn't just rely on a
Starting point is 00:25:48 single model. It relies on a series of agents that are kind of working together to give you the antel. Yeah. So multi-agent, multi-modality, multi-everything. It is really impressive. It takes a ton of compute actually. So the cost of Grock Heavy is very expensive. It's what, I think $300 a month, $3,000 a year. So we're talking about a good amount. this is probably the most expensive subscription that exists in an AI model right now. But the outcome is the best in the world. And when we showed those benchmarks a little bit earlier, it shows that GROC heavy, when it has multi-agent models,
Starting point is 00:26:26 will produce the single best answer in the world. So if you're doing research, if you're doing any hard problem solving, this will solve that. And the way it works is it basically takes a version of Grok 4, it clones itself into a series of these agents, and they all search for the answer to the same question that you asked. And then what they do is after they've come to a conclusion, they look at each other and they compare notes.
Starting point is 00:26:48 And then they form consensus on what the best answer is and then push that best answer forward. So what you'll oftentimes find if you're using a language model is that you'll get a slightly different answer every time you ask a question. So you could ask the same prompt and you'll get a different answer twice. And sometimes it'll be better than the other. And what this does is it provides the redundancy to guarantee that each answer is as close to the best answer as possible.
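Editor's sketch: the "clone, answer independently, compare notes, push the consensus forward" loop described above can be illustrated with a simple majority vote. This is only a minimal sketch under stated assumptions. The episode does not say how Grok 4 Heavy actually forms consensus, so majority voting is a stand-in technique, and the names `Agent`, `majority_vote`, and `ask_with_consensus` are hypothetical, as are the stub agents that replace real model calls.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical type: an "agent" is anything that maps a question to an answer.
Agent = Callable[[str], str]


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most common answer; on a tie, the answer seen first wins."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def ask_with_consensus(question: str, agents: Sequence[Agent]) -> str:
    """Ask every agent the same question, then keep the consensus answer."""
    answers = [agent(question) for agent in agents]
    return majority_vote(answers)


if __name__ == "__main__":
    # Stub agents standing in for independent runs of the same model
    # (assumption: real runs would be sampled model calls, not fixed strings).
    stub_agents = [
        lambda q: "Paris",
        lambda q: "Paris",
        lambda q: "Lyon",
    ]
    print(ask_with_consensus("What is the capital of France?", stub_agents))
```

The point of the redundancy is visible even in this toy version: one run that drifts to a different answer is outvoted by the runs that agree, which reduces the run-to-run variation the hosts describe without guaranteeing a correct answer.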
Starting point is 00:27:11 And that was super interesting to me. So I don't have the Grok Heavy account. We're not paying $300 a month yet, but we might have to try this out for a demo because I'm really fascinated at how... Dude, we're going to end up paying our entire rent on AI models. That's like, I'm paying like, I think, what, 200 bucks on OpenAI's like premium tier plan or whatever it is
Starting point is 00:27:33 and it gives me access to all their cool features, their video models and agent thing. Grok's now, like, Grok Heavy, you just said it was 300 bucks. That's insane. Okay, so my take on this is there's been a few experiments that were talked about recently, and I say the word experiment because that's literally what they were, to see how these different models would interact with each other on real-life scenarios. So we spoke about it in, I don't know, like five episodes ago.
Starting point is 00:28:00 One research group put Anthropic's Claude model, OpenAI's model, Grok, all in a room and said, hey, I want you to raise money for charity. Go, figure it out. You're going to have access to any tool that you want. And what was funny about that little segment that we did was it described how some models were lazy, some models were super practical, and some models worked really well together. And on that last point, the models that worked really well together often gave a way better... I'm not talking about marginal.
Starting point is 00:28:31 I'm talking about a way better response and output to the original query. They raised way more money for charity. They were way more entertaining and they were way more strategic. And most importantly, they would call each other out for the mistakes that they would make. Right. So all these traits were specific to agents that work together or models that work together. That's why Grok Heavy is going to win. They've seen that pattern happen, Josh.
Starting point is 00:28:56 So imagine you don't just work with one singular terminal saying, hey, can you figure out this research problem for me? It takes a research problem and, in the back end, speaks to a million replications of that exact model, one of which runs off and does one part of the query, another of which runs off and does research on another part. It comes back with answers. You have an orchestrator agent which evaluates the answers and responses, and all of that happens in milliseconds, or seconds, rather,
Starting point is 00:29:24 and gives you the best answer that you could possibly have been given, one that would have taken you days or months to figure out. Just insane. It's amazing. And it's funny you use that example. I just shared a post with you, if you wouldn't mind pulling it up. Yeah. And it is an example that they used from the presentation last night, which was using AI to make money. And the example that they used was a vending machine. And they showed the
Starting point is 00:29:48 benchmarks here where Grok, when tasked with the problem of, how can I make money with vending machines? They rolled this out virtually and it actually made a lot of money. They sold 4,569 of these units. And more than double. That's... More than triple. More than triple the second, which is Claude Opus 4. So that begs the question, like, I mean, you look at the net worth over time and it's much higher than the other models. Hang on a second, mate. This is a crazy chart.
Starting point is 00:30:19 Is that cool? What is that? That's insane. Yeah. So there's this world in which, like, hey, it's now smart enough where I could actually conduct business on your behalf and kind of ideate and apply these ideas to the real world to generate money. It did really good. And you could see where the human falls in this. It's pretty disappointing.
Starting point is 00:30:39 So the net worth of a human is $844. The next up is Claude at just over $2,000. And then we have Grok at $4,700. Grok sold about 4,500 of these units, while a human sold 344. So in this particular example, Grok 4 is already an order of magnitude plus better than a human at selling vending machines, at least.
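Editor's sketch: a quick check of the "order of magnitude plus" claim, using only the figures quoted in this episode (net worth of $844 for the human and roughly $4,700 for Grok 4; 344 units sold by the human and 4,569 by Grok 4). The dictionary keys and the `ratio` helper are hypothetical names chosen for illustration.

```python
# Figures as quoted by the hosts; the Grok 4 net worth is their rounded value.
human = {"net_worth_usd": 844, "units_sold": 344}
grok4 = {"net_worth_usd": 4700, "units_sold": 4569}


def ratio(metric: str) -> float:
    """How many times larger the Grok 4 figure is than the human figure."""
    return grok4[metric] / human[metric]


for metric in human:
    print(f"{metric}: {ratio(metric):.1f}x")
# net_worth_usd: 5.6x
# units_sold: 13.3x
```

On these numbers the order-of-magnitude gap holds for units sold (about 13x), while the net worth gap is closer to 5.6x.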
Starting point is 00:31:02 That's our benchmark. So it's just another example of how these things are just getting more aware. They have more context. They have more capability. And again, because of the reinforcement training that we talked about earlier in the show, they just have the practical knowledge to apply these ideas to the real world. And I think that's kind of what you're seeing highlighted in this chart. It's like, damn, it's pretty good.
Starting point is 00:31:21 Like, it's doing things in the real world and it's making a difference. All right, Josh, I want to get back to the predictions that we nailed because I just remember that you made a banging one, which couldn't be more on point. It's coming to Tesla. Let's go. This is so exciting. Yeah. So yesterday we mentioned like, hey, I'd really love to see Grok in a Tesla. I did cheat a little bit because there's an account that I follow that shares the change logs within the apps. And it showed there were some mentions of Grok last week. There was no guarantee that it was going to be announced. And then Elon just this morning posted, Grok is coming to Tesla vehicles very soon, next week at the latest. This is
Starting point is 00:31:58 very exciting. I am very hopeful that it has... Yeah. I'm very hopeful that it has the thing that we mentioned yesterday, which is multi-modality awareness. It can read from the cameras. It can hear you from the microphone. You could have a conversation with it. You could talk about things that you're seeing. It has access to your GPS and navigational data. So it can kind of interact with you, perhaps as you're driving around, give you a tour of a neighborhood. It could tell you of interesting places nearby. It could just converse with you about whatever you'd like. It can teach you things. It can entertain you by telling stories. It can just, you have this AI superpower assistant now inside of these cars. And I think that's a really fun application of it.
Starting point is 00:32:32 particularly when you think about robotaxis, because if you're getting into a robotaxi, you have this screen, which is a fun entertainment system, and you could watch like pre-created content, you could go on YouTube, you can go on Netflix, but now you also have this superpowered assistant inside that you can kind of converse with about anything. And the idea, I would assume, is, if people aren't familiar, when you get into a Tesla, even if it's another person's Tesla, you have a profile on your account, and that profile will automatically sync to the car when you get in it. So it will automatically adjust the seat, it'll log you into the correct accounts, it will change your temperature preferences to the way that you like, and that also probably
Starting point is 00:33:08 gets paired with your Grok memory profile. So it knows all of the memory about you. And when you get into a robotaxi, even if it doesn't belong to you, it still has all the context of your past experiences. And that's going to be really fun, because you just now have this hyper-personalized profile that travels around with you everywhere you go when you're in a car. So that was one prediction that is seemingly coming in the next seven days. I mean, I said this on yesterday's episode, but the multimodality point is a really important one because it means that your AI is going to be everywhere that you go. And that's ultimately where we're heading, right? Like we went from desktop computers to smaller computers called laptops that were portable, but you still
Starting point is 00:33:51 had to open them up, to these tiny, you know, metal slabs that you can kind of like use wherever you are, right, and interact and socialize and all the like, but it's still clunky. You know, I need to pick it up. I need to open up apps and stuff. And then AI has just kind of blown all of that out of the water. But the thing with AI is you need to tell it stuff. You know, you need to tell it about yourself. You need to explain the context of things. And now you have this kind of like all-in-one AI model that not only sits on your social media feed and sees all the things that you like, sees all the people that you follow, sees all the things that you search, but it's also your personal assistant. It's also your therapist. And now it can
Starting point is 00:34:31 also be your eyes, right? So if it jumps in your Tesla car, it's seeing everything that you see. It might even point out different kinds of shops or historical sites that it knows you might like and say, hey, you should take a right down here and you'll have a more scenic route or whatever that might be. And I'm not going to bother to try and opine on what kinds of new experiences that's going to unlock right now because I need to think more deeply about it, but I'm tremendously excited about what this is going to become. Yeah, it's going to be really cool. I think Grok 4, the announcement we got last night, is very much the starting point. And it kind of laid out the roadmap for where we want to go. So next week, when Tesla gets Grok, it's probably not going to have the multimodality.
Starting point is 00:35:11 In fact, they said they were going to try to roll that out sometime in September. We have the coding model in August. We have the video generation in October. But I think it's safe to say that by the end of this year, this form of Grok, this version of Grok will be feature complete. And that's going to be a very different world than we're living in today. I mean, we saw what happened when Veo 3 came to the market, how quickly video content changed, and how it's now, like, even this morning, I saw this viral video from Popeyes. It was generated by a guest that we had on the show a few weeks ago. And now they're in direct competition with McDonald's, and it was generated for like a couple hundred bucks from a dude in his office. And it's like, that was not even possible to do
Starting point is 00:35:51 two months ago. Like, we're talking a matter of weeks. So, as these tools roll out, as we get to game generation, as we get this new coding model, this new video generation that understands the world and can apply the Unity engine, the Unreal gaming engines that we're used to seeing in AAA video games. Yeah. Like, we're going to have some pretty amazing
Starting point is 00:36:09 new stuff to be entertained by, to create ourselves. It's going to get really crazy, really quick. And I think that was kind of the idea that Elon opened up the presentation with. It's like, hey, we are very much in the Big Bang,
Starting point is 00:36:24 the Big Bang time of the intelligence boom. And we are at, like, very, very early stages. And to go back to the chart that we started with, the rate of acceleration, the velocity at which these things get better is so fast. And if you imagine, I mean, yeah, here's the chart. If you imagine, we were at Grok 2 less than 12 months ago. Grok 2 today, like, you couldn't even pay me to use it. It's so bad.
Starting point is 00:36:48 So if we continue that rate of acceleration, the rate of velocity, and just extrapolate it out 12 more months, I mean, the world's a totally different place. Because Grok 4 will then be this kind of dumb model that's stupid, that like probably fits on your phone. But even though it does, you don't even want it anymore. And it's like, it's getting really good. And this is where we start to get those second-order effects occurring where it's like, hey, you start to get novel technology breakthroughs, novel physics breakthroughs, novel bioengineering breakthroughs. And all of those things are seemingly coming at a rate that I think is going to be surprising to a lot of people.
Starting point is 00:37:21 I mean, I couldn't agree more. I think the general theme of these AI developments over the last two years that I've been kind of like heads down studying this, Josh, is we are in the Wild West, and every time I think one model has ended all the others, like it'll never be beaten, I eat my own words, literally within the week.
Starting point is 00:37:43 And so, and I thought that we'd reach that point about two months ago, where they would talk about how the new compute clusters would require billions and billions, potentially trillions of dollars of money, so they had to raise funds, where we were running out of data. Do you remember that, Josh?
Starting point is 00:37:59 And everyone was like, ah, these models are all going to reach a certain level of intelligence and it's all going to become a commodity. And I just keep eating my words. Like, the graph just keeps going up. And I'm waiting for it to stop. I'm waiting for Nvidia's market cap to flatten.
Starting point is 00:38:13 It's just not. It's worth more than the UK's entire economy right now. It's above $4 trillion, which is 14% more than the British economy, my home, where I'm from, which is a first world country. An insane thing to even say on this show. So the general theme is,
Starting point is 00:38:31 I just need to keep setting the bar higher, basically. I think that's the trend, is if you're listening to this and you are following AI closely and you're here for the day to day, expect things to continue to move faster. And as fast as they are today, again, you need to re-index this.
Starting point is 00:38:46 They're going to move faster. So for the people who are still listening, thank you. We very much appreciate you sticking with us, being here for the ride. There's a lot of stuff to look forward to, and I just kind of want to take a second to highlight what we are going to be talking about and what's coming down the pipeline soon. So we have GPT-5, which is confirmed. That's coming this summer. That is going to probably beat Grok 4. It's probably going to be better. It is going to be incredible. Then next week,
Starting point is 00:39:09 OpenAI is actually open sourcing a model. So we have that to look forward to. New Claude has been spotted, Claude 4.5, possibly. It's been out in the wild. It's been rumored. And then we have Gemini 3.0, which has also been spotted in the wild. And these are a lot of really big models. So I think for the past few months, we had this breather where it was like, okay, nothing really has come in terms of frontier models. We've been using o3 for like quite some time now. I think that's all about to change in the next few months. So if you're listening to us, buckle up. There's a lot of acceleration, a lot of AI, a lot of intelligence to come. Again, thank you for the comments yesterday about sharing preferences. Some people liked the daily show. Some people didn't. We're just going to
Starting point is 00:39:47 continue to iterate. I think today the episode works perfect. By the afternoon, you should have all the news. So thank you for listening. Thank you for sharing. Thank you just for making it here, rocking with us the whole time. And we will be back soon with another episode. See you guys next time.
