Limitless: An AI Podcast - Grok-4 Is Now The Smartest AI Model In The World | Everything You Need To Know
Episode Date: July 10, 2025
Grok-4 just leap-frogged every frontier model, topping AGI and coding benchmarks, crushing PhDs across disciplines, and even selling vending machines better than humans, all barely 28 months ...after XAI was born. We unpack the launch event, new voice mode, gaming-first tools, Grok-Heavy’s multi-agent powerhouse, and why Elon’s Tesla integration could make AI your on-the-go co-pilot. Stick around for what this acceleration means as GPT-5, Claude 4.5, and Gemini 3.0 line up next.
-----
💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT
-----
TIMESTAMPS
00:00 Grok4 Is Now The Smartest AI In The World
05:19 How Did They Do It?
08:00 Humanities Last Exam
13:12 The AGI Test
18:50 Grok Gaming
23:12 Video Generation
26:01 Grok Heavy Is Insane
32:02 Grok For Tesla
35:39 What's Next
------
RESOURCES
Josh: https://x.com/Josh_Kale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
We have a new top model in town.
A new king has been crowned.
Grok 4 is announced.
It is now the smartest model that's ever existed in the history of all time.
According to all the benchmarks that were shared last night, it was pretty amazing.
I stayed up late last night watching the event.
It went according to Elon time.
It was very late.
I stayed up probably until like a little after one in the morning.
But we have all the notes and it's amazing.
This model is smarter than you.
It is smarter than your PhD friend.
It is smarter than any PhD in the world in any category that you can
imagine. It's incredible. And one thing that I wanted to highlight before we start this episode
is just how impressive the rate of acceleration is from the xAI team, because now they're sitting
above OpenAI, they're sitting above Claude, they're even sitting above Google. And they haven't
been around for that long. So in this chart, each one of these bullet
points is a model that has been released. So you'll notice Grok has, what is that, two, four, six
points. Grok has released six models over the course of the last 24 months, compared to OpenAI, which has
been doing it since well before 2018. We have Anthropic, which has released many more models than xAI,
and the rate of acceleration is incredibly impressive. So before we get into exactly how everything
works and what is in this, I want you to just share first impressions, because to me,
this is like a home run. They did it. No one thought they would do it. They did it. They now hold the
crown for the best model in the world, at least in terms of benchmarks. I'm honestly shocked,
to be honest. I'm a massive fan of Elon, but something about starting a company
28 months ago, when you've had all the Anthropics and the OpenAIs of this world
just kind of going at it hammer and tongs for years on end.
I just didn't think it was possible,
but he's not only come through at creating the best generalized model,
so that's feature number one.
It's better than ChatGPT,
which I know the viewer and listener listening to this uses on a daily basis.
So you now have a new model,
which is arguably better than the experience that you have on your favorite model, right?
So I'm using Grok 4 now more than I use
ChatGPT, and it's only been like 11 hours since it got released, right? The number two feature
was something really unexpected, Josh. So for a number of episodes now, we've always heralded
Anthropic's model, Claude, as the number one coding model. It's been displaced. It's done. It's
Grok 4 now. I hate to say it, but now Grok 4 has somehow managed to do what OpenAI has done and also
match that coding level, which is something OpenAI themselves have failed to do. But I have a third feature
which I'm super pumped about, which is, you know,
some AI model producers like to compete at the same categories.
You know, they like to compare themselves at the same features.
Grok decided to create a completely new feature category,
and that's in gaming.
They announced, and they spent, I think, like 10 minutes in the live stream, Josh,
talking about how Grok 4 is going to be really amazing
at helping you create games.
So think about, like, vibe coding
and how products like Cursor were really good for coding up
any kind of generalized app, but it was never specialized for anything.
Grok 4 is specialized for creating games.
So now you can create like a Minecraft-level game or a high-fidelity racing game
or something as simple as Tic-Tac-Toe or Tetris in a matter of seconds.
And if you remember actually, and we can get into this later,
but this is something that we predicted in yesterday's episode where we were like,
I think Grok 4 is going to come out with something gaming related because Elon is such a major gamer.
So super cool to see this.
And then the final thing, which sounds the nerdiest,
but I think is super important to focus on,
is that it is smarter, not just than any PhD,
but than any PhD in any kind of sector.
So you may have a PhD in science,
specifically like physics or maths,
or you could have a PhD in art and philosophy,
and this new model is now better than that.
And the final feature, which I just remembered,
because there are so many features
this new model is kind of topping,
is the video and audio side of things.
Josh, I know you've been playing around with the voice mode quite a bit.
Actually, maybe you want to talk about the video side of things.
Yeah, so some of the stuff is here.
Some of the stuff is coming.
So the game stuff, the coding engine, the video generation, that is coming soon.
So before the end of the year, we'll get this.
It's built on top of the Grok model.
They're kind of iterating.
But in terms of things that they have today, they do have a new advanced voice mode,
and the new advanced voice is excellent.
One of the things that I noticed when I was playing around with it this morning
is that not only does the voice sound great,
but the latency between the request and the answer is
so short. It feels like you're actually conversing with a person. You say something. Wow. It spits
something right back at you. And you can also control the speed at which it replies to you. So the way
you listen to a podcast, maybe at 1.5 times speed, you could actually just change the speed that the AI
speaks back to you. So if you get a little impatient like me, this is a very nice feature. I toggled
it up to like 1.4. We're going to try that, see how it goes. But yeah, the news that they announced
is amazing. So I think people are probably wondering like what exactly makes this so good. Where's
the proof that this is good? How does this all work? How do they accomplish this? I mean, going from
zero to number one in 28 months is no easy feat, especially because Grok 2 was
released less than 12 months ago. So the amount of progress they've made over the course
of the last year is pretty incredible. And we have it here on this visual you just pulled
up. Grok 4 is smarter than pretty much all grad students at everything. And what was interesting
about Grok 4 is that they did this thing called reinforcement learning training, where they
applied 10 times the amount of compute that they did in the previous model towards reasoning. And reasoning
basically is taking these facts and applying real-world knowledge to them. So it's like, if you could
imagine, Grok 3 was a student in school who learned a lot from textbooks but never actually went out
and got a real job. Grok 4 is the person in the workforce who's applying this knowledge to the real world.
And reinforcement training, it's been debated whether or not it actually works at scale. This, I think,
proves that it does. And basically what happens is you feed it a bunch of problems and you
say, hey, this answer is correct or this answer is wrong, and it iterates through that over
and over and over again until it learns how to apply this knowledge to a broad base. So it's incredibly
smart at that. It's something that's pretty novel in terms of AI training. No one's ever
applied this much compute to reasoning. And I think it shows in this model, and that's part of
the reason why it's so smart: it's been trained on all this data, but then
iterated through all of these questions until it became this brilliant, highly skillful model.
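To make the correct-or-wrong feedback loop described above a bit more concrete, here is a minimal sketch in Python. It is not xAI's training method, which the episode does not detail. It is a toy REINFORCE-style (gradient bandit) update in which a pretend "model" picks among three hypothetical strategies for answering an addition problem and is only told whether its answer was right or wrong. The strategy list, learning rate, step count, and seed are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical strategies the toy "model" can use to answer "a + b".
# Only the first one is actually correct in general.
STRATEGIES = [
    lambda a, b: a + b,  # correct
    lambda a, b: a * b,  # wrong for almost all inputs
    lambda a, b: a - b,  # always wrong when b >= 1
]


def softmax(prefs):
    """Turn raw preferences into probabilities that sum to 1."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=3000, lr=0.2, seed=0):
    """Repeatedly pose a problem, grade the answer, and nudge the policy."""
    rng = random.Random(seed)
    prefs = [0.0] * len(STRATEGIES)
    for _ in range(steps):
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        probs = softmax(prefs)
        choice = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
        # The only training signal: 1.0 if the answer is correct, else 0.0.
        reward = 1.0 if STRATEGIES[choice](a, b) == a + b else 0.0
        # REINFORCE-style update: rewarded choices become more likely.
        for i in range(len(prefs)):
            chosen = 1.0 if i == choice else 0.0
            prefs[i] += lr * reward * (chosen - probs[i])
    return softmax(prefs)


if __name__ == "__main__":
    print(train())  # probability mass should concentrate on strategy 0
```

The toy never sees a worked solution, only a right-or-wrong grade, yet it still ends up preferring the strategy that works. That is the basic shape of the idea, at a vastly smaller scale than anything discussed in the episode.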
Got it. So if I were to summarize what you just said, Josh, it sounds like, okay, you know the people that spend their entire time in academics, right? They're getting degree after degree. They're getting their master's degree. They're getting their PhD. Now, that's a lot of intelligence and knowledge that they're absorbing in that whatever five to 10 year period that they're studying, right? But it's all kind of theoretical to an extent. You know, there are certain disciplines where you go out, you do an internship, you get some practical work experience. But it's not really, really
real life. You're not really on the job, right? You're not really at the edge. And what you're saying
here is pretty much the equivalent amount of knowledge that is gained from the academics and
studying and kind of like school period is equaled with the real-time work experience that someone has,
right, for a model. And that's really where this model kind of like separates itself from all the other
models that are out there. It has real-world practical knowledge. It understands all the different
terms that you're referencing maybe in social media culture or any kind of work terms that you're
mentioning that you're currently experiencing in your job, it just kind of overall gets you better
and it understands where you are at the edge of your learning and what you're trying to achieve
at your task. Is that, is that right, Josh? Is that, is that fair? Yeah, it's applied knowledge.
You could imagine it. Now it's like, imagine Grok was a million people that learned in college
and then went out into the workforce. And it's accumulated millions and millions of years' worth
of work experience. And it's now applying that to
the answers that it gives you. So yeah, that's the benefit that they found from this.
Actually, another thing on this topic, Josh, was actually a concept I kept on seeing,
which was Humanity's Last Exam and how Grok 4 had basically achieved the highest score. It was
actually almost double what the previous model had achieved. And I kind of want to set the
context as to why this is so cool. Humanity's Last Exam is basically AI researchers' bet on
AI models getting to human intelligence. That means AGI level, as smart
as humans or even smarter than us. So as you can imagine, it's a really, really tough exam. And
it's hard for the AI models that exist today to crack. But Grok 4 kind of came in,
and they were kind of expecting it to surpass the previous score, which I think was about
24.9%, achieved by an OpenAI model. And they were kind of like, yeah, it'll probably hit like
30 or something. It almost doubled it. It's almost at 50%. And the way I kind of look at that is that like
if it's improving at such a quick rate,
how long has this company been around?
28 months?
Where is it going to be in the next 28 months?
Because this is like an exponential curve.
We just looked at a graph that you showed us
where after six models,
OpenAI, sorry, Grok has already
kind of reached frontier model level.
It's beaten every single benchmark.
I can't help but think that this exam
is going to be blown out of the water
in a matter of, I don't know,
a couple of years at this point,
which is shocking for me
because I assume this AGI thing
is still a number of years out, despite, you know, all these papers opining about it being,
you know, ready in 2027. Do you have any takes on this, Josh? Like, I'm freaked out about this.
Yeah. Well, again, we're getting to this point where, like, is this AGI? It depends on the
definition. But what we're seeing happening is, I mean, we have Humanity's Last Exam,
which it reached 40-something percent on. But there are a lot of other benchmark tests that are actually
fully saturated, meaning it scored 100 percent on these benchmarks. I mean, there's actually no
room for improvement in any of these. And I think that was something that was interesting to me
is like, okay, how are we going to continue to measure the success, measure the improvement of
these models in an objective way? Because we kind of are. And we have this, yeah, we have the
which is like, okay, first of all, number one across the board. So congratulations. But also,
we have an 88.9% and 98.4%, 90%. These are like really, really high numbers, where
we're probably just one more iteration away from just fully saturating all of them.
And that was what was interesting to me is like, we really need to re-measure or re-index how we even
classify these models because we're very much running out of time.
And then I guess the AGI definition, we've kind of said this in the last few episodes,
but it's, I mean, I don't really know.
Like, are we there?
Is this it?
Because if you asked someone a few years ago, like, sure, this would totally be AGI.
But today, it's like, eh, probably not.
It doesn't feel like it.
But, man, it's really smart.
It's just about anything a human can do.
Yeah, it's pretty insane. One thing I actually wanted to point out in this tweet, Josh, is it has something called a 256,000-token context window. Now, I kind of want to, pun intended, set that into context on this show, which is that that's like two novels' worth of information that you can just chuck into a single prompt with Grok 4. Now, think about what kind of practical use you can put that to. That means you can put in a bunch of research papers of which you have
no clue or understand nothing about and ask Grok to summarize them and relate them to you in a way
that you can understand. That is the difference between typing out a simple algebraic formula and
kind of learning how that builds into a massive scientific problem, versus just copy-pasting
the entire thing. And I think something like that is just super cool. But it's not just the
context, it's how much it costs as well. If you look at this, it's $3 per million input tokens,
$15 per million output tokens. For context,
that's just incredibly cheap for what this model is achieving and for the benchmarks that it just
broke. So I just thought that was super cool to point out. And another part also in terms of cost
is this is now a free product. You are actually able to use Grok 4 right now, even if you don't
pay for an account; you can go and actually access the Grok 4 model. So I'd encourage you, if you're
listening, even if you don't have an account with Grok, try it out. It is amazing. It is really smart.
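For a rough sense of what the API pricing quoted above means in practice, here is a small back-of-the-envelope sketch in Python. The $3 and $15 per million token rates and the 256,000 context figure come from the episode; the function names and the example token counts are made up for illustration, and treating the prompt and the reply as sharing one context window is an assumption, not something stated in the episode.

```python
# Figures quoted in the episode (USD per one million tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00
CONTEXT_WINDOW_TOKENS = 256_000


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the quoted per-token rates."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def fits_in_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: prompt and reply together must fit in the window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical request: a large stack of papers in, a short summary out.
    prompt_tokens, reply_tokens = 200_000, 2_000
    print(fits_in_context(prompt_tokens, reply_tokens))
    print(round(request_cost_usd(prompt_tokens, reply_tokens), 2))
```

Under those assumptions, filling most of the context window with research papers and getting a short summary back comes to well under a dollar per request.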
And one of the things that also stood out, when comparing it to o3, which I use a lot,
or comparing it to Gemini 2.5, which is Google's offering, is that time to first token
feels significantly faster. So with o3, a lot of the complaints that I have and that other people
have is it just kind of takes a little bit to get you where you want to go.
You ask a question, it thinks for a little bit, sometimes it'll think for a minute, sometimes it'll
think for two minutes. Grok 4 really spits out answers fairly quickly. So I think if you're building an
app experience, if you are using this as a day-to-day model, just trying to query things against
it, that time to first token is a really big deal, and it's noticeably different
in this new model. And then there's another benchmark. You have it pulled up here, which I really
want you to introduce and share because there was one line in this in particular that kind of
freaked me out. And I'd love for you to just walk us through what's happening here on screen.
All right. Okay. So we have Greg Kamradt, I think that's how you pronounce his name,
who is basically the guy that manages this benchmark called ARC-AGI. In simplistic terms,
this is the AI AGI benchmark.
So it's kind of measuring how close these AI models are
to artificial general intelligence,
which is like, you know,
the precipice of where we want to get to
with this entire AI trend.
And he says,
we got a call from xAI 24 hours ago.
And he puts in quotation marks,
we want to test Grok 4 on ARC-AGI.
We heard the rumors.
We knew it would be good.
We didn't know it would become the number one public model on ARC-AGI
though.
Here's the testing story.
And then he goes on to explain how he spoke to the xAI team.
He kind of explained the rules and he said, hey, guys, like, we're going to set the rules here.
You can't manipulate it in any way.
And the reason why I say that is a lot of AI model providers have been rumored to manipulate score results to kind of make them seem like the models are much better than they are.
But here we have a kind of authentic case of the model team coming to the benchmark provider and saying, hey, we're good to go.
Throw us anything you've got.
and let's see how well our model does.
We back it.
We know it's going to do very well.
And so he goes, they were on board, so we got started.
And he goes, there were some initial kind of errors in terms of setting it up.
But once it got going, it absolutely blew it out of the water.
And he goes, the previous top score was around 8%, set by Opus 4.
And he says below 10% is kind of noisy.
And then he goes here, Josh, take the sentence.
This is the one that you're right.
So getting 15.9% breaks through that noise barrier.
Grok 4 is showing non-zero levels of fluid intelligence.
And if you're not familiar with what fluid intelligence is,
it's basically the capacity to reason abstractly.
So it's kind of the ability to solve novel problems
and adapt to new situations without relying on prior knowledge or experience.
So this was the most interesting thing to me where I'm like, hmm, okay,
this is the first time where it's actually able to solve novel problems,
which gets me to a point
that Elon actually mentioned later in the show, or later in the presentation, which was like,
hey, we are actually really close to solving unique technical research unlocks through AGI.
And he said, I think the first new technology unlocks that will be learned through the Grok model
will come next year. And then the first new physics breakthroughs will come the following year.
So I think this is kind of the first step towards what Sam Altman often alludes to in the world
of bioengineering, where he frequently says the thing he's most excited about
is new bioengineering breakthroughs that are generated through an AI model.
Well, Grok is now a contender in this as well, where I think we can very well expect to
see genuinely novel technology breakthroughs and physics breakthroughs over the next 24 months.
And particularly at this rate of acceleration that they have, that seems really exciting
to me.
And that was the thing that stood out of this whole thing, is like, okay, we're actually
at the point where we're right on the cusp of novel
unlocks due to these large language models, which was really cool. And then in addition to all of this,
we had our episode yesterday where we shared our predictions. And I'm pretty happy with our predictions.
I think we did pretty well. I don't want to say we fully knocked it out of the park,
but almost everything we said came true, which is somehow saying, listen,
if you're listening here, you're in the right place. Two out of three or three out of four,
I would say. So not bad. And some of the predictions were kind of out there. Some of them were
technically moonshot predictions and we kind of nailed it. So I'm going to start with one of my
predictions, which was that Grok 4 was going to excel at gaming.
So not just Cursor or vibe coding for any general application, but Grok 4 was specifically
going to focus on letting anyone create the most fun, most engaging games.
And from there, sprout some kind of app store for gaming, where anyone
and everyone can share games, interact with each other.
And the reason why I said that was nothing novel; it's just that Elon is a massive gamer.
That was literally my thesis.
We were saying on yesterday's episode,
he is the number one ranked player
in, I think, Dota or whatever the game is,
which is a highly strategic,
pretty intensive game.
And it just kind of was well attuned
with his characteristics.
I was like, I bet you he's going to make a model
that is super good at gaming.
And in this post that I have pulled up here,
that's pretty much what they spent 10 minutes
on the live stream talking about.
Grok will develop and play 3D games.
So not just, we're not talking about Tetris here.
We're not talking about Tic-Tac-Toe.
We're talking about real-life 3D games that you and I grew up loving that kids nowadays love,
Minecraft-type Roblox-type games.
You can now spin up in a matter of seconds or minutes.
Not just that, but Grok will have good taste for fun games,
meaning it'll understand what you're trying to pitch it instead of like giving you
some kind of like black and white game with boxes or whatever.
It kind of like senses your taste.
It senses your vibe.
It says that it'll have excellent video understanding, improved tool use,
a gaming foundational model, that's super exciting because that's something that we haven't really seen
being pitched by the major model makers. You know, we've had these niche indie gaming companies
that are like, hey, we're integrating AI. We've had this popular game engine called Unity
kind of spin up their own thing. But we haven't really seen the big boys kind of lean into gaming.
X is doing that now. Grok 4 is doing that. This isn't out yet. Do we know when this is coming out,
Josh? Yeah. So, I mean, Elon's prediction is the first real AI video game
in 2026. I want to add some commentary to the video game thing because I think it's actually
more impressive than what people realize. When you're designing and developing games, the actual
code to generate the game is not the hardest part. You could kind of ask a game engine or
an AI model to generate you a copy of Flappy Bird, generate you a racing game, generate you
whatever generic game you want, even a first-person shooter. And there were some examples that people
used of first-person shooters. The difficult part of building a good game is the environment
around you. It's nailing the physics. It's nailing the textures. It's nailing the actual
design of the visual elements because by all means, games are reinventing the physical world
in a digital space. And it's really difficult to emulate the physics, the design, the lighting,
the texture, everything that kind of makes base reality look real. So one of the interesting things
that they're doing with this new gaming model, whenever it gets released, whenever the capabilities
really come into full form, is they are going to allow it to work together with existing game
engines like Unity. And Ejaaz, we actually talked about this a week or two ago, where you asked the
difference between like a Veo 3 versus a Unity engine in terms of generating content. And Veo 3 is very
much trained on the perception of physics, meaning it's seen a lot of videos and it can kind
of guess how physics works based on its perception. But a game engine like Unity, it's actually
hard-coded with a physics engine, with a lighting engine, with all the things that make games look real,
because it's been taught how to use,
like how to recreate this reality.
And you kind of see it with the new GTA trailers.
The worlds now look incredible.
So what Grok is doing is it's pairing these tools together.
So it's pairing the generative part of it
with the like hard-coded super high quality part of it.
And those two things when combined together
can make for some really amazing experiences
because it takes the hardest part of gaming out of the equation,
which is designing the like world around you.
And it just gives this model a real-life
physics engine, and that's going to be freaking awesome. It's a really strategic move from Elon and the
xAI team as well, isn't it? So from an infrastructure level, what you're basically saying is
it's not trying to own the entire stack. It's just trying to own the brain, and it's open to
inviting or integrating other tools like Unity or any other coding generators that are really
good at nailing the physics, as you say, within its tool stack, right? It seems like its goal is just to make
it easiest to make the coolest games. And I can't help but think that, you know, Elon's original
vision when he kind of renamed Twitter to X was, I want it to be the everything app. And we said
this on yesterday's episode. The Everything app right now is WeChat that operates in Asia where people
can do all their finances, they socialize, they play a lot of games. And we haven't really had that
app in the West. And it seems like X might end up being that app because I'm convinced now that the
next step is surfacing these games to anyone and everyone. And so you can kind of like go on to
an old-school Miniclip or Apple App Store-like experience and browse the top games that are trending
at that moment and interact with them in real time, maybe even with your friends as well. But Josh,
I also want to mention these other two sneaky points that he's mentioned down here, which is
first half-hour watchable TV 2025. So what he's saying here is, like you watch these regular sitcoms
that appear on Netflix or Apple TV every day,
where they're kind of like half an hour episodes.
You can now have fully AI generated episodes.
So what he's implying here is,
I'm guessing it's going to be super easy
to create these kind of narratives and directed scenes
similar to a Hollywood-style VFX studio,
but for nothing, straight from your X account.
So he's kind of like not only taking on the gaming sector,
but he's taking on the Hollywood sector,
all with one single model,
which is just insane. And then he says here, first watchable AI movie, 2026. I've got a bunch to say on this,
but Josh, please take the mic. You go first. Yeah, so they have like this, this very clear roadmap
of everything they want to destroy. It's like, okay, Grok 4 is released today. They have the coding model
coming in August. They have the multimodal agent in September. They have video generation,
which is what we're discussing now, in October. And every single one of those is going to
sequentially, and like in a way that compounds get better and better and better.
I'm curious why you think the AI video generation is so impressive, because we've kind of seen this with Veo 3.
That's the first version that we had that had audio, really.
So the characters that you were making could talk.
It had spatial awareness.
So if you were to like cut something or interact with something, it would emulate the perceived sound.
So what do you think the impact of Grok 4 doing this,
I mean, presumably better, will have on the world of entertainment?
I think Grok 4 is going to nail the AI episodes and
AI movies better than anyone else, not necessarily because it's a better model, but because it's
going to copy all the best traits of all the other video models, Josh. Okay? And this is not something
that is uncommon with other AI model providers, right? We've seen the likes of OpenAI copy some of
the coding training methods that Anthropic used with Claude, and now it's become like a really good
coding model. We've seen Anthropic do vice versa with OpenAI. We've seen Meta's Llama do the same thing.
So there's a history of, you know, mimicry is the highest form of flattery, blah, blah, blah.
I think Elon has looked at Google's Veo 3 and said, huh, the visuals are really, really accurate.
It's really high fidelity, but there's no character consistency.
And then he looks over at Midjourney and their recent model, and he's like, huh, their video aesthetics aren't as good as Veo 3's, but their character continuity is really good.
Wow, look at that anime episode that I've just watched.
So I think he's picking and choosing all these different things, Josh,
and he's bunging it into Grok 4.
I think that's what he's going to launch.
He's not necessarily going to launch a higher aesthetic model than Veo 3,
but he's going to launch a model
that combines all these different characteristics such that you can go on it
and say, hey, I've generated this really cool anime character using Midjourney or whatever,
and I'm going to copy and paste it into my Grok 4 model on my X account,
and I want it to now direct a scene for me, using this one character.
That's kind of where I see this going.
What do you think?
I'm all for it.
I think that's great.
I think, well, if they're going to have a TV show by next year or the end of this year, it needs
to have character continuity.
So all of these things that we are lacking right now, it must accomplish in order to have
that.
So in that sense, yeah, I totally think that's going to happen.
And I'm really, really excited because I think xAI has access to a lot of visual data that
the rest of the world doesn't.
And I'm not sure how valuable it is, but in the sense of like the Tesla
network, I'm sure that data is available for training, which is a lot of real-world data. They have a lot of
factories. They have a lot of robots. They just have a lot of this weird real-world data that is
kind of proprietary to them. And I'm hopeful it will make a difference in understanding the world.
I think that's, yeah, it's going to be interesting. We'll see. There also is one other thing that I
wanted to mention before we wrap up, which I think is notable. And it's what they're offering,
because they're not just offering Grok 4, right? There's another model here. It's
called Grok 4 Heavy. And Grok 4 Heavy is really impressive because Grok 4 Heavy doesn't just rely on a
single model. It relies on a series of agents that are kind of working together to give you the
answer. Yeah. So multi-agent, multi-modality, multi-everything. It is really impressive. It takes a ton of
compute actually. So the cost of Grok Heavy is very expensive. It's what, I think $300 a month,
$3,000 a year. So we're talking about a good amount.
This is probably the most expensive subscription that exists for an AI model right now.
But the outcome is the best in the world.
And when we showed those benchmarks a little bit earlier,
they showed that Grok Heavy, when it has multi-agent models,
will produce the single best answer in the world.
So if you're doing research, if you're doing any hard problem solving,
this will solve that.
And the way it works is it basically takes a version of Grok 4,
it clones itself into a series of these agents,
and they all search for the answer to the same question that you asked.
And then what they do is after they've come to a conclusion,
they look at each other and they compare notes.
And then they form consensus on what the best answer is
and then push that best answer forward.
So what you'll oftentimes find if you're using a language model
is that you'll get a slightly different answer every time you ask a question.
So you could ask the same prompt and you'll get a different answer twice.
And sometimes it'll be better than the other.
And what this does is it provides the redundancy to guarantee that each answer
is as close to the best answer as possible.
And that was super interesting to me.
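The clone, compare, and pick-the-best flow described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Grok 4 Heavy is actually built, which the episode does not specify. Here the "compare notes" step is approximated with a simple majority vote, and each agent is a hypothetical callable standing in for one model run.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for one model run: takes a question, returns an answer.
Agent = Callable[[str], Hashable]


def majority_vote(answers: Sequence[Hashable]) -> Tuple[Hashable, int]:
    """Return the most common answer and how many agents gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0]


def ask_many(question: str, agents: Sequence[Agent]) -> Tuple[Hashable, int, List[Hashable]]:
    """Send the same question to every agent, then keep the consensus answer."""
    answers = [agent(question) for agent in agents]
    winner, votes = majority_vote(answers)
    return winner, votes, answers


if __name__ == "__main__":
    # Three toy agents: two agree, one slips up.
    agents = [lambda q: "42", lambda q: "42", lambda q: "41"]
    print(ask_many("What is 6 * 7?", agents))
```

A plain vote only works when answers can be compared for equality. For open-ended research questions, a real system would need some richer way of judging and merging the candidate answers.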
So I don't have the Grok Heavy account.
We're not paying $300 a month yet,
but we might have to try this out for a demo
because I'm really fascinated at how...
Dude, we're going to end up paying our entire rent on AI models.
That's like, I'm paying like, I think, what, 200 bucks
on OpenAI's, like, premium tier plan or whatever it is
and it gives me access to all their cool features,
their video models and agent thing.
Grok's now, like, Grok Heavy, you just said it was 300 bucks.
That's insane.
Okay, so my take on this is there's been a few experiments that were talked about recently,
and I say the word experiment because that's literally what they were,
to see how these different models would interact with each other on real-life scenarios.
So we spoke about it in, I don't know, like five episodes ago.
One research group put Anthropic's Claude model, OpenAI's model,
Grok, all in a room and said,
hey, I want you to raise money for charity.
Go, figure it out.
You're going to have access to any tool that you want.
And what was funny about that little segment that we did was it described how some models were lazy, some models were super practical, and some models worked really well together.
And on that last point, the models that worked really well together often gave a way better.
I'm not talking about marginal.
I'm talking about a way better response and output to the original query.
They raised way more money for charity.
They were way more entertaining and they were way more strategic.
And most importantly, they would call each other out for the mistakes that they would make.
Right.
So all these traits were specific to agents that work together or models that work together.
That's why Grok Heavy is going to win.
They've seen that pattern happen, Josh.
So imagine you don't just work with one singular terminal saying, hey, can you figure out this research problem for me?
It takes a research problem
and in the back end speaks to a million replications of that exact model,
which runs off and does one part of the query,
which runs off and does research on another part.
It comes back with answers.
You have an orchestrator agent which evaluates the answers and responses,
and all of that happens in milliseconds, or seconds, rather,
and gives you the best answer that you could have possibly been given,
that would have taken you days or months to figure out.
Just insane.
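The fan-out pattern described here, with copies of one model each taking a piece of the problem and an orchestrator evaluating what comes back, might look roughly like the sketch below. Everything in it is an assumption for illustration: `orchestrate`, `worker`, `score` and `min_score` are hypothetical names, and simple functions stand in for the model copies and for the orchestrator's judgment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-ins: a worker answers one subtask, a scorer rates a finding.
Worker = Callable[[str], str]
Scorer = Callable[[str], float]

def orchestrate(subtasks: list[str], worker: Worker, score: Scorer,
                min_score: float = 0.5) -> str:
    """Run one worker per subtask in parallel, then keep the findings that pass review."""
    # Fan out: each subtask goes to its own worker thread.
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        findings = list(pool.map(worker, subtasks))  # map() preserves input order
    # Orchestrator step: drop findings that score below the threshold.
    kept = [finding for finding in findings if score(finding) >= min_score]
    return "\n".join(kept)

if __name__ == "__main__":
    subtasks = ["history", "pricing", "benchmarks"]
    report = orchestrate(
        subtasks,
        worker=lambda task: f"finding for {task}",
        # Toy review rule: pretend the pricing finding fails review.
        score=lambda finding: 0.0 if finding.endswith("pricing") else 1.0,
    )
    print(report)
```

A real system would presumably have the orchestrator synthesize the pieces instead of just filtering them. The filter is used here only to keep the sketch short.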
It's amazing.
And it's funny you use that example.
I just shared a post with you, if you wouldn't mind pulling it up.
Yeah. And it is an example that they used from the presentation last night, which was
using AI to make money. And the example that they used was a vending machine. And they showed the
benchmarks here where Grok, when tasked with the problem of solving, how can I make money
with vending machines? They rolled this out virtually and it actually made a lot of money.
They sold 4,569 of these units. And more than double. That's.
More than triple.
More than triple the second, which is Claude Opus 4.
So that begs the question, like, I mean, you look at the net worth over time and it's much higher than the other models.
Hang on a second, mate.
This is a crazy chart.
Is that cool?
What is that?
That's insane.
Yeah.
So there's this world in which, like, hey, it's now smart enough where I could actually conduct business on your behalf and kind of ideate and apply these ideas to the real world to generate money.
It did really good.
And you could see where the human falls in this.
It's pretty disappointing.
So the net worth of a human is $844.
The next up is Claude at just over $2,000.
And then we have Grok at $4,700.
Grok sold 4,500 of these units,
while a human sold 344.
So in this particular example,
Grok 4 is already an order of magnitude plus better than a human at
selling vending machines, at least.
That's our benchmark.
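As a quick check on the "order of magnitude" claim, the figures quoted in this segment can be turned into ratios. The numbers below are just the ones read out on the show. Claude's net worth is quoted as "just over $2,000", so 2000 is a rounded assumption, and `times_baseline` is a hypothetical helper name.

```python
# Figures as quoted in the episode; Claude's 2000 is a rounded assumption.
UNITS_SOLD = {"Grok 4": 4569, "human": 344}
NET_WORTH_USD = {"Grok 4": 4700, "Claude Opus 4": 2000, "human": 844}

def times_baseline(table: dict[str, float], name: str, baseline: str) -> float:
    """Return how many times larger table[name] is than table[baseline]."""
    return table[name] / table[baseline]

if __name__ == "__main__":
    units_vs_human = times_baseline(UNITS_SOLD, "Grok 4", "human")
    worth_vs_human = times_baseline(NET_WORTH_USD, "Grok 4", "human")
    worth_vs_claude = times_baseline(NET_WORTH_USD, "Grok 4", "Claude Opus 4")
    print(f"units sold vs human: {units_vs_human:.1f}x")
    print(f"net worth vs human:  {worth_vs_human:.1f}x")
    print(f"net worth vs Claude: {worth_vs_claude:.1f}x")
```

On these numbers, the order-of-magnitude gap holds for units sold (roughly 13 times the human figure). The net worth gap is smaller, between five and six times.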
So it's just another example of how these things are just getting more aware.
They have more context.
They have more capability.
And again, because of the reinforcement training that we talked about earlier in the show,
they just have the practical knowledge to apply these ideas to the real world.
And I think that's kind of what you're seeing highlighted in this chart.
It's like, damn, it's pretty good.
Like, it's doing things in the real world and it's making a difference.
All right, Josh, I want to get back to the predictions that we nailed because I just
remembered that you made a banging one, which couldn't
be more on point. It's coming to Tesla. Let's go. This is so exciting. Yeah. So yesterday we mentioned
like, hey, I'd really love to see Grok in a Tesla. I did cheat a little bit because there's an
account that I follow that shares the change logs within the apps. And it showed there were some
mentions of Grok last week. There was no guarantee that it was going to be announced. And then Elon
just this morning posted, Grok is coming to Tesla vehicles very soon, next week at the latest. This is
very exciting. I am very hopeful that it has... Yeah. I'm very hopeful that it has the thing
that we mentioned yesterday, which is multi-modality awareness. It can read from the cameras. It
can hear you from the microphone. You could have a conversation with it. You could talk about
things that you're seeing. It has access to your GPS and navigational data. So it can kind of interact
with you, perhaps as you're driving around, give you a tour of a neighborhood. It could tell
you of interesting places nearby. It could just converse with you about whatever you'd like.
It can teach you things. It can entertain you by telling stories. It can just, you have this AI
superpower assistant now inside of these cars. And I think that's a really fun application of it.
particularly when you think about robotaxis, because if you're getting into a robotaxi,
you have this screen, which is a fun entertainment system, and you could watch like pre-created
content, you could go on YouTube, you can go on Netflix, but now you also have this superpowered
assistant inside that you can kind of converse with about anything. And the idea, I would assume,
is if people aren't familiar, when you get into a Tesla, even if it's another person's Tesla,
you have a profile on your account, and that profile will automatically sync to the car when you
get in it. So it will automatically adjust the seat, it'll log you into the correct accounts,
it will change your temperature preferences to the way that you like, and that also probably
gets paired with your Grok memory profile. So it knows all of the memory about you. And when you
get into a robotaxi, even if it doesn't belong to you, it still has all the context of
your past experiences. And that's going to be really fun, because you just now have this hyper
personalized profile that travels around with you everywhere you go when you're in a car. So that was
one prediction that is seemingly coming in the next seven days. I mean, I said this on yesterday's
episode, but the multimodality point is a really important one because it means that your AI is
going to be everywhere that you go. And that's ultimately where we're heading, right? Like we went
from desktop computers to smaller computers called laptops that were portable, but you still
had to open up to these tiny, you know, metal slabs that you can kind of like use, use
wherever you are, right, and interact and socialize and all the likes, but it's still clunky.
You know, I need to pick it up. I need to open up apps and stuff. And then AI just kind of like
blown all of that out of the water. But the thing with AI is you need to tell it stuff.
You know, you need to tell it about yourself. You need to explain the context of things.
And now you have this kind of like all in one AI model that not only sits on your social media
feed and sees all the things that you like, sees all the people that you follow, sees all the
things that you search, but it's also your personal assistant. It's also your therapist. And now it can
also be your eyes, right? So if it jumps in your Tesla car, it's seeing everything that you see.
It might even point out different kinds of shops or historical sites that it knows you might
like and say, hey, you should take a right down here and you'll have a more scenic route or whatever
that might be. And I'm not going to bother to try and opine on what kinds of new experiences that's
going to unlock right now because I need to think more deeply about it, but tremendously excited
about what this is going to become. Yeah, it's going to be really cool. I think Grok 4, the announcement
we got last night, is very much the starting point. And it kind of laid out the roadmap for where we
want to go. So next week, when Tesla gets Grok, it's probably not going to have the multimodality.
In fact, they said they were going to try to roll that out sometime in September. We have the coding
model in August. We have the video generation in October. But I think it's safe to say that by the
end of this year, this form of Grok, this version of Grok will be feature complete. And that's going to be a
very different world than we're living in today. I mean, we saw what happened when Veo 3 came to the market,
how quickly video content changed, and how it's now, like, even this morning, I saw this viral
video from Popeyes. It was generated by a guest that we had on the show a few weeks ago.
And now they're in direct competition with McDonald's, and it was generated for like a couple
hundred bucks from a dude in his office. And it's like, that was not even possible to do
two months ago. Like, we're talking a matter of weeks. So,
as these tools roll out, as we get
do game generation, as we get this new
coding model, this new video generation that
understands the world and can apply
the Unity engine, the Unreal gaming
engines that we're used to seeing in AAA video games.
Yeah. Like, we're going to have some pretty amazing
new stuff
to be entertained by to
create ourselves. It's going to get
really crazy, really quick. And I think that was
kind of the idea that Elon
opened up the presentation with.
It's like, hey, we are very much in the
Big Bang time of the intelligence boom.
And we are like very, very early stages.
And to go back to the chart that we started with,
the rate of acceleration, the velocity at which these things get better is so fast.
And if you imagine, I mean, yeah, here's the chart.
If you imagine, we were at Grok 2 less than 12 months ago.
Grok 2 by today, like you couldn't even pay me to use it.
It's so bad.
So if we continue that rate of acceleration, the rate of velocity,
and just extrapolate it out 12 more months,
I mean, the world's a totally different place.
Because Grok 4 will then be this kind of dumb model that's stupid that like probably fits on your phone.
But even though it does, you don't even want it anymore.
And it's like, it's getting really good.
And this is where we start to get those second order effects occurring where it's like, hey, you start to get novel technology breakthroughs, novel physics breakthroughs, novel bioengineering breakthroughs.
And all of those things are seemingly coming at a rate that I think is going to be surprising to a lot of people.
I mean, I couldn't agree more.
I think the general theme of these AI developments
over the last two years
that I've been kind of like heads down studying this,
Josh, is we are in the Wild West,
and every time I think one model has ended all the others,
like it'll never be beaten,
I eat my own words, literally within the week.
And so, and I thought that we'd reach that point
about two months ago,
where they would talk about how the new compute clusters
would require billions and billions,
potentially trillions of dollars of money,
so they had to raise funds,
where we were running out of data.
Do you remember that, Josh?
And everyone was like,
ah, these models are all going to reach
a certain level of intelligence
and it's all going to become a commodity.
And I just keep eating my words.
Like, the graph just keeps going up.
And I'm waiting for it to stop.
I'm waiting for Nvidia's market cap to flatten.
It's just not.
It's worth more than the UK's entire economy right now.
It's above $4 trillion,
which is 14% more than the British economy, my home, where I'm from,
which is just a first world country,
an insane thing to even say on this show.
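For anyone who wants to sanity-check that comparison, the two numbers quoted here (a market cap of about $4 trillion, said to be 14% more than the UK economy) imply a figure for the UK economy. The sketch below only rearranges the numbers from the episode and makes no claim about official GDP data. `implied_base` is a hypothetical helper name.

```python
def implied_base(value: float, percent_more: float) -> float:
    """If `value` is `percent_more` percent larger than some base, return that base."""
    return value / (1 + percent_more / 100)

if __name__ == "__main__":
    nvidia_market_cap = 4.0e12  # "above $4 trillion", as quoted on the show
    uk_economy = implied_base(nvidia_market_cap, 14)
    print(f"implied UK economy: ${uk_economy / 1e12:.2f} trillion")
```

Since the market cap is quoted as "above" $4 trillion, the implied UK figure is a floor on what the comparison assumes.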
So the general theme is,
I just need to keep setting the bar higher, basically.
I think that's the trend,
is if you're listening to this
and you are following AI closely
and you're here for the day to day,
expect things to continue to move faster.
And as fast as they are today,
again, you need to re-index this.
They're going to move faster.
So for the people who are still listening,
thank you.
We very much appreciate you sticking with us,
being here for the ride. There's a lot of stuff to look forward to, and I just kind of want to take
a second to highlight what we are going to be talking about, and that's coming down the pipeline
soon. So we have ChatGPT 5, which is confirmed. That's coming this summer. That is going to
probably beat Grok 4. It's probably going to be better. It is going to be incredible. Then next week,
OpenAI is actually open sourcing a model. So we have that to look forward to. New Claude has been
spotted, Claude 4.5, possibly. It's been out in the wild. It's been rumored. And then we have Gemini
3.0, which has also been spotted in the wild. And these are a lot of really big models. So I think for
the past few months, we had this breather where it was like, okay, nothing really has come in
terms of frontier models. We've been using o3 for like quite some time now. I think that's all
about to change in the next few months. So if you're listening to us, buckle up. There's a lot of
acceleration, a lot of AI, a lot of intelligence to come. Again, thank you for the comments yesterday
about sharing preferences. Some people liked the daily show. Some people didn't. We're just going to
continue to iterate. I think today the episode works perfect. By the afternoon, you should have all the
news. So thank you for listening. Thank you for sharing. Thank you just for making it here,
rocking with us the whole time. And we will be back soon with another episode. See you guys next time.
