Limitless Podcast - AGI is Back! Google Gemini 3.0 Crushed Our Expectations
Episode Date: November 19, 2025

🌌 LIMITLESS HQ: LISTEN & FOLLOW HERE ⬇️
https://limitless.bankless.com/
https://x.com/LimitlessFT

------

Google launched Gemini 3.0, a groundbreaking AI model with advanced multimodal capabilities for interpreting diverse inputs. We highlight its impressive benchmark performance, including a 37.5% on Humanity's Last Exam, and discuss Google's proprietary TPU infrastructure and the new tool, Google Antigravity.

------

TIMESTAMPS
0:00 Gemini 3.0
5:34 Game-Changing Features
9:04 Benchmarking Breakthroughs
13:21 Cost and Quality Trade-offs
17:46 Google's Strategic Advantage
24:51 Predictions for the Future

------

RESOURCES
Josh: https://x.com/JoshjKale
Ejaaz: https://x.com/cryptopunk7213

------

Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
If I had a crown in my hands, I would place it on the head of Google, because they have done it again.
They have the world's best AI model ever in history by a shockingly large margin.
Gemini 3.0 just got released.
It's available now to anybody in the world to go use it.
And the benchmarks are kind of blowing everyone's expectations out of the water, myself included.
And most importantly, it places another data point on the chart that shows we are continuing to ascend up this exponential curve towards AGI.
And the roadmap is still intact and we are very quickly moving through it.
I was just going through the benchmarks before recording this, and it's shocking,
because we live in this world, and yet somehow I'm still continually blown away by the progress
that's made by these models. So let's get into it. Please walk everyone through. Tell me,
what did Gemini and the Google team just release with this 3.0 update? People probably think we say
the world's best model every week. But this time we really, really mean it. Like, they have blown
every single other model provider out of the water. The things that this thing can do, well,
How about I just show you?
How about I show you, Josh?
Please, let's see some examples.
We have a thread here.
And Sundar basically says,
you can give Gemini 3 anything,
images, PDFs, scribbles on a napkin,
and it'll create whatever you like.
For example, an image becomes a board game.
A napkin sketch transforms into a full website
and a diagram could turn into an interactive lesson, right?
So there's two examples I want to show you, Josh.
I want to get your opinion on this.
So number one, there's a short video
of someone playing pickleball,
and she uploads it into Gemini and says, hey, can you tell me how well
I've done here and how I can improve my game? And it analyzes the entire video. It knows that
she's wearing a knee brace. It analyzes her positions, telling her where she can move to better
position herself to score the point. That's pretty nuts. But before I get your reaction to that,
because, Josh, I know you're an athlete, I know you're very competitive when it comes to
these things, so this is a tool you could definitely use. The second thing is probably
applicable to a lot of listeners on this show.
They've embedded Gemini 3 into Google search and into new generative UI experiences.
The way I would summarize this is it basically is very intuitive, Josh.
It understands what you're asking for without you needing to really kind of explain yourself.
The example they're showing on the video here is, can you explain the three-body problem to me?
And rather than just giving you this simplistic text which explains the concept,
it decides to create a video diagram from scratch to show you a visual depiction of how
this works. Right, give me your reaction, in order, from one to two. So starting with the
top. The first example. Yes, sir. So this is really cool, the napkin example, where you can scribble
something down on a piece of paper. It'll generate it in the real world. What all of these examples
are kind of showing me is what we always talk about with Google, where it has this awareness of
physics, reality, and visuals and understanding what it's seeing. And all three of these examples are
leaning into that. So it leads me to believe Gemini really is a multimodal-first model,
where it's meant to ingest and understand the world around us.
This example of the chessboard and the napkin is amazing
because a lot of people oftentimes have sketches.
You just draw it down on paper and it intuitively understands it.
But the one that was most surprising to me is the video example
because as far as I'm concerned, as far as I'm aware,
there has never been a model that can ingest video
and understand the video that it sees.
And if it does exist, I've never tried it before.
So the idea that you can, I mean, I played baseball growing up.
If I could take a video of myself swinging
and get a corrective coach to walk me through exactly what was wrong?
A lot of people watching this play golf, I'm sure.
If you could have a phone recording of yourself playing golf,
it can actually critique it.
Critique me as if you were Tiger Woods.
Critique me as if you were whoever else is good at golf.
I don't know, Rory McIlroy, whoever.
But, like, critique me as if you were an expert who is really good at golf
and can give me some feedback on how I could better my swing.
And what this offers in just this one narrow example is now you have this personal
tutor that can do anything.
If you're dancing, if you're doing anything physical, whatever it is, it can evaluate things for you.
Even our video, this podcast, Ejaaz, if we uploaded it to Gemini 3.0, it could critique us.
What did we do well?
What did we not do well?
What did the visuals look like?
How can we improve them?
And that awareness of video is like really cool.
Yeah, I just want to say, I think the closest we got to this was with GPT, where you can upload an image of, like, what's under my car bonnet and say, hey, what's wrong?
My car stopped working.
And it can kind of identify the part that you need to change, change the
oil, blah, blah, blah.
But that's just a static image.
To go from that to live video, and for it to analyze all the frames in that video and then
give you a response on that, is a massive leap upwards.
We just haven't seen that anywhere.
So yeah, you're right.
It's amazing.
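As a sketch of what that video-coaching flow might look like from the API side, here is a minimal example with the google-genai Python SDK. The model id "gemini-3-pro-preview" and the file name are assumptions, and larger uploads may need a short wait while the service finishes processing them.

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from aistudio.google.com

# Upload a (hypothetical) clip of your swing, serve, or rally.
video = client.files.upload(file="my_swing.mp4")

# Ask the model to watch the clip and coach you on it.
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed id; check the current model list
    contents=[video, "Critique my form like a coach and suggest one fix."],
)
print(response.text)
```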
And every example we go through, it kind of breaks the mold of what I believe to be possible.
I find that it's going to be difficult to use Gemini 3.0 because there are so many
possibilities now that have not existed previously.
You kind of need to relearn how to engage with AI because it's so capable.
And there's a fourth example here that I just want to touch on briefly, which was also cool,
is that it works just as well for the other things.
The example is a trip planning one where it starts to plan a trip and a vacation.
And it shows you a full list that is fully interactive of all the places broken up day by day.
And there's an option that you could just choose visual layout.
And you see on the screen here, it'll take every single day of your trip,
break it into images, and section it out into this really nice visual grid.
So the themes I'm seeing here are, okay, real-world understanding, video-first, and really nice presentation, which I think a lot of models sometimes struggle with.
So the demos are out of control.
I'm excited to use it.
Everyone else can use it now.
It's live.
Now I want to get to benchmarks, Ejaaz, because this is where things get kind of crazy, where we can actually compare one model to another and see exactly how impressive this is relative to everybody else.
So please, we have the card here.
Walk us through what we're seeing in this model card and all the specs that we need to know.
As you guys probably know by now, benchmarks are typically how we evaluate AI models against each other.
And they're measured against a range of different benchmarks.
A benchmark can be considered as sort of like a test.
Now, right at the top, you've got Humanity's Last Exam.
This is by default the hardest exam that an AI model is tested against.
And it's kind of like an academic reasoning test with no tools accessible to it.
It scored a very impressive 37.5%, which is, I think, about a
15-point increase from its previous model.
Very, very impressive.
But what really blew my mind was the second stat listed here,
which is the ARC-AGI-2 benchmark.
Josh, when I say this 2x'd
the previous state-of-the-art model,
I absolutely mean it.
In fact, let me just show you this chart here.
Now, you may notice a couple of, like, familiar specks here:
GPT-5 Pro, Grok 4 Thinking.
And then can you see that,
that outlier right at the top right? Do you see that, Josh? That's insane. The two outliers.
The two outliers. So these are the Gemini 3 Pro and the Gemini 3 Deep Think models,
Deep Think being, you know, a larger model that can give you a more researched
response. They are a standout from every single other model. And the reason why this is so crazy,
well, there's a few reasons. Number one, all the other model progressions, as you can see over time,
have been kind of impressive, but kind of small.
Like, they've been good jumps.
It's been impressive, but it hasn't been so impressive that you'd think,
oh, you know, another model provider couldn't catch up.
These results from Google literally put it miles ahead of every other model.
So when I look at this chart, I think, wow,
Google probably has the lead for another six months.
And in six months' time, they're going to have an even more impressive model.
So at this point, I'm kind of thinking,
can anyone catch up to Google?
Josh, do you have any reactions to this benchmark?
This is the chart that, like, the first thing I said to myself when I saw this is like,
oh my God, there is no wall.
We are not going to stop scaling.
The scaling laws still apply, because these two new data points that we have blow everything
else out of the water.
And this is how exponential growth happens.
It seems like a really small cluster down there at the bottom.
But the reality is that was the top just a couple hours ago.
And Gemini kind of refactored this entire chart to make everything seem so
small because the progress is so high. And although Gemini 3.0 Deep Think is seemingly the most impressive,
the real anomaly on the chart is Gemini 3.0 Pro, which is basically a vertical line up from these
other models, where the score is higher, but the cost is actually slightly lower. And if you connect
the dots between these averages, you start to see a literal vertical line in terms of improvement and
acceleration in these models. And that to me shows that there is no scaling wall that we're hitting.
Like we can continue to scale resources, energy, compute, and we can continue along this path towards AGI
in a world where some people were saying,
we don't know if it continues.
The answer to me is very clearly, it continues.
This is a step much closer to AGI.
And again, that real world understanding
makes it feel much closer to AGI than it ever has before
because now it really like intuitively understands the world
through video, through photo, through audio,
through basically every sensory input we have
outside of, what, taste and touch.
So this to me, I saw this chart, I was like,
oh my God, Gemini, you really outdid yourselves.
I'm just going to be honest.
I think over the last couple of months, I've been getting a little bored with the models that have been released by other model providers.
And it led me to think that we're not going to make many breakthroughs until, you know, some model provider figures out a new, a unique way to train their model.
Gemini or Google has convinced me otherwise with this release.
But I know you guys are probably, like, fed up with listening to us harp on about benchmarks.
So how about I materialize that for you in a much easier-to-understand way, right?
So here are the four big takeaways that you need to learn about Gemini 3.
Number one, for the intelligence that you're getting, it is not that super expensive.
Google trained this from scratch, as this tweet says, using their own TPU infrastructure.
And it used this kind of architecture called a mixture of experts, which basically means that whenever you prompt the model, it's not going to activate the entire model.
So it actually ends up being cheaper to run than it otherwise would be.
It has a 1-million-token context input
and a 64K-token output.
We'll get to the costs in a second,
but the point that I'm making here
is that it's not as expensive
as you would expect for the intelligence that you're getting.
Now, if you compare Gemini 3 to GPT-5.1 from OpenAI,
on a relative basis, it is more expensive,
but for the jump in intelligence that you're getting,
it's way better.
So it's, in my opinion, worth it.
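To make the mixture-of-experts idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in Python. This is just the general trick, not Google's actual (unpublished) architecture; every name and size below is made up for illustration.

```python
import numpy as np

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route one token vector through only the top-k experts."""
    scores = x @ gate_weights                  # gate score per expert
    top = np.argsort(scores)[-top_k:]          # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over just the winners
    # Only the chosen experts actually run; the rest of the parameters stay
    # idle for this token, which is why inference is cheaper than the raw
    # parameter count would suggest.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: four "experts", each a simple linear map on an 8-dim token.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(8, 8)): x @ W for _ in range(4)]
gate = rng.normal(size=(8, 4))
token = rng.normal(size=8)
print(moe_forward(token, experts, gate).shape)  # (8,)
```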
Number two, when it comes to computer use,
so that means letting the AI model control your computer,
and do tasks for you whilst you go do something else,
it is state of the art.
It is the best here.
They measured it against a benchmark called ScreenSpot-Pro,
which kind of analyzes its ability
to understand images and visuals on a desktop.
It just absolutely crushes it.
Number three, it is the best AI for math by far.
So again, the point I'm making here or the theme that we're seeing here
is it's not just good at one thing, it's good at many things,
which makes it the best generalist AI model in the
world right now, by far. And the final thing, Josh, and this is where it might slip up. I'm curious
to get your take on this. It is insanely good at coding, but we don't quite know if it is the best at
coding yet. What I mean by that is it completely crushed everyone else on one coding benchmark,
but on the coding benchmark that matters, which is the software engineering one, SWE-bench, it didn't do as well
as its competitor, Claude 4.5 from Anthropic. So those are the four main takeaways. I would much
prefer a model that understands the world over one that understands how to code. And I think we're starting to see
these subset niches where if Anthropic has the best coding model, that's great. Let them focus on code,
let them narrowly make that the best model. Let Google handle everything else. And I think that's
what Gemini is focusing on. So the code thing doesn't really bother me because I don't care to use
Gemini for code. I'm happy to be in the Claude camp for code and then use Gemini for everything else.
And then one of the points earlier that you mentioned on the pricing, I find it a little interesting,
because it's more than just a little bit more expensive.
The pricing, I was looking through it, and for prompts over 200,000 tokens,
they're charging $4 per million tokens for inputs and $18 for outputs.
Now, relative to GPT-5.1, which just got released,
they're charging, per million tokens, $1.25 in, $10 out.
So you're talking about, what is that?
That's about $4 versus $1.25 on inputs.
And that is a fairly significant margin that you're paying for this quality.
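For a rough sense of that margin, here is a back-of-the-envelope cost sketch in Python using the per-million-token prices quoted above (Gemini 3 Pro's long-context tier versus GPT-5.1). Rates change, so treat the numbers as illustrative and check each provider's pricing page.

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# A hypothetical long-context job: 1M tokens in, 100K tokens out.
job = dict(input_tokens=1_000_000, output_tokens=100_000)
gemini = request_cost(**job, in_price_per_m=4.00, out_price_per_m=18.00)
gpt51 = request_cost(**job, in_price_per_m=1.25, out_price_per_m=10.00)
print(f"Gemini 3 Pro: ${gemini:.2f}")  # $5.80
print(f"GPT-5.1:      ${gpt51:.2f}")   # $2.25
```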
So we're starting to see the tradeoffs happening on that Pareto curve that we talked about a few episodes ago, where there are tradeoffs coming in terms of cost and quality.
And it's clear that while OpenAI may have optimized for cost, Google is kind of optimizing a little further up the cost curve in exchange for super high quality.
And it seems like this is kind of a balanced data point for now, because unless you are using this via API and you're requiring a ton of tokens, a $20-a-month Google membership will get you all of the use that you need,
and that is just fine.
So in terms of a usability perspective,
I think that's okay.
But it's just an interesting thing to know
is that this is a better model.
It is also more expensive.
And it is a tradeoff that was made.
And in the case that OpenAI decides
to make this trade-off with GPT-6,
or Grok decides to make this with Grok 5
or Grok 6,
I'm losing track of all these models now,
I think we're going to start to see
the dynamic shift in terms of that Pareto curve
and what model architects decide to remove and add.
And in this case,
it looks like Google added quality,
but they also did add quite a significant
cost increase. I personally don't think it matters. I think it's a nothing burger. I think that if
Google wanted to make it affordable for everyone, including the developers that want to get API access,
that think it might be too expensive, they could subsidize it. They are a cash flow giant. They have
enough money to do that. OpenAI has been doing that for so long now that it doesn't even matter.
I don't see any reason why Google couldn't do that. The other reason is Google just released their
latest TPU, which is the chip that they use to train their models and run inference on their models.
And typically with every generation, we get a much cheaper cost of inference.
I think by the time that they release their next-generation model,
which might be Gemini 3.1, we're going to see a considerable reduction in the cost
for using Gemini 3 Pro and Gemini 3 Deep Think.
So I'm not too worried about that.
I think it's kind of like a short-term problem and not a long-term problem.
But speaking of TPUs, I just want to take a moment to really kind of belabor the point
that using their own TPUs to train a state-of-the-art model
that is 2x better than the previous state of the art model
and probably puts them in a six-month lead
after Google started off on the back foot
creating probably the worst model I've ever seen
and changing that all around in, what's it,
under two years is nothing short of insanity.
TPUs are Google's kind of version of the GPU.
The GPU is kind of like what Nvidia controls the monopoly over.
This is the hardware that you use to train your AI and run inference on your AI.
The unique part here is that Google's never used an Nvidia GPU in any considerable way to train
their models.
They've always trained it in-house.
And that's such a difficult and tricky thing to do because designing and building these
TPUs at scale, these GPUs at scale, is a super hard and complex thing.
You need so much talent.
You need so much expertise and insight to be able to do that.
The unique thing about Google's TPUs, well, there's two main takeaways.
Number one, it's cheaper to train the same amount of intelligence than it is on an Nvidia GPU.
So it's more cost-efficient.
And the second thing is, and this is their secret sauce, you can stack those TPUs on top of each other in a really scalable way, so that you can start training really, really large models.
If you wanted to train the same size model with Nvidia GPUs, it would cost way more and it would take way longer.
So Google made a really risky and big bet about a decade ago saying we're going to build our infrastructure in-house.
We're not going to rely on Nvidia, and we're going to benefit from the full-stack experience.
And this model is a prime example of that bet paying off.
So I just want to call them out.
Like, it's not like Google has gotten lucky here.
They've been planning it for a while now.
The interesting thing to me is that this is the first number one model in the world built
on something other than an Nvidia GPU.
And that's fairly significant because every company in the world is trying, but this is
proof that it's actually possible.
And I think when we talk about Tesla and AI5 and the xAI team,
when we talk about OpenAI working with whoever they're working with to build their own in-house
GPUs, I think this sets a precedent that it is possible. And I suspect that will result in more
companies putting their foot on the gas when it comes to kind of disrupting part of Nvidia's monopoly
that it holds over GPUs. So that to me is the interesting takeaway of this. And hearing that it was
fully trained on these TPUs, that's very high signal to me, saying, okay, there is an architecture
shift happening. There is a real benefit to vertical integration if you can figure out
manufacturing these compute units at scale. And now the race is on for everyone to do this.
Because again, using the Apple example, the M-Series chips, unbelievable, and they unlocked the
best computers in the world. And if companies can really start to refine this vertical integration
of their own chips, you're going to see that exponential curve go vertical times ten. Like,
I suspect that is very obviously now how we reach AGI faster than people
previously thought. Because the efficiency improvements from those vertical integrations,
once they're able to manufacture these at scale
are going to be unbelievable
and I'm so excited for that to happen in the near future.
Google has a big head start,
but let me tell you,
the other companies are not far behind.
Well, let me introduce you to another big advantage
of being the big dog Google.
You thought you were going to come on to this episode
and listen to us harping on about a generalized model?
No.
You're forgetting Google has many other products in their arsenal
and you're forgetting that they can plug in their new state-of-the-art model
into all of them. So Google, not only today, announced Gemini 3, but they also announced a different
product. It's called Google Antigravity, which is basically a new software environment for you to
code up AI agents, except this time these AI agents are going to be super, super smart, because they
get plugged in with Gemini 3. Now, if you remember earlier, I mentioned that one of the cool
benchmarks that this new model sets is in computer use, which means that it can control your
computer, it can do things autonomously for you.
Now, typically the reason why we haven't really spoken about that on this show is that they've been kind of lame.
Like, they can book you a dinner reservation and do different kinds of stuff.
With this model, it's way more intuitive.
It can do way more intelligent tasks and it can take a lot more complex work off of your hands,
such that the value it produces for you over, like, the eight hours you sleep overnight
would be considerable enough for you to seriously use it in your enterprise, your business,
or just your at-home lifestyle, right?
So the point I want to make around here is Google's moat is not just its intelligence or ability to create new models.
It's not its TPUs.
It's its distribution.
It's the entire product suite that it has, that regular users like you and I, who use Gmail, who use Google Suite, can now kind of benefit from, simply by them plugging in that model.
And I'm thinking of products like this Antigravity.
I bet you, Josh, we're going to see a slew of new Google product releases over the next couple of weeks simply because they created
this model. I hope so. I guess the contrarian take is like, okay, how many people are actually
going to want to use them? We just spoke about how Claude is the superior code model. Everyone
loves Cursor. No one really uses the mobile applications of these. A lot of people are engaging with
AI on their phone. So maybe it works for the right type of person. But Google still does have that
product problem where they kind of have a tough time. They have the amazing intelligence. They just
have a tough time productizing it. I mean, I don't have the Gemini app on my phone. I mostly use Grok and
ChatGPT, and there is this bar that they still need to cross that I think they're trying with
Google AI Studio. And we had Logan Kilpatrick on who was the head of that to talk about it when
Nano Banana came out. But it is still a bit of a long shot for them to get good at products
and actually develop this. But what we saw this week is that there was a resounding, overwhelming
amount of support, to your point, Ejaaz, where the market just believes in Google. And in a week
where all of the stocks, all of the Mag 7, were down, Google was the one anomaly. Google was up this week.
And I think it's because the market is starting to realize, one, vertical integration through these TPUs is a huge deal.
Two, Google has an existing business that is not reliant on AI.
And sure, AI places a huge, like, hand on that scale, but it is not everything.
And they are cash flow positive in the absence of AI.
So all of this innovation that they're doing is really just pouring lighter fluid on top of an already great business.
And the market is starting to evaluate that properly.
So Google is positioned very strongly.
They have very high intelligence.
Gemini 3 rocks.
And I mean, again, we continue on the bull train for Google.
I am a believer.
I am a supporter.
I am stoked that they have the crown.
I assumed it was only a matter of time.
And now the question is, who's next?
Who is the next competitor?
Who's going to put the next point on that chart and set the vertical trajectory on the
next exponential curve we're on?
Do you have any guesses?
Who do you think it's going to be?
Yeah, well, I don't, because I don't think it's going to be anyone for a while.
I said this earlier in the show and I'm going to say it again.
I think there's going to be a six-month period now
where either the other model providers don't release a model because it's not as good as Google's,
or they just release these kind of mediocre consumer products that maybe
benefit certain consumers in one way or another, but don't really break the generalized
model standard that Google has just set. Just one last point on the Google bull case thesis:
they may not be playing in the same ring as Cursor does. Like, I was critiquing
Microsoft on another episode, Josh, do you remember? And then I got off that episode and I was just like,
Microsoft like dominates the enterprise environment. All the boomer companies and institutions love Microsoft.
And they have all their data and memory. And just because you and I don't use it or just
I'll speak for myself, just because I don't use it and I think it's boomer doesn't mean that they're
not absolutely crushing it. Google just came off a hundred-billion-dollar quarter of revenue.
That's like the highest they've ever had. So I don't want to be too hasty to say that, like, Google's not going to
make it because they can't make a slick consumer product like OpenAI maybe can.
I just think they're maybe playing in different fields.
But to the point around like I don't think anyone else is going to catch up, look at these
comments Josh.
I want to show you two comments, all right?
One is from Sam Altman.
He goes, congrats to Google and Gemini 3.
This looks like a great model.
The other is from the almighty being Elon Musk saying, I can't wait to try this out.
And this is just one of a series of tweets that he's been putting out this week saying,
can you guys just drop Gemini 3 because I need to see how good this thing is?
And the reason why I bring up these two people is both Sam Altman and Elon Musk have released new versions of their models, GPT and Grok respectively.
But it's been the 0.1 upgrade.
It's GPT-5.1.
It is Grok 4.1.
And they are almost identical updates.
You want to know what the biggest and coolest thing about their model updates were?
Personality traits, which don't get me wrong is cool.
Like I would like my model to kind of respond in a very intuitive manner and get me.
But it's nowhere near the state-of-the-art standard that we've just seen broken by
Gemini. So the point I'm making is, I think these two companies might have run out of fuel for the
near term. Grok is going to be next. You think? They're the next one. By the end of Q1, Grok will
have the crown. Why? And I assume by a fairly large margin. But I assume it will be a different type
of crown. And this is where I'm really excited to see how these models progress. We spoke a little
bit earlier about how Claude is kind of the coding model. Google has a very deep understanding of the
real world and physics and video and how that works. Grok and the xAI team
are very focused on the pursuit of truth and information.
And I think that's kind of the alley that will see them going down.
So they have the real-time data with X.
They have the pursuit of truth.
And where Google and OpenAI and all these other companies are trained on an existing
data set, the xAI team and the Grok team are developing an entirely new synthetic
data set that is maximally truth-seeking.
And we saw an early version of that with Grokipedia, which should provide the most accurate and, I guess,
thoughtful information.
It should be the best at thinking because it
is the closest to source truth. So while I think Gemini will probably be better at physics and video
and understanding the real world for quite some time, I suspect Grok will be really good at just
communicating via text. If text is the modality we interface through, Grok should be really good.
And again, the rate of acceleration: Grok has been around for the least amount of time, and they're
accelerating the fastest. And I'm very, very, very excited for a Grok 5, Grok 6, whatever we're at,
announcement, hopefully early next year. So those are the predictions. That's the episode.
That's Gemini 3.0. It is an unbelievable new model. Everyone can try it out. So here's how
you try it out. I believe you need to be a Google Premium Plus subscriber, whatever it's called. It's
$20 a month. You can go on the Gemini website and it's just a text box and you can play around
with it. They also have a mobile application. It's very easy to download on your phone. Play around
with it. I'd love to see examples of cool things, because I think one of the problems for me, and one of the
things I'd love help with from anyone who's listening, is how do you use this thing to test
it? What do I ask it? And how are you interfacing with it
to get the maximum amount of results from it?
Because intuitively, I would never think to record myself
and ask for feedback, but that's a new possibility.
So I guess the challenge to anyone who's listening
is figuring out how to use these new models
as these new features get released.
And Gemini 3 has just opened up the gates
to a gazillion new use cases.
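And if you would rather poke at it from code than from the text box, here is a minimal sketch using the google-genai Python SDK (pip install google-genai). The model id "gemini-3-pro-preview" is an assumption; check Google AI Studio for the current name.

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from aistudio.google.com

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed id; check the current model list
    contents="Explain the three-body problem with a simple visual analogy.",
)
print(response.text)
```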
Yeah, I mean, this is a super cool release for Google.
And weirdly enough, it's not the only release over the last week.
I mean, I've got a list pulled up here.
They've released new Android and iOS
updates, they've got a new Search AI Mode, they've released Antigravity, which we mentioned earlier.
We've got SIMA 2 research, which we demoed on a previous episode.
You should definitely go check that out.
I mean, they are just not stopping.
And they're a force to reckon with.
And kind of similar to them, Josh, just to kind of round this episode out and thank you guys for listening.
We are here in Argentina, in Buenos Aires.
We are kind of meeting some of the fans that are out here.
And we spoke to one just this afternoon, Josh.
And you know what he said to me?
Have a guess.
What's that?
He said, your podcast, Limitless, is like the state-of-the-art AI podcast.
In fact, it is 2x better than any other AI podcast that I've ever heard.
And you know what?
Hell yeah, brother.
That sounds very similar to Gemini 3.
So you could potentially call us the Gemini 3 of AI podcasts.
And so if you're a listener to this, if you are a non-subscriber on our YouTube,
you should probably click that subscribe button.
You should probably click that notification button.
Because guess what?
We've got more episodes coming this week.
And guess what?
The five-star ratings help us out massively.
So if you enjoyed this episode and if you want to hear more episodes of this nature
and of cutting-edge news in AI, you should give us a follow.
And we will see you on the next one.
