The AI Daily Brief: Artificial Intelligence News and Analysis - How Gemini 3 Changes the AI Race

Starting point is 00:00:00 Today on the AI Daily Brief, Gemini 3 has officially arrived. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Blitzy, Rovo, AssemblyAI, and Robots and Pencils. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. To learn about sponsoring the show and pretty much anything else about the show, including jobs, speaking opportunities, etc., head to aidailybrief.ai.

Starting point is 00:00:39 As I mentioned yesterday, we are in the last week of accepting submissions. So if you want the full readout and report, go to ROIurvey.com and contribute your use cases now. Related to that, between this new benchmarking study and some other exciting things coming up, I have a few exciting projects for researchers slash analysts. If you have immediate availability and a research or data analysis background, shoot us a note at jobs at aidailybrief.ai. Put research in the subject line. Share your background and I'll tell you a little bit more about

Starting point is 00:01:10 what we're thinking. With that, though, let's talk Gemini 3. Welcome back to the AI Daily Brief. Happy Gemini 3 day to those who celebrate. It has been a long time coming. The final anticipated model drop of the year, barring some big surprise from OpenAI, has finally arrived as Google has announced Gemini 3. Now, the focus has been on Gemini 3 for months at this point, really ever since GPT-5 launched. In fact, the launch of GPT-5, which initially had a bit of a bobble around first impressions, ratcheted up the pressure in some ways on Gemini 3 for the entire industry.

Starting point is 00:01:51 Over the last few weeks, though, the rumor mill has been swirling and hype has been building incredibly for this. An example from just a few days ago, Marmaduke 091 on Twitter says, OK, I'm saying it now with full confidence, Gemini 3 will make every other LLM irrelevant, and Nano Banana 2 will make every other image model irrelevant. It is not close and Google won. This is just my honest feelings after seeing things. Nobody is ready. There were effectively infinite posts of that type, including people who said

Starting point is 00:02:19 that they had access and were seeing things, so much so, in fact, that in the last couple of days, you've actually started to see a bit of a shift where some folks have nudged in the other direction. Peter Wildeford shared the METR chart for the length of task that a model can perform autonomously, saying Gemini 2.5 released in June was a great model, but it was a bit disappointing on the METR benchmark, below trend at 39 minutes. Where will Gemini 3 be? If on trend, it should be around three hours. My forecast is Gemini 3 remains a bit below trend at 2.7 hours. Synthwave DD went much farther. On Monday, they wrote, almost every day I hear and see more things that lower my expectations for Gemini 3.

Starting point is 00:02:57 That is once again the case today, and honestly I'm not that excited for this launch anymore. The model regresses with every new checkpoint. DeepMind should be managing expectations. Instead, we get more hype posting with little substance or transparency. Hopefully it will be a lot better in the real world, but my expectations there are similarly low. Now, to be clear, this was the exception going into this release, not the rule. Most people were incredibly hyped, and if you needed any more confirmation of that, last night, the normally highly reserved Google DeepMind CEO Demis Hassabis tweeted, it's nearly three here, my favorite part the night shift, locked in. Demis does not hype post. One little coter writes, you guys even made

Starting point is 00:03:36 the calmer Demis hype up. But let's talk for a second about the stakes of this launch for Anthropic and OpenAI. Google has been surging throughout the year, picking up market share, picking up mind share, and picking up brand relative especially to ChatGPT. For the first time earlier this year, the Gemini app actually jumped up over ChatGPT for a little while in the Apple App Store charts. There is a lurking narrative surrounding OpenAI and Anthropic that the 800-pound gorilla that is Google will eventually just have too many resources and has a certain ultimate inevitability.

Starting point is 00:04:08 And so the question coming into this for those companies would be, would Gemini 3 blow them out of the water? Would there still be areas where their models, especially for Anthropic, their coding models, were on par or still better than Gemini 3? And in general, what did this say for the state of the consumer AI race? This launch also had some implications for Nvidia. While it wouldn't be as direct as the comparison between Gemini and ChatGPT, for example,

Starting point is 00:04:32 Google is also building out the full stack around AI, including their own chips called Tensor Processing Units or TPUs. If Google trained this model primarily with their TPUs, and it was more performant than other models that were trained on Nvidia chips, maybe that would have implications for how the market viewed Nvidia. Speaking of the market, AI bubble talk continues to surge and continues to be the dominant theme in markets right now. Google CEO Sundar Pichai actually contributed to this,

Starting point is 00:05:00 saying in a recent interview with the BBC, there is some irrationality in the current AI boom. The growth of AI investment has been an extraordinary moment. When asked what happens if the AI bubble pops, he said, I think no company is going to be immune, including us. We can look back at the internet. There was clearly a lot of excess investment, but none of us would question whether the internet was profound. I expect AI to be the same. So this is a theme we've heard from lots of leaders before, that yes, there might be irrational exuberance when it comes to valuations, and there may even be a correction, but ultimately the technology underneath will be every bit as significant as people say it is. Now that, plus broader macro factors, including Federal Reserve expectations,

Starting point is 00:05:37 contributed on the day of the Gemini launch to a big tumble in stock prices led by the AI tech sector. And so the reason that Gemini 3 matters from that macro market perspective is that one of the things that would confirm for some investors their concerns about an AI bubble would be if it appeared that we had reached some meaningful plateau. In fact, the conversation around GPT-5 being underwhelming when it came to expectations helped ratchet up this latest generation of AI valuation and AI bubble fears. So you've got implications for their AI model competitors, implications for their infrastructure competitors, implications for the market as a whole. For Google itself, honestly, I kind of think the stakes were the lowest. Google has basically locked in an incredibly good year on their

Starting point is 00:06:23 AI products. The number of users has gone up, the amount of tokens they're processing has gone up, and unless Gemini 3 was an absolute total flop, it's hard to see that trend changing. Now, of course, that doesn't mean that they couldn't use this moment to really reinforce a new leapfrog position, but when it comes to downside risk, I was actually least concerned about Google heading into this announcement. Now, pretty much everyone knew it was coming today, and indeed, at 8 a.m. Pacific time, 11 a.m. Eastern time, Sundar Pichai tweeted, Introducing Gemini 3. It's the most powerful model in the world for multimodal understanding and our most powerful agentic and vibe coding model yet. Gemini 3 can bring any idea to life,

Starting point is 00:07:02 quickly grasping context and intent, so you can get what you need with less prompting. The companion blog post was called a new era of intelligence with Gemini 3. In the note, we got a few statistics. The Gemini app is now up to 650 million users per month. That's about 50 million more than last we heard, and Sundar also said that 13 million developers have now built with their models. Now, one important thing, it's clear that Google has learned from their own launches in the past that the market does not like when you make announcements and say coming soon. Sundar writes, starting today, we're shipping Gemini at the scale of Google. That includes Gemini 3 in AI mode in search with more complex reasoning and new dynamic experiences. This is the first time we're

Starting point is 00:07:41 shipping Gemini in search on day one. Gemini 3 is also coming today in the Gemini app to developers in AI Studio and Vertex AI, and in our new Agentic Development Platform, Google Antigravity, which I will get into in a little bit. In Demis's section of the announcement post, he calls this another big step on the path to AGI and gets into the monster set of benchmarks that we'll get into in just a moment. Now, interestingly, the blog post is really focused on what you can do with this model, organized around learning anything, building anything, and planning anything. In his announcement tweet, Demis said that beyond the benchmarks, it's been by far his favorite model to use for its style and depth. In an example, he writes,

Starting point is 00:08:20 I've been doing a bunch of late-night vibe coding with Gemini 3 in Google AI Studio and it's so much fun. I recreated a test bed of my game Theme Park that I programmed in the 90s in a matter of hours, down to letting players adjust the amount of salt on the chips. In his announcement post, Google VP Josh Woodward discusses six additional features that come with the launch. The first is generative interfaces, which is a new type of experimental interface that's generated on the fly and adapts to the user's needs based on the prompt. Then there's Gemini Agent, which he writes is an experimental tool that orchestrates and completes complex multi-step tasks, a new look for the Gemini app, better shopping results, support for 23 more languages, and a new promotion

Starting point is 00:08:58 where U.S. college students get a free year of AI Pro. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating

Starting point is 00:09:41 Blitzy as their pre-IDE development tool, pairing it with their coding pilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.

Starting point is 00:10:14 Connect Rovo to your favorite SaaS app so no knowledge gets left behind. Rovo runs on the teamwork graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management standard, premium, and enterprise subscriptions. Know the feeling when AI turns from tool to teammate? If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at rovo.com (R-O-V as in victory, O).

Starting point is 00:10:44 If you're building anything with Voice AI, you need to know about AssemblyAI. They've built the best speech-to-text and speech-understanding models in the industry, the quiet infrastructure behind products like Granola, Dovetail, Ashby, and Cluely. Now, as I've said before, voice is one of the most important modalities of AI. It's the most natural human interface, and I think it's a key part of where the next wave of innovation is going to happen. AssemblyAI's models lead the field in accuracy and quality so you can actually trust the data your product is built on, and their speech understanding models help you go beyond transcription, uncovering insights, identifying speakers, and surfacing key moments automatically.

Starting point is 00:11:22 It's developer first, no contracts, pay only for what you use, and scales effortlessly. Go to assemblyai.com slash brief, grab $50 in free credits, and start building your voice AI product today. Today's episode is brought to you by Robots and Pencils. When competitive advantage lasts mere moments, speed to value wins the AI race. While big consultancies bury progress under layers of process, Robots and Pencils builds impact at AI speed. They partner with clients to enhance human potential through AI, modernizing apps, strengthening data pipelines, and accelerating cloud transformation. With AWS certified teams across the US, Canada, Europe, and Latin America, clients get local expertise and global scale.

Starting point is 00:12:03 And with a laser focus on real outcomes, their solutions help organizations work smarter and serve customers better. They're your nimble, high-service alternative to big integrators. Turn your AI vision into value fast. Stay ahead with a partner built for progress. Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. Now, one thing that was interesting to some folks is that Google launched this huge model with just blog posts and videos. No live stream, no presentation. But as it turns out, there was plenty for people to talk about even without that.

Starting point is 00:12:40 If you are a regular listener, you'll know that I'm very skeptical of the ultimate value in benchmarks when it comes to a new model. I think most of our benchmarks are getting fairly saturated, and they don't necessarily tell you how a model is going to interact with your particular use cases. In other words, while they can be useful guides, they're no substitute for just getting in there and trying the thing out. However, to the extent that benchmarks can make a splash, boy do Gemini 3's benchmarks make a splash. Vraserx sums up the attitude of basically everyone that I've seen online, writing, The Gemini 3 Pro benchmark results are genuinely unreal. Google didn't just catch up today. They walked into the arena and rewrote the difficulty settings. To take a few examples, on the academic reasoning-focused

Starting point is 00:13:20 Humanity's Last Exam, GPT-5.1 was at 26.5%. Gemini 3 Pro is at 37.5%. On MMMLU, one of the more saturated benchmarks for multilingual Q&A, Gemini 3 Pro nudges out GPT-5.1, 91.9% to 91%. On GPQA Diamond, which tests scientific knowledge, GPT-5.1 is at 88.1% versus 91.9% for Gemini 3 Pro. And basically this story repeats itself kind of across the board. The only benchmarks where it either shared the title or was behind were AIME 2025 with code execution, where it tied with Sonnet 4.5 at 100%, and SWE-bench Verified, where it was just ever so slightly behind both Sonnet 4.5 and GPT-5.1 at 76.2% compared to 5.1's 77.2%. Now, before you think that means that it's worse at coding, on another coding benchmark, Terminal-Bench 2.0, which is

Starting point is 00:14:19 agentic terminal coding, Gemini 3 Pro set a new standard at 54.2% compared to Sonnet's 42.8% and 5.1's 47.6%. Now, beyond the biggies, some people noticed a few where the jump really stood out. Matt Shumer writes, one of Gemini 3's biggest leaps is its towering score on ScreenSpot-Pro. This, by the way, is a measure of a model's ability to understand what's going on on a screen. The previous state of the art was Sonnet 4.5 at 36.2%. Gemini 3 Pro got 72.7%. Shumer says it just massively accelerated my timeline to full computer using agents. Another one that stood out to many was the performance on ARC-AGI-2. ARC-AGI is of course explicitly designed to be difficult for computers, and while GPT-5.1 mustered a 17.6%, Gemini 3 Pro got a 31.1%. Arc Prize founder Francois

Starting point is 00:15:12 Chollet called it impressive progress. On the VPCT spatial reasoning test, Gemini 3 absolutely smashed the test at 91% compared to GPT-5 high's 66%. And when it comes to user preference overall, LM Arena writes, Breaking, Google DeepMind's Gemini 3 Pro is now number one across all major arena leaderboards, number one in text, vision and web dev, number one in coding, math, creative writing, long queries, and nearly all occupational leaderboards. Massive gains over Gemini 2.5. And indeed, Gemini 3 also released a Deep Think mode that had even higher results. For example, the Deep Think mode drove the score on ARC-AGI up to 45.1%. Again, Matt Shumer writes, the last time we saw a capability jump of this magnitude was the release of GPT-4 in March

Starting point is 00:16:01 2023. We're entering a new era. In addition to LM Arena, we also got the independent review from Artificial Analysis. They put it quite clearly. Gemini 3 Pro is the new leader in AI. In their aggregate score, Gemini 3 Pro shows up three points ahead of GPT-5.1. Going back to that concern from the market bubble people, Simon Smith writes, so I guess we haven't hit a wall. Now, at the time of recording, this model has been out to the general public for less than an hour. In fact, outside of Google AI Studio, I'm not seeing it yet in any of my literally four different Gemini accounts. So aside from early preview tests, people haven't really had a chance to get their hands on it. There are folks who have had a few reps, though, and here are some of the things

Starting point is 00:16:43 that they've shared. Dan Shipper and the Every team were testing it earlier this morning, and said that so far, on coding, it's extremely fast and the quality seems very high. In long context understanding, it has a, quote, massive context window and it can use it. Dan writes, I found it was able to find, synthesize, and use pieces of information in a long book draft that other models couldn't. Now, interestingly, Dan says that when it comes to writing, in our early test it's not as good of a writer or editor as Sonnet or Haiku. Dan says it appears worse at telling whether writing is interesting. Far Al started to play around with it on AI Studio and said it's so freaking fast and is actually pretty good. Shumer again writes, Gemini 3's biggest differentiator in my opinion is that it's far

Starting point is 00:17:25 easier to get it out of the standard slop style for writing, coding, etc. Just ask for what you want and it'll make it happen. Referencing the tendency of AI coding models to design purple shaded interfaces, he adds, no more purple gradients. He also gave the example of it redesigning his personal website. Matt actually wrote a blog post about his last three days using the model in preview. The TLDR, he writes, Gemini 3 is a fundamental improvement on daily use, not just on benchmarks. It feels more consistent and less spiky than previous models. He says it's fast, intelligence per second is off the charts, often outperforming GPT-5 Pro without the wait. Front-end capabilities, which we were just discussing, he says, are excellent, with it nailing

Starting point is 00:18:04 design details, micro-interactions, and responsiveness on the first try. This might not be great for the people who love 4o, but for the work user, Matt writes, it respects your time and doesn't waste tokens on flowery preambles. Lastly, he says creative writing is finally good. It doesn't sound like AI slop anymore, the voice is coherent and the pacing is natural. Now, of course, that sounds a little bit different than what we heard from Dan, and also different than what Murdochan Coelan found. He wrote, I'm doing my writing with constraints tests and Gemini 3 Pro is not living up to the hype, at least for this task. He basically talks about a writing test that he's created and goes in depth about how he thinks it fails, coming to the conclusion,

Starting point is 00:18:42 so early to say, but this model lacks the taste, restraint, and structural intelligence necessary for challenging creative work. My big takeaway is that writing is one area where we're going to have to test it a little bit more. Now, a lot of the folks who are sharing their first impressions are focused on improvements in its coding ability and improvements in its design sensibility. Pietro Schirano writes, I asked Gemini 3 Pro to create a 3D Lego editor. In one shot, it nailed the UI, complex spatial logic and all the functionality. Reiterating something that we heard from others, Pietro concludes, we're entering a new era. He also said, it's also amazing at games. It recreated the old iOS game called Ridiculous Fishing from just a text prompt, including sound effects and music.

Starting point is 00:19:22 He continues, it also did something I've seen other LLMs struggle to do before. It built a fully functional Game Boy emulator, and yes, it even drew the Game Boy as an SVG. Flavio Adamo also tested it building games, showing off not only the ability to build a game engine, but also its understanding of spatial physics, with a game where you rotate a ball running through a tunnel to try to avoid oncoming objects, with Flavio saying, built this fun game in literally five minutes, and it's way better at coding than I expected. Now, on the coding topic, one of the things that came alongside Gemini 3 was a new Google native IDE called Antigravity. In their announcement post, Google wrote, as model intelligence accelerates with Gemini 3,

Starting point is 00:20:01 we have the opportunity to reimagine the entire developer experience. Today, we're releasing Google Antigravity, our new agentic development platform that enables developers to operate at a higher task-oriented level. Using Gemini 3's advanced reasoning, tool use, and agentic coding capabilities, Google Antigravity transforms AI assistance from a tool in a developer's toolkit into an active partner. While the core of Google Antigravity is a familiar AI IDE experience, its agents have been elevated to a dedicated surface and given direct access to the editor, terminal, and browser. Now, agents can autonomously plan and execute complex end-to-end software tasks simultaneously on your behalf while validating their own code.

Starting point is 00:20:39 Finally, they point out that in addition to taking advantage of Gemini 3 Pro, Antigravity also has access to their latest computer use model for browser control, as well as Nano Banana for image generation. Now, this might have been the part of the announcement that a lot of at least the developers and builders were most excited for. In his post, Google AI Studio lead Logan Kilpatrick wrote, Antigravity is a faster way to develop. You act as the architect, collaborating with intelligent agents that operate autonomously across the editor, terminal, and browser. These agents plan and execute complex software tasks, communicating their work with the user via detailed artifacts.

Starting point is 00:21:12 This elevates all aspects of development from building features, UI iteration, and fixing bugs, to researching and generating reports. So is this the end for Cursor? Hyperbole aside, people do seem excited about this new launch. The AI for Success account wrote, I was one of the early testers and this thing is crazy good. Steran says an IDE isn't actually the right framework. He writes, I've been using Google Antigravity for a few days,

Starting point is 00:21:36 and to me it's not an IDE, it's a coding agent UI powered by Gemini 3 Pro. Notably, he writes, the agent can control a Chrome browser, enabling it to validate, open, and use the app that it builds. In a surprising example of Antigravity accessing a browser, Sterran writes, I asked to generate an SVG,

Starting point is 00:21:52 it did, of course, then asked to transform to PNG. Antigravity looked for ImageMagick, couldn't find it, then looked for FFmpeg, couldn't find it. It rendered the SVG in Chrome and saved the pixels. Max Weinbach writes, The Antigravity IDE from Google is my favorite one now, been using it for a few days, and it's been outperforming Cursor and Windsurf for me, even most CLIs. Richard Seroter writes, I've been using Google Antigravity for a few weeks, and it's wild. It's an early preview so you'll find quirks, but you'll be blown away by the agent stuff.

Starting point is 00:22:21 So, where does this all net out? Well, firstly, I certainly think that the AI bubble narrative did not get worse today. Maybe we will see over the next week people quibble that there aren't some monster crazy, highly noticeable advances, but this appears to be at first glance a pretty significant jump in capabilities. When it comes to the competitive dynamics, we'll have to see. Gemini 3 looks like a great model, but I've been absolutely loving 5.1 as well, and it's going to take a lot to displace it for me.

Starting point is 00:22:50 Ultimately, what we have here is what appears to be, by all accounts, a great new model from Google, one that is shifting behaviors already for some, and even new tools like Antigravity around it that could shift how people interact with AI. Next up for me, of course, is getting in there and actually trying it. That's basically what I plan to spend the whole rest of the day doing, and I will be back with what I find in the next couple of days. For now, as I said at the beginning, happy Gemini 3 day to those who celebrate, and until next time, peace.
