The AI Daily Brief: Artificial Intelligence News and Analysis - Does Gemini 3.1 Pro Matter?

Episode Date: February 20, 2026

Gemini 3.1 Pro arrives with big benchmark gains and a sharp jump in reasoning, coding, and efficiency, but in a world where the frontier rotates weekly, raw performance isn’t the story. This episode looks at what actually matters: cost per task, multimodal dominance, and where Gemini fits in a model portfolio that now demands specialization over supremacy. In the headlines: India’s AI Impact Summit and the Altman-Amodei moment, Walmart bets on AI for growth, Amazon tracks employee AI usage, and Accenture ties promotions to adoption.

Want to build with OpenClaw? Learn more about Claw Camp: https://campclaw.ai/
Or for enterprises, check out: https://enterpriseclaw.ai/

Brought to you by:
KPMG - Agentic AI is powering a potential $3 trillion productivity shift, and KPMG’s new paper, Agentic AI Untangled, gives leaders a clear framework to decide whether to build, buy, or borrow. Download it at www.kpmg.us/Navigate
Mercury - Modern banking for business and now personal accounts. Learn more at https://mercury.com/personal-banking
Rackspace Technology - Build, test and scale intelligent workloads faster with Rackspace AI Launchpad - http://rackspace.com/ailaunchpad
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
Optimizely Agents in Action - Join the virtual event (with me!) free March 4 - https://www.optimizely.com/insights/agents-in-action/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai

Transcript
Starting point is 00:00:00 Today on AI Daily Brief, Gemini 3.1 Pro is here, and I think its point is to flex multimodal. Before that, in the headlines, a lot of talk about AI in India, but is there anything worth listening to? The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Insight-Wise, Superintelligent, and Blitzy. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. To learn about sponsoring the show, send us a note at sponsors at aidailybrief.ai. And of course, one more quick reminder about the projects that we launched this week.
Starting point is 00:00:45 Claw Camp, a free self-directed program to build an agent team using OpenClaw. We have kicked off the first four-week sprint, so come join about 3,500 of your best friends in becoming an agent boss. Meanwhile, for the enterprises out there who want to figure out how to use OpenClaw and other systems to build agent teams and change how you do things, we've got an executive sprint coming up. I will be sending more information at the very beginning of next week. So if you are interested in that, check out enterpriseclaw.ai. Lastly, if you want the single coolest job of all time, come apply to be our Clarkitect and work on agentic vibe coding projects with me across the AIDB ecosystem.
Starting point is 00:01:19 As always, all of this information is linked at aidailybrief.ai for easy finding. Today we start with the AI Impact Summit. It's a gathering in New Delhi that has brought together world leaders and AI executives. This is the first time the event has been held in a developing country, with previous iterations hosted in the UK, France and South Korea. The selection of India as the host country was symbolically important, allowing the event to platform a political call to address AI inequality. Earlier in the week, a UN report highlighted that AI adoption is still growing more rapidly in the developed world, risking a permanent technological divide.
Starting point is 00:01:56 UN Secretary General António Guterres wrote in an X post, The future of AI cannot be decided by a handful of countries or left to the whims of a few billionaires. AI must belong to everyone. AI must be accessible to everyone. AI must benefit everyone. AI must be safe for everyone. Let's build AI for everyone. In a follow-up post, he called for a global fund on AI to, quote, build skills, data, affordable computing power, and inclusive ecosystems everywhere. Now, this is one of the first times we've heard world leaders proclaim the need to deliver affordable AI to the global south. Until now, the discussions have largely been about national or regional interests. By way of example, last year's
Starting point is 00:02:31 summit in Paris was squarely focused on European leaders establishing the need to invest and compete in the AI race. This year's event was a shift towards recognizing the need to treat AI as a global public good. The other big theme of the summit was India itself declaring its ambition to become a global AI power. The event featured huge investment commitments from Adani and Reliance Industries, who will each spend more than $100 billion on local data centers over the coming decade. The Indian government also earmarked a $1.1 billion fund for the effort. Aside from global leaders, the summit also saw tech leaders fly in, including Google CEO Sundar Pichai, DeepMind CEO Demis Hassabis, and Mistral CEO Arthur Mensch.
Starting point is 00:03:08 Slightly overshadowing other things going on at the event, Bill Gates canceled his keynote because of continued scrutiny over his appearance in the Epstein files. And yet still, with all of that, all eyes were on Sam Altman and Dario Amodei. Specifically on one moment where more than a dozen tech leaders joined Prime Minister Modi on stage. The leaders joined hands and raised their arms in celebration, save for Altman and Amodei, who refused to hold hands. Beff Jezos broke down the tape and determined that Dario had been the one to refuse to hold
Starting point is 00:03:35 Altman's hand, but regardless of who instigated, the moment reflected just how bitter the rivalry has become. While the two were on stage, a chart from Epoch AI went viral, suggesting that Anthropic is on pace to overtake OpenAI in revenue terms by the middle of this year. So with that bombastic framing established, the two AI rivals took to the stage and delivered vastly contrasting speeches. Dario, it must be said, ummed and ahhed his way through a generic and well-trodden narrative read from an iPhone screen. He said nothing he hadn't said before, and many people commented on just how bad it looked for him to be reading off his iPhone. Wrote terminally online engineer on X,
Starting point is 00:04:12 The aura loss is crazy. I take back everything good I said about Anthropic. Altman was more eloquent, discussing how the fundamental uncertainty of AI interacts with global issues of democracy, social contracts, and job loss. His major call to action was for global leaders to continue iterative deployment and allow people to access each successive layer of the technology as it unfolds. Offstage in an interview with CNBC, Altman expressed skepticism over the present fear of AI job loss, remarking, I don't know what the exact percentage is, but there's some AI washing where people are blaming AI for layoffs that they would otherwise do, and then there's
Starting point is 00:04:44 some real displacement by AI of different kinds of jobs. Now, it is difficult for me to take these global talk fests very seriously. I guess theoretically sometimes genuine action arises from them, but mostly the model is that world leaders arrive, exchange platitudes about the state of the world, and then return to doing exactly what they were already doing. It's about the silly photo op of the arms up of all these people, which was incredibly awkward and weird even if there hadn't been the scuffle between Sam and Dario. Shawn Wang, aka swyx, really nailed it in a post he called, Why do AI conferences keep not getting AI?
Starting point is 00:05:17 He wrote, I feel for my brothers and sisters in India. This was their big moment on a global stage and perhaps an inflection point for one and a half billion people who will have to figure out their place in the new AI-shaped economy, and yet the powers that be decisively demonstrated that nothing will change. They care more about bad photo ops and hobnobbing with celebrities than they care about the builders that are supposed to drive the Indian AI economy forward. Ultimately, I think the less time you spend caring about what's said at events like this, and the more time you spend on building things, the better off you're going to be.
Starting point is 00:05:46 Still, we had a huge portion of the big tech AI leaders and a number of sovereign leaders as well, so we couldn't let it pass completely undiscussed. Next up, we shift over to the business world, where Walmart is turning to AI as their next big growth driver after a soft earnings result. The past quarter has been a mixed bag for Walmart. They briefly achieved the milestone of becoming a trillion-dollar company. However, they also lost the crown as the world's largest company by revenue to Amazon after 17 years on top. This week's earnings report guided lower earnings and revenue growth for the coming year,
Starting point is 00:06:15 reflecting the shaky position of the consumer economy. And yet, in spite of, or perhaps because of that, the earnings call focused heavily on Walmart's AI transformation strategy. Newly installed CEO John Furner said, The way we're using technology and AI is helping us create great customer solutions, reduce friction, simplify decision-making, and pinpoint where our inventory is, all while maintaining the trust we've earned from our customers and members. Now, Walmart has, of course, been rolling out AI into every corner of their business over the past couple of years. Furner flagged that their
Starting point is 00:06:43 shopping assistant, Sparky, has shown early promise and will become core to their strategy moving forward. He reported that around half of Walmart's online customers have used Sparky, and that those using the assistant ordered 35% more than those who didn't. U.S. CEO and President David Guggina noted that AI is driving a complete transformation in the way that Walmart thinks about their business. He said, Sparky is essentially helping us evolve from traditional search to intent-driven commerce. From an economic standpoint, better discovery and higher conversion translates into bigger baskets and greater frequency. Sparky is helping customers find the things they need, they want, and they love, and it's strengthening our digital unit economics as it scales.
Starting point is 00:07:19 Next up, moving over to the company that dethroned Walmart from the top of the Fortune 500, Amazon is keeping a close eye on AI adoption with new metrics in their employee tracking system. The Information reports that Amazon has been using an internal system called Clarity to measure various elements of AI tool use within the company. The system, which is also used to measure other elements of employee performance, is now being used to track overall AI usage by teams, as well as which tools are seeing the most use. The monitoring doesn't just include Amazon's in-house tools,
Starting point is 00:07:47 but also external AI products that staff are encouraged to use. The tracking goes well beyond software engineering and standard white-collar functions, with Amazon also keeping tabs on how the company's supply chain optimization team is making use of AI. While Amazon has maintained that AI was not the direct cause of their massive recent layoffs, the framing of the assessment certainly implies a push to realize AI productivity gains. Employees are asked how they have, quote, accomplished more with less, and for specific examples where they have remained innovative, force-multiplied using AI, and delivered results while reducing or not growing headcount.
Starting point is 00:08:18 Moving over to the big consulting world, Accenture is laying down the law when it comes to AI use in the workplace, telling senior managers that no AI, no promotion. The consulting giant has begun collecting data on how some senior employees use AI tools and explicitly tied the metrics to career progression. According to an email viewed by the Financial Times, Accenture has told staff that promotion to leadership roles will require regular adoption of AI. You might remember that Accenture embarked last year on one of the more ambitious AI upskilling projects. At the time, CEO Julie Sweet said that staff who failed to adopt
Starting point is 00:08:50 AI workflows will be, quote, exited from the company. This week's email reinforced that initial training is now over and use of AI is a fundamental requirement of the job. It stated, use of our key tools will be a visible input to talent discussions during the summer promotion cycle. In their story about this, the Financial Times noted that AI holdouts are becoming a major problem across the consulting industry. Three executives at Big Four accounting and consulting firms said that convincing senior managers and partners to use AI has been a much more difficult task than introducing the tools to junior staff. One executive said that older, more senior figures at the firms are more set in their ways,
Starting point is 00:09:23 requiring a carrot-and-stick approach. It'll be interesting to see how much internal resistance they find. One person familiar with the policy change said they would, quote, quit immediately if it affected them, while another source criticized the quality of the tools deployed at Accenture, describing them as broken slop generators. In a press statement, Accenture explained the need to keep pushing, commenting, our strategy is to be the reinvention partner of choice for our clients and to be the most client-focused AI-enabled great place to work. That requires the adoption of the latest tools and technologies to serve our clients most effectively. And to understand why, you need only glance at Accenture's share price.
Starting point is 00:09:56 The stock is down 17% year-to-date and 45% over the past year. Now, this is pretty interesting to me as a bellwether of where corporations might go. I think Hedgey at Hedgy Markets on X probably sums up the feeling of a lot of folks when he writes, if these tools were actually useful, people will just use them. You don't need to track logins and tie them to promotions. The fact that companies are resorting to this tells me adoption isn't happening organically, which raises questions about whether the tools are delivering value,
Starting point is 00:10:23 or just generating metrics for leadership to point at. I don't think this is necessarily a super cynical take, but I do think it's wrong. The biggest issue that we find across all of our surveys at AI Daily Brief, as well as everything we do at Superintelligent, is the problem of time. People inside enterprises report that they don't have time to learn the technology that would save them time. And unfortunately, the vast majority of companies we interact with don't create specific time carveouts for their people to learn how to use these tools.
Starting point is 00:10:50 They simply expect people to figure out that time on their own. That creates a situation where people feel negatively about these tools because they're just another layer of stuff that they have to do, which creates the need for mandates like this. Now, to the extent we're talking about tool quality, I do think that in many corporations there is an issue of the tools that are approved for work being pretty far behind what people have access to in their personal lives. Probably the second most frequent complaint we see, outside of the I don't have time to learn this stuff, is at home I'm using Opus 4.6, and at work I have a terrible old version of Copilot. In any case, I do think that to some extent Accenture is an extreme
Starting point is 00:11:22 example of this because of the point that they're making that if they are in the business of bringing this new technology to their people, they really kind of need to know about it. But I wouldn't be surprised to see more mandates like this in the months and years to come. For now though, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Agentic AI is powering a $3 trillion productivity revolution, and leaders are hitting a real decision point. Do you build your own AI agents, buy off the shelf, or borrow by partnering to scale faster? KPMG's latest thought leadership paper, Agentic AI Untangled, navigating the build, buy or borrow decision, does a great job cutting through the noise with a practical framework to help
Starting point is 00:12:03 you choose based on value, risk, and readiness. And how to scale agents with the right trust, governance, and orchestration foundation. Don't lock in the wrong model. You can download the paper right now at www.kpmg.us slash navigate. Again, that's www.kpmg.us slash navigate. As a consultant, responding to proposals can often feel like playing tennis against a wall. You're serving against yourself, trying to guess what the client really wants. That all changes with Insight-Wise. Now you've got an AI proposals engine that thinks just like your client. It returns to the brief time and time again, picking apart your work,
Starting point is 00:12:39 identifying key evaluation criteria and win themes, and making recommendations to ensure you stand out. Suddenly, you're on center court. But this time, you've got a secret weapon. Insight-Wise gets rid of all the time-consuming manual work so you can focus on winning more business more often. Generate reports, pull insights from your own data, build competitive advantage, and go to sleep before 2 a.m.
Starting point is 00:13:00 When it comes to proposals, you only get one shot. With Insight-Wise, make yours an ace. Today's episode is brought to you by my company, Superintelligent. In 2026, one of the key themes in enterprise AI, if not the key theme, is going to be how good is the infrastructure into which you are putting AI and agents. Superintelligent's agent readiness audits
Starting point is 00:13:20 are specifically designed to help you figure out, one, where and how AI and agents can maximize business impact for you, and two, what you need to do to set up your organization to be best able to leverage those new gains. If you want to truly take advantage of how AI and agents
Starting point is 00:13:34 can not only enhance productivity, but actually fundamentally change outcomes in measurable ways in your business this year, go to besuper.ai. Blitzy is driving over 5x engineering velocity for large-scale enterprises. A publicly traded insurance provider leveraged Blitzy to build a bespoke payments processing application, an estimated 13-month project, and with Blitzy, the application was completed and live in production in six weeks. A publicly traded vertical SaaS provider used Blitzy to extract services from a 500,000 line monolith,
Starting point is 00:14:03 without disrupting production, 21 times faster than their pre-Blitzy estimates. These aren't experiments. This is how the world's most innovative enterprises are shipping software in 20, You can hear directly about Blitzy from other Fortune 500 CTOs on the modern CTO or CIO classified podcasts. To learn more about how Blitzy can impact your SDLC, book a meeting with an AI Solutions consultant at blitzy.com. That's BLITZY.com. Welcome back to the AI Daily Brief. Today we are talking about Gemini 3.1 Pro.
Starting point is 00:14:38 But I want to situate it in a larger question. And I will start by saying, sorry to Google for drawing the short end of the episode naming straw on this one. If it had been OpenAI that released 5.3, it would have been something very similar. The context we now all operate in is one where instead of getting big model releases infrequently, we get very incremental model releases much more frequently. There is in fact this meme which came from 2025, but which is more true than ever,
Starting point is 00:15:03 which is a circular chart that starts with OpenAI introducing the world's most powerful model, that moves to Grok introducing the world's most powerful model, that moves to Gemini introducing the world's most powerful model, that moves to Anthropic introducing the world's most powerful model, that moves to OpenAI introducing the world's most powerful model, and so on. In that, with the release of 3.1 Pro, we are now at the Gemini section of that chart. And the point of course is that at this stage, state-of-the-art in terms of incremental gains on benchmarks feels less significant as a barometer
Starting point is 00:15:31 of a model's importance than it ever has before. When people say, what is the best model, it is not only constantly shifting, but also, I think, in practice, a question that is use case-dependent. So let's talk about Gemini 3.1, the first reactions, both good and bad, and then try to figure out where it fits in the ecosystem of models. Now, it is worth pointing out that I think Gemini was absolutely due for a bit of an upgrade. The conversation for pretty much all of 2026 and really heading back into the end of 2025 has been dominated by Anthropic v. OpenAI, or more specifically, Codex versus Claude
Starting point is 00:16:06 Code. Despite Gemini 3 having such wide acclaim when it came out towards the end of last year, Google and Gemini have been really nowhere in the conversation when it comes to this incredibly important use case of coding. Now, it is worth noting that there are lots of different categories of AI users, and it is not the case that for all of them, coding is what matters. It would be completely reasonable, in other words, for Google to put its priority in other areas. However, it certainly doesn't seem like Google is explicitly not trying to compete in that area. They're clearly investing a lot in Google AI Studio and Antigravity, but when it comes at
Starting point is 00:16:38 least to the most enfranchised subset of users, they were kind of, at least in our recent survey results, a distant third. All of the big models, Claude, ChatGPT, and Gemini, had some broad usage in our January AI usage survey. Gemini, in fact, matched Claude with 80% of respondents having used it sometime last month, both falling slightly behind ChatGPT, which was at 87%. However, in terms of the number of people reporting that it was their primary model, Gemini was down in third at 16.1%. And at first blush, there is a lot to be impressed with in Gemini 3.1 Pro. Going by the benchmarks, it is a distinct number one when it comes to Humanity's Last Exam, not using tools, sets a new high for the GPQA Diamond scientific knowledge benchmark,
Starting point is 00:17:19 sees a big jump up for Gemini on Terminal-Bench 2.0, coming in ahead of Opus 4.6. And while it wasn't ahead of Opus 4.6 on the SWE-bench Verified agentic coding test, it was nipping at its heels, 80.6% compared to 80.8%. The biggest jump, and the one that a lot of folks are talking about, was on ARC-AGI-2. While Opus 4.6 scored a 68.8% on that test, the jump between Gemini 3 Pro and Gemini 3.1 Pro was from a 31.1% with Gemini 3 to 77.1% on Gemini 3.1 Pro.
Starting point is 00:17:51 Google CEO Sundar Pichai says, Gemini 3.1 Pro is great for super complex tasks like visualizing difficult concepts, synthesizing data into a single view, or bringing creative projects to life. Demis Hassabis points to major improvements in core reasoning and problem solving. Google VP Josh Woodward calls out who they want the model to appeal to, writing, to the scientists, the engineer, and the developer. Gemini 3.1 Pro has arrived. It's a significant leap in complex reasoning.
Starting point is 00:18:17 Once again, he points to ARC-AGI-2, so it's great for agentic tasks, intricate coding, and data synthesis projects. You should see fewer errors, better logic, and surprisingly good SVGs. Attached to the post is an animated image of a seal bouncing a beach ball on its nose. So what are the first impressions? The model is still rolling out and it's only available in certain pockets of the Google ecosystem, which, by the way, is its own challenge that people like Ethan Mollick have pointed out, that the Google ecosystem of AI is so diverse that it's sometimes hard to wrap your head around
Starting point is 00:18:46 what model lives where. But among those who have tried it, a lot of the responses are pretty positive. AI developer Eric Hartford wrote, loving Gemini 3.1 Pro. It made three huge improvements to my compiler and saw things that even ChatGPT 5.2 Pro Extended and Claude Opus 4.6 Extended couldn't see. Designer and entrepreneur Meng To writes, Gemini 3.1 Pro is an absolute beast for creating landing pages. It understands design details and animation so well. Insane
Starting point is 00:19:11 upgrade for web designers. And then of course there's ARC-AGI-2, where it came in at a 77.1%, but that might not even be the most impressive thing. The ARC leaderboard measures not only the score but the cost per task. So for example, although Gemini 3 Deep Think, which was released last week,
Starting point is 00:19:28 got a higher overall score, it did so at more than 10 times the cost. 3.1 Pro achieved that score at less than a buck a task. On Artificial Analysis's overall intelligence index, Google jumped all the way from the sixth spot behind various versions of Claude, GPT, and even a Chinese model, GLM5, all the way up to number one. What's more, Artificial Analysis points out that it's doing so at a more efficient cost. They write, Google is once again the leader in AI. Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, four points ahead of Claude Opus 4.6, while costing less than half as much to run.
Starting point is 00:20:05 They said that on their tests, it led six of the ten evaluations that make up the index, with the biggest gains in reasoning and knowledge, coding, and hallucination reduction. They also point out that it does so with some serious token efficiency. They write that its processing efficiency combined with lower per token pricing means that 3.1 Pro Preview costs less than half as much as Opus 4.6 max to run, although it still is nearly twice as much as the leading open weights model, which is that GLM5 that I mentioned. In terms of specific tests, they found that Gemini 3.1 Pro led their coding index, achieving the highest score on both Terminal-Bench Hard and SciCode,
Starting point is 00:20:40 but that one area where it was kind of lacking was real-world agentic performance. This is around that GDPval test, which we've talked about before, which is an agentic evaluation that focuses on real-world tasks. While Gemini 3.1 Pro did jump up meaningfully from Gemini 3 Pro, it was behind Sonnet 4.6, Opus 4.6, GPT 5.2, and GLM5. That was something that a number of skeptical commentators focused on. Scaling01 on Twitter writes, Gemini 3.1 Pro's GDPval scores are concerning. Simon Smith points out that maybe that suggests that work tasks aren't Google's focus. Indeed, he even goes so far as to speculate, they have a stake in Anthropic, so maybe they're okay with that. When it comes to coding outside of that one example that I've mentioned already,
Starting point is 00:21:21 I'm just not seeing enough feedback yet to really know. Some people had trouble actually finding the model or getting it to work inside Antigravity or Gemini CLI, although when they did, as reported by Mat Velloso, they had, quote, awesome results so far. Akash Gupta gets at what I think is likely to become a more discussed aspect of this, which is the cost performance frontier. He writes, The best AI model crown now rotates on a weekly basis,
Starting point is 00:21:45 with each lab holding a different column of the same spreadsheet. The real number in this release is the 96 cents per task on ARC-AGI-2. Google went from 31.1% to 77.1% in three months while keeping pricing at $2.00 per million input tokens. The same pricing as Gemini 3 Pro. They doubled the intelligence and charged zero incremental cost. That's the game now. The frontier is commoditizing so fast that benchmark leadership lasts weeks, not quarters. OpenAI, Anthropic, and Google are all within single-digit percentage points of each other on most evals. The three labs are converging on comparable intelligence, but diverging on distribution. Google has 2 billion Chrome users, Android, Workspace,
Starting point is 00:22:24 and Cloud. That's the real moat in this chart, not the 77.1%. Whoever makes intelligence ambient and cheap wins. And this benchmark table with its patchwork of leaders across every column is the clearest sign yet that raw capability is table stakes. I think there is a lot of truth in that, and so one of the reasons why, yes, Gemini 3.1 Pro does matter, is that it's pushing on the cost frontier, not just the performance frontier. Now, the other thing about Gemini is that it's very clear that the productization of its multimodal capabilities is something that really matters to Google. Alongside the new model update, Google Labs announced a new feature for their Pomelli app called Photoshoot. They write, with Photoshoot, you can start from a single image of your product
Starting point is 00:23:05 and easily create high-quality customized product shots to elevate your marketing. That tweet went wildly viral. In fact, whereas CEO Sundar Pichai's tweet announcing 3.1 had around 1 million views, the Google Labs tweet announcing Photoshoot has 12.2 million views at the time of recording. Google Labs product director Jaclyn Konzelmann wrote, clearly this hit a nerve. Turns out a lot of people have been waiting for a way to get professional product photos but didn't have the time or resources to make it happen. Now they can. Go try it. It's free. When folks like A16Z partner Justine Moore tried it, they also came away impressed. Another example of Gemini flexing its multimodal bona fides came with a partner announcement
Starting point is 00:23:46 from Replit when they introduced Replit Animation. It is exactly what it sounds like, a tool to vibe code infographic videos, powered, they say, by Gemini 3.1 Pro. Replit's CEO, Amjad Masad, wrote, vibe coding as a term is a bit tragic because it implies you're merely making software, but you can really make anything. We've been having a lot of fun making videos with Replit Animation, the kind I used to pay thousands of dollars for when we needed to do a launch video.
Starting point is 00:24:11 Also, if you dig around enough, you can see that the types of things people are using Gemini 3.1 Pro for are just a little bit different than with the other tools. Sure, there's a bunch of weird pelican SVG tests, but you also have examples like this one from Daniel Z, who writes, Gemini 3.1 Pro vibe coded a double wishbone suspension. Independent double wishbone design, dynamic coilover shock absorber, vented disc brakes with performance caliper, real-time kinematic travel and steering simulation. AI isn't just generating visuals anymore.
Starting point is 00:24:41 Demis Hassabis shared an official example from the Google DeepMind account, where they used 3.1 Pro to build a realistic city planner app that has complex terrain, infrastructure mapping, and even simulates traffic. Google DeepMind chief scientist Jeff Dean shared an example of 3.1 Pro doing heat transfer analysis based on a CAD file and material properties, and then turning that heat transfer analysis at different times into a visual representation. Overall, I agree on the surface with Latent Space when they wrote, it's getting a little hard to say interesting things with all the round robin minor version updates
Starting point is 00:25:11 of frontier models every week. Gemini 3.1 Pro seems like a decent enough advance to catch up and in some cases supersede the fellow frontier models. It's better at some SVG design things and translating textual vibes to visual aesthetics, but that's kind of all they had to say. I think though, coming back to this question of why 3.1 Pro matters or why any new model release matters, the point that I was trying to make at the beginning is that it's not just about state-of-the-art on the benchmarks. That is, as Akash pointed out, table stakes. What's important is to try to understand what it does uniquely well. It's very clear when you actually dig deep that Gemini is flexing its multimodal capabilities in a full spectrum of ways, from being able to do much more technically and scientifically advanced
Starting point is 00:25:53 work, to being at the core of products that aren't possible with the other models. Now, that doesn't necessarily mean for Google that they can still get away with competing on core use cases like coding, but it is part of the reason, I think, we found that even though it was the primary model for just 16.1%, still a full 80% of people had used Gemini in the previous month, because there are just some use cases that it is ideally suited for. It is very clear that as we head deeper into the AI and agent age, the greatest gains will not come from just shifting wholesale from one model to the next as new capabilities emerge, but instead from understanding with each model release what that particular model is going to do best and where it should be in your model portfolio. I'm excited to dig into
Starting point is 00:26:32 3.1 Pro and I'm sure I will have more to report in the weeks to come. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching and until next time, peace.
