The AI Daily Brief: Artificial Intelligence News and Analysis - Sonnet 4.6 Changes the Agent Math

Episode Date: February 18, 2026

Anthropic drops Sonnet 4.6 with a million-token context window and major gains in computer use, coding, and agentic workflows at a dramatically lower price point, immediately reshaping the economics of OpenClaw-style agents. Meanwhile, Grok 4.2 enters public beta with a multi-agent debate system and promises rapid weekly improvement, and Apple ramps up AI wearables. In the headlines: Apple’s AI glasses push, Spotify engineers stop writing code by hand, Meta commits to millions of Nvidia GPUs, Chinese AI price wars, and a possible SaaS rebound.

Want to build with OpenClaw? Learn more about Claw Camp: https://campclaw.ai/
Or for enterprises, check out: https://enterpriseclaw.ai/

Brought to you by:
KPMG - Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Mercury - Modern banking for business and now personal accounts. Learn more at https://mercury.com/personal-banking
Rackspace Technology - Build, test and scale intelligent workloads faster with Rackspace AI Launchpad - http://rackspace.com/ailaunchpad
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
Optimizely Agents in Action - Join the virtual event (with me!) free March 4 - https://www.optimizely.com/insights/agents-in-action/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.

The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, we've got an exciting new model in Claude Sonnet 4.6, plus a new public beta from Grok. Before that, in the headlines, Apple is getting in on the AI wearables game. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Mercury, and Blitzy. If you are looking for an ad-free version of the show, you can find that over on Patreon.com, or you can subscribe on Apple Podcasts; ad-free is just three bucks a month. To learn about sponsoring the show or really anything else about the AIDB ecosystem, go to
Starting point is 00:00:42 aidailybrief.ai. Quick updates on a couple of the projects that we've talked about this week. It seems that you guys are in fact definitely interested in OpenClaw, as nearly 2,000 of you have signed up for Claw Camp in the first 36 hours. I've also seen a ton of excitement from some really excellent companies for an enterprise executive sprint around OpenClaw and agent building more broadly, which you can, of course, find at enterpriseclaw.ai. And lastly, on the jobs front, I am still looking for the AIDB Clarkitect, someone to help me keep track of all of the OpenClaw resources out there and then actually build the new capabilities into products for this ecosystem. Like I said, all of this information and all
Starting point is 00:01:18 of these links are available at aidailybrief.ai. Earlier this week, we discovered that Apple will be holding a product announcement event at the beginning of March, and now we are getting stories that the company is ramping up work on multiple wearable devices for the AI era. Bloomberg's Mark Gurman reports that development is being fast-tracked on a trio of AI wearables. Apple apparently plans to create a pair of smart glasses, a pendant that can be worn as a pin or a necklace, and camera-laden AirPods with expanded AI capabilities. The three devices are all intended to connect to an iPhone and provide a hands-free interface
Starting point is 00:01:50 for AI Siri. The pendant and AirPods are intended to be the low-end offering. Both will have low-resolution cameras that can provide context to the AI assistant, but which won't be good enough for taking pictures or recording video. The design brief is simply to offer a cheap, always-on camera and microphone to function as Siri's eyes and ears. No word on when to expect the pendant, but the camera-equipped AirPods have been in development for some time and could be on shelves as early as this year. The smart glasses are designed to be more upscale and feature-rich, competing directly with Meta Ray-Bans. Several prototypes of the smart glasses have been distributed
Starting point is 00:02:21 internally after significant progress in recent months. The glasses won't feature a display, but will have speakers, microphones, and high-resolution cameras. Apple is hoping their build quality and camera technology can give them the edge against Meta and its current domination of the nascent category. Reportedly, December is the target for the start of production with a public release next year. Now, between this, the March 4th announcement, and of course the absolute proliferation of Mac minis as the device of choice for OpenClaw agents, there has been a huge discussion on X this week regarding Apple's AI strategy. Many shared this chart of AI capex going parabolic at rival big tech firms while Apple is actually guiding a 19% drop in capex. The tone of the conversation was summed up by
Starting point is 00:02:59 Akash Gupta who said, did Apple just luck into the smartest AI strategy in tech? The argument, of course, is that while the hyperscalers spend hundreds of billions of dollars on data centers, with very difficult or at least long-term ROI calculus, Apple is in the meantime shipping Mac minis as fast as they can make them, and licensing Google's models for a billion dollars a year, basically pocket change compared to the cost of building their own training cluster for an in-house model. If Apple can actually get the trifecta of AI wearables to market alongside a functional version of AI Siri, maybe things start to look better for them. CEO Tim Cook seemed to imply that this is in fact the strategy during an all-hands meeting last week. He reportedly told staff that Apple is working on new categories of products powered by
Starting point is 00:03:37 AI, remarking, we're extremely excited about that. The world is changing fast. And despite skepticism of AI wearables in the past, Ben Pouladian summed up the vibes when he posted, I'll take all three. Where should I leave my credit card? Next up, another interesting story from the earnings call cycle. During last week's earnings call, Spotify co-CEO Gustav Söderström said that his company's top developers are pretty much done writing code by hand. He reported that his most senior engineers are saying that they haven't written a single line of code since December.
Starting point is 00:04:07 Söderström gave a concrete example of a developer who gave Claude instructions for a bug fix or a new feature over Slack on their phone during the morning commute. Spotify's internal platform allows them to receive the code, validate it, and push it to production, all before they arrive at the office. Söderström said that he believes that this is just the beginning of the AI coding era, with much greater efficiencies yet to be unlocked. He emphasized, this is a big change. It is real, it is happening fast. We're retooling the entire company for this age, and it's going to be a lot of change. But as I said before, change, if you capture it, is opportunity. Moving over to chip world, Meta has signed a massive
Starting point is 00:04:39 partnership with Nvidia, including a commitment to buy millions of AI chips. The multi-year strategic partnership will involve deployment of current-generation Blackwell GPUs as well as the next-generation Rubin chips. In addition, Meta will use standalone Grace CPUs as well as Nvidia's next-generation networking equipment. Now, big tech company partners with Nvidia stories are basically an everyday occurrence at this point, so what makes this one interesting? The story here is really the scale. At this stage, the largest data centers contain several hundred thousand GPUs, and you can likely count those on one hand. The purchase of millions of chips implies that Meta plans to build multiple new data centers at world-leading scale over the
Starting point is 00:05:13 coming years. Nvidia only produced around 5 million AI chips last year, so an order of this size could be a strategic move to corner the market on the leading AI chips. Analysts said the deal likely stretches into the tens of billions, and will soak up a good portion of Meta's $135 billion capex plan for 2026. The deal also isn't just about AI training and inference, with Meta planning to migrate large portions of their social media recommendation engines to Nvidia silicon. For Meta, it's an interesting commitment to just paying Nvidia for their technology rather than trying to find alternatives. Each of the large AI companies has spent the last year spinning up custom silicon projects or partnering with AMD in an attempt to avoid
Starting point is 00:05:48 the Nvidia tax, with Meta pursuing both of those avenues last year. This deal would seem to imply they've settled on Nvidia as their major supplier, but it could also just simply be about volume, with Nvidia being the only chipmaker with a proven track record of delivering chips at this scale. Announcing the deal, Jensen Huang said, no one deploys AI at Meta's scale, integrating frontier research with industrial-scale infrastructure to power the world's largest personalization and recommendation systems for billions of users. Through deep co-design across CPUs, GPUs, networking, and software,
Starting point is 00:06:16 we are bringing the full Nvidia platform to Meta's researchers and engineers as they build the foundation for the next AI frontier. Summing it up pretty simply, Amidhiz investing writes, AI data center buildout cycle is simply not over. Speaking of which,
Starting point is 00:06:30 software sellers might be exhausted as the stock market levels out. Both major indices eked out slight gains on Tuesday as sentiment began a cautious turnaround. Louis Navellier, CIO for Navellier & Associates, said, it is likely that we will look back on the current volatility as a buying opportunity,
Starting point is 00:06:45 though it's difficult to estimate when the volatility will be behind us. The past month has, of course, been brutal for AI stocks, with the Mag 7 now at five-month lows. It's been even worse for AI-exposed software firms, with sector flagships like Salesforce and Adobe down more than 20% on the year. The sell-off has been so severe that some executives took direct action to steady the market. ServiceNow CEO Bill McDermott announced in a regulatory filing that he would buy $3 million in his company's stock.
Starting point is 00:07:10 McDermott is the first major SaaS CEO to buy stock during this bloodbath; until now, the lack of insider buying had made it seem like even the insiders had lost faith in the sector. Multiple ServiceNow executives also canceled all future selling plans. Meanwhile, several private software companies released their earnings early in a bid to show they haven't been disrupted by AI. McAfee's Q4 earnings were little changed from last year at $626 million. Rocket Software disclosed 5.2% revenue growth, while Perforce Software had a slight revenue decline but detailed AI product development plans in their earnings call. Absolutely, it is way too early to say the SaaSpocalypse is over, but this week does seem to be giving investors a slight breather to reassess the value of AI and software stocks moving forward. Over in China, it is the Chinese New Year,
Starting point is 00:07:50 and AI companies are the ones handing out the red envelopes. Alibaba, Tencent, and ByteDance are all offering massive giveaways in a bid to capture new chatbot users. The promotions vary, with ByteDance running a high-value sweepstakes, while Alibaba and Tencent are giving away a few dollars' worth of vouchers to each user. Part of the big push is to get users to try out nascent AI shopping agents. And yet, The Information notes that Chinese AI companies could be facing an even tougher path to AI monetization than their U.S. counterparts. Each of the major Chinese labs is still offering high-volume usage and advanced features for free. Leon Fan, a Beijing-based AI founder, noted a cultural barrier, commenting, in China, consumers know they can always find most online services for free. If one major
Starting point is 00:08:29 AI chatbot started charging its users, people would immediately migrate to other free chatbots that are just as good. That said, while it's not a pathway to profitable AI, the giveaways are serving their purpose by boosting usage during the Spring Festival. ByteDance said their Monday night promotion garnered 1.9 billion chatbot interactions, while Alibaba said their agentic shopping-focused promotion had led to 130 million first-time users trying out the service so far this month. There is, of course, another AI story going on in China, which is the rise of embodied AI in the form of robotics. At some point, we're probably due for an update show, as the videos coming out this year suggest a pretty extraordinary pace of development.
Starting point is 00:09:05 For now, however, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Hello, friends. If you've been enjoying what we've been discussing on the show, you'll want to check out another podcast that I have had the privilege to host, which is called You Can With AI from KPMG. Season 1 was designed to be a set of real stories from real leaders, making AI work in their organizations, and now season 2 is coming and we're back with even bigger conversations.
Starting point is 00:09:33 This show is entirely focused on what it's like to actually drive AI change inside your enterprise and has case studies, expert panels, and a lot more practical goodness that I hope will be extremely valuable for you as the listener. Search You Can with AI on Apple, Spotify, or YouTube, and subscribe today. This episode is brought to you by Mercury, radically different banking, now available for personal accounts. I already use Mercury for my business, so when they introduced personal accounts, it made immediate sense for me. I try to bring the same level of intention to my personal finances that I bring to building companies, and most traditional banks just do not
Starting point is 00:10:09 feel designed for that. With Mercury Personal, you can toggle between business and personal in a click. You can set up subaccounts for specific goals, automate transfers so projects and savings fund themselves, and put idle cash to work with high-yield savings, all without friction. It's built for people who care about how their money moves and want tools that actually keep up. Visit mercury.com/personal to learn more. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members
Starting point is 00:10:35 FDIC. If you're looking to adopt an agentic SDLC, Blitzy is the key to unlocking unmatched engineering velocity. Blitzy's differentiation starts with infinite code context. Thousands of specialized agents ingest millions of lines of your code in a single pass, mapping every dependency. With a complete contextual understanding of your codebase, enterprises leverage Blitzy at the beginning of every sprint to deliver over 80% of the work autonomously. Enterprise-grade, end-to-end tested code that leverages your existing services, components, and standards. This isn't AI autocomplete. This is spec and test-driven development at the speed of compute. Schedule a technical deep dive with our AI experts at blitzy.com.
Starting point is 00:11:11 That's BLITZY.com. One more quick thing before we get back to the show: if you are a business leader who is thinking about how all of this crazy OpenClaw and agent stuff can impact your business, I've got something for you. If you go to enterpriseclaw.ai, you can sign up to get more information about a new executive sprint that we're going to be doing that will help leaders inside companies figure out what the real challenges and opportunities of agents and agent systems like OpenClaw are going to be for your particular companies. That program will involve you learning at least on a
Starting point is 00:11:43 personal level how to build agents and agent teams so that you have that basis of experience to then walk through a set of blueprints for the types of challenges you're going to face around things like security, governance, and more. The first cohort is kicking off in March, so head on over to enterpriseclaw.ai to sign up for more information. Welcome back to the AI Daily Brief. You had a sense at the beginning of this week that we might be in for a good one when it came to new model releases, and so far that is absolutely the case. We have not yet seen the much-rumored DeepSeek version 4, but we did get an early preview of Grok 4.20, as well as Sonnet 4.6, which, as you'll see, especially in the context
Starting point is 00:12:23 of the OpenClaw conversation, has a lot of people excited. Now, we're going to look at the new models in terms of some benchmarks, of course, as well as first impressions from the peanut gallery, but the big thing that I think is notable, after looking especially at the reaction to Sonnet 4.6, is just how different evaluation of new models is getting. It is much more discrete, much more specific, and honestly much more useful. Yes, sometimes with the big flagship models like Opus 4.6, or I'm sure when we get GPT-5.3 it'll be this way, the question is how much, if at all, does this push the state of the art? How much does it beat the previous best model in terms of raw capability? Increasingly, however, the discourse is not about just raw capability,
Starting point is 00:13:06 but instead a set of questions about what specifically the new model adds to the capability set and how it can be plugged into people's model stack. The questions that people explore are about cost, contextual performance, discrete capabilities, and how those add up to new value around specific use cases. So with that in mind, let's talk about Sonnet 4.6. As has been the case with their previous Sonnet releases, this model is all about delivering more reasonably priced high performance, specifically now in the context of agents.
Starting point is 00:13:35 Anthropic writes that it is Opus-level intelligence at a price point that makes it practical for far more tasks. Couple of the key details. One is that it has a million-token context window, which is the first time for that in a Sonnet-class model. Anthropic describes it as enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request. Now, that difference in the context window opens up so many use cases that one of the things
Starting point is 00:13:57 that will be interesting to watch going forward is how much of Opus's usage was just about that million token context window as opposed to any other performance differentiations. Now, one of the big callouts in terms of new capability set is around computer use. Anthropic writes, Almost every organization has software it can't easily automate, specialized systems and tools built before modern interfaces like APIs existed. To have AI use such software, users would previously have had to build bespoke connectors. But a model that can use a computer the way a person does changes the equation.
Starting point is 00:14:29 In the 18 months since Anthropic started tracking computer use via the OSWorld series of benchmarks, the Sonnet models have jumped from 14.9% all the way up to 72.5% today. The latest jump, between Sonnet 4.5 and Sonnet 4.6, was from 61.4% to 72.5%. The model certainly still lags behind the most skilled humans at using computers, but the rate of progress is remarkable nonetheless. It means that computer use is much more useful for a range of work tasks and that substantially more capable models are within reach.
Starting point is 00:14:58 I think their point that we are on the verge of models that can use computers like humans, without APIs, is a powerful one. Compared to the previous Sonnet 4.5 model, this model is much stronger on coding benchmarks, being now roughly in line with Opus 4.5. The model is also now state-of-the-art in agentic financial analysis and office task benchmarks, even beating Opus 4.6. The cost is $3 per million input tokens and $15 per million output tokens, compared to Opus's $5 and $25, and Sonnet 4.6 is available to free users, which could end up being meaningful. In their testing in Claude Code, Anthropic found that users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time. Users, they say, reported that it more effectively read the
Starting point is 00:15:40 context before modifying code and consolidated shared logic rather than duplicating it. This, they said, made it less frustrating to use over long sessions than earlier models. Interestingly, they say users even preferred Sonnet 4.6 to Opus 4.5, the model that launched this huge inflection point that we've been talking about all year, 59% of the time. Those users said that Sonnet 4.6 was significantly less prone to over-engineering and laziness, and meaningfully better at instruction following. One really interesting test was around the Vending-Bench Arena, which is a test to see how well different models can run a simulated business over time. They write, Sonnet 4.6 developed an interesting new strategy. It invested heavily in capacity for the first 10 simulated months, spending significantly
Starting point is 00:16:19 more than its competitors, and then pivoted sharply to focus on profitability in the final stretch. The timing of this pivot helped it finish well ahead of the competition. All of which is to say that Anthropic is very clearly saying that Sonnet 4.6 is not just a cheaper Opus. It has some things that it does that are unique and highly capable, and it's worthy of consideration on its own terms, not just because of cost. So we haven't had this for long, but what is the state of the conversation? There's been a surprising amount of conversation around rumors of whether this was originally supposed to be Sonnet 5.
Starting point is 00:16:50 Viramus Ronnie writes, rumor going around, Anthropic Sonnet 5 didn't hit internal benchmarks and may ship a Sonnet 4.6 instead. If true, that tells us a few things. The jump from 4.X to 5 was expected to be meaningful. Whatever they tested didn't clear that bar. They'd rather relabel than overpromise. Veer says that either means it's conservative branding or there's a performance plateau.
Starting point is 00:17:11 He concludes, if they're saving Sonnet 5, then something bigger is still in the oven. If not, we may be entering the era of smaller, hard-won improvements instead of flashy jumps. Without having any privileged information about what's going on inside the company, it is very clear that overall, across all the companies, we are definitely in the era of smaller, harder-won improvements instead of flashy jumps, although that might not be because of some constraints in the scaling. It might be just a response to consumer expectations, and the absolute cudgeling that OpenAI took when the jump to GPT-5 wasn't big enough to get people excited. Others think this is just business strategy.
Starting point is 00:17:43 Sean Sullivan writes, I have a feeling that Sonnet 5 has been done for some time now, but it's way cheaper than Sonnet 4.5, and Anthropic still has market leadership in API usage, meaning that they don't have to drop it until someone comes up to compete. Now, in terms of people who have actually used it, the response is pretty good. Aaron Levie from Box writes, we tested Sonnet 4.6 in early access on our Box AI complex work eval, and it's a big upgrade over Sonnet 4.5, seeing a 15 percentage point jump in performance in accuracy. Sonnet is delivering a huge boost across reasoning capabilities, tool use, working with complex data, and more. All of these, Aaron points out, are necessary improvements for agents to be involved in sophisticated
Starting point is 00:18:21 workflows in an enterprise. Reinforcing the idea that this isn't just cheaper Opus, Artificial Analysis writes, Claude Sonnet 4.6 is the new leader in GDPval, slightly ahead of Anthropic's Opus 4.6, on agentic performance of real-world knowledge work tasks less than two weeks after its launch. That said, they did note that in their testing, Sonnet actually used significantly more tokens than previous versions of Sonnet and meaningfully more than Opus 4.6 as well. That meant that although Sonnet 4.6 slightly beat out Opus 4.6, the story might not be as simple as, hey, this cheaper model does even better. Basically, the cost for Sonnet to outperform Opus made Sonnet even more expensive than Opus.
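The token-usage caveat here comes down to simple arithmetic, which can be sketched roughly as follows. The per-million-token prices are the ones quoted in this episode ($3 input and $15 output for Sonnet 4.6, $5 and $25 for Opus); the token counts, the short model labels, and the function names are hypothetical, chosen only for illustration, and are not measurements from Artificial Analysis or identifiers from any real API.

```python
# Prices in USD per million tokens, as quoted in the episode.
# The keys are illustrative labels, not real API model identifiers.
PRICES = {
    "sonnet": {"input": 3.0, "output": 15.0},
    "opus": {"input": 5.0, "output": 25.0},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def break_even_multiplier(input_tokens: int, output_tokens: int) -> float:
    """How many times more tokens Sonnet can use before it costs as much as Opus."""
    return task_cost("opus", input_tokens, output_tokens) / task_cost(
        "sonnet", input_tokens, output_tokens
    )


if __name__ == "__main__":
    # Hypothetical agent task: 200k input tokens and 50k output tokens.
    tokens_in, tokens_out = 200_000, 50_000

    print("Opus:", task_cost("opus", tokens_in, tokens_out))
    print("Sonnet, same tokens:", task_cost("sonnet", tokens_in, tokens_out))
    # Assumed scenario: Sonnet needs twice as many tokens for the same task.
    print("Sonnet, 2x tokens:", task_cost("sonnet", 2 * tokens_in, 2 * tokens_out))
    print("Break-even multiplier:", break_even_multiplier(tokens_in, tokens_out))
```

Under these assumed numbers, the task costs $2.25 on Opus and $1.35 on Sonnet at equal token usage, but $2.70 on Sonnet if it uses twice the tokens. Because both price ratios are 5 to 3, Sonnet stays cheaper only while it uses less than roughly 1.67 times the tokens Opus would.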
Starting point is 00:19:01 From a positioning standpoint, Trung Phan pointed out that Anthropic, as focused as they are on enterprises, seems to have made a decision to not totally cede the ground for consumers as well. They point out that in the Sonnet 4.6 demo, they show Claude renewing someone's license plate at the DMV, obviously a very benign, everyday painful type of use case. A lot of the chatter is, of course, around computer use and what that's going to mean, and many are hammering just how important the cost dimension is in the context of what we're using these models for today. Kaleser writes,
Starting point is 00:19:30 The price point thing matters way more than people realize. Running agents that loop hundreds of times per task, dropping to Sonnet tier pricing while staying near Opus level means the same budget goes 5x farther. That's not a minor upgrade that's a different category of what you can build. Zach Schmau writes, Opus class reasoning at Sonnet pricing means you can actually afford to let agents think harder on every step without blowing through your API budget.
Starting point is 00:19:51 That was the real bottleneck. And of course, given where the state of the conversation is right now, a lot of people pointed out its relevance for OpenClaw. OpenClaw's super champion Alex Finn writes, This is the best model for OpenClaw ever. It is human-level at computer use, the most important part of Claw, for a fraction of the price. Meta Alchemist writes, Sonnet 4.6 feels like it was made for OpenClaw, with how much emphasis they put on running
Starting point is 00:20:14 the apps on your computer and tool usage. If you were using Claude with OpenClaw, using Sonnet 4.6 will be faster and cheaper compared to Opus. Prajwal Tomar writes, I burned through a stupid amount of money in 48 hours using Opus 4.6, switched to Sonnet 4.6, it feels almost the same but costs a fifth as much. For pure coding, Opus is still better. But for agentic workflows inside OpenClaw, Sonnet 4.6 performs nearly as well, and that's what actually matters. When agents are looping, researching, and executing tasks all day, cost efficiency becomes
Starting point is 00:20:43 everything. If you're using Opus 4.6 for OpenClaw right now, switch to Sonnet 4.6. You'll save a lot of money without sacrificing real performance. OpenClaw, for its part, very quickly pushed an update to officially support the new model. Summing up the discourse in their AI News newsletter, Latent Space writes that Sonnet 4.6 matters because, one, long context, i.e. that 1 million token window, is becoming operational versus just a spec; two, agent performance claims are increasingly harness-dependent, meaning that you have to ask not just about the model,
Starting point is 00:21:12 but where and how it's being used; and three, computer use is becoming a marquee capability. Overall, the vibe is people are excited, and excited especially to try it in their agent systems like OpenClaw. Now, that wasn't the only model we got yesterday. Elon Musk posted, The Grok 4.2 release candidate public beta is now available for use. You need to select it specifically. Critical feedback is appreciated.
Starting point is 00:21:36 Unlike prior versions of Grok, 4.2 is able to learn rapidly, so there will be improvements every week with release notes. So this, then, is a little bit different. It's not a full-on release that has a benchmark scorecard like Sonnet 4.6, nor is it a fixed state where the next set of improvements are going to come with the next model number. Instead, Grok 4.2 itself is supposed to improve over time. Indeed, Elon separately said, Grok 4.2 will be about an order of magnitude smarter and faster than Grok 4 when the public
Starting point is 00:22:02 beta concludes next month. Still many bug fixes and improvements landing every day. The public beta gives us more critical feedback to address. Now, one of the things that is extraordinarily difficult, anytime we get an xAI model, is what I'll call the Elon Rorschach test. If you dislike Elon and the X algorithm knows that you like content where people are crapping on things that Elon does, you are going to see endless tweets about how 4.2 is just a total POS. On the flip side, if you are an Elon stan, that same X algorithm is going to deliver you
Starting point is 00:22:34 a whole slew of tweets about how awesome 4.2 is. Among the very few people that I can find that I think exist in between those two paradigms, first impressions are that it is, if nothing else, improved. Dr. Derya Unutmaz writes, I just got access to Grok 4.20 beta and I'm testing it on biomedical questions. I can already say it has greatly improved. Now, the one specific feature that lots are talking about is the approach that 4.20 takes,
Starting point is 00:22:58 where, in responding to a prompt, four separate agents think on their own, debate amongst themselves, and then come up with the best answer together. Benjamin De Kraker writes, the Grok 4.2 agent teamwork system is cool and appears well done. However, he says, the real value in these multi-agents
Starting point is 00:23:12 is when they're not all the same model or even the same provider. A mixed team from four different models, Grok, Claude, GPT, and Gemini, is the sweet spot. Ultimately, from where I'm sitting, there is not quite enough available on 4.2 to really know what to make of it. I think the thing that I will be watching most closely
Starting point is 00:23:26 is this idea that it itself is going to get better rapidly. Last thing I wanted to flag today isn't a new model release but a new product release. Normally, I wouldn't necessarily feature this until it had a lot more folks with hands on it, but there's a new platform called Dreamer that seems to be focused on abstracting away all the complexity around agent design to still build the agents that you need to solve your problems.
Starting point is 00:23:48 I don't necessarily think that they describe it super well. The announcement tweet calls it a place to discover, build, and enjoy agentic apps, and your home for personal intelligence, whatever the heck that means. But the early users of it did a better job at describing where the value is. Ben Tossell from Ben's Bites writes, 2026 is the year of the personal agent. Dreamer is the closest I've seen to making that accessible to everyone. In his newsletter, he writes, Dreamer is a platform where you build agentic apps by talking. You describe what you want and an AI agent called Sidekick builds it for you in minutes.
Starting point is 00:24:18 There's also a more detailed coding agent for when you want to go deeper. Either way, you never think about hosting or deployment; the platform handles it all. That's the bit I care about most. I spent a stupid amount of time on infrastructure. Getting servers running, keeping things alive, debugging why something crashed. That stuff is fine when you're learning, but it's not the point. The point is the thing you're trying to make. Sidekick learns about you over time and acts as the privacy layer,
Starting point is 00:24:39 controlling what data each app and Dreamer can access. It can spin up temporary agents for specific tasks, integrate with third-party tools, and coordinate between your different apps. All of that wiring is done for you out of the box. Shawn Wang, swyx, writes, Dreamer is the most ambitious full-stack consumer and coding agent startup I've ever seen. When this was first demoed to me, my jaw dropped. Now, he writes a lot more, but says, I think Dreamer is the right form factor for mass-adopted personal software agents. You stop fussing over the code. You just use the app and then talk to your sidekick to fix bugs. Shawn's belief is that, quote, very unexpected things happen when you let normies build their
Starting point is 00:25:13 own AI apps rather than force them through expensive developers. Basketball apps, knockoff Harry Potter galleries, story times for kids, Caltrain apps. And so far that seems to be people's early experience. Joanna Stern, formerly of the Wall Street Journal, writes, started testing Dreamer yesterday, and this might be the vibe coding slash agent tool for normies. Super simple to build little tools without deploying anything to a server. So to the extent that today we are talking about new models and discrete capabilities, it seems like Dreamer is one to watch. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always, and until next time, peace.