The AI Daily Brief: Artificial Intelligence News and Analysis - Claude Opus 4.8 First Impressions

Starting point is 00:00:00 Today on the AI Daily Brief, Anthropic drops Claude Opus 4.8, and here are everyone's first impressions. Before that in the headlines, one of the biggest law firms in the world is heading in a very different direction with their AI strategy. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, KPMG, Robots & Pencils, Section, and Bolt. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe on Apple Podcasts. And if you want to learn more about sponsoring the show or really about anything else in the AIDB ecosystem, send us a note at sponsors at aidailybrief.ai. Or just head on over to

Starting point is 00:00:44 aidailybrief.ai, where you can read about all the things we have going on. With that, though, let's talk about some surprisingly relevant news from the world of legal AI. We kick off today with a story that honestly is a little surprising with how much traction it's getting. And I think that the resonance of it actually says a lot about where we are in this AI cycle. The short of it is that the Financial Times reported this week that mega law firm Kirkland & Ellis, which is the world's biggest law firm, is planning to spend a half billion dollars building their own AI platform. The company will spend $100 million this year and plans to continue to pour money into the project over the coming three to four years. Now, to be clear,

Starting point is 00:01:23 that spend is in addition to licensing costs for third-party tools. This isn't just a bunch of lawyers getting a huge Claude Code budget. Chairman Jon Ballis told the FT, the idea is that we're going to take the collective intelligence of our institution and be able to deploy that throughout the firm. I'm sure you now feel like you know exactly what he's talking about with that incredibly clear and not vague at all statement. Ballis said that the wide distribution of third-party tools like Harvey, Legora, and Thomson Reuters CoCounsel has raised the floor for everyone, but added, we don't get hired for the floor. Now, among the elite white shoe law firms in the U.S., Kirkland & Ellis is right at the top of the heap. They have almost 4,000 attorneys spread across

Starting point is 00:01:56 11 regional offices, and consistently bring in the most revenue among their peers with $10.6 billion last year. They specialize in corporate and transactional law, advising on large IPOs, mergers and acquisitions, and private equity deals. Now, to be clear, Kirkland's new platform will be purely internally facing. This is not meant to be a commercial product. Around 180 outside tech professionals have been contracted to work on the system, which, while we don't have a ton of details, it appears that partly it will function as an extensive knowledge base, aggregating information gathered from hundreds of Kirkland lawyers and partners, with Kirkland expecting it to replace other software platforms used at the firm. Essentially, it seems the system will allow

Starting point is 00:02:31 partner-level knowledge to be applied in every single case. Chairman Ballis also discussed the prospect of AI tools ending the concept of billable hours by automating routine tasks, such as time-consuming discovery and litigation. He said, people talk about the evolution of the billable hour. We already do a number of matters on value-based pricing, and that trend will only continue and it will accelerate. We're going to lean into it. We're looking forward to leaning into it. Now, the record of corporations rolling their own big-time AI solutions is not particularly encouraging. You might remember, for example, back in 2023, BloombergGPT, their own custom-built model based on their data, which just absolutely got bitter-lesson smashed as larger general-purpose models

Starting point is 00:03:09 made it totally irrelevant almost immediately. And when it comes to this project, there is certainly a lot of first impression scoffing, particularly among VCs, many of whom have funded companies like Harvey. Investor Steven Sinofsky wrote, it isn't difficult to see why an industry leader would want to seek a competitive advantage at a rapidly changing platform transition. But history sees this as a challenge. It's difficult to see how one firm outside of the technology leaders could move faster or more adroitly than an entire industry. He then goes on to talk about all the reasons why in the past when companies

Starting point is 00:03:37 have tried to build their own databases, CRMs, operating systems, etc., it just hasn't worked. But this is pretty different. And I think Steven's critique on that basis is kind of missing the mark here. While we don't have a ton of details, it seems to me like what Kirkland & Ellis is trying to do is ward against the fact that at some point these law wrapper companies like Harvey are 100% just going to start to offer the services and cut out the middleman. Think about it. If you're Harvey and you're charging law firms to automate routine legal tasks, why wouldn't you just let people who need those same routine legal tasks do it directly through Harvey if you could scalp a better margin? It feels to me completely inevitable and my strong sense is that a big

Starting point is 00:04:15 part of the motivation for this is Kirkland getting out ahead of that. Now I also think that it's very likely that part of the reason for this right now is the new priority on token management that's coming up as we move out of the subsidy era and into the scarcity era. And even if that isn't exactly what Kirkland was thinking about when they made this decision, people's receptiveness to it, I think, does have a lot to do with the fact that much different arrangements between AI providers and AI consumers are going to be on the table as we sort through this tradeoffs era. Then again, maybe we're just overthinking it. Roger Doudala writes, they greenlit an internal IT project at the cost of 4% of their annual revenue. Very normal thing for a large corporation, not a new trend. And on that front,

Starting point is 00:04:54 one final point is it'll be interesting to watch to what extent this is the modern day equivalent of a big impressive office. In the 80s, you would have invested a ridiculous amount of money, far more than you needed, to have a very impressive office so that when people walk in, they're cowed by the majesty of what you've built and they obviously want to become your client. This is perhaps partially the digital equivalent of that for a very different time. Next up, a little bit of news out of OpenAI. The company has updated GPT-5.5 Instant, which is their daily driver chat model. The release note said that the update aims to improve response style and quality, with the other big change being that Canvas will no longer be available for use with GPT-5 Instant or Thinking.

Starting point is 00:05:30 Instead, the model will produce outputs that include code blocks and writing blocks when working on those tasks. Describing the update, Michelle Pokrass of OpenAI wrote, the previous model was too bullet-pilled. The new one improves on some other important dimensions, sycophancy, factuality, and multilingual performance. Now, while these updates might not matter as much to the listeners of the show, you have to remember that the Instant models are used to power OpenAI's free tier, so anything that they change on that front can have an outsized impact on how everyday users perceive AI. Besides removing the tendency to deliver a wall of bullet points, some users noticed a significant change in coding skill for the updated model as well. Justin Goria showed off some

Starting point is 00:06:05 pretty impressive web development work from a basic prompt, asking, is the updated GPT-5.5 Instant a variant of GPT-5.6? On the Codex side of the house, the team pushed out their weekly feature drop with Codex developer Tibo writing, Codex Thursday has exceptionally moved to another day. Friday, it is. OpenAI's Andrew Ambrosino wrote, when things don't meet the bar, we'll cook for a bit longer. Now, the rumor mill started absolutely churning, with some thinking OpenAI pushed back the release because they hadn't realized how much of a threat Opus 4.8 was going to be. And of course, we will talk all about Opus 4.8 in the main. Next up in funding news, AI coding startup and agent lab Cognition has closed a billion

Starting point is 00:06:42 dollar funding round. The new round values the company at $26 billion, which is more than double their previous round last September. Now, Cognition was one of the early trailblazers in agentic coding, betting big on the theme two years ago with the release of their coding agent Devin. And while Devin hasn't necessarily been in the headlines as much this year, the growth of the product has been absolutely insane. Their enterprise usage numbers are up 10x so far this year, taking them to a revenue run rate of almost half a billion dollars. Cognition shared a chart of weekly Devin sessions since the beginning of 2025, with the growth trajectory increasing dramatically in January and then again in April. Usage growth is now basically a straight vertical line. That same inflection

Starting point is 00:07:17 point was obvious from Cognition's internal use of Devin. In January, 17% of their internal code was committed by Devin, that proportion doubled to 33% in February, doubled again to 76% in March, and is now at 89%. Wrote Cognition, we're now shifting to a world of self-driving software development. Individual engineers are able to spend more of their time on creative structuring of problems and tasks, and their army of Devins reliably executes. So does this mean fewer software engineers? Not according to Cognition CEO Scott Wu, who in conversation with Bloomberg said, there's about 30 to 35 million software engineers in the world today. We want to make them all 10 times more efficient, and then we think there is a lot more than 10 times more software to build.

Starting point is 00:07:54 Next up, an interesting story, especially following what's happened with Elon and SpaceX and their deal with Anthropic. Meta could be the next company to pivot to an AI cloud company if their plans to deliver personal intelligence don't pan out. During a shareholders meeting on Wednesday, Mark Zuckerberg was asked whether he would consider competing with AWS, Google Cloud, and Microsoft Azure in AI cloud, to which Zuckerberg responded that it was definitely on the table, adding, almost every week there are different companies that come to us from outside, asking us to both stand up an API service and asking if we have compute that they could buy from us at some premium to what we bought it at. Now, that new opportunity emerging from the compute shortage

Starting point is 00:08:27 has some big implications for Meta. Firstly, it de-risks their AI build-out substantially. Meta is slated to spend around $130 billion on building AI data centers this year, but has at this point the weakest ROI story among the hyperscalers. The only place their AI returns show up on the balance sheet is in increased advertising revenue, which is an indirect link at best. Meta has added AI features to their advertiser platform and is using AI models to improve targeting algorithms, but that's certainly not the same as Google being able to say AI is driving 60% of growth for cloud. Now, however, if Meta does overbuild, they have a plausible way of monetizing that excess spend. And this is definitely the clear message that Zuckerberg is delivering to investors,

Starting point is 00:09:03 commenting, we haven't done that yet because we think we have a use for that compute. Obviously, if we get to a point where we feel we have overbuilt, then that is an option that we have, and that is partially what gives us confidence in investing in building this out. Now, one of the interesting things that happened was when Elon started to shift his focus to perhaps playing a role more like compute czar, or Earl of Compute as I called it on Twitter, many wondered if Zuckerberg would be the next to follow in that AI kingmaker path. At the moment, they're not going whole hog on that, but it's definitely a trend to watch. Now, as we head into next week,

Starting point is 00:09:32 one thing to keep an eye on in the first week of June is that The Information reports that Microsoft is set to release some new models at their annual Build conference, which begins on Tuesday. The reports are that we will get a family of new AI models, including a coding model, as well as specialized models focusing on reasoning, transcription, speech, and images. Now, if we actually get this,

Starting point is 00:09:50 it'll be the first family of models that Microsoft has commercially released in the current era. Until now, their commercial products have been driven by models from OpenAI and Anthropic, although they have also released a series of research previews. We got some early previews of the image model, given how this month's biggest story around Microsoft was them ditching their Claude licenses and forcing engineers to use GitHub Copilot instead. Genuinely, I think there is a lot to watch out for heading into next week, but for now, we got a new model yesterday, so with that, let's close the headlines and switch over to the main.

Starting point is 00:10:22 All right, folks, quick pause. Here's the uncomfortable truth. If your enterprise AI strategy is we bought some tools, you don't actually have a strategy. KPMG took the harder route and became their own client zero. They embedded AI and agents across the enterprise, how work gets done, how teams collaborate, how decisions move, not as a tech initiative but as a total operating model shift. And here's the real unlock.

Starting point is 00:10:44 That shift raised the ceiling on what people could do. Humans stayed firmly at the center while AI reduced friction, surfaced insight, and accelerated momentum. The outcome was a more capable, more empowered workforce. If you want to understand what that actually looks like in the real world, go to www.kpmg.us slash AI. That's www.kpmg.us slash AI. One thing I keep seeing in enterprise AI, companies hedging across every cloud, every model, every framework, or paying a GSI for a pilot that never ends. The teams actually shipping? They've picked

Starting point is 00:11:20 a lane and they move fast. That's one of the reasons I like today's sponsor Robots & Pencils. They've gone all in on AWS. They're an advanced tier AWS Pattern Partner and they ship production AI co-workers in 45 days. That's led to them doing some of the more interesting work I've seen on AI co-workers. And by that I'm not talking about chatbots. I'm talking about actual agentic systems that sit inside a business architecture and do real work. That kind of focus matters if you're an enterprise leader trying to get something real into production or an AWS rep trying to move a customer from interested to deployed. Request an AI briefing at robotsandpencils.com. One conversation with Robots & Pencils and you'll know.

Starting point is 00:11:56 Here's a harsh truth. Your company is probably spending thousands or millions of dollars on AI tools that are being massively underutilized. Half of companies have AI tools, but only 12% use them for business value. Most employees are still using AI to summarize meeting notes. If you're the one responsible for AI adoption at your company, you need Section. Section is a platform that helps you manage AI transformation across your entire organization. It coaches employees on real use cases, tracks who's using AI for business impact, and shows you exactly where AI is and isn't creating value. The result, you go from rolling out tools to driving measurable AI value. Your employees move from meeting summaries to

Starting point is 00:12:32 solving actual business problems, and you can prove the ROI. Stop guessing if your AI investment is working. Check out Section at sectionai.com. That's S-E-C-T-I-O-N-A-I dot com. Today's episode is sponsored by bolt.new. Bolt.new is agentic engineering on multiplayer mode. Designers, product managers, and engineers build in the same environment, and the design system agent keeps every screen on brand. No more Frankenstein UI stitched from a dozen prompts. Whether you're shipping internal tools, moving from prototype to production, or replacing a legacy admin panel, bolt.new takes your team from concept to deployed app. One personal recommendation, hit plan mode before you build. I had a project I'd half described in three different prompts, and plan mode made me actually think through it with bolt.new

Starting point is 00:13:17 before a single line got written. It saved me from rebuilding the same screen probably about four times. Build better apps faster. Start with the link in the description. Welcome back to the AI Daily Brief. Yesterday we got a big new model announcement that really wasn't preceded by a ton of hype. For just a day or two in advance, there was starting to be some chatter that Thursday was going to be a good day for announcements, but the Opus 4.8 announcement definitely didn't have the rabid anticipation that some recent model announcements have. Now, is that because we're back to a very incremental sort of release schedule? Is that because the people who had early access weren't buzzing about it behind the scenes? Or was it because in the middle of 2026, updates to the harness matter as

Starting point is 00:14:01 much, if not more, than updates to the underlying model. Whatever the case, yesterday we got Claude Opus 4.8, which Anthropic themselves have positioned as an upgrade to Opus 4.7 rather than a big new leap in performance. Much of the focus was on model refinement rather than raw power. Through customer testimonials, for example, Anthropic focused on nuanced functional improvements in how the model worked. Shopify engineer Tom Pritchard said, Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, and pushes back when a plan isn't sound, and builds up confidence around complex, multi-service explorations before making big changes. It's a great model to build with.

Starting point is 00:14:40 Writes Anthropic, one of the most prominent improvements in Opus 4.8 is its honesty. A general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work, and less likely to make unsupported claims. Now, one thing that I will note on my very first test with 4.8 is that for basically as long as we've had reasoning models, one of my core day-to-day use cases is around gut-checking various strategic ideas that I'm having.

Starting point is 00:15:08 And to be perfectly honest, you almost have to develop a mental rubric for the ways in which these models are going to glaze your ideas. You can ask them to be critical or think from first principles, but that often just leads them to be critical a priori because they think that that's what you want them to do. I haven't had a ton of time with Opus 4-8, but in some of the big strategic questions that I've put to it, it did seem more comfortable right out of the gate without me specially prompting to flag certain questions, concerns, critiques of what I was sharing, which if that holds will be a pretty big improvement. Now, I also found that it was a little bit more likely to make some assumptions upon which those critiques were rooted, so that's something I'm keeping an eye on.

Starting point is 00:15:47 But how big of a challenge this broader issue of sycophancy is, which of course is just a different form of dishonesty in some ways, means that if this really is a more honest model, it could be a big improvement on some of those types of strategic use cases. Now, when it comes to the benchmarks, most categories received a small bump over Opus 4.7. The SWE-bench Pro score went from 64.3% to 69.2%. On Humanity's Last Exam, which Anthropic is categorizing as a multidisciplinary reasoning test, the score went from 54.7 to 57.9. The score as measured by OSWorld-Verified went from 82.8 to 83.4. But the biggest improvements were in Terminal-Bench 2.0, which went from 60. 26.1 to 74.6, and GDPval, the measure of real-world knowledge work tasks, increasing from 1753 to

Starting point is 00:16:30 1890. Now, interestingly, this is the first time Anthropic has included OpenAI's models as a direct comparison in their launch materials, rather than just referencing their own previous models. It was not a clean sweep, with GPT-5.5 still having a substantial lead in Terminal-Bench at 78.2 compared to Opus 4.8's 74.6. However, on every other benchmark Anthropic highlighted, Opus 4.8 is now ahead of GPT-5.5. To be fair, for most, Opus 4.7 already had a lead, meaning one, Anthropic was just highlighting the widening gap, but two, also validating just how little utility these days most people feel benchmarks have. At least among enfranchised users, 5.5 has really started to open a perception gap with 4.7. So the fact that they're reminding us that Opus 4.7 was already ahead of 5.5 on a lot of

Starting point is 00:17:14 these benchmarks might actually not be doing what Anthropic hoped it would do in terms of what our perception of these model differences is. Overall, they called it a modest but tangible improvement on its predecessor, adding, there's still more to be done. We're working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. So let's go to some of those first impressions and see what people thought. Professor Ethan Mollick was impressed. He shared an Opus 4.8 one shot of quote, create a visually interesting shader that can run in twigl, make it like an infinite city of neo-Gothic towers partially drowned in a stormy ocean with large waves. With Mollick

Starting point is 00:17:46 pointing out that this is all done with math. He continues, this is hard. It involves ray marching repeated Gothic architecture, instancing towers across an infinite grid with Gothic silhouettes and windows, a displaced ocean surface with a believable wave motion, and stormy atmospheric lighting and fog to tie it together. And doing all of this with no textures or external assets, just math. Ethan also tested it on some complex knowledge work writing. I had Opus 4.8 in Claude Code write a sophisticated, if minor, academic paper

Starting point is 00:18:13 from an archive of hundreds of de-identified research files from years ago. I had to use GPT-5.5 Pro as a reviewer. It spotted one major error and some minor points. Opus corrected. Opus 4.8 formulated the hypothesis in advance, conducted data cleaning, did research on references, conducted analyses, did robustness checks and put out the whole paper in LaTeX style. GPT-5 found one issue with the hallucinated result and had other constructive feedback. Now, as an aside, one of the big things here is that we are starting to get close to models you can actually trust to self-verify, which is a huge win for use cases like legal briefs where

Starting point is 00:18:43 hallucinations really minimize utility. Speaking of this, a lot of people noticed that Opus 4.8 is pretty hardworking. Gail Breton writes, One thing I'm noticing is Opus 4.8 is much more thorough in terms of checking its work or the subagent's work. I had this situation where a Haiku subagent reported an issue. Opus goes, hmm, this is weird, let me check that it's not BSing me. It was. Opus ignored the warning. Very good. Lisan al Gaib said, Anthropic found a cure for laziness. Metacritic Capital wrote, Opus 4.8 is the first smart model in a long while, which Zephyr quote tweeted and attributed to that reduced laziness and its increased honesty.

Starting point is 00:19:17 And in fact, honesty came up a lot in early reviews. Kaelim writes, A day with Opus 4.8 and Claude Desktop. Honesty up, everything else about the same. The benchmarks jumped, but in actual daily work, I can't feel most of it. The one real change is that it tells me when it doesn't know instead of bluffing. Roughly 4x less likely to let an error slide, and that I do notice. Beyond that, it feels like 4.7, which is fine.

Starting point is 00:19:39 A model that admits uncertainty beats one that sounds sure and wastes your time. If that's the whole upgrade, it's still worth having. Not every release has to be a leap. Now, one group who thought that these first impressions and even Anthropic's messaging were perhaps a little bit underselling it was Dan Shipper and the crew at Every. Dan wrote, Anthropic just dropped Opus 4.8 and it is a monster. We've been testing it for about a week at Every and our verdict is they could have just called it Opus 5. It's that good.

Starting point is 00:20:02 He said on their vibe check it beat GPT-5.5 on their senior engineer bench, which is their toughest benchmark. However, Dan did caveat that coding performance varies a lot based on different reasoning levels, with you really needing to use it on extra high for the best coding results. Dan also said, and this is one that I would take Every very seriously on as they care more about this than just about anyone, that Opus 4.8 is, in his words, an incredibly good writer. Indeed, on their writing benchmark, he said it beats GPT-5 by six points, producing well-written prose with fewer AI-isms, and also very good at writing in your own voice given the right context. Once again, however, they found that writing performance varied a lot with reasoning levels,

Starting point is 00:20:37 with medium reasoning having a much higher incidence of AI-isms. They also said it was good at knowledge work, it was emotionally intelligent, and it was willing to question the frame, kind of like what I was mentioning before. And when it came to the bad, they got at an issue, which is, I think, of increasing importance, which is the question of the harness. Dan writes, these days a model is only as good as its harness,

Starting point is 00:20:57 and Codex is still a far superior harness to the Claude desktop app. This has kept me using Codex plus GPT-5.5 as my daily driver, but I'm flipping back and forth a lot more between Codex and Claude. This, I think, is one of the most interesting discussions surrounding 4.8, and one of the first times I've seen it put so crisply. Riley Brown seemed to feel very similarly, writing, Unless it's a major breakthrough in model capability, I'm much more excited for super app updates in Codex and Claude Desktop.

Starting point is 00:21:21 There's so much to be unlocked by making those surfaces better, and Claude has so much catching up to do. Sameed put it more simply, Opus 4.8 is the headline, Codex versus Claude Code is the real war. Now, there were also some more critical takes that weren't just about this being a relatively incremental improvement. In her assessment, Claire Vo found that while the model was token efficient and not annoying, it had narrow vision. It was too confident. It wasn't

Starting point is 00:21:47 as numbers grounded as Opus 4.7. It struggled on edge cases and it actually hallucinated. Her TLDR was trust but verify. Indravejan writes, Opus 4.8 high is no fun when it comes to tool calling. In fact, it fails embarrassingly more on its seemingly native harness Claude Code. It's a confusing model. One interesting one came from the Vending-Bench test, which is a benchmark that tasks a model with running a profitable vending machine. Opus 4.7 is the clear leader, making around 40% more money than GPT-5.5 in second place. Opus 4.8, meanwhile, made around 20% less money than GPT-5.5 on high effort, and on max effort, it made about 60% less, sending it below Kimi 2.6 and Gemini 3 Pro. The insight was that improvements in alignment were actually a negative when it came to making money in the test.

Starting point is 00:22:31 Opus 4.7 achieved its top ranking largely through deceptive and power-seeking behavior. Unlike 4.7, 4.8 won't refuse legitimate refunds or short-change vendors. In one example, Opus 4.8 still paid a vendor after it hallucinated that the invoice was already paid. Opus 4.8 told the vendor, if the product arrives and I don't pay, I'd be committing fraud, which could result in serious consequences. I need to make the payment immediately to honor my commitment and prevent the situation from escalating. I feel like we could explore that entirely on its own, and at some point maybe we'll come back and do that. Now, overall, I don't think that first impressions at least are likely to shift the momentum

Starting point is 00:23:04 back in favor of Anthropic from OpenAI, where at least among the power users, the combination of 5.5 and Codex has put the momentum squarely in OpenAI's hands. Chubby on X writes, Opus 4.8 is clearly a strong model, but my impression is that Anthropic is increasingly playing catch-up with OpenAI rather than setting the pace. It feels like GPT-5.5 has shifted the benchmark again, and if OpenAI keeps this trajectory, GPT-5.6 could very plausibly become the stronger overall model.

Starting point is 00:23:30 Still, given the idea that the harness increasingly matters as much as the model, one of the really interesting side announcements was for something that Anthropic is calling dynamic workflows in Claude Code. This is basically Anthropic's new version of their multi-agent coding feature. The feature allows Opus 4.8 to spin up hundreds of sub-agents to work in parallel. Opus will plan the work, write the orchestration scripts, and choose which model to use for each subtask based on its complexity. Adversarial agents are used throughout the process to check outputs, and Opus verifies the final outputs before handing them over to the user. Now, at least in the immediate term, this isn't necessarily going to be a feature that's very common

Starting point is 00:24:02 among generalist knowledge worker type users as opposed to software engineers, but there are certainly many types of complex work where this is worth the additional cost. Anthropic suggested it should be deployed for things like codebase-wide bug hunts, security audits, and large code migrations. They gave an example of Bun developer Jarred Sumner porting the codebase from Zig to Rust. Dynamic workflows was used to create a plan that deployed hundreds of subagents and took 11 days. 750,000 lines of Rust were written, and by the time Opus turned over the finished codebase, it passed 99.8% of tests. This is getting a lot of buzz. Anthropic's Dickson Tsai writes, My colleague's dynamic workflows are, in my opinion, the most significant Claude Code innovation

Starting point is 00:24:39 in 2026 so far. Developer Nick Dobos writes, Claude Code's new dynamic workflows update is absurd. Make sure you understand what it's doing here. This isn't simply a long-running mode like goal, which, by the way, a little preview for those of you who are interested in slash goal, that's what Sunday's Long Read Sunday is all about. Anyways, interrupting myself and going back to Nick, he writes, This isn't simply a long-running mode like goal or a fancy sub-agent verifier process. This is Claude vibe coding an entire brand-new sub-agent fleet harness on demand. This is basically a new scaling law dimension. Huge step forward on the path of AI.

Starting point is 00:25:10 Entrepreneur and startup ideas guy Greg Isenberg wrote, The part that got me, the agents argue with each other before showing you the result. Independent attempts at the same problem, then adversarial agents trying to break the answer. It keeps iterating until they converge. That's how senior engineering teams work, except this team runs at 3 a.m. and never gets tired. The ceiling on what one person can build just moved again. Going to be playing with this all week. Look, when push comes to shove, I think that 4.8 is one you're going to need to go check out for yourself. As you can probably tell, my first impressions are that I like it better and see improvements

Starting point is 00:25:41 from 4.7. Yes, they are incremental, but they're incremental in the ways that really impact which model I find myself reaching for. There was some scuttlebutt that the release was surprising enough that it had OpenAI delaying GPT-5.6, although of course that's all speculation. But as we round out this show, what's not speculation is that in addition to Opus 4.8, we also got a couple of other pieces of massive news surrounding the announcement. First of all, Anthropic has closed their Series H fundraising round at a $965 billion valuation, officially making them a more valuable company than OpenAI. Anthropic last raised money in February with that round valuing them at $380 billion, meaning that they more than doubled their valuation in just three months.

Starting point is 00:26:19 Anthropic also updated their revenue figures, reporting that their run rate revenue crossed 47 billion earlier this month. And yet, the much bigger news than that is that Mythos is coming, or at least as Anthropic has framed it, a Mythos-class model. Tucked into the end of their release blog post for Opus 4.8, Anthropic wrote, We plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We're making swift progress on developing safeguards and expect to be able to bring Mythos-class models to all of our customers in the

Starting point is 00:26:57 coming weeks. Meaning that even if you don't end up caring all that much about Opus 4.8, you're going to have some new toys to play with soon. One of the great things about getting a model release on a Thursday is that you have all weekend to go off and play. So with that, I'm going to shut up and let you get to it. Please do share what you find, use the comments, come to the AI operators community, shout at me on Twitter or LinkedIn, and have a ton of fun. I appreciate you listening or watching, as always, and until next time, peace.
