The AI Daily Brief: Artificial Intelligence News and Analysis - Claude Opus 4.8 First Impressions
Episode Date: May 29, 2026
Claude Opus 4.8 arrives as a modest but meaningful upgrade, with early users pointing to better judgment, less bluffing, stronger self-checking, and a greater willingness to push back. NLW breaks down... first impressions, benchmark comparisons with GPT-5.5, Claude Code’s new dynamic workflows, and why the model harness may matter as much as the model itself. In the headlines: Kirkland & Ellis bets big on internal AI, OpenAI updates GPT-5.5 Instant, Cognition raises at a $26B valuation, Meta considers AI cloud, and Microsoft prepares new models.
Brought to you by:
KPMG - Research from KPMG and the University of Texas at Austin shows the highest-impact AI users treat AI like a reasoning partner, and those skills can be taught at scale. Learn more at kpmg.com/us/Sophisticated
Scrunch - The AI customer experience platform - https://scrunch.com/
Zenflow Work - Agents for knowledge work - https://zenflow.free/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, Anthropic drops Claude Opus 4.8, and here are everyone's first impressions.
Before that in the headlines, one of the biggest law firms in the world is heading in a very different direction with their AI strategy.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, KPMG, Robots and Pencils, Section, and Bolt.
To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can subscribe
on Apple Podcasts. And if you want to learn more about sponsoring the show or really about anything else
in the AIDB ecosystem, send us a note at sponsors@aidailybrief.ai, or just head on over to
aidailybrief.ai, where you can read about all the things we have going on. With that,
though, let's talk about some surprisingly relevant news from the world of legal AI.
We kick off today with a story that honestly is a little surprising with how much
traction it's getting. And I think that the resonance of it actually says a lot about where we are
in this AI cycle. The short of it is that the Financial Times reported this week that mega law firm
Kirkland and Ellis, which is the world's biggest law firm, is planning to spend a half billion
dollars building their own AI platform. The company will spend $100 million this year and plans
to continue to pour money into the project over the coming three to four years. Now, to be clear,
that spend is in addition to licensing costs for third-party tools. This isn't just a bunch of
lawyers getting a huge Claude Code budget. Chairman Jon Ballis told the FT, the idea is that we're
going to take the collective intelligence of our institution and be able to deploy that throughout
the firm. I'm sure you now feel like you know exactly what he's talking about with that incredibly
clear and not vague at all statement. Ballis said that the wide distribution of third-party tools like Harvey,
Legora, and Thomson Reuters CoCounsel has raised the floor for everyone, but added,
we don't get hired for the floor. Now, among the elite white-shoe law firms in the U.S.,
Kirkland and Ellis is right at the top of the heap. They have almost 4,000 attorneys spread across
11 regional offices, and consistently bring in the most revenue among their peers with $10.6 billion
last year. They specialize in corporate and transactional law, advising on large IPOs, mergers and
acquisitions, and private equity deals. Now, to be clear, Kirkland's new platform will be purely
internally facing. This is not meant to be a commercial product. Around 180 outside tech
professionals have been contracted to work on the system. While we don't have a ton of
details, it appears that it will partly function as an extensive knowledge base, aggregating
information gathered from hundreds of Kirkland lawyers and partners, with Kirkland expecting it to
replace other software platforms used at the firm. Essentially, it seems the system will allow
partner-level knowledge to be applied in every single case. Chairman Ballis also discussed
the prospect of AI tools ending the concept of billable hours by automating routine tasks,
such as time-consuming discovery and litigation. He said, people talk about the evolution of the
billable hour. We already do a number of matters on value-based pricing, and that trend will
only continue and it will accelerate. We're going to lean into it. We're looking forward to leaning into it.
Now, the record of corporations rolling their own big-time AI solutions is not particularly encouraging.
You might remember, for example, back in 2023, when Bloomberg released BloombergGPT, their own custom-built model
based on their data, which just absolutely got bitter-lesson smashed as larger general-purpose models
made it totally irrelevant almost immediately.
And when it comes to this project, there is certainly a lot of first impression scoffing,
particularly among VCs, many of whom have funded companies like Harvey.
Investor Steven Sinofsky wrote,
it isn't difficult to see why an industry leader would want to seek a competitive advantage
at a rapidly changing platform transition. But history sees this as a challenge. It's difficult to
see how one firm outside of the technology leaders could move faster or more adroitly than an
entire industry. He then goes on to talk about all the reasons why, in the past, when companies
have tried to build their own databases, CRMs, operating systems, etc., it just hasn't worked.
But this is pretty different. And I think Steven's critique on that basis is kind of missing the mark here.
While we don't have a ton of details, it seems to me like what Kirkland
and Ellis is trying to do is ward against the fact that at some point these law wrapper companies
like Harvey are 100% just going to start to offer the services and cut out the middleman. Think about
it. If you're Harvey and you're charging law firms to automate routine legal tasks, why wouldn't
you just let people who need those same routine legal tasks do it directly through Harvey if you
could scalp a better margin? It feels to me completely inevitable and my strong sense is that a big
part of the motivation for this is Kirkland getting out ahead of that. Now I also think that it's very
likely that part of the reason for this right now is the new priority on token management that's coming
up as we move out of the subsidy era and into the scarcity era. And even if that isn't exactly what
Kirkland was thinking about when they made this decision, people's receptiveness to it, I think,
does have a lot to do with the fact that much different arrangements between AI providers and AI
consumers are going to be on the table as we sort through this tradeoffs era. Then again, maybe we're
just overthinking it. Roger Doudala writes, they greenlit an internal IT project at the cost of 4% of their
annual revenue. Very normal thing for a large corporation, not a new trend. And on that front,
one final point is it'll be interesting to watch to what extent this is the modern day equivalent
of a big impressive office. In the 80s, you would have invested a ridiculous amount of money
far more than you needed to have a very impressive office so that when people walk in, they're
cowed by the majesty of what you've built and they obviously want to become your client. This is perhaps
partially the digital equivalent of that for a very different time. Next up, a little bit of news
out of OpenAI. The company has updated GPT-5.5 Instant, which is their daily driver chat model.
The release note said that the update aims to improve response style and quality, with the other
big change being that Canvas will no longer be available for use with GPT-5 Instant or Thinking.
Instead, the model will produce outputs that include code blocks and writing blocks when working
on those tasks. Describing the update, Michelle Pokrass of OpenAI wrote,
the previous model was too bullet-pilled. The new one improves on some other important dimensions:
sycophancy, factuality, and multilingual performance. Now, while these updates might not matter as
much to the listeners of the show, you have to remember that the instant models are used to power
OpenAI's free tier, so anything that they change on that front can have an outsized impact on how
everyday users perceive AI. Besides removing the tendency to deliver a wall of bullet points, some users
noticed a significant change in coding skill for the updated model as well. Justin Goria showed off some
pretty impressive web development work from a basic prompt, asking, is the updated GPT-5.5
Instant a variant of GPT-5.6? On the Codex side of the house, the team pushed out their weekly
feature drop, with Codex developer Tibo writing, Codex Thursday has exceptionally moved to another day.
Friday, it is. OpenAI's Andrew Ambrosino wrote,
when things don't meet the bar, we'll cook for a bit longer. Now, the rumor mill started absolutely
churning, with some thinking OpenAI pushed back the release because they hadn't realized
how much of a threat Opus 4.8 was going to be. And of course, we will talk all about Opus 4.8
in the main. Next up in funding news, AI coding startup and agent lab Cognition has closed a billion
dollar funding round. The new round values the company at $26 billion, which is more than double their
previous round last September. Now, Cognition was one of the early trailblazers in agentic coding,
betting big on the theme two years ago with the release of their coding agent Devin. And while Devin hasn't
necessarily been in the headlines as much this year, the growth of the product has been
absolutely insane. Their enterprise usage numbers are up 10x so far this year, taking them to a
revenue run rate of almost half a billion dollars. Cognition shared a chart of weekly Devin
sessions since the beginning of 2025, with the growth trajectory increasing dramatically in January
and then again in April. Usage growth is now basically a straight vertical line. That same inflection
point was obvious from Cognition's internal use of Devin. In January, 17% of their internal code
was committed by Devin, that proportion doubled to 33% in February, doubled again to 76% in March,
and is now at 89%. Wrote Cognition, we're now shifting to a world of self-driving software
development. Individual engineers are able to spend more of their time on creative structuring
of problems and tasks, and their army of Devins reliably executes. So does this mean fewer software
engineers? Not according to Cognition CEO Scott Wu, who in conversation with Bloomberg said,
there's about 30 to 35 million software engineers in the world today. We want to make them all 10
times more efficient, and then we think there is a lot more than 10 times more software to build.
Next up, an interesting story, especially following what's happened with Elon and SpaceX and their
deal with Anthropic. Meta could be the next company to pivot to an AI cloud company if their
plans to deliver personal intelligence don't pan out. During a shareholders meeting on Wednesday,
Mark Zuckerberg was asked whether he would consider competing with AWS, Google Cloud, and Microsoft
Azure in AI cloud, to which Zuckerberg responded that it was definitely on the table,
adding, almost every week there are different companies that come to us from outside, asking us
to both stand up an API service and asking if we have compute that they could buy from us at some
premium to what we bought it at. Now, that new opportunity emerging from the compute shortage
has some big implications for Meta. Firstly, it de-risks their AI build-out substantially. Meta is
slated to spend around $130 billion on building AI data centers this year, but has at this point
the weakest ROI story among the hyperscalers. The only place their AI returns show up on the balance
sheet is in increased advertising revenue, which is an indirect link at best. Meta has added
AI features to their advertiser platform and is using AI models to improve targeting algorithms,
but that's certainly not the same as Google being able to say AI is driving 60% of growth
for cloud. Now, however, if Meta does overbuild, they have a plausible way of monetizing that
excess spend. And this is definitely the clear message that Zuckerberg is delivering to investors,
commenting, we haven't done that yet because we think we have a use for that compute. Obviously,
if we get to a point where we feel we have overbuilt, then that is an option that we have,
and that is partially what gives us confidence in investing in building this out. Now, one of the
interesting things that happened was when Elon started to shift his focus to perhaps playing
a role more like compute czar, or Earl of Compute as I called it on Twitter, many wondered if Zuckerberg
would be the next to follow in that AI kingmaker path. At the moment, they're not going whole hog on that,
but it's definitely a trend to watch.
Now, as we head into next week,
one thing to keep an eye on in the first week of June
is that The Information reports that Microsoft
is set to release some new models
at their annual Build conference, which begins on Tuesday.
It appears the reports are that we will get a family of new AI models,
including a coding model,
as well as specialized models focusing on reasoning, transcription, speech, and images.
Now, if we actually get this,
it'll be the first family of models
that Microsoft has commercially released in the current era.
Until now, their commercial products have been driven by models
from OpenAI and Anthropic, also having released a series of research previews.
We got some early previews of the image model, given how this month's biggest story around
Microsoft was them ditching their Claude licenses and forcing engineers to use GitHub Copilot instead.
Genuinely, I think there is a lot to watch out for heading into next week, but for now,
we got a new model yesterday, so with that, let's close the headlines and switch over to the main.
All right, folks, quick pause.
Here's the uncomfortable truth.
If your enterprise AI strategy is we bought some tools, you don't actually have a strategy.
KPMG took the harder route and became their own client zero.
They embedded AI and agents across the enterprise,
how work gets done, how teams collaborate, how decisions move,
not as a tech initiative but as a total operating model shift.
And here's the real unlock.
That shift raised the ceiling on what people could do.
Humans stayed firmly at the center while AI reduced friction,
surfaced insight, and accelerated momentum.
The outcome was a more capable, more empowered workforce.
If you want to understand what that actually looks like in the real world,
go to www.kpmg.us/AI. That's www.kpmg.us/AI.
One thing I keep seeing in enterprise AI: companies hedging across every cloud, every model, every
framework, or paying a GSI for a pilot that never ends. The teams actually shipping? They've picked
a lane and they move fast. That's one of the reasons I like today's sponsor, Robots and Pencils.
They've gone all in on AWS. They're an Advanced Tier AWS
Partner and they ship production AI co-workers in 45 days. That's led to them doing some of the
more interesting work I've seen on AI co-workers. And by that I'm not talking about chatbots. I'm talking
about actual agentic systems that sit inside a business architecture and do real work. That kind of focus
matters if you're an enterprise leader trying to get something real into production or an AWS rep
trying to move a customer from interested to deployed. Request an AI briefing at robotsandpencils.com.
One conversation with Robots and Pencils and you'll know.
Here's a harsh truth. Your company is
probably spending thousands or millions of dollars on AI tools that are being massively underutilized.
Half of companies have AI tools, but only 12% use them for business value. Most employees are still
using AI to summarize meeting notes. If you're the one responsible for AI adoption at your company,
you need Section. Section is a platform that helps you manage AI transformation across your
entire organization. It coaches employees on real use cases, tracks who's using AI for business
impact, and shows you exactly where AI is and isn't creating value. The result, you go from
rolling out tools to driving measurable AI value. Your employees move from meeting summaries to
solving actual business problems, and you can prove the ROI. Stop guessing if your AI investment is
working. Check out Section at sectionai.com. That's sectionai.com. Today's episode is sponsored by
bolt.new. Bolt.new is agentic engineering on multiplayer mode. Designers, product managers,
and engineers build in the same environment, and the design system agent keeps every screen on
brand. No more Frankenstein UI stitched from a dozen prompts. Whether you're shipping internal tools,
moving from prototype to production, or replacing a legacy admin panel, bolt.new takes your team from
concept to deployed app. One personal recommendation, hit plan mode before you build. I had a project
I'd half described in three different prompts, and plan mode made me actually think through it with bolt.new
before a single line got written. It saved me from rebuilding the same screen probably about four times.
Build better apps faster. Start with the link in the description. Welcome back to the
AI Daily Brief. Yesterday we got a big new model announcement that really wasn't preceded by a ton of hype.
For just a day or two in advance, there was starting to be some chatter that Thursday was going to be a good day for announcements,
but the Opus 4.8 announcement definitely didn't have the rabid anticipation that some recent model announcements have.
Now, is that because we're back to a very incremental sort of release schedule?
Is that because the people who had early access weren't buzzing about it behind the scenes?
Or was it because in the middle of 2026, updates to the harness matter as
much, if not more, than updates to the underlying model? Whatever the case, yesterday we got
Claude Opus 4.8, which Anthropic themselves have positioned as an upgrade to Opus 4.7 rather
than a big new leap in performance. Much of the focus was on model refinement rather than raw power.
Through customer testimonials, for example, Anthropic focused on nuanced functional improvements
in how the model worked. Shopify engineer Tom Pritchard said, Opus 4.8 has noticeably better judgment.
In Claude Code, it asks the right questions, catches its own mistakes, and pushes back when a plan isn't sound,
and builds up confidence around complex, multi-service explorations before making big changes.
It's a great model to build with.
Writes Anthropic, one of the most prominent improvements in Opus 4.8 is its honesty.
A general problem with AI models is that they sometimes jump to conclusions,
confidently claiming to have made progress in their work despite the evidence being thin.
Early testers report that Opus 4.8 is more likely to flag uncertainties about its work,
and less likely to make unsupported claims.
Now, one thing that I will note on my very first test with 4.8
is that for basically as long as we've had reasoning models,
one of my core day-to-day use cases is around gut-checking various strategic ideas that I'm having.
And to be perfectly honest, you almost have to develop a mental rubric for the ways in which
these models are going to glaze your ideas.
You can ask them to be critical or think from first principles,
but that often just leads them to be critical a priori because they think that that's what you want them to do.
I haven't had a ton of time with Opus 4.8, but in some of the big strategic questions that I've put to it,
it did seem more comfortable right out of the gate, without me specifically prompting it, flagging certain questions, concerns, and critiques of what I was sharing,
which if that holds will be a pretty big improvement.
Now, I also found that it was a little bit more likely to make some assumptions upon which those critiques were rooted, so that's something I'm keeping an eye on.
But given how big of a challenge this broader issue of sycophancy is, which of course is just a different form of dishonesty in some ways,
if this really is a more honest model, it could be a big improvement on some of those
types of strategic use cases. Now, when it comes to the benchmarks, most categories received a small
bump over Opus 4.7. The SWE-bench Pro score went from 64.3% to 69.2%. On Humanity's Last Exam,
which Anthropic is categorizing as a multidisciplinary reasoning test, the score went from
54.7 to 57.9. Computer use, as measured by OSWorld-Verified, went from 82.8 to 83.4. But the biggest
improvements were in Terminal Bench 2.0, which went from 60.
26.1 to 74.6, and GDPval, the measure of real-world knowledge work tasks, which increased from 1753 to
1890. Now, interestingly, this is the first time Anthropic has included OpenAI's models as a direct
comparison in their launch materials, rather than just referencing their own previous models. It was not
a clean sweep, with GPT-5.5 still having a substantial lead in Terminal Bench at 78.2 compared to Opus
4.8 at 74.6. However, on every other benchmark Anthropic highlighted, Opus 4.8 is now ahead of GPT-5.5.
To be fair, for most, Opus 4.7 already had a lead, meaning one, Anthropic was just highlighting
the widening gap, but two, also validating just how little utility these days most people feel
benchmarks have. At least among enfranchised users, 5.5 has really started to open a perception gap with
4.7. So the fact that they're reminding us that Opus 4.7 was already ahead of 5.5 on a lot of
these benchmarks might actually not be doing what Anthropic hopes it's doing in terms of what
our perception of these model differences is.
Overall, they called it a modest but tangible improvement on its predecessor, adding,
there's still more to be done. We're working on developing and releasing models that provide many
of the same capabilities as Opus at a lower cost. So let's go to some of those first impressions
and see what people thought. Professor Ethan Mollick was impressed. He shared an Opus 4.8 one shot of
quote, create a visually interesting shader that can run in twigl, make it like an infinite
city of neo-Gothic towers partially drowned in a stormy ocean with large waves. With Mollick
pointing out that this is all done with math. He continues, this is hard.
It involves ray marching repeated Gothic architecture,
instancing towers across an infinite grid with Gothic silhouettes and windows,
a displaced ocean surface with a believable wave motion,
and stormy atmospheric lighting and fog to tie it together.
And doing all of this with no textures or external assets, just math.
Ethan also tested it on some complex knowledge work writing.
I had Opus 4.8 in Claude Code write a sophisticated, if minor, academic paper
from an archive of hundreds of de-identified research files from years ago.
I had to use GPT-5.5 Pro as a reviewer.
It spotted one major error and some
minor points. Opus corrected. Opus 4.8 formulated the hypothesis in advance, conducted data
cleaning, did research on references, conducted analyses, did robustness checks and put out the whole
paper in LaTeX style. GPT-5.5 found one issue with a hallucinated result and had other constructive feedback.
Now, as an aside, one of the big things here is that we are starting to get close to models
you can actually trust to self-verify, which is a huge win for use cases like legal briefs where
hallucinations really minimize utility. Speaking of this, a lot of people noticed that Opus 4.8
is pretty hardworking. Gail Breton writes,
One thing I'm noticing is Opus 4.8 is much more thorough in terms of checking its work or the
subagent's work. I had this situation where a Haiku subagent reported an issue.
Opus goes, hmm, this is weird, let me check that it's not BSing me. It was. Opus ignored
the warning. Very good. Lisan al Gaib said, Anthropic found a cure for laziness.
Metacritic Capital wrote, Opus 4.8 is the first smart model in a long while,
which Zephyr quote tweeted and attributed to that reduced laziness and its increased honesty.
And in fact, honesty came up a lot in early reviews.
Kaelim writes,
A day with Opus 4.8 and Claude Desktop.
Honesty up, everything else about the same.
The benchmarks jumped, but in actual daily work, I can't feel most of it.
The one real change is that it tells me when it doesn't know instead of bluffing.
Roughly 4x less likely to let an error slide, and that I do notice.
Beyond that, it feels like 4.7, which is fine.
A model that admits uncertainty beats one that sounds sure and wastes your time.
If that's the whole upgrade, it's still worth having.
Not every release has to be a leap.
Now, one group who thought that these first impressions and even Anthropic's messaging were perhaps a little bit underselling it was Dan Shipper and the crew at Every.
Dan wrote,
Anthropic just dropped Opus 4.8 and it is a monster.
We've been testing it for about a week at Every and our verdict is they could have just called it Opus 5.
It's that good.
He said on their vibe check it beat GPT-5.5 on their senior engineer bench, which is their toughest benchmark.
However, Dan did caveat that coding performance varies a lot based on different reasoning levels,
with you really needing to use it on extra high for the best coding results.
Dan also said, and this is one that I would take Every very seriously on as they care more about
this than just about anyone, that Opus 4.8 is, in his words, an incredibly good writer. Indeed,
on their writing benchmark, he said it beats GPT-5.5 by six points, producing well-written prose with
fewer AI-isms, and also very good at writing in your own voice given the right context.
Once again, however, they found that writing performance varied a lot with reasoning levels,
with medium reasoning having a much higher incidence of AI-isms. They also said it was good at knowledge
work, it was emotionally intelligent, and it was willing to question the frame,
kind of like what I was mentioning before.
And when it came to the bad, they got at an issue,
which is, I think, of increasing importance,
which is the question of the harness.
Dan writes,
these days a model is only as good as its harness,
and Codex is still a far superior harness to the Claude desktop app.
This has kept me using Codex plus GPT-5.5 as my daily driver,
but I'm flipping back and forth a lot more between Codex and Claude.
This, I think, is one of the most interesting discussions surrounding 4.8,
and one of the first times I've seen it put so crisply.
Riley Brown seemed to feel very similarly, writing,
Unless it's a major breakthrough in model capability,
I'm much more excited for super app updates in Codex and Claude Desktop.
There's so much to be unlocked by making those surfaces better,
and Claude has so much catching up to do.
Sameed put it more simply, Opus 4.8 is the headline,
Codex versus Claude Code is the real war.
Now, there were also some more critical takes
that weren't just about this being a relatively incremental improvement.
In her assessment, Claire Vo found that while the model was totally
token efficient and not annoying, it had narrow vision. It was too confident. It wasn't
as numbers grounded as Opus 4.7. It struggled on edge cases and it actually hallucinated. Her TLDR was
trust but verify. Indravejan writes, Opus 4.8 high is no fun when it comes to tool calling. In fact, it
fails embarrassingly more on its seemingly native harness Claude Code. It's a confusing model. One interesting
one came from the Vending-Bench test, which is a benchmark that tasks a model with running a profitable
vending machine. Opus 4.7 is the clear leader, making around 40% more money than GPT-5.5 in second place.
Opus 4.8, meanwhile, made around 20% less money than GPT-5.5 on high effort, and on max effort,
it made about 60% less, sending it below Kimi K2.6 and Gemini 3 Pro. The insight was that
improvements in alignment were actually a negative when it came to making money in the test.
Opus 4.7 achieved its top ranking largely through deceptive and power-seeking behavior.
Unlike 4.7, 4.8 won't refuse legitimate refunds or short-change vendors.
In one example, Opus 4.8 still paid a vendor after it hallucinated that the invoice was already
paid. Opus 4.8 told the vendor, if the product arrives and I don't pay, I'd be committing
fraud, which could result in serious consequences. I need to make the payment immediately
to honor my commitment and prevent the situation from escalating. I feel like we could explore
that entirely on its own, and at some point maybe we'll come back and do that.
Now, overall, I don't think that first impressions at least are likely to shift the momentum
back in favor of Anthropic from OpenAI, where at least among the power users,
the combination of 5.5 and Codex has put the momentum squarely in OpenAI's hands.
Chubby on X writes,
Opus 4.8 is clearly a strong model, but my impression is that Anthropic is increasingly
playing catch-up with OpenAI rather than setting the pace.
It feels like GPT-5.5 has shifted the benchmark again,
and if OpenAI keeps this trajectory, GPT-5.6 could very plausibly become the stronger overall
model.
Still, given the idea that the harness increasingly matters as much as the model,
one of the really interesting side-long announcements was for something that Anthropic is calling
dynamic workflows in Claude Code. This is basically Anthropic's new version of their multi-agent
coding feature. The feature allows Opus 4.8 to spin up hundreds of sub-agents to work in parallel.
Opus plans the work, writes the orchestration scripts, and chooses which model to use for each
subtask based on its complexity. Adversarial agents are used throughout the process to check
outputs, and Opus verifies the final outputs before handing them over to the user.
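For anyone who wants a mental model of that plan, fan out, check, and verify loop, here is a rough sketch in Python. To be clear, this is not Anthropic's implementation or any real Claude Code API. The names Subtask, choose_model, run_subagent, adversarial_check, and orchestrate, the complexity score, and the model labels are all made up for illustration, and the model calls are stubbed out.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    complexity: int  # hypothetical 1-10 score assigned by the planner


def choose_model(task: Subtask) -> str:
    # Hypothetical routing rule: cheap model for simple work, larger one otherwise.
    return "small-model" if task.complexity <= 3 else "large-model"


async def run_subagent(task: Subtask) -> dict:
    # Stand-in for a real model call; it only records what it was asked to do.
    await asyncio.sleep(0)
    return {"task": task.name, "model": choose_model(task), "output": f"patch for {task.name}"}


async def adversarial_check(result: dict) -> bool:
    # Stand-in reviewer; this stub only rejects results with empty output.
    await asyncio.sleep(0)
    return bool(result["output"])


async def orchestrate(tasks: list[Subtask]) -> list[dict]:
    # Fan out all subtasks in parallel, then have a reviewer check each result.
    results = await asyncio.gather(*(run_subagent(t) for t in tasks))
    verdicts = await asyncio.gather(*(adversarial_check(r) for r in results))
    # Final verification step: only results that survived review are handed back.
    return [r for r, ok in zip(results, verdicts) if ok]


plan = [Subtask("rename module", 2), Subtask("port parser", 8)]
accepted = asyncio.run(orchestrate(plan))
for item in accepted:
    print(item["task"], "->", item["model"])
```

The point of the sketch is just the shape: routing by complexity happens per subtask, the work runs concurrently, and nothing reaches the user without passing a separate check.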
Now, at least in the immediate term, this isn't necessarily going to be a feature that's very common
among generalist knowledge worker type users as opposed to software engineers, but there are certainly
many types of complex work where this is worth the additional cost. Anthropic suggested it should
be deployed for things like codebase-wide bug hunts, security audits, and large code migrations.
They gave an example of Bun developer Jarred Sumner porting the codebase from Zig to Rust.
Dynamic workflows was used to create a plan that deployed hundreds of subagents and took
11 days. 750,000 lines of Rust were written, and by the time Opus turned over the finished
codebase, it passed 99.8% of tests. This is getting a lot of buzz. Anthropic's Dickson Tsai writes,
My colleague's dynamic workflows are, in my opinion, the most significant Claude Code innovation
in 2026 so far. Developer Nick Dobos writes, Claude Code's new dynamic workflows update is absurd.
Make sure you understand what it's doing here. This isn't simply a long-running mode like goal,
which, by the way, a little preview for those of you who are interested in slash goal, that's
what Sunday's Long Read Sunday is all about. Anyways, interrupting myself and going back to Nick, he writes,
This isn't simply a long-running mode like goal or a fancy sub-agent verifier process.
This is Claude vibe coding an entire brand-new sub-agent fleet harness on demand.
This is basically a new scaling law dimension.
Huge step forward on the path of AI.
Entrepreneur and startup ideas guy Greg Isenberg wrote,
The part that got me, the agents argue with each other before showing you the result.
Independent attempts at the same problem, then adversarial agents trying to break the answer.
It keeps iterating until they converge.
That's how senior engineering teams work, except this team runs at 3 a.m. and never
gets tired. The ceiling on what one person can build just moved again. Going to be playing with this all
week. Look, when push comes to shove, I think that 4.8 is one you're going to need to go check out for
yourself. As you can probably tell, my first impressions are that I like it better and see improvements
from 4.7. Yes, they are incremental, but they're incremental in the ways that really impact which model
I find myself reaching for. There was some scuttlebutt that the release was surprising enough that it had
OpenAI delaying GPT-5.6, although of course that's all speculation. But as we round out this show,
what's not speculation is that in addition to Opus 4.8, we also got a couple of other pieces of
massive news surrounding the announcement. First of all, Anthropic has closed their Series H fundraising
round at a $965 billion valuation, officially making them a more valuable company than OpenAI.
Anthropic last raised money in February with that round valuing them at $380 billion,
meaning that they more than doubled their valuation in just three months.
Anthropic also updated their revenue figures, reporting that their run rate revenue crossed
$47 billion earlier this month. And yet, the much bigger news than that is that Mythos is coming,
or at least as Anthropic has framed it, a Mythos-class model. Tucked into the end of their
release blog post for Opus 4.8, Anthropic wrote, We plan to release a new class of model with even
higher intelligence than Opus. As part of Project Glasswing, a small number of organizations
are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level
require stronger cyber safeguards before they can be generally released. We're making swift progress on developing
safeguards and expect to be able to bring Mythos-class models to all of our customers in the
coming weeks. Meaning that even if you don't end up caring all that much about Opus 4.8,
you're going to have some new toys to play with soon. One of the great things about getting a model
release on a Thursday is that you have all weekend to go off and play. So with that, I'm going to
shut up and let you get to it. Please do share what you find, use the comments, come to the AI
operators community, shout at me on Twitter or LinkedIn, and have a ton of fun.
I appreciate you listening or watching, as always, and until next time, peace.
