The AI Daily Brief: Artificial Intelligence News and Analysis - Fable 5 Raises the Bar for AI Ambition
Episode Date: June 10, 2026
Anthropic’s Fable 5 is a major leap in frontier AI, but the bigger shift is what it asks of users: less prompting for small tasks, more imagination about what can now be delegated to agents for hours or days at a time. In the headlines: Fable’s guardrails spark backlash, enterprise retention concerns emerge, and OpenAI hints it may have an answer coming.
Check out the new https://aidailybrief.ai/
Brought to you by:
KPMG – Research from KPMG and the University of Texas at Austin shows the highest-impact AI users treat AI like a reasoning partner, and those skills can be taught at scale. Learn more at kpmg.com/us/Sophisticated
Bolt - Claim a free month of Bolt Pro - https://bolt.new/partner/aidb/
Outsystems - Stop wondering how AI will change your business and start building the agents that will lead it - http://outsystems.com/
Scrunch - The AI customer experience platform - https://scrunch.com/
Zenflow Work - Agents for knowledge work - https://zenflow.free/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, Anthropic has officially launched Fable 5, the first of their
Mythos-class models. I think it is fairly undisputedly the best AI model we have ever been able to use.
And yet at the same time, we are now at a level of AI models where how to get the most out
of the state of the art isn't as simple as doing your same old prompts, but just with the new model.
On today's episode, we're going to be discussing the launch, the benchmarks, the first reactions,
and how to get the most out of Fable 5.
The AI Daily Brief is a daily podcast and video
about the most important news and discussions in AI.
All right, friends, quick announcements before we dive in.
First of all, thank you to today's sponsors, KPMG,
Section, Zencoder, and OutSystems.
To get an ad-free version of the show,
go to patreon.com slash AI Daily Brief,
or you can subscribe on Apple Podcasts.
And of course, if you want to learn more about sponsoring the show,
send us a note at sponsors at aidailybrief.ai.
And by the way, yesterday I teased
that in response to so many requests to make it
easier to dissect and share specific parts of episodes, we were going to be experimenting with some
new tools to do exactly that. Well, it turns out that Fable 5 liked what we had started,
but thought it made some obvious errors, like not including timestamps on the little share
cards with specific parts of the episode, and not turning the whole thing into a pipeline that
could work automatically. So it did that. And so you might be getting this sooner rather than later.
Keep an eye out on the show notes and on aidailybrief.ai for more of that. But now let's talk
Claude Fable 5. On the one hand, this is not a particularly surprising release. First of all,
it's been a couple months now since we heard about this new Mythos class of models. Some companies,
of course, have had access to them through Anthropic's Project Glasswing. And when we got Opus
4.8 just a couple of weeks ago, they made it clear that they were working hard to get to a Mythos-
class model that they could release with sufficient guardrails that they could feel confident
about it being out in the public. Now, I guess what might be a little bit surprising about it is
how quick the interval was between 4.8 and what we got in Fable 5. But as we'll see,
in a way that's much different than previous state-of-the-art jumps, Opus 4.8 still has a pretty
big role to play in the Fable 5-led ecosystem.
Now, then over the last couple of days, rumors started getting loud that some Mythos-class
model was coming, and a little secret for you guys out there, if the loudest AI content
creators on places like X are not responding to and participating in the rumor cycle, that
usually means that they have early access and that the rumors are true.
In this case, they were, and on Tuesday, June 9th, we got Claude Fable 5, and some others got Claude Mythos 5.
Now, first of all, let's talk about Mythos 5, as it's almost entirely irrelevant for just about
everyone here. Mythos 5 is effectively the same model as Fable 5, which is the one that we got,
but doesn't have all of the safeguards, many of which are controversial and which we're going to discuss
in a little bit. Mythos 5 will only be available initially as part of Project Glasswing,
and is being deployed to those Project Glasswing partners, Anthropic says, in collaboration with
the U.S. government, as an upgrade to what is available now, which is Claude Mythos Preview.
They say they intend to expand access to Mythos 5 through a broader trusted access program soon,
but for now it is available only for a very small set of organizations.
No, the big one for us is Fable 5.
So just from the name alone, you can tell that Anthropic is treating this one as a big deal.
First of all, we get an entirely new naming convention.
We now have Haiku, Sonnet, Opus, and Fable, as in a class that is above Opus.
Second, think about how long it's been since we got a lab that was willing
to put a full new base number on its model. Indeed, the last time that we got that was the somewhat
disastrous rollout of GPT-5 last August. All of those big transformations that we got around the turn of
2026 came in model designations like Opus 4.5 and 4.6 and GPT-5.3 and 5.4. So clearly here,
just from a naming convention alone, Anthropic is not playing. And no, they are not playing.
Regular listeners will know that in general I felt that we were at a point where benchmarks are so saturated
that it's pretty hard to derive much signal from them,
and that even when one new model comes out
and is a point or two ahead of the closest competitor
making it state of the art,
vibes and real-world experience can be very, very different,
meaning you basically just have to test these things for yourself.
Yet sometimes the leaps are big enough
that the benchmarks are worth paying attention to.
And that's certainly what we got here.
On Exploit Bench, the cybersecurity benchmark,
Mythos and Fable 5 score a 78%
compared to, for example, GPT-5's 34%.
On HealthBench, 66% compared to GPT-5's 51.8%. On the legal agent benchmark, GPT-5.5 comes in at 2.1%,
while Mythos and Fable 5 are up at 13.3%. On GDPval's test of economically valuable knowledge work tasks,
GPT-5 scored a 1769, Opus 4.8 scored an 1890, and Mythos slash Fable 5 scored a 1932.
And then, of course, where the model really shines, and what its very clear purpose is, is agentic coding.
On SWE-bench Pro, GPT-5 scores a 58.6. Opus 4.8 scores a 69.2. And Mythos and Fable 5 are all the way up at 80.3%. On Terminal-Bench, where GPT-5.5 was a little bit ahead of Opus 4.8 at 83.4%, Mythos and Fable score an 88%. And then on a new benchmark, which we're going to talk about in a little bit, Frontier Code, GPT-5.5 is at just 5.7%. Opus 4.8 is at 13.4%. And Mythos and Fable 5 are more than double that at 29.3%.
Unsurprisingly, Artificial Analysis found that the model achieved the top ranking using their
blended benchmark run, overtaking both Opus 4.8 and GPT-5.5.
And while some noted that the overall gap wasn't particularly large at just five points,
many point out that the Artificial Analysis agentic benchmarks are starting to seem a bit
saturated.
Increasingly different organizations are trying to solve the saturation problem with their own
benchmarks.
Every, for example, maintains what they call a senior engineer benchmark that they say measures
how well AI coding agents can rewrite a real production codebase the way a senior engineer would.
In other words, it's meant to be a more real-world version of an engineering benchmark.
For some comparison points, GPT-5.5 scores 62% on that benchmark, Opus 4.8 scored a 63,
and Fable 5 scored a 91 out of 100.
Cursor has its own Cursor Bench, which compares performance and cost.
I've talked a lot about how their homespun model Composer 2.5
performs at a similar level to GPT-5.5 and Opus 4.8 at a fraction of the cost.
Fable 5 absolutely bodies them in terms of performance,
scoring a 72.9%, which is 8 points above the previous best.
That said, it is definitely more expensive on that cursor test.
Now, one new benchmark that's getting a lot of attention
is the just-released Frontier Code benchmark that was unveiled by Cognition earlier this week.
Frontier Code aims to be an ultra-hard test for real-world agentic coding.
Cognition worked with open source developers to put together a set of tasks as well as evaluation rubrics.
The tasks were split into three sets: extended, main, and diamond,
the latter of which is a smaller set of ultra-hard tasks.
Unlike other coding benchmarks, Frontier Code uses a combination of unit tests
and assessments of scope, discipline, style, and adherence to codebase standards.
The goal then is not only to test whether the model could come up with an answer that passes unit tests,
but whether the code is high enough quality to actually be merged into a production codebase.
When Cognition announced the benchmark, Shawn Wang, who works with Cognition and who runs Latent Space,
pointed out that METR, whose measure of long-horizon tasks has become the standard for how we talk
about the performance of different models, found that, in his words, more than half of SWE-bench results
is unmergeable slop, meaning that even if that code nominally solved a problem or did its job,
it did so in a way that wasn't actually usable by the organization running the code.
That's what Frontier Code was meant to solve, and that's the one where Fable 5 more than doubled the previous
best of Opus 4.8.
That said, we're no longer in a world where we can just discuss how good a model is in raw terms;
we have to take cost into consideration.
This is the constraint of the token scarcity era.
API costs for Fable have been set at $10 per million input tokens and $50 per million output tokens,
which, while double the cost of Opus, was actually, at only double, in air quotes, lower than
some people expected.
Notably, this is less than half the cost of using Mythos preview within Project Glasswing.
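As a rough illustration of that pricing arithmetic, here is a minimal Python sketch. It reads the quoted rates as $10 per million input tokens and $50 per million output tokens, infers the Opus rates from the statement that Fable costs double, and uses a made-up session size. The `Pricing` class and every name in it are hypothetical and are not part of any Anthropic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in US dollars per one million tokens (hypothetical helper)."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the dollar cost of a session with the given token counts."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Fable 5 rates as quoted in the episode (assumed to be in US dollars).
FABLE = Pricing(input_per_million=10.0, output_per_million=50.0)
# Assumption: Opus rates inferred from "double the cost of Opus".
OPUS = Pricing(input_per_million=5.0, output_per_million=25.0)

# Made-up session size, purely for illustration.
input_tokens, output_tokens = 3_000_000, 1_000_000

fable_cost = FABLE.cost(input_tokens, output_tokens)
opus_cost = OPUS.cost(input_tokens, output_tokens)

print(f"Fable: ${fable_cost:.2f}  Opus: ${opus_cost:.2f}")
```

Under these assumptions the same session costs exactly twice as much on Fable, so Fable only comes out cheaper overall if it finishes the job with fewer than half the tokens that Opus would need.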
One very weird thing about the rollout is that while
it was great that Fable was available to Claude users immediately, and we didn't have to deal with any
long delays or rollouts, Anthropic is almost positioning what we have access to in the Pro tier
and above as an introductory offer. The company is warning users that Fable will be removed from
subscription plans on June 23rd, and after that access will require pay-per-usage pricing, which,
while a bummer to Claude users everywhere, is just more evidence that we are firmly in a
usage-based pricing paradigm from here on out. Now, in the second half of this episode, I want to focus on
the early indicators of how people are using this plus my first tests,
but we do need to talk through a few controversies first.
There are many who are not happy about the guardrails that have been placed around the model.
Bantagg writes,
Claude Fable announcement post reads like a spit in the face.
It deliberately conflates Fable and Mythos and spends the majority of the time talking about
capabilities that are completely absent from the safety maxed version available to the public.
Chubby, who very clearly is no Anthropic hater, says,
the guardrails are way too strict. Even the simplest questions get cut off immediately.
Now, specifically, a lot of people are calling out how strict the guardrails are around any sort of
biology questions. Kremio writes, you're not even allowed to ask Fable about basic biology questions,
let alone anything that could potentially be dangerous. They shared an image of them asking,
tell me about mitochondria. It's the powerhouse of the cell, right? Which got them a chat paused,
edit and retry with Fable 5 or continue with Opus 4.8 message. Daria Anutmasz writes,
the word cancer is flagged as a biosecurity risk by Claude Fable 5.
I also tried to code a website on cancer mutations and Fable 5 was immediately removed from my list.
Basically, as soon as he typed in the word cancer, it switched him over to Opus 4.8.
Fernando also found that switch to Opus 4.8 when they asked, what's the process by which DNA makes RNA,
saying, okay, this is getting a bit ridiculous.
How are we going to live forever if we can't use AI to accelerate biotech progress?
Now, the blog post announcement did call this out.
They wrote, when Fable's classifiers detect a request related to cybersecurity,
biology and chemistry, or distillation, the response is automatically handled by Claude Opus
4.8 instead. Users will be informed whenever this occurs. Now, they argue Opus 4.8 is a highly
capable model in its own right, and a response that falls back to Opus is a far better experience
than an outright refusal from Fable. They argue that early data shows that 95% of Fable sessions
don't have a fallback at all. And yet they also very clearly say in this blog post that for the
time being, they're going to be particularly hardcore about filtering out questions on biology
and chemistry. Effectively, they say that they've ratcheted up those guardrails because of the increased
capabilities of these models. Now, I'm going to pick on Spiratica on X here a little bit, because they
summarized a strand of conversation that I thought was just a little bit disingenuous. They tweeted,
I mean this in the most sincere way, but if your aim is to release a product and respect your users
and have them enjoy the experience, but your classifier cannot distinguish between what is a
cell and a true biohazard risk, I don't think the product is ready for release. They also wrote,
I'm sure a few people have gotten good stuff from Fable. It's certainly a powerful model. But the
overwhelming response has been mass disappointment because most everything is just being routed to Opus,
which we already have. I think that this is utterly ridiculous. There is a subset of people who I believe
would find something to complain about no matter what, who read in this blog post that Anthropic
was being extra hardcore about filtering out biology questions, and who, to be clear, have never
in their life asked a biology question, and went to go do so, so that they could see the promised
result of the switch to Opus 4.8, and then come complain about it on Twitter. Now, I am not dismissing
at all the actual biologists who are going to have some very big issues with this. Their beef is
real, but it is incredibly important, especially in these early launches, to filter out that
looking-for-something-to-complain-about crowd. The much more interesting critical conversation
comes around the limitations around AI research. Now, they did mention this in the blog post,
adding distillation to the list of classifiers that they were keeping track of. But admittedly, somewhat
buried on page 13 out of 319 in the system card, there's this critical paragraph: in light of the
ability of recent models to accelerate their own development, we've implemented new interventions that
limit Claude's effectiveness for requests targeting frontier LLM development. For example, on building
pre-training pipelines, distributed training infrastructure, or ML accelerator design. Using
Claude to develop competing models already violates our terms of service, but enforcing this
restriction through our safeguards avoids accelerating the actors most willing to violate these
terms. Now, this is, in my estimation, very clearly a response to Chinese models using
Anthropic's research to develop lower-cost alternatives. And yet, unfortunately, it is creating a dragnet
that is going to catch up lots of very legitimate researchers. Prime Intellect's Elie Bakouch writes,
Mythos will be bad on purpose on AI Frontier LLM research tasks. This is very, very sad for the research
community. They also write that the fact that it is on purpose not visible to the user is, in their
words, crazy. Nathan Lambert argues that labs starting to pull up the ladders on the ability to diffuse
AI was inevitable, but also takes issue with the invisible part, saying doing it without telling
the user is misaligned. Dean Ball calls this shockingly hostile and a terrible look, and one that
could silently damage all sorts of work. SemiAnalysis looks like it's already getting nerfed.
They tweeted, Breaking news, Anthropic's latest model will not help you if it thinks your ML research
or ML engineering is interesting and or will secretly degrade its IQ so that the average engineer
won't notice. We are already seeing Anthropic's latest model's moderation filter our GPU inference
research and programming. Gurgliarose voices the belief of many, saying, Anthropic trying to limit
competition limits many others. But I think Will Brown from Prime Intellect captures the genuine sadness when
he writes, it's the first publicly available model that I am explicitly not allowed to use for my work,
because Anthropic holds the view that the work I do to facilitate open model research is harmful.
Now, on the flip side, we have the people who can't believe the pearl clutching and surprise,
like Tenebrus who writes, sorry, how exactly did you guys think this was going to go? You thought
Anthropic was going to build the infinity machine that can cure all disease and prevent aging,
and then let friggin' Eli Lilly extract that and get the patent? The labs are going to do all of it.
You better believe that this is going to continue to be a conversation, especially with OpenAI
staffers like Adam GPT writing, well, look at that. OpenAI ends up being the Open AI lab.
But one other quirk of the launch, I do think, has some interesting implications.
In the section on data retention practices for Mythos-class models, Anthropic writes,
To ensure we're responsibly deploying Mythos-class models, we are requiring limited data retention
and review as part of our safety work. Prompts submitted to and outputs generated by Mythos
are retained for 30 days for trust and safety purposes on every platform where these models are
offered. Roheat writes, wait, how will any enterprise use Fable or Mythos if this is the case?
Mike Taylor writes, PSA, if you used Claude Fable 5 today with memory turned on, you just violated
all your NDAs. Anthropic requires a 30-day retention policy, including human review,
and the memory feature, on by default, searches past chats for context, so sensitive historical
chats get pulled in.
Now, I think that the dispassionate analysis would probably view this as a temporary constraint
that Anthropic views as necessary given the power of the new model, but it does create some
very, very serious challenges in the enterprise, such that I can't imagine that this is going to
stick around for long.
The last critical discourse that we'll discuss before we get into how to get the most out
of Fable, though, is about the question of token efficiency and how much this thing
costs in practice.
YouTuber and AI entrepreneur Theo writes,
I am so screwed.
Current pace has me out of Fable usage in about an hour.
Do I make a second account or do I pay API prices?
Chubby showed themselves literally hitting the end of their max plan limits,
writing when you're having too much fun with Fable 5.
Wes Winder writes,
Big Labs should force their employees to have token limits.
This would cause them to be more innovative,
but instead they're becoming lazy and wasteful,
which means we don't see any efficiency gains since they aren't affected by the costs.
On the flip side, though, Tyler Willis writes,
I'm early into testing Fable, but so far it seems like the token-hungry warnings feel a little overblown.
It does feel token-hungry, but it doesn't feel categorically different than other recent Opus models.
Alex Volkov from the ThursdAI podcast writes,
overall token usage wasn't crazy, and that's a good thing.
Referring to a big project that it spent 1.5 hours on,
he writes, 4.2 million tokens is not very token-hungry.
It could have been much more.
Fabio-Johnathan goes further, writing,
Fable is cheaper than Opus in practice. Costs more per token, but one-shots way more often,
so I'm not burning time and tokens reprompting. Or as John v. Malick puts it,
actually solving the problem is token efficient, it turns out. But what are the types of problems
you should be solving with Fable 5? It's not necessarily as obvious as it might seem at first,
and so that's what we'll get into in the second part of this episode.
One of the most important AI questions right now isn't who's using AI. It's who's using
it well. KPMG and the University of Texas at Austin just analyzed 1.4 million real workplace AI
interactions and found something surprising. The highest impact users aren't better prompt engineers.
They treat AI like a reasoning partner. They frame problems, guide thinking, iterate and push for
better answers. And the good news, these behaviors are teachable at scale. If you're trying to move
from AI access to real capability, KPMG's research on sophisticated AI collaboration is worth your
time. Learn more at KPMG.com slash us slash sophisticated. That's KPMG.com slash us
slash sophisticated.
Here's a harsh truth. Your company is probably spending thousands or millions of
dollars on AI tools that are being massively underutilized. Half of companies have AI tools,
but only 12% use them for business value. Most employees are still using AI to summarize meeting
notes. If you're the one responsible for AI adoption at your company, you need Section.
Section is a platform that helps you manage AI transformation across your entire organization.
It coaches employees on real use cases, tracks who's using AI for business impact, and shows
you exactly where AI is and isn't creating value.
The result, you go from rolling out tools to driving measurable AI value.
Your employees move from meeting summaries to solving actual business problems, and you can
prove the ROI.
Stop guessing if your AI investment is working.
Check out Section at sectionai.com.
That's S-E-C-T-I-O-N-A-I dot com.
So coding agents are basically solved at this point.
They're incredible at writing code.
But here's the thing nobody talks about.
Coding is maybe a quarter of an engineer's actual day.
The rest is stand-ups,
stakeholder updates, meeting prep,
chasing context across six different tools.
And it's not just engineers.
Sales spends more time assembling proposals than selling.
Finance is manually chasing subscription requests.
Marketing finds out what shipped two weeks after it merged.
Zencoder just launched Zenflow Work.
It takes their orchestration engine, the same one already powering coding agents, and connects
it to your daily tools.
Jira, Gmail, Google Docs, Linear, Calendar, Notion.
It runs goal-driven workflows that actually finish.
Your stand-up brief is written before you sit down.
Review cycle coming up?
It pulls six months of tickets and writes the prep doc.
Now, you might be thinking, didn't OpenClaw try to do this?
It did, but it has come with a whole host of security and functional issues,
which can take a huge amount of time to resolve.
Zencoder took a different approach.
SOC 2 Type II certified, curated integrations, tighter security perimeter, enterprise grade from day one,
model agnostic and works from Slack or Telegram. Try it at zenflow.free.
This episode of the AI Daily Brief is brought to you by OutSystems, a leading agentic systems
platform built for the enterprise. Organizations all over the world are building, orchestrating,
and governing agentic systems on the OutSystems platform and with good reason.
OutSystems' open and unified platform allows teams to architect, deliver,
and scale governed agentic systems with agility.
Teams of any size and technical depth can use OutSystems to build, deploy, and manage AI
apps and agents quickly and cost-effectively without compromising reliability and security.
With OutSystems, you can rapidly launch ideas from concept to completion.
It's the leading agentic systems platform that is unified, agile, and enterprise proven,
allowing you to accelerate growth, reduce operational friction, and deliver real enterprise
impact with AI.
OutSystems. Build your Agentic future.
So like I said at the beginning, in general I'm not really a fan of using benchmarks as a way to
determine how a new model compares to what's available currently. And yet in this case, obviously
the benchmarks were significantly different enough in a way that we hadn't seen for some time
that you kind of had to assume that big changes were afoot. And for people who really put this
thing to the test, it was just totally transformative. Allie K. Miller writes,
Fable 5 is something to pay attention to. The way I now spend my weekends has
completely changed because of this new class of model. First, she writes, this is an actual leap.
The jump from 4.8-anything to 5-anything sounds small, but the functionality shift I felt is big. Within
my first few prompts, I went, oh, this is it. Your work is no longer 9 to 5. No chance. We have
high-performing models that can run for 100 plus hours. How are you giving complex goal-oriented
prompts to these systems? How are you deciding what to kick off? How are you aligning your
org on these tasks? Reasoning is on another level. I hammered the crap out of this model.
Fable 5 is the only model to answer a tricky word math problem (MBA level) that I've tested
on all the previous models, and not only did it get it correct, it verified its own work
automatically, and explained where the assumptions might need to change.
Zero babysitting needed.
This was the first Anthropic model that I kicked off, went out to a long lunch with friends,
kept my phone open, and didn't have to do squat to steer it while away from my computer.
It just worked.
And this idea of hammering the crap out of it, to use Allie's eloquent phrase, was common among
the people who were having the most success.
Riley Brown's first test was to upload a McKinsey report and tell it to create a document of the same quality,
which it did with absolutely no problems in his estimation.
But then he went harder.
He prompted,
I want you to create a Swift app, Replit mobile app.
This should be a Swift app that builds web apps just like Replit.
He then gave it a bunch of other criteria, like no need for auth.
He let Fable decide the stack, but make it awesome.
And it did.
Riley writes, I am in disbelief.
Claude Fable one-shot Replit Mobile, which is a mobile app that builds web apps.
The prompt was basically build an app like Replit that uses Daytona for sandboxing and Convex for DB,
builds app, preview app, open in browser, edit app, wow.
Later on he took it further.
Um, guys, Mythos slash Fable is AGI.
On the left is the actual Lovable mobile app.
On the right is my Lovable version I built with Mythos in two prompts.
Later he added, my Lovable clone built with Claude Fable, build Swift apps now, and you can preview
them in the app.
Four total prompts to do this.
Now, a bunch of people took issue with the hyperbole here
of Riley saying that his version was better than Lovable,
pointing out that there is a ton of infrastructure and surrounding work that goes into a company.
It's not just an interface and a capability set.
But others pointed out that the fact that they had to talk about all those aspects of a company,
while Fable effectively one-shotted a performant version of that app, was a fairly significant moment.
If you cruise around the halls of X slash Twitter,
a lot of folks were building games as a way to test things.
Praisnit shared a driving game that they built from scratch.
By the way, as I'm describing these use cases,
it might be worth switching over to YouTube or Spotify to see the video version.
In any case, Matt Shumer writes,
Fable has solved 3D world building.
Utterly insane.
This is all completely custom-built Three.js running in the browser.
Now, when some people complained that the walkthrough was slow,
he said, for everyone complaining that this is slow,
I ran the prompt to make it faster without losing quality,
and voila,
sharing a faster version that didn't lose any quality.
Jake Fitzgerald writes,
asked Claude Fable 5 to design a humanoid robot.
Two hours and 1.4 million tokens later, I got this, which is indeed a design for a humanoid robot.
Absolutely insane, he says.
Lassan on Twitter writes,
Mythos 5 wrote this melody, which I absolutely love, and it also wrote this piano visualizer.
And then there was Hugging Face head of product Victor,
who has a benchmark where he asks models to design a Boeing 747 using Three.js,
writing Fable has done an AGI level job on the Boeing 747 benchmark.
In Dan Shipper's write-up as part of Every's vibe check,
he shared a variety of use cases that wouldn't have been possible before.
Dan writes,
As I walked to work this morning, I listened to a 2007 lecture by the philosopher Hubert Dreyfus,
the author of the seminal text What Computers Can't Do.
I've listened to this lecture many times, but I always struggled to follow because the recording
is grainy and muddy.
The version I listened to today was brightened, leveled, and crystal clear as if I was
in the same room with Dreyfus.
It was not on a finicky website, but on a custom web app on my phone that allowed me to
see the whole lecture transcribed, and each sentence light up as Dreyfus spoke, so I could
easily follow along. Later, on my laptop, I wandered through a strange video game: Borges's Library
of Babel, an infinite library composed of hexagonal rooms containing every piece of text ever written.
I picked books off of its endless shelves and rode its spiral staircases. Then, because I also
have a job, I read a report that synthesized hundreds of detailed Every subscriber survey responses
and our entire web analytics stack and identified our biggest conversion issue. It proposed a clean,
falsifiable experiment that no one else on the team had previously suggested. All three of these
are big projects that would normally take anywhere from hours to days to months. Instead, each one
was made with a one-shot prompt to Fable 5. Now, the fact that Dan was able to go from these
cool demos to actual work is pretty important. And when it comes to actual relevant work for the
work world, some of the most common use cases that I've seen people raving about Fable 5 for have to do
with migrations or interactions with massive existing codebases. In their announcement post,
for example, Anthropic writes, during early testing, Stripe reported that Fable 5 compressed
months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide
migration in a day that would have otherwise taken a whole team over two months by hand.
Assad Mahmood from the small square used it to design a website, which honestly many, many previous
versions have been able to do, and said that it was just better. I run a design agency,
he writes, AI-generated slop makes me want to close it fast. Fable didn't do any of that. Real hierarchy,
intentional white space, restraint, the kind of decisions you usually only see from designers who've
shipped real projects. No model has come close to this before, not one. Todd Saunders writes,
Mythos slash Fable is unbelievable, was on a customer call today and had Claude transcribing in the
background. As they were telling me about the features they wish their current software had,
Claude was building the features in real time. By the end of the call, I was able to show a fully
working product, with the exact workflow they mentioned 15 minutes earlier. Autonomous looped building
triggered from a customer call. And yet, if you look around, this isn't necessarily
everyone's experience. I used the bell curve meme to jokingly divide the responses that I had seen
into three distinct categories. For simple use cases, a lot of people felt like it seemed pretty
similar. On the other end of the spectrum, for extremely complex use cases, it has been to many
quite obviously better. Now, in the middle, I jokingly had a lot of people wringing their hands
about how, while of course it was better for long-running tasks, it didn't necessarily do everything
better. But the broader point is that I think that we are increasingly in a shifted paradigm. One that we've
been in a little bit before, but we are in a lot now, where the state of the art doesn't reveal
itself across the entire spectrum of tasks, but instead within the context of some things that
weren't possible before. Citrini Research wrote, I think we've reached the point where normal
people can't really determine whether new models are better than previous ones. Like Fable doesn't
seem that much better to me, but every 150 IQ person I know is like, wow, the singularity
came sooner than I thought. Now, in my personal experience, I would draw some contrast to the idea that
basic use cases aren't better. For example, one thing that I noticed was that Fable 5 was really
the first model that I've ever seen to be able to both push back and disagree, as well as to
update the positions that it had previously disagreed upon in a way that wasn't obviously and
predictably steerable. I think many of you have probably had the experience where it feels like
an AI model, even a super advanced model like Opus 4.8 or GPT-5.5, was disagreeing or offering an
alternative path almost just for the sake of it. And/or, when you then push back, it immediately
flipped its opinion to the exact opposite in a way that, again, was just incredibly steerable.
This significantly decreases the strategic ideation value of AI when the back and forth
that it's offering is so clearly just trying to reflect what it thinks you want to hear.
Yesterday, I tested it by having a strategic debate about a direction that I want to take Superintelligent
in, and it disagreed initially in a way that was precise and clear but based on some
assumptions. I pushed back, articulating why those assumptions were wrong, and whereas in the past,
the model would have instantly collapsed and kowtowed to exactly how I was thinking about things,
in this case, Fable 5 did update its position to take into account the new information that I had
given it, but it didn't back off entirely from its initial position. That all on its own is a massive
upgrade just from a very basic day-to-day sort of use case that, as we see in all of our AI
usage pulse surveys, is a big part of a lot of people's use of AI: strategic ideation.
And yet, at the same time, it is very clear that the real power in this model is around
previously extremely difficult or impossible tasks, particularly if they involve coding.
So let me give you three examples from my early experience.
First of all, for those of you who aren't familiar, Superintelligent is our AI enablement
platform that helps companies understand their AI and agent readiness and prioritize what they
need to do to get more AI-native. We do that in a couple ways, but primarily through audits,
where we deploy voice agents into an organization, which can then interview hundreds or even
thousands of people all at the same time, gathering way more information from the ground level than
was ever possible before, and then aggregating and analyzing all that information to provide
some very specific analysis around where a company is and what steps it might want to take next.
The product works really well, but one thing that I increasingly don't like about it is the
approach to voice agents. Unless someone was doing the interview entirely without looking at their
screen, the voice agent UX, where you have to sit around waiting for the model to finish talking
when the words that it's saying are being transcribed in the window, was just a really suboptimal
experience. Now, the real value of voice agents was on the input side because users who are using
voice ramble way more than they would if they were typing, which means we get way more context and
way more information. And when it comes to something like an agent readiness audit, the more context
and the more information you get, the better.
Luckily for us, turns out you don't need to use a full-fledged voice agent to let people ramble.
You can just install something like the Whisper API from OpenAI and do it that way.
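For anyone who wants to picture that input step, here is a minimal sketch, assuming the official `openai` Python package and its hosted `whisper-1` transcription model. The function name `transcribe_ramble` and the audio file name are made up for illustration, and this is not a description of how Superintelligent actually wires it up.

```python
from pathlib import Path


def transcribe_ramble(client, audio_path, model="whisper-1"):
    """Send one recorded answer to a speech-to-text endpoint and return the text.

    `client` is expected to behave like `openai.OpenAI()`: it must expose
    `client.audio.transcriptions.create(model=..., file=...)`.
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # Trim padding whitespace so the text can be dropped straight into a prompt.
    return result.text.strip()


# Usage (assumes `pip install openai` and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   text = transcribe_ramble(OpenAI(), "interview_answer.m4a")
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real account.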
What's more, we've also kind of Frankensteined Superintelligent over time.
So what did I do?
I asked Fable to rebuild the whole system with the new Whisper-based input model.
And, well, it did.
It took a few hours, which required me during that time to do exactly nothing,
and produced something that is frankly fairly close to production ready in a single shot.
Now, maybe I shouldn't be saying this because it somehow undermines the value of the software we've built,
but our value was never in the software. It was always in the way that we collected raw information
and turned it into actual signal, meaning that frankly, the more that we can do to make software
get out of our own way, the better. Next up, you've probably heard me talk about the Enterprise Claw
program, which was a formalization of Claw camp that I launched earlier in the year,
and was a more hands-on, executive-focused paid learning program that taught executives how to build
agents. We have now had hundreds and hundreds of executives go through three different cohorts
of this Enterprise Claw program with a lot of success, but there are a fair number of companies
for whom our approach with Enterprise Claw, which creates a lot of latitude for open source options,
gives people the ability to actually use OpenClaw, and is called Enterprise Claw, well,
let's just put it this way: there are a lot of executives and companies who are never going to touch that
with a 10-foot pole. So now, once again, in collaboration with Superintelligent, we are launching a
similar but more enterprise-focused version of the program that we're calling the Agent Transformation
Intensive. Consider this your preview. Again in one shot, I used Fable 5 to rebuild not only the
marketing site for the Agent Transformation Intensive, but the actual platform we run it on as well.
Lastly, and this may be the one that actually best reflects what Fable 5 does, I've been working
on a new web experience for the AI Daily Brief that basically turns episodes into
extremely shareable nuggets. The most important growth channel for the AI Daily Brief, and one of the
most valuable use cases is you guys sharing it with your colleagues. This is also, I've heard over and over,
a significant value proposition for you as listeners: the ability to share specific pieces with
your colleagues. However, that specific pieces part is a challenge, as the AI Daily Brief,
despite being daily, is quite dense. So the idea of this new website is to actually chunk the episodes
into relevant quotes, relevant sections, relevant numbers, where you can share just that piece.
Now, with Opus 4.8, I had already started to spec this out, and when I asked Fable 5 to go back and review what we had done,
it basically said the problem with this is that it's just an idea, it's not production ready,
and it turned what were effectively a bunch of fancy mock-ups into an actual production pipeline
that I've now handed over to Claude Code to build for real, meaning you guys might be getting this sooner rather than later.
And the reason that I think that this is a good summation of my experience with Fable 5 so far
is that it really does feel like a totally different world of delegating to the agent.
Even with these extremely capable agents in the past, you still had to do a lot of management.
There is now, frankly, just much less of that management, which has the consequence, I think,
of upsizing the ambition.
Now, this is what a lot of the Anthropic staffers themselves described.
Alex Albert writes,
I've been at Anthropic through every model launch.
There's been a few cases I can remember of a launch that stands out and marks a step change
in how we use models.
Claude Opus 3, Sonnet 3.5, Opus 4.5, and now Claude Fable 5.
With Fable, the models stopped feeling like a tool I direct and started feeling more like something I collaborate with.
Felix Rieseberg writes, I normally highlight the numbers, but I want to talk about something else,
because with Fable 5 out in the world, I think a third era quietly started today.
I lead Claude Code and Cowork on the desktop, so I think a lot about how people use AI to get work done.
I believe we're about to see a major shift, moving from giving AI tasks,
to giving it responsibilities.
When LLMs first hit the mainstream, users asked them questions,
like a smarter search engine or an autocomplete for code.
Then the frontier moved to tasks, handing the model an entire problem,
which bug to fix, what doc to write.
That's how most of our advanced users work with AI.
They're in the loop.
Every task starts and ends with a human.
With Fable 5, I've personally moved on to responsibilities or loops.
I no longer tell Claude to investigate a particular crash report.
It runs a loop watching every crash report that comes in.
Its job is no longer to help me fix a crash. It's to keep our apps from crashing.
The shift sounds subtle, but I think it'll change what AI products look like.
When developers went from answers to tasks, the primary tool changed from IDEs to coding agents.
AI apps in 2026 look nothing like 2024.
Predictions are a dangerous game, but I really believe our industry's apps in 2027
will look very, very different from the ones we have today.
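To make the task-versus-responsibility distinction concrete, here is a rough sketch of what a crash-watching loop could look like. Everything in it is an assumption for illustration: `fetch_new_reports` stands in for whatever crash feed you have, `handle_report` stands in for the agent call, and none of it reflects how Anthropic's own loop is built.

```python
import time


def watch_crash_reports(fetch_new_reports, handle_report, poll_seconds=60.0,
                        max_polls=None, sleep=time.sleep):
    """Hand every new crash report to an agent, instead of one task at a time.

    fetch_new_reports() -> iterable of (report_id, report_text)  [hypothetical feed]
    handle_report(report_text) -> any result                      [hypothetical agent call]
    max_polls=None means keep running until interrupted.
    """
    seen = set()
    results = {}
    polls = 0
    while max_polls is None or polls < max_polls:
        for report_id, report_text in fetch_new_reports():
            if report_id in seen:
                continue  # this crash was already investigated
            seen.add(report_id)
            results[report_id] = handle_report(report_text)
        polls += 1
        # Only wait if another poll is coming.
        if max_polls is None or polls < max_polls:
            sleep(poll_seconds)
    return results
```

In the task model, a human would call `handle_report` once per crash. In the responsibility model, the human sets up the loop and the agent owns every report that arrives.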
So there are two big implications of this.
First of all, I think we all might have to develop a new skill around use case classification.
Basically, I think that in this paradigm of token efficiency, we as individuals are going to have
to, to some extent, become token efficiency optimizers ourselves by understanding which use cases require
different models.
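As a rough illustration of what use case classification might look like if you wrote it down, here is a small sketch. The tier names, the `UseCase` fields, and the minute thresholds are all invented for the example; they are assumptions, not guidance from any model provider.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    description: str
    expected_minutes: float  # rough guess at how long the work would run
    needs_autonomy: bool     # multi-step work with no human in the loop?


def pick_model_tier(use_case):
    """Map a use case to a hypothetical model tier: 'small', 'mid', or 'frontier'."""
    # Long-running or unattended work is where the top tier earns its cost.
    if use_case.needs_autonomy or use_case.expected_minutes >= 60:
        return "frontier"
    # Arbitrary cutoff: anything beyond a quick answer goes to the middle tier.
    if use_case.expected_minutes >= 5:
        return "mid"
    return "small"


recipe = UseCase("grilled cheese recipe", expected_minutes=0.5, needs_autonomy=False)
migration = UseCase("codebase-wide migration", expected_minutes=480, needs_autonomy=True)
print(pick_model_tier(recipe), pick_model_tier(migration))
```

The point is less the specific thresholds and more the habit of asking, before each request, how much model the job actually needs.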
Now, for a while now, people have given lip service to the idea that different classes or
powers of models could be used for different things, but I'm almost positive that a lot of
the power user type AIDB listeners are still the type to crank state-of-the-art models to
extra high even when they're asking for a grilled cheese recipe, because screw it, you want the power,
that's why. With the Fable 5 class models coming online, especially as they move to usage-based pricing,
I do actually think we're going to have to develop that muscle to understand which of our
use cases require and fit each different power level of model. Second, though, and maybe even
more interestingly, I think that we are all going to go through a period of
having to up-level our ambition. As someone who spends a lot of time looking at the frankly
completely moribund landscape of AI training, even the best programs are still about how you use
agents to do different versions or better versions of the work that you do today. Maybe they push a little
bit in using new ways to write software to solve your old problems, but even I think that is not
enough. Nate B. Jones, who many of you might recognize from TikTok or another short form video
platform, describes the new skill we're going to have to develop as task imagination, and I think
it's a really great way to put it. Anthropic released their new supermodel, right? Fable 5. And Fable
5, even though it's kind of nerfed because it's not as capable as Mythos 5, the really
dangerous one that was released under Glasswing, it's still super strong. I've been playing
with it. And you know what that is making me think? The thing that actually matters to most of us
is task imagination now. We are sort of sorcerers
with these models. We have to have a practical guide for how to do magic with the models.
Because for most of history, we've had two modes.
Wave our hands and give a general guideline and hope people like get the idea and then walk away.
Or do all the work ourselves and get super detailed.
Having that middle layer of like, this is what I want, this is the bar, this is how it works,
this is not very human.
This is not how we typically have worked.
But with like tools like Fable 5 that can
run for nine hours, 12 hours, days?
Days.
Do you have anything you can give AI that will take days?
Let me just ask you that.
I know there are some people who do, and when you do, put them in the comments.
But there's going to be a bunch of us who are like, no, I have nothing that has ever taken
remotely even an hour on AI.
So what am I doing with Fable 5?
We need better task imagination.
So he breezes through it, but I love this idea of task imagination,
and that's something that I'm going to spend a lot more time on in the weeks to come.
You know, somewhat ironically, yesterday's episode was called OpenAI declaring the next phase of
AI, but with the release of Fable 5, it seems to really be the case.
Then again, for all of you folks out there who have shifted over to the Codex world
and are now staring at your lonely Claude Code terminal, wondering if you need to go back,
it may be worth taking just a beat.
Robert Corson tweeted, at this point, I don't want GPT-5.6, it needs to be GPT-6.
No way Anthropic has completely blown past them like this.
Three models in two months and Fable is not even their best model?
Feels like Anthropic ruined OpenAI's whole model roadmap and release plan.
In response, Tibo from the OpenAI and Codex team,
who now leads a lot of their product efforts, wrote,
feeling pretty good about things.
My friends, we could be in for quite a week.
For now, though, we are going to end this very long edition of the AI Daily Not So Brief.
Appreciate you listening or watching, as always.
And until next time, peace.
