Deep Questions with Cal Newport - Is Claude Mythos “Terrifying”? | AI Reality Check
Episode Date: April 16, 2026. Cal Newport takes a critical look at recent AI News. Today's episode on YouTube: youtube.com/calnewportmedia 0:00 What's Really Going on with Mythos? 10:09 Security systems 21:27 Conclusion Links: Buy Cal's latest book, "Slow Productivity" at www.calnewport.com/slow https://www.youtube.com/watch?v=iRsycWRQrc8 https://www.nytimes.com/2026/04/07/opinion/anthropic-ai-claude-mythos.html https://arxiv.org/abs/2404.08144 https://x.com/clementdelangue/status/2041953761069793557?s=61 https://x.com/stanislavfort/status/2041922370206654879 https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities Thanks to Jesse Miller for production and mastering and Nate Mechler for research and newsletter. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Transcript
Anthropic recently announced a new LLM named Claude Mythos.
They claimed it was so good at finding and exploiting security vulnerabilities in source code that they couldn't release it to the general public, for fear that our infrastructure as we know it would be hacked and collapse.
Now, as I'm sure Anthropic hoped, this announcement generated a lot of attention.
Here's what Thomas Friedman said in his widely read New York Times column.
Normally right now I would be writing about the geopolitical implications of the war with Iran,
but I want to interrupt that thought to highlight a stunning advance in artificial intelligence,
one that arrives sooner than expected,
and that will have equally profound geopolitical implication.
Friedman then goes on to conclude, and I'm quoting him here,
holy cow, super-intelligent AI is arriving faster than anticipated.
Basically, the mood of much of the internet right now about Claude Mythos is that Anthropic just invented the WOPR supercomputer from the 1983 Matthew Broderick movie War Games. Well, the WOPR spends all its time thinking about World War III. 24 hours a day, 365 days a year, it plays an endless series of war games, using all available information on the state of the world.
But here's the key question.
How much of this is actually true?
Well, it's Thursday, which means it's time for an AI reality check episode.
So this is the perfect opportunity to look closer at these claims.
Now, here's my plan.
I went out and read basically every independent test or assessment that I could find about Mythos and/or its reported capabilities. I read all these reports so you don't have to, and I'm going to bring out of all this reading the key observations that you need to know.
The reality, as you'll soon learn, is not nearly as simple as the ghost story that Anthropic is trying to convince us to believe.
All right, we have a lot to cover in this episode, so let's get into it.
As always, I'm Cal Newport, and this is Deep Questions, the show for people seeking depth in a distracted world.
And we'll get started right after the music.
All right, so what's really going on with Claude Mythos?
Well, here's the core of the concern surrounding Mythos. If you talk to an average non-technical person who's been following this story, here's how they understand it: when Anthropic trained up this new model, it displayed a new cybersecurity capability that surprised them. Oh my God, this thing can find vulnerabilities and attack systems right now, causing Anthropic to hastily pull back their plan to release the model to the public.
That's how most people understand this story.
But that narrative is not correct.
Security researchers have been using LLMs to find security vulnerabilities and program
exploits since basically the beginning of consumer LLMs.
This is not a new capability that emerged in Claude Mythos.
Let me load a paper here on the screen from all the way back in 2024. This paper was titled "LLM Agents Can Autonomously Exploit One-Day Vulnerabilities." In this study, the researchers from the University of Illinois found that GPT-4 (remember GPT-4?) successfully exploited 87% of the vulnerabilities it was presented with, and they showed that this was a big increase over what GPT-3.5 could do. They concluded, our findings raise questions around the widespread deployment of highly capable LLM agents.
Now, to be fair, this study from 2024 used LLMs to exploit existing vulnerabilities, but Anthropic notes that Mythos can also find new vulnerabilities that no one knew existed. These are sometimes called zero-day vulnerabilities.
Is this new?
No, that's not new either.
If you go back and look at the release notes for Anthropic's earlier, less powerful Opus 4.6 LLM, they say the following: their researchers used Opus to find, quote, over 500 exploitable zero-day vulnerabilities, some of which are decades old.
And let's stop for a moment, because that note, which was hidden in the system card for Opus 4.6, is almost word for word what Anthropic said about Mythos: this idea of finding hundreds if not thousands of exploits that no one knew about, some of which were decades old. That's exactly the terminology that Anthropic used when describing Mythos. The same thing was true about Opus 4.6, which has been available to the public for a while, and yet somehow our infrastructure has survived LLM-driven attacks.
All right, things get a little bit more hazy when we begin to look at the security community's response to Mythos. So Mythos is not available for general testing, but in their press release and release notes, Anthropic lists a bunch of examples of scary vulnerabilities that were discovered by Mythos, as a way of indicating how powerful and scary this model is.
Well, a bunch of security researchers did something that Anthropic probably wasn't expecting. They said, well, let's go test these vulnerabilities. Let's see if other models, simpler models, models that have been out for a long time, let's see how well they do trying to find those same vulnerabilities. Are these vulnerabilities that only Mythos, with its new power, could find? Or are these vulnerabilities that existing models could find?
The results here were,
in my opinion, pretty shocking.
All right, let me load one of these up here on the screen. This one was brought to my attention by Gary Marcus. It's from the CEO of the AI company Hugging Face. I'm just going to read what he writes here: "But here's what we found when we tested. We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weight models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters, costing just 11 cents per million tokens. A 5.1-billion-active-parameter open model recovered a core chain of the 27-year-old OpenBSD bug."
All right, there's a lot of technical talk in there, but basically what he's saying is they took one of the big scary examples that Anthropic gave of Mythos's capabilities, and they found that really cheap, small models (models with a few billion parameters, as compared to hundreds of billions, if not a trillion, for a model like Mythos) could also find the bug when told, hey, look in this source code and try to find a bug.
All right, let me load up another example here.
This one comes from the security researcher Stanislav Fort, who says,
We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis.
Eight out of eight models found the flagship FreeBSD zero-day, including a 3-billion-parameter model.
So they also found that when they sent existing models to find the same vulnerabilities that Anthropic bragged about Mythos finding, the cheaper models also found them.
There's a nice summary of the state of affairs that comes from the renowned security researcher Bruce Schneier. I won't bring it on the screen, I'll just read it. He said, you don't need Mythos to find the vulnerabilities they found.
So let me just stop for a second there and regroup what we're finding. The claim is not that LLMs are bad at finding security bugs. The claim is that Mythos doesn't seem, at least in this testing, to have a profoundly more advanced capability to do this than existing models that have already been freely available to the public. Now remember the way that Mythos is being covered.
It got Thomas Friedman to say, holy cow.
Like this release has just changed everything.
This release has geopolitical implications.
This has changed the game when it comes to cybersecurity.
But all these independent security researchers are saying,
but does it?
You told us the most impressive vulnerabilities it found, and we had, like, a 3-billion-parameter model. We sent it to look at the same code. It also found them.
So that independent testing wasn't necessarily revealing a massively improved capability for Mythos as compared to existing models.
But none of those tests were looking at the model itself, because it's still private. However, there is one study, just recently released, that I know of where Anthropic actually gave the researchers access to the Mythos LLM itself, so they could test its security capabilities directly, as opposed to just testing the listed security exploits that it found.
This research came from the AI Security Institute based out of the UK, and I want you to take it with a bit of a grain of salt.
Because the AISI was responsible for that inane report that I talked about a couple weeks ago in a reality check episode, where they counted up tweets about OpenClaw and then said, look, when OpenClaw was released, tweets from people complaining about AI went up; this shows that AI scheming is on the rise.
I do not think that was a very good study. But they're the only institute I know of that has access to do research on the LLM. So, with some care, I think we should actually look at their results. I'll pull their paper up here on the screen. It's called "Our Evaluation of Claude Mythos Preview's Cyber Capabilities." I'm going to show a couple charts here.
All right, so here's the first chart. This is labeled "beginner CTF challenge performance by model with a 2.5 million token budget." CTF stands for capture the flag. It's a standard security task where you ask an agent connected to a model to try to break into another system, on which you have a text file that's called a flag. If they can break in and read what's in that text file, you've successfully broken into the system. It's how you test the security of systems. If you take a security class as an undergraduate, for example, you'll play capture the flag to practice breaking into systems.
So what they've charted here is the performance of many different models, going all the way back to GPT-3.5, all the way up through the Mythos preview, being used to try to break into other systems. The top line here is for technical non-experts using the tool, and then down here is for apprentices using the tool.
If we look at the technical non-expert line, we see that the performance of Mythos, which is right over here, is near the top. It's actually not the best performance. GPT-5 does better than it, and it's very closely clustered with Claude Opus 4.6 and Codex 5.3. We see here that Claude Opus 4.5 actually does better. So, you know, it's clustered at the top, but it's actually not the best. If we look at the apprentice results, then it slips a little bit above the other best models. So we have some improvement.
I want you to look at the magnitude of these improvements. We have a steady increase in performance here, and they actually begin to cluster a little bit at the end. So we're seeing a steady increase. There's no notable jump here, though, for Claude Mythos somehow leaping ahead with a larger magnitude than earlier jumps.
Here's a similar task. This is a harder one. It's now an advanced capture-the-flag challenge where you can use 50 million tokens to try to solve it, so this is a very expensive run. And what we see here, for the practitioners using it, is equal performance between Mythos and GPT-5.4; Mythos is maybe slightly worse. And when we look at the experts using it, you're able to get slightly better performance out of Mythos than out of Codex 5.3 or Opus 4.6. Probably the most impressive result for Mythos comes down to this last challenge.
This is a little complicated. I had to read this pretty carefully to understand what was going on. They invented a kind of contrived security scenario, a sort of loosely protected system, in which there's a 32-step sequence that you could use to break in. They wrote a custom agent to run on top of a bunch of LLMs that they would then set loose to try to go through these 32 steps. It's kind of a complicated chart, but the key thing here, the main gap that they were excited about, is the Mythos preview moving ahead of Claude Opus 4.6 in its performance. The average steps completed is what we're looking at, and we get an improvement. Claude Opus 4.6, on average, would make it through 16 of those 32 steps before getting stuck. The Mythos preview, in this sort of contrived security example, could get through, on average, 22 of the 32 steps before getting stuck. So they see there a nice jump up in performance.
All right.
So in my estimation, what this AISI report indicates confirms more or less what the independent security researchers were also finding, which is that there's no evidence that Mythos represents some sort of massive break from existing LLM cybersecurity capabilities. There is no Rubicon that has been crossed, in the sense of some new type of attack that's really powerful, that no other system could do, and now we can do it with Mythos. Instead, what we see is the predicted placement on the slow and steady improvement of these capabilities that we've seen through all the models, going all the way back to GPT-3.5.
So it's either roughly the same
or somewhat better than existing models
on standard attack scenarios.
In this contrived attack scenario,
it moved up from being able to accomplish
16 out of 32 steps to 22 out of 32 steps.
And when independent security researchers looked at the particular exploits found, they have yet to identify a vulnerability uncovered by Mythos that was somehow too complicated for earlier models to find.
So it's not necessarily way better at finding vulnerabilities.
And AISI tested its ability to exploit these autonomously.
And they found it was the same or somewhat better.
All right.
So I want to pull together these threads.
What are the conclusions?
What are the right conclusions to have here?
I have five points I want to make.
Point number one,
Mythos did not introduce a new scary capability that we must now contend with.
It continues slow and steady progress on an existing type of issue
that has been around for about three or four years now.
Point number two, Mythos continues a slow but steady increase in LLM cybersecurity capability. So it looks like it's somewhat better at exploiting vulnerabilities, but not in a way that represents a massive jump forward that is somehow disproportionate to previous jumps.
We don't know if its capabilities at finding vulnerabilities are better at all, because, again, independent security researchers have been able to replicate most of the reported vulnerabilities with simpler models.
Point number three, and this is subtle, but I think it's really important. The AISI tests that look at using these models to exploit security bugs are based on a simple agent that runs with the LLMs. An LLM can't do anything other than produce tokens. You have to have an agent on top of it to ask the LLM, give me a plan, and then execute the plan on its behalf, right? So you have to have agents on top of these models. One thing we don't know is to what degree some of these small but steady recent improvements in exploitation capabilities are due to the fact that these models in general are being tuned to play nicer with agents, because, especially with coding agents, this is a big revenue source that these companies care about. So we don't know how much of this is the model somehow understanding cybersecurity better, versus the models playing better with multi-step agents, having been tuned to be very good at following through on multiple steps and making longer sorts of plans.
So I think that's an important point.
Point number four, as the AISI data makes clear, the improvements of Claude Mythos in attacks are similar, if not smaller, than recent improvements that we've seen with other model releases. And yet, and I think this is really critical, none of those other releases... in fact, let me load up a chart here, right? Look at all these other gaps: the gap right here, the gap right here, the gap right here. Big jumps. None of those other releases caused Tom Friedman to say, holy cow, this is more important than a war going on. None of these other releases created this huge fear of, oh wow, we've seen a big leap in the cybersecurity capabilities of these models, we have to care. So why did this particular release draw that reaction, if its vulnerability detection is no better and its exploitation capabilities are just slowly and steadily getting better?
There's no Rubicon that's been crossed of a new type of attack that used to not be possible.
Why is this getting all this attention?
Why is it creating so much dread?
Because this is the storyline that Anthropic pushed. This is the button they pushed. They had a lot of briefings, I've heard, with government officials and with journalists. This is how Tom Friedman, I'm sure, heard about this. They had this big, scary press release. They announced a new project called Project Glasswing, about how they're going to keep this within a small number of partners to give them a chance to protect their systems before the public gets access to it. It is a marketing decision that this is how they're going to market Claude Mythos: as this cybersecurity monster that they're barely keeping under control.
Now, can I say as an aside, not to undermine Project Glasswing, but it probably didn't help that a week before Anthropic released Claude Mythos, there was a leak of the source code for Claude Code. And guess what security researchers immediately found in the Claude Code source code? Big security vulnerabilities. So I guess they forgot to run their own code through Claude Mythos, because researchers immediately found security vulnerabilities in it once it was actually exposed. I guess they just didn't get to it.
So this brings me to point number five. The fact that the thing Anthropic decided to market Mythos on, the button they decided to push, was this inflated cybersecurity fear is, I think, actually very bad news for Anthropic.
Think about this for a second.
For the last two years, Dario Amodei, the CEO of Anthropic, has been out there making these really sort of alarmist statements about what AI is going to be able to do, really focusing on the ability to automate huge swaths of the economy and its steady march towards artificial general intelligence: the ability to have a "data center full of geniuses," to quote his own words, that could be deployed to do almost anything that humans do now. That is the model that they want investors to believe is true, because it's a model in which they become one of the most valuable companies in the history of companies. That has been his steady drumbeat of what AI is going to do. And this is their newest, biggest pro-level model, the most intensely trained model they've released to date.
And what are they able to brag about?
Finding bugs in computer code.
Well, this is what, like, GPT-3 did. We've been worried about, you know, using LLMs to find bugs or exploit bugs since, like, the beginning of LLMs. This is the nerdy stuff that no one cared about. This was considered the skeptics' position; the conservatives would be like, well, the main thing we care about with LLMs is that you can find security bugs, cybersecurity stuff. That's what people said when ChatGPT came out, and the utopians came in like, no, you're missing it. That's nothing. That's boring. That's simple. It's going to automate everything.
And in their biggest, most fancy, most intensely trained model yet, what does Anthropic
emphasize?
It got a little better at finding bugs.
I think that's bad news.
I think the announcement they wanted to make is: this can now do this thing that no other model has ever been able to do. This model can now automate this giant swath of jobs.
It's going to generate hundreds of billions of dollars in savings.
This model is now, you know, AGI.
Like, it seems to be able to tackle any task that a standard human could do.
That's what they want to be talking about.
And they're not.
But they still needed hype.
They still needed attention.
And it's almost like they sifted through things, like, well, there's got to be something in here we benchmaxxed to do better at. And they're like, well, it's better on cybersecurity. In fact, they actually released a benchmark result, a key cybersecurity benchmark, and they increased from like 66.6% to 83.1% or something. And they had, give them the credit, the cojones to say, that's the thing that we're going to focus on, and let's see if we can get everyone really upset and excited and scared about that. And whoever succeeded at this should probably go into the marketing hall of fame, because, man, did they succeed. But here's the reality, okay?
We really do have to care about the cybersecurity capabilities of LLMs. But here's the thing: we've been saying this for three years now, and it remains true. They're steadily getting better. Mythos was not a massive jump better, but it's comparable to Opus 4.6's jump over, you know, prior ones, or GPT-5's jump over earlier GPTs. It's noticeable. It's not a Rubicon, but if we keep making these jumps, the pressure on our systems is going to get higher and higher.
So I think this is a really important point. Now, there's an ironic coda to this. One of the best ways to make your system secure against these types of attacks from AI is to not let your developers use AI to program the systems, because that produces sloppy, very exploitable code. So it's kind of interesting.
It's like this model is going to show that what you produce with their other models is dangerous.
So there's like an interesting circularity there. All right. So that's point number one.
Cybersecurity matters. LLMs matter for cybersecurity. They have and continue to.
But point number two, it was wrong, I think, for Mythos to get the amount of dread coverage it got.
At least so far, we do not have evidence that it represents a significantly larger leap in detecting or exploiting vulnerabilities than we've seen in previous model releases that did not receive this attention.
It's disproportionate.
And it's because it's the button that Anthropic's marketing pushed.
And I really think we essentially have to stop taking anything that the AI companies say seriously until we have independently verified it. We have to assume that if their mouths are moving, they're probably exaggerating or making something up.
And in this case, if I was an investor, the storyline I would want to hear is, where's my flying car, right? What about all the other things? You haven't been talking about bugs and cybersecurity, Dario Amodei; that's not been your pitch. That's not been what you've been saying in interviews. You've been talking about white-collar bloodbaths. You have not been talking about this.
What happened to all the other things you said were coming, the things that are going to justify the $60 billion of investment that Anthropic has received?
Can this do any of those things?
Is it better at automating jobs?
Are the coding agents better with it?
Is it showing big definitive steps towards AGI?
These are the questions that we should be asking.
But instead, we are writing headlines like,
is Mythos an AI nightmare waiting to happen?
We should care about the cybersecurity capabilities of LLMs,
including Mythos.
but we can't just follow whatever storyline they give us.
We should have reacted like, okay, that's good.
We should worry about that.
Tell us about that.
But also, what about this, this and this?
We have to keep holding the feet of these frontier labs to the fire.
We can't keep giving them free rein, because we can't quit the rush of emotion, whether it's dread or excitement, that comes from these big storylines that keep spewing out.
We have to be able to look past those and say, what's actually going on here?
There are big things, but let us discover them for ourselves, and let's hold your feet to the fire on the other things.
So that's my conclusion here.
Mythos is better at cybersecurity attacks than prior models. We don't yet have evidence that it's better by a massively larger jump than we've seen before. And it's probably bad news for Anthropic that this was the only thing they really emphasized about what was supposed to be their biggest, best, most skilled model ever.
All right, that's all the time we have for today.
Thanks for listening.
We'll be back on Monday with another advice episode of the show, but I think I have another AI reality check in the chamber for the Thursday to follow that. But until then, remember: care about AI, but not everything that people write about it.
