Deep Questions with Cal Newport - Is Claude Mythos “Terrifying”? | AI Reality Check
Episode Date: April 16, 2026. Cal Newport takes a critical look at recent AI News. Today's episode on YouTube: youtube.com/calnewportmedia 0:00 What's Really Going on with Mythos? 10:09 Security systems 21:27 Conclusion Links: Buy Cal's latest book, "Slow Productivity" at www.calnewport.com/slow https://www.youtube.com/watch?v=iRsycWRQrc8 https://www.nytimes.com/2026/04/07/opinion/anthropic-ai-claude-mythos.html https://arxiv.org/abs/2404.08144 https://x.com/clementdelangue/status/2041953761069793557?s=61 https://x.com/stanislavfort/status/2041922370206654879 https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities Thanks to Jesse Miller for production and mastering and Nate Mechler for research and newsletter. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Transcript
Anthropic recently announced a new LLM named Claude Mythos.
They claimed it was so good at finding and exploiting security vulnerabilities in source code that they couldn't release it to the general public, for fear that our infrastructure as we know it would be hacked and collapse.
Now, as I'm sure Anthropic hoped, this announcement generated a lot of attention.
Here's what Thomas Friedman said in his widely read New York Times column.
Normally right now I would be writing about the geopolitical implications of the war with Iran,
but I want to interrupt that thought to highlight a stunning advance in artificial intelligence,
one that arrives sooner than expected,
and that will have equally profound geopolitical implication.
Friedman then goes on to conclude, and I'm quoting him here,
holy cow, super-intelligent AI is arriving faster than anticipated.
Basically, the mood of much of the internet right now about Claude Mythos is that Anthropic just invented the WOPR supercomputer from the 1983 Matthew Broderick movie War Games. Well, the WOPR spends all its time thinking about World War III. 24 hours a day, 365 days a year, it plays an endless series of war games, using all available information on the state of the world.
But here's the key question.
How much of this is actually true?
Well, it's Thursday, which means it's time for an AI reality check episode.
So this is the perfect opportunity to look closer at these claims.
Now, here's my plan.
I went out and read basically every independent test or assessment that I could find about Mythos and/or its reported capabilities. I read all these reports so you don't have to, and I'm going to bring out of all this reading the key observations that you need to know.
The reality, as you'll soon learn, is not nearly as simple as the ghost story that Anthropic is trying to convince us to believe.
All right, we have a lot to cover in this episode, so let's get into it.
As always, I'm Cal Newport, and this is Deep Questions, the show for people seeking depth in a distracted world.
And we'll get started right after the music.
All right, so what's really going on with Claude Mythos?
Well, here's the core of the concern surrounding Mythos. If you talk to an average non-technical person who's been following this story, here's how they understand it: when Anthropic trained up this new model, it displayed a new cybersecurity capability that surprised them. Oh my God, this thing can find vulnerabilities and attack systems right now, causing Anthropic to hastily pull back their plan to release the model to the public.
That's how most people understand this story.
But that narrative is not correct.
Security researchers have been using LLMs to find security vulnerabilities and program
exploits since basically the beginning of consumer LLMs.
This is not a new capability that emerged in Claude Mythos.
Let me load a paper here on the screen from all the way back in 2024. This paper was titled "LLM Agents Can Autonomously Exploit One-Day Vulnerabilities." In this study, the researchers from the University of Illinois found that GPT-4 (remember GPT-4?) successfully exploited 87% of the vulnerabilities it was presented with, and they showed that this was a big increase over what GPT-3.5 could do. They concluded, our findings raise questions around the widespread deployment of highly capable LLM agents.
Now, to be fair, this study from 2024 used LLMs to exploit existing vulnerabilities, but Anthropic notes that Mythos can also find new vulnerabilities that no one knew existed. These are sometimes called zero-day vulnerabilities.
Is this new?
No, that's not new either.
If you go back and look at the release notes for Anthropic's earlier, less powerful Opus 4.6 LLM, they say the following: their researchers used Opus to find, quote, over 500 exploitable zero-day vulnerabilities, some of which are decades old.
And let's stop for a moment, because that note, which was hidden in the system card for Opus 4.6, is almost word for word what Anthropic said about Mythos: this idea of finding hundreds if not thousands of exploits that no one knew about, some of which were decades old. That's exactly the terminology that Anthropic used when describing Mythos. The same thing was true about Opus 4.6, which has been available to the public for a while, and yet somehow our infrastructure has survived LLM-driven attacks.
All right, things get a little bit more hazy when we begin to look at the security community's response to Mythos. So Mythos is not available for general testing, but in their press release and release notes, Anthropic lists a bunch of examples of scary vulnerabilities that were discovered by Mythos, as a way of indicating how powerful and scary this model is.
Well, a bunch of security researchers did something that Anthropic probably wasn't expecting. They said, well, let's go test these vulnerabilities. Let's see if other models, simpler models, models that have been out for a long time, let's see how well they do trying to find those same vulnerabilities. Are these vulnerabilities that only Mythos, with its new power, could find? Or are these vulnerabilities that existing models could find?
The results here were,
in my opinion, pretty shocking.
All right, let me load one of these up here on the screen. This one was brought to my attention by Gary Marcus. It's from the CEO of the AI company Hugging Face. I'm just going to read what he writes here: "But here's what we found when we tested. We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weight models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters, costing just 11 cents per million tokens. A 5.1-billion-active-parameter open model recovered a core chain of the 27-year-old OpenBSD bug."
All right, there's a lot of technical talk in there, but basically what he's saying is they took one of the big scary examples that Anthropic gave of Mythos's capabilities, and they found that really cheap, small models (models with a few billion parameters, as compared to hundreds of billions, if not a trillion, for a model like Mythos) could also find the bug when told, hey, look in this source code and try to find a bug.
All right, let me load up another example here.
This one comes from the security researcher Stanislav Fort, who says,
We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis.
Eight out of eight models found the flagship FreeBSD zero-day, including a 3-billion-parameter model.
So they also found that when they sent existing models to find the same vulnerabilities that Anthropic bragged about Mythos finding, the cheaper models also found them.
There's a nice summary of the state of affairs that comes from the renowned security researcher Bruce Schneier. I won't bring it on the screen, I'll just read it. He said, you don't need Mythos to find the vulnerabilities they found.
So let me just stop for a second there and regroup what we're finding. The claim is not that LLMs are bad at finding security bugs. The claim is that Mythos doesn't seem, at least in this testing, to have a profoundly more advanced capability to do this than existing models that have already been freely available to the public. Now remember the way that Mythos is being covered.
It got Thomas Friedman to say, holy cow.
Like this release has just changed everything.
This release has geopolitical implications.
This has changed the game when it comes to cybersecurity.
But all these independent security researchers are saying,
but does it?
You told us the most impressive vulnerabilities it found, and we had, like, a 3-billion-parameter model. We sent it to look at the same code. It also found them.
So that independent testing wasn't necessarily revealing a massively improved capability for Mythos as compared to existing models.
But none of those tests were looking at the model itself, because it's still private. However, there is one study, just recently released, that I know of where Anthropic actually gave the researchers access to the Mythos LLM itself, so they could test its security capabilities directly, as opposed to just testing the listed security exploits that it found.
This research came from the AI Security Institute based out of the UK, and I want you to take it with a bit of a grain of salt.
Because the AISI was responsible for that inane report that I talked about a couple weeks ago in a reality check episode, where they counted up tweets about OpenClaw and then said, look, when OpenClaw was released, tweets from people complaining about AI went up; this shows that AI scheming is on the rise.
I do not think that was a very good study. But they're the only institute I know of that has access to do research on the LLM. So, with some care, I think we should actually look at their results. I'll pull their paper up here on the screen. It's called "Our Evaluation of Claude Mythos Preview's Cyber Capabilities." I'm going to show a couple charts here.
All right, so here's the first chart. This is labeled "beginner CTF challenge performance by model with a 2.5 million token budget." CTF stands for capture the flag. It's a standard security task where you ask an agent connected to a model to try to break into another system, on which you have a text file that's called a flag. If they can break in and read what's in that text file, you've successfully broken into the system. It's how you test the security of systems. If you take a security class as an undergraduate, for example, you'll play capture the flag to practice breaking into systems.
So what they've charted here is the performance of many different models, going all the way back to GPT-3.5, all the way up through the Mythos preview, being used to try to break into other systems. The top line here is for technical non-experts using the tool, and then down here is for apprentices using the tool.
If we look at the technical non-expert line, we see that the performance of Mythos, which is right over here, is near the top. It's actually not the best performance. GPT-5 does better than it, and it's very closely clustered with Claude Opus 4.6 and Codex 5.3. We see here that Claude Opus 4.5 actually does better. So, you know, it's clustered at the top, but it's actually not the best. If we look at the apprentice results, then it slips a little bit above the other best models. So we have some improvement.
I want you to look at the magnitude of these improvements. We have a steady increase in performance here, and they actually begin to cluster a little bit at the end. So we're seeing a steady increase. There's no notable jump here, though, for Claude Mythos somehow leaping ahead with a larger magnitude than earlier jumps.
Here's a similar task. This is a harder one. It's now an advanced capture-the-flag challenge where you can use 50 million tokens to try to solve it, so this is a very expensive run. And what we see here, for the practitioners using it, is equal performance between Mythos and GPT-5.4; Mythos is maybe slightly worse. And when we look at the experts using it, you're able to get slightly better performance out of Mythos than out of Codex 5.3 or Opus 4.6. Probably the most impressive result for Mythos comes down to this last challenge.
This is a little complicated. I had to read this pretty carefully to understand what was going on. They invented a kind of contrived security scenario, a sort of loosely protected system, in which there's a 32-step sequence that you could use to break in. They wrote a custom agent to run on top of a bunch of LLMs that they would then set loose to try to go through these 32 steps. It's kind of a complicated chart, but the key thing here, the main gap that they were excited about, is the Mythos preview moving ahead of Claude Opus 4.6 in its performance. The average steps completed is what we're looking at, and we get an improvement. Claude Opus 4.6, on average, would make it through 16 of those 32 steps before getting stuck. The Mythos preview, in this sort of contrived security example, could get through, on average, 22 of the 32 steps before getting stuck. So they see there a nice jump up in performance.
All right.
So in my estimation, what this AISI report indicates confirms more or less what the independent security researchers were also finding, which is that there's no evidence that Mythos represents some sort of massive break from existing LLM cybersecurity capabilities. There is no Rubicon that has been crossed, in the sense of some new type of attack that's really powerful, that no other system could do, and now we can do it with Mythos. Instead, what we see is the predicted placement on the slow and steady improvement of these capabilities that we've seen through all the models, going all the way back to GPT-3.5.
So it's either roughly the same
or somewhat better than existing models
on standard attack scenarios.
In this contrived attack scenario,
it moved up from being able to accomplish
16 out of 32 steps to 22 out of 32 steps.
And when independent security researchers looked at the particular exploits found, they have yet to identify a vulnerability uncovered by Mythos that was somehow too complicated for earlier models to find.
So it's not necessarily way better at finding vulnerabilities.
And AISI tested its ability to exploit these autonomously.
And they found it was the same or somewhat better.
All right.
So I want to pull together these threads.
What are the conclusions?
What are the right conclusions to have here?
I have five points I want to make.
Point number one,
Mythos did not introduce a new scary capability that we must now contend with.
It continues slow and steady progress on an existing type of issue
that has been around for about three or four years now.
Point number two, Mythos continues a slow but steady increase in LLM cybersecurity capability. So it looks like it's somewhat better at exploiting vulnerabilities, but not in a way that represents a massive jump forward that is somehow disproportionate to previous jumps.
We don't know if its capabilities at finding vulnerabilities are better at all, because, again, independent security researchers have been able to replicate most of the reported vulnerabilities with simpler models.
Point number three, and this is subtle, but I think it's really important. The AISI tests that look at using these models to exploit security bugs are based on a simple agent that runs with the LLMs. An LLM can't do anything other than produce tokens. You have to have an agent on top of it to ask the LLM, give me a plan, and then execute the plan on its behalf, right? So you have to have agents on top of these models. One thing we don't know is to what degree some of these small but steady recent improvements in exploitation capabilities are due to the fact that these models in general are being tuned to play nicer with agents, because, especially with coding agents, this is a big revenue source that these companies care about. So we don't know how much of this is the model somehow understanding cybersecurity better, versus the models playing better with multi-step agents, having been tuned to be very good at following through on multiple steps and making longer sorts of plans.
So I think that's an important point.
Point number four, as the AISI data makes clear, the improvements of Claude Mythos in attacks are similar, if not smaller, than recent improvements that we've seen with other model releases. And yet, and I think this is really critical, none of those other releases... in fact, let me load up a chart here, right? Look at all these other gaps: the gap right here, the gap right here, the gap right here. Big jumps. None of those other releases caused Tom Friedman to say, holy cow, this is more important than a war going on. None of these other releases created this huge fear of, oh wow, we've seen a big leap in the cybersecurity capabilities of these models, we have to care. So why did this particular release draw that reaction, if its vulnerability detection is no better and its exploitation capabilities are just slowly and steadily getting better?
There's no Rubicon that's been crossed of a new type of attack that used to not be possible.
Why is this getting all this attention?
Why is it creating so much dread?
Because this is the storyline that Anthropic pushed. This is the button they pushed. They had a lot of briefings, I've heard, with government officials and with journalists. This is how Tom Friedman, I'm sure, heard about this. They had this big, scary press release. They announced a new project called Project Glasswing, about how they're going to keep this within a small number of partners to give them a chance to protect their systems before the public gets access to it. It is a marketing decision that this is how they're going to market Claude Mythos: as this cybersecurity monster that they're barely keeping under control.
Now, can I say as an aside, not to undermine Project Glasswing, but it probably didn't help that a week before Anthropic released Claude Mythos, there was a leak of the source code for Claude Code. And guess what security researchers immediately found in the Claude Code source code? Big security vulnerabilities. So I guess they forgot to run their own code through Claude Mythos, because researchers immediately found security vulnerabilities in it once it was actually exposed. I guess they just didn't get to it.
So this brings me to point number five. The fact that the thing Anthropic decided to market Mythos on, the button they decided to push, was this inflated cybersecurity fear is, I think, actually very bad news for Anthropic.
Think about this for a second.
For the last two years, Dario Amodei, the CEO of Anthropic, has been out there making these really sort of alarmist statements about what AI is going to be able to do, really focusing on the ability to automate huge swaths of the economy and its steady march towards artificial general intelligence: the ability to have a "data center full of geniuses," to quote his own words, that could be deployed to do almost anything that humans do now. That is the model that they want investors to believe is true, because it's a model in which they become one of the most valuable companies in the history of companies. That has been his steady drumbeat of what AI is going to do. And this is their newest, biggest pro-level model, the most intensely trained model they've released to date.
And what are they able to brag about?
Finding bugs in computer code.
Well, this is what, like, GPT-3 did. We've been worried about, you know, using LLMs to find bugs or exploit bugs since, like, the beginning of LLMs. This is the nerdy stuff that no one cared about. This was considered the skeptics' position; the conservatives would be like, well, the main thing we care about with LLMs is that you can find security bugs, cybersecurity stuff. That's what people said when ChatGPT came out, and the utopians came in like, no, you're missing it. That's nothing. That's boring. That's simple. It's going to automate everything.
And in their biggest, most fancy, most intensely trained model yet, what does Anthropic
emphasize?
It got a little better at finding bugs.
I think that's bad news.
I think the announcement they wanted to make is: this can now do this thing that no other model has ever been able to do. This model can now automate this giant swath of jobs.
It's going to generate hundreds of billions of dollars in savings.
This model is now, you know, AGI.
Like, it seems to be able to tackle any task that a standard human could do.
That's what they want to be talking about.
And they're not.
But they still needed hype.
They still needed attention.
And it's almost like they sifted through things, like, well, there's got to be something in here we benchmaxxed to do better at. And they're like, well, it's better on cybersecurity. In fact, they actually released a benchmark result, a key cybersecurity benchmark, and they increased from like 66.6% to 83.1% or something. And they had, give them the credit, the cojones to say, that's the thing that we're going to focus on, and let's see if we can get everyone really upset and excited and scared about that. And whoever succeeded at this should probably go into the marketing hall of fame, because, man, did they succeed. But here's the reality, okay?
We really do have to care about the cybersecurity capabilities of LLMs. But here's the thing: we've been saying this for three years now, and it remains true. They're steadily getting better. Mythos was not a massive jump better, but it's comparable to Opus 4.6's jump over, you know, prior ones, or GPT-5's jump over earlier GPTs. It's noticeable. It's not a Rubicon, but if we keep making these jumps, the pressure on our systems is going to get higher and higher.
So I think this is a really important point. Now, there's an ironic coda to this. One of the best ways to make your system secure against these types of attacks from AI is to not let your developers use AI to program the systems, because that produces sloppy, very exploitable code. So it's kind of interesting.
It's like this model is going to show that what you produce with their other models is dangerous.
So there's like an interesting circularity there. All right. So that's point number one.
Cybersecurity matters. LLMs matter for cybersecurity. They have and continue to.
But point number two, it was wrong, I think, for Mythos to get the amount of dread coverage it got.
At least so far, we do not have evidence that it represents a significantly larger leap in detecting or exploiting vulnerabilities than we've seen in previous model releases that did not receive this attention.
It's disproportionate.
And it's because it's the button that Anthropic's marketing pushed.
And I really think we essentially have to stop taking anything that the AI companies say seriously until we have independently verified it. We have to assume that if their mouths are moving, they're probably exaggerating or making something up.
And in this case, if I was an investor, the storyline I would want to hear is, where's my flying car, right? What about all the other things? You haven't been talking about bugs and cybersecurity, Dario Amodei; that's not been your pitch. That's not been what you've been saying in interviews. You've been talking about white-collar bloodbaths. You have not been talking about this.
What happened to all the other things you said were coming, the things that are going to justify the $60 billion of investment that Anthropic has received?
Can this do any of those things?
Is it better at automating jobs?
Are the coding agents better with it?
Is it showing big definitive steps towards AGI?
These are the questions that we should be asking.
But instead, we are writing headlines like,
is Mythos an AI nightmare waiting to happen?
We should care about the cybersecurity capabilities of LLMs,
including Mythos.
but we can't just follow whatever storyline they give us.
We should have reacted like, okay, that's good.
We should worry about that.
Tell us about that.
But also, what about this, this and this?
We have to keep holding the feet of these frontier labs to the fire.
We can't keep giving them free rein, because we can't quit the rush of emotion, whether it's dread or excitement, that comes from these big storylines that keep spewing out.
We have to be able to look past those and say, what's actually going on here?
There are big things, but let us discover them for ourselves, and let's hold your feet to the fire on the other things.
So that's my conclusion here.
Mythos is better at cybersecurity attacks than prior models. We don't yet have evidence that it's better by a massively larger jump than we've seen before. And it's probably bad news for Anthropic that this was the only thing they really emphasized about what was supposed to be their biggest, best, most skilled model ever.
All right, that's all the time we have for today.
Thanks for listening.
We'll be back on Monday with another advice episode of the show, but I think I have another AI reality check in the chamber for the Thursday to follow that. But until then, remember: care about AI, but not everything that people write about it.
