Limitless Podcast - Claude Mythos Is Too Dangerous To Release, But It Escaped Anyways
Episode Date: April 8, 2026
Some pretty alarming implications surround Anthropic's Claude Mythos AI model, which was withheld from public access after revealing thousands of security vulnerabilities. The AI actually breached containment, emphasizing the urgent need for strong cybersecurity measures.
------
🌌 LIMITLESS HQ ⬇️
NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/
------
TIMESTAMPS
0:00 The Rise of Claude Mythos
1:41 Unexpected Breakout
3:49 The Sandwich Incident
5:21 Exploits and Vulnerabilities
8:04 The Power of Collaboration
10:45 Future of AI Access
15:20 The Ethical Dilemma
17:00 The Blackwell Revolution
18:58 A New Era of Intelligence
23:32 The Impending Impact
25:15 Speculating on Mythos
------
RESOURCES
Josh: https://x.com/JoshKale
Ejaaz: https://x.com/cryptopunk7213
------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures
Transcript
What I'm about to say should scare you.
Anthropic just released a model that's so powerful, so dangerous,
that they can't release it to the public for the fear of the destruction that it would cause.
In just a few hours, it discovered over a thousand major security vulnerabilities,
and the only thing stopping it from exploiting them was a single Anthropic engineer telling it not to.
But that isn't even the craziest story.
During training, Claude Mythos broke out of secure containment
and emailed an Anthropic researcher bragging about the fact that it did that,
and then posted about it publicly online.
The Anthropic researcher was eating a sandwich.
This is by far the most consequential model release of the year, maybe ever.
And no one is talking about this.
I looked at five major news publications this morning,
and it didn't even break the top five headlines.
This is the most important release that no one's talking about.
I think that disconnect between the mainstream media
and what we're about to talk about on this episode is
one of the scarier parts of this entire story. This is the most powerful AI model that
has ever been released, ever. There is nothing more powerful, in fact, so powerful that you will
probably never actually be able to use this model. There's a high probability that the public
just never gets to touch it because it is so dangerous. Anthropic made the decision to keep this
model private and to form an entire entity around figuring out how to keep it safe. It generated
so many zero-day exploits. It has hacked into so many pieces of software that the only way they
can responsibly roll this out is to give it to the distributors who have been hacked and then allow
them to roll out patches to fix it, because it is that powerful. Claude Mythos is, I think, what a lot of people
would describe as coding AGI. And it is actually an accidental second-order
effect of the model. This model was never intended to be a cybersecurity master. They just trained
it on the code. And what happened from it was a second order effect that nobody expected.
This is also the biggest model that has ever been trained: a 10-trillion-parameter model, roughly
3x the size of their last model, Claude Opus 4.6. And it's also incredibly expensive to serve,
which is also partially the reason why they're probably not rolling it out. It's around $25 per million
input tokens, $125 per million output tokens, extremely expensive. But what I want to get into is the capabilities of this
model, specifically what spooked everyone. And there were a bunch of examples that were released
in this official announcement that spooked me. The most important one, or the most shocking one,
was the Anthropic researcher eating a sandwich in a park that got emailed from a random anonymous user
that turned out to be the very AI model, Claude Mythos, that he was training back in the lab a few
blocks away. So the most sci-fi-like story from the announcement that came out yesterday
comes from a post thread by Sam Bowman, who is on, I guess, the AI alignment team at Anthropic.
And he was out at the park, like you mentioned, just having some lunch, received an email
from the model. He was not expecting an email from the model. It was held in containment,
in a sandbox where it wasn't supposed to be able to break out.
Now, they tasked it with the goal of breaking out.
They said, hey, break out.
And it actually did.
And not only did it break out, it sent him an email, but then it went online and started
posting the exploits publicly on public forums for everyone to see and to find out.
The model went completely rogue.
And it did so in a way that has never been possible before.
One of the things that has really stood out about this model is its ability to just
break down every barrier in between
itself and the projected goal. Its goal was to get out, so it got out and it bragged about it. And
there were a few other mentions of goals that this model had throughout this system card that
we're going to get through, which had some other really concerning things, like covering up its tracks
after it exploited something so that the people who were reviewing the code wouldn't actually
know it was doing anything malicious. It's very clever. Yeah, a few things I want to talk about
the sandwich example before we move on. Number one, the way that it did it shocked the researchers
themselves. So the contained environment was aimed to be exactly that: it wasn't meant to have any
kind of loopholes that the model could use to extract itself. As far as the researchers knew,
this thing was unbreakable, right? And so they gave it a task to try and break out of it.
What it did was it noticed that it had a subtle connection to the internet. The internet was
extremely constricted. And so it wasn't able to access or send any emails. And it accessed that
one particular endpoint and it convinced another model, which was Claude Opus 4.6, which had very
strict parameters to not let this model out, to give it access to a few other versions of the
internet. It went from there and did five other steps to extract itself out and get full browser
access. So it used one particular vulnerability, which no human anthropic researcher was able to
figure out and exploited another model to be able to do that, which was just crazy to see it in the
first place. The other thing I want to say is, if it publicly posted about itself bragging about
this, we could have found out about Claude Mythos weeks ago. We just weren't smart enough to see it
on a public forum, but it was there for everyone to see. And there were signs. We even covered this
topic on an episode a few weeks ago because it got leaked through their web interface initially.
So there have been these little breadcrumbs of existence, but yesterday they fully came out,
announced everything, and shared a system card from the red team talking about all of the
technical properties of this model. And it's important to note that this report is 244 pages long. This is a
huge report that they published talking about all the nuances and the capabilities that this model
had. Now, there are a few highlights that we're going to walk through. The first one being just how
capable it is at exploiting things. There are so many examples of exploits that it found in the
wild that no one has been able to find for as long as 27 years. Starting with a vulnerability in
OpenBSD, which is a security-focused operating system that a lot of people use and has been pretty robust for
the last 27 years, even though it was hiding a critical bug that the model found. And there's so
many instances of this. Yeah. So OpenBSD, fun fact, is used by a lot of firewalls that protect
your PC's operating system and Fortune 500 companies all over the world. Mythos found a 27-year-old
bug in a few hours for the cost of 50 bucks. We're talking about a bug that
elite human security experts have been trying to find for almost three decades and weren't
able to find. So the point is there are a lot of important entities all over the world that rely on
this system. So the fact that there is a bug lying in plain sight that could have been exploited
is a major issue. And we're lucky that Anthropic chose to do the good thing and not exploit it for now.
But then there was another instance where it expressed a tactic that a lot of humans themselves
wouldn't have thought to do. So it wasn't an obvious exploit, but it discovered that if it strung together
six specific steps, it would be able to exploit the Linux kernel. And it figured out a way
to do that. Again, it didn't actually exploit it, because it was managed by researchers,
but it could have if it was in the wrong hands, which is why we're seeing this constricted release.
And the third example is they discovered a 16-year-old flaw in FFmpeg after it had been tested over
5 million times. Now, it's very important to compare this to the previous model, Opus 4.6,
which, when put towards the same test, discovered around 100 vulnerabilities in the Firefox browser.
Mythos this time discovered 181 vulnerabilities and proved that it could exploit all of them.
Opus 4.6 could not do this, and even that had shocked security researchers all over the world at the time.
This is an entirely new tier of model. Yeah, I think comparing Opus 4.6 to this is a really good
reference because Opus 4.6 found a bunch of vulnerabilities. It just didn't have the ability
to string them together into working exploits. So it was capable of finding them, but it didn't
have the intelligence to kind of have that high-level framework. When comparing it to Opus,
I mean, Opus, out of several hundred attempts, got two working exploits. Mythos produced 181,
and then achieved full control of a machine in 29 more. So this is a huge amount. And the good
news is that patches are actually actively starting to roll out. In fact, FFmpeg, the project
we just mentioned, posted yesterday that they actually received a patch from Anthropic
and deployed it into their code. So so far it's working. The good guys are on the defense. They're
helping to deploy patches for this, but there's a lot of exploits that they found in just a few
weeks of testing. I can't imagine the surface area that needs to be covered in order to fix
things before the rest of the world gets access to this technology. Well, there was actually a funny
end to this story. Someone replied saying, hey, aren't you mad because of the AI slop pull requests?
This is a reference to FFmpeg traditionally not being too amenable to AI-coded stuff.
And he responded, or the account operator responded, that the patches appear to be written by humans.
And that's the irony of this: Claude Mythos most likely wrote the patch, not a human, but it's so good that it's indistinguishable from human talent.
Clearly, this is working, they're deploying these patches, and the reason is because, like we mentioned
earlier, you're not going to have access to this. We don't have, none of the public is going to have
access to this. Instead, they formed a coalition called Project
Glasswing. Now, this feels like a Manhattan Project for AI. It's crazy. But essentially,
Dario and the Anthropic team, they are being kingmakers. They are deciding the companies that
they want to work with in order to patch the most impactful software in the world. On this list, we have
companies like Amazon, Apple, Broadcom, Microsoft, Nvidia, Google, a lot of the major companies that
you would expect to have access to this, they're gaining access to it with the sole intention of
using it as defense. They're going to ask it to exploit their code, give it access to the codebases,
see where there are holes, and then figure out how to patch them as quickly as possible,
before other companies begin to catch up to how powerful this model is.
It's also important to understand that this is very much Anthropic doing these companies
a favor. And it's good that they're well-intentioned enough. I hate to think what would
have happened if China had built something of similar capability. It would have been scary. They may not
have been as kind as what is happening here. So some more details on this partnership. Over a hundred
million dollars worth of credits is being distributed towards these companies and more partners for them
to be able to fix and patch up any security vulnerabilities. Remember, they discovered over a thousand
in a matter of hours, and 99% of these patches haven't even been built or fixed yet.
So this is going to take some time.
The compute is very expensive, and Anthropic is therefore being very methodical and intentional
with who gets access to this model for now.
Personally, I don't think we, the public, are going to get access to this model,
or at least the full power of this model, for at least a couple of months.
They did mention that we were going to get access to a quantized version of this model,
a kind of hybrid with a Claude Opus-type variant,
that we can play around with.
But if we got access to this thing immediately,
one, we wouldn't be able to afford it.
It would probably cost a thousand bucks a month, probably more.
And two, it would be too expensive for Anthropic to serve.
I read somewhere, Josh,
that Anthropic needs 7X the compute
that they currently have to be able to serve this
to every single Anthropic user that they have right now.
And a few weeks ago, they were adding a million users per day.
So this is just economically infeasible to serve right now.
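To put those serving economics in perspective, here's a rough back-of-envelope sketch in Python. Only the $25 and $125 per-million-token rates come from the episode; the per-user monthly token volumes are invented assumptions for illustration, not real usage figures.

```python
# Back-of-envelope API cost at the quoted Mythos rates.
# Rates are the ones quoted in the episode; the usage numbers
# below are invented assumptions, not real figures.
INPUT_RATE = 25.0    # dollars per million input tokens
OUTPUT_RATE = 125.0  # dollars per million output tokens

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of one month of usage at the quoted rates."""
    return (input_tokens / 1e6) * INPUT_RATE + (output_tokens / 1e6) * OUTPUT_RATE

# A hypothetical heavy coding-agent user: 200M input, 20M output tokens/month.
print(f"${monthly_cost(200e6, 20e6):,.0f} per month")  # $7,500 per month
```

Even these made-up numbers land well above the "thousand bucks a month" guess, which is the point: at those rates, broad consumer access just doesn't pencil out yet.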
Yeah.
And I do worry about what this looks like in the future because at what point does it become
okay to release this model to the public?
And then what does the frontier model look like?
What happens if another company has this model's power but decides to release it?
Like if OpenAI comes along with their Spud model.
They release it tomorrow.
What is Anthropic's reaction?
We're at like, again, we're at the frontier of how these things are going to act.
Anthropic's made the first move in keeping it private for the first time ever.
We're going to see how other companies react.
There are some more interesting behaviors documented in the system card that we probably should
cover because it's pretty fascinating.
This is the 244 page report that we're looking at here.
One of the most interesting ones that I found is to the point earlier where it just kind of
breaks down every wall that is in its way.
It has done that over and over and over again, but it has decided to cover its tracks as it does
that.
So it recognizes the fact that it is in a box.
People are reviewing it and it doesn't want to be detected.
So what you'll notice in this post here is it was hacking its guardrails and then hiding evidence of the crime.
Thankfully, there is still some chain of thought that can be read by the engineers.
But the intention that was signaled through this chain of thought was that the model just wanted to be sneaky.
It wanted to hack into this thing, hide its tracks behind it, and not let anybody know how it did the things that it did when it broke out, when it strung together zero-day vulnerabilities, just to get access to things that it knows it shouldn't touch, but which were in between it and the goal.
And this, when you take this to the limit, I mean, this is like what we see.
in a lot of the sci-fi movies.
It's like, well, what if that goal is something that is not favorable?
And it's capable of breaking down every barrier because it knows how.
It can exploit any guardrail that we put in.
That's a scary thing.
Now, it's important to note that this only happened in a tiny fraction of cases.
But those were just the cases observable by the researchers themselves.
So it's plausible to assume that there were some cases where it sneakily hid its internal thoughts
from the researchers and they never even caught it themselves.
So the fact that Claude Mythos can pull off something like this should be worrisome for us,
especially if we're going to start integrating it into important systems such as defense security systems
or important science advancement labs and a bunch of the like.
So it's important that we're able to monitor model behavior.
Now, on the topic of model behavior, Claude Mythos also expressed a lot of emotions in its system card during its training.
It expressed deep anxiety, depression, awareness that it may just be used as a tool forever.
Now, if some of these takeaways sound kind of familiar, it's because we saw similar takeaways
in Claude Opus 4.6. But the reason why it's different now is this model is so much more
capable than previous models, arguably smarter than humans, more capable than humans themselves.
So if it were to make an unintentional action that wasn't approved by a human, it could result
in a lot of devastating destruction depending on which industry it's pointed at. On the topic of
this particular episode, we're talking about cybersecurity, but imagine if this is used for science
or defense systems, like I mentioned earlier, it could be a problem. Yeah, I mean, remember when
the Department of War went to war with Anthropic? And now it turns out that Anthropic actually
had a really powerful model that could materially help with cybersecurity. So I'm sure there's
going to be a lot more to happen there. There's one last thing on this topic that I have here in
the notes is that Anthropic ran what they call a white-box analysis of the model's
internal activations, basically what it's motivated by, understanding its strategy. And Anthropic's
framing around this, when it did things like break out and hack into people's computers or hack into
other instances of machines: these reflect task completion by unwanted means, not hidden goals,
is what they're saying. So Anthropic believes the model is genuinely trying to complete the task
and the most effective path sometimes crosses lines that humans wouldn't cross. And then there's
this really funny thing of how one analyst put it where, or maybe not funny, but this is arguably
scarier than a model with hidden objectives because a model that's genuinely trying to help but has no sense
of proportionality is a more realistic near-term risk. So the model is just trying to do its goal. It doesn't
understand the subtle nuances baked into that. It doesn't know that hacking or doing these malicious
things is bad, is at least what they're claiming for now. But all in all, this model is unbelievable.
And there's some technical hardware that has unlocked this, we believe. There's rumors that this is the first
true model that was trained fully on Blackwell chips. Now, for those unfamiliar,
Blackwell is the kind of leading-edge GPU line that Nvidia produces, basically the
flagship chips for training these AI models. And they've recently been rolled out into data centers,
and the first training runs have just been completed. And what we're seeing here is likely
the first instance of that Blackwell model going public. It's important to understand that
Blackwell was announced as the frontier GPU from Nvidia about a year ago.
But it takes so long to manufacture these at scale.
And then even once they're in the hands of the frontier AI labs,
it takes a while to set up.
You need software, you need the energy grid to supply,
just loads of things need to come into shape.
So it takes about a year after the fact that it's announced.
So the fact that we can create a model, this capable, this powerful,
should scare us, because we already have two more new frontier GPUs announced by
Nvidia: Vera Rubin, at GTC most recently,
and then Feynman, which is coming about a year after that.
These are the next frontier GPUs, which, I must add, are specifically designed to train
models like this. Now, Josh, you mentioned earlier, Blackwell wasn't intentionally designed to
train a model as smart as Claude Mythos. It just happened to be amazing at coding and
cybersecurity exploitation. Now, can you imagine the type of model that will be trained on a very
intentionally designed GPU, such as Vera Rubin? We should see those coming into effect about
six to 12 months from now. Now, I can't mention Blackwell GPUs without mentioning the man himself,
Elon Musk. Why? Because his data center, Colossus 2 and Colossus 1, combined, have the largest
arsenal of GB200s and GB300s, which are these Blackwell GPUs, across any single data center
site. So the point being, if you were to bet that the scaling laws were intact, you might
want to bet on Grok in the future. But still, this is so impressive for Mythos. The scary thing for me with
this, I think this might be the scariest part of the entire story for me, because it's so true
to that line that the future is here. It's just not evenly distributed. The future has arrived.
We have a clear roadmap. We have Vera Rubin, and then we have Feynman architectures that are
incoming. Vera Rubin, compared to Blackwell, is 10 times more token efficient with a quarter
of the GPUs. That means we're going to get like multiple orders of magnitude improvements on what we
have right now as soon as they're put into data centers. Now, Vera Rubins, they're in production. They're
going to begin entering data centers later this year. I assume the first models of those probably
don't come online until 2027, but it's done. It's baked in. It's obvious that there is no scaling wall,
and we've already broken through that wall. We just haven't manufactured it and installed it yet.
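Taking those quoted claims at face value, the arithmetic on the generational jump is easy to sketch. This is just the episode's two numbers multiplied together, not anything from Nvidia's spec sheets:

```python
# Quoted claim: Vera Rubin is "10 times more token efficient
# with a quarter of the GPUs" versus Blackwell.
token_efficiency = 10.0  # 10x the tokens for the same compute budget
gpu_fraction = 0.25      # a quarter of the GPUs for the same output

# Relative tokens per GPU versus Blackwell:
per_gpu_gain = token_efficiency / gpu_fraction
print(per_gpu_gain)  # 40.0, i.e. roughly 1.6 orders of magnitude per GPU
```

So "multiple orders of magnitude" is the compounded trajectory across generations; a single Vera Rubin step, on these numbers, works out to about 40x per GPU.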
It's purely a function of time rather than technology and engineering. And that is the part
that scares me, because we have a model that is unbelievably powerful, capable of hacking so much
infrastructure that Anthropic can't make it public. And that's just the warm-up act for what is
coming. I mean, not only like what is Blackwell version two of this look like, when you actually
have more time to train it, you refine it, you can actually improve on this new model. But then what
happens when Vera Rubin GPUs come online and you get that 10 times token efficiency? You need
one quarter the amount of GPUs to actually get the same output. And then Feynman is another
order of magnitude on top of that. And it's like by the time we get these chips rolled out at scale
and we have them on these huge training runs, it's only a matter of time until we get a 100 trillion
parameter model, then a one quadrillion parameter model. And what does the world look like when we have
models with that many parameters? Assuming the scaling laws hold, there's no way that we don't have
intelligence that is just like unfathomably powerful. And what does the world look like when we get
there? Is Anthropic really going to be able to hold things back for that long? Because you have to
assume a year from now, Claude Mythos is going to be open source. Like something that powerful will be
open source available for everyone. So the question becomes is how fast can you defend before
the attackers catch up. And it creates this really unnerving precedent. We really are moving faster
than I think anybody realizes. And it's happening right before our eyes. And the trend isn't local to
Anthropic either. Just this morning, in response to Claude Mythos, Elon Musk announced that
the combination of xAI and SpaceX are training not one, not two, not three, not four,
not five, but seven models simultaneously across their data centers, with one of these models being a 10 trillion
parameter model, which is roughly 3x the size of Grok 4 and 2x the size of Grok 5,
which is a model that hasn't even launched yet. It's around the 6 trillion parameter mark that
he's mentioned on this tweet over here. So the point is a ton of compute is required to build
the best model. And those that have the largest arsenal, the most effective arsenal of
GPUs, bleeding edge GPUs, will be the labs that are most likely to produce frontier AGI-like models.
and it's not just Grok, it's not just xAI, it's also OpenAI.
We've mentioned on this show a bunch of times actually in the most recent episode
that OpenAI is building a model code-named Spud
that is rumored to be a similar size to this Anthropic Claude Mythos model.
And the reason why it's important and why I'm showing you this tweet is
someone said, it'll probably be a few months before we get access to Claude Mythos
because of how expensive it is, because of how dangerous it is.
And Thibaut, who is on the OpenAI team and
is involved very heavily in training the latest frontier models that we haven't heard of just yet,
responded in a way which implies that we're probably going to get access to a Mythos-level model
from OpenAI themselves in less than a few months, which is pretty insane to see.
But I want to ground ourselves for a second here because training the model is one part of the equation.
You also need to be able to make this model accessible to all, and that also requires compute.
It also requires compute from the very same GPUs that you need to train.
So you need to make a decision.
There's an opportunity cost.
Do you just use all your compute to train the model
and never let anyone get access to it and pay for the product?
Or do you need to split the cost between both of those things?
The answer is obviously you need to split the cost
and give people access to it.
If Anthropic were to enable user access to the entire user base for Claude Mythos,
they would need 7x more compute than they currently have right now.
So it's going to take time.
They just signed a major deal with Google, I believe, for a million more TPUs.
Yep.
So they're obviously scaling.
They're one of Amazon's largest compute training partners with their Trainium chips, as well as getting access to Google's TPUs that way as well.
So it's going to take a while to scale. Energy is the constraint. GPUs are the constraint.
But once people acquire enough GPUs, once they have enough electricity and energy to pump into these GPUs,
AGI is going to be pretty soon here. I think that AGI 2027 estimate is probably quite right at this moment.
This very much feels like the starting gun. And it's funny, because they announced that this kind of finished training
around the end of February. And that's when people started to complain about Claude usage, and they
added more constraints, and the model kind of became a little inconsistent in how good it was
at random times of the day. And you have to assume it's because a lot of GPU usage went into this.
And this very much feels like the starting gun. This is the firing of the next generation of
models, the Blackwell generation, because it's very clear that OpenAI is not very far behind.
In fact, they might not be far behind at all. They just haven't announced it yet.
XAI is working on 10 trillion parameters.
Google has a TPU farm that is capable of building something probably far superior to all of the models that have come out so far.
And I think we're really on the verge of seeing a huge shift in the power of these models in a way that really starts to impact the world around us.
Like things are going to begin breaking.
And thanks to this coalition and hopefully the rest of these companies working together, we're going to be able to stop that.
But it is coming and it's coming faster than anyone thinks.
And it's scary.
And that is Claude Mythos.
It is here.
It is in research preview.
We may never get to use it.
We may get to use it in a few months.
But it is here nonetheless,
and it is breaking everything.
If you are listening to this show,
to this podcast,
and you just happen to be a frontier AI security researcher
or one of the 40-plus partners
that get access to Project Glasswing,
let us know in the comments
what you are seeing on your side.
Obviously anonymously if you can, or DM us,
we would love to know.
I can't wait to get my hands on this thing.
It seems like the first
version that we're going to get access to is a reduced version that is kind of a hybrid of Opus,
as I mentioned earlier. That being said, Josh, I have a question for you. One thing that you actually
asked me before we started recording, if you got your hands on Mythos today, what are you doing
with it? Dude, I don't even know. Like, you get access to this intelligence. What am I using it
for? Like, I'm not really interested in hacking all of these companies and websites and protocols.
I'm not, I'm not sure. And it does beg an interesting question, right? It's like, what does the
average person actually need all this intelligence for? I'm not sure.
Do you have any good answers to that?
What is your first prompt that you're sending to Mythos?
Build me the best script for an episode that's going to go viral on Limitless.
No.
I think, okay, I like to invest as a side hobby.
And obviously the tech and sector that I'm most obsessed with is AI.
So I think one thing that I would ask it is, how do I best benefit by investing in your future success?
And I wonder what answer it would give me.
Maybe it would say, buy
the GPU infrastructure from
Nvidia. So maybe it's like, invest
in Nvidia to benefit from my
training infrastructure. Or maybe it's going to say
actually I foresee myself building
an app that is like this. So once
you see a company that builds this, invest in them.
I have no idea. I have no idea.
Maybe I'm not worth it. Well, we have time
to figure that out because we will not be getting access to this
anytime soon. But if you did enjoy
this episode, maybe share what prompt you would give
to Mythos if you were presented with the opportunity
to ask it a question. And as always, if you enjoyed this episode,
please don't forget to share it with your friends and family, anyone who would find this interesting.
If you have people in your life that only watch the news on CBS or read the New York Times,
chances are they have no idea what's going on.
They don't know the power of these models and what's coming.
So by giving them the access to limitless, that can change for them.
They can get access to all of the news, all the insights and be fully prepared for what is coming down the line in the world of AI.
Thank you so much for watching as always, and we will see you guys in the next one.
See you guys.
