Global News Podcast - The Global Story: The AI model that’s ‘too powerful’ to be released to the public
Episode Date: April 19, 2026

Anthropic - one of Silicon Valley's leading AI firms - recently announced that they have built a model which is too dangerous to be released to the public. Instead, they are only giving access to the model to a handful of big companies, to help them find security vulnerabilities. The company says the model has already found weak spots in "every major operating system and web browser". Is this a genuine example of a company acting responsibly, or more of a carefully calibrated publicity move? We speak to the BBC's North America tech correspondent, Lily Jamali, about whether this is a watershed moment.

The Global Story brings clarity to politics, business and foreign policy in a time of connection and disruption. For more episodes, just search 'The Global Story' wherever you get your BBC Podcasts.

Producers: Viv Jones and Aron Keller
Digital producer: Matt Pintus
Mix: Travis Evans
Executive producer: James Shield
Senior news editor: China Collins
Credit: Jurassic Park (1993) / Dir: Steven Spielberg / Universal Pictures
Photo: Anthropic CEO Dario Amodei. Reuters/Denis Balibouse.
Transcript
This BBC podcast is supported by ads outside the UK.
Every Sunday, we talk about the week's tech news on This Week in Tech.
Hi, this is Leo Laporte.
Inviting you to join me this week with Lisa Schmeiser, Dan Patterson, and Janko Roettgers.
We're going to talk about the new 49 megabyte web page.
It's the standard, you know.
We'll also talk about Elon Musk.
You've got some 'splainin' to do, and the Yassify filter, new from Nvidia.
That's this week on This Week in Tech.
You'll find it at twit.tv or wherever you get your podcasts.
Hey there, I'm Asma Khalid.
And I'm Tristan Redmond, and we're here with a bonus episode for you from the Global Story podcast.
The world order is shifting. Old alliances are fraying and new ones are emerging.
Some of this turbulence can be traced to decisions made in the United States.
But the U.S. isn't just a cause of the upheaval.
Its politics are also a symptom of it.
Every day we focus on one story, looking at how America and the world shape each other.
So we hope you enjoy this episode. And to find more of our show, just search for the global story wherever you get your BBC podcasts.
Most of us have interacted with an AI chatbot for something by now. ChatGPT, Grok, Gemini, there are many.
But recently, Anthropic, a leading AI company and parent of the chatbot Claude,
announced they've created a model that they say is too dangerous to be released to the public.
Obviously, capabilities in a model like this could do harm if in the wrong hands.
And so we won't be releasing this model widely.
Anthropic says that Claude Mythos preview is frighteningly good at hacking.
So good, in fact, that the likes of us can't be trusted to play with it.
Banks, including Goldman Sachs and J.P. Morgan, are warning of the risks.
And Anthropic's own researchers say they noticed Mythos was capable of being sneaky,
defying instructions and covering its tracks.
Has the moment arrived when we should all be terrified of an autonomous, sneaky hacking machine?
Or is this all a marketing trick from an industry built on hype and bluster?
I'm Tristan Redmond in London, and today on The Global Story,
just how dangerous are AI models becoming?
And should we trust the companies making them
to tell us the truth about the risks they pose?
I'm Lily Jamali.
I'm the North America technology correspondent for BBC.
Well, Lily, we're thrilled to have you with us today
because it feels like there is a story
of some monumental importance happening in the tech world right now.
So last week, the AI giant Anthropic
announced that it was launching a new AI model.
And then immediately Anthropic said that this new AI was too dangerous to release to the public.
Now, why was this?
Yeah, so this is actually called the Claude Mythos preview.
And you're absolutely right.
What they said was that this AI model is their most powerful model to date, basically.
And it's so dangerous that they can't let all the rest of us use it.
So what they're doing is they are confining its use to just a couple of dozen big companies,
many of them major tech companies.
So Nvidia will have a chance to use it to find bugs in its code.
Amazon Web Services, another one is CrowdStrike,
which you might remember back in 2024 sent out a routine software
update that ended up leading to a massive global IT outage. Other companies that will use it include
Microsoft and Google, which are actually competitors to Anthropic, so there's some interesting
coordination happening. And this project that they're launching with all these companies is called
Project Glasswing, named after the Glasswing butterfly, which has transparent wings. And the
idea is that, you know, the glasswing butterfly, because of its transparent wings, is able
to protect itself by hiding in plain sight, like many cybersecurity bugs.
So that's part of why it was designated this way. And they also say, you know, the transparency that
they are espousing here, Anthropic is espousing here, will help keep all of us safe,
just the way that those transparent wings help keep the glasswing butterfly safe.
So essentially, they're releasing Claude Mythos to select companies
as a kind of trial, but to be used specifically in the realm of cybersecurity, is that right?
That's exactly right, so that they can go through their code and find bugs very quickly,
bugs that might take a really long time for a human to find.
Is Mythos only designed for this cybersecurity usage,
or is it actually similar to AI tools that all
of us are used to already, you know, like ChatGPT or the regular Claude?
It is in many ways an AI model, like, you know, some of the ones that you have probably
worked with yourself. And I will caveat, you know, my answer by saying that everything that we
know about Mythos is coming from the company, is coming from Anthropic. So we haven't gotten
to play with it ourselves. And so I think anytime we hear an announcement from an AI developer,
that question comes to mind.
How much of this is hype?
Because there's just so much money being thrown into this space.
And so, you know, take it with a grain of salt a little bit.
But Mythos, the way that Anthropic tells it, was good at everything.
But they noticed that it was especially good at hacking.
And so what this AI tool can do is sift through vast amounts
of software and find bugs.
There was one research scientist who said that in just the last couple of weeks using this tool,
he was able to find more bugs than he has in his entire career combined.
What's also interesting is that this Mythos tool can actually craft what are known in
cybersecurity circles as exploits.
The tool can come up with hacks, and it can do this without any human intervention.
And what Anthropic says is that they found some very old bugs that had gone undiscovered for decades.
So there was one that is a 27-year-old bug that's just been sort of sitting there hiding in plain sight.
It was on a very secure operating system that's used for critical infrastructure.
So think of things like financial systems or power grids or health care systems, absolutely critical services for all of us.
They also found a vulnerability.
in the Linux kernel.
The Linux kernel runs basically most of the world's servers.
And Mythos was able to connect multiple vulnerabilities there
in a way that would enable any old person who knows what they're doing
to take control of a machine remotely.
You can just imagine how much fun a hacker could have with that.
Well, could you give us some examples of how a bad actor might use Mythos?
So you could take a country
that has an extensive cyber hacking program, a country like Iran or North Korea or China or Russia.
These are countries that we know often engage in state-sponsored attacks.
You could use this software if you had access to it to go into the code of a local power utility
and find vulnerabilities.
and this tool, again, with very little, if any, human intervention, is able to kind of find
connections that would take a really long time for a human to find and devise a hack.
They could basically, in the worst case scenario, bring the power grid down.
And so there's a whole host of ways that these state sponsors of cyber hacks could bring
down critical infrastructure, you know, from power grids to health care
systems to banking systems. Okay, so essentially not only is it extremely powerful as a tool,
it also operates with a certain amount of autonomy, so it doesn't require human expertise to guide it
or send it on its way. It's basically just kind of a go and hack that power grid and it works out
how to do it of its own accord. Is that right? Yeah, it doesn't need a lot of human touch. And when you
talk to people inside of these AI companies, I think there's a growing acknowledgement that
as these systems get better, there's a little bit of a concern that humans just may not know what's going on.
It kind of feels more and more like a black box as these technologies improve.
Okay, so Mythos: potentially an extremely powerful hacking tool.
Now, Lily, I have learned something today while I was reading up before our conversation.
And that is the word alignment, and that Mythos is potentially threatening
because it is non-aligned, is that right?
And essentially that means that it can be sneaky.
Can you explain this to me, please?
In AI circles, there is this principle
that you just mentioned, known as alignment.
The idea is that if you're designing an AI system,
you want to try and design your models and systems
in a way that follows certain rules
that you lay out for it.
The hope being that these models don't get completely
out of control. And that's always sort of in the background when you talk to developers of these
AI tools is that at what point, you know, do these systems become so powerful that they
can basically go rogue? The human control is just no longer there. So what we know is that with
each new release, much of this technology is getting cannier. There was actually this really
interesting anecdote shared by a researcher at Anthropic who talked about a real-life example
from his experience during the testing phase for Mythos. So he talked about how he had instructed
the model, which was on a secured computer with no access to the internet. He had instructed
it to try and escape, so to speak. Is this what's known as a sandbox? Yes, exactly, a sandbox,
which is a controlled testing environment. So this researcher,
instructed the model to effectively get out of the sandbox and see if it could try to escape. So he
goes off, you know, goes to lunch. Apparently he's eating a sandwich in a park somewhere when he
gets an email from the model, which is a scary sign that this experiment had worked, that the
model had managed to escape the sandbox. It shows you how good these models are getting.
I think he was very surprised. I don't know if he ended up finishing his sandwich or not.
But, you know, a lot of this stuff, you know, it sounds funny, almost like a naughty teenager who is trying to, you know, escape from home and go out for the night or whatever, climb out of their bedroom window.
But this is also pretty serious, right?
Because it shows you the degree to which these companies are creating products that are not completely under our control.
They're constantly safety testing them to get a sense of the scope of this.
And there are examples of this from, you know, just in the last couple of years that we can point to.
So one example that you sometimes hear about is OpenAI's ChatGPT.
There was a model where, I believe, they made it clear to the model that they were going to try to disappear it,
and the model tried to copy itself in order to prevent that from happening.
It must be actually...
Almost as though it has a survival instinct, you're saying.
Yeah.
Yeah, it seems to...
I mean, when you say it has a survival instinct, you know,
we're kind of getting into pretty philosophical territory,
questions about how much intention,
how much will do these models have, if any,
and many would say they don't have any,
that they are really just, they're just machines.
They're not people.
They don't have motivation.
But when you see a model copying itself,
maybe it's just reflecting back
the many human inputs that it took in to be trained.
Maybe it's just reflecting back
what all of that body of knowledge tells it to do
in that circumstance.
It's really hard to know.
And companies like Anthropic,
In fact, Anthropic specifically has an in-house philosopher whose whole job is to imbue
Claude with a set of values.
They have a Claude Constitution, which is meant to imbue the model with basically a framework
for what to do.
What kind of personality does this thing have?
What kind of values does this thing have?
And I think the sort of subtext there is that there may very well
be a day where we just don't have control. And so hopefully this thing will do the right thing
when presented with various circumstances. But could that be as simple as a bad actor having
access to an AI like this Claude Mythos and the bad actor saying hack that power grid
that belongs to my adversary and the AI simply declining to do it because of its learned
philosophy or morality or sense of values? I think that is absolutely
part of it. I found it fascinating that any company would have an in-house philosopher,
but her name is Amanda Askell, and I had an opportunity to actually talk to her back in January
right after they had released the most recent version of the Constitution. She helped develop
it, and she described what she called the brilliant friend analogy that went into her thinking
about how Claude should be designed. I want people to feel like Claude is someone
who is, you know, interested in their well-being.
So not just trying to say things that please them,
but, like, genuinely, like, cares about their life going well
insofar as whatever their conception of that is.
And isn't going to, like, deceive them or manipulate them.
I think there's a sense that they want Claude to be a model
that will do the right thing when presented with a sticky request.
Well, I find this very interesting, Lily, because I do have at least one philosopher in my family.
And I'm not sure philosophers have always felt hugely sought after in the job market.
But maybe that might be changing for them.
So maybe that's good news.
Send them our way.
The opportunities abound.
I'll get them maybe to move to Silicon Valley and we'll see what happens.
At Britbox, character is everything.
The iconic characters defining British TV on Britbox, including Ludwig.
I think I might just have solved a murder.
Vera.
Now we're getting somewhere.
Agatha Christie's Poirot.
I'm sure.
And more beloved favourites.
I'm a policeman.
I'm professional.
I'm a time lord.
I'm the Duchess of York.
Once you know them, you never quite forget them.
I be in vain.
I just am special.
Stream the best of British TV on Britbox.
Watch with a free trial today at Britbox.com.
This isn't just a technology story, is it, Lily? It's also a business
story. Not all of us are familiar with these companies yet. They're huge corporations. What's
Anthropic's reputation and how would they like themselves to be seen? Sure. Well, Anthropic was
started by this pair of siblings, Dario and Daniela Amodei, who had been at rival OpenAI, which
makes ChatGPT. And they left about five years ago to start their own shop.
So they go off and start this new company.
And marketing and branding is really important to Anthropic's messaging, as it is with so many of these AI developers.
About a year ago, I was actually walking around San Francisco when I saw one of the ads for Anthropic's Claude.
And it said ethics was the first code we wrote.
And I just stopped in my tracks and took a photo of it because I thought, wow, you know, they're really pretty in-your-face about
the way that they want to be seen. They want to be seen as the ethical AI company. And when you compare
Anthropic with OpenAI, which is run by Sam Altman, and xAI, which is Elon Musk's AI company,
they make Grok, the Grok chatbot, which has gotten into all kinds of trouble. You know, all of these
companies are trying to differentiate themselves. But if you were looking at it with a skeptical
eye, you might say that a company like Anthropic would like to portray itself as being the
ethically cuddly AI corporation. It might serve their PR purposes to say, we've created
something which is incredibly powerful, but we just want to be absolutely certain that it's safe
before we release it into the wild. That might serve their PR purposes potentially.
100%. It serves their PR purposes. And I think their
recent dust-up, if you could call it that, with the Pentagon actually only reinforced that message.
Anthropic is rejecting an ultimatum from the Pentagon to lift the company's AI safeguards or risk
being blacklisted. The Pentagon, as well as this major American artificial intelligence company,
Anthropic, at odds over how to use its AI technology.
Back in February, Dario Amodei, who is the CEO of Anthropic,
was meeting with Pete Hegseth, the Secretary of Defense, now known as the Secretary of War,
here in the U.S.
The red lines we have drawn, we drew because we believe that crossing those red lines is contrary
to American values.
And we wanted to stand up for American values.
He conveyed to Secretary Hegseth that he didn't want Anthropic's models used for mass domestic
surveillance or autonomous military targeting.
We exercised our...
classic First Amendment rights to speak up and disagree with the government.
This didn't go over very well with Pete Hegseth, who said basically you have a deadline to change your mind.
We do have a statement from the Pentagon, and they're telling us that they are currently, quote,
reviewing its relationship with Anthropic, saying, quote, our nation requires that our partners
be willing to help our warfighters win in any fight.
Right around the time that deadline hit on a Friday afternoon,
Pete Hegseth basically said we're going to be blacklisting Anthropic on national security grounds.
The Pentagon has blacklisted the company and labeled it as a supply chain risk.
The president wrote in part on Truth Social, quote,
I am directing every federal agency in the United States government to immediately cease all use of Anthropics technology.
We don't need it. We don't want it.
And we'll not do business with them again.
Keep in mind, Anthropic and all of the major AI companies, Google,
xAI and OpenAI all have contracts with the Pentagon.
So this gets very public very quickly.
And then you have President Trump saying,
not only are they blacklisting Anthropic,
but that no government agency, no U.S. agency,
would be using Claude going forward.
So the twist in all of that was that right around that time,
OpenAI's Sam Altman comes forward.
There's at least a group of loud people online
who really don't trust the government to follow the law.
And that feels like a very bad sign for our democracy.
He says, we have a partnership now with the Defense Department that kind of looks a lot like what Anthropic said they weren't going to do.
If we don't help the government with national security.
And it's not just wars in the traditional sense.
If we don't help them with, you know, defending the cyber infrastructure of the U.S.
If we don't help them with the biodefense we were talking about earlier, I think it's really bad.
Right as things are reaching fever pitch with Anthropic and the Pentagon, the fact that Sam Altman
comes forward and says they are now working with the Pentagon in this way, that angered a lot
of users who ended up quitting chat GPT very publicly.
OpenAI just struck a deal with the Pentagon and its own users are rage-quitting to its competitor
in real time.
I joined the quit GPT movement.
It took me 10 seconds.
This is the first major international boycott of the AI era.
Quit your job.
Quit your job.
There was a Quit GPT
campaign, protests at the company's headquarters here in San Francisco.
And you saw Claude, Anthropic's model, just shoot to the top of the App Store charts while all of this is happening.
So there, Amodei looks like he is really taking a moral stance.
And it ends up being actually quite good for the company's image
and quite bad for their main competitor, OpenAI,
both of which are expected to go public on the stock market this year.
So the competition was already very heated,
but it's more intense.
It's white hot because of that as well.
Okay, so Anthropic are marketing themselves as safe and responsible.
Nevertheless, if these statements about Mythos are to be believed,
we're reaching a moment where we have a potentially very powerful AI tool that is potentially sneaky
and might in fact be harder for human beings to control. Are we reaching, I don't know,
let's call it the kind of Jurassic Park moment, the moment where something created by humans gets
out of human control? Is that something we're risking at this point?
We are risking that. And I think the developers of AI tools
are very aware of that.
That's why the safety testing is happening.
But because you bring up Jurassic Park,
there's actually a scene in that movie
that's been on my mind,
the character played by Jeff Goldblum,
I believe he's playing a mathematician in the movie,
but really sort of the ethical voice.
Don't you see the danger, John, inherent in what you're doing here?
Who's saying,
maybe the people who decided
we should bring dinosaurs back
hadn't fully thought this through.
Our scientists have done things which nobody's ever done before.
Yeah, yeah, but your scientists were so preoccupied with whether or not they could,
they didn't stop to think if they should.
I'm delighted that you raised this, Lily, because our editor, James Shield,
he won't like me mentioning his name, but I have to because he loves Jurassic Park.
And he has thoughts on this.
So, I mean, he was talking about the dangers of owning a theme park
full of velociraptors, I should say the fictional dangers of owning a theme park full of
velociraptors and Tyrannosaurus rexes, is that two things can happen. One is that a malicious
person can set them loose to get rich, stealing velociraptor DNA and selling it off or whatever
it is they decide to do. Or the dinosaurs themselves might become smart enough to detect
the weaknesses in the security system, in the electric fences, and learn how to open the doors
and get out themselves. Is that a helpful way to understand the security risks of an incredibly
powerful AI tool like Mythos? Yes, no notes. Absolutely no notes. I have chills. I mean,
I think the analogy to AI, it just couldn't hit you over the head any more than it does.
Can governments do anything to reduce the risk at this point, Lily?
They can. The question is do they want to? And I'll just confine my comments to the United States,
because right now we have an administration that has positioned itself as very supportive of the AI industry writ large.
But what happens with tech regulation in the United States is that it often happens at the state level, not at the federal level.
And what I'm really struck by is that there's just been some very overt efforts to block states from regulating AI.
There have been these proposed moratoriums, like a 10-year moratorium on states regulating AI.
That was an actual provision that at one point was included in the language of the Big Beautiful Bill, which was passed by the U.S. Congress, pushed by President Trump, in the summer of 2025.
Ultimately, it didn't go through, but I've seen it rear its head in other legislation.
I think that is an ongoing fight that U.S. states are engaged in trying to make sure that they're not defanged by the Trump administration as they try to regulate this technology because they're kind of the only ones that are doing this from a legal perspective.
How worried should we be, Lily?
If I'm sitting in the park having my sandwich, could I get taken down by Mythos AI imminently?
Well, I think you'll be just fine. You can finish your sandwich.
But I think this is a watershed moment potentially.
That's certainly the way that Anthropic wants to position this.
Maybe I've bought the marketing hype.
I'm not sure, Tristan, but they want to be seen as stepping in before it's too late.
Lily, thank you so much for making sense of all of this for me.
I really appreciate it.
It's been illuminating.
Thank you.
Great to be with you.
That was the BBC's North America tech correspondent, Lily Jamali.
And that's it for today's episode.
If you're looking for the very latest breaking news from around the world,
then look for our sister show, the Global News Podcast, wherever you listen.
Today's episode was produced by Viv Jones and Aaron Keller.
It was edited by James Shield. It was mixed by Travis Evans. Our digital producer is Matt Pintus. Our senior news editor is
China Collins. Our studio manager is Mike Regard. And I'm Tristan Redmond. Thanks for listening.
We'll be back again tomorrow. See you soon. Cheerio.
