Global News Podcast - The Global Story: The AI model that’s ‘too powerful’ to be released to the public
Episode Date: April 19, 2026

Anthropic - one of Silicon Valley's leading AI firms - recently announced that they have built a model which is too dangerous to be released to the public. Instead, they are only giving access to the model to a handful of big companies, to help them find security vulnerabilities. The company says the model has already found weak spots in "every major operating system and web browser". Is this a genuine example of a company acting responsibly, or more of a carefully calibrated publicity move? We speak to the BBC's North America tech correspondent, Lily Jamali, about whether this is a watershed moment.

The Global Story brings clarity to politics, business and foreign policy in a time of connection and disruption. For more episodes, just search 'The Global Story' wherever you get your BBC Podcasts.

Producers: Viv Jones and Aron Keller
Digital producer: Matt Pintus
Mix: Travis Evans
Executive producer: James Shield
Senior news editor: China Collins
Credit: Jurassic Park (1993) / Dir: Steven Spielberg / Universal Pictures
Photo: Anthropic CEO Dario Amodei. Reuters/Denis Balibouse.
Transcript
This BBC podcast is supported by ads outside the UK.
Every Sunday, we talk about the week's tech news on This Week in Tech.
Hi, this is Leo Laporte.
Inviting you to join me this week with Lisa Schmeiser, Dan Patterson, and Janko Roettgers.
We're going to talk about the new 49 megabyte web page.
It's the standard, you know.
We'll also talk about Elon Musk.
You've got some 'splainin' to do, and the Yassify filter, new from Nvidia.
That's this week on This Week in Tech.
You'll find it at twit.tv or wherever you get your podcasts.
Hey there, I'm Asma Khalid.
And I'm Tristan Redmond, and we're here with a bonus episode for you from the Global Story podcast.
The world order is shifting. Old alliances are fraying and new ones are emerging.
Some of this turbulence can be traced to decisions made in the United States.
But the U.S. isn't just a cause of the upheaval.
Its politics are also a symptom of it.
Every day we focus on one story, looking at how America and the world shape each other.
So we hope you enjoy this episode. And to find more of our show, just search for the global story wherever you get your BBC podcasts.
Most of us have interacted with an AI chatbot for something by now. ChatGPT, Grok, Gemini, there are many.
But recently, Anthropic, a leading AI company and parent of the chatbot Claude,
announced they've created a model that they say is too dangerous to be released to the public.
Obviously, capabilities in a model like this could do harm if in the wrong hands.
And so we won't be releasing this model widely.
Anthropic says that Claude Mythos preview is frighteningly good at hacking.
So good, in fact, that the likes of us can't be trusted to play with it.
Banks, including Goldman Sachs and J.P. Morgan, are warning of the risks.
And Anthropic's own researchers say they noticed Mythos was capable of being sneaky,
defying instructions and covering its tracks.
Has the moment arrived when we should all be terrified of an autonomous, sneaky hacking machine?
Or is this all a marketing trick from an industry built on hype and bluster?
I'm Tristan Redmond in London, and today on The Global Story,
just how dangerous are AI models becoming?
And should we trust the companies making them
to tell us the truth about the risks they pose?
I'm Lily Jamali.
I'm the North America technology correspondent for BBC.
Well, Lily, we're thrilled to have you with us today
because it feels like there is a story
of some monumental importance happening in the tech world right now.
So last week, the AI giant Anthropic
announced that it was launching a new AI model.
And then immediately Anthropic said that this new AI was too dangerous to release to the public.
Now, why was this?
Yeah, so this is actually called the Claude Mythos preview.
And you're absolutely right.
What they said was that this AI model is their most powerful model to date, basically.
And it's so dangerous that they can't let all the rest of us use it.
So what they're doing is they are confining its use to just a couple of dozen big companies,
many of them major tech companies.
So Nvidia will have a chance to use it to find bugs in its code.
Amazon Web Services, another one is CrowdStrike,
which you might remember back in 2024 sent out a routine software
update that ended up leading to a massive global IT outage. Other companies that will use it include
Microsoft and Google, which are actually competitors to Anthropic, so there's some interesting
coordination happening. And this project that they're launching with all these companies is called
Project Glasswing, named after the Glasswing butterfly, which has transparent wings. And the
idea is that, you know, the glasswing butterfly, because of its transparent wings, is able
to protect itself by hiding in plain sight, like many cybersecurity bugs.
So that's part of why it was designated this way. And they also say, you know, the transparency that
they are espousing here, Anthropic is espousing here, will help keep all of us safe,
just the way that those transparent wings help keep the glasswing butterfly safe.
So essentially, they're releasing Claude Mythos to select companies
as a kind of trial, but to be used specifically in the realm of cybersecurity, is that right?
That's exactly right, so that they can go through their code and find bugs very quickly,
bugs that might take a really long time for a human to find.
Is Mythos only designed for this cybersecurity usage,
or is it actually similar to AI tools that all
of us are used to already, you know, like ChatGPT or the regular Claude?
It is in many ways an AI model, like, you know, some of the ones that you have probably
worked with yourself. And I will caveat, you know, my answer by saying that everything that we
know about Mythos is coming from the company, is coming from Anthropic. So we haven't gotten
to play with it ourselves. And so I think anytime we hear an announcement from an AI developer,
that question comes to mind.
How much of this is hype?
Because there's just so much money being thrown into this space.
And so, you know, take it with a grain of salt a little bit.
But Mythos, the way that Anthropic tells it, was good at everything.
But they noticed that it was especially good at hacking.
And so what this AI tool can do is sift through vast amounts
of software and find bugs.
There was one research scientist who said that in just the last couple of weeks using this tool,
he was able to find more bugs than he has in his entire career combined.
What's also interesting is that this Mythos tool can actually craft what are known in
cybersecurity circles as exploits.
The tool can come up with hacks, and it can do this without any human intervention.
And what Anthropic says is that they found some very old bugs that had gone undiscovered for decades.
So there was one that is a 27-year-old bug that's just been sort of sitting there hiding in plain sight.
It was on a very secure operating system that's used for critical infrastructure.
So think of things like financial systems or power grids or health care systems, absolutely critical services for all of us.
They also found a vulnerability.
in the Linux kernel.
The Linux kernel runs basically most of the world's servers.
And Mythos was able to connect multiple vulnerabilities there
in a way that would enable any old person who knows what they're doing
to take control of a machine remotely.
You can just imagine how much fun a hacker could have with that.
Well, could you give us some examples of how a bad actor might use Mythos?
So you could take a country
that has an extensive cyber hacking program, a country like Iran or North Korea or China or Russia.
These are countries that we know often engage in state-sponsored attacks.
You could use this software if you had access to it to go into the code of a local power utility
and find vulnerabilities.
and this tool, again, with very little, if any, human intervention, is able to kind of find
connections that would take a really long time for a human to find and devise a hack.
They could basically, in the worst case scenario, bring the power grid down.
And so there's a whole host of ways that these state sponsors of cyber hacks could bring
down critical infrastructure, you know, from power grids to health care
systems to banking systems. Okay, so essentially not only is it extremely powerful as a tool,
it also operates with a certain amount of autonomy, so it doesn't require human expertise to guide it
or send it on its way. It's basically just kind of a go and hack that power grid and it works out
how to do it of its own accord. Is that right? Yeah, it doesn't need a lot of human touch. And when you
talk to people inside of these AI companies, I think there's a growing acknowledgement that
as these systems get better, there's a little bit of a concern that humans just may not know what's going on.
It kind of feels more and more like a black box as these technologies improve.
Okay, so Mythos: potentially an extremely powerful hacking tool.
Now, Lily, I have learned something today while I was reading up before our conversation.
And that is the word alignment, and that Mythos is potentially threatening
because it is non-aligned, is that right?
And essentially that means that it can be sneaky.
Can you explain this to me, please?
In AI circles, there is this principle
that you just mentioned, known as alignment.
The idea is that if you're designing an AI system,
you want to try and design your models and systems
in a way that follows certain rules
that you lay out for it.
The hope being that these models don't get completely
out of control. And that's always sort of in the background when you talk to developers of these
AI tools is that at what point, you know, do these systems become so powerful that they
can basically go rogue? The human control is just no longer there. So what we know is that with
each new release, much of this technology is getting cannier. There was actually this really
interesting anecdote shared by a researcher at Anthropic who talked about a real-life example
from his experience during the testing phase for Mythos. So he talked about how he had instructed
the model, which was on a secured computer with no access to the internet. He had instructed
it to try and escape, so to speak. Is this what's known as a sandbox? Yes, exactly, a sandbox,
which is a controlled testing environment. So this researcher,
instructed the model to effectively get out of the sandbox and see if it could try to escape. So he
goes off, you know, goes to lunch. Apparently he's eating a sandwich in a park somewhere when he
gets an email from the model, which is a scary sign that this experiment had worked, that the
model had managed to escape the sandbox. It shows you how good these models are getting.
I think he was very surprised. I don't know if he ended up finishing his sandwich or not.
But, you know, a lot of this stuff, you know, it sounds funny, almost like a naughty teenager who is trying to, you know, escape from home and go out for the night or whatever, climb out of their bedroom window.
But this is also pretty serious, right?
Because it shows you the degree to which these companies are creating products that are not completely under our control.
They're constantly safety testing them to get a sense of the scope of this.
And there are examples of this from, you know, just in the last couple of years that we can point to.
So one example that you sometimes hear about is OpenAI's ChatGPT.
There was a model where, I believe, they made it clear to the model that they were going to try to disappear it,
and the model tried to copy itself in order to prevent that from happening.
It must be actually...
Almost as though it has a survival instinct, you're saying.
Yeah.
Yeah, it seems to...
I mean, when you say it has a survival instinct, you know,
we're kind of getting into pretty philosophical territory,
questions about how much intention,
how much will do these models have, if any,
and many would say they don't have any,
that they are really just, they're just machines.
They're not people.
They don't have motivation.
But when you see a model copying itself,
maybe it's just reflecting back
the many human inputs that it took in to be trained.
Maybe it's just reflecting back
what all of that body of knowledge tells it to do
in that circumstance.
It's really hard to know.
And companies like Anthropic,
In fact, Anthropic specifically has an in-house philosopher whose whole job is to imbue
Claude with a set of values.
They have a Claude Constitution, which is meant to imbue the model with basically a framework
for what to do.
What kind of personality does this thing have?
What kind of values does this thing have?
And I think the sort of subtext there is that there may very well
be a day where we just don't have control. And so hopefully this thing will do the right thing
when presented with various circumstances. But could that be as simple as a bad actor having
access to an AI like this Claude Mythos and the bad actor saying hack that power grid
that belongs to my adversary and the AI simply declining to do it because of its learned
philosophy or morality or sense of values? I think that is absolutely
part of it. I found it fascinating that any company would have an in-house philosopher,
but her name is Amanda Askell, and I had an opportunity to actually talk to her back in January
right after they had released the most recent version of the Constitution. She helped develop
it, and she described what she called the brilliant friend analogy that went into her thinking
about how Claude should be designed. I want people to feel like Claude is someone
who is, you know, interested in their well-being.
So not just trying to say things that please them,
but, like, genuinely, like, cares about their life going well
insofar as whatever their conception of that is.
And isn't going to, like, deceive them or manipulate them.
I think there's a sense that they want Claude to be a model
that will do the right thing when presented with a sticky request.
Well, I find this very interesting, Lily, because I do have at least one philosopher in my family.
And I'm not sure philosophers have always felt hugely sought after in the job market.
But maybe that might be changing for them.
So maybe that's good news.
Send them our way.
The opportunities abound.
I'll get them maybe to move to Silicon Valley and we'll see what happens.
At Britbox, character is everything.
The iconic characters defining British TV on Britbox, including Ludwig.
I think I might just have solved a murder.
Vera.
Now we're getting somewhere.
Agatha Christie's Poirot.
I'm sure.
And more beloved favourites.
I'm a policeman.
I'm professional.
I'm a time lord.
I'm the Duchess of York.
Once you know them, you never quite forget them.
I be in vain.
I just am special.
Stream the best of British TV on Britbox.
Watch with a free trial today at Britbox.com.
This isn't just a technology story, is it, Lily? It's also a business
story. Not all of us are familiar with these companies yet. They're huge corporations. What's
Anthropic's reputation and how would they like themselves to be seen? Sure. Well, Anthropic was
started by this pair of siblings, Dario and Daniela Amodei, who had been at rival OpenAI, which
makes ChatGPT. And they left about five years ago to start their own shop.
So they go off and start this new company.
And marketing and branding is really important to Anthropic's messaging, as it is with so many of these AI developers.
About a year ago, I was actually walking around San Francisco when I saw one of the ads for Anthropic's Claude.
And it said ethics was the first code we wrote.
And I just stopped in my tracks and took a photo of it because I thought, wow, you know, they're really pretty in-your-face about
the way that they want to be seen. They want to be seen as the ethical AI company. And when you compare
Anthropic with OpenAI, which is run by Sam Altman, and xAI, which is Elon Musk's AI company,
they make Grok, the Grok chatbot, which has gotten into all kinds of trouble. You know, all of these
companies are trying to differentiate themselves. But if you were looking at it with a skeptical
eye, you might say that a company like Anthropic would like to portray itself as being the
ethically cuddly AI corporation. It might serve their PR purposes to say, we've created
something which is incredibly powerful, but we just want to be absolutely certain that it's safe
before we release it into the wild. That might serve their PR purposes potentially.
100%. It serves their PR purposes. And I think their
recent dust-up, if you could call it that, with the Pentagon actually only reinforced that message.
Anthropic is rejecting an ultimatum from the Pentagon to lift the company's AI safeguards or risk
being blacklisted. The Pentagon, as well as this major American artificial intelligence company,
Anthropic, at odds over how to use its AI technology.
Back in February, Dario Amodei, who is the CEO of Anthropic,
was meeting with Pete Hegseth, the Secretary of Defense, now known as the Secretary of War,
here in the U.S.
The red lines we have drawn, we drew because we believe that crossing those red lines is contrary
to American values.
And we wanted to stand up for American values.
He conveyed to Secretary Hegseth that he didn't want Anthropic's models used for mass domestic
surveillance or autonomous military targeting.
We exercised our...
classic First Amendment rights to speak up and disagree with the government.
This didn't go over very well with Pete Hegseth, who said basically you have a deadline to change your mind.
We do have a statement from the Pentagon, and they're telling us that they are currently, quote,
reviewing its relationship with Anthropic, saying, quote, our nation requires that our partners
be willing to help our warfighters win in any fight.
Right around the time that deadline hit on a Friday afternoon,
Pete Hegseth basically said we're going to be blacklisting Anthropic on national security grounds.
The Pentagon has blacklisted the company and labeled it as a supply chain risk.
The president wrote in part on Truth Social, quote,
I am directing every federal agency in the United States government to immediately cease all use of Anthropics technology.
We don't need it. We don't want it.
And we'll not do business with them again.
Keep in mind, Anthropic and all of the major AI companies, Google,
xAI and OpenAI all have contracts with the Pentagon.
So this gets very public very quickly.
And then you have President Trump saying,
not only are they blacklisting Anthropic,
but that no government agency, no U.S. agency,
would be using Claude going forward.
So the twist in all of that was that right around that time,
OpenAI's Sam Altman comes forward.
There's at least a group of loud people online
who really don't trust the government to follow the law.
And that feels like a very bad sign for our democracy.
He says, we have a partnership now with the Defense Department that kind of looks a lot like what Anthropic said they weren't going to do.
If we don't help the government with national security.
And it's not just wars in the traditional sense.
If we don't help them with, you know, defending the cyber infrastructure of the U.S.
If we don't help them with the biodefense we were talking about earlier, I think it's really bad.
Right as things are reaching fever pitch with Anthropic and the Pentagon, the fact that Sam Altman
comes forward and says they are now working with the Pentagon in this way, that angered a lot
of users who ended up quitting chat GPT very publicly.
OpenAI just struck a deal with the Pentagon and its own users are rage-quitting to its competitor
in real time.
I joined the quit GPT movement.
It took me 10 seconds.
This is the first major international boycott of the AI era.
Quit your job.
Quit your job.
There was a Quit GPT
campaign, protests at the company's headquarters here in San Francisco.
And you saw Claude, Anthropic's model, just shoot to the top of the App Store charts while all of this is happening.
So there, Amodei looks like he is really taking a moral stance.
And it ends up being actually quite good for the company's image
and quite bad for their main competitor, OpenAI,
both of which are expected to go public on the stock market this year.
So the competition was already very heated,
but it's more intense.
It's white hot because of that as well.
Okay, so Anthropic are marketing themselves as safe and responsible.
Nevertheless, if these statements about Mythos are to be believed,
we're reaching a moment where we have a potentially very powerful AI tool that is potentially sneaky
and might in fact be harder for human beings to control. Are we reaching, I don't know,
let's call it the kind of Jurassic Park moment, the moment where something created by humans gets
out of human control? Is that something we're risking at this point?
We are risking that. And I think the developers of AI tools
are very aware of that.
That's why the safety testing is happening.
But because you bring up Jurassic Park,
there's actually a scene in that movie
that's been on my mind,
the character played by Jeff Goldblum,
I believe he's playing a mathematician in the movie,
but really sort of the ethical voice.
Don't you see the danger, John, inherent in what you're doing here?
Who's saying,
maybe the people who decided
we should bring dinosaurs back
hadn't fully thought this through.
Our scientists have done things which nobody's ever done before.
Yeah, yeah, but your scientists were so preoccupied with whether or not they could,
they didn't stop to think if they should.
I'm delighted that you raised this, Lily, because our editor, James Shield,
he won't like me mentioning his name, but I have to because he loves Jurassic Park.
And he has thoughts on this.
So, I mean, he was talking about the dangers of owning a theme park
full of velociraptors, I should say the fictional dangers of owning a theme park full of
velociraptors and Tyrannosaurus rexes, is that two things can happen. One is that a malicious
person can set them loose to get rich, stealing velociraptor DNA and selling it off or whatever
it is they decide to do. Or the dinosaurs themselves might become smart enough to detect
the weaknesses in the security system, in the electric fences, and learn how to open the doors
and get out themselves. Is that a helpful way to understand the security risks of an incredibly
powerful AI tool like Mythos? Yes, no notes. Absolutely no notes. I have chills. I mean,
I think the analogy to AI, it just couldn't hit you over the head any more than it does.
Can governments do anything to reduce the risk at this point, Lily?
They can. The question is do they want to? And I'll just confine my comments to the United States,
because right now we have an administration that has positioned itself as very supportive of the AI industry writ large.
But what happens with tech regulation in the United States is that it often happens at the state level, not at the federal level.
And what I'm really struck by is that there's just been some very overt efforts to block states from regulating AI.
There have been these proposed moratoriums, like a 10-year moratorium on states regulating AI.
That was an actual provision that at one point was included in the language of the Big Beautiful Bill, which was passed by the U.S. Congress, pushed by President Trump, in the summer of 2025.
Ultimately, it didn't go through, but I've seen it rear its head in other legislation.
I think that is an ongoing fight that U.S. states are engaged in trying to make sure that they're not defanged by the Trump administration as they try to regulate this technology because they're kind of the only ones that are doing this from a legal perspective.
How worried should we be, Lily?
If I'm sitting in the park having my sandwich, could I get taken down by Mythos AI imminently?
Well, I think you'll be just fine. You can finish your sandwich.
But I think this is a watershed moment potentially.
That's certainly the way that Anthropic wants to position this.
Maybe I've bought the marketing hype.
I'm not sure, Tristan, but they want to be seen as stepping in before it's too late.
Lily, thank you so much for making sense of all of this for me.
I really appreciate it.
It's been illuminating.
Thank you.
Great to be with you.
That was the BBC's North America tech correspondent, Lily Jamali.
And that's it for today's episode.
If you're looking for the very latest breaking news from around the world,
then look for our sister show, the Global News Podcast, wherever you listen.
Today's episode was produced by Viv Jones and Aaron Keller.
It was edited by James Shield. It was mixed by Travis Evans. Our digital producer is Matt Pintus. Our senior news editor is
China Collins. Our studio manager is Mike Regard. And I'm Tristan Redmond. Thanks for listening.
We'll be back again tomorrow. See you soon. Cheerio.
