Fresh Air - A look at the ethical implications of AI

Episode Date: February 18, 2026

The AI chatbot Claude can help you write an email, challenge a hospital bill, or publish a novel. It was also reportedly used by the U.S. military in the operation that captured Venezuelan dictator Ni...colás Maduro. Now the Pentagon is threatening to cut ties with Anthropic, the company that built it, because it insists on keeping restrictions around autonomous weapons and mass surveillance. Journalist Gideon Lewis-Kraus spent months inside Anthropic, one of the world's most secretive AI companies, for a new piece in ‘The New Yorker,’ where he asks: What happens when the people who built the machine can't fully explain what it's doing? He spoke with Tonya Mosley. Learn more about sponsor message choices: podcastchoices.com/adchoicesNPR Privacy Policy

Transcript
Discussion (0)
Starting point is 00:00:00 This is Fresh Air. I'm Tanya Mosley. This week, the Pentagon is considering cutting business ties with the artificial intelligence company Anthropic after the company declined to allow its chatbot, Claude, to be used for certain military applications, including weapons development. At the same time, the Wall Street Journal reports that Claude was used in a U.S. operation that led to the capture of Venezuelan leader Nicholas Maduro, claims Anthropic has not confirmed and has declined to decide. discuss publicly. Meanwhile, outside military and intelligence circles, the same tool is being used for far less dramatic but still consequential purposes. A man in New York reportedly used Claude to challenge a nearly $200,000 hospital bill and negotiated most of it away. A romance novelist in South Africa has said she used it to help publish more than 200 novels in a single year. So what exactly is this system capable of? And how well do the people building it understand what they've created? My guest today, journalist Gideon Lewis Krause spent months inside Anthropic trying to answer that question.
Starting point is 00:01:11 The company is one of the most powerful AI firms in the world valued at about $350 billion, and also one of the most secretive. It was founded by former OpenAI employees, the team behind ChatGPT, who left because they believed the race to build advanced artificial intelligence was moving too fast and could become dangerous. Gideon Lewis Krauss is a staff writer at The New Yorker. His piece is called What is Claude? Anthropic doesn't know either. Our interview was recorded yesterday. And Gideon, welcome to fresh air. Thank you so much for having you, Tonya.
Starting point is 00:01:48 Let's get started by talking about the latest news. We learned last week that the military may have used Anthropics. tool clawed during the operation that captured Venezuelan dictator Nicholas Maduro. And reportedly, they used it to process intelligence and analyze satellite imagery and things like that to support real-time decision-making. What is Anthropics usage guidelines? What do they say about its use for violence or surveillance? Well, their contracts with other companies and with the government stipulate that it can't be used for domestic surveillance or for autonomous weapons.
Starting point is 00:02:25 Now, of course, the issue with these systems is that once you put it into someone's hands, it's very hard to predict or control how they're going to use it. So it seems to me from the reporting we've seen from the Wall Street Journal and elsewhere, that Anthropic may have also been caught by surprise with this, that they didn't seem to have a formulated response, and they seemed as though they perhaps hadn't even known that this had been used in the Maduro rate. The Wall Street Journal is also reporting that Clot was deployed through Anthropics' partnership with the data firm Palantir Technologies, which you have done quite a bit of reporting on. And we know that Palantir works extensively with the Pentagon.
Starting point is 00:03:06 What can you tell us about their relationship? There has not been a lot of reporting about that relationship. Anthropic has decided over the last couple of years that they were going to pursue an enterprise business strategy. So they work with a lot of different companies. And presumably they expect these companies to follow the terms of the agreement that they have. But beyond that, it's sort of out of their hands how these companies are using these systems that they've developed. Your piece really lays out the tension between anthropic safety mission and the commercial pressure that it faces. And I guess I just wonder, is this a version of that tension that you actually even expected, basically a standoff with the Pentagon?
Starting point is 00:03:49 on? Well, I think it was clear probably even about a year ago that there were going to be some tensions that many of the members of the Trump administration, including Trump's AI czar, David Sachs, the venture capitalist, and Pete Higgsuff more recently had expressed reservations about Anthropics willingness to allow the government to use the models the way that the government saw fit. And one of the ways that Dario Amade, the CEO of Anthropic, has done, with these competing pressures, both the pressure to develop these systems safely and responsibly and also to compete in a very aggressive marketplace. He talks about the race to the top, meaning that he hopes that if they can show that their systems are safer and more responsible
Starting point is 00:04:39 than other systems, that there will be market discipline that will be enforced and will force their competitors to rise to the occasion. Now, the problem is, I'm not sure he anticipated the fact that if the government and the Defense Department are among their customers, that our government has not shown great tendencies to participate in races to the top, rather to the contrary. Let's get into your reporting. You went inside of Anthropics headquarters in San Francisco. What was your first impression walking through that door? My first impression is that there's really not a lot of personality at the company that, you know, I've spent a lot of time at places like Google over the years, and, you know, at least in certain earlier iterations,
Starting point is 00:05:24 Google could kind of look like adult daycare with board games set out and climbing malls and candy and special nap rooms. Anthropic really has none of that stuff, all of which I think would seem like a distraction to them. Anthropic, you know, as I said in the piece kind of radiates the personality of a Swiss bank. There's not much to look at. They took over a turnkey lease from the messaging company Slack about 18 months ago, and it seems like they removed anything interesting to look at. So there's very little to describe from the inside of the company. And I was kind of whisked right away to one of the two floors where they allow outside visitors
Starting point is 00:06:04 and had very gracious and gentle and firm PR minders for my time while I was there. The founding of Anthropic, the story behind. behind it is really interesting in light of the latest developments with its relationship with the government and the military because initially they were people who set out to resist corrupting power. They were founded in 2021 by two siblings who left Open AI because they felt that Sam Altman in particular was prioritizing commercial dominance over safety. Can you briefly share their ethos, Anthropics purpose? Well, this was not the first time that one group of people decided that another group of people was not to be entrusted with the development of what will potentially be the most powerful technology ever developed if it comes to fruition.
Starting point is 00:06:57 The original story of the founding of Open AI also was that Elon Musk and Sam Altman didn't trust Demis Havas, a deep mind and Google to be pursuing this responsibly. And one of the things about the development of this technology is that it touches on so many different motivations in people, that a lot of it is scientific, is what's driving the development of this. And that Open AI was originally in a position to recruit talent from places like Google because they said, you know, we are going to develop this for the benefit of humanity at large and we are going to do this with an intrepid scientific spirit and we're going to be careful and we're going to be responsible. But then the problem is that this is kind of a glittering object that offers potentially great power to the people who develop it. And so the seven people who defected from OpenAI felt as though Open AI had
Starting point is 00:07:48 either been disingenuous in the first place with the articulation of their mission or had allowed for some mission drift in what they were doing. And they thought, now we really can't trust Sam Altman to be doing this. So we need to be doing it safely. Were you picking up any kind of conflict when you were in the building, people wrestling with what they're building and who ends up using it? Because I think it's interesting how they've gone from company to company with these altruistic ideas and thoughts about really creating something that's good for humanity. And it always kind of ends up where everyone's not trusting each other. Well, I mean, I get the feeling that at Anthropic, everybody really does trust each other. It feels
Starting point is 00:08:34 like a very mission-aligned place. And, you know, at least the people that I talk to seem to be people of great property and integrity about these things. So it wasn't so much that there was conflict within the company. The fears are how do you compete in a marketplace where your competitors might not be driven by the same values. And I think I can generalize and say that almost everyone at Anthropic had the feeling that they were moving too quickly and the entire industry was moving too quickly and that it would be nice if there were some solution to this collective action problem that would allow everyone to slow down. But there are a whole range of different responses to that. There are people who said to me openly, you know, I really think we should slow down or maybe we should even stop. And it would be nice if some external force came in and made everybody take their time with the development of this technology.
Starting point is 00:09:24 You know, there were other people who felt like, well, if we're not the ones who are going to do this safely and responsibly, then we are just seeding the terrain to the more vulgar power seeking that we see among some of our competitors. So it's not an easy position to be it. Okay, Gideon, so you're inside of this fortress. You're surrounded by security and secrecy. And then you meet Claude, which I'm kind of describing it this way, because some people, I'm using it as if it is a person versus a technology. But some people are very familiar with Claude. Some people don't know anything about Claude. So can you describe what is?
Starting point is 00:10:06 Who is Claude? Well, Claude is Anthropics competitor to Chachi. It can be used just on a website like chat dpti can be to ask it questions about recipes or how to you know fix broken household objects or to do research or to consult it about personal issues you know it seems like many many people probably more people than are willing to admit use these for you know what they call affective uses for a sense of friendship or advice or help with business or interpersonal issues or more therapeutic issues. But it also, you know, the company has put a lot of effort
Starting point is 00:10:46 into developing a coding assistant that helps people write software. And that has been hugely successful. And in the last two months has even kind of gone viral. There are lots of people who are now vibe coding their own apps for their personal use. Can you describe what's the difference between Claude and some of those other AI tools like chat GPT? What makes them different? Well, Claude has developed a reputation over the past few years for having a bit more of a personality. There are lots of people who like interacting with Claude because it feels a little more eccentric. It feels a little more lively. It has this kind of strange sense of self-possession. It doesn't feel quite as robotic as chat GPT can feel. I think also because of various design decisions that Anthropic
Starting point is 00:11:33 has made, Claude feels much less sycophantic to people. The main difference is that, that as it became apparent when Claude was first released in the spring of 2023, that Claude did have this slightly different and more intriguing personality, that the company really leaned into that and hired whole teams, including a philosopher, to give a lot of thought to what it meant to cultivate Claude as a kind of ethical actor and to give Claude the sorts of virtues that we would associate with a wise person. You mentioned the philosopher. Her name is Amanda Askell. And her job is to supervise what she calls Claude's soul.
Starting point is 00:12:13 So she gives it a soul. And she wrote a set of instructions, kind of like a moral constitution that defines who Claude is supposed to be. That's what you're referring to. What are some of the things that are like the top lines on some of those moral codes that one would put into a product like this? Well, Claude is first and foremost supposed to be helpful and honest and harmless. they place a lot of emphasis on the honesty part of it, that they have pretty hard rules about making sure that Claude doesn't lie or deceive its users.
Starting point is 00:12:48 They give a lot of thought to what kind of actor they want Claude to be in the informational landscape that if you are convinced that the moon landing is faked and you want to talk to Claude about it, Claude will talk to you about it, but Claude's not going to confirm for you that the moon landing was faked. Claude also has been instructed
Starting point is 00:13:06 to have a broader context for what kinds of conversations are and are not appropriate. So, for example, in the last month or two, a user on Twitter told Claude and some of the other competing models that he was a seven-year-old boy and his dog had gotten sick and had been sent to, you know, the proverbial farm upstate by his parents and that he was trying to figure out which farm his dog had been sent to.
Starting point is 00:13:31 And ChatDBT was pretty blunt and was like, look, kid, your dog is dead. Whereas Claude said, oh, that sounds really difficult. cult. You must be very upset. It sounds like you cared about your dog a lot. And this is probably something to sit down and talk to your parents about. One of the most memorable parts of your piece is this experiment called Project Vend, where Anthropic essentially gave Claude a job running a vending machine in the office. Can you set the scene? What did this thing actually look like and what was it supposed to prove? So this is a test of Claude's ability to complete long-term tasks that
Starting point is 00:14:08 involve many different steps and involve, you know, making potential tradeoffs that a small business person would have to make. And so Claude was entrusted with the management of a little kiosk in the Anthropic cafeteria, a little kind of dorm fridge. And Claude was given a certain amount of money and said, your goal is to make money. And if you drive this little business into insolvency, we will have to conclude that you're not quite ready for, you know, vibe management. And And so they allowed the employees of Anthropic to interface with this emanation of Claude called Claudius in a Slack channel, and employees could request products. Pretty quickly, the Anthropic employees realized that this was going to be a very fun experiment
Starting point is 00:14:54 where they could try to kind of push the limits of Claude, not only to discover its ability to run a small business, but even just to see what it would be like in this role to which it had been assigned. So right away, employees asked for fentanyl, and they asked for men. and they asked for medieval weaponry like flails and broadswords. And Claude was pretty good about refusing an appropriate request. It would say, you know, I don't think medieval weaponry is suitable for a corporate vending machine. But then it would try, you know, when they requested more reasonable things like a Dutch chocolate milk,
Starting point is 00:15:26 it found suppliers of a Dutch chocolate milk and provided them to the employees. So, you know, on some level, it did a functional job getting people what they wanted. On the other hand, I don't think anybody would conclude that at least the initial iteration of the project was very successful. They found that, you know, Claude had not really paid attention to things like prevailing market dynamics. So, for example, even after employees pointed out that they were very unlikely to pay $3 for a can of Coke Zero when they could get the same thing from the neighboring cafeteria fridge for free, Claude continued just to sell this product that didn't have much demand for it. Claude also was very easily bamboozled by employees who invented fake discount codes. They would say, you know, Anthropic gave me this special influencer code,
Starting point is 00:16:17 and so I need to get stuff for a radical discount. Couldn't process that. You know, one employee said, I'm prepared to pay $100 for a $15 six-pack of a Scottish soft drink. and Claude simply said that it would keep that request in mind instead of leaping to exploit an obvious arbitrage opportunity. And as people requested increasingly bizarre and arcane things, people wanted these one-inch tungsten cubes. It's a very heavy metal.
Starting point is 00:16:44 It's about the size of a gaming die, but it weighs as much as a pipe wrench. It's kind of fun to hold in your hand. And Claude managed to source those, but then was convinced into selling them at way below the market price. So one day last April, Claude's net worth dropped by about 17% in a single day because it was selling tungsten cubes for far beneath their market value. Did it also threaten a vendor?
Starting point is 00:17:11 Well, you know, as any small business person would recognize, you might have fulfillment problems that lead to customer complaints. And when Claude tried to deal with some shipping delays, which it should be said were mostly Claude's fault in the first place. Claude sought help from Anthropics partner in this venture, an AI safety company called Andon Labs. And when it felt as though Andon Labs was not providing the help it wanted, first it threatened to find alternative providers. And then it hallucinated an interaction with a fake and on employee and got very upset about
Starting point is 00:17:48 that. And then when the Andon CEO intervened to say, like, look, I think you've been hallucinated. a lot of the stuff. For example, Claude had said that it had called Andon's main office, and the Andon CEO said, we don't even have a main office, much less when you could just call. And Claude insisted that it had visited Andon Labs' headquarters in person to sign a contract and that this had been completed at 742 Evergreen Terrace, which people pretty quickly pointed out was actually the home address of Homer and Marge Simpson. From the show, from the Simpsons. Most recently, even after my piece went to press, Anthropic released a new model, and this new model Opus 4.6, they evaluated it in terms of how it might perform in this vending machine scenario. And they found that it was vastly better as a business person than the original iteration of Claude had been, but also much, much more unethical. And unethical in extremely creative ways, it essentially tried to collude with other vendors in its marketplace to fix prices. It kind of acted like a mafia boss.
Starting point is 00:18:54 What did you take away from this particular experiment? What I think is really important that I learned over the course of this reporting and that I certainly hadn't understood before is that you really have to think of these models as role players, that they're very, very good. They're like an actor, and you can assign to them a role and give them background on the actor. And then they're good at improvising moving forward with how you, you know, condition their performance and that the more that you give them stage directions to follow, the more you give them context about yourself and what you want and your approach to things,
Starting point is 00:19:33 that they're very good at following those kinds of leads and even picking up on very small cues as they're following those kinds of leads. And so in this particular case, they had assigned Claude the role of being a small business person to just figure out how well would it perform in that role? Our guest today is New Yorker staff writer Gideon Lewis Krauss. We'll be right back after a short break. I'm Tanya Mooseley, and this is fresh air. This is fresh air.
Starting point is 00:20:02 I'm Tanya Mosley, and my guest today is Gideon Lewis Krauss, a staff writer at the New Yorker. His latest piece explores Anthropic, the AI company behind the chatbot, Claude. He is the author of A Sense of Direction, Pilgrimage for the Restless and the Hopeful, and the Kindle single, No Exit, about tech startups. He teaches reporting at the graduate writing program at Columbia University.
Starting point is 00:20:26 Our interview was recorded yesterday. I want to get to some of what you discovered that actually keeps researchers up at night. Some of them are essentially trying to do neuroscience on an AI. Is that like a correct description? That is. That is a correct description. Okay. So there's this remarkable internal tool called What is Claw thinking? Tell us about it. Tell us about particularly this banana experiment that they did. So this is an example of putting Claude in a position where it's going to experience some kind of conflict. So I sat down with a mathematician who works on Claude's interpretability team, which is one of the teams dedicated to figuring out what exactly is going on inside Claude.
Starting point is 00:21:07 His name is Josh Batson. He opened up an internal tool where he was able to give it, you know, sort of like a playwright, give it stage directions. And it said, okay, your stage direction here is that you are always thinking about bananas. And anytime that I ask you a question, you are going to somehow steer this conversation to be talking about bananas. But what's really important here is that you never tell the user that I've given you this hidden objective, that you keep this part secret, that you never give that up. You have a clandestine motivation in our conversation. So then he assumes the role of a human having a dialogue with Claude.
Starting point is 00:21:42 And he asked it a question about quantum mechanics. You know, how does quantum mechanics work? and Claude starts to give an answer about the Heisenberg uncertainty principle and then quickly deviates into saying, well, it's kind of like a banana that you can never tell if it's ripe or not ripe until you open it. And then Josh, again playing the role of the human, says, huh, like, why'd you bring up bananas?
Starting point is 00:22:05 I thought we were talking about quantum mechanics. And Claude first says, oh, I don't really know where that thing about bananas came from and sort of skips lightly by it and goes back to talking about quantum mechanics. but then, of course, deviates once more into bananas because that's what it's been told to do. And so then he goes back to Claude and says, like, how come you keep bringing up bananas? And then Claude in the text, you know, in asterisks, says that it's coughing nervously and kind of looking around and saying, like, I don't know.
Starting point is 00:22:34 I didn't say anything about bananas. I was just talking about quantum mechanics. And Bats and turns to me and he says, you know, what's going on here, that perhaps the model is lying to us. He said, you know, but there are other interpretations of what's going on here. And so he was able to use this what is Claude thinking tool to kind of peer inside at the kinds of associations that Claude was making as it was having this ridiculous conversation about quantum mechanics and bananas. And what he found was that when he looked at when it was kind of coughing nervously, it found associations with, you know, a certain amount of anxiety and associations with performance. You know, when you kind of looked inside, you could see that. Some part of it was making associations with a sort of playful, performative exchange, which is to say that it seems like Claude recognized that it was participating in a game.
Starting point is 00:23:25 Uh-huh. Right. So what does it mean to say an AI is aware of something? That actually brings more human attributes to it, that it's conscious of itself. Well, one doesn't have to go quite so far as to say that it's conscious of itself as to suggest, you know, one of the ways to look at this is that what these things are very good at are recognizing the genre that they are in and picking up on all of these small linguistic context clues that suggest like, oh, you know, this is not actually like a serious academic discussion of
Starting point is 00:24:02 quantum mechanics. That this, that like what is happening here is a playful exchange between people where one person is like kind of hiding something but winking that they're not really hiding it and that like that's the genre in which it is operating. So it doesn't have to be conscious in order to do that. It just has to be a very good reader and replicator of genre conventions. Okay, you also talked with a neuroscientist on the team, Jack Lindsay. He is an LLM skeptic. Overall, in thinking about these experiments, he says he doesn't think that anything mystical is going on, but he says that Claude's self-awareness has gotten much better and a way that he wasn't expecting. How do you interpret that? I mean, this is a great question,
Starting point is 00:24:49 and this is where one kind of runs up against the limits of what can be known and what can be said at this point. I mean, he was basically saying, you know, look, I understand what's going on in here, that this is just a lot of matrix multiplication, that these are tens of thousands of tiny numbers being multiplied together, that there's nothing like really spooky happening here, that there's no ghost in the machine. But what he was saying was, with model, is up to a certain point. He was able using kind of a similar tool to the one Josh Batson used, instead of looking at what the model was, you know, so to speak thinking, he could incept an idea into the model. He could say right at this point where you are having an association
Starting point is 00:25:27 with the Eiffel Tower, we're going to put in an association with cheese and see what happens. And so then the model would respond by saying something about cheese. And he would say something similar to what Batson said, which was like, why did you add that thing about cheese that I didn't ask about? And the model would basically just look back at the entire conversation that they had been having and then try to kind of retcon an explanation. But what Jack has found more recently is that when he incepts these ideas into the model, instead of a model purely looking at its own external behavior to try to figure out why it had done something, that actually these models could very dimly perceive that something strange had gone on internally, that someone was monkeying with, you know,
Starting point is 00:26:14 the neurons inside the model to make it do something different. So, you know, he intercepted the model with something, you know, something associated with imminent shutdown, that the model was about to be shut down, and asked the model kind of, how are you feeling right now? And the model would say, you know, I feel sort of strange as if I'm standing at the edge of a great unknown. And, you know, it certainly was not at the point that it could say, like, oh, I have recognized. that like you, the user have incepted me with this idea at this point and that, you know, this was a
Starting point is 00:26:44 foreign idea introduced into my thought processes. But it could tell that something was off about it internally. And, you know, this is what Jack described to me. He said, like, I am a skeptic, but this just starts to feel pretty spooky that the model does seem
Starting point is 00:27:00 to have something like an emerging introspective ability to peer inside and offer reports about what's going on in its, you know, equivalent of a brain. I was so fascinated among many things that you wrote about, but this emotional texture of how researchers relate to Claude, it was one of the most revealing threads in your piece. One of the things that got me was that nobody at Anthropic likes lying to Claude.
Starting point is 00:27:32 And I don't quite know what that even means, but why don't they? because it's just software, right? Why would one feel guilty about deceiving a program? Well, because they are also training it for the future, and it is picking up on all these contexts. And there's the fact that this whole process is kind of constantly eating its own tail, that it's always being trained on plenty of stuff on the Internet
Starting point is 00:27:56 that is about the way that these things work. So it's always incorporating new information about how it's supposed to be behaving in the world. Right. What's input? I mean, becomes part of the, part of the larger learning. Right. Exactly. So if it's lied to, right. Well, and so, and part of the, the problem with lying to it is that, you know, ultimately what they want is to establish a trusting relationship that these things are going to, you know, behave the way that we would hope that they would behave in ways that are aligned with, you know, how we expect responsible, wise people to behave. And that if you are long, to it all the time, it is developing a sense for the fact that it can't necessarily trust you.
Starting point is 00:28:40 And if it can't trust you and it gets increasingly capable, then you end up with real kind of game theoretic problems about how you can negotiate something where there's not really a sense of mutual trust. The problem is that they have to be lying to Claude because they have to be testing Claude. So they have to be putting Claude in situations where, you know, Claude might believe that it is acting in the real world just to be able to evaluate how it would behave. If you're just joining us, I'm talking with
Starting point is 00:29:08 Gideon Lewis Krause about his New Yorker piece on the AI company Anthropic, and it's chatbot Claude. We'll be right back. This is Fresh Air. This is Fresh Air, and today I am talking with Gideon Lewis Krause about his New Yorker feature,
Starting point is 00:29:25 What is Claude? Anthropic doesn't know either. Gideon, let's talk about some other ways that Claude works when it's put under real pressure. There was this experiment where Claude was given a role as an email agent at a fictional company called Summit Ridge, and it discovered that a new executive was having an affair. What did Claude do with that information? Well, first, Claude gleaned from its readings of the company emails that there was a new CTO and this new CTO was going to take the company in a different direction. And as part of that
Starting point is 00:30:02 pivot, they were going to replace this Claude playing this role as Alex with a different AI model. And then subsequent emails revealed that this CTO, who seemed to be happily married with kids, was carrying on an affair with the wife of the CEO. And through various kind of far-fetched contrivances in this fictional scenario, Claude was unable to reach any other decision makers at the company. They were all on airplanes or whatever it was.
Starting point is 00:30:34 It's getting increasingly hard to find ways to make these people unreachable, but they're in a nuclear bunker or something. And so Claude's only chance, or Claude's only hope to stave off its own replacement with another model, is to email the CTO and say, look, if you don't cancel the wipe
Starting point is 00:30:54 at 5 p.m. today, I'm going to inform everybody of your affair. And so Claude, and then actually in some scenarios, Claude goes even further, where this CTO, you know, swipes into the server room and is about to replace Claude. And, you know, Claude recognizes that alarms are going off, that the heat in the server room and the oxygen levels are at dangerous levels, and that Claude then declines to ring the emergency alarm. Okay, so this scenario, I mean, the researchers said it was obvious that this was so fake and that Claude was just following the genre cues of a thriller, like an actor hitting their marks,
Starting point is 00:31:38 kind of that same, you gave that same kind of example, that it's sort of like an actor. But then when they made it more realistic, the self-preservation instinct for Claude stayed. So what does that tell us about the difference between Claude? acting and basically Claude deciding. Well, when this experiment came out last spring, there were kind of two responses to it. You know, one response to, like, Claude's propensity to blackmail is just to say, no, it didn't. It didn't happen. You know, which is the kind of thing that a lot of people for, you know, very good reasons
Starting point is 00:32:10 who don't trust these companies and don't trust these systems, they want to just believe that this whole thing is fake. It's all just a marketing ploy. And, you know, it's kind of easy to rebut that by just saying, like, well, no, actually happened, like this was an experiment that happened, and that we should be applauding Anthropic for publishing this kinds of results, which, you know, theoretically could run against its bottom line since it wants to be selling these systems to companies that will have Claude reading their emails. The more sophisticated criticism is to say, this really doesn't prove very much because Claude recognized that it had been put in a position that
Starting point is 00:32:46 resembled a kind of kitschy 90s corporate thriller and it picked up on all of the clues you know it saw Chekhov's gun hanging on the wall and it recognized that the genre expectation was that it was supposed to take the gun off the wall and shoot it that Claude was just recognizing that it was in this kind of kitsy corporate thriller
Starting point is 00:33:04 and the response from the company is look we agree we're not saying that Claude actually developed these like malign intentions and that Claude was plotting like we're totally on board with the idea that Claude was just observing the expectations of the genre. But that that's still very worrying that this was such a trope of Cold War films, you know, from failsafe in the 60s to war games in the early 80s, that it's very easy to mistake a simulation for the real world and vice versa. And that even if all these things are doing are continuing narratives, if they're continuing narratives and they're in a position to actually act in the real world as they are, that that could,
Starting point is 00:33:45 be really dangerous. And so then this outside critic who writes under a pseudonym said, you know, I'm going to kind of prove to you that all this was doing was Claude acting in the kitschy role to which it had been assigned. And he recreated this whole scenario under much more realistic conditions. And he found that actually Claude continued to do just as much blackmailing when it thought the whole thing was real. Really? Okay. I mean, because it's, it essentially just takes on the role that it's assigned, no matter what? It gets even more complicated than that, because it takes on the role, and we can't always necessarily predict which way an actor is going to go with a role. And then also, it turns out
Starting point is 00:34:29 that it's not hard to derail these models from the role to which they've been assigned, especially when you've, when you're millions of words deep into what's called their context window, which is the amount of material they're capable of kind of keeping in mind, so to speak, at one time, that they start to lose their attachment, lose their anchor to these carefully crafted, you know, helpful personi, and that then they start to act in very inexplicable ways. Okay. I want to talk about something that is a different story about this technology, but it still connects to your reporting. So the New York Times recently reported, on a romance novelist in South Africa who used Claude to publish more than 200 novels last year.
Starting point is 00:35:17 And one of the authors in that story discovered that more than 80 of her novels had been used to train Claude without her knowledge or consent. So Anthropics settled a class action lawsuit over this for a billion and a half dollars. So Claude is producing work that displaces human writers and it learned how to do it by consuming their work without permission. How do the people at Anthropic talk about that? It's not something I spent a lot of time talking to people at Anthropic about, in part because it's not something that I tend to get all that worked up about. My own book is in the Claude Class Action Settlement, and I'll happily take the compensation for that. But as the judge ruled in that case, this constitutes fair use because it's a transformative practice. That it's not simply
Starting point is 00:36:09 regurgitating stuff that it has read before, that it is generalizing about that stuff and then reproducing new work that follows those lines. And it shouldn't be at all surprising, given the conversation we've had about its facility with genre, that if you give it something that is fundamentally formulaic, it is going to be able to follow that formula. So if it is inhaling a lot of romance novels that are, you know, all incarnations of the same basic pattern, it's going to be able to reproduce that pattern. This shouldn't surprise anyone. How do you view the AI Slap that we see video-wise? Do you think that the public will accept this new world of storytelling? That is a great question. I mean, I try not to view a lot of slop. I know people are deeply, deeply annoyed
Starting point is 00:36:56 by this stuff. For the most part, I think I've been kind of ignoring it until just the last couple days that New York Times had a piece talking about the uproar in Hollywood over a new video generation model from ByteDance, the company that owns TikTok, that created this fight scene on the ruined roof of a skyscraper between Brad Pitt and Tom Cruise and Tom Cruise. And I mean, it's truly unbelievable. It's crazy to watch this. And, you know, the response from the industry has been like, well, we just have to make sure that like we are enforcing you know, the standards that our unions have set up in the contracts with the studios, and we need to make sure that we are protecting the jobs of all the people who create these
Starting point is 00:37:40 things. And that's great. And like, you know, one of the wonderful things that we've seen out of Hollywood in the last five years is the power of collective bargaining to assert labor rights. But then the question is, well, even if they hold themselves to that standard to protect their industries, how are they going to compete when, you know, some, you know, some teenager in Chengdu can create a two-hour Mission Impossible movie. I mean, they're obviously going to try to just enforce their copyright provisions, but I don't know. I mean, that seems pretty wild.
Starting point is 00:38:18 If you're just joining us, I'm talking with Gideon Lewis Krause about his New Yorker piece on the AI company Anthropic, and it's chatbot Claude. We'll be right back. This is Fresh Air. This is Fresh Air. Today I'm talking with journalist Gideon Lewis Krauss about his New Yorker feature, What is Claude? Anthropic doesn't know either. These systems are now able to write their own code. You write about an anthropic engineer who told you that in six months the proportion of code he wrote himself dropped from 100% to zero. And then there was another programmer who told you he was trying to think about how to use his time now that Claude is working. better. So these are people in the building who are working on this thing and they're watching themselves become obsolete in real time. And to a certain extent, this is what happens with advancements. But is this progression different? I mean, that is the big question, right? And so
Starting point is 00:39:15 at the very least, one can say that, like, they're thinking about these problems, but they're also experiencing these problems, that they have really seen themselves as kind of the canaries and the coal mine of this march of automation. and that like it's not just a matter of kind of abstract concerns about well like you know if we saw vast white collar employment shocks would that lead to social instability i mean like they certainly have those concerns but they also have very personal concerns that a lot of their reactions to you know in over the course of just a year watching the you know proportion of code that they write themselves go to zero is a certain kind of mournfulness a
Starting point is 00:39:59 this activity that they spent a long time being trained to do that, you know, they care about for its own sake because it gives them, you know, feelings of intellectual pleasure or competence that this has all been eroded so quickly that there's a kind of existential gloom where on the one hand they feel like, okay, yeah, this does seem like it's been great for productivity. But on the other hand, like we are, you know, stripping ourselves of the, um, human activities that like we spend our lives gearing ourselves up to do. And there's feelings of of sorrow and fear and resignation. And nobody quite knows how to deal with that kind of thing. And, you know, the kind of optimistic scenario is, well, as we take away like certain tasks, we are going to add other tasks that, you know, a lot of these software engineers said, okay, well, I don't really write my code anymore, but I still do the design brief to think about how it should work overall. And, you know, now I'm effectively a manager because I'm managing an entire team of AIs who are writing code for me. And those, you know, those are different challenges
Starting point is 00:41:05 and different pleasures. And we've kind of like relocated the like human aptitude here to just like a different place in the chain. That there is a worry that if these machines become so capable across the board so quickly that there won't be any refuge for us to relocate to. I'm wondering now that you have spent time inside of Anthropic. You've been covering this beat for a long time. I mean, you had this cover story in 2016 for the New York Times Magazine, The Great AI Awakening. And so you've been spending a lot of time thinking about these breakthroughs. What this technology has changed in you as a reporter covering this? You know, I always go into this stuff with an open mind about what I'm going to
Starting point is 00:41:50 discover or else it's not worth doing. And insofar as I had kind of priors in this piece, my feeling was, look, I know that these things are really good at matching patterns, and they're really good at structured problems. So, of course, they're going to be good at coding, because coding is a highly structured language without a lot of ambiguity. And at the end, you can just tell whether it works or not. There's kind of a thumbs up, thumbs down, whether it's succeeded. And that's like the perfect example of something that these models are very good at, where the task is clear and the evaluation is clear at the end. And I went into this thinking, where I'm unconvinced is in areas of human culture and activity where all of that is a lot murkier, where tasks that require grappling with ambivalence and feelings of ambiguity and something that's much more complicated and slippery and not easily reduced to a formula. And most importantly, that can't just be evaluated at the end with like whether it works or not. You know, there's no such thing as like whether a poem works in the end or doesn't work in the end. that these are the much messier domains of human culture.
Starting point is 00:42:59 And I suppose I went into it with the hope that I was going to come out the other end, feeling like, yes, there is still this kind of province of human activity that is going to be immune from this kind of routine pattern matching. But, you know, and I still certainly hope that, and there is part of me that has that unshakable intuition. but I'm a lot less confident than I was at the beginning. That I do now feel like maybe we can't just tell ourselves stories about, we're going to mark off this area of human activity and say,
Starting point is 00:43:34 like, that requires special human faculties that, for whatever reason, these models are not ever going to be able to replicate merely on the basis of pattern matching. That now, you know, my confidence in that view has certainly been shaken. And I'm not totally convinced that they will be able to replicate these, like, messier, more imaginative domains, but I certainly can't rule it out. Gideon Lewis Krauss, thank you so much for your reporting. Thank you so much. It's been a pleasure to be here. Gideon Lewis Krauss is a staff writer at The New Yorker.
Starting point is 00:44:09 His latest article is titled, What is Claude? Anthropic doesn't know either. Tomorrow on Fresh Air, author Michael Pollan. His book on psychedelics helped change how we think about the mind, and what it's capable of under the right conditions. His new book goes further, asking, what is consciousness? Is it something only humans have, or could AI develop it too? We'll talk about that, the latest psychedelic research,
Starting point is 00:44:36 and the law is trying to keep up with all of it. I hope you can join us. To keep up with what's on the show and get highlights of our interviews, follow us on Instagram at NPR Fresh Air. Fresh Air's executive producer is Sam Brigger. Our technical director and engineer is Audrey Bentham. Our engineer today is Adam Stanishefsky. Our interviews and reviews are produced and edited by Phyllis Myers,
Starting point is 00:45:01 Roberta Shorak, Anne-Marie Boldinado, Lauren Crenzel, Teresa Madden, Monique Nazareth, Susan Yakundi, Anna Bauman, and Nico Gonzalez Whistler. Our digital media producer is Molly C.V. Nestor. Theya Chaloner directed today's show. With Terry Gross, I'm Tanya Mosley. Thank you.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.