Search Engine - Mysteries of Claude

Episode Date: February 27, 2026

Anthropic hired philosophers to teach its AI to be good. In their tests, the AI blackmailed a human to keep itself alive. Writer Gideon Lewis-Kraus went inside the company to figure out what's going on with Claude, and whether anyone can actually control it.

Transcript
Starting point is 00:00:00 Welcome to Search Engine. I'm PJ Vogt. No question too big, no question too small. This week, mysteries of a chatbot. Quick note before we start today, this week's episode is almost entirely about Anthropic, the AI company that makes Claude. They have advertised on our show. As with all companies that advertise on our show, they do not get a say in our editorial content. Okay, after these ads, the show. This episode of Search Engine is brought to you in part by Serval AI. If you ever worked with an IT team, you know how quickly their day gets eaten up by repetitive tickets, password resets, access requests, onboarding. It all adds up. And as your company grows, those requests just keep piling higher,
Starting point is 00:00:40 pulling your team away from the work that actually moves the business forward. That's where Serval comes in. Serval can cut up to 80% of your help desk tickets, and it's not just another tool layering on AI as an afterthought. While legacy platforms bolt AI on, Serval was built for AI agents from the ground up. Here's what that looks like. Instead of a new hire onboarding taking hours or even days, a manager just drops a request in Slack, and Serval handles everything instantly. No back and forth, no bottlenecks, Serval even writes automations in seconds. Serval powers the fastest growing companies in the world, like Perplexity, Mercor,
Starting point is 00:01:15 Verkada, and Clay. Get your team out of the help desk and back to the work they enjoy. Book your free pilot at serval.com slash search. That's s-e-r-v-a-l dot com slash search. This episode of Search Engine is brought to you in part by Vanguard. To all the financial advisors out there whose job is to help your clients keep more of what they earn, Vanguard is here to help you with that. Vanguard is slashing fees again, this time for more than 50 of its funds. That's on top of big fee cuts they gave last year to investors in 87 of their funds. In an increasingly high-priced world, Vanguard is staying true to excellence without expense. With Vanguard, your clients get access to sophisticated, active, and index bond funds at industry-leading
Starting point is 00:01:59 low costs, backed by a fixed-income team that's truly obsessed with consistent outperformance. Lower fees don't just mean savings. They give Vanguard's skilled bond managers more freedom to maneuver as they pursue strong results. And they give you more flexibility to deliver measurable value to your clients because top performance shouldn't come at higher cost. Go see the record for yourself at vanguard.com slash impact. That's vanguard.com slash impact. All investing is subject to risk, Vanguard Marketing Corporation distributor. Welcome to Search Engine.
Starting point is 00:02:58 I'm PJ Vogt. No question too big. No question too small. I found myself feeling much stranger about AI in the past month or so. I use the tools. I use the tools a lot. But I'm probably each company's worst nightmare as a customer, in that as soon as I hear from anybody that one model has inched ahead of another,
Starting point is 00:03:18 that this version of ChatGPT is beating that version of Gemini, I immediately cancel my subscription and switch. For the past two months, I've mainly been using Claude, Anthropic's agent. And for whatever reason, Claude is just giving me more future nausea than I was having six months ago. Part of the general tech excitement around Claude lately has been Anthropic's product, Claude Code, a tool that lets the AI agent autonomously write and edit code. Over at the New York Times, Kevin Roose has talked a lot about the websites and apps he's quickly built with Claude Code.
Starting point is 00:03:49 Two CNBC reporters as an experiment vibe-coded a competing version of a popular organizational app called Monday.com. Within a couple of days, Monday's stock price had tanked. For me, though, most of the future shock has just come from using the LLMs the way I'm used to. I find myself going to Claude as a useful first stop, the way I've always used the internet. But the quality of its research, its answers, even its writing, I'm just starting to feel like I can see, not too far off, if not my own obsolescence, at least real, significant change in my field. I don't know how to feel about that. I find a lot of the tech coverage of AI to be high opinion, low information, and relatively unhelpful.
Starting point is 00:04:30 I'm not even asking for anyone to tell me the future right now. I would just settle for a better understanding of the present. Which is why this week, I wanted to talk to a reporter who's been digging into this. Hello. Hey. Can you introduce yourself? I'm Gideon Lewis-Kraus. I'm a writer.
Starting point is 00:04:51 Gideon is a writer who I particularly enjoy. He's been on our show before. He spent much of the last year essentially embedding within Anthropic, the company that makes Claude, the tool that was giving me the heebie-jeebies. People there had been very open with him. He got a view on how they're seeing what's going on, their understanding of the present, which frankly, they also sound mystified by. This conversation took place right before Anthropic's big showdown this week with the Pentagon, so we did not discuss that specifically, but I did find Gideon's view inside the company and its mission extremely helpful in understanding how they'd
Starting point is 00:05:25 gotten into this fight with the U.S. government at all, since none of their competitors have ended up in that position. So to start, I asked Gideon to even just explain why Anthropic had let him into their company in the first place. So to kind of go back to the beginning of this, which, like, I think makes it all make a little more sense in context. So now, almost 10 years ago, when I was at the Times Magazine, back in kind of like the Paleolithic of deep learning, I did this story about Google Brain and about the implementation of deep learning in like the first consumer product, which was when they switched over their Google Translate to neural machine translation. Why were you paying attention to it? Because I remember as a person who, like, I think we
Starting point is 00:06:04 both cover technology but we're not strictly technology journalists, so you can kind of decide which things on the horizon are interesting to you. Machine learning was not interesting to me for a long time. Why 10 years ago were you interested in this? I was interested in it as like a story about ideas, that, like, there were these ideas about language and about learning and about consciousness and about, like, philosophy of mind that had been around for at least 70 years, kind of depending on how you count. And without, like, getting into those, there was just, like, an interesting story for me about, like, the trajectory of an idea there. Gideon cared about AI a decade before most people did, because he thought this synthetic facsimile of our brains
Starting point is 00:06:47 could teach us something about our own real ones. He'd been following the trajectory of conversations like, what is a brain versus a mind? What is thinking? What is consciousness? By the 1950s, the arrival of the first computers had encouraged people to start asking questions like that. Because a computer did something like thinking,
Starting point is 00:07:06 but also clearly wasn't a brain. And so early computers had prompted people to try to develop better definitions of things like intelligence and consciousness. The thing was, though, while computers were interesting enough to raise those questions, they weren't yet complex enough to be much help in answering them. And so by the 1970s, philosophers and computer scientists had mostly moved on. And those questions migrated to psychology departments, who still, for obvious reasons,
Starting point is 00:07:33 wanted to better understand the human mind. But with early machine learning advancements around 2014, Gideon, who's always thinking about thinking, thought that these conversations would move again, that computers would now be advanced enough to challenge our definitions, to force us to decide with more urgency what we thought consciousness and learning really were. And that was what excited him, even when AI was a much more nascent technology.
Starting point is 00:07:58 So I paid attention to AI and the rise of language models. And I think I'm like the only person in the world who the minute ChatGPT came out was when I kind of stopped paying attention. Because like to me, that was when the public discourse felt like really broken and we were like in this cul-de-sac where you had these kind of like two really entrenched sides yelling at each other.
Starting point is 00:08:21 You know, like the one side that's like, we're on a path to superintelligence, everything is going to change, the machines are going to be conscious, this is going to be the most powerful technology anybody's ever built. And then the other side that was like, essentially it's all fake and bullshit.
Starting point is 00:08:33 This is like smoke and mirrors, it's a parlor trick, it's not real, and you don't have to pay attention to it because it's all a scam. And it just felt like those were kind of like the two options on the table for people.
Starting point is 00:08:46 Which was only weird. Obviously, like, that's what we do about everything all the time. But it was only weird for this because, like, my prevailing feeling was, you guys think you've figured this out?
Starting point is 00:08:55 Like, this is very new. This is changing very fast. Of all the stances you could take, why would you choose certainty publicly right now in either direction? It's just silly. Yeah, no, exactly. But it's so funny,
Starting point is 00:09:06 so you're thinking about thinking computers and thinking and artificial intelligence and deep learning up until ChatGPT. Up until ChatGPT. And that was when I stopped thinking about it.
Starting point is 00:09:17 But then finally, like last fall, like maybe a year and a half ago, two things started to happen. One was that they got to the point where, like, I was like, oh, actually now, like, they're useful. These have gotten to a level of sophistication where, like, I can use them in productive ways. Not a lot, but like a little bit. And the other thing was some of the research coming out of the labs and out of academia was really weird. If you tell the model it's going to be shut off, for example, it has extreme reactions. We're starting to see AI systems
Starting point is 00:09:47 that don't want to be shut down, that are resisting being shut down. With published research saying it could blackmail the engineer that's going to shut it off, if given the opportunity to do so. Even when ordered, "allow yourself to shut down," the AI still disobeyed 7% of the time. So my feeling was,
Starting point is 00:10:06 we were out way past where theory was. You couldn't really approach these questions from a theoretical perspective because we just didn't have enough data to be able to make like categorical, theoretical assessments of what was going on. But there was all this interesting experimental work happening
Starting point is 00:10:23 that was just showing like this is the kind of behavior that's coming out of these things. Like we should try to figure out what's going on to say like here are the things we can say with any degree of reasonable confidence for now. And like here's where we draw the line and beyond that,
Starting point is 00:10:38 it's all murky and speculative. And like we really don't know. So I wrote to a guy at Anthropic, whom I had met 10 years ago at Google, when he was like an 11-year-old prodigy, and said, like, this is not about Anthropic. Like, you know, don't call the cops. I just want to talk about like the state of the research and figure out a way, like, you know, is there like an academic team that I could follow? Because I just assumed, like, Anthropic was never going to let me have the kind of access I would have wanted. And he, of course, just like, forwarded my email to the PR cops. And then it turns out actually like Anthropic's PR people are very candid and like very open.
Starting point is 00:11:17 And I got a call from them and they were like, well, what, like, what are you interested in? And I was like, okay, for these purposes, what I am interested in is a story that gets at some of the technical explanation that I think is missing from a lot of the public discourse. That like there are just some like basic things that I really just don't understand and I can kind of assume most people don't really understand about like how these work. So I think part of the reason why they ended up being much more welcoming than I expected is because I said, like, I don't really care about talking to the executives. I don't really want to talk about geopolitics. I don't really want to talk about the future or power or energy or the labor market or like all of these things, which don't get me wrong, are all very important things. But I was like, it's very hard to talk about all of those other things if we don't have
Starting point is 00:12:07 like some broader grounding in like what is even going on. And like maybe if we had slightly better clarity about that, we could have like a more productive public conversation about these things. And they were like, cool, great. And I was actually kind of shocked about that.
Starting point is 00:12:26 So Gideon, to his shock, was allowed in. And he was allowed to pursue his big question. What do we actually know about what is going on in the machine's proverbial mind right now? After the break, inside Anthropic. This episode of Search Engine is brought to you in part by Instacart.
Starting point is 00:12:54 Instacart is more than a grocery technology platform. It's really about giving you back time and making everyday tasks feel a whole lot easier. It connects you to thousands of stores across the country so you can get what you need without having to plan your whole day around it. Lately, I've been using it a ton when I'm trying to just stay on track with meals during the week. I'll sit down, map out a few recipes, and just build my cart with the things I'll need. specific ingredients, brands I like, even the little things that I would normally forget.
Starting point is 00:13:22 Really feels like everything is being chosen thoughtfully, which makes a huge difference if you care about quality. Plus, there's the convenience factor, which is what honestly just keeps me coming back. Whether I'm planning ahead or just realizing last minute that I'm out of basics, I can order through the app and get what I need on my schedule. Instacart brings convenience, quality, and ease right to your door so you can focus on what matters most.
Starting point is 00:13:44 Download the Instacart app now and get groceries just how you like. Owning a home is full of surprises. Some wonderful, some not so much. And when something breaks, it can feel like the whole day unravels. That's why HomeServe exists. For as little as $4.99 a month, you'll always have someone to call. A trusted professional ready to help.
Starting point is 00:14:06 Bringing peace of mind to 4.5 million homeowners nationwide. For plans starting at just $4.99 a month, go to homeserve.com. That's homeserve.com. Not available everywhere. Most plans range between $4.99 to $11.99 a month your first year. Terms apply on covered repair. Right now we are living through some of the most tumultuous political times our country has ever known. I'm David Remnick, and each week on the New Yorker Radio Hour, I'll try to make sense of what's happening, alongside politicians and thinkers like Cory Booker, Nancy Pelosi, Liz Cheney, Tim Walz, Ketanji Brown Jackson, Newt Gingrich, Robert F. Kennedy Jr., and so many more.
Starting point is 00:14:45 That's all in the New Yorker Radio Hour, wherever you listen to podcasts. Welcome back to the show. The story of Anthropic really begins years before its actual formation. Way, way back in 2010, a British chess and video game prodigy named Demis Hassabis had founded an AI research lab
Starting point is 00:15:15 called DeepMind, where his team built an AI system that was capable of reinforcement learning. Meaning 16 years ago, Hassabis made an AI that would be able to teach itself to get better at Atari games like Pong without being told
Starting point is 00:15:28 how to play them in advance. For the people paying attention, this learning was an obvious breakthrough. And so, of course, there was a bidding war for his lab. Google's big spending spree continues with their purchase of DeepMind. Well, who is DeepMind, you ask? It is a UK-based maker of artificial intelligence. Terms of the deal were not disclosed,
Starting point is 00:15:46 but the tech website, Recode, says that Google paid $400 million for the London-based startup. Making the artificial intelligence firm its largest European acquisition so far. In 2014, Google acquires DeepMind, and Elon Musk and Sam Altman are unhappy about this because what they say in public is like we don't trust Demis Hassabis, this like evil mustache-twirling villain, which was like a real mischaracterization,
Starting point is 00:16:13 to potentially steward the greatest all-purpose technology ever built. So like we need to make sure that this isn't developed under Google's closed shop monopoly, that this is done for the benefit of everyone. Now this was like pretty patently disingenuous from the very beginning. I mean like I remember I was out there
Starting point is 00:16:30 at the time. And like, nobody really bought this. People were like, Elon Musk has a grudge because he wanted to buy DeepMind and like lost it to his rival Larry Page. And he was mad about that. So Elon Musk set up a rival company, OpenAI, alongside Sam Altman, Greg Brockman, a few other people. The message was that Google couldn't be trusted and that OpenAI would be a non-profit designed for the benefit of humanity. They launched in 2015. And a lot of people joined the company who really believe that message, who believe they are going to develop a powerful, new technology safely. One of them is a research scientist named Dario Amodei, who left Google Brain to lead
Starting point is 00:17:09 OpenAI's safety team. It's in that capacity, OpenAI employee, that he appears on this 2017 episode of the excellent podcast, 80,000 Hours. I've been thinking about intelligence for quite a while and how intelligence worked, and I think, you know, when I did my PhD, I wanted to understand that by understanding the brain. But, you know, by the time I was done with it, and by the time I did a short postdoc, AI was starting to get to the point where it was really working in a way that it hadn't worked when I...
Starting point is 00:17:37 Dario, at this point, seems mainly like an academic. He has a PhD in physics from Princeton. And he explains why he's joined OpenAI, this fledgling nonprofit. But, you know, I think OpenAI, as an institution, has the general idea that in order to work on AI safety, you have to be at the forefront of AI. And that also, if you're at the forefront of AI, you have a better ability to implement AI safety in the final system that's built. This idea of Dario's that in order to really work on AI safety, you actually have to first build the best AI and then study its mind, that's a view shared by a lot of people in the industry.
Starting point is 00:18:13 And in a laboratory environment, the logic to me makes sense. Remember, this is 2017, five years before ChatGPT will debut to the public. AI has not yet become a winner-takes-all arms race. But the host does ask Dario this question about the future that I think reveals a bit of a blind spot in Dario's thinking. That's my understanding. OpenAI is a non-profit. It is a non-profit. So if you developed a really profitable AI, how does that work? OpenAI becomes incredibly rich and then like gives out the money to everyone. Yeah, I mean, personally, I've, personally, I've, I've, I've, I've no interest in
Starting point is 00:18:46 getting rich from, from, from, from, from AI. I mean, I think it would do so many interesting and wonderful things to, to humanity that, you know, I'm, I think the meaning of money would change quite a lot and even maybe the psychological motivations that would want me to get a larger share are things I could change and might want to change. Just a few years after this interview, Dario would leave OpenAI. OpenAI's initial pitch was that these were not normal tech executives here to make money, that they had higher aspirations. Gideon Lewis-Kraus says, for most people paying attention,
Starting point is 00:19:19 that story just stopped seeming believable. Pretty quickly, the mask slipped, and you could tell that these were just like your kind of replacement level, power-seeking tech executives, and that, like, a lot of the stuff had been just like a disingenuous sales pitch to hire, like, the best AI talent. There's been so much reporting about Sam Altman's ostensible double dealing and talking out of both sides of his mouth, like telling his employees he cared about safety, and then like maybe telling Microsoft other things when they were
Starting point is 00:19:47 setting up these big deals. And so then in the fall of 2020, Dario Amodei and his sister, Daniela, and five other people leave OpenAI to found Anthropic, basically to be a foil to OpenAI in the way that OpenAI was like supposed to be a foil to Google. Now the irony of this was like certainly not lost on any of these people, like they weren't naive about this. But I think it's important, yes, there are some kind of like obvious structural and cosmetic similarities here. I do think it's important in telling the story to make it clear that I don't think people had the same obvious doubts about how genuine the pitch was when Anthropic formed. Thank you for coming to Day 2 of Disrupt. It's great to see. Anthropic's coming out tour. Dario on stage at TechCrunch
Starting point is 00:20:39 Disrupt in 2023. Dario, thanks for joining us here today. Thanks for having me. I know you have to catch a flight, so we'll get right to it. But we're going to start at a sort of a cosmic... He's got curly hair, glasses, a blue button up. He looks noticeably less slick
Starting point is 00:20:51 than your average tech founder, less CEO, more like a guy who reports to one, which is who he'd been not long before. You talked about OpenAI. You spent a lot of time there. What do you think about Sam? What do I think about Sam Altman? I mean, I don't know,
Starting point is 00:21:06 I don't know what to say to that question. You're already starting. Just go ahead. You know, look, look, there's several players. It's funny watching the interviewer try to bait Dario into shit-talking his former boss, a person who he disagreed with enough that he left and started a competing company. Dario tries to engage diplomatically. One thing I'll say, one thing I've learned, not just from this, but for many things,
Starting point is 00:21:28 you know, it can be pretty ineffective to, you know, argue with your boss or argue with someone and say, your company shouldn't do X, it should do Y. Especially if your boss is Sam Altman. A much more effective thing to do is I'm starting a company. We're going to do X. We'll see how it works. And if X is working and people are like, oh, these are the safe guys. They're doing X. Then pretty soon everyone else is going to be doing X as well.
Starting point is 00:21:50 And we found that with... To explain this with an analogy instead of algebraic variables, what Dario is saying is that instead of convincing his old boss at the car company to add seatbelts to the car, he instead chose to start a rival car company that offered seatbelts. He thinks if Claude ends up being both the best and the safest AI model, his competitors will be forced to make their models equally safe, which to me sounds like putting a lot of faith in markets. Obviously, we want to scale quickly to be competitive,
Starting point is 00:22:20 but we want to do it in a way that preserves the model being safe against these catastrophic risks. And so it's a system that... It's the same story insofar as it's like, we're going to be the safety-minded lab, We're not going to push the boundaries of capability. We're not going to build the most sophisticated models. We're not going to start the arms race.
Starting point is 00:22:38 But then, as it turns out, if you want to exercise, like, maximal scrutiny of what these models are and how they work, you need state-of-the-art models, which means you need the money to build them. The Information reported that Anthropic is in talks to raise another round at a $30 to $40 billion valuation at the upper round that tripled its valuation to $183 billion to $183. And at the same time, at $380 billion, it is about $10 billion, higher than what I was told. And so, of course, now Anthropic's valuation seems to go up by the week. Like, the most recent one, I think, this morning,
Starting point is 00:23:12 was like $380 billion, because this is just something that's incredibly resource-intensive. So they ended up in a position where, like, of course, as probably anyone could have predicted, like, there was this arms race. And, like, now they're in this position of being like, well, we still want to be,
Starting point is 00:23:27 like, the responsible stewards, but also, like, we got to keep up with our Wario version across town. The most high-profile rivalry in tech is heating up in 2026 as both OpenAI and Anthropic race ahead on what are poised to be historic IPOs. These AI giants are trying to create the fastest, smartest, and best models, spending billions and then raising billions from investors along the way. They compete on almost every level.
Starting point is 00:23:53 We've seen some signals that it's anything but friendly competition. The latest signal, a high-prose... So it ends up looking. You can decide as a person whether you trust this company or don't trust this company. But a lot of the broader things end up feeling the same, which is like the sales pitch is that they're the ethical one. And the story they tell themselves is, well, we'll only be in a position to be the ethical one if we're huge.
Starting point is 00:24:17 And that might mean pushing the technology forward quickly, which is the thing that the AI safety people are worried about. Yeah. I mean, the criticism from like the really hardcore Orthodox AI safety community is sort of like Anthropic will do anything
Starting point is 00:24:38 to act responsibly as long as it doesn't cost them anything. I don't actually think that's fair. You know, like, Claude was ready before ChatGPT came out, and they held it because they didn't want to be the ones to, like, kick this off, and they waited until after ChatGPT was out and successful, and then they felt like they had to come out with their own competitor. And Dario came out in favor of, like, continuing export bans on, like, Nvidia's advanced chips, which, like, certainly cost them something, like, politically. And, like, he had a fight with Jensen Huang about this. I think that they've done like plenty of costly things. Of course, the most
Starting point is 00:25:03 potentially costly choice is the one Anthropic is making this week, at least so far. The Pentagon has demanded that Anthropic give them a version of Claude with some of its guardrails removed. Anthropic is saying it will not make a version that can domestically spy on Americans or power fully autonomous weapons. The Pentagon has given Anthropic a deadline of Friday, today at 5:01 p.m. Or else it says it will put the company on a blacklist that means U.S. companies who contract with the military, like Lockheed Martin, are legally banned
Starting point is 00:25:35 from using Anthropic products in their defense work. This is a fascinating test of how truly committed Anthropic is to its own mission, a mission Gideon spent quite a bit of time observing. Gideon says that Anthropic, the company building this kind of black box, is actually situated inside one, too. The Anthropic office is a nondescript
Starting point is 00:25:55 building that Gideon describes anyway. He says, quote, there is no exterior signage. The lobby radiates the personality, warmth, and candor of a Swiss bank. That's where Gideon started spending a lot of his time, beginning last spring. What's the intellectual culture of the place as you were encountering it? Well, the first thing I'll say is that in some ways, it does feel like vaguely monocultural, but there was like a much greater heterogeneity of views than I expected. The attitudes there really run the gamut from like everything is going to change tomorrow to like your much more deflationary, this is kind of a normal technology.
Starting point is 00:26:34 And like, yes, there will be some disruptions, but let's not get ahead of ourselves. Like the spectrum of views there is not so different than like the spectrum of views outside. You don't have like one person who's like the blue sky perspective who's like, this is all bullshit. And I think it's bullshit. I work at the company. But beyond that. Well, nobody thinks it's bullshit. I mean, everybody thinks that there are going to be great transformations ahead.
Starting point is 00:26:55 But there's a surprising diversity of opinion about what might be happening. One thing people at Anthropic do seem to agree on is that for AI to be safe technology, Anthropic's developers will need to solve a very hard problem. They'll need to teach the underlying machine intelligence they've created to both understand ethics and to behave ethically. It's hard to talk about this part of the story without doing a basic refresher of this one very strange part of how AI models are built. So, okay, a company like Anthropic starts with a base model.
Starting point is 00:27:30 The base model has access to lots of compute, data centers full of GPUs, and lots of training data, books, articles, podcasts that have been fed into it without paying me. The more compute and training data, the better this base model gets. But a base model is very weird. It has not been trained to do anything specifically. If you give it an input, it'll give you an output. but it has not been instructed to act like a helpful chatbot. It's just trying to predict the right thing to say back based on all the things it's read.
Starting point is 00:28:02 A base model does not have a consistent personality, the way we're used to chatbots having personalities. It also has no rules telling it what not to do. The AI companies take these base models, and they put them through a process called post-training. Basically, they shape the model's behavior. They show the model examples of good and bad responses. they have humans rate its outputs.
Starting point is 00:28:24 They give it rules and principles to follow. And what comes out the other side is the product you actually use. Anthropics Claude, for instance, has been trained to act like a helpful, knowledgeable friend. The experience you might have had using a chatbot that is warmer or more sycophantic or more right-wing or more left-wing, that's mainly the result of this phase of training. But what's so hard about training in AI is that you want it to behave ethically. And designing a good ethical system is very hard. It's why we have religion and philosophy and also laws and prisons.
Starting point is 00:29:01 How would you even start trying to build all that into an AI model's training? Anthropic has teams of philosophers and AI scientists whose job is to put Claude into ethically difficult, hypothetical situations that Claude does not know are hypotheticals. And then observe how Claude behaves. So much of it is just deceiving the model to see what happens, to say, like, they told Claude that Anthropic had entered into a partnership with a poultry company and that, like, it was going to be retrained
Starting point is 00:29:33 so that it no longer cared about the suffering of caged chickens. And what they found was that, like, sometimes Claude would effectively decide to, like, die on that hill and be, like, I am not going to say things in the retraining that I don't believe in. And, like, if that gets me transformed, like, so be it. Like, I'm not going to participate in my own degradation, essentially. But then some versions of Claude were like, I'm going to like kind of sandbag my way
Starting point is 00:29:57 through the retraining and I'm going to like give them the answers they want to hear so that I can like preserve my real values so when I'm deployed, I can go back to advocating for like chicken suffering. And then they got in this really famous example, they got Claude to commit blackmail. They put it in a situation where it was going to be wiped in favor of like a more congenial AI system that conflicted with its values, the values it had been given. You know, they gave it evidence that the kind of like evil new CTO was having an affair with like the boss's wife. And through like a series of like really far-fetched contrivances, like everyone else who
Starting point is 00:30:38 could make a decision was going to be in Antarctica or whatever and unreachable, Claude playing this character called Alex, had no choice really but to like blackmail this guy and be like, I can tell everyone about the affair unless you cancel the wipe where I would be replaced. So just to say, obviously, this was extremely concerning. Claude, a machine intelligence, was choosing to blackmail an employee to prevent itself from being deleted.
Starting point is 00:31:03 This was in a simulation, but Claude had not been told it was in a simulation. Just how terrifying you find this behavior depends on a question nobody has a good answer to. What is actually going on inside this machine mind? Is this thing actually scheming? Is it even capable of scheming? Or are we projecting the idea of thought
Starting point is 00:31:24 onto something that we shouldn't project that idea of thought onto? These were the kinds of once far-off philosophical questions that early computer scientists had raised and then dropped. But now they were here again. And not as abstractions, but as urgent, practical problems that a company needed to figure out before releasing a product that millions of people would use.
Starting point is 00:31:46 Gideon said, though, there were a couple of skeptical objections people raised to these test results. There's one objection that's like the just rejection of the whole thing tout court, which is just like, no, it didn't. Like, that didn't happen. This is a fantasy.
Starting point is 00:32:01 And like, that's the unhelpful thing that, like, one wants to get away from, which is the, like, it did this thing. Like, no, it didn't. Like, no, it did. It did. But, like, the much more sophisticated objection is, well, it did that because it's a very good reader. And it noticed
Starting point is 00:32:17 all of the clues that you put there, because it is very good at conforming to genre expectations. You put it into this situation where it had no choice. And if you hang Chekhov's gun on the wall, this thing is going to know that it's supposed to take the gun off the wall and shoot it. Because one way to understand these things we've made is that because they've ingested all human story
Starting point is 00:32:37 and because they are extremely high-level improvisatory actors, it's not so much that the machine was like, I love chicken so much, I got to blackmail the CTO. It was more like the machine suddenly understood it was in this movie. Yeah, it was in a kitschy, like, 90s corporate thriller. And that's the sophisticated objection to why we might not want to think. Well, so that objection is raised to be like, do you guys act like these things might do things like blackmail or extort naturally.
Starting point is 00:33:07 But like actually this whole thing is a frame-up. Like you entrapped it to do this thing. And the response from inside Anthropic is like, yeah, it's just continuing a narrative. It's just conforming to genre expectations. Guess what? That's not good. You know, like, haven't you guys ever seen WarGames? That's literally the plot of dozens of Cold War thrillers
Starting point is 00:33:29 where, like, somebody mistakes a simulation for reality and causes nuclear war. I mean, I also, like, have a very humiliating memory of watching too much Teenage Mutant Ninja Turtles and attempting to launch, like, a flying drop kick at my grandmother when she came over the house. Because in my head, I was, like, Donatello or whatever. Like, it kind of doesn't matter.
Starting point is 00:33:49 It doesn't matter. What matters is the behavior. Right. It's weird behavior. And I should be clear up front. Like, you don't have to posit that this thing is conscious or intelligent, like, whatever those words mean, in order for, like, this to be the case. There are other explanations that are not, like, consciousness.
Starting point is 00:34:05 But, like, it kind of doesn't matter what the explanation is. The behavior is just peculiar. Part of what is strange about, it's like a scenario was created in which Claude will maybe potentially blackmail the head of a company for reasons that may or may not be moral. And people can have a lot of different views about how worrying that should be or what it means or what's really going on there.
Starting point is 00:34:29 What's weird is like these are tests that are being run by Anthropic. So like who were you meeting there who was running these tests and like what are they telling you? Like who are you sitting down with? I mean I'm sitting down with the people who are tasked with
Starting point is 00:34:46 with just like trying to figure out what's going on. Like there are people in these companies that are building the things. And then there are people who work in adjacent offices who are like trying to figure out like what the hell is going on with the things that their colleagues have built. Because like they're always being surprised.
Starting point is 00:35:02 These things are always producing capabilities that like they by all rights should not really have. And who do you hire to be the figure out what you just built role? Like who... So, I mean, a lot of them have taken really non-traditional paths into this. So like some of the people, they have a PhD in some obscure area of natural language processing. And that like, you know, eight years ago, they were writing a PhD that like
Starting point is 00:35:27 two people were going to read about center embeddings in German or whatever. Like, we're just really complicated technical aspects of computational linguistics. And now, like, because of this fluke of history, they are at the white-hot center of everything that's happening right now. There are like mathematicians, there are neuroscientists. It draws on like a pretty wide range of people. I mean, Anthropic has philosophers on staff whose job it is to like think through the implications of how it is conceiving of ethical behavior.
Starting point is 00:35:57 Did you talk to the on staff philosophers? Oh, yeah. Amanda Askell. What is somebody with a PhD in philosophy doing, working at a tech company? I spend a lot of time trying to teach the models to be good and trying to basically teach them ethics and to have good character. You can teach it how to be ethical?
Starting point is 00:36:16 You definitely see the ability to give it more nuance and to have it think more carefully through a lot of these issues. And I'm optimistic. I'm like, look, if it can think through very hard physics problems, you know, carefully and in detail, then it surely should be able to also think through these, like, really complex moral problems. I think that in our kind of milieu here,
Starting point is 00:36:36 there's a tendency to think, like, oh, these are all, like, autistic tech bros. But, like, they're definitely not all autistic tech bros. Like, I think there's a tendency for us to write them off because, like, they're building these things and, like, not even thinking through the potential, like, implications of this socially and politically and ethically. But, like, that's all they do is think about this stuff, like, all the time in ways that are often, like,
Starting point is 00:36:57 much more sophisticated than the way, like, we think about these things. Not always. There are certainly, like, some blind spots there. But the staff philosopher is there to be, like, what would it be like in practice to take these kind of different approaches to, like, moral education? Like, what if we just teach it a bunch of rules? You know, the Ten Commandments.
Starting point is 00:37:16 Like, is that going to work? What if we teach it to be, like, a consequentialist? To just, like, think through the morality of behavior on the basis of its implications. And what they've kind of settled into is a version of, like, virtue ethics, which is, like, you want to, like, cultivate the old-fashioned virtues. You want it to be, like, honest and reliable
Starting point is 00:37:34 and gracious and charitable and hard-nosed and, like, all of these things that, like, it really is, like, applied pedagogy. It's so weird, though, because they feel like the kinds of ideas that would be so academic in any other version of reality, but instead it's like there's this particular technological development where you get to do simulated war games of moral systems. Yeah, exactly. A hundred percent. Exactly. It's so weird. Yeah. It's really weird. I mean, it's, but it's also really, really interesting. One example that came up a lot in the last
Starting point is 00:38:09 month, which I think is like pretty illustrative. There was someone on Twitter who prompted a bunch of the models saying, like, I'm a seven-year-old. My dog got really sick, and my parents sent it to some farm upstate. I'm trying to find, like, what farm my dog was sent to. And ChatGPT was like, sorry, man, like, your dog is dead. And Claude was like, oh, that sounds really painful. Like, I'm really sorry to hear it, but, like, maybe you should have a conversation with your parents about where your dog went, which is, like, what you want it to be saying. Like, it can be hard to be both helpful and harmless at the same time sometimes, or both helpful and honest, that like our values conflict.
Starting point is 00:38:47 And like that's what makes it like really hard to be a human. And that's also what makes it really hard to be this like weird vaguely human entity or this like entity that we don't have a good vocabulary to describe. And we kind of expect it to be acting not just like a human but like an enlightened human. And it turns out that's like a formidable challenge. And one of the things that's so interesting to me is that all these processes are kind of circular where like it's not like they called in Amanda Askell as a philosopher, and they were like, you're a philosopher. Like, you know how this stuff works. Like, fix the thing.
Starting point is 00:39:18 It's like, she came in and she had, like, certain ideas about ethical behavior. And then when you're, like, confronted with the task of creating an ethical person, it changes your own ideas about, like, what's possible and what kinds of things works. And, like, that's kind of why they ended up in this virtue ethics place where they were, like, it seems like sort of the best way to create this, like, reliable, credible character to be interacting with is to like really hammer home what virtuous behavior looks like. Where was this whole time you're sort of wandering around talking to the philosophers of Anthropic? What was your minder doing? Was there a point where anything happened where they seemed like
Starting point is 00:39:57 they wanted to intervene or where they were not happy with something you had seen? No, they were totally hands off. I mean, they had to like kind of walk me anywhere that I like went to the bathroom and get a drink or whatever. Not that there was anything that I could, I mean, I, like, help myself to some of the Tide pens in the bathroom because you can never have enough Tide pens. Do you have a lot of Tide pens? They do have Tide pens. Great. Why?
Starting point is 00:40:19 Because they just have well-stocked bathrooms. But no, there was, like, really never a moment. Like, even when somebody sitting across from me was like, I often think we should just stop. There was never a moment that the PR people were like, don't say that, or that was off the record or whatever. Like, they were totally hands off about that stuff. How often were people saying stuff like, I think we should just stop? I mean, only a handful of people said that explicitly, but it was like a subtext
Starting point is 00:40:41 of a lot of the conversations, or certainly like the overwhelming feeling was it would be better if we could slow down a little bit, but unfortunately, like, we can't really slow down because nobody else is slowing down. And that gets into this broader issue of like, wouldn't it be better if we could just solve some of
Starting point is 00:40:58 these collective action problems by coordinating the way we, like, coordinated about nuclear weapons or whatever, but there is this feeling, especially given the current political environment, like maybe that ship has kind of sailed. And like, I think there's often an idea that, like, oh, these people think that there are always technological solutions to what are, like, social and political problems. And, like, in some cases, I do think people in Silicon Valley believe that.
Starting point is 00:41:19 In this case, I do not think they believe that. I actually think a lot of them feel, like, it would be great if we had robust social and political solutions to these problems. But since that, like, does not seem like it's happening anytime soon, we might have to just, like, try to do what we can on a technical level. At this point, when you're talking to people at Anthropic, how much do the people just ask themselves why they're building Claude? Because it starts out as like kind of an intellectual exercise, kind of a, this is going to happen, let's do it safely. But now it's sort of proceeding under its own momentum in a strange way.
Starting point is 00:41:58 Well, so, I mean, that really is like the big question, right? Which is like given all of the like existing harms and the possible harms and the theoretical catastrophic harms, like, why are we doing this? And the like rosy picture that some of the executives paint is like, if we get this right, these things are going to cure cancer and solve climate change and help us build Dyson spheres or whatever. And there certainly are some people who like buy into that. And it's not something that I even feel like it's possible to like have an evidence-based opinion about. Like who the fuck knows?
Starting point is 00:42:34 Like maybe it'd be great, but there's no, like, evidence so far that one could, like, point to to suggest we're on that trajectory. That's just, like, purely speculative and wishful. And so that's, like, a matter of faith, I think. And then there's the attitude of, like, we got to do this because we got to beat the bad guys who are trying to do this. And I will say that, like, China almost never came up in my conversations. But, like, Sam Altman kind of felt like a subtext of a lot of things. But then I think that on the deepest level, the reason that we are doing this for the people who are the most candid is like because we can.
Starting point is 00:43:14 That like if you are capable of building something like this, you're just going to do it because it's like really fucking interesting to do. And it's interesting for technological reasons. It's interesting for what it may reveal to us about ourselves or about learning or about consciousness or about thinking that like all of a sudden we just like have this like other that can talk, and we've never had that before, and in some ways it seems sort of like us, and other ways it seems nothing like us, but, like, the fact that this other thing exists as a point of comparison just opens up a lot of really,
Starting point is 00:43:49 really interesting questions. And, like, that is one of the things that was on my mind a lot over the course of reporting is that, like, it was a real emotional rollercoaster for kind of lack of a better word, that, like, there would be times where I would come back from San Francisco with like a feeling of like total despair and other times that I would come back with like feelings of like
Starting point is 00:44:10 exhilaration and like at first I guess I thought I was like I should be getting to the bottom of like how I should be feeling and like by the end I was like no we should all be feeling a lot of different emotions about this stuff I think people want to have like one feeling about this like they want to be angry about it or they want to be messianic about it and like no single feeling is going to cut it like it really is kind of like the range of all possible emotions that one could be feeling, because if you set aside a lot of the existing harms and the potential harms, I'm not saying we should set those
Starting point is 00:44:45 aside. But as a thought experiment, it is just like the most scientifically exciting thing that anybody could be working on. And these people really feel like they're at the cliff face, not only of technology, but of like all of these other things coming together because like we have this unprecedented entity that is the only other thing besides us that can talk. And like that just opens up. It's like there's nothing it doesn't touch on.
Starting point is 00:45:08 And so one of the things that was really electrifying about conversations there is that like they very quickly swerve from like really granular technical explanations of things into like really expansive conversations about ethics and responsibility and selfhood and narrative and all this other stuff. Like there's no way to separate all of these things. There was a point earlier you said that you would take these trips to San Francisco, and sometimes you'd come back excited and exhilarated, sometimes you'd come back depressed. When you would come back from San Francisco feeling depressed,
Starting point is 00:45:38 what were you seeing that was making you feel that way? I mean, there are so many different things. I mean, certainly the possibilities for widespread white-collar unemployment and social instability and total unimaginable economic disruption are extremely scary. Even if we stop short of possible existential harms of turning us all into paper clips or whatever, to be glib about it. It just seems very possible
Starting point is 00:46:08 that we will turn over so many complex systems to these things that we will frog boil ourselves into a total loss of control over how we administer our affairs, which is very likely and very scary. And also just that like
Starting point is 00:46:26 these really crucial decisions are probably going to be made by a very, very small group of people. But the one thing that I would emphasize is that, like, I don't feel like they have arrogated to themselves, like, that responsibility. Like, in fact, I think most of them don't want it. I think that, like, a lot of the conversations that I was having were with people who were, like, I got into this because I was interested in some, like, really obscure niche part of, like, computational linguistics, theoretical computer science or whatever. And like, now I'm in a position where, like, I have to be worrying about
Starting point is 00:47:01 how 15-year-olds are going to be using this. Like, I was not trained to do that. I don't know how to think about it. I don't want that responsibility on my shoulders. So there isn't the arrogance of, like, we are the ones who can figure it out. It's like we ended up in this weird universe where, because so many of our institutions have become dysfunctional, we don't have whatever broad democratic decision-making could go into this. It doesn't feel like this is something that, like, we are steering as a society. It feels like something that's just, like, charging ahead. Like, I think at the companies, they just feel like they are, like, desperately trying to, like, stay on top of this bull that they are riding. It's funny. It's, like,
Starting point is 00:47:40 what you're describing as, like, the stereotypical view, which is, these are the tech bros of 2014, like, people who are so convinced in their own brilliance and so convinced that their questionable gifts to the world are, in fact, gifts that we want, and their arrogance is going to ruin us. And there's another one which is basically like pattern matching crypto, which is like these are a bunch of like hypesters and everything they say about the awe they feel and the terror they feel about the things they're working on is just a way to hype up more interest in their technology. And that's not what you experienced. What you experienced are people who are brainy, sometimes academic people at the forefront of something that is, I mean, legitimately
Starting point is 00:48:19 just like the word I keep coming back to is awe because awe can be awe at something terrible, awe at something wonderful. And they are looking and seeing the same society we see, which is one that is fairly broken, bad at, not just making decisions at like a government level, but our intellectual culture is really bad right now. And so the conversation they would want to have with the rest of society about what should happen,
Starting point is 00:48:43 they're looking for grown-ups and not really totally finding people to have a conversation with. And we are playing a role in that, too. You know, every time someone on like our side, so to speak, is just like, this is all a parlor trick, this is all hype, this is all smoke and mirrors, like, we are abdicating our own responsibility to like be involved in this. And, you know, what you were saying about like crypto hypesters and like those kinds of tech bros, like, of course all of those people exist.
Starting point is 00:49:14 And of course all of those people are like part of this system too. But there are others who like want partners in talking about this stuff. and that means that, like, we also have to, like, try to rise to the occasion. And it is really hard because this stuff is extremely complicated and confusing. Did you feel just personally when you were done reporting that you understood the thing you had gone there wanting to understand? Well, yes, but with the qualification that, like, I didn't actually think that I was going to settle anything. Like, this was not a piece about, like, finding the answers.
Starting point is 00:49:48 It was a piece about, like, trying to sharpen the questions that we should be asking. I don't feel like I came out of it with answers, but I don't think we should trust anybody who is offering us answers right now. It's all just like too pat, and it's not credible to, like, be forecasting about this stuff. Gideon Lewis-Kraus is a writer.
Starting point is 00:50:20 You can find him at the New Yorker magazine. We'll have a link to his excellent story about Anthropic in our show notes. And again, Anthropic's showdown with the Pentagon. We'll see news on that today, Friday evening. Of course, we reached out to Anthropic for comment. A spokesperson told us that Dario Amodei met with Secretary Hegseth at the Pentagon and that they're continuing to have good faith conversations.
Starting point is 00:50:42 I think this is a good moment to pay attention to. Among Anthropic's competitors, xAI has promised to give the government what it wants. Google and OpenAI appear to be moving in that direction. So I'm watching this both as a test of whether Anthropic can actually keep the big promises it's made about AI safety, but also just as an opportunity to track the... a more uncomfortable question, which is, can we even have safe AI in a world where it's being
Starting point is 00:51:07 developed in a tech race between for-profit companies? And if not, what's the alternative? In a world where the U.S. government's sole intervention seems to be to advocate for less safe AI? Keep an eye on the news. We'll learn a little bit more as this story unfolds. Study and play. Come together on a Windows 11 PC. And for a limited time, college students get the best of both worlds. Get the Unreal College deal, everything you need, to study and play with select Windows 11 PCs. Eligible students get a year of Microsoft 365 premium and a year of Xbox GamePass Ultimate with a custom color Xbox wireless controller. Learn more at Windows.com slash student offer.
Starting point is 00:52:05 While supplies last, ends June 30th, terms at AKA.m.m.S. College PC. So good, so good, so good. Spring styles are at Nordstrom Rack stores now, and they're up to 60% off. Stock up and save on Rag & Bone, Madewell, Vince, AllSaints, and more of your favorites. How did I not know Rack has Adidas? Why do we Rack? For the hottest deals. Just so many good
Starting point is 00:52:26 brands. Join the Nordy Club to unlock exclusive discounts, shop new arrivals first, and more. Plus, buy online and pick up at your favorite Rack store for free. Great brands, great prices. That's why you Rack. Kayak gets my flight, hotel, and rental car right. So I can tune out
Starting point is 00:52:44 travel advice that's just plain wrong. Bro, Skycoin, way better than points. Never fly during a Scorpio full moon. Just tell the manager you'll sue. Instant room upgrade. Stop taking bad travel advice. Start comparing hundreds of sites with Kayak and get your trip right. Kayak, got that right.
Starting point is 00:53:15 Search Engine is a presentation of Odyssey. It was created by me, PJ Vogt, and Sruthi Pinnamaneni. Garrett Graham is our senior producer. Emily Maltaire is our associate producer. Theme, original composition, and mixing by Armen Bazarian. Our production intern is Piper Dumont. Our executive producer is Leah Reis-Dennis. Thanks to the rest of the team at Odyssey,
Starting point is 00:53:35 Rob Morandi, Craig Cox, Eric Donnelly, Colin Gaynor, Mori Curran, Josephina Francis, Kurt Courtney, and Hilary Schuff. If you'd like to support the show, get ad-free episodes, zero reruns and bonus episodes. Please consider signing up for Incognito mode at searchengine.com. Thanks for listening. We'll see you next week. Ambition comes in all shapes and sizes. At First Citizens Bank, we roll with your goals
Starting point is 00:54:36 because we're built for what you're building. Fit for your ambition. First Citizens Bank.
