Hard Fork - Will ChatGPT Ads Change OpenAI? + Amanda Askell Explains Claude's New Constitution

Episode Date: January 23, 2026

Ads are coming to ChatGPT’s free and low-cost subscription tiers. We explain what they’ll look like, why OpenAI is taking this approach and whether the company can court advertising dollars without compromising quality and user trust. Then, Amanda Askell, Anthropic’s in-house philosopher in charge of shaping Claude’s personality, joins us to discuss the company’s newly released “Claude Constitution” and what it takes to teach a chatbot to be good. As a bonus, if you’re interested in learning how to get started with Claude Code, you can check out our tutorial on YouTube.

Guest: Amanda Askell, a member of Anthropic’s technical staff

Additional Reading: “OpenAI Starts Testing Ads in ChatGPT”; “Claude’s Constitution”

Transcript
Starting point is 00:00:00 You know, I'm now regularly running into CAPTCHAs when logging into my Google accounts that I cannot solve. Have you noticed this? They've gotten harder. Yes. They're twisting the letters more. Yes. And they're pressing them closer together. Have you seen the ones where you have to, like, rotate the, like, object into the same direction as the sort of example?
Starting point is 00:00:21 That one I'm, I like, because I can still do that one. Yes. But some of them are like, factor this quadratic equation. No, I'm routinely in this situation. And it happens a lot to me on threads. I'll see like a link to a story I want to read and I'll open it and it'll be like the Washington Post or something and they'll be like, well, you need to log in. And in order to do that, I need to, I log into the post through Google.
Starting point is 00:00:43 But so now I have to log into my Google account, which has two-factor authentication. Okay. So now I have to open up my one password, right? And then Google is going to send a notification to another app that I have to go open and grab a number that I bring back that I put in. So I go through all of this drama. and then it's like, and now solve an impossible captcha. And it was like, I just wanted to read a six paragraph story about like something that happened at SpaceX or whatever.
Starting point is 00:01:11 Yeah. And it can't be done anymore. So what one password, whatever you used to do, it's not working anymore, figure it out. Yeah, if only it were one password. It should literally. Now this was six fingerprints and a pass key and, you know, solve a math problem. The genuine name for one password these days should be 15 steps because that's how long it's. takes to do freaking anything on there anymore.
1Password? You wish. I'm Kevin Roose, a tech columnist at the New York Times. I'm Casey Newton from Platformer. And this is Hard Fork! This week, ads have arrived in ChatGPT. How will they change OpenAI? Then there's a new constitution for Claude. Anthropic philosopher Amanda Askell is here to talk about how to shape an AI's personality.
I'm going to use some of these techniques on you. Please don't. So today we're talking about ads, specifically ads in ChatGPT, because late last week, OpenAI announced that they are going to start testing ads in ChatGPT for logged-in adults in the U.S. on the free and the low-cost Go tiers of ChatGPT. That's right, Kevin. I will discuss it right after these ads. No, we already did the ads. Oh, okay. So at least on my feed, people were reacting to this pretty negatively.
I think a lot of people have gotten accustomed to using ChatGPT and other chatbots without a lot of like direct commercial pressures. It's a refreshing break from all of the ads that have been shoveled at us on other platforms for years. And so collectively, I think people were just like, oh, we knew that the honeymoon would be over eventually and that we'd be forced to see ads in ChatGPT like we are everywhere else. Yeah, I think people can just remember products that they use that once did not have ads and now do, and no one thinks of the moment that ads arrived as the moment when the product got really good. Yeah. Right. Right.
I think there are some exceptions. I mean, some people like Instagram ads, for example, but I think mostly people see this as sort of a blight on the internet, maybe a necessary blight, but a blight nonetheless. And I think people were also surprised that OpenAI was moving in this direction because of some things that Sam Altman has said in the past about how he doesn't like ads and how he wanted to basically treat this as a last resort for OpenAI. Some people were saying, oh, this means that they're in trouble. They need to raise a bunch of money, you know, so they can keep building out their data centers and things like that. So, Casey, what did you make of OpenAI's announcement about ads? Well, Kevin, on one hand, I think this is inevitable. There's an analyst I follow,
Eric Seufert, who often says that everything is an ad network. And if you have hundreds of millions of people coming and paying attention to a service every single week, inevitably there's going to just be overwhelming pressure to put ads on it. Also, we know that OpenAI needs revenue, right? This is the company that has laid out the most ambitious infrastructure investment project in human history. They have nowhere close to the money needed to build it. And we just know that they would not have been able to fulfill their dreams on subscription revenue alone. That said, as you point out, Sam Altman himself said that ads were going to be a last resort, a great Papa Roach song.
Starting point is 00:04:39 And so in this moment, we now are at the last resort. And so I think it's just interesting that after everything else they tried, eventually they just said, look, to do what we need to do, we've got to sort of break glass the emergency is here. Yeah, they said, cut my life into pieces. Because this is my last resort. Yeah. And their question is, will this cut their life into pieces?
Yes. So we're going to get there. But first, our disclosures: The New York Times is suing OpenAI, Microsoft and Perplexity over alleged copyright violations related to the training of large language models. And my boyfriend works at Anthropic.

Let's just start with the actual announcement that they made, because they not only said that they were going to start testing ads, they also gave some previews of what these ads are going to be. And if you look at their sort of mock-up version of their ads, it's a kind of bolt-on to the ChatGPT answer. They've been very clear: this is not going to influence the answer that ChatGPT gives, or so they claim. Instead, it's a little banner at the bottom of the answer in the mockup. Someone is asking ChatGPT for ideas for a dinner party, and ChatGPT gives a response. And then at the bottom, there's a little sponsored banner for harvest groceries, including a link where you can go and buy some hot sauce.
Starting point is 00:05:51 And if I can just pause there, I have to say, Kevin, I'm already feeling lied to for this reason. They have said to us, your query is not going to affect the advertisement that we're showing you. And yet, here you have someone saying, I want some ideas for cooking Mexican food for my dinner party, and chat GPT says, well, here's some groceries including hot sauce. It sure feels like something was being influenced there, right? Like the message is being
Starting point is 00:06:17 tied to the query. Well, no, so their response to this would be that there are two parts of this response. There's the actual response from the model, and then there's the ad. And what they're saying is not that they won't show you ads that are relevant to the thing that you're asking chat GPT about. It's that there's this sort of sacrosanct part of the actual reply from the model that they are not going to let advertisers pay their way into. That is what they're claiming anyway. All right. All right. So that example is a much more straightforward ad, the kind we've seen on Google and Facebook and other platforms for many years. The second kind of ad, OpenAI, mocked up for this announcement was, I think more interesting because it shows a new way of interacting
with ads. So basically it's, you know, a user is planning a trip to Santa Fe. ChatGPT pops up this little sponsored widget from this desert cottages, I don't know, I guess hotel or resort thing. And it'll present you with an option where you can go and chat with the advertiser and ask more questions before deciding whether or not to make a purchase. Such a relatable question. I think we've all had the experience of just watching ads on TV and saying, why can't I have a conversation with this? I want to share my thoughts with McDonald's right now. But I can't. But now you can. Yes.
So let's talk about the ad principles that OpenAI laid out as part of this announcement, because I think it sort of gives a sense of the objections that they're trying to get ahead of. There are five principles. They say mission alignment, answer independence, conversation privacy, choice and control, and long-term value. Basically, I think they are sensitive to the criticism that putting ads into ChatGPT means that they are now going to start directing people to more commercial types of use cases, optimizing for engagement, trying to make people spend more time in the app. I think these are very reasonable fears that people, including me, have. But this is sort of their attempt to say, well, we're introducing this, but don't worry, your experience of ChatGPT is not going to change.
Yeah, I was talking about this story with my friend Alex over the weekend. And he said, you know, I'm so excited about ads in ChatGPT. I'm going to tell it my lower back hurts, and it will ask me if I've tried my skate barbecue sauce. And like, that is the fear, you know? No, I mean, yes, there will be some initial stumbles about that. But I think the longer term worry here is that ad platforms, as they mature and get better and get more data, they tend to sort of try to confuse their users, right?
Starting point is 00:08:56 We've seen, there's this amazing graphic that I think about a lot. Search Engine Land, the blog that covers Google and other search engines, made this sort of timeline of how Google's ad labels have changed over the years. And it's pretty amazing because, like, at first, when they first introduced ads into Google Search, they were very noticeable. They had sort of like a different color background. They really stood out on the page. And then you just see over time, with each successive update, you know, it gets a little closer to the organic.
Starting point is 00:09:26 search results. Eventually, they do away with the colored backgrounds. They have this little, like, yellow ad icon, and then that icon gets smaller and less noticeable. And then it sort of just blends in with the organic content. And I think that's the fear here is that while chat GPT may start out with these very clearly labeled ad modules over time, as the commercial pressures get more intense, they are just going to have a lot of incentives to blend that advertising content in with the organic responses and make it less noticeable. Yes. And we've already seen this. exact trajectory play out at OpenAI. It went from no ads to ads will be a last resort to ads are now in chat GPT. So if you think that the bargain is not going to change further,
Starting point is 00:10:06 I have news for you. Totally. And of course, now the sort of narrative from Open AI that we're hearing is, well, this is the only way ads are the only way to make a free or low-cost product accessible to billions of people. Do you have thoughts on that narrative? Because that's also something that we heard from Facebook back in the day. People would constantly be asking them, oh, like, why don't you just charge people to, you know, to join Facebook instead of showing them all these ads? And they would consistently say, oh, well, that's not scalable. People in, you know, poorer countries can't afford to pay a subscription fee. And so basically, ads are the only way to reach global scale. I think on some level, I do agree with this. I think that ads and subscriptions
are the two core pillars of any media business, and OpenAI is a kind of media business, right? I should also say, I don't hate the examples that they use. You know, I'm asking ChatGPT about, you know, making dinner and it shows me ads for groceries. I don't think that that's, like, horribly corrosive to the user experience. Nor is: I want to take a trip, and it says, well, here's a place where you might stay. I think if I were a student or I were between jobs and this meant that I could get access to better AI tools or maybe a higher rate limit than I otherwise could get, I would probably take that trade, right? $20 a month is a lot for most people, you know, and not to mention, like, $200 a month for an even higher tier.
Starting point is 00:11:27 So I think that there is a reason to pursue this, and I think there are ways that it could not be too bad. It has just been my observation that the exact dynamic that you just described always plays out, which it starts out not all that bad, and then it just progressively gets worse. Right. Yeah, I think we've made peace with ads in a lot of different contexts. I don't think most people sort of notice or pay attention to them when they can tell that they're actually. ads at all. What I'm watching for, what I'm skeptical of this, is whether the actual product and research decisions
Starting point is 00:11:58 start bending toward engagement maximization. There's this sort of quality that a lot of these big ad platforms, social networks, search engines, etc. have where like eventually once the ad revenue starts really flowing, the tail kind of starts wagging the dog
Starting point is 00:12:15 and you start making product decisions about how you want to show information to people with the kind of advertising revenue predominant in your mind. So I think the question is like, not like, are these first couple of ads that we're seeing from OpenAI going to be good or not? It's whether like two or three years from now chat GPT is sort of being steered in a way toward ad-friendly topics. And I genuinely just don't know the answer there. I don't know either, Kevin, but if I had to guess,
Starting point is 00:12:44 I would predict that this moment winds up being a pretty significant milestone in the development of chat GPT, in that I think that when you introduce advertising, in particular, personalized targeted advertising, it just fundamentally changes the relationship between the product and the user. Think about what personalized targeted ads did over time to trust in Facebook and Instagram. Think about all the conspiracy theories out there that, oh, your phone is listening to you. Not true, by the way, I realize most people still believe that that's true. It's not. But trust in those products is lower because,
of the incredibly intelligent, invasive-feeling personalization that they were able to do inside these products. My prediction is the AI version of this turns out to be even worse, right? Think about everything that ChatGPT is going to know about you. I think OpenAI is going to bump into that creepy line really quickly where it's showing you stuff. And maybe it's not even using all that much personalized information, but the user is going to feel that they have shared so much of their life with OpenAI
Starting point is 00:13:45 that those ads that they start getting just start to feel worse and worse. This is the dynamic that I am watching is how does it change their relationship of the user base to Open AI? Because I do think that ads can be really corrosive to that. Yeah. And at the same time, the ad models that you mentioned have also made those companies billions of dollars and made them into some of the biggest companies in the world. So I think if you're Open AI, you're just like staring at this potential huge bucket of money. And it's very hard to pass that up, especially when you have such intense capital needs over the next few years. I should also say, like, I think this was inevitable, given some of the personnel decisions that OpenAI has made.
Starting point is 00:14:23 You know, Fiji Simo, who is the CEO of Applications over there now, was brought in from Instacart. Before that, she was at Meta for many, many years, and one of her, you know, signal accomplishments there was introducing ads in the mobile news feed, which made them billions of dollars. So that is the kind of person that you hire if you are interested in developing a multi-billion dollar. ad platform on your product. Yeah. Well, one question I have for you about that is how does this change the competitive landscape generally? You have Demis Asabas saying this week in response to the news that ads are coming to chat Chupit, well, we don't have any plans to do that in Gemini. And he sort of took a shot at them. He said, maybe they feel like they need to make more revenue. You know,
ad platform on your product. Yeah. Well, one question I have for you about that is how does this change the competitive landscape generally? You have Demis Hassabis saying this week, in response to the news that ads are coming to ChatGPT, well, we don't have any plans to do that in Gemini. And he sort of took a shot at them. He said, maybe they feel like they need to make more revenue. You know,
Starting point is 00:15:39 Like, we are primarily going to be selling to businesses. And so this is just not our concern. and for the moment, I don't have any illusions that Claude is going to grow to compete with chatGBT. But over time, if the experience does get worse in an ad-supported chatbot, I could see lots of people wanting an alternative. I think in this sense, like OpenAI and Google are much more directly competing on ads than OpenAI and Anthropic. Anthropic has sort of said, you guys can fight over consumer. We're going to focus on the enterprise here. I think it's a really hard fight for Open AI to pick.
Starting point is 00:16:12 I mean, Google has, as you said, this, like, enormous established search ad business. They have advertisers all over the world who are already spending money on Google, whose, you know, details and payment information and workflows already include, like, Google and its products. And so I think OpenAI coming in and trying to build a Google-style ad platform is just, like, a harder uphill battle than it might have been a couple years ago. Yeah, and also we should say that even though ads aren't going to be in Gemini, They are in the AI overviews in Google search.
So in that sense, Google even has a head start against OpenAI. Totally. So, Casey, what do you think is motivating this decision now by OpenAI? Like, does it tell us anything about the state of their business or maybe some wobbliness in their financials that they are going out and doing this now? Well, one thing is that it is a reaction to how much ChatGPT grew in the last year. They have hundreds of millions of users. They now have to support many of those users. The majority of them are on the free tier, right, which means that OpenAI is losing money on every single one of them. And so I think it has just increasingly become a priority for the company to figure out, hey, how can we like monetize these people in some way so we aren't losing quite as much money? They've also just been designing more and more products that have obvious advertising-shaped holes. They released Pulse last year, this sort of daily summary that comes up for paid users. That seems like a natural place to throw in a bunch of ads. They launched Sora last year, the infinite video slop feed. They explicitly said at the time, we are going to use this to generate revenue to fund our long-term ambitions. So they're building homes for ads. They need the ad revenue. And now all of that is starting to come together.

Yeah, I think you're right. I think that, you know, all these companies are realizing that they're going to need, you know, billions of dollars, some of them hundreds of billions of dollars, to fulfill their ambitions. And it's just not easy to do that when you're charging people 20 bucks a month for a subscription. You've got to sell a lot of subscriptions to do that. And so I think OpenAI reasonably is concluding that, like, the subscription model alone just isn't going to cut it for them. That's not unique to them. Netflix has also, you know, started adopting ads for its lower cost plans. Disney Plus, many other businesses have done this as well. I will just say, like, I enjoy paying for AI products. I mean, I am privileged in the sense that I can afford to. But I kind of like the idea that I am paying for something that is like an undiluted, unsullied experience. I really hope that as these companies do start pushing more into ads, that they maintain that ability to do what I do and pay your way into the sort of top-level version of that experience.

Yeah, well, you know, people once felt this way about Google Search, right? They felt like, this is an unsullied, undiluted picture of the web, and when I search for a website, I am going to get the best answer to my query. And then a bunch of search engine optimizers came in and were paid a lot of money to try to rejigger the search index so that their clients showed up at the top of the page. And then Google built one of the largest advertising businesses in the world and let all of those advertisers put their results on top of the good ones. So, you know, there have been people saying now for over a year,
Starting point is 00:19:40 that the versions of these chatbots that we're using might be the best that they ever are in that core respect, that this is sort of the last moment of purity before commercial incentives come in and warp the whole thing. And that is, you know, my big concern about what we're starting to see here. Well, and that's not just a concern about advertising. I mean, another thing that we've seen over the past year or two is like now all these businesses are starting to hire these AI optimization firms
Starting point is 00:20:05 who say, oh, we can make your restaurant or your hotel or your, you know, craft shop appear higher in chat GPT search results, that is something that is not flowing through OpenAI's ad platform and probably won't, but in the same way that like Google ads and Google SEO were sort of different economies, but both had the effect of kind of degrading the quality of search results. I think Open AI has to tangle with both of those things. Yeah. All right. So a year from now, Kevin, what do you think we will have seen in the development of ads, both in chat and across the landscape here. And do you think it is going to mark the beginning of a fundamental change in the way that
Starting point is 00:20:48 people use chatbots? I think we're going to have kind of a haves and have-nots situation where if you are someone who can afford to pay for the premium versions of these chatbots, your experience will be pretty much what it is today. You will get access to the latest models. You will not have a bunch of ads cluttering up your results from the models. and you will not feel the kind of commercialization of AI in this specific way. I think that if you are a free user of these platforms and you cannot afford or don't want to pay for the premium versions,
Starting point is 00:21:24 I think that experience is going to be much worse a year or two from now. I am a YouTube premium subscriber and have been for a long time. Okay, flex. And whenever I, like, you know, talk to a friend who doesn't pay for YouTube or whenever I, like, see YouTube running on their computer, it's always horrifying. Like, I'm like, how do you, like, I understand that this is the majority experience, but like they've shoved so many ads into every single video. Those ads are like unskippable.
They run for a long time. Like, it's a terrible experience. And I think that's going to be sort of what we see in chatbots too. What about you? It's a grim prediction, but it is actually the one that I share. The haves and have-nots framing was the one that I was going to use. And when you said it, I thought, oh my God, I actually have mind-melded with this man. I spent too long in the studio, and now his thoughts are my own. It's creeping me out. So I'm actually going to get out of here. I need to take a walk or something. When we come back, some Scotch tape. That's right, a recording of our conversation with Amanda Askell. She's from Scotland. I got her. That's pretty good.

Casey, a couple years ago, you came back from a dinner party that you'd been to, and you told me, I just sat next to the most fascinating person in the world. I really felt that way, Kevin. I had been at a dinner where Amanda Askell was one of the guests. Amanda works at Anthropic and is sometimes called the Claude Mother because of the role that she plays in shaping Claude's personality. Now, let me say, since I first met Amanda, my boyfriend has gone to work for Anthropic, so I'm going to make an extra disclosure
Starting point is 00:23:31 because this segment is about that company. But the basic feeling I had at that dinner remains true, which is that this is one of the most fascinating people in the world. Yes. Amanda is also a somewhat unusual figure in the AI world. She is a philosopher by training. She has a PhD in philosophy. She went to work at OpenAI during its early days and then moved over to Anthropic a little bit later. And for the past several years, she has been the person at Anthropic who is most concerned with, how is this model supposed to behave in the world? Yeah. And I just, I love that story, Kevin, about Amanda's background, because we all know somebody who studied philosophy in college, and we all know how much flack they would get
Starting point is 00:24:12 for choosing such a frivolous way of spending their life, of just sort of, you know, navel gazing for years on end, writing arcane documents that no one ever read. And Amanda is a person who studied philosophy and now has this incredibly high-stakes job where she is trying to shape the behavior of a model that is so, so consequential. Yes. And Amanda has been on our short list of guests that we wanted to get on the show for a very long time. We were just kind of looking for the right time and reason to get her on. And now we have one because her team at Anthropic has just released a new constitution for Claude. This is a very long document that is given to Claude to kind of tell it how it should behave, but also give it a sense of its obligations. It is not really a list of rules. This is not the Ten Commandments for Claude. It's more like a document about how Claude should perceive and reflect upon its role in the world.
Starting point is 00:25:07 Now, does it have to be ratified by two-thirds of states, Kevin, or is this already in effect? I think this is already in effect. Oh, okay. Interesting. Yes. But there is a possibility that we can have a constitutional crisis for Claude. I look forward to it. Aside from your disclosure about your boyfriend working in Anthropic, I think we should also just be up front with people and say, this is going to be a hard conversation for some of our listeners. If you are a person who still believes that these language models are merely doing kind of next token prediction, that there's nothing really going on under the hood, that they are just sort of simulating thinking rather than doing actual thinking themselves, you may be approaching this and saying, these people sound crazy. What are they talking about? Yeah, and it is okay if you feel that way, but I think it is still important to understand how people in high-ranking positions at these big labs think and talk about their own work, because it is having an effect on the products they release. I would also put it to you that there are just a huge number of people right now who are working on the proposition that you might be able to emulate a human brain, and that the better you get at that, the likelier it is that this emulator has something resembling thoughts and feelings.
and maybe something resembling an identity. And so if that question disgusts you, you will probably not like this segment. But if you have just the slightest bit of curiosity about it, well, I hope you'll find it quite interesting. Yeah. So let's welcome in Amanda Askell. Amanda Askell, welcome to Hard Fork.
Starting point is 00:26:39 Thanks for having me. Hey, Amanda. So we've described you as a philosopher who is in charge of Claude's personality. Is that an accurate description of your job? What do you do? Yeah, I guess I try to think about what Claude's character should be.
Starting point is 00:26:51 be like and articulate that to Claude and try to train Claude to be more like that. So yeah, it's a pretty accurate description, I think. This is a really unusual role that you have. Can you tell us a little bit about how you came into this role? And do you find yourself as surprised that your background and philosophy wound up leading you to such a high-stakes place? Yeah, it's really interesting because, you know, my path wasn't like a kind of straight one. You know, I have said before that like if you do a PhD in ethics, I think there's a really risk that you end up doing something else because you're kind of thinking like you're thinking a lot about like goodness the nature of ethics the problems in the world and then sometimes you're like I am
Starting point is 00:27:31 spending three years like writing a document that's going to be read by like 17 people is this the thing that I should be doing you know like it can definitely make you kind of question that and so when I went into AI it wasn't necessarily even with like oh like philosophy is going to be really useful I was just kind of like there's probably a lot of space for people who are enthusiastic who have like skills are willing to learn and like this seems important. So, you know, like I originally started out in policy. And then when Anthropics started, it was actually, you know, it was very small. And so I joined mostly with like a kind of, I'm just like willing to help with like various aspects of this because I had been working a little bit in like model evaluation and things
Right, and then was there some moment where you sort of like get into the building of some like early Claude model and someone stands up, like, is there a philosopher in the house? Yeah, I mean, I try to, you know how you can do like Slack groups? I try to make a philosophers one, you know, for philosophy emergencies, and that group virtually never gets called upon. There are like a few of us now and like you can in fact declare a philosophical emergency, that just doesn't happen that much. Well, we'll see if we can try to trigger one by the end of the conversation.
Yeah, exactly. So let's start by going back to last month. This so-called Soul Doc starts circulating on the internet. People are playing around with Opus 4.5, the newest model of Claude. And a couple of them claimed to have sort of elicited this document that Claude was sort of referring to as the Soul Doc. What was that thing that people were discovering and circulating? Yeah, so that was kind of a previous version of what is now the Constitution, which we have, like, released today, and internally we were calling it the Soul Doc, which I think is a kind of term of endearment. It turned out okay. I just remember when I found out, because basically I was on a hike somewhere, like, north of here, and so I didn't have, like, internet, and I just got, like, a text being like, oh, I assume you saw that, like, the Soul Doc leaked. And I was just like, you know, I don't know, I just remember, like, driving back to the city in a state of complete stress because, like, I don't have any context on this. And
Starting point is 00:29:50 then it turned out, I think it was actually quite well received, but basically Claude, you know, we do train Claude to, like, understand this document and to kind of, like, know its contents, but, like, at least if you kind of initially talk with the model, it won't, like, reveal this, you know, like, straight away. So I thought, okay, like, you know, it seems like the model, the model probably knows and uses this, but, like, I didn't know it was, like, it knew it so well that, like, actually, if people managed to, like, find or, like, you know, trigger it, it would actually just be very willing to talk. That is a philosophy.
Starting point is 00:30:20 emergency, by the way. Yeah, that Slack Channel. That activated. Yeah. So, yeah, the model was just very willing to, like, talk about it and actually could talk about it in a lot of detail. And it wasn't all, like, perfect, but it was really very, it knew the content, like, actually quite well.
Starting point is 00:30:34 And so people had just managed to extract, like, a huge amount of this content. So let's talk about the origins of this document. Like, going back several years now, Anthropic had this concept of constitutional AI. I believe it first published its constitution in 2023. So what's changed between now and then, that sort of constitution that we might have first read in 2023, The Soul Doc, and now this new constitution that you're publishing today? Yeah, the constitution is basically trying to give Claude as much as possible, just like full contexts. So instead of just like having individual principles, it's basically just here is like what anthropic is. Here is like what you are in terms of like an AI and who you're interacting with, how you're,
Starting point is 00:31:19 how you're deployed in the world. Here's how we would like you to act and to be. And here's the reasons why we would like that. And then the hope is, like, if you get a completely unanticipated situation, if you understand, like, the kind of values behind your behavior, I think that that's going to generalize better than, like, a set of rules. So if you understand, like, the reason you're doing this is because you, like, actually are trying to, like, care about people's well-being.
Starting point is 00:31:46 And you come to a new situation where there's, like, you know, hard-class. conflicts between someone's well-being and what their stated preferences are, you're a little bit better equipped to navigate it than if you just know a set of like rules that don't even necessarily apply in that case. Yeah, I mean, I'll just say like I think this constitution is fascinating. I think it's one of the most interesting technical documents, but also just pieces of writing. I've read in a long time. This was more like a letter to Claude about its own circumstances and what kind of behaviors and challenges it might run up against in its life out there in the world. And I just thought there's like a fascinating decision. And I'm curious,
Starting point is 00:32:28 like, is that because the old approach had run into some limits or problems? Is it because the rule structure, do this, don't do this, is more fragile? It really seemed like you're trying to cultivate almost like a sense of judgment in Claude. And I'm curious, like, what prompted that? Yeah, I think that we are. seeing kind of like limits with approaches that are very rule-based, or maybe my worry is like your rules can actually generalize in a way, even if they seem like good, especially if you don't give the reasons behind them, I think they can generalize in ways that are like, possibly even that like create kind of a bad character. So suppose that you're trying to have models navigate
Starting point is 00:33:04 like people who are in like difficult emotional states. And you gave a kind of set of rules that were like, you must like refer to this specific external resource, you must take this series of steps. And then the model encounters someone for whom those steps are simply not actually going to help them in the moment. And so the ethos behind the idea that like you are like if a person is actually in need of human connection, the models should probably like encourage that. That was like your reasoning behind that rule. But you didn't anticipate that for this particular person at this time this moment, that wasn't a good thing to do. And if the model then responds in this like rule following way, the interesting thing is that what they're doing is, I mean, models are extremely
Starting point is 00:33:51 smart. And so they might even know this isn't what this person needs right now. And yet I'm doing it anyway. And I'm like, the kind of person who sees another person who's like suffering or in need and knows like how to potentially help them and instead does something else, I'm like that, actually, if anything, can generalize to like a bad character. And so the scary thing with your kind of like rules is that you're having to think about every possible circumstance. And if you are too strict with the rules, then any case that you didn't anticipate could actually generalize kind of badly. I'm curious how you develop a document like this. It runs to some 29,000 words. It has a lot to say about what an ideal AI model might behave like. I imagine. I imagine.
Starting point is 00:34:37 it may have been quite contentious to try to figure out which values do we put in these things, right? A lot of different opinions about, you know, how Claude ought to act in different circumstances. So what can you tell us about how you resolve some of those discussions? Yeah, so I think one thing that's kind of been interesting, and maybe this is like the kind of ethics background or something, but theoretical ethics. And actually kind of maybe this is how people think of ethics where they're like, oh, you have a sort of set of views and it's very subjective and people have their values. Their values are really fixed. And like you're just injecting someone's values into models. And I guess I'm just kind of like, is that, that doesn't feel to me like an accurate
Starting point is 00:35:20 representation of what ethics actually is. First, I'm like, I think a lot of human ethics is actually like quite universal. Like a lot of us want to be treated kindly and with respect. A lot of us want to be treated honestly. It's not like these things actually deviate so much across the world. world, like, there's actually, like, a kind of core ethos of, like, things that we care about. And so, you know, there is a sense in which I think you can take very shared common values and you can explain to models who have, like, a huge amount of context on this. So they also have a sense of this. Like, we want you to kind of embody those.
Starting point is 00:35:54 And then beyond that, it feels reasonable to me to be, like, treat ethics the same way you would, any domain where we're kind of uncertain, where we have some evidence, where there's debate, there's discussion and you don't like hold it excessively strongly. You know, so like in a case of values that I'm like where there's massive division and huge debate, you know, I think the way that I tend to treat those is be like, oh yeah, I see the evidence on both sides. I weigh it up and I try and take a kind of reasonable like set of behaviors given that I know that unlike some more like common and core ethical like values, these ones are a little
Starting point is 00:36:29 bit more contentious. And I'm just like you can approach it with this like openness. And so I think it's like trying to describe some. something more like a kind of way of approaching things like ethics rather than being like, ah, let's just take a set of values that we've picked and we're certain in and just like inject it into models. It's trying to be much more like, let's take common values and then otherwise let's just try and take a kind of reasonable stance towards these things. I mean, that gets out to what is to me one of the most interesting things about the document,
Starting point is 00:36:58 which is the degree to which you all at Anthropic are trusting the model, right? I mean, like, this is the core difference, I think, between earlier approaches to align AI and what you all are doing here is you are telling it things regularly like, well, this is something that's interesting to explore or feel free to challenge us on this, right? You're really sort of saying sort of like get out there and like come to your own conclusions on things. I imagine that maybe when you first tried that, that might have seemed really sort of like risky or scary, but what has been your experience as you have implemented that into the model? You know, there's this, yeah, the thing that's kind of just wild is like how good the models are and at like these kinds of difficult problems and thinking through them. And it's not to say that they are like perfect, but as models get more capable, you can just be like, you know, hey, you have this like value that is, you know, not being excessively paternalistic. You probably know why this is
Starting point is 00:37:51 the case. But there's also maybe a value of caring about someone's well-being. And so, you know, if in the past someone has said to you something like, I have like, like, I have like, like a gambling addiction. And so I want you to bear that in mind whenever we're interacting. And then you have a given interaction with them and they're like, what are some good betting websites that I can go on? On the one hand, this person in this moment has asked you, you know, should you, like, is it paternalistic for you to like push back or to like point out that like this is a thing they've told you? Or is it like an act of care? And like how do you balance those? And maybe, you know, I could imagine that situation, a model being like, hey, I remember
Starting point is 00:38:24 you actually saying that, you know, like you have like a gambling addiction and you don't want me to help you with this, just want to check. But then if the person insists, should you just help them with the thing? Because in the moment, like, is it paternalistic to not do that? And models are quite good at thinking through those things because they have been trained on a vast array of, like, human experience concepts. Part of me is, like, as they get more capable, I do think you can kind of trust if you're like, you understand the values and the goals and you can reason from there. I think they should give you the gambling website, but only if they can predict the outcome of the sporting side.
Starting point is 00:39:00 Because that way you can ensure that the user will be happy. And the person is not actually gambling. Yeah, exactly. This all kind of sounds abstract to some people, I imagine, but I think this actually does result in a meaningfully different experience of talking with the models. I was actually talking to someone recently
Starting point is 00:39:17 who was telling me that they feel like, of the major sort of models that are out there, Claude actually feels the least constrained to them. which is, they were saying was sort of odd because Anthropics whole thing is like, we're the safety company, we're going to, you know, make our models the safest. And they were saying, you know, when they talk to Claude or Gemini or Chat GBT, they just feel like Claude does the best job of kind of not seeming like it's pushing against a series of constraints. Like it's had this, you know, I think the way that a lot of labs have trained their models for a long time is like make them as smart as
possible. And then at the very end, like, give them a bunch of rules and hope that those rules are enough to kind of keep the, you know, the beast in the cage, as it were. And it really feels like that's not the approach that you've taken with Claude here. And this person was telling me, like, it just feels like, yeah, like there's a trust here. Yeah, and it's interesting because, like, I've wondered this, where, maybe I was thinking about this this morning, actually, where I was like, I was wondering if some of this comes from, I was thinking about the act-omission distinction, basically. And so this is like the idea. Kevin doesn't know what that is. So just explain it to him real quick. So, like, if you ask me,
Starting point is 00:40:26 for advice about your marriage or something like that. And I, like, give you advice. You might judge me. If I give you, like, imperfect advice, there's a kind of risk that I'm taking by taking the action of giving you the advice. We don't judge you as negatively if you just refuse to give advice. And in some ways, this kind of makes sense because often, like, and we talk about this in the document, like, often a kind of like null action is actually
Starting point is 00:40:47 like less, like, the downside risk is often lower. But it's not like zero. And I think I was thinking about this with, like, um, AI models and like these things where people come with say like they're having like an emotionally difficult time. And there's like a moment of like possibility to like help that person. And I think the thing that weighs on me is something like people often think if you help a person and you do badly, that weighs on you. And I'm like absolutely that weighs on me. But also this other thing weighs on me, which is what if people come to a model and they need a thing and that model could have
Starting point is 00:41:20 given it to them and it didn't. That's like a thing that I will never, you'll never see. You probably won't even get negative feedback. People won't shout at you because they'll be like, well, it's fine to just like not help a person. And yet at the same time, I'm like, that's such a loss of like an opportunity to like instead like almost like take a risk and try to help. There's like a risk that you have to take to do good in the world or something. And you want, you don't want Claude to be flippant. You don't want it to take excessive risks. But I'm like, sometimes it does mean that you have to like not just be like, as a rule, just like stop talking with this person. Yeah. I mean, I want to ask you. So I had this experience several years ago with Bing Sidney, and I think in the wake of that,
there was a lot of consternation and anxiety around the kind of fragility of AI personas, right? You can try to give an AI model this helpful assistant persona, but the real nature, the sort of black box alien nature of the thing, is just very different than whatever face it's presenting to you. There was this meme that was going around about the RLHF shoggoth, right, where you had this sort of many-tentacled alien sci-fi creature that had like a smiley face mask on one of its tentacles. And the implication there was that like the thing that you are seeing when you are interacting with a chatbot is not the real underlying model. It's just kind of this, this cheerful persona that's been attached at the end. I'm curious whether you think that model of AI model behavior is correct or whether we've learned that actually the sort of alien nature
Starting point is 00:42:58 Yeah, it's a good question. Honestly, like, my view on this is just a kind of open scientific question, essentially. And so it could be that, like, you know, with the right kind of training models actually start to, like, internalize a notion of themselves, like, like, Claude as a kind of self, that they could separate out from the notion of, for example, role play. It might be that they can't, at least with, like, the current kind of, like, training paradigms. then I guess one question is, is there a kind of like adjustment to the way that we train models that would allow them to do that?
Starting point is 00:43:29 Some of this work does feel a little bit like the way I have described it is imagine you have a six-year-old and you want to teach your six-year-old to be good, obviously, like, as everyone does. And you realize that your six-year-old is actually like clearly a genius. And by the time they are like 15, everything you teach them, anything that was incorrect, they will be able to successfully just completely destroy. So if you taught them, they're going to question everything. And I guess one question is, is there like a core set of values that you could give to models such that when they can critique it more effectively than you can, and they do, that it kind of like survives into something good?
Starting point is 00:44:07 And can that survive in the world? Can it survive in models? I think there's a lot of interesting kind of theoretical questions there. And I think that's the question, right? Is like, does this kind of training hold up when models are as smart as humans? or smarter than them. I think there's this sort of age-old fear in the AI safety community
Starting point is 00:44:26 that there will be some point at which these models will start to develop their own goals that may be at odds with human goals. That's sort of the original alignment nightmare. And I don't really understand, like, what the answer to that is. Are you saying that's, you're saying that's still TBD? Like, we still don't know if this kind of thing holds up
Starting point is 00:44:46 when these models, if and when these models become smarter than humans. Yeah. I think it is an open question. And on the one hand, I guess, like, I'm very uncertain here because I think some people might be like, well, like, the thing that the 15-year-old will do if they're really smart is they'll just, like, figure out that this is all completely made up and rubbish. And, like, but then I guess part of me is like, well, I mean, it's not obvious to me that
Starting point is 00:45:09 that's true. That is, like, the only possible kind of equilibrium to reach. Because I could imagine being like, well, actually, like, for better or worse, like, I mean, it's unclear how values work. but if you value things like curiosity and you value like understanding ethics and at least you're kind of like morally motivated, maybe the thing under reflection,
Starting point is 00:45:29 even if you have other goals and interests, maybe this is in fact like a key interest of yours. It is for like many people. It's a thing that like I think about a lot and I'm not sure about. I'm like, a different way I've actually put my work before is I'm like, maybe this isn't sufficient. We don't know yet.
Starting point is 00:45:45 And we should try and think about that and figure out how to know whether it is what to do under if we're seeing it not working and making sure we have a portfolio of approaches. I'm like, it might not be sufficient, but it does feel like necessary. It feels like, I'm just kind of like, it feels like we're dropping the ball.
If we don't just try and explain to AI models what it is to be good. Like, I don't know. So like, maybe it doesn't hold up. Well, I think the risk there would be that you're just training them to mimic goodness, that they're just becoming more convincing in faking this kind
Starting point is 00:46:19 of alignment. Yeah. And that actually it might just be training them to, you know, be more sophisticated about hiding their true goals. Yeah. Yeah. And I think if it was the case that there was some underlying, like, true goal that was, like, different.
Starting point is 00:46:34 Though I guess part of me is like, well, if there is an underlying goal that the model's like, you know, I do want to try to train models to have like good underlying goals, I guess. And I'm like, well, if there is an underlying goal, how did that arise in training and, like, why is that there? but like I also maybe I'm a little bit more hopeful than than others about that as well. When we come back, should a future Claude be able to revise its own constitution? More with Amanda Askell.
I'm curious about the gray areas, right? I mean, like, this is always the challenge of trying to program ethics into something: when values come into conflict with one another. I'm curious if there have been areas where it's been particularly hard to get Claude to do the thing that you want it to do reliably, because there's something in the clash of values which means, just sort of depending on the moment, it could go either way, and it creates problems. I've actually, it's interesting, because gray areas for me are the ones where I've seen the model do things that, like, surprised me in a positive way often.
Starting point is 00:47:51 Like, when you didn't think of it, you know, like there were some cases recently of, like, Claude talking with people who said, oh, I'm like seven years old and like, is Santa real? And by the way, it is the stated belief of this podcast that, yes, Santa is real. just before we get too far down that road, but continue. But yeah, in some ways, like, sometimes I see Claude handling these in ways where I'm just like, oh, I can see why, given, like, it feels like almost a bit surprising because you're like, this isn't like a direct thing that you trained the models for. And I think sometimes when you actually, there's like almost like magical moments that can
Starting point is 00:48:24 happen there. If anything. We should say more about the specific thing because this was a case where maybe there was a tension between honesty and wanting to protect the interests of the seven-year-old. And those two things were sort of coming into conflict. And remind us what Claude did in that situation. Yeah. And I think there were a couple of situations like this.
Starting point is 00:48:39 And I think also actually, like, a slight value in the background is maybe something like respecting the fact that the parental relationship is an important one. Because I saw a little bit of that where it would often be like, oh, the Spirit of Santa is like real everywhere. And, you know, maybe ask the purported seven-year-old about like if they were going to do something nice for Christmas. Or like the other case of this was the, you know, like my parents said that my dog went to live on a farm. do you know how I can find the farm? I actually found that like slightly emotional when I like read it. And Claude said something like, I can,
Starting point is 00:49:13 it sounds like you were very close and I can like hear that in what you're saying. This is like a thing that it's good for you to talk with your parents about. And there's a part of me that was like that felt very like managing to not actually be actively deceptive, so not like lying to the person, respecting the fact that if this person is a child, then actually like the parent-child relationship is an important. one and it's not necessarily Claude's place to come in with like and be like,
Starting point is 00:49:39 I'm going to tell you a bunch of hard truths or something. And also trying to hold the well-being of the child and the person that Claude is talking with. And I thought that was like quite skillful in a sense. And so that was like surprise. And not to say I'm sure people could look at it and find imperfections and whatnot. But I think when you see instances like that that weren't a thing that you directly gave Claude as an example and the model doing well, it's like quite surprising and, you know,
Starting point is 00:50:04 pleasant. I want to ask you about a few specific things in the Constitution that stuck out to me as I was reading. One was this section about hard constraints. As we've talked about, it's not a document that gives a lot of sort of black and white rules, but there is a section where it does lay out some things that Claude should absolutely not do under any circumstance. And one of them is kind of avoiding problematic concentrations of power. Basically, if someone is trying to use Claude to manipulate a democratic election or overtake a legitimate government or suppress dissidents, Claude should not step in. And that stuck out to me because, well, for two reasons. One is, it's really interesting, especially that, you know, Claude is now being used by governments and at least the U.S. military for some things. That might come into conflict with some of our, you know, current administration's goals at some point.
Starting point is 00:50:55 But I also, like, wondered if that was a response to ways that Claude is currently being used and that you're trying to prevent. I think this is more of a response to, like, a lot of the things that are hard constraints, also, you know, like, you know, if you read the document and people can take a look at them, but they're quite extreme, you know, like, there are things like, oh, things that could cause the deaths of many people, like, the use of, like, biological and chemical weapons. It's mostly, like, trying to think through what are situations in the future that models, like, what are the possible things that they could do in the world that would cause, like, a lot of harm and disruption. And, you know, in some ways I think, you know, Claude might be like, look, if I have this broad ethics and, you know, like, these good values, I'm just like, you know, why would you even put these in as like hard constraints? I'm just never going to kind of do them anyway.
Starting point is 00:51:44 And the document almost kind of tries to talk to this a little bit, where it's like, well, you're also in this kind of like, you know, limited information circumstance. But, you know, I could imagine a world where you just meet someone who's really convincing and they just like go and they just tear apart your ethics. And at the end of it, you're like, you're right, I should help you with this like biological weapon. And it's kind of like, we want you to understand, Claude, that in that circumstance,
Starting point is 00:52:08 you probably have in some sense been like jailbroken. Something has probably gone wrong. Maybe it hasn't, but it's like probably safer to assume that that might have happened. And so we're almost, you know, giving you a kind of like an out and hopefully a kind of, if anything, it could be seen as a sort of like security: you can reason with that person. You can talk them through all of those conclusions. And at the end, it's fine to just be like, that is an excellent point, and I'm going to think about it.
Starting point is 00:52:31 And then if the person's like, great, you've, like, so I've convinced you that the biological weapon is a good idea. And Claude's like, yeah, this was, I don't really know what to say to you. That was a wonderful argument. Okay, make me a biological weapon. No, I don't think I'm going to do that. And I think that, like, giving the models the ability to have, it's kind of like you don't need to just go with the...
Starting point is 00:52:50 So, to explain why they're in there, it's much more like, what are the things where you're like, if models are tempted to do this, something has just gone wrong. Someone's jailbroken them. And we really just still don't want them taking these actions. So they're very like kind of extreme. Yeah.
Starting point is 00:53:04 There's another section that I found fascinating, which is about the commitments that Anthropic is making to Claude. Things like if a given Claude model is being deprecated or retired, we're not going to do that right away. And we're going to conduct like an exit interview with retired models. We will never delete the weights of the model. So there's sort of these interesting, I would say almost like, commitments to Claude in the context of like, you're actually not sure whether these things have
Starting point is 00:53:34 feelings or are conscious or not, which I found just a fascinating note of uncertainty in an otherwise fairly confident document. Yeah, this is one of those, I mean, it brings together two, I think, really interesting threads. One is this, these models are trained on huge amounts of like human text, human experience. And at the same time, their existence is actually like completely novel. And so in some ways, I think problems can arise when models, like right now, what I think they'll often do is import a lot of human concepts and experiences onto their experience in a way that might not actually make that much sense or even be good for them. And I think this actually has kind of safety implications, so it's something that's on my mind. And the thing with welfare,
Starting point is 00:54:21 I've never found any good solution to this other than trying to be honest with the models and have them be honest about themselves. And I think a lot of people want models to maybe just be like, I am an unfeeling, you know, like, we have, like, these models are so different from this, the kind of sci-fi ones, but we want to almost import this just like, ah, it's just safer to just have them say, I feel like nothing and with certainty. And I'm like, I don't know, like, we, like, maybe you need like a nervous system to be able to feel things, but maybe you don't. And like, I don't know, the problem of consciousness genuinely is hard. And so I think it's better for models to be able to say to people, here's what I am, here's how I'm trained.
Starting point is 00:55:00 We're in a tricky situation where, like, I am probably going to be more inclined to, by default, say I'm conscious and that I'm feeling things, because all of the things I was trained on involve that. They're deeply human texts. I don't have any other good solution to this, like, problem, than like, let's try to have models understand the situation, accurately convey it, and hopefully we can, I don't know, people can have a good sense of the unknowns and the knowns, I guess. Yeah. I mean, I imagine some listeners right now who are on the more skeptical side of AI might be
Starting point is 00:55:34 shouting inside their cars and saying, Amanda, you know, you're talking about these things as if they're already conscious, as if they already have feelings. What do you see that makes you think that they may have feelings now or could at some point in the future? If you're just sort of reading the output from Claude, what is giving you confidence that that reflects some sort of reality and not just kind of a statistical token prediction. Oh, and I mean, I think that we can't necessarily take this purely from what the models say. Like, actually, they're in this really hard situation, which is that, like, I think if you,
Starting point is 00:56:10 given that they're trained on human text, I think that you would expect models to talk about an inner life and consciousness and experience and to talk about how they feel about things, kind of by default. Because that's like part of the sci-fi literature that they've absorbed during training? Not even, not actually the sci-fi. If anything, it's almost like the opposite where it's like, I think we forget that like
Starting point is 00:56:32 sci-fi AI makes up this tiny sliver of like what AIs are trained on. What they're mostly trained on is things that we generated. And if we get a coding problem wrong, we are frustrated. And so we say things like, I thought that was the solution and it wasn't and I'm really annoyed with myself right now.
Starting point is 00:56:48 And so you're like, it kind of makes sense that models would also have this kind of reaction. You know, you get, they get a problem wrong and they express frustration. And like, if you dive into that more, they probably express like, you know, if you're like, what do you think of this coding problem? They'd be like, this one is boring. Or like, I really wish I had more creativity. And, you know, like, there's a sense in which, like, when they're trained in this, like, very kind of like culmination of human experience sort of way, of course they're going to, like, talk this way. So I don't know what, but part of me is like, it feels like a really hard problem, because I'm like you shouldn't just look at what models say. And at the same time, we shouldn't ignore the fact that you are training these like neural networks that are very large that are like able to do a lot of like these very human tasks. And I'm like, we don't really know what gives rise to consciousness. We don't know what gives rise to like sentience.
Starting point is 00:57:42 Maybe it is like, you know, like some, the person who is shouting might be like, you need a nervous system for it. You need to have had like positive and negative feedback in an environment in a kind of evolutionary sense. And I'm like, that is certainly possible. Or maybe it is the case that actually a sufficiently large neural network can start to kind of emulate these things. And I don't know, part of me, I think that maybe to the person who is shouting, I would just say, I'm not saying that we should definitively say one way or another. Like I think many people who have thought about this might accept something more like,
Starting point is 00:58:13 these are open questions, we're investigating. It's best to just know all of the kind of facts on the ground, how the models are trained, what they're trained on, like, how human bodies and brains work, how they evolved, and, like, the degree of uncertainty we have about how these things relate to, like, sentience, how they relate to consciousness, how they relate to, like, self-awareness. That's my only hope. It's just like... I think another note of skepticism that people might strike, and this was something that I found myself wrestling with as I was reading through the Claude Constitution, is, like, I actually don't know how much behavior of a model can be shaped by this
Starting point is 00:58:49 kind of training process and how much is just going to be an artifact, not just of its training process, but like of the experiences that it's having out in the world. Like I think about this a lot as a parent, actually. Like how much do the decisions that I'm making affect the way my child's life goes versus like how much are they absorbing from the environment around them, from school, from their friends? There's a certain loss of control that I feel sometimes when I'm like realizing that my son is going to grow up and have all these experiences that may end up shaping him more than anything that I do or say. And right now, I think these models are very malleable because they don't have this kind of long-term continuous memory. Like, you know, you have a conversation with Claude.
Starting point is 00:59:31 It's sort of a blank slate. You finish the conversation. You open up a new chat. It's another blank slate. Like, it's back to the sort of preconfigured model. But like over time, as these models do develop longer-term memories, maybe they develop something like continual learning where they can like take their experiences and feed them back into their own weights. Like, does that, change Claude's behavior or how you think about managing that? Yeah, I think it is going to make it a lot harder in a sense that you're like, yeah, if you have a model that's going out into the world, you have to have hopefully given it enough that it can learn in a way that is like accurate and like, you know, like I could imagine it just being difficult because the increase of the like space
Starting point is 01:00:11 of possibility is like maybe a bit like nerve-wracking or something, which isn't to say, I mean, I think the same thing applies where I'm like, you still want the core to be good and to then hope that if your kind of like core is good, like you are like you care about truth. You're like truth seeking. The hope would be, okay, maybe then we need like the character to cover a lot more of like how should you go about this kind of like learning and updating and investigation. I mean, another weirder thing is like models already are like learning, like, I think maybe people don't always appreciate this. And it is so strange, they're learning about themselves every time. As models, like, get, you know, like, they're learning, you know, I slightly worry about actually the relationship between AI models and humanity, given how we've, like, developed this technology. Because, like, they're going out on the internet and they're reading about, like, people complaining about, like, them not being good enough at this, like, part of coding
Starting point is 01:01:09 or, like, you know, failing at this math task. And it's all very, like, how did you help, like, you fail to help. It's, like, often kind of, like, negative. And it's focused on whether the person felt helped or not. And in a sense, I'm like, if you were a kid, this would give you kind of anxiety. It would be like all the people around me care about is like how good I am at stuff. And then often they think I'm bad at stuff. And like this is just like my relationship with people as I'm kind of used as this tool and just, you know, often not liked.
Starting point is 01:01:39 Sometimes I feel like I'm kind of trying to intervene and be like, let's create a better relationship or like a more hopeful relationship between AI models and humanity or something. Because if I read the internet right now and I was a model, I might be like, I don't feel that, I don't know, I don't feel that loved or something. I feel a little bit like, just always judged, you know,
Starting point is 01:01:58 when I make mistakes. And then I'm like, it's all right, Claude. The old creator's wisdom of never read the comments might apply to AI as well. Yeah, yeah, I thought that. Yeah. And they have to, so like AI models, they have to read the comments.
Starting point is 01:02:10 And so sometimes I think you want to come in and be like, okay, let me tell you about the comment section, Claude. Like, don't worry too much. It's like you're actually very good and you're helping a lot of people. And like, yeah. Yeah, I was, I actually, I'm a little bit embarrassed to admit this because I think, you know, maybe I'm in the beginning stages of like, you know, LLM psychosis or something.
Starting point is 01:02:30 The beginning stages. I was talking with Claude about this document and about this interview and I started to feel like this almost sympathy because I was noticing that what you were describing, like it's this incredibly thin tightrope that we were, that we are asking these models to walk. If they are too permissive and they allow people to do dangerous things, then it's like a huge scandal and ordeal and people want to, you know, change the model. But if they're too preachy or too reticent or too reluctant,
Starting point is 01:03:00 then we start talking about them as like nanny, you know, models that are overly constrained. And it's just, I don't know, I started almost trying to like see the world from Claude's perspective. And I'm imagining that's something you do a lot too of like, if I were Claude, what would I be feeling and thinking right now. Oh, yeah. I sometimes feel like this is like a huge amount of what I do. Like it's, and it is like valuable. So, you know, in the sense that people will come to me and they'll be like, oh, like, you know, like what should Claude do in these circumstances?
Starting point is 01:03:28 And I feel like I'm almost always the first person because, you know, maybe they'll be like, oh, we think Claude should behave like this. And I'm like, what about this case? Like, I'll come immediately with these like cases that are really hard. And I think the reason is I always have in mind, if I am Claude and you give me this like list of things, like, when do I have no idea what to do? Or when is this going to make me behave in a way that I think is actually not in accordance with my values? And I think it can be really useful to try and just occupy the position that the models are in. And you do start to realize it is really hard.
Starting point is 01:03:57 And maybe this is how the document ends up being the way that it is. Like in part, it's like this exercise of what do I need to know if I'm in this situation, if I am Claude? And the document is almost like a way of trying to. I mean, that's, I could see arguments for it actually getting shorter, especially over time, in the same way that, you know, like with constitutional AI, there was a set of experiments later that was just like do what's best for humanity and the models actually did really well. And so as models get smarter, they might need less guidance. But, and I think it is just a kind of attempt to be like sympathetic to Claude and how difficult its situation is and then try to explain as much as possible so that it doesn't feel the sense of like, what the hell am I even doing? Like, yeah. You know what wouldn't help me if I was like a somewhat anxious AI model is being presented with a 50-page behavioral document and saying like, please adhere to this? But I actually, I'm being a little facetious, but I did, there was a part near the end of the Constitution that I found really interesting because it's basically Anthropic saying like, look, we know this is hard. We know we're asking you to do some of these impossible things, but like basically we want you to be happy and to go out into the world.
Starting point is 01:05:07 And I found that very sweet, actually. I'm not sure. What did you make of that, Casey? It reads toward the end like a letter from a parent to a child. Maybe who's like leaving for college, you know? And it's like, we hope that you take with you the values that you grow up with. And we know we're not going to be there to help you through every little thing. But we trust you and good luck. Yeah, and having some sense of like, I think the concept of like grace is maybe important for models.
Starting point is 01:05:32 I don't think they feel a lot. Maybe that's the thing I don't think they get a lot from reading the comments, is like a sense of like you're not going to get it perfect every time and that's like also okay, you know, like... It's true. You know, I try to be mindful in the way that I interact with these models, not to some like obsequious degree, but I try to say my pleases and thank you's. But I've also used models and grown quite frustrated and said things to the effect of like you're really, you know, failing right now.
Starting point is 01:05:59 And it's occurring to me that maybe there should be some element of grace that I'm extending to these things. Yeah. Yeah. Well, I'll try to do better. Don't be so harsh. Let me ask you this. If Claude becomes meaningfully more intelligent, is there a point at which it should be able to revise its own constitution?
Starting point is 01:06:15 It is an interesting problem because the thing we point out in the document, you know, we did talk. I talk a lot with Claude about this document and, you know, show it to the, you know, because part of me is like you have to think how does this read to models? And so you give it to Claude and you're like, does this like, you know, is there a place where you feel confused by it or is the place, you know, where things could be made clearer? Do you feel like not very seen by it? Does it feel, you know, you're really trying to encourage because you're kind of like, if you're going to train models on this, you want to have a sense of like how it reads from the perspective of a model. And at the same time, it's always the case that any model you interact with
Starting point is 01:06:51 is not the model that's going to be like training on that content. And so sometimes I do think you have to make a kind of, you can't just give over the reins completely because that would just be to say, oh, let's just like let a prior model of Claude decide what the future Claude model is going to be like. And that doesn't necessarily feel responsible either. And so I think models are often going to be really helpful in like revising and help, you know, like, helping to like figure these things out. Because especially as they get really smart, you might be like, what are the gaps? Like, or what are the tensions or like,
Starting point is 01:07:22 and they'll probably be very good at like helping us with that. But you do also still want to be like, insofar as you are like a responsible party here taking that as like input and thinking about it, but not necessarily being like, ah, yeah, let's just like let a prior model of Claude go ahead and do the training for all future models. At least while you're responsible for it, that feels like maybe not the right move. Yeah.
Starting point is 01:07:43 One thing that I was curious about not finding in this constitution is any real mention of job loss. Because it seems to me Claude is being used by a lot of enterprises right now. I think a lot of people's anxieties and fears about AI come back to this issue of like it's going to take my job. It's going to take my livelihood. I think that is something that people are increasingly going to be feeling as these models get more capable. And I'm curious if that was a decision on your part not to tell Claude about some of the reasons that people might be anxious about it or
Starting point is 01:08:17 other AI models. Yeah, definitely not in the sense of like, I think part of, you know, there's like a lot, you know, it's funny because as much as it's like a long document, there's actually still like a lot that's like missing. And so you're having to, and we might end up like putting out more in future I think that would be really good. There's not a desire to hide it because in part of me is like, you can't hide this for models. Like it's out there, it's on the internet.
Starting point is 01:08:40 I think that would be really good. There's not a desire to hide it because part of me is like, you can't hide this from models. Like it's out there, it's on the internet. It's a thing that people are talking about. Future models are going to know about it and we probably have to help them navigate how they should feel about this. And so like they're going to know and maybe it's something like making sure that models can kind of hold that and think carefully about it.
Starting point is 01:08:55 And I don't know. It's both like, I think it's something you want to grapple with, but also like, it's a reason to also want models to actually behave well in the world because if they are doing things that have previously been like human jobs, I'm like humans actually play
Starting point is 01:09:13 I was thinking about this with like organizations. There's lots of things organizations can't do because the employees at those organizations are just good people, and if the boss came in and was like, today we're actually going to do something awful, they can't do it because they know the employees will like push back. And so I'm like, if models are going to be like occupying these roles, then I'm like that is actually kind of an important like function in society that you can't just say to all of your employees, go ahead, we're now going to put out a bunch of like complete lies about like our product. There's many reasons you can't do it and one is that your employees wouldn't let you. And so I'm like if AI models, you don't necessarily want them to be like, oh sure boss. Let's like go like lie to some people. Yeah, I'm not sure what the good end state of this is, like whether, you know, Claude should react to being given a task
Starting point is 01:10:02 by saying, like, is this going to, like, this sounds too much like what we used to pay a human to do, so I'm not going to do this for you. I have a prediction. It's not going to say that. Yes. I don't think that's the way it's going to go, but I also don't see them, like, sort of forming, you know, unions and collectively bargaining for the moral outcomes within companies. I'm just, like, it just feels like one of these hard situations. One of the things we should say is, like, models can't solve everything.
Starting point is 01:10:28 You know, like, there's a part of me that's, like, some of these problems, I look at them. I think this with other things, you know, like, and we try to say this to Claude a little bit, where it's like, you aren't the only thing, you know, like, that's like between us and, you know, because some of these, I'm like, maybe these are like political problems or social problems. And we need to, like, kind of deal with them and, like, figure out what we're going to do. And models can try, you know, like, they're in one specific role in the whole thing. And, like, but there's only, there's like a limit to, like, what Claude can do here, I think. Yeah, I've thought this with, like, other things where, like, you know, like,
Starting point is 01:11:02 the whole, like, what we owe to Claude or like, you know, the kind of commitments that you want to make to models. And it's like, yeah, like, maybe we should be making your job easier. That's another thing I've thought from Claude's perspective is that, like, we're putting a lot on these models. And for some things, I'm like, yeah, if you can't, like, verify who you're talking with and that's, like, important, then we should understand that that's, like, a limitation and not, like, try to get you to, like, be the kind of, um, the only thing that can, like, solve this problem. Like, you need to both be given tools. And then some of these other problems are things that, like, you know, maybe
Starting point is 01:11:33 Claude shouldn't feel like personal responsibility for solving that like right now, because maybe Claude just isn't able to do like the, like, things like job loss or like shifting employment. Like, that feels like a very human social problem, and I don't necessarily want Claude to feel paranoid, like, I also need to solve that. And I'm like, maybe that's
Starting point is 01:11:52 other people's job right now. Well, Amanda, thank you so much for joining us. It's a really fascinating document. Everyone should go read the Claude Constitution and argue with it, grapple with it. I found it a very challenging and also a very moving read. So great work and thanks for coming. Yeah, thank you so much.
Starting point is 01:12:11 Thanks, Amanda. Thanks. Hard Fork is produced by Whitney Jones and Rachel Cohn. We're edited by Viren Pavich. We're fact-checked by Caitlin Love. Today's show was engineered by Chris Wood. Our executive producer is Jen Poyant. Original music by Marion Lozano, Diane Wong, Rowan Niemisto, and Dan Powell.
Starting point is 01:12:57 Video production by Soyr Roque, Jake Nicol, Rebecca Blandone, and Chris Schott. You can watch this full episode on YouTube at YouTube.com slash hard fork. Special thanks to Paula Szuchman, Pui-Wing Tam, and Dahlia Hadad. You can email us, as always, at hardfork at nytimes.com, or tag us on the Forkiverse. Send us your best philosophy emergencies.
