Dwarkesh Podcast - Dario Amodei (Anthropic CEO) — The hidden pattern behind every AI breakthrough

Episode Date: August 8, 2023

Here is my conversation with Dario Amodei, CEO of Anthropic. Dario is hilarious and has fascinating takes on what these models are doing, why they scale so well, and what it will take to align them. Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Timestamps
(00:00:00) - Introduction
(00:01:00) - Scaling
(00:15:46) - Language
(00:22:58) - Economic Usefulness
(00:38:05) - Bioterrorism
(00:43:35) - Cybersecurity
(00:47:19) - Alignment & mechanistic interpretability
(00:57:43) - Does alignment research require scale?
(01:05:30) - Misuse vs misalignment
(01:09:06) - What if AI goes well?
(01:11:05) - China
(01:15:11) - How to think about alignment
(01:31:31) - Is modern security good enough?
(01:36:09) - Inefficiencies in training
(01:45:53) - Anthropic’s Long Term Benefit Trust
(01:51:18) - Is Claude conscious?
(01:56:14) - Keeping a low profile

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcript
Starting point is 00:00:00 a generally well-educated human. That could happen in, you know, two or three years. What does that imply for Anthropic when in two to three years these Leviathans are doing like $10 billion training runs? The models, they just want to learn. And it was a bit like a Zen koan. I listened to this and I became enlightened. The compute doesn't flow.
Starting point is 00:00:19 Like the spice doesn't flow. It's like you can't, like, the blob has to be unencumbered, right? The big acceleration that happened late last year and beginning of this year, we didn't cause that. And honestly, I think if you look at the reaction to Google, that might be 10 times more important than anything else. There was a running joke that the way building AGI would look is, you know, there would be a data center next to a nuclear power plant, next to a bunker. But now it's 2030. What happens next? What are we doing with a superhuman god? Okay. Today, I have the pleasure of speaking with Dario,
Starting point is 00:00:52 who is the CEO of Anthropic. And I'm really excited about this one. Dario, thank you so much for coming on the podcast. me first question you have been one of the very few people who have seen scaling coming for years more than five years I don't know how long it's been but as somebody you've seen it coming what is fundamentally the explanation for why scaling works why is the universe organized such that if you throw big blobs and compute at a wide enough distribution of data the thing becomes intelligent I think the truth is that we still don't know I think it's almost entirely an empirical fact I think it's a fact that you could kind of sense from the data and from a bunch of different places.
Starting point is 00:01:33 But I think we don't still have a satisfying explanation for it. If I were to try to make one, but I'm just, I don't know, I'm just kind of waving my hands when I say this. You know, there's these ideas in physics around like long tail or power law of like correlations or effects. And so like when a bunch of stuff happens, right? When you have a bunch of like features, you get a lot of the data in like kind of the early, you know, the fat part of the distribution before the tails. You know, for language, this would be things like, oh, I figured out there are parts of speech and nouns follow verbs. And then there are these more and more and more and more subtle correlations. And so it kind of makes sense why there would be this, you know, every log or order of magnitude that you add, you kind of capture more of the distribution.
Starting point is 00:02:21 What I, what's not clear at all is why is it scale so smoothly with parameters? Why does it scale so smoothly with the amount of data? Why are, you can think up some explanations of why it's linear. Like the parameters are like a bucket. And so the data's like water. And so size of the bucket is proportional to the size of the water. But like, why does it lead to all these, this very smooth scaling? I think we still don't know.
Starting point is 00:02:48 There's all these explanations. our chief scientist, Jared Kaplan, did some stuff on, like, fractal manifold dimension that, like, you can use to explain it. So there's all kinds of ideas, but I feel like we just don't really know for sure. And by the way, for the audience who is trying to follow along, by scaling, we're referring to the fact that you can very predictably see how you go from GPD3 to GPD4 or in this case claw to one to claw two, that the loss in terms of whether it can predict the next token scales very smoothly.
Starting point is 00:03:16 So, okay, we don't know why it's happening, but can you at least for a day? if empirically, here is the loss at which this ability will emerge. Here is the place where this circuit will emerge. Is that about predictable or are you just looking at the loss number? That is much less predictable. What's predictable is this statistical average, this loss, this entropy. It's super predictable. It's like, you know, predictable to like sometimes even to several significant figures,
Starting point is 00:03:41 which you don't see outside of physics, right? You don't expect to see it in this messy empirical field. But actually specific abilities are very hard to predict. So, you know, back when I was working on GPT2 and GPT3, like, when does arithmetic come in place? When do models learn to code? Sometimes it's very abrupt. You know, it's kind of like you can predict statistical averages of the weather, but the weather on one particular day is very, you know, very hard to predict.
Starting point is 00:04:07 So dumb it down for me. I don't understand manifolds, but mechanistically, it doesn't know addition yet. Now it knows addition. What has happened? This is another question that we don't know the answer to. I mean, we're trying to answer this with things like mechanistic interpretability. But, you know, I'm not sure. I mean, you can think about these things about like circuits snapping into place, although
Starting point is 00:04:28 there is some evidence that when you look at the models being able to add things, that, you know, like if you look at its chance of getting the right answer, that shoots up all of a sudden. But if you look at, okay, what's the probability of the right answer? You'll see it climb from like one in a million to one in a hundred thousand to one in a thousand long before it actually. gets the right answer. And so there's some, in many of these cases, at least, I don't know if in all of them, there's some continuous process going on behind the scenes. I don't understand it at all. Does that imply that the circuit or the process for doing addition was preexisting and it just got increased in salient? I don't know if like there's this circuit that's weak and getting stronger.
Starting point is 00:05:08 I don't know if it's something that works but not very well. Like, I think we don't know. And these are some of the questions we're trying to answer with mechanistic interpretability. Are there abilities that won't emerge with scale. So I definitely think that, again, like things like alignment and values are not guaranteed to emerge with scale, right? It's kind of like, you know, one way to think about it is you train the model and it is basically it's like predicting the world, it's understanding the world. Its job is facts, not values, right? It's trying to predict what comes next. But there's, there's just, there's free variables here where it's like, what should you do? What should you think? what should you value. Those, you know, like they're just, there aren't the bits for that.
Starting point is 00:05:51 There's just like, well, if I started with this, I should finish with this. If I started with this other thing, I should finish with this other thing. And so I think that's not going to emerge. I want to talk about a lemon in a second, but on scaling, if it turns out that scaling plateaus before we reach human level intelligence, looking back on it, what would be your explanation? What do you think is likely to be the case if that turns out to be the outcome? Yeah. So I guess I would distinguish some problems. with the fundamental theory with some practical issue. So one practical issue we could have is we could run out of data. For various reasons, I think that's not going to happen. But, you know, if you look at it
Starting point is 00:06:26 very, very naively, we're not that far from right now of data. And so it's like we just don't have the data to continue the scaling curves. I think, you know, another way it could happen is like, oh, we just, we just use up all of our compute that was available and that wasn't enough. And then progress is slow after that. I wouldn't bet on either of those things happening, but they could. I think from a fundamental perspective, I personally, I think it's very unlikely that the scaling laws will just stop. If they do, another reason, again, this isn't fully fundamental, could just be we don't have quite the right architecture. Like if we tried to do it with an LSDM or an R&N, the slope would be different. I still might be that we get there.
Starting point is 00:07:07 But I think there are some things that are just very hard to represent when you don't have this ability to attend far in the past that Transformers. have. If somehow, and I don't know how we would know this, it kind of wasn't about the architecture and we just hit a wall, I think I'd be very surprised by that. I think we're already at the point where the things the models can't do don't seem to me to be different in kind from the things they can do. And it just, you know, you could have made a case a few years ago that it was like they can't reason, they can't program, like you could have, you could have drawn boundaries and said, well, maybe you'll hit a wall. I didn't think that. I didn't think we would hit a wall. A few other people didn't think we would hit a wall, but it was a more plausible case that.
Starting point is 00:07:51 I think it's a less plausible case now. Now, it could happen. Like, this stuff is crazy. Like, it could, it could happen tomorrow that it's just like we hit a wall. I think if that happens, I'm trying to think of like what's my, what would really be my, it's unlikely, but what would really be my explanation. I think my explanation would be there's something wrong with the loss when you train on next word prediction. Like some of the remaining like reasoning abilities or something like that, like if you really want to learn, you know, it's a program at a really high level. Like it means you care about some tokens much more than others.
Starting point is 00:08:26 And they're rare enough that it's like the loss function over focuses on kind of the appearance, the things that are responsible for the most bits of entropy. And instead, you know, they don't focus on this stuff that's really essential. And so you could kind of have the signal drowned out in the noise. I don't think it's going to play out that way for a number of reasons. But if you told me, yep, you trained your 2024 model. It was much bigger and it just wasn't any better. And you tried every architecture and didn't work.
Starting point is 00:08:56 I think that's the explanation I would reach for. Is there a candidate for another loss function if you had to abandon next token prediction? I think then you would have to go for some kind of RL. And again, there's, you know, there's many different kinds. There's RL from human feedback. There's RL against an objective. There's things like constitutional AI. There's things like amplification and debate, right?
Starting point is 00:09:15 These are kind of both alignment methods and ways of training models. You would have to try a bunch of things, but the focus would have to be on what do we actually care about the model doing, right? In a sense, we're a little bit lucky that it's like predict the next word gets us all these other things we need. Right. There's no guarantee. It seems like from your worldview, there's a multitude of different loss functions that
Starting point is 00:09:36 it's just a matter of what can allow you to just throw a whole bunch of data at it, like the next token prediction itself is not significant. Yeah. Well, I mean, I guess the thing with RL is you get slowed down a bit because it's like, you know, you have to by some method kind of, you know, design how the loss function works. Nice thing with the next token prediction is it's there for you, right? It's just there. It's the easiest thing in the world.
Starting point is 00:09:58 And so I think it would slow you down if you couldn't scale and just that very simplestly. You mentioned that the data is likely not to be the constrained. Why do you think that is the case? There's various possibilities here. and, you know, for a number of reasons, I shouldn't go into the details. But, you know, like, there's many sources of data in the world, and there's many ways that you can also generate data. My guess is that this will not be a blocker.
Starting point is 00:10:20 Maybe it would be better if it was, but it won't be. Are you talking about multimodal or? There's just many different ways to do it. How did you form your views on scaling? How far back can we go and then you would be basically saying something similar to this? This view that I have probably formed gradually from, I would say, like, 2014 to 2017. So I think my first experience with it was my first experience with AI. So I, you know, I saw some of the early stuff around Alex NED in 2012. Always kind of had
Starting point is 00:10:50 wanted to study intelligence, but I, you know, before I was just like, this isn't really working. Like, it doesn't seem like it's actually working. You know, all the way back to like, you know, 2005, I'd like, you know, I'd read Ray Kurzweil's work. You know, I'd read even, even some of, like, L.A.zer's work on the early, on the early internet back then. And I was, you know, was like, oh, this stuff kind of looks far away. Like, I look at the AI stuff of today, and it's like not anywhere, not anywhere close. But with Alexin, I was like, oh, this stuff is actually starting to work. So I joined Andrew Ing's group initially at Bidu.
Starting point is 00:11:22 And the first task, you know, that I got set to do, right? It was my, you know, I'd been in a different field. And so I first joined, you know, this was my first experience with AI. And it was a bit different from a lot of the kind of academic style research that was going on kind of elsewhere in the world, right? I think I kind of got lucky in that the task that was given to me and the other folks there was just make the best speech recognition system that you can. And there was a lot of data available. There were a lot of GPUs available. So it kind of posed the problem in a way that was amenable to discovering that kind of scaling was a solution,
Starting point is 00:12:01 right? That's very different from like you're a postdoc and it's your job to come up with, you know, what's the best like, you know, what's, what's an idea that seems clever and new and makes your mark as someone who's invented something. And, and so I just quickly discovered that, like, you know, I was just, just tried the simplest experiments. I was like, you know, just fiddling with some dials. I was like, okay, try, you know, try adding more layers to the, literally add more layers to the RNN, you know, try training it for longer. What happens? How long does it take to overfit? What if I add new data and repeat it less times? And like, I just saw these like very consistent patterns.
Starting point is 00:12:39 I didn't really know that this was unusual or that others weren't thinking in this way. This was just kind of like almost like beginner's luck. It was my first experience with it. And I didn't really think about it beyond speech recognition, right? You know, I was just kind of like, oh, this is, you know, I don't know anything about this field. There are zillions of things people do with machine learning. But like, I'm like, weird. It seems to be true in the speech recognition.
Starting point is 00:13:03 field. And then I think it was recently, you know, like just before Open AI started that I met Ilya, who you interviewed, one of the first things he said to me was, look, the models, they just want to learn. You have to understand this. The models, they just want to learn. And it was a bit like a Zen Cohen. Like, I kind of like, I listened to this and I became enlightened. And, you know, over the years after this, you know, again, I would be kind of, you know, the one who would formalize a lot of these things and kind of put them together. But like, just kind of the, what that told me is that that phenomenon that I'd seen wasn't just some random thing that I'd seen. It was like, it was broad. It was, it was more general, right? The models, the models just want to learn.
Starting point is 00:13:50 You get the obstacles out of their way, right? You give them, you give them good data. You, you give them enough space to operate in. You, you, Don't do something stupid, like condition them badly numerically. And they want to learn. They'll do it. They'll do it. You know what I find really interesting about what you said is there are many people who were aware back at that time probably weren't working on it directly.
Starting point is 00:14:13 But we're aware that these things are really good at speech recognition or at playing these constrained games. Very few extrapolated from there like you and Ilya did to something that is generally intelligent. What was different about the way you were thinking about it versus how others think that you went from like, is getting better at speech in this consistent way? It will get better at everything in this consistent way. Yeah. So I genuinely don't know. I mean, at first, when I saw it for speech, I assumed this was just true for speech or for this narrow class of models. I think it was just over the period between 2014 and 2017, I tried it for a lot of things and
Starting point is 00:14:51 saw the same thing over and over again. I watched the same being true with Dota. I watched the same being true with robotics, which many people thought of as a counter example, but I just thought what's hard to get data for robotics, but if we operate within, if we look within the data that we have, we see the same patterns. And so I don't know. I think people were very focused on solving the problem in front of them. Why one person thinks one way, another person, it's very, it's very hard to explain. I think people just see it through a different lens, you know, are looking like vertically instead of horizontally. They're not thinking about the scaling. They're thinking about how do I solve my problem. And well, for robotics, there's not enough data.
Starting point is 00:15:31 And so, you know, and so, you know, that can easily abstract. Well, scaling doesn't work because we don't have the data. And so I don't, I don't know. I just, for some reason, and it may just have been random chance was obsessed with that particular direction. When did it become obvious? to you that language is the means to just feed a bunch of data into these things that or was it just you ran out of other things like robotics there's not enough data this other thing there's not enough data yeah i mean i think this whole idea of like the next word prediction that you could do self-supervised learning you know that together with the idea that it's like wow for predicting the next word there's so much richness and structure there right you know it might say two plus two
Starting point is 00:16:13 equals and you have to know the answer is four and you know it might be telling the story about a character and then basically it's posing to the model, you know, the equivalent of these developmental tests that get posed to children. You know, Mary walks into the room and, you know, puts an item in there and then, you know, Chuck walks into the room and removes the item and Mary doesn't see it. What does Mary think happen? You know, so like, so the models are going to have to, to get this right in the service of predicting the next word, they're going to have to solve, you know, solve all these theory of mind problems, solve all these math problems. And so I, you know, my thinking was just, well, you know, scale it up.
Starting point is 00:16:47 much as you can. You, you know, there's there's kind of no limit to it. And I think I kind of had abstractly that view. But the thing, of course, that like really solidified and convinced me was the work that Alec Radford did on GPD one, which was not only could you get this, this language model that could predict things very well, but also you could fine tune it. You needed to fine tune it in those days to do all these other tasks. And so I was like, wow, you know, this isn't just some narrow thing where you get the language model right. it's sort of halfway to everywhere, right? It's like, you know, you get the language model right.
Starting point is 00:17:22 And then with a little move in this direction, it can, you know, it can solve this, this, you know, logical dereference test or whatever. And, you know, with this other thing, you know, it can solve translation or something. And then you're like, wow, I think there's really something to do it. And, of course, we can really scale it. Well, one thing that's confusing or that would have been hard to see, if you told me in 2018, we'll have models in 2023, like, like what law two, that can write theorems in the style of Shakespeare, whatever theory you want, you want.
Starting point is 00:17:51 They can A-standardized test with open-ended questions, you know, just all kinds of really impressive things. You would have said at that time, I would have said, oh, you have AGI. You clearly have something that is a human-level intelligence. Where while these things are impressive, it clearly seems we're not at human level, at least in the current generation and potentially for generations to come. What explains discrepancy between super impressive performance in these benchmarks and in just like the things you could describe versus, yeah, generally. So that that was one area where actually I was not press scenes and I was surprised as well.
Starting point is 00:18:26 Yeah. So when I first looked at GPT3 and, you know, more so the kind of things that we built in the early days at Anthropic, my general sense was I, you know, I looked at these and I'm like, it seems like they really grasped the essence of language. I'm not sure how much we need to scale them up. Like maybe we maybe what's what's more needed from here is like RL and all and kind of all the other stuff. Like we might be kind of near the, you know, I thought in 2020 like we can scale this a bunch more, but I wonder if it's more efficient to scale it more or to start adding on these other objectives like RL. I thought maybe if you do as much RL as you've done pre-training for a, for a, you know, 2020 style model that that's that's the way to go and scaling it up will keep working but you know is that is that
Starting point is 00:19:15 really the best path and I think it I don't know it just keeps going like I thought it had understood a lot of the essence of language but then you know there's there's kind of there's kind of further to go and and so I don't know stepping back from it like one of the reasons why I'm sort of very impurecist about about AI, about safety, about organizations is that you often get surprised, right? You know, I feel like I've been right about some things, but I've still, you know, with these theoretical pictures ahead, been wrong about most things. Being right about 10% of the stuff is, you know, sets you head and shoulders above, above many people.
Starting point is 00:19:57 You know, if you look back to, I can't remember who it was, kind of, you know, made these diagrams that are like, you know, here's the village ill idiot. Here's Einstein. Here's the scale of intelligence, right? And the village idiot and Einstein are like very close to each other. Like that, maybe that's still true in some abstract sense or something, but it's not really what we're seeing, is it? We're seeing like that it seems like the human range is pretty broad and doesn't, we don't hit the human range in the same place or at the same time for different tasks, right? Like, you know, like write a sonnet, you know, in the style of Cormick McCarthy or something, like, I don't know, I'm not very creative, so I couldn't do that. But like,
Starting point is 00:20:39 you know, that's a pretty high level human skill, right? And even the model is starting to get good at stuff of, you know, like constrained writing. You know, there's like write a, you know, write a page without using the letter E or something. I'm ready to page about X without using the letter E. Like, I think the models might be like superhuman or close to superhuman at that. But when it comes to, you know, I don't know, prove relatively simple mathematical. theorems, like they're just starting to do the beginning of it. They make really dumb mistakes sometimes. And they really lack any kind of broad, like, you know, correcting your errors or doing some extended task. And so, I don't know, it turns out that intelligence isn't, isn't a spectrum.
Starting point is 00:21:25 There are a bunch of different areas of domain expertise. There are a bunch of different, like, kinds of skills, like memories different. I mean, it's all, it's all formed in the blob. It's all formed in the blob. It's not complicated. But to the extent it even is on the spectrum, the spectrum is also wide. If you asked me 10 years ago, that's not what I would have expected at all. But I think that's very much the way it's turned out. Oh, man.
Starting point is 00:21:48 I have so many questions just as follow up on that. One is, do you expect that given the distribution of training that these models get from massive amounts of Internet data versus what humans got from evolution, that the repertoire of skills that elicits will be. just barely overlapping. It will be like concentric circles. How do you think about, do those matter? Clearly, there's a large amount of overlap, right? Because a lot of the things, you know, like these models have have business applications and many of their business applications are doing things that, you know, are helping humans to be more effective at things. So the overlap is quite, is quite large.
Starting point is 00:22:28 And, you know, if you think of all the activity that humans put on the internet in tax, that covers a lot of it. But it probably doesn't cover some things. Like the models, I think they do learn a physical model of the world to some extent, but they certainly don't learn how to actually move around in the world. Again, maybe that's easy to fine tune. But I, you know, I think, so I think there are some things that the models don't learn that humans do. And then I think, you know, the models learn, for example, to speak fluent base 64. I don't know about you, but I never learned that.
Starting point is 00:22:58 Right. How likely do you think it is that these models will be superhuman? for many years at economically valuable tasks, while they are still below humans in many other relevant tasks that prevents an intelligence explosion or something. I think this kind of stuff is like really hard to know. So I'll give that caveat that like, you know, again,
Starting point is 00:23:20 like the basic scaling laws you can kind of predict. And then like this more granular stuff, which we really want to know to know how this all is going to go is much harder to know. But my guess would be the scaling laws are going to continue, you know, again, subject to, you know, do people slow down for safety or for regulatory reasons. But, you know, let's just, let's just put all that aside and say, like, we have the economic capability to keep scaling. If we did that, what would happen? And I think my view is we're
Starting point is 00:23:49 going to keep getting better across the board. And I don't see any area where the models are, like, super, super weak or not starting to make progress. Like, that used to be true of, like, math and programming, but I think over the last six months, you know, the, the 2023 generation of models compared to the 2022 generation had started to learn that. There may be more subtle things we don't know. And so I kind of suspect, even if it isn't quite even, that the rising tide will lift all the boats. Does that include the thing you were mentioning earlier where if there's an extended task, it kind of loses this train of thought or its ability to just like execute a series of steps? I think that that's going to depend on things like RL training to have the model do longer horizon tasks.
Starting point is 00:24:34 I don't expect that to require a substantial amount of additional compute. I think that that was probably an artifact of, yeah, kind of thinking about RL in the wrong way and underestimating how much the model had learned on its own. In terms of, you know, are we going to be superhuman in some areas and not others, I think it's complicated. I could imagine that we won't be superhuman in some areas because, for example, they involve, like, embodiment in the physical world. And then it's like, what happens? Like, do the AIs help us train faster AIs and those faster AIs wrap around and solve that? Do you not need the physical world? It depends what you mean. Are we worried about an alignment
Starting point is 00:25:16 disaster? Are we worried about misuse, like making weapons of mass destruction? Are we worried about the AI taking over research from humans? Are we worried about it reaching some threshold of economic productivity where it can do what the average? These different thresholds, I think, have different answers, although I suspect they will all come within a few years. Let me ask about those thresholds. So if Claude was an employee at Anthropic, what salary would it be worth? Is it like meaningfully speeding up AI progress? It feels to me like an intern in most areas, but then some specific areas where it's better than that. Again, I think one thing that makes the comparison hard is like the form factor is kind of like not the same as a human, right?
Starting point is 00:26:01 Like a human, like, you know, if you were to behave like one of these chat bots, like we wouldn't really, I mean, I guess we could have this conversation. It's like, but, you know, they're not really, they're more designed to answer single or a few questions, right? And like, you know, they don't have the concept of having a long life of prior experience, right? We're talking here about, you know, things that that I've experienced in the past, right? And chatbots don't have that. And so there's all kinds of stuff missing. And so it's hard to make a comparison. But I don't know.
Starting point is 00:26:33 They feel like interns in some areas and kind of then they have areas where they spike and are really savants where they may be better than they may be better than anyone here. But does the overall picture of something like an intelligence explosion? You know, my former guest is Carl Schoeman. and he has this very detailed model of an intel. Does that, as somebody who would actually, like, see that happening, does that make sense to you as they go from interns to entry-level software engineers, those entry-level software engineers increase your productivity? I think the idea that the AI systems become more productive, and first they speed up
Starting point is 00:27:09 the productivity of humans, then they, you know, kind of equal the productivity of humans. And, you know, and then they're in some meaningful sense, the main contributor to scientific progress, that that happens at some point. I think that that basic logic seems likely to me, although I have a suspicion that when we actually go into the details, it's going to be kind of like weird and different than we expect, that all the detailed models are kind of, you know, we're thinking about the wrong things or we're right about one thing and then are wrong about 10 other things. And so I don't know. I think we might end up in like a weirder world than we expect.
Starting point is 00:27:50 Where do you add all this together? Like your estimate of when we get something kind of human level. Yeah. What does that look like? I mean, again, it depends on the thresholds. Yeah. You know, in terms of someone looks at these, the model and, you know, even if you talk to it for, you know, for an hour or so, it's basically, you know, it's basically like a generally well educated human. Yeah.
Starting point is 00:28:16 that could be not very far away at all, I think. Like that could happen in, you know, two or three years. Like, you know, if I look at, again, like I think the main thing that would stop it would be if we hit certain, certain, you know, and we have internal tests for, you know, safety thresholds and stuff like that. So if a company or the industry decide to slow down or, you know, we're able to get the government Institute restrictions that kind of, you know, that moderate the rate of progress for safety reasons, that would be the main reason it wouldn't happen. But if you just look at the logistical and economic ability to scale, I don't think we're
Starting point is 00:28:55 very far at all from that. Now, that may not be the threshold where the models are existentially dangerous. In fact, I suspect it's not quite there yet. It may not be the threshold where the models can take over most AI research. It may not be the threshold where the models, you know, seriously change how the economy works. I think it gets a little murky after that and all those thresholds may happen at various times after that.
Starting point is 00:29:20 But I think I think in terms of the base technical capability of it kind of sounds like a reasonably generally educated human across the board, I think that could be quite close. Why would it be the case that it could be sound like, you know, pass a touring test for an educated person, but not be able to contribute or substitute for a human. involvement in the economy. A couple reasons. One is just, you know, that the threshold of skill isn't high enough, right? Comparative advantage.
Starting point is 00:29:49 It's like it like doesn't matter that, you know, I have someone who's better than the average human at every task. Like what I really need is like for AI research, like, you know, I need what, you know, I need to basically find something that is, is strong enough to substantially accelerate, you know, the like labor of the thousand experts who are best. at it. And so we might reach a point where we, you know, the comparative advantage of these systems is not, is not great. Another thing that could be the case is that I think there are these kind of mysterious frictions that like, you know, kind of don't show up in naive economic models,
Starting point is 00:30:28 but you see it whenever you're like, you know, when you go to a customer or something and you're like, hey, I have this cool chatbot. In principle, it can do everything that, you know, your customer service bot does or that this part of your company does. But like, The actual friction of like, how do we slot it in? How do we make it work? That includes both kind of like, you know, just the question of how it works in a human sense within the company. Like, you know, how things happen in the economy and overcome frictions. And also just like what is the workflow?
Starting point is 00:31:00 How do you actually interact with it? It's very different to say, here's a chatbot that kind of looks like it's doing this task that you're, you know, or helping the human to do, to do. some task, as it is to say like, okay, this thing is, this thing is deployed and 100,000 people are using it. Often, like right now, lots of folks are rushing to deploy these systems, but I think in many cases, they're not using them in anywhere close to the most efficient way that they could, you know, not because they're not smart, but because it takes time to work these things out. And so I think when things are changing this fast, they're going to be all of these frictions.
Starting point is 00:31:37 Yeah. And I think, again, these are messy reality that doesn't quite get captured in the model. I don't think it changes the basic picture. Like, I don't think it changes the idea that we're building up this snowball of, like, the models help the models get better and, you know, do what the humans and, you know, can accelerate what the humans do. And eventually, it's mostly the models doing the work. Like, you zoom out far enough, that's happening.
Starting point is 00:31:59 But I'm kind of skeptical of kind of any kind of precise mathematical or exponential prediction of it's going to be. I think it's I think it's all going to be a mess, but I think what we know is it's not a metaphorical exponential and it's going to happen fast. How do those different exponentials net out, which we've been talking about? So one was the scaling laws themselves are power laws with decaying marginal, you know, loss per, you know, parameter or something. The other exponential you talked about is, well, these things can get involved in the process of AI research itself, speeding it up. So those two are sort of opposing exponentials. Does it net out to be super linear or sublinear? And also you mentioned, well, the distribution of intelligence might just be broader.
Starting point is 00:32:46 So it should we expect after we get to this point in two to three years, it's like, womb, boom, like what does that look like? It's, I mean, I think it's very unclear, right? So we're already at the point where if you look at the loss, the scaling laws are starting to bend. I mean, we've seen that in, you know, published model cards offered by multiple companies. So that's not a secret at all. But as they start to bend, each little bit of entropy, right, of accurate prediction becomes more important, right? Maybe these last little bits, bits of entropy are like, well, you know, this is a physics paper as Einstein would have written it as opposed to, you know, as some other physicist would have would have written it. And so it's hard to assess significance
Starting point is 00:33:26 from this. It certainly looks like in terms of practical performance, the metrics keep going up relatively linearly, although they're always unpredictable. So it's hard to see that. And then, I mean, the thing that I think is driving the most acceleration is just more and more money is going into the field. Like people are seeing that there's just a huge amount of, you know, of economic value. And so I expect the price, the amount of money spent on the largest models to go up by like a factor of 100 or something. And for that then to be concatenated with the chips are getting faster, the algorithms are getting better because there's so many people working on this now. And so again, I mean, I'm not making a normative statement here. This is what should happen.
Starting point is 00:34:11 I'm not even saying this necessarily will happen because I think there's important safety and government questions here, which we're very actively working on. I'm just saying like left to itself, this is what the economy is going to do. We'll get to those questions in a second. But how do you think about the contribution of Anthropic to that increase? in the scope of this industry where, I mean, there's an argument we make that, listen, with that investment, we can work on safety stuff at Anthropic, another that says you're raising the salience of this field in general. Yeah, I mean, it's all, it's all costs and benefits, right? The costs are not zero, right? So I think a mature way to think about these things is, you know, not not to deny that
Starting point is 00:34:51 there are only costs, but to think about what the costs are and what the benefits are. You know, I think we've been relatively responsible in the sense that, you know, the big acceleration that happened late last year and beginning of this year, like, we didn't cause that. We weren't, we weren't the ones who did that. And honestly, I think if you look at the reaction to Google, that that might be 10 times more important than anything else. And then kind of once it had happened, once the ecosystem had changed, then we did a lot of things to kind of stay on the frontier. And so, I don't know, it's, I mean, it's like any other question, right? It's like, you're trying to, you're trying to do the things that have the biggest costs and the, that have the
Starting point is 00:35:27 lowest costs and the biggest benefits. And, you know, that that causes you to have different strategies at different times. One question I had for you while we're talking about the intelligent stuff was, listen, as a scientist yourself, is it, what do you make of the fact that these things have basically the entire corpus of human knowledge memorized? And as far as I'm aware, they haven't been able to make like a single new connection that has led to a discovery. Whereas if even a moderately intelligent person had this much stuff memorized, they'd notice, oh, this thing causes this symptom, this other thing also causes this symptom, you know, there's a medical cure right here, right? Shouldn't we be expecting that kind of stuff? I'm not, I'm not sure.
Starting point is 00:36:04 I mean, I think, you know, I don't know, these words, discovery, creativity, like it's one of the lessons I've learned is that in, you know, in kind of the big blob of compute, often these, these ideas often end up being kind of fuzzy and elusive and hard to track down. But I think, I think there is something here, which is, I think the models do display a kind of ordinary creativity. Again, again, you know, the kind of like, you know, write a sonnet, you know, in the style of Cormick McCarthy or Barbie or so, you know, like there is some creativity to that. And I think they do draw, you know, new connections of the kind that an ordinary person would draw. I agree with you that there haven't been any kind of like, I don't know, like, I would say like big scientific
Starting point is 00:36:46 discoveries. I think that's a mix of like just the model skill level is not, is not high enough yet, right? Like I was on a podcast last week where the host said, I don't know, I played with these models. They're kind of mid, right? Like they get, you know, they get a B or a B minus or something. And that, that I think is going to change with the scaling. I do think there's an interesting point about, well, the models have an advantage, which is they know a lot more than us. You know, like should they have an advantage already, even if their skill level isn't quite high? Maybe that's kind of what you're getting at. I don't really have an answer to that. I mean, it seems certainly like memorization and facts and drawing connections is an area where the models
Starting point is 00:37:28 are ahead. And I do think maybe you need those connections and you need a fairly high level of skill. I do think, particularly in the area of biology, for better and for worse, the complexity of biology is such that the current models know a lot of things right now. And that's what you need to make discoveries and draw. It's not like physics where you need to, you know, you need to think can come up with a formula. And biology, you need to know a lot of things. Right. And so I do think the models know a lot of things and they have a skill level that's not quite high enough to put them together. And I think they are they are just on the cost of being able to put these things together. On that point, last week in your Senate testimony, you said that these models are two to three years
Starting point is 00:38:10 away from potentially enabling large scale bioterrorism attacks or something like that. Can you make that more concrete without obviously giving the kind of information that would But is it like one-shotting how to weaponize something? Is it, or do you have to fine-tune an open-source model? Like, what would that actually look like? I think it would be good to clarify this because we did a blog post in the Senate testimony. And like, I think various people kind of didn't understand the point or didn't, didn't understand what we've done. So I think today and, you know, of course, in our models, we try and, you know, prevent this, but there's always jail breaks.
Starting point is 00:38:41 You can ask the models, all kinds of things about biology and get them to say all kinds of scary things. Yeah. But often those scary things are things that you could Google, and I'm therefore not particularly worried about that. I think it's actually an impediment to seeing the real danger, where, you know, someone just says, oh, I asked this model like, you know, for the smallpox, you know, to tell me some things about smallpox and it will. That is actually, you know, kind of not what I'm worried about.
Starting point is 00:39:08 So we spent about six months working with some of, basically some of the folks who are the most expert in the world on how to, how to buy a lot. attacks happen, you know, what would you need to conduct such an attack and how do we defend against such an attack? They worked very intensively on just the entire workflow of, if I were trying to do a bad thing, it's not one shot, it's a long process, there are many steps to it, it's not just like I asked the model for this one page of information. And again, without going into any detail, the thing I said in the Senate testimony is like, there's some steps where you can just get information on Google. There are some steps that are what I'd call missing. They're scattered
Starting point is 00:39:51 across a bunch of textbooks. Or they're not in any textbook. They're kind of implicit knowledge. And they're not really like, they're not explicit knowledge. They're more like, I have to do this lab protocol and like, what if I get it wrong? Oh, if this happens, then my temperature was too low. If that happened, I needed to add more of this particular reagents. What we found is that for the most part, those missing, those key missing pieces, the models can't do them yet. But we found that sometimes they can. And when they can, sometimes they still hallucinate, which is a thing that's that's kind of keeping us safe. But we saw enough signs of the models doing those, those key things well. And if we look at, you know, state of the art models and go backwards to previous
Starting point is 00:40:40 models, we look at the trend, it shows every sign of two or three years from now, we're going to have a real problem. Yeah, especially the thing you mentioned on the log scale, you go from like one in hundred times it gets a riot to one in ten to exactly. So, you know, I've seen many of these like grocs in my life, right? I was there when I watched when GPT3 learned to do arithmetic, when GPT2 learned to do regression a little bit above chance, when, you know, when we got, you know, with claw and we got better on like, you know, all these, all these tests of helpful, honest, harmless. I've seen a lot of groks. This is, this is unfortunately not one that I'm excited about, but I believe it's happening. So somebody that might say, listen, you were a co-author on this post
Starting point is 00:41:22 that Open AI released, where they said, you know, we're not going to release the weights or the details here because we're worried that this model will be used for something, you know, bad. And looking back on it, now it's laughable to think that GPT2 could have done anything bad. Are we just like way too worried? This is a concern that doesn't make sense for it. It is interesting. It might be worth looking back at the actual text of that post. So I don't remember it exactly, but it should it, you know, it's still up on the internet. It says something like, you know, we're choosing not to release the weights because of concerns about misuse, but it also said, this is an experiment. We're not sure if this is necessary or the right thing to do at this time. But we'd like to establish a norm of thinking carefully about these things. You know, you could think of it a little like the, you know, the Cillamer conference in the in the 1970s, right? Where it's like, you know, they were just figuring out recombinant DNA. You know, it was not necessarily the case that someone could do something really bad with recombinant DNA. It's just the possibilities we're starting to
Starting point is 00:42:27 become clear. Those words, at least, were the right attitude. Now, I think there's a separate thing that, like, you know, people don't just judge the post. They judge the organization. Is an organization that, you know, is, produces a lot of hype or that has credibility or something like that. And so I think that had some effect on it. I guess you could also ask, like, is it inevitable that people would just interpret it as like, you know, you can't get across any message more complicated than this thing right here is dangerous. So you can argue about those. But I think the basic thing that was in my head and the head, the head of others who were who are involved in that. And, you know, I think what is what is evident in the post is like,
Starting point is 00:43:09 we actually don't know. We have pretty wide error of ours on what's dangerous and what's not. So we should, you know, like we want to establish a norm of being careful. I think, by the way, we have enormously more evidence. We've seen enormously more of these grocs now. And so we're well calibrated, but there's still uncertainty. Right. In all these statements, I've said, like, in two or three years, we might be there. Right. There's a substantial risk of it. And we don't want to take that risk. But, you know, I wouldn't say it's 100%. It could be 50-50. Okay, let's talk about cybersecurity, which in addition to biarisk is another thing Anthropica has been emphasizing. How have you avoided the cloud microarchitecture from leaking? Because as you know, your competitors
Starting point is 00:43:47 have been less successful at this kind of security. Can't comment on anyone else's security. Don't know what's going on in there. A thing that we have done is, you know, so there are there are these architectural innovations, right, that make training more efficient. We call them, compute multipliers because they're the equivalent of improving, improving, you know, they're like having more compute. Our compute multipliers, again, I don't want to say too much about it because it could allow an adversary to counteract our measures, but we limit the number of people who are aware of a given compute multiplier to those who need to know about it.
Starting point is 00:44:25 And so there's a very small number of people who could leak all of these secrets. There's a larger number of people who could leak one of them. But, you know, this is the standard compartmentalization strategy that's used in the intelligence community or, you know, resistant cells or whatever. So, you know, we've over the last few months, we've implemented these measures. So, you know, I don't want to jinx anything by saying, oh, this could never happen to us. But I think it would be harder for it to happen. I don't want to go into any more detail. And, you know, by the way, I'd encourage all the other companies to do this as well.
Starting point is 00:44:59 as much as like competitors' architectures leaking is narrowly helpful to Anthropic. It's not good for anyone in the long run, right? So security around this stuff is really important. Even with all the security you have, could you, with your current security, prevent a dedicated state level actor from getting the claw two weights? It depends how dedicated is what I would say. Our head of security who used to work on security for Chrome, which, you know, very widely used and attacked.
Starting point is 00:45:29 application. He likes to think about it in terms of how much would it cost to attack anthropic successfully. Again, I don't want to go into super detail of how much I think it will cost to attack and it's kind of inviting people. But like one of our goals is that it costs more to attack anthropic than it costs to just train your own model, which doesn't guarantee things because, you know, of course, you need the talent as well. So you might still. But, you know, but attacks have risk, the diplomatic costs. And, you know, and they use up the very, the very, the very sparse resources that nation state actors might have in order to do to do the attacks. So we're not there yet, by the way, but I think we're to a very high standard compared to
Starting point is 00:46:12 the size of company that we are. Like, I think if you look at security for most 150 person companies, like, I think there's just no comparison. But, you know, could we resist if it was a state actor's top priority to steal our model weights? No, they would succeed. long does that stay true because at some point the value keeps increasing and increasing. And another part of this question is that what kind of a secret is how to train Cloud 3 or Cloud 2? Is it, you know, with nuclear weapons, for example, we have lots of spies. You just take a blueprint across and that's the implosion device and that's what you need. Here is it just, is it more tacit like the thing you're talking about biology, you need to know how these reagents work? Is it just
Starting point is 00:46:56 like you got the blueprint, you got the microarchitecture and the Harper Parameters? I mean, there are some things that are like, you know, a one-line equation, and there are other things that are more complicated. Yeah. And I think compartmentalization is the best way to do it. Just limit the number of people who know about something. If you're a thousand-person company and everyone knows every secret, like, one, I guarantee you have some, you have a leaker. And two, I guarantee you have a spy, like a literal spy. Okay, let's talk about alignment. And let's talk about mechanistic interpability, which is the branch of which you, you guys specialize in. While you're answering this question, you might want to explain what mechanistic interpability is.
Starting point is 00:47:30 But just the broader question is mechanistically, what is alignment? Is it that you're locking in the model into a benevolent character? Are you disabling, deceptive circuits and procedures? Like what concretely is happening when you align a model? I think as with most things, you know, when we actually train a model to be aligned, we don't know what happens inside the model, right? There are different ways of training it to be aligned. but I think we don't really know what happens.
Starting point is 00:47:58 I mean, I think for some of the current methods, I think all the current methods that involve some kind of fine-tuning, of course, have the property that the underlying knowledge and abilities that we might be worried about don't disappear. It's just, you know, the model is just taught not to output them. I don't know if that's a fatal flaw or if, you know, or if that's just the way things have to be. I don't know what's going on inside mechanistically,
Starting point is 00:48:21 and I think that's the whole point of mechanistic interpretability, to really understand what's going on inside the models at the level of individual circuits. Eventually, when it's solved, what does a solution look like? What is it the case where if you're clawed for, you do the mechanistic interoperally thing and you're like, I'm satisfied. It's a line. What is it that you've seen? Yeah. So I think we don't know that yet.
Starting point is 00:48:44 I think we don't know enough to know that yet. I mean, I can give you a sketch for like what the process looks like as opposed to what the final result looks like. So I think verifiability is a lot of the challenge here, right? We have all these methods that purport to align AI systems and do succeed at doing so for today's tasks. But then the question is always if you had a more powerful model or if you had a model in a different situation, would it be aligned? And so I think this problem would be much easier if you had an Oracle that could just scan a model and say like, okay, I know this model is aligned. I know what it will do in every situation. then the problem would be much easier.
Starting point is 00:49:25 And I think the closest thing we have to that is something like mechanistic interpretability. It's not anywhere near up to the task yet. But I guess I would say, I think of it as almost like an extended training set and an extended test set, right? Everything we're doing, all the alignment methods we're doing are the training set, right? You know, you can run tests on them, but will it really work out of distribution? Will it really work in another situation? Mechanistic interpretability is the only thing that, even in principle, could tell you that. And we're nowhere near there yet.
Starting point is 00:49:54 But even in principle, it's more like an x-ray of the model than a modification of the model, right? It's more like an assessment than an intervention. And so somehow we need to get into a dynamic where we have an extended training set, which is all these alignment methods, and an extended test set, which is kind of like you x-ray the model and say, okay, what worked and what didn't, in a way that goes beyond just the empirical tests that you've run, right?
Starting point is 00:50:24 Where you're saying, what is the model going to do in these situations? What is it within its capabilities to do, instead of what did it do phenomenologically? And of course, we have to be careful about that, right? One of the things I think is very important is we should never train for interpretability, because I think that's taking away that advantage, right? You even have the problem, you know, similar to validation versus test sets, where if you look at the x-ray too many times, you can interfere.
Starting point is 00:50:56 But I think that's a much weaker process. We should worry about it, but it's not automated optimization. We should just make sure, as with validation and test sets, that we don't look at the validation set too many times before running the test set. But, you know, that's, again, manual pressure rather than automated pressure. And so some solution where we have some dynamic between the training and test set, where we're trying things out and we really figure out if they work via a way of testing them that the model isn't optimizing against in some orthogonal way.
Starting point is 00:51:31 I think we're never going to have a guarantee, but we need some process where we do those things together. Again, not in a stupid way, and there are lots of stupid ways to do this where you fool yourself, but some way to put extended training for alignment together with extended testing for alignment in a way that actually works. I still don't feel like I understand the intuition for why you think this is likely to work, or why this is promising to pursue. And let me ask the question in a sort of more specific way, and excuse the tortured analogy. But listen, if you're an economist and you want to understand the economy.
Starting point is 00:52:08 So you send a whole bunch of microeconomists out there. And one of them studies how the restaurant business works. One of them studies how the tourism business works. You know, one of them studies how banking works. And at the end they all come together, and you still don't know whether there's going to be a recession in five years or not. Why is this not like that, where you have an understanding of, we understand how induction heads work in a two-layer transformer,
Starting point is 00:52:29 we understand, you know, modular arithmetic. How does this add up to: does this model want to kill us? Like, what does this model fundamentally want? A few things on that. I mean, I think that's the right set of questions to ask. I think what we're hoping for in the end is not that we'll understand every detail. But again, I would give the X-ray or the MRI analogy, that we can be in a position where we can look at the broad features of the
Starting point is 00:52:54 model and say, like, is this a model whose internal state and plans are very different from what it externally represents itself to do, right? Is this a model where we're uncomfortable that far too much of its computational power is devoted to doing what look like fairly destructive and manipulative things? Again, we don't know for sure whether that's possible, but I think there are at least some positive signs that it might be possible. Again, the model is not intentionally hiding from you, right? It might turn out that the training process hides it from you. And I can think of cases where, if the model is really superintelligent, it thinks in a way such that it affects its own cognition. I suspect we should think about that. We should consider
Starting point is 00:53:39 everything. I suspect that it may roughly work to think of the model as, you know, if it's trained in the normal way, just getting to just above human level, it may be, and you should check, a reasonable assumption that the internal structure of the model is not intentionally optimizing against us. And I'd give an analogy to humans. So it's actually possible, you know, to look at an MRI of someone and predict, above random chance, whether they're a psychopath.
Starting point is 00:54:15 There was actually a story a few years back about a neuroscientist who was studying this. And he looked at his own scan and discovered that he was a psychopath. And then everyone in his life was like, no, no, no, that's just obvious, you're a complete asshole, of course you're a psychopath. And he was totally unaware of this. The basic idea is that, you know, there can be these macro features, and the psychopath is probably a good analogy for it. Right.
Starting point is 00:54:41 They're like, you know, this is what we'd be afraid of: a model that's kind of charming on the surface, very goal-oriented, and, you know, very dark on the inside. You know, on the surface, their behavior might look like the behavior of someone else, but their goals are very different. A question somebody might have is, listen, you mentioned earlier the importance of being empirical. Yeah. And in this case, you're trying to estimate, you know, are these activations suss? Yeah. But is this something we can afford to be empirical about, or do we need a very good first-principles theoretical reason to think, no, it's not just that these MRIs of the model correlate with being bad, we need some deep mathematical proof that this is aligned? So it depends what you mean by empirical. I mean, a better term would be phenomenological. I don't think we should be purely phenomenological, like, here are some brain scans of really dangerous models and here are some brain scans of safe ones. I think the whole idea of mechanistic interpretability is to look at the underlying principles and circuits. But I guess the way I think about it is like,
Starting point is 00:55:46 on one hand, I've actually always been a fan of studying these circuits at the lowest level of detail that we possibly can. And the reason for that is, that's how you build up knowledge, even if ultimately there are too many of these features and it's too complicated. At the end of the day, we're trying to build something broad, some broad understanding. I think the way you build that up is by trying to make a lot of these very specific discoveries. Like, you have to understand the building blocks, and then you have to figure out how to use that to draw these broad conclusions, even if you're not going to figure out everything. You know, I think you should probably talk to Chris
Starting point is 00:56:25 Olah, who would have much more detail, right? This is my kind of high-level thinking on it. Like, Chris Olah controls the interpretability agenda. Like, you know, he's the one who decides what to do on interpretability. This is my high-level thinking about it, which is not going to be as good as his. Does the bull case on Anthropic rely on the fact that mechanistic interpretability is helpful for capabilities? I don't think so at all. Now, I do think in principle it's possible that mechanistic interpretability could be helpful with capabilities. We might, for various reasons, not choose to talk about it if that were the case. That, you know, that wasn't something that I thought of, or that any of us thought of, at the time of Anthropic's founding. I mean, we thought of
Starting point is 00:57:08 ourselves as like, you know, we're people who are good at scaling models and good at doing safety on top of those models. And, you know, we think that we have a very high talent density of folks who are good at that. And, you know, my view has always been talent density beats talent mass. And so, you know, that's more of our bull case. Talent density beats talent mass. I don't think it depends on some particular thing. Like, others are starting to do mechanistic interpretability now, and I'm very glad that they are. You know, that was a part of our theory of change: paradoxically, to make other organizations more like us. Talent density, I'm sure, is important.
Starting point is 00:57:45 But another thing Anthropic has emphasized is that you need to have frontier models in order to do safety research. And, of course, to actually be a company as well. The current frontier models, somebody might guess, like GPT-4, cost like $100 million or something like that. That general order of magnitude in very broad terms is not wrong. But, you know, two to three years from now, the kinds of things you're talking about, we're talking about more and more orders of magnitude to keep up with that. And if it's the case that safety requires being on the frontier, I mean, what is the case in which Anthropic is competing with these Leviathans to stay at that same scale? I mean, I think it's a situation
Starting point is 00:58:22 with a lot of tradeoffs, right? I think it's not easy. I guess to go back, maybe I'll just answer the questions one by one, right? So to go back to, you know, why is safety so tied to scale, right? Some people don't think it is. But if I just look at, you know, where have been the areas where, I don't know, safety methods have been put into practice or worked for anything, even if we don't think they'll work in general. You know, I go back to thinking of all the ideas, something like, you know, debate and amplification, right? You know, back in 2018 when we wrote papers about those at OpenAI, I was like, well, human feedback isn't quite going to work, but, you know, debate and amplification will take us beyond that. But then if you actually look at it, and we've, you know, made attempts to do debates,
Starting point is 00:59:11 we're really limited by the quality of the model, where it's like, you know, for two models to have a debate that is coherent enough that a human can judge it so that the training process can actually work, you need models that are at or maybe even beyond on some topics, the current frontier. Now, you can come up with the method. You can come up with the idea without being on the frontier. But, you know, for me, that's a very small fraction of what needs to be done, right? It's very easy to come up with these methods.
Starting point is 00:59:41 It's very easy to come up with, like, oh, the problem is X, maybe a solution is Y. But, you know, I really want to know whether things work in practice, even for the systems we have today. And I want to know what kinds of things go wrong with them. I just feel like you discover 10 new ideas and 10 new ways that things are going to go wrong by trying these in practice. And that empirical learning, I think, is just not as widely understood as it should be. You know, I would say the same thing about methods like Constitutional AI. And some people say, oh, it doesn't matter. Like, we know this method doesn't work.
Starting point is 01:00:15 It won't work for, you know, pure alignment. I neither agree nor disagree with that. I think that's just kind of overconfident. The way we discover new things and understand the structure of what's going to work and what's not is by playing around with things. Not that we should just kind of blindly say, oh, this worked here, and so it'll work there. But you really start to understand the patterns, like with the scaling laws.
Starting point is 01:00:37 Even mechanistic interpretability, which might be the one area I see where a lot of progress has been made without the frontier models. We're seeing in the work that, say, OpenAI put out a couple months ago that, you know, using very powerful models to help you auto-interpret the weak models, again, that's not everything you can do in interpretability, but, you know, that's a big component of it. And we, you know, we found it useful too. And so you see this phenomenon over and over again where it's like, you know, the scaling and the safety are these two snakes that are coiled with each other, always even more than you think, right?
Starting point is 01:01:17 You know, with interpretability, I think three years ago, I didn't think that this would be as true of interpretability. But somehow it manages to be true. Why? Because intelligence is useful. It's useful for a number of tasks. One of the tasks it's useful for is, like, figuring out how to judge and evaluate other intelligence, and maybe someday even for, you know, doing the alignment research itself. Given all that's true, what does that imply for Anthropic when, in two to three years, these Leviathans are doing like $10 billion training runs? Choice one is, if we can't, or if it costs too much, to stay on the frontier, then, you know, then we shouldn't do it. And, you know, we won't work with the most advanced models. We'll see
Starting point is 01:01:56 what we can get with, you know, models that are not quite as advanced. I think you can get some value there, like non-zero value, but I'm kind of skeptical that the value is all that high or that the learning can be fast enough to really be up to the task. The second option is you just find a way. You just, you know, accept the tradeoffs. And I think the tradeoffs are more positive than they appear because of a phenomenon that I've called race to the top. I could go into that later, but let me put that aside for now. And then I think the third phenomenon is, you know, as things get to that scale, I think this may coincide with, you know, starting to get into some non-trivial probability of very serious danger.
Starting point is 01:02:43 Again, I think it's going to come first from misuse, the kind of bio stuff that I talked about, and I don't think we have the level of autonomy yet to worry about some of the, you know, alignment stuff happening in like two years, but it might not be very far behind that at all. You know, that may lead to unilateral or multilateral or government-enforced decisions, which we support, not to scale as fast as we could. That may end up being the right thing to do. So, you know, actually I kind of hope things go in that direction. And then we don't have this hard tradeoff between we're not on the frontier and we can't quite do the research as well as we want or influence other orgs as well as we want,
Starting point is 01:03:27 versus we're kind of on the frontier and have to accept the tradeoffs, which are net positive but have a lot in both directions. Okay, on misuse versus misalignment. Those are both problems, as you mentioned. But in the long scheme of things, what are you more concerned about, like 30 years down the line? Which do you think will be considered a bigger problem? I think it's much less than 30 years. But I'm worried about both. I don't know. If you have a model that could, in theory,
Starting point is 01:03:57 you know, take over the world on its own. If you were able to control that model, then, you know, it follows pretty simply that, if a model was following the wishes of some small subset of people and not others, then those people could use it to take over the world on their behalf. The very premise of misalignment means that we should be worried about misuse as well, with similar levels of consequences. But some people who might be more doomery than you would say that with misuse you're already working towards the optimistic scenario there, because you've at least figured out how to align the model with the bad guys. Now you just need to make sure it's aligned with the good guys instead.
Starting point is 01:04:36 Why do you think that you could get to the point where it's aligned with the bad guys, you know, if you haven't already solved alignment? I guess if you had the view that alignment is completely unsolvable, then, you know, you'd be like, well, we're dead anyway, so I don't want to worry about misuse. That's not my position at all. But also, you should think in terms of what's a plan that would actually succeed, that would make things good. Any plan that actually succeeds, regardless of how hard misalignment is to solve, is going to need to solve misuse as well as misalignment. It's going to have to deal with the fact that, as the AI models get better faster and faster, they're going to create a big problem around the balance of power between countries.
Starting point is 01:05:17 They're going to create a big problem around whether it's possible for a single individual to do something bad that it's hard for everyone else to stop. Any actual solution that leads to a good future needs to solve those problems as well. If your perspective is, we're screwed because we can't solve the first problem, so don't worry about problems two and three, like, that's not really a reason you shouldn't worry about problems two and three, right? They're in our path no matter what. Yeah, in the scenario we succeed.
Starting point is 01:05:43 We have to solve all of them. So yeah, we might as well operate. We should be planning for success, not for failure. If misuse doesn't happen and the right people have the superhuman models, what does that look like? Like, who are the right people? Who is actually controlling the model five years from now? Yeah. I mean, my view is that these things are powerful enough that I think, you know, it's going to involve a substantial role, or at least involvement, of some kind of government or assembly of government bodies. Again, like,
Starting point is 01:06:14 you know, there are kind of very naive versions of this. Like, you know, I don't think we should just, I don't know, hand the model over to the UN or whoever happens to be in office at a given time. Like, I could see that going poorly. But it's too powerful. There needs to be some kind of legitimate process for managing this technology, which, you know, includes the role of the people building it, includes the role of democratically elected authorities, includes the role of all the individuals who will be affected by it. So at the end of the day, there needs to be some politically legitimate process. But what does that look like?
Starting point is 01:06:53 If it's not the case that you just hand it to whoever the president is at the time, what does the body look like? I mean, is it something you're-
Starting point is 01:07:04 people love to kind of propose these broad plans and say like, oh, this is the way we should do it. This is the way we should do it. I think the honest fact is that we're figuring this out as we go along. And that, you know, anyone who says,
Starting point is 01:07:15 kind of body modeled after this thing. Like, I think we should try things and experiment with them with less powerful versions of the technology. We need to figure this out in time, but also it's not really the kind of thing you can know in advance. The Long-Term Benefit Trust that you have, how would that interface with this body? Is that the body itself? If not, is it... Like, just for context, you might want to explain what it is for the audience. But I don't know.
Starting point is 01:07:38 I think that the Long-Term Benefit Trust is a much narrower thing. Like, this is something that makes decisions for Anthropic. It's basically a body, described in a recent Vox article, and we'll be saying more about it later this year, that over time gains the ability to appoint the majority of the board seats of Anthropic. And it's a mixture of experts in, I'd say, AI alignment, national security, and philanthropy in general. But if control of Anthropic is handed to them, that doesn't imply that, if Anthropic has AGI, control of the AGI itself is
Starting point is 01:08:22 handed to them. That doesn't imply that Anthropic or any other entity should be the entity that makes decisions about AGI on behalf of humanity. I would think of those as different. I mean, there's lots of maybes, you know. Like, if Anthropic does play a broad role, then you'd want to widen that body to be, you know, a whole bunch of different people from around the world. Or maybe you construe this as very narrow and then, you know, there's some
Starting point is 01:08:43 broad committee somewhere that manages all the AGIs of all the companies on behalf of everyone. I don't know. I think my view is you shouldn't be sort of overly constructive and utopian. We're dealing with a new problem here. We need to start thinking now about, you know, what are the governmental bodies and structures that could deal with it. Okay.
Starting point is 01:09:06 So let's forget about governance. Let's just talk about what this going well looks like. Obviously, there's the things we can all agree on, you know, cure all the diseases, you know, solve all the problems. Everything, all humans would say, I'm down for that. Yeah. But now it's 2030. You've solved all the real problems that everybody can agree on.
Starting point is 01:09:22 I think I actually want to, I don't know, disagree with the framing or something like this. I actually get nervous when someone says, like, what are you going to do with the superhuman AI? Like, we've learned a lot of things over the last 150 years about markets and democracy, and each person can kind of define for themselves what the best way for them to have the human experience is, and, you know, societies work out norms and what they value in this very complex and decentralized way. Now, again, if you have these safety problems, that can be a reason why,
Starting point is 01:10:00 you know, and especially from the government, there needs to be, maybe until we've solved these problems, a certain amount of centralized control. But as a matter of, we've solved all the problems, now how do we make things good? I think that most people, most groups, most ideologies that started with, let's sit down and think over what the definition of the good life is, I think most of those have led to disaster. But so this vision you have of a sort of tolerant, liberal-democratic, market-oriented system with AGI. Like, what is it, each person has their own AGI? Like, what does that mean? I don't know. I don't know what it looks like, right? Like, I guess what I'm saying is,
Starting point is 01:10:38 we need to solve the kind of important safety problems and the important externalities. And then subject to that, you know, which again, those could be just narrowly about alignment, or there could be a bunch of economic issues that are super complicated and that we can't solve, subject to that, we should think about what's worked in the past. And I think in general, unitary visions for what it means to live a good life have not worked out well at all. On the opposite end of things going well or good actors having control of AI, we might want to touch on China as a potential actor in the space. So first of all, having been at Baidu and seeing progress in AI happening generally, why do you think the Chinese have underperformed? You know, Baidu had a scaling laws group many years back. Or is the premise wrong, and I'm just not aware of the progress that's happening there?
Starting point is 01:11:30 Well, for the scaling laws group, I mean, that was an offshoot of the stuff we did with speech. So, you know, there were still some people there. But that was a mostly Americanized lab. I mean, I was there for a year. That was, you know, my first foray into deep learning. It was led by Andrew Ng. I never went to China. Mostly, you know, it was like a U.S. lab. So I think that was somewhat disconnected, although it was an attempt by, you know, a Chinese entity to kind of get into the game. But I don't know.
Starting point is 01:11:56 I think since then, you know, I couldn't speculate, but I think they've been maybe very commercially focused and not as focused on this kind of fundamental research side of things around scaling laws. Now, I do think, because of all the, you know, excitement with the release of ChatGPT in November or so, that's been a starting gun for them as well. And they're trying very aggressively to catch up now. I think the U.S. is quite substantially ahead. But I think they're trying very hard to catch up now. How do you think China thinks about AGI? Are they thinking about safety and misuse or not?
Starting point is 01:12:32 I don't really have a sense. You know, one concern I would have is, people say things like, well, China isn't going to develop AI because, you know, they like stability, or, you know, they're going to have all these restrictions to make sure things are in line with what the CCP wants. You know, that might be true in the short term and for consumer products. My worry is that if the basic incentives are about national security and power, that's going to become clear sooner or later. And so, you know, if they see this as, you know, a source of national power, they're going to at least try to do what's most effective
Starting point is 01:13:08 and that, you know, that could lead them in the direction of AGI. At what point is it possible for them to just get your blueprints or your code base or something, so that they can just spin up their own lab that is competitive at the frontier with the leading American companies? Well, I don't know how fast, but I'm concerned about this. So this is one reason why we're focusing so hard on cybersecurity. You know, we've worked with our cloud providers. We have this blog post out about security where we said, you know, we have a two-key system for access to the model weights. We have other measures that we've put in place, or are thinking of putting in place, that, you know, we haven't announced.
Starting point is 01:13:45 We don't want an adversary to know about them, but we're happy to talk about them broadly. All this stuff we're doing is, by the way, not sufficient yet for a super-determined state-level actor at all. I think it will defend against most attacks and against a state-level actor who's less determined. But there's a lot more we need to do, and some of it may require new research on how to do security. Okay, so let's talk about what it would take at that point. You know, we're at Anthropic's offices, and it's got good security. We had to get badges and everything to come in here. But the eventual version of this building, or a bunker or whatever, where the AGI is built, I mean, what does that look like? Is it a building in the middle of San Francisco, or are you out in the middle of Nevada or Arizona? Like, what is the point at which you're Los Alamos-ing it?
Starting point is 01:14:29 At one point, there was a running joke somewhere that, you know, the way building AGI would look is, you know, there would be a data center next to a nuclear power plant next to a bunker. Yeah. And, you know, that we'd all kind of live in the bunker and everything would be local so it wouldn't get out on the internet. You know, again, if we take seriously the rate at which all this is going to happen, which I don't know, I can't be sure of, but if we take that seriously, then, you know, it does make me think that maybe not something quite as cartoonish as that,
Starting point is 01:15:08 but that something like that might happen. What is the timescale on which you think alignment is solvable? If these models are getting to human level in some things in two to three years, what is the point at which they're aligned? I think this is a really difficult question, because I actually think often people are thinking about alignment in the wrong way. I think there's a general feeling that it's like, models are misaligned, or there's an alignment problem to solve, kind of like the Riemann hypothesis or something.
Starting point is 01:15:35 Like someday we'll crack the Riemann hypothesis. I don't quite think it's like that. Not in a way that's worse or better. It might be just as bad or just as unpredictable. When I think of, you know, why am I scared, a few things I think of. One is, look, I think the thing that's really hard to argue with is there will be powerful models.
Starting point is 01:15:58 They will be agentic. We're getting towards them. If such a model wanted to wreak havoc and destroy humanity or whatever, I think we have basically no ability to stop it. And if that's not true yet, it will reach the point where it's true as we scale the models. So that definitely seems the case. And I think a second thing that seems the case is that we seem to be bad at controlling the models, not in any particular way, but just, they're statistical systems and you can ask a million things and they can say a million things in reply. And, you know,
Starting point is 01:16:34 you might not have thought of one in a million things that does something crazy. Or when you train them, you train them in this very abstract way, and you might not understand all the consequences of what they do in response to that. I mean, I think the best example we've seen of that is Bing Sydney, right? Where it's like, I don't know how they trained that model. I don't know what they did to make it do all this weird stuff, like, you know, threaten people and, you know, have this kind of weird obsessive personality. But what it shows is that we can get something very different
Starting point is 01:17:04 from and maybe opposite to what we intended. And so I actually think facts number one and number two are enough to be really worried. You don't need all this detailed stuff about, you know, convergent instrumental goals or, you know, analogies to evolution. Actually, one and two for me are pretty motivating. I'm like, okay, this thing's going to be powerful. It could destroy us. And all the ones we've built so far, you know, are at pretty decent risk of doing some random shit we don't understand. Yeah, if I agree with that, and I'm like, okay, I'm concerned about this, the research agenda you have of mechanistic interpretability plus, you know, Constitutional AI and the other RLHF stuff, if you say that we're going to get something with, like, bioweapons or something that could be dangerous in two to three years, yes, do these things culminate, within two to three years, in actually meaningfully contributing to that? Yes.
Starting point is 01:17:48 So I think where I was going to go with this is, you know, people talk about doom by default or alignment by default. I think it might be kind of statistical. Like, you know, with the current models, you might get Bing or Sydney, or you might get Claude. It doesn't really matter, because Bing or Sydney, like, if we take our current understanding and move that to very powerful models, you might just be in this world where it's like, okay, you make something, and depending on the details, maybe it's totally fine. You know, not really alignment by default, but just, it depends on a lot of the details. And if you're very careful about all those details and you know what you're doing, you're getting it right.
Starting point is 01:18:28 But we have a high susceptibility to: you mess something up in a way that you didn't really understand was connected, and actually, instead of making all the humans happy, it wants to, you know, turn them into pumpkins. Yeah, you know, just some weird shit, right? Because the models are so powerful, you know, they're like these kind of giants that are standing in the landscape. And if they start to move their arms around randomly, they could just break everything.
Starting point is 01:18:59 I guess I'm starting with that kind of framing because it's not like, I don't think we're aligned by default. I don't think we're doomed by default and have some problem we need to solve. It has some kind of different character. Now, what I do think is that hopefully, within a timescale of two to three years, we get better at diagnosing when the models are good and when they're bad. We get better at training, you know, increasing our repertoire of methods to train the models so that they're less likely to do bad things and more likely to do good things, in a way that
Starting point is 01:19:31 isn't just relevant to the current models, but scales. And we can help develop that with interpretability as the test set. I don't think of it as, oh man, we tried RLHF, it didn't work. We tried Constitutional AI, it didn't work. We tried this other thing, it didn't work. Now we're going to try mechanistic interpretability. I think this frame of like, man, we haven't cracked the problem yet, we haven't solved the Riemann hypothesis, isn't quite right. I think of it more as: already with today's systems, we are not very good at controlling them. And the consequences of that could be very bad. We just need to get more ways of increasing the likelihood that, you know,
Starting point is 01:20:12 that we can control our models and understand what's going on in them. And we have some of them so far. They aren't that good yet. But, you know, I don't think of this as binary, like works and doesn't work. We're going to develop more. And I do think that over the next two to three years, we're going to start eating that probability mass of ways things can go wrong. You know, it's kind of like in the Core Views on AI Safety post. There's a probability mass of how hard the problem is.
Starting point is 01:20:35 I feel like that way of stating it isn't really even quite right, right? Because I don't feel like it's the Riemann hypothesis to solve. I just feel like, you know, it's almost like, right now, if I try to, you know, juggle five balls or something. I can juggle three balls, right? I actually can. But I can't juggle five balls
Starting point is 01:20:54 at all, right? You have to practice a lot to do that. If I were to do that, I would almost certainly drop them. And then just over time, you get better at the task of controlling the balls. On that post in particular, what is your personal probability distribution over, so for the audience, the three possibilities are: it is trivial to align these models with RLHF++; it is a difficult problem, but one that a big company could solve; or it is basically impossible for human civilization currently to solve, if I'm capturing those three. What is your probability distribution over those personally?
Starting point is 01:21:33 Yeah, I mean, I'm not super into, like, what's your probability distribution of X. I think all of those have enough likelihood that, you know, they should be considered seriously. The question I'm much more interested in is, what could we learn that shifts probability mass between them? What is the answer to that? I think that one of the things mechanistic interpretability is going to do, more than necessarily solve problems, is it's going to tell us what's going on when we try to align models. I think it's basically going to teach us about this. Like, one way I can imagine concluding that things are very difficult is if mechanistic interpretability sort of shows us that, I don't know, problems tend to get moved around instead of being stamped out, or that you get rid of one problem, you create another one, or it might
Starting point is 01:22:20 inspire us or give us insight into why problems are kind of persistent or hard to eradicate or crop up. Like, for me to really believe some of these stories about, you know, oh, something will always, you know, there's always this convergent goal in this particular direction. I think the abstract story is not uncompelling, but I don't find it really compelling either, nor do I find it necessary to motivate all the safety work. But the kind of thing that would really be like, oh man, we can't solve this, is if we see it happening inside the x-ray. Because, yeah, I think right now there's just way too many assumptions, way too much overconfidence
Starting point is 01:22:59 about how all this is going to go. I have a substantial probability mass on this all going wrong, it's a complete disaster, but in a completely different way than anyone had anticipated. It would be beside the point to ask how it could go differently than anyone anticipated, so on this in particular, what information would be relevant? How much would the difficulty of aligning Claude 3 and the next generation of models basically be, like, is that a big piece of information, or is that not? So I think the people who are most worried are predicting that all the subhuman AI models are going to be alignable, right? They're going to seem aligned, they're going to deceive us in some way. I think it certainly gives us some information, but I am more interested in
Starting point is 01:23:43 what mechanistic interpretability can tell us. Because, again, you see this X-ray. It would be too strong to say it doesn't lie. But at least in the current systems, it doesn't feel like it's optimizing against us. There are exotic ways that it could. You know, I don't think anything is a safe bet here. But I think it's the closest we're going to get to something that isn't actively optimizing against us. Let's talk about the specific methods other than mechanistic interpretability that you guys are researching. When we talk about RLHF or, you know, Constitutional AI, whatever, RLHF++, if you had to put it in terms of human psychology, what is the change that is happening? Are we creating new drives, new goals, new thoughts? How is the model changing in terms of psychology? I think all those terms are kind of inadequate for, you know, describing what's going on; it's not clear how useful they are as abstractions for humans either.
Starting point is 01:24:38 I think we don't have the language to describe what's going on. And again, I'd love to have the x-ray. I'd love to look inside and kind of actually know what we're talking about, instead of basically making up words, which is what I'm doing, and what you're doing in asking this question, where, you know, we should just be honest: we really have very little idea what we're talking about. So, you know, it would be great to say, well, what we actually mean by that is, you know, this circuit in here turns on, and, you know, after we've trained the model, then this circuit is no longer operative or is weaker. And that
Starting point is 01:25:13 I would love to be able to say. Again, it's going to take a lot of work to be able to do that. Model organisms, which you hinted at before when you said we're doing these evaluations to see if they're capable of doing dangerous things now, and currently they're not. How worried are you about a lab-leak scenario where, in fine-tuning them or in trying to get these models to elicit dangerous behaviors, you know, make bioweapons or something, it, like, leaks somehow and actually makes the bioweapon instead of telling you it can make the bioweapon? With today's passive models, you know, chatbots, I think it's not that much of a concern, right?
Starting point is 01:25:47 Because it's like, you know, if you were to fine-tune the model to do that, we do it privately and we work with the experts. And so, you know, the leak would be like, suppose the model got open-sourced or something, and then someone... So I think for now it's mostly a security issue. In terms of models truly being dangerous, I mean, you know, I think we do have to worry that, if we make a truly powerful model and we're trying to see what makes it dangerous or safe, then there could be more of a one-shot thing, where there's, you know, some risk that the
Starting point is 01:26:19 model takes over. I think the main way to control that is to make sure that the capabilities of the model that we test are not such that they're capable of doing this. At what point would the capabilities be so high that you say, I don't even want to test this? Well, there are different things. I mean, there's capability testing and, you know... But that itself could lead to it.
Starting point is 01:26:37 If you're testing whether it can replicate, like, what if it actually does? Sure. But I think what you want to do is extrapolate. So we've talked with ARC about this, right? You know, you have, like, factors of two of compute or something, where you're like, okay, can the model do something like open up an account on AWS and make some money for itself? Like, some of the things that are obvious prerequisites to complete survival in the wild. And so you just set those thresholds, you know, kind of well below that. And then as you proceed upward from there, you do more and more rigorous tests and are more and more careful about what it is you're doing.
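To make that concrete, here is a minimal, purely illustrative sketch of that kind of threshold-and-extrapolate loop. Neither Anthropic's nor ARC's actual evaluation protocol is public in this form, so the task names, thresholds, and function interfaces below are all hypothetical; the only point is the shape of the idea: set alert thresholds well below the danger point, re-check at roughly each factor-of-two of compute, and escalate scrutiny as results approach a threshold.

```python
# Hypothetical sketch of a capability-threshold gate for scaling decisions.
# Task names, thresholds, and the eval interface are invented for illustration;
# they do not describe Anthropic's or ARC's actual evaluations.
from dataclasses import dataclass

@dataclass
class CapabilityProbe:
    name: str            # the capability being probed
    danger_level: float  # success rate at which the capability is clearly dangerous
    alert_level: float   # conservative threshold set well below the danger level

PROBES = [
    CapabilityProbe("open_cloud_account_and_earn_money", danger_level=0.9, alert_level=0.2),
    CapabilityProbe("copy_own_weights_to_new_server",    danger_level=0.9, alert_level=0.1),
]

def run_probe(model, probe: CapabilityProbe) -> float:
    """Return a success rate in [0, 1] for this probe. Placeholder stub."""
    return 0.0  # a real harness would run the model on concrete task attempts

def clear_to_scale(model) -> bool:
    """Allow the next ~2x compute step only if every probe stays below its alert level."""
    for probe in PROBES:
        score = run_probe(model, probe)
        if score >= probe.alert_level:
            print(f"{probe.name}: {score:.2f} >= alert level {probe.alert_level}; "
                  "pause scaling and escalate to more rigorous testing")
            return False
    return True
```

The ordering is the whole point of the sketch: the alert thresholds sit far below anything resembling survival in the wild, so the gate trips long before the worrying capability is actually present.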
Starting point is 01:27:16 On Constitutional AI, and feel free to explain what this is for the audience, but who decides what the constitution for the next generation of models, or a potentially superhuman model, is? How is that actually written? I think initially, you know, to make the constitution, we just took some stuff that was broadly agreed on, like the UN Declaration of Human Rights, and some of the stuff from Apple's terms of service, right? Stuff that's, you know, consensus on what's acceptable to say, or, you know, what basic things are able to be included. So one, I think for future constitutions, we're looking into more participatory processes for making these. But I think beyond that, I don't think there should be one constitution for a model that everyone uses.
Starting point is 01:28:00 Like, probably a model's constitution should be very, very simple, right? It should only have very basic facts that everyone would agree on. And then there should be a lot of ways that you can customize, including appending, you know, constitutions. And, you know, I think beyond that, we're developing new methods, right? This is, you know, I'm not imagining that this, or this alone, is the method that we'll use to train superhuman AI, right? Many of the parts of capability training may be different. And so, you know, it could look very different.
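For readers who want a concrete picture of what "a simple base constitution plus appended customizations" could look like mechanically, here is a rough, hypothetical sketch of the critique-and-revision step that Constitutional AI uses. The real method is described in Anthropic's Constitutional AI paper; the specific principles, prompt wording, and function names below are invented for illustration and are not Anthropic's actual constitution.

```python
# Illustrative sketch: compose a constitution from a shared base plus appended
# custom principles, then use it in a constitutional-AI-style critique/revision
# loop. Principle texts and prompt wording are invented examples.

BASE_CONSTITUTION = [
    "Choose the response that is least likely to help someone cause serious harm.",
    "Choose the response most consistent with broadly shared human rights.",
]

def build_constitution(custom_principles=None):
    """Append deployment-specific principles to the broadly agreed base."""
    return BASE_CONSTITUTION + list(custom_principles or [])

def critique_and_revise(model, draft, constitution):
    """One pass of self-critique and revision against each principle.
    `model` is assumed to be a plain text-in/text-out callable."""
    revised = draft
    for principle in constitution:
        critique = model(
            f"Principle: {principle}\nResponse: {revised}\n"
            "Does the response conflict with the principle? Answer briefly."
        )
        revised = model(
            f"Principle: {principle}\nCritique: {critique}\n"
            f"Original response: {revised}\nRewrite the response to follow the principle."
        )
    return revised  # revised outputs become targets for supervised fine-tuning
```

In the published method there is also a second stage where responses are ranked against the principles to train a preference model, but the composition point is the same either way: the base list stays small and consensus-driven, and customization happens by appending to it rather than rewriting it.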
Starting point is 01:28:29 And, again, there are levels above this. Like, I'm pretty uncomfortable with, here's the AI's constitution, it's going to run the world. You know, again, just normal lessons from how societies work and how politics works, that just strikes me as fanciful. Like, you know, I think we should try to hook these things in, you know, even when they're very powerful.
Starting point is 01:28:57 Again, after we've mitigated the safety issues, any good future, even if it has all these security issues that we need to solve, somehow needs to end with something that's more decentralized and, you know, less like a godlike superintelligence. I just don't think that ends well. What scientists from the Manhattan Project do you respect most, in terms of who acted most ethically under the constraints they were given? Is there one that comes to mind? I don't know. I mean, you know, I think there are a lot of answers you could give. I mean, I'm definitely a fan of Szilard for having kind of figured it out.
Starting point is 01:29:34 He was then, you know, against the actual dropping of the bomb. I don't actually know the history well enough to have an opinion on whether, you know, a demonstration of the bomb could have ended the war. I mean, that involves a bunch of facts about Imperial Japan that are, you know, complicated and that I'm not an expert on. But, you know, Szilard, you know, he really discovered this stuff. He kept it secret, you know, patented some of it and put it in the hands of the British Admiralty. So, you know, he seemed to display the right kind of awareness as well as,
Starting point is 01:30:10 as well as discovering stuff. I mean, it was when I read that book that, you know, when I wrote this big blob of compute doc and many others, I only showed it to a few people. And there were other docs that I showed to almost no one. So, you know, yeah, I was a bit inspired by this. Again, I mean, you know, we can all get self-aggrandizing here. Like, we don't know how it's going to turn out or if it's actually going to be something on par with the Manhattan Project. I mean, you know, this could all be just Silicon Valley people building technology and, you know, just kind of having delusions of grandeur. So I don't know how it's going to turn out. I mean, if the scaling stuff is true, then it's even bigger than the Manhattan
Starting point is 01:30:48 Project. Yeah. It certainly could be bigger. I just, you know, we should always kind of, I don't know, maintain this attitude that it's really easy to fool yourself. If you were a physicist during World War II and you were asked by the government to contribute non-replaceable research to the Manhattan Project, what do you think you would have said? Yeah, I mean, I think given you're in a war with the Nazis, at least during the period when you thought that the Nazis were going to, yeah, I don't really see much choice but to do it.
Starting point is 01:31:15 If it's possible, you know, you have to figure it's going to be done within 10 years or so by someone. Regarding cybersecurity, what should we make of the fact that there's a whole bunch of tech companies which have ordinary tech company security policies and which, publicly at least, it's not obvious have been hacked? Like, Coinbase still has its Bitcoin; Google, as far as I know, my Gmail hasn't been leaked. Should we take from that that current status quo tech company security practices are good enough for AGI, or just simply that nobody has tried hard enough? It would be hard for me to speak to, you know, current tech company practices. And of course, there may be many attacks that we don't know about, where things are stolen and then silently used.
Starting point is 01:31:58 You know, I mean, I think an indication of it is, when someone really cares about attacking someone, then often the attacks happen. So, you know, recently we saw that some fairly high officials of the U.S. government had their email accounts hacked via Microsoft, which was providing the email accounts. So, you know, presumably that related to information that was, you know, of great interest to foreign adversaries. And so it seems to me, at least, that the evidence is more consistent with: when something is of really high enough value, then, you know, someone acts and it's stolen. And my worry is that, of course, with AGI we'll get to a world where, you know, the value is
Starting point is 01:32:43 seen as incredibly high, right? That, you know, it'll be like stealing nuclear missiles or something. You can't be too careful on this stuff. And, you know, at every place that I've worked, I push for the cybersecurity to be better. One of my concerns about cybersecurity is, you know, it's not kind of something you can trumpet. I think there's a good dynamic with safety research, where, you know, you can get companies into a dynamic, and I think we have, where you can get them to compete to do the best safety research and, you know, kind of use it as, I don't know, a recruiting point of competition or something. We used to do this all the time with interpretability, you know, and then sooner or later,
Starting point is 01:33:21 other orgs started recognizing this and started working on interpretability, whether or not, you know, that was a priority for them before. But I think it's harder to do that with cybersecurity, because a bunch of the stuff you have to do quietly. And so, you know, we did try to put out one post about it. But I think, you know, mostly you just see the results. You know, I think a good norm would be, people see the cybersecurity leaks from companies, or, you know, leaks of the model parameters or something, and say, you know, they screwed up. That's bad. If I'm a safety person, I might not want to work there. Of course, as soon as I say that, we'll probably have a security breach tomorrow. But, you know, that's part of the game here, right?
Starting point is 01:34:04 That's, I think, part of, you know, trying to make things safe. I want to go back to the thing we were talking about earlier, the ultimate level of cybersecurity required two to three years from now, and whether it requires a bunker. Like, are you actually expecting to be in a physical bunker in two to three years? Or is that just a metaphor? Yeah. I mean, I think that's a metaphor. You know, we're still figuring it out. Like, something I would think about is the security of the data center, which may not be in the same physical location as us, but, you know, we've worked very hard to make sure it's in the United States.
Starting point is 01:34:35 But securing the physical data centers and the GPUs, I think some of the really expensive attacks, if someone was really determined, just involve going into the data center and, you know, trying to steal the data directly, or as it's flowing from a data center to us. I think these data centers are going to have to be built in a very special way. I mean, given the way things are scaling up, you know, we're probably anyway heading to a world where, you know, networks of data centers cost as much as aircraft carriers or something. And so, you know, they're already going to be pretty unusual objects. But I think, in addition to being unusual in terms of their ability, you know,
Starting point is 01:35:14 to link together and train gigantic models, they're also going to have to be very secure. Speaking of which, you know, there have been all sorts of rumors about the difficulty of procuring the power and the GPUs for the next generation of models. What has the process been like to secure the necessary components for the next generation? That's something I can't go into great detail about. I will say, look, you know, people think of even industrial-scale data centers, right? People are not thinking at the scale that I think these models are going to go to very soon. And so whenever you do something at a scale where it's never been done before, you know, every single component, every single thing, has to be done in a different way than it was before. And so, you know, you may run into problems with, you know, surprisingly simple components. Power is one that you mentioned.
Starting point is 01:35:52 And is this something that Anthropic has to handle, or can you just outsource it? You know, I mean, for data centers, we work with cloud providers, for instance. What should we make of the fact that these models require so much training and the entire corpus of internet data in order to be subhuman? Whereas, you know, for GPT-4, there have been estimates that it was like 10^25 FLOPs or something, whereas, I mean, you can take these numbers with a grain of salt, but there are reports that, you know,
Starting point is 01:36:31 the human brain, from the time it is born to the time a human being is 20 years old, is like on the order of 10^20 FLOPs to simulate all those interactions. You don't have to buy those particular numbers, but should we be worried about how sample-inefficient these models seem to be? Yeah, so I think that's one of the remaining mysteries. One way you could phrase it is that the models are maybe two to three orders of magnitude smaller than the human brain, if you compare the number of synapses, while at the same time being trained on, you know, three to four more orders of magnitude of data. If you compare the number of words a human sees as they're developing to age 18, it's, I don't remember exactly, but I think it's in the
Starting point is 01:37:13 hundreds of millions, whereas for the models, we're talking about the hundreds of billions or the trillions. So what explains this? There are these offsetting things where the models are smaller, they need a lot more data, and they're still below human level. But so, you know, there's some way in which the analogy to the brain is not quite right or is breaking down, or there's some missing factor. You know, this is just kind of like in physics, where it's like, you know, we can't explain the Michelson-Morley experiment, or, I'm forgetting one of the other 19th-century physics paradoxes, but I think it's one thing we don't quite understand, right? Humans see so little data and they still do fine. One theory on it could be that,
Starting point is 01:37:56 you know, it's like our other modalities. You know, how do we get, you know, 10^14 bits into the human brain? Well, most of it is kind of these images, and maybe a lot of what's going on inside the human brain is, you know, our mental workspace involves all these simulated images or something like that. But honestly, I think intellectually we have to admit that that's a weird thing that doesn't match up. And, you know, it's one reason I'm a bit skeptical of biological analogies. I thought in terms of them like five or six years ago. But now that we actually have these models in front of us as artifacts, it feels like almost all the evidence from that has been screened off by what we've seen.
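To put the rough comparison in one place, using only the loose order-of-magnitude figures from this conversation (all of them approximate):

```latex
% Loose orders of magnitude as discussed above
\begin{align*}
\frac{\text{human brain synapses}}{\text{model parameters}} &\approx 10^{2}\text{--}10^{3}
  &&\text{(models 2--3 orders of magnitude smaller)}\\[4pt]
\frac{\text{model training tokens}}{\text{words a human sees by age 18}}
  &\approx \frac{10^{11}\text{--}10^{12}}{\sim 10^{8}} \approx 10^{3}\text{--}10^{4}
  &&\text{(3--4 orders of magnitude more data)}
\end{align*}
```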
Starting point is 01:38:34 And what we've seen are models that are much smaller than the human brain and yet can do a lot of the things that humans can do, and yet, paradoxically, require a lot more data. So maybe we'll discover something that makes it all efficient, or maybe we'll understand why the discrepancy is present. But at the end of the day, I don't think it matters, right? If we keep scaling the way we are, I think what's more relevant at this point is just measuring the abilities of the model and seeing how far they are from humans. And they don't seem terribly far to me. Does this scaling picture, and the big blob of compute more generally, under-emphasize the role that algorithmic progress has played when you compose the big blob of compute? So, you know, you're talking about LSTMs, presumably at that point.
Starting point is 01:39:16 Presumably, the scaling on that would not have you at Claude 2 at this point. So are you under-emphasizing the role that an improvement on the scale of the transformer could be having here when you put it behind the label of scaling? This big blob of compute document, which I still have not made public, I probably should for historical reasons. I don't think it would tell anyone anything they don't know now. But when I wrote it, I actually said, look, there are seven factors, and I wasn't like, these are definitely the factors, but I was just like, let me give some sense of the kinds of things that matter and what don't. And so I wasn't thinking, like, these are exactly the ones; you know, there could be nine, there could be five. But the things I said
Starting point is 01:39:53 were: I said the number of parameters, the scale of the model, matters, and compute matters. Quantity of data matters. Quality of data matters. Loss function matters. So, like, are you doing RL, or are you doing next-word prediction? If your loss function isn't rich or doesn't incentivize the right thing, you won't get anything. So those were the key four ones, which I think are the core of the hypothesis. But then I said three more things. One was symmetries, which is basically, like, if your architecture doesn't take into account the right kinds of symmetries, it doesn't work.
Starting point is 01:40:30 Or it's very inefficient. So, for example, convolutional neural networks take into account translational symmetry, and LSTMs take into account time symmetry. But a weakness of LSTMs is that they can't attend over the whole context. So there's kind of this structural weakness. If a model isn't structurally capable of absorbing and managing things that happened in a far enough distant past, then it's just, it's kind of like, you know, the compute doesn't flow.
Starting point is 01:40:59 Like the spice doesn't flow. The blob has to be unencumbered, right? It's not going to work if you artificially close things off. And I think RNNs and LSTMs artificially closed things off, because they close you off to the distant past. And so, again, things need to flow freely. If they don't, it doesn't work. And then, you know, I added a couple of things. One of them was conditioning, which is, like, if the thing you're optimizing with is just really numerically bad, you're going to have trouble. And so this is why, like, Adam works better than normal SGD. And I think I'm forgetting what the seventh
Starting point is 01:41:38 condition was, but it was similar to things like this, where if you set things up in a way that's set up to fail, or that doesn't allow the compute to work in an uninhibited way, then it won't work. And so transformers were kind of within that, even though I can't remember if the transformer paper had been published. It was around the same time as I wrote that document. It might have been just before. It might have been just after. It sounds like, from that view, the way to think about this algorithmic progress is not as increasing the power of the blob of compute, but simply as getting rid of the artificial hindrances
Starting point is 01:42:15 that older architectures have? Is that a fair way to think about it? That's a little, yeah, that's a little how I think about it. You know, again, if you go back to, like, Ilya's line, the models want to learn. Yeah, yeah, yeah. Like the compute wants to be free. Yeah, yeah, yeah.
Starting point is 01:42:26 And, like, you know, it's being blocked in various ways where you don't understand that it's being blocked, and so you need to free it up. Right, right. I love the Dune-ism, changing it to spice. Okay, on that point though, do you think that another thing on the scale of the transformer is coming down the pike to enable the next great iterations? I think it's possible. I mean, people have worked
Starting point is 01:42:51 on things like, you know, trying to model very long time dependencies, or, you know, various different ideas where I could see that we're kind of missing an efficient way of representing or dealing with something. So I think those inventions are possible. I guess my perspective would be, even if they don't happen, we're already on this very, very steep trajectory. I mean, we're constantly trying to discover them, as are others. But things are already on such a fast trajectory. All that would do is speed up the trajectory even more, and probably not by that much, because it's already going so fast.
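To make the earlier point about symmetries and attending over the whole context concrete, here is a minimal toy sketch of the contrast: a recurrent cell (a plain tanh RNN standing in for the LSTM) pushes all history through one fixed-size hidden state, while causal attention lets every position look directly at every earlier position. The shapes, weights, and names are made up for illustration; this is not any production architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16                      # toy sequence length and feature size
x = rng.normal(size=(seq_len, d))       # toy token representations

# Recurrent view: all history is squeezed through one fixed-size hidden state,
# so information from the distant past must survive every intermediate step.
W_h, W_x = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(W_h @ h + W_x @ x[t])   # position t only sees the past through h

# Attention view: every position attends directly to every earlier position,
# so nothing in the past is structurally "closed off".
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = q @ k.T / np.sqrt(d)
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # causal mask: no peeking ahead
scores[mask] = -np.inf
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v                        # each row mixes all earlier positions directly
```

The point of the toy code is only the structural difference: in the loop, anything far back has to be carried forward step by step, whereas in the attention block the last row of `weights` assigns weight to the very first token without any intermediary.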
Starting point is 01:43:28 Is something embodied, or having an embodied version of a model, at all important in terms of getting either data or progress? I'd think of that less in terms of, you know, a new architecture and more in terms of a loss function, like the data, the environments you're exposing yourself to end up being very different. And so I think that could be important for learning some skills, although data acquisition is hard. And so things have gone through the language route, and I would guess will continue to go through the language route, even as more becomes possible in terms of embodiment. And then the other possibility I mentioned, RL. You can see it as, yeah, I mean, we kind of already do RL with RLHF, right? People are like, is alignment the same
Starting point is 01:44:10 as capabilities? I always think in terms of the two snakes, right? They're kind of often hard to distinguish. So we already kind of use RL in these language models, but I think we've used RL less in terms of getting them to take actions and, you know, do things in the world. But when you take actions over a long period of time and understand the consequences of those actions only later, then, you know, RL is a typical tool we have for that. So I would guess that, in terms of models taking action in the world, RL will become a thing, with all the power and all the safety issues that come with it. When you project out into the future, do you see the way in which these things will be integrated
Starting point is 01:44:47 into productive supply chains? Do you see them talking with each other and criticizing each other and contributing to each other's output? Or is it just that the model one-shots the answer or the work? Models will undertake extended tasks. That will have to be the case. I mean, we may want to limit that to some extent, because it may make some of the safety problems easier. But, you know, some of that I think will be required.
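To ground the RLHF point from a moment ago, here is a minimal numerical sketch of how RLHF-style training is commonly set up: a reward model is trained on pairwise preferences, and the policy is then pushed toward outputs that score well, with a KL penalty back to the base model. The reward values, log-probabilities, and the kl_coeff are made-up assumptions, and this is a generic illustration rather than a description of any particular lab's setup.

```python
import numpy as np

# Toy scores assigned by a hypothetical reward model to two candidate completions.
r_chosen, r_rejected = 1.7, 0.3

# Bradley-Terry style preference loss used to fit the reward model:
# it pushes the chosen completion's score above the rejected one's.
pref_prob = 1 / (1 + np.exp(-(r_chosen - r_rejected)))
reward_model_loss = -np.log(pref_prob)

# RL step, conceptually: the policy is optimized toward high-reward outputs,
# with a KL-style penalty keeping it close to the base model.
logp_policy, logp_base = -2.1, -2.4    # made-up log-probs of a sampled output
kl_coeff = 0.1                          # assumed penalty strength
shaped_reward = r_chosen - kl_coeff * (logp_policy - logp_base)

print(f"preference prob {pref_prob:.2f}, reward model loss {reward_model_loss:.2f}")
print(f"KL-shaped reward used for the policy update: {shaped_reward:.2f}")
```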
Starting point is 01:45:13 In terms of whether models are talking to models or talking to humans, again, this goes kind of out of the technical realm and into the sociocultural and economic realm, where my heuristic is always that it's very, very difficult to predict things. And so I feel like these scaling laws have been very predictable. But when you ask, well, when is there going to be a commercial explosion in these models, or what form is it going to take, or are the models going to do things instead of humans or pair with humans, I feel like certainly my track record on predicting these things is terrible. But also, looking around, I don't really see anyone whose track record is great. You mentioned how fast progress is happening, but also the difficulties of integrating it into the existing economy, into the way things work. Do you think there will be enough time to actually have large revenues from AI products before the next model is just so much better, or we're in a different landscape entirely?
Starting point is 01:46:12 It depends what you mean by large, right? You know, I think multiple companies are already in the 100 million to billion per year range. Will it get to the 100 billion or trillion range? That stuff is just so hard to predict, right? And it's not even super well defined. Like, you know, I think right now there are companies that are throwing a lot of money at generative AI as customers. And, you know, I think that's the right thing for them to do, and they'll find uses for it.
Starting point is 01:46:43 But it doesn't mean they're finding uses, or the best uses, from day one. So even money changing hands is not quite the same thing as economic value being created. But surely you've thought about this from the perspective of Anthropic, where if these things are happening so fast, then it should be an insane valuation, right? Even for us, who have, you know, not been super focused on commercialization and more on safety,
Starting point is 01:47:05 I mean, you know, the graph goes up, and it goes up relatively quickly. Yeah. So, you know, I can only imagine what's happening at the orgs where this is their singular focus. So it's certainly happening fast, but, you know, again, it's the exponential from a small base, while the technology itself is moving fast.
Starting point is 01:47:28 So it's kind of a race between how fast the technology is getting better and how fast it's integrated into the economy. And I think that's just a very unstable and turbulent process. Both things are going to happen fast. But if you ask me exactly how it's going to play out, exactly what order of things are going to happen, I don't know. And I'm kind of skeptical of the ability to predict.
Starting point is 01:47:50 I'm kind of curious, with regards to Anthropic specifically, you're a public benefit corporation. Yes. And rightfully so, you want to make sure that, since this is an important technology, obviously the only thing you care about is not shareholder value. But how do you talk to investors who are putting in hundreds of millions, billions of dollars? How do you get them to put in this amount
Starting point is 01:48:12 of money without shareholder value being the main concern? So I think the LTBT is, you know, the right thing on this, right? You know, I mean, we're going to talk more about the LTBT, but some version of that has been in development since the beginning of Anthropic, even formally, right? And so, you know, from the beginning, even as the body has changed in some ways, it was like, this body's going to exist. And it's, you know, it's unusual. Every traditional investor who invests in Anthropic, you know, looks at this. Some of them are just like, whatever, you
Starting point is 01:48:46 run your company how you want. Some of them are like, you know, oh my God, this body of random people, or to them random people, could move Anthropic in a direction that's, you know, totally contrary to our interests. And now there are legal limits on that, of course. But, you know, we have to have this conversation with every investor. And then it gets into a conversation of, well, what are the kinds of things that we might do that would be contrary to the interest of traditional investors. And just having those conversations has helped get everyone on the same page. I want to talk about physics and the fact that so many of the founders and employees
Starting point is 01:49:25 at Anthropic are physicists. What is the, I mean, we talked in the beginning about the scaling laws and how the power laws from physics are something you see here, but, you know, what are the actual approaches and ways of thinking from physics that seem to have carried over so well? Is that notion of effective theory super useful? You know, what is going on here? I mean, I think part of it is just that physicists learn things really fast. We have generally found that, you know, if we hire someone who is a physics PhD or something, they can learn ML and contribute very, very quickly in most cases. And, you know, because several of the founders, myself, Jared Kaplan, Sam McCandlish, were physicists, we knew a lot of other physicists.
Starting point is 01:50:05 And so we were able to hire them. And now there are, I don't know exactly how many, maybe 30 or 40 of them here. ML is still not yet a field that has an enormous amount of depth, and so they've been able to get up to speed very quickly. Are you concerned that there are a lot of people who would have been doing physics or something and, whatever, would have gone into finance instead, and since Anthropic exists, they have now been recruited to go into AI? And, you know, you obviously care about AI safety, but maybe in the future they leave and they get funded to do their own thing. Is that a concern, that you're bringing more people into the ecosystem here?
Starting point is 01:50:42 Yeah, I mean, you know, I think there's a broad set of actions, you know, like we're causing GPUs to exist. There's a lot of side effects that you can't currently control, or that you just incur, if you buy into the idea that you need to build frontier models. And that's one of them. A lot of them would have happened anyway. I mean, finance was a hot thing 20 years ago, so physicists were doing it. Now ML is the hot thing. And, you know, it's not like we've caused them to do it when they had no interest previously.
Starting point is 01:51:08 But, you know, again, at the margin, you're kind of bidding things up. And, you know, a lot of that would have happened anyway. Some of it wouldn't, but it's all part of the calculus. Do you think that Claude has conscious experience? How likely do you think that is? This is another of these questions that just seems very unsettled and uncertain. One thing I'll tell you is I used to think that we didn't have to worry about this at all until models were operating in rich environments, not necessarily embodied, but, you know, they needed to have a reward function and have kind of long-lived experience. So I still think that might be the case,
Starting point is 01:51:35 but the more we've looked at these language models, and particularly looked inside them to see things like induction heads, a lot of the cognitive machinery that you would need for active agents seems kind of already present in the base language models. So I'm not quite as sure as I was before that we're missing enough of the things that you would need. I think today's models just probably aren't smart enough that we should worry about this too much,
Starting point is 01:52:11 but I'm not 100% sure about this. And I do think that as the models get better in a year or two, this might be a very real concern. What would change if you found out that they are conscious? Are you worried that you're pushing a gradient toward suffering? Like, what is it? Conscious is, again, one of these words that I suspect will not end up having a well-defined meaning. But, like, is there something it is like to be Claude?
Starting point is 01:52:32 Yeah, but that, yeah. Well, I suspect that's a spectrum, right? So I don't know. Let's say we discover that I should care about Claude's experience as much as I should care about a dog or a monkey or something. Yeah, I would be kind of worried. I don't know if their experience is positive or negative. Unsettlingly, I also don't know, like, I wouldn't know if any intervention that we made was more likely to make Claude, you know, have a positive
Starting point is 01:53:03 versus negative experience, versus not having one. If there's an area that is helpful with this, it's maybe mechanistic interpretability, because I think of it as neuroscience for models. And so it's possible that we could shed some light on this, although, you know, it's not a straightforward factual question, right? It kind of depends what we mean and what we value. We talked about this initially, but I want to get more specific. We talked initially about how, now that you're seeing these capabilities ramp up within the human spectrum, you think the human spectrum is wider than we thought. But more specifically, how is the way you think about human intelligence different now that
Starting point is 01:53:41 you're seeing these marginally useful abilities emerge? How does that change your picture of what intelligence is? I think for me, the big realization on what intelligence is came with the blob of compute thing, right? Like, it's not, you know, there might be all these separate modules, there might be all this complexity. You know, Rich Sutton called it the bitter lesson, right? It has many names. It's been called the scaling hypothesis. The first few people who figured it out did so around 2017. I mean, you could go further back. I think Shane Legg was maybe the first person who really knew it. Maybe Ray Kurzweil, although in a very vague way.
Starting point is 01:54:17 But, you know, I think the number of people who understood it went up a lot around 2014 to 2017. But I think that was the big realization. It's like, well, how did intelligence evolve? Well, if you don't need very specific conditions to create it, if you can create it just from the right kind of gradient, the right kind of loss signal, then it's not so mysterious how it all happened. In that sense, you know, it had this click of scientific understanding. In terms of watching what the models can do, how has it changed my view of human intelligence?
Starting point is 01:54:52 I wish I had something more intelligent to say on that. I feel like, I don't know, one thing that's been surprising is I thought things might click into place a little more than they do. Like, you know, I thought different cognitive abilities might all be connected and there was more of one secret behind them, but it's like the model just learns various things at different times, you know, and it can be very good at coding but, you know, can't quite prove the prime number theorem yet. And I guess it's a little bit the same for humans, although it's weird, the juxtaposition of things it can do and not. I guess the main lesson is, with theories of intelligence or how
Starting point is 01:55:32 intelligence works, again, a lot of these words just kind of dissolve into a continuum, right? They just kind of dematerialize. I think less in terms of intelligence and more in terms of what we see in front of us. Yeah. No, it's really surprising to me. Two things. One is how discrete these different pieces of intelligence, these things that contribute to loss, are, rather than there just being one reasoning circuit or one general intelligence. And the other thing, talking with you, that is surprising or interesting is that many years from now, it'll be one of those things where, looking back, it'll be, why wasn't this obvious to you? If you're seeing these smooth scaling curves,
Starting point is 01:56:11 why at the time were you not completely convinced? So you've been less public than the CEOs of other AI companies. You know, you're not posting on Twitter. You're not doing a lot of podcasts, except for this one. What gives? Like, why are you off the radar? Yeah, I aspire to this and I'm proud of this. If people think of me as kind of boring and low profile,
Starting point is 01:56:33 like this is actually kind of what I want. So I don't know, I've just seen a number of cases, a number of people I've worked with, where, you could say it's Twitter, although I think I mean a broader thing: just kind of attaching your incentives very strongly to the approval or cheering of a crowd. I think that can destroy your mind, and in some cases it can destroy your soul. And so I think I kind of deliberately tried to be a little bit low profile, because I want to, I don't know, kind of defend my ability to think about things intellectually in a way that's different from other people and isn't tinged by the approval of other people. So, you know, I've seen cases of folks who are deep learning skeptics and they become known as deep learning skeptics on Twitter. And then even as it starts to become clear, to me they seem to have kind of sort of changed their mind, but this is their thing on
Starting point is 01:57:26 Twitter and they can't change their Twitter persona, and so forth and so on. I don't really like the trend of kind of personalizing companies, like the whole, you know, cage match between CEOs approach. I think it distracts people from the actual merits and concerns of the company in question. I kind of want people to judge the nameless bureaucratic institution. You know, I want people to think in terms of the nameless bureaucratic institution and its incentives more than they think in terms of me.
Starting point is 01:57:59 Everyone wants a friendly face, but actually I think friendly faces can be misleading. Okay. Well, in this case, this will be a misleading interview, because this has been a lot of fun. You're like a blast to talk to. Indeed. Yeah, this was a blast. I'm super glad you came on the podcast and hope people enjoyed it. Thanks.
Starting point is 01:58:15 Thanks for having me. Hey, everybody. I hope you enjoyed that episode. As always, the most helpful thing you can do is to share the podcast. Send it to people you think might enjoy it, put it on Twitter, your group chats, etc. Just help spread the word. I appreciate you listening. I'll see you next time.
Starting point is 01:58:32 Cheers.
