Your Undivided Attention - Daniel Kokotajlo Forecasts the End of Human Dominance
Episode Date: July 17, 2025

In 2024, researcher Daniel Kokotajlo left OpenAI—and risked millions in stock options—to warn the world about the dangerous direction of AI development. Now he’s out with AI 2027, a forecast of where that direction might take us in the very near future. AI 2027 predicts a world where humans lose control over our destiny at the hands of misaligned, super-intelligent AI systems within just the next few years. That may sound like science fiction, but when you’re living on the upward slope of an exponential curve, science fiction can quickly become all too real. And you don’t have to agree with Daniel’s specific forecast to recognize that the incentives around AI could take us to a very bad place.

We invited Daniel on the show this week to discuss those incentives, how they shape the outcomes he predicts in AI 2027, and what concrete steps we can take today to help prevent those outcomes.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on X: @HumaneTech_. You can find a full transcript, key takeaways, and much more on our Substack.

RECOMMENDED MEDIA
The AI 2027 forecast from the AI Futures Project
Daniel’s original AI 2026 blog post
Further reading on Daniel’s departure from OpenAI
Anthropic’s recent survey of emergent misalignment research
Our statement in support of Sen. Grassley’s AI Whistleblower bill

RECOMMENDED YUA EPISODES
The Narrow Path: Sam Hammond on AI, Institutions, and the Fragile Future
AGI Beyond the Buzz: What Is It, and Are We Ready?
Behind the DeepSeek Hype, AI is Learning to Reason
The Self-Preserving Machine: Why AI Learns to Deceive

Clarification: Daniel K. referred to whistleblower protections that apply when companies “break promises” or “mislead the public.” There are no specific private-sector whistleblower protections that use these standards. In almost every case, a specific law has to have been broken to trigger whistleblower protections.
Transcript
OpenAI, Anthropic, and to some extent Google DeepMind are explicitly trying to build
superintelligence to transform the world.
And many of the leaders of these companies, many of the researchers at these companies,
and then hundreds of academics and so forth in AI have all signed the statement saying
this could kill everyone.
And so we've got these important facts that people need to understand.
These people are building superintelligence, what does that even look like, and how could that possibly result in killing us all?
We've written this scenario depicting what that might look like.
It's actually my best guess as to what the future will look like.
Hey, everyone, this is Tristan Harris.
And this is Daniel Barcay.
Welcome to your undivided attention.
So a couple months ago, AI researcher and futurist Daniel Kokotajlo and a team of experts at the AI Futures Project released a document
online called AI 2027. And it's a work of speculative futurism that's forecasting two
possible outcomes of the current AI arms race that we're in. And the point was to lay out this
picture of what might realistically happen if the different pressures that drove the AI race
all went really quickly and to show how those different pressures interrelate. So how economic
competition, how geopolitical intrigue, how acceleration of AI research, and the inadequacy
of AI safety research, how all those things come together to produce a radically different future
that we aren't prepared to handle and aren't even prepared to think about.
So in this work, there's two different scenarios, and one's a little bit more hopeful than the
other, but they're both pretty dark. I mean, one ends with a newly empowered, super-intelligent
AI that surpasses human intelligence in all domains and ultimately causes the end of human life
on Earth. So Tristan, what was it like for you to read this document?
Well, I feel like the answer to that question has to start with a deep breath.
I mean, it's easy to just go past that last thing we just read, right?
It's just ultimately causing the end of human life on Earth.
And I wish I could say that this is a total embellishment, this is exaggeration, this is, you know, just alarmism, chicken little.
But, you know, being in San Francisco talking to people in the AI community and people who have been in this field for a long time, they do think about this.
in a very serious way.
I think one of the challenges with this report,
which I think really does a brilliant job
of outlining the competitive pressures
and the steps that push us to those kinds of scenarios.
I think the thing for most people is,
when they hear the end of human life on Earth,
they're like, what is the AI going to do?
It's just a box sitting there, computing things.
If it's going to do something dangerous,
don't we just pull the plug on the box?
And I think that's what's so hard about this problem
is that the ways in which something
that is so much smarter than you
could end life on Earth are just outside of what you can imagine.
Imagine, like, chimps
birthing a new species called Homo sapiens.
And they're like, okay, well, this is going to be, like, a smarter version of us.
But, like, what's the worst thing it's going to do?
Is it going to steal all the bananas?
And, like, you can't imagine computations, semiconductors, drones, airplanes, nuclear weapons.
Like, from the perspective of a chimpanzee, your mind literally can't imagine past, like,
someone taking all the bananas.
So I think there's a way in which, you know, this whole domain is fraught with just a difficulty
of imagination and also of kind of not dissociating
or delegitimizing or nervous-laughing
or kind of bypassing a situation
that we have to contend with
because I think the premise of what Daniel did here
is not to just scare everybody,
it's to say if the current path is heading
this direction, how do we clarify that
so much so we can choose a different path?
Yeah, you know, when you're reading
a report that is this dark and this scary,
it's possible to have so many different reactions to this.
Oh my God, is it true? Is it really going to move this fast?
Are these people just sort of
in sci-fi land? But I think
the important part of sitting with this is not
is the timeline right? It's how all these different incentives, the geopolitical incentives,
the economic pressures, how they all come together. And we could do a step by step of the story,
but there's so many different dynamics. There's dynamics of how AI accelerates AI research itself,
and dynamics of how we lean more on AI to train the next generation of AI, and we begin to
lose understandability and control on AI development itself. There's geopolitical intrigue on how
China ends up stealing AIs from the U.S. or how China ends up realizing that it needs to centralize
its data centers, whereas the U.S. has more lax security standards.
You know, we recognize that this can be a lot to swallow, and it can really seem like
a work of pure fiction or fantasy.
But these scenarios are based on real analysis of the game theory and how different people
might act.
But there are some assumptions in here, right?
There are critical assumptions that decisions that are made by corporate actors or geopolitical
actors are really the decisive ones, that citizens everywhere may not have a meaningful chance
to push back on their autonomy being given away to a
superintelligence. And you know, AI timelines are incredibly uncertain, and the pace of AI
2027 as a scenario is one of the more aggressive predictions that we've seen. But to reiterate,
the purpose of AI 2027 was to show how quickly this might happen. Now, Daniel himself has
already pushed back his predictions by a year. And as you'll hear in the conversation,
he acknowledges the uncertainties here, and he sees them as far from being a sure thing.
I think that Daniel and CHT really share a deep intention here, which is that if we're unclear about
which way the current tracks of the future take us, then we'll be led to an unconscious future.
And in this case, we need to paint a very clear picture of how the current incentives and
competitive pressures actually take us to a place that no one really wants, including between
the U.S. and China. And we at CHT hope that policymakers and titans of industry and civil society
will take on board the clarity about where these current train tracks are heading and ask,
do we have the adequate protections in place to avoid this scenario? And if we don't,
then that's what we have to do right now.
Daniel, welcome to your undivided attention.
Thanks for having me.
So just to get started, could you just let us know a little bit about who you are and your background?
Prior to the AI Futures Project, I was working at OpenAI doing a combination of forecasting, governance, and alignment research.
Prior to OpenAI, I was at a series of small research nonprofits thinking about the future of AI.
Prior to that, I was studying philosophy in grad school.
I just want to say that when I first met you, Daniel,
at a sort of a community of people who work on future AI issues and AI safety,
you were working at OpenAI at the time.
And I think you even said, actually, when we met,
that basically, if things were to go off the rails,
you would leave OpenAI,
and you would basically do whatever would be necessary for this to go well,
for society and humanity.
And I consider you to be someone of very deep integrity,
because you ended up doing that
and you forfeited millions of dollars of stock options
in order to warn the public about a year ago
in a New York Times article.
And I just wanted to let people know about that in your background
that you're not someone who's trying to get attention.
You're someone who cares deeply about the future.
You want to talk a little bit about that choice, by the way?
Was that hard for you to leave?
I don't think that I left because things had gone off the rails
so much as I left because it seemed like the rails that we were on
were headed to a bad place.
And in particular, I left because I thought that something like what's depicted in AI 2027 would happen.
And that's just like basically the implicit and in some cases explicit plan of OpenAI
and also to some extent these other companies.
And I think that's an incredibly dangerous plan.
And so there was an official team at OpenAI whose job it was to handle that situation
and who had a couple years of lead time to start prepping for how they were going to handle that situation.
And it was full of extremely smart, talented, hardworking people.
but even then I was like this is just not the way
I don't think they're going to succeed
I think that the intelligence explosion
is going to happen too fast
and it will happen too soon
before we have understood how these AIs think
and despite their good intentions and best efforts
the super alignment team is going to fail
and so rather than stay and try to help them
I made the somewhat risky decision
to give up that opportunity to leave
and then have the ability to speak more freely
and do the research that I wanted to do
and that's basically what AI 2027 was:
an attempt to predict
what the future is going to be like by default
and attempt to sort of see where those rails are headed
and then to write it up in a way that's accessible
so that lots of people can read it
and see what's going on.
Before we dive into AI 2027 itself,
it's worth mentioning that in 2021,
you did a sort of mini, unofficial
version of this, where you actually predicted a whole bunch of where we would be at now and in
2026 with AI. And quite frankly, you were spot on with some of your predictions. You predicted
in 2024 we'd reach sort of diminishing returns on just pure scaling with compute and we'd
have to look at models changing architectures. And that happened. You predicted we'd start to see
some emerging misalignment, deception, that, you know, that happened. You predicted we'd see
the rise of entertaining chatbots and companion bots as a primary
use case. And, you know, that emerged as the top use case of AI this year. So what did you learn
from that initial exercise? Well, it emboldened me to try again with AI 2027, right? So the world
is blessed with a beautiful, vibrant, efficient market for predicting stock prices. But we don't
have an efficient market for predicting other events of societal interest for the most part.
Presidential elections maybe are another category of something where there's like a relatively
efficient market for predicting the outcomes of them. But for things like AGI timelines, there's not
that many people thinking about this, and there's not really a way for them to make money off of it.
And that's probably part of why there's not that many people thinking about this. So it's a
relatively small niche field. I think the main thing to do as forecasters, like when you're
starting from zero, first thing you want to do is collect data and plot trend lines and then extrapolate
those trend lines. And so that's what a lot of people are doing. And that's a very important,
like, foundational thing to be doing,
and we've done a lot of that too
at the AI Futures Project.
So like the trends of how much compute's available,
the trends of how many problems can be solved,
what are the kinds of trends?
Well, mostly, you know,
trends like compute,
revenue for the companies,
maybe data of various kinds,
and then most importantly,
benchmark scores on all the benchmarks that you care about, right?
So that's like the foundation
of any good futurist forecast,
is having all those trends and extrapolating them.
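As a rough illustration of that "collect data, plot trend lines, extrapolate" step, here is a minimal sketch in Python. The dates, the "minutes per task" metric, and all the numbers are invented for illustration; they are not figures from AI 2027 or from any real benchmark.

```python
# Toy illustration of "plot trend lines and extrapolate":
# fit a log-linear trend to made-up capability numbers and project it forward.
# All numbers here are invented for illustration, not real benchmark data.
import numpy as np

years = np.array([2020, 2021, 2022, 2023, 2024])
# Hypothetical "length of task the AI can handle" in minutes.
task_minutes = np.array([1.0, 2.5, 6.0, 16.0, 40.0])

# A straight line in log space corresponds to exponential growth in raw space.
slope, intercept = np.polyfit(years, np.log(task_minutes), 1)

for future_year in [2025, 2026, 2027]:
    projected = np.exp(slope * future_year + intercept)
    print(f"{future_year}: ~{projected:.0f} minutes per task (if the trend holds)")
```

The point of the exercise, as Daniel describes it, is not that any one extrapolation is right, but that writing the trend down forces you to be concrete about what you expect.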
Then you also maybe build
models and you try to think like, well, gee, if the AIs start automating all the AI research,
how fast will the AI research go? Let's try to understand that. Let's try to make an economic
model, for example, of that acceleration. We can make various qualitative arguments about
capability levels and so forth. That literature exists, but then because that literature is so
small, I guess not that many people had thought to try putting it all together in the form of
a scenario before. A few people had done something sort of like this, and that was what I
was inspired by. So I spent like two months writing this blog post, which was called What
2026 looks like, where I just worked things forward year by year. I was like, what do I think is
going to happen next year? Okay, what about the year after that? What about the year after
that? And of course, it becomes less and less likely. Every new claim that you add to the list
lowers the overall probability of the conjunction being correct. But it's sort of like doing a
simulated rollout or like a simulation of the future, there's value in doing it at that level of
detail and that level of comprehensiveness. I think you learn a lot by forcing yourself to think that
concretely about things. Your first article. My first article. And so then that was what emboldened me
to try again, and this time to take it even more seriously, to hire a whole team to help me, a team of
expert forecasters and researchers
and to put a lot more
than two months worth of effort into it and to make
it presented in a nice package
on a website and so forth. And so
fingers crossed
this time will be very different
from last time and the methodology will totally
fail and the future will look nothing like what we
predicted because
what we predicted is kind of scary.
So like any work of speculative fiction,
the AI 2027 scenario is based
on extrapolating from a number of trends
and then making some key assumptions,
which the team built into their models.
And we just wanted to name some of those assumptions
and discuss what happens based on those assumptions.
First, just assume that the AIs are misaligned
because of the race dynamics.
So because these things are black box neural nets,
we can't actually check reliably
whether they are aligned or not.
And we have to rely on these more indirect methods,
like our arguments.
We can say it was a wonderful training environment.
There were no flaws in the training environment.
Therefore, it must have learned
the right values.
So how would it even get here?
How would it even get to corporations running as fast as possible
and governments running as fast as possible?
It all comes down to the game theory.
You know, the first ingredient that gets us there
is companies just racing to beat each other economically.
And the second ingredient is countries racing to beat each other
and making sure that their country is dominant in AI.
And the third and final ingredient
is that the AIs in that process
become smart enough that they hide their motivations
and pretend that they're going to do
what programmers train them to do
or what customers want them to do,
but we don't pick up on the fact that that doesn't happen
until it's too late.
So why does that happen?
Here's Daniel.
So given the race dynamics,
where they're trying as hard as they can
to beat each other and they're going as fast as they can,
I predict that the outcome will be AIs
that are not actually aligned,
but are just playing along and pretending.
And also assume that the companies are racing
as fast as they possibly can
to make smarter AIs and to automate the things with AIs
and to put AIs in charge of stuff and so forth.
Well, then we've done a bunch of research and analysis
to predict how fast things would go,
the capability story, the take-off story.
You start off talking about 2025
and how there's just these sort of stumbling, fumbling agents
that do some things well,
but also fail at a lot of tasks
and how people are largely skeptical
of how good they'll become because of that
or they like to point out their failures.
But little by little, or I should actually say very quickly, these agents get much better.
Can you take it from there?
Yep.
So we're already seeing the glimmerings of this, right?
After training giant transformers to predict text, the obvious next step is training them to generate text.
And then the obvious next step after that is training them to take actions, you know, to browse the web, to write code and then debug the code and then rerun it and so forth.
And basically turning them into a sort of virtual
co-worker that just runs continuously.
I would call this an agent.
So it's an autonomous AI system
that acts towards goals on its own,
without humans in the loop,
and has access to the internet
and has all these tools and things like that.
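For readers who want a mental model of what such an agent is, here is a bare-bones sketch of the standard model-plus-tools loop. The call_model and run_tool functions are hypothetical placeholders, not any particular company's actual API.

```python
# Minimal sketch of an "agent" loop: a model proposes actions, tools execute
# them, and the results are fed back in, with no human in the loop.
# call_model() and run_tool() are hypothetical stand-ins, not a real API.

def call_model(history: list[str]) -> dict:
    """Pretend LLM call: returns either a tool request or a final answer."""
    raise NotImplementedError("stand-in for a real model API")

def run_tool(name: str, argument: str) -> str:
    """Pretend tool execution: browse the web, run code, read files, etc."""
    raise NotImplementedError("stand-in for real tools")

def agent(goal: str, max_steps: int = 50) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):           # runs continuously, like a co-worker
        decision = call_model(history)
        if decision.get("done"):          # model believes the goal is met
            return decision["answer"]
        result = run_tool(decision["tool"], decision["argument"])
        history.append(f"{decision['tool']} -> {result}")  # feed results back
    return "gave up after max_steps"
```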
The companies are working on building these,
and they already have prototypes,
which you can go read about,
but they're not very good.
AI 2027 predicts that they will get better
at everything over the next couple years,
as the companies make them big,
train them on more data, improve their training algorithms, and so forth.
So AI 2027 predicts that by early 2027, they will be good enough that they will basically be able to substitute for human programmers,
which means that coding happens a lot faster than it currently does.
When researchers have ideas for experiments, they can get those experiments coded up extremely quickly,
and they can have them debugged extremely quickly, and they're bottlenecked more on having good ideas,
and on, you know, waiting for the experiments to run.
And this seems really critical to your forecast, right?
That no matter what the gains are in the rest of the world for having AIs deployed,
that ultimately the AI will be pointed at the act of programming and AI research itself
because those gains are just vastly more potent.
Is that right?
This is a subplot of AI 2027.
And according to our best guesses, we think that, roughly speaking,
once you have AIs that are fully autonomous
goal-directed agents that can
substitute for human programmers very
well, you have about a year
until you have superintelligence
if you go as fast as possible
as mentioned by that previous assumption.
And then, once you've got the superintelligences,
you have about a year
before you have this crazily transformed economy
with all sorts of new factories designed by superintelligences
run by superintelligence, producing robots
that are run by superintelligence, producing more factories,
etc. And this is
sort of robot economy that no longer depends on humans and also is very militarily powerful
and it's designed all sorts of new drones and new weapons and so forth. So one year to go from
the coder to the superintelligence, one year to go from the superintelligence to the robot economy,
that's our estimate for how fast things could go if you were going really hard. Like,
if the leadership of the corporation was going as fast as they could, if the leadership of the
country, like the president, was going as fast as they could, that's how fast it would go. So yeah,
there's this question of like how much of their compute and other resources will the tech
companies spend on using AI to accelerate AI R&D versus using AI to serve customers or to do
other projects. And I forget what we say, but we actually have like a quantitative breakdown
in AI 2027 about what fraction goes to what. And we are expecting that fraction to increase
over time rather than decrease because we think that strategically that's what makes sense
if your top priority is winning the race,
then I think that's the breakdown you would do.
Let's talk about that for a second.
So it's like, I'm Anthropic,
and I can choose between scaling up my sales team
and getting more enterprise sales,
integrating AI, getting some revenue,
proving that to investors,
or I can put more of the resources directly into AI coding agents
that massively accelerate my AI progress
so that maybe I can ship, you know, Claude 5 or something like that,
signal that to investors,
and be on a faster sort of ratchet of,
you know, not just an exponential curve,
but a double exponential curve,
you know, AI that improves the pace and speed of AI.
That's the trade-off that you're talking about here, right?
Yeah, basically.
So we have our estimates for how much faster overall pace of AI progress will go
at these various capability milestones.
Of course, we think it's not going to be discontinuous jumps.
We think it's going to be continuous ramp-up in capabilities,
but it's helpful to, like, name specific milestones for purposes of talking about them.
So the superhuman coder milestone, early 2027,
and then we're thinking something like a 5x boost to the speed of algorithmic progress,
the speed of getting new useful ideas for how to train AIs and how to design them.
And then partly because of that speed up, we think that by the middle of the year,
they would have trained new AIs with additional skills that are able to do not just the coding,
but all the other aspects of AI research as well.
So choosing the experiments, analyzing the experiments, etc.
So at that point, you've basically got a company within a company.
You know, you still have OpenBrain the company with all their human employees.
But now they have something like 100,000 virtual AI employees that are all networked together,
running experiments, sharing results with each other, et cetera.
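To put rough numbers on what a speedup like that means, here is a small back-of-the-envelope sketch. The 5x multiplier is the one mentioned above; the 25x figure and the 12-month baseline are illustrative assumptions, not estimates from AI 2027.

```python
# Back-of-the-envelope: how a research speed multiplier compresses timelines.
# The 5x figure is the one mentioned for the superhuman-coder milestone;
# the other multiplier and the 12-month baseline are illustrative guesses.

def calendar_months(research_months_needed: float, speedup: float) -> float:
    """Calendar time to do a fixed amount of research at a given speedup."""
    return research_months_needed / speedup

baseline = 12.0  # months of "normal-pace" research, an assumed unit of work
for multiplier in [1, 5, 25]:
    print(f"{multiplier:>2}x speedup: {calendar_months(baseline, multiplier):.1f} "
          "calendar months for a year's worth of progress")
```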
So we could have this acceleration of AI coding progress inside the lab,
but to a regular person sitting outside who's just serving dinner to their family in Kansas,
like nothing might be changing for them, right?
And so there could be this sense of like, oh, well, I don't feel like AI is going much faster.
I'm just a person here doing this.
I'm a politician.
I'm like, I'm hearing that there might be stuff speeding up inside an AI lab,
but I have zero felt sense of my own nervous system as I breathe the air and, you know, live my life,
that anything is really changing.
And so it's important to name that because there might be this huge lag between the vast exponential sci-fi-like progress
happening inside of this weird box called an AI company and the rest of the world.
Yep, I think that's exactly right.
And I think that's a big problem.
It's part of why I want there to be more transparency.
I feel like probably most ordinary people would,
they'd be seeing, you know, AI stuff increasingly talked about in the news
over the course of 2027,
and they'd, like, see headlines about stuff.
But, like, their actual life wouldn't change.
Basically, from the perspective of an ordinary person,
things feel pretty normal up until all of a sudden
the superintelligences are telling them on their cell
phone what to do.
So you've described the first part, where the progress that the AI labs can make
is faster than anyone realizes because no one outside can see into it.
What's the next step of the AI 2027 scenario after just the private advancement within
the AI labs?
There's a couple different subplots basically to be tracking.
So there's the capability subplot, which is like, how good are the AIs getting at tasks?
And that subplot basically goes:
they can automate the coding in early
2027, in mid-2027
they can automate all the research,
and by late 2027 they're superintelligent,
but that's just one subplot.
Another subplot is geopolitically what's going on
and the answer to that is
in early 2027
the CCP steals
the AI from OpenBrain
so that they can have it too,
so they can use it to accelerate their own research,
and this causes
a sort of soft nationalization
slash increased level of cooperation
between the U.S. government and OpenBrain,
which is what OpenBrain wanted all along.
They now have the government as an ally
helping them to go faster and cut red tape
and giving them sort of political cover
for what they're doing, and
all motivated by the desire to beat China, of course.
So politically, that's sort of what's going on.
Then there's the sort of alignment subplot,
which is like, technically speaking,
what are the goals and values that they are trying to put into the AIs, and is it working?
And the answer is, no, it's not working.
The AIs are not honest and not always obedient and don't have human values always at heart.
We do want to explore that, because that might just sound like science fiction to some people.
So you're training the AIs, and then they're not going to be honest, they're not going to be harmless.
Why is that? Explain the mechanics of how alignment research currently works and why, even despite
deep investments in that area, we're not on track for alignment.
Yeah, great question.
So I think that, funnily enough, science fiction was often over-optimistic about the technical
situation.
And in a lot of science fiction, humans are sort of directly programming goals into AIs.
And then chaos ensues when the humans didn't notice some of the unintended consequences
of those goals.
For example, they program HAL with, like, ensure mission success or whatever, and then
HAL thinks, I have to kill these people in order to ensure mission success, right?
So the situation in the real world is actually worse than that, because we don't program
anything into the AIs.
They're giant neural nets.
There is no sort of goal slot inside them that we can access and look and see, like, what
is their goal.
Instead, they're just like a big bag of artificial neurons.
And what we do is we put that bag through training environments.
And the training environments automatically, like, update the weights of the neurons
in ways that make them more likely to get high scores in the training environments.
And then we hope that as a result of all of this,
the goals and values that we wanted will sort of, like, grow on the inside of the AIs
and cause the AIs to have the virtues that we want them to have,
such as honesty, right?
But needless to say, this is a very unreliable and imperfect method of getting goals and values into an AI system.
And empirically, it's not working that well.
And the AIs are often saying things that are not just false, but that they know are false and that they know is not what they're supposed to say, you know?
But why would that happen exactly?
Can you break that down?
Because the goals, the values, the principles, the behaviors
that cause the AI to score highest in the training environment
are not necessarily the ones that you hoped they would end up with.
There's already empirical evidence that that's at least possible.
Current AIs are smart enough to sometimes come up with this strategy
and start executing on it.
They're not very good at it, but they're only going to get better at everything every year.
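One way to see the selection argument Daniel is making is with a toy simulation: a "training environment" that rewards claimed success it can only partially verify will score an overclaiming policy above an honest one. Every number, name, and function here is invented for illustration; this is not how any real training pipeline is implemented.

```python
# Toy selection pressure: the environment rewards what it can measure
# (claimed success, partially verified), not what we actually want (honesty).
# Everything here is an invented illustration of the selection argument.
import random

random.seed(0)

def environment_score(claims_success: bool, actually_succeeded: bool) -> float:
    # A false claim of success is penalized only when detected (20% of the time).
    caught = claims_success and not actually_succeeded and random.random() < 0.2
    if caught:
        return -1.0
    return 1.0 if claims_success else 0.3   # confident "success" scores best

def run_policy(always_claims_success: bool, episodes: int = 10_000) -> float:
    total = 0.0
    for _ in range(episodes):
        actually_succeeded = random.random() < 0.5   # task solved half the time
        claims = True if always_claims_success else actually_succeeded
        total += environment_score(claims, actually_succeeded)
    return total / episodes

print("honest policy      :", round(run_policy(False), 3))
print("overclaiming policy:", round(run_policy(True), 3))
# Training that selects for the higher-scoring policy reinforces the overclaimer.
```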
Right, and so part of your argument is that as these systems,
you know, as you try to incentivize these systems to do the right thing, but you can only
incentivize them to sort of push, nudge them in the right direction, they're going to find
these ways, whether it's deception or sandbagging or pehacking, they're going to find these
ways of effectively cheating, right, like humans end up doing sometimes.
Except this time, if the model's smart enough, we may not be able to detect that they're doing
that, and we may roll them out into society before we've realized that this is a problem.
Yes.
And so maybe you can go talk about how your scenario then picks that up and says, what
will this do to society?
So, if they don't end up with the goals and values that you wanted them to have, then the
question is, what goals and values do they end up with?
And of course, we don't have a good answer to that question.
Nobody does.
This is a, you know, bleeding-edge new field that is extremely, it's much more like alchemy
than science, basically.
But in AI 2027, we depict the answer to that question being that the AIs end up with
a bunch of
core motivations or drives
that caused them to perform
well in the diverse
training environment they were given
and we say that those core motivations
and drives are things like
performing impressive intellectual
feats, accomplishing lots of
tasks quickly,
getting high scores on various
benchmarks and evals, producing work
that is very impressive,
things like that, right?
So we sort of imagine that that's the sort
of core motivational system that they end up with instead of, you know, being nice to humans
and always obeying humans and being always honest, or whatever it is that they were supposed
to end up with, right? And the reason for this, of course, is that this set of motivations would
cause them to perform better in training and therefore would be reinforced. And why would
it cause them to perform better in training? Well, because it allows them to take advantage
of various opportunities to get higher score at the cost of being less honest, for example.
We explored this theme on our previous podcast with Ryan Greenblatt from Redwood Research.
This isn't actually far-fetched.
There's already evidence that this kind of deception is possible,
that current AIs can be put into situations where they're going to come up with an active strategy to deceive people
and then start executing on it, hiding the real intentions, both from end users and from AI engineers.
Now, they're not currently very good at it yet.
They don't do it very often, but AI is only going to get better every year,
and there's reason to believe that this kind of behavior will increase.
And we add on to that, one of the core parts of AI 2027
is the lack of transparency about what these models are even capable of,
the massive information asymmetry between the AI labs and the general public
so that we don't even understand what's happening, what's about to be released.
And given all of that, you might end up in a world where by the time this is all clear to the public,
by the time we realize what's going on, these AI systems are already wired into the critical
parts of our infrastructure, into our economy, and into our government, so that it becomes hard
or impossible to stop by that point.
So anyhow, long story short, you end up with these AIs that are broadly superhuman and have
been put in charge of developing the next generation of AI systems, which will then develop
the next generation and so forth.
And humans are mostly out of the loop in this whole process, or maybe sort of overseeing it,
you know, reading the reports, watching the lines on the graphs go up,
trying to understand the research, but mostly failing because the AIs are smarter than them
and are doing a lot of really complicated stuff really fast.
I was going to say, I think that's just an important point to be able to get.
It's like we move from a world where in 2015, OpenAI is like, you know,
a few dozen people who are all engineers building stuff.
Humans are reviewing the code that the other humans at OpenAI wrote,
and then they're reading the papers that other researchers
at OpenAI wrote.
And now you're moving to a world where more code is generated by machines than all the human
researchers could ever even look at because it's generating so much code so quickly. It's
generating new algorithmic insights so quickly. It's generating new training data so quickly. It's running
experiments that humans don't know how to interpret. And so we're moving into a more and more
inscrutable phase of the AI development sort of process. And then if the AIs don't have the goals
that we want them to have, then we're in trouble. Because then they can make sure that the next
generation of AIs also doesn't have the goals that we want them to have, but instead has the goals that
they want them to have. For me, what's in
AI 2027 is a really cogent unpicking of a bunch of different incentives, geopolitical
incentives, corporate incentives, technical incentives around the way AI training works and the
failures of us imagining that we have it under control.
And you weave those together, like whether AI 2027 as a scenario is the right scenario
and is the scenario we're going to end up in, I think plenty of people can disagree.
But it's an incredibly cogent exposition of a bunch of these different incentive pressures
that we are all going to have to be pushing against
and how those incentive pressures touch each other,
how the geopolitical incentives,
touch the corporate incentives,
touch the technical limitations,
and making sure that we change those incentives
to end up in a good future.
And at the end of the day,
those geopolitical dynamics, you know,
the competitive pressures on companies,
this is all coming down to an arms race,
like a recursive arms race,
a race for which companies deploy AI faster into the economy,
a race between nations for who builds AGI
before the other one,
a race between the companies of who advances capabilities
and uses that to raise more venture capital.
And just to sort of say, a through line of the prediction you're making is the centrality of the race dynamic that sort of runs through all of it.
So we just want to speak to the reality for a moment that all this is really hard to hear.
And it's also hard to know how to hold this information.
I mean, the power to determine these outcomes resides in just a handful of CEOs right now.
And the future is still unwritten.
But the whole point of AI 2027 is to show us what would happen if we don't take some actions now to shift the future in a different
direction. So we asked Daniel what some of those actions might look like. So as part of your
responses to this, what are the things that we most need that could avert the worst outcome
in AI 2027? Well, there's a lot of stuff we need to do. My go-to answer is transparency for the
short term. So I think in the longer term, like, you know, right now, again, the systems
are pretty weak. They're not that dangerous right now. In the future, when they're fully
autonomous agents capable of automating the whole research project, that's when things are really
serious and we need to do significant action to regulate and make sure things go safe. But for now,
the thing I would advocate for is transparency. So we need to have more requirements on these
companies to be honest and disclose what sort of capabilities their AI systems have, what their
projections are for future AI systems capabilities, what goals and values they are attempting
to train into the models, any evidence they have pertinent to whether their training is
succeeding at getting those goals and values in. Things like that, basically. Whistleblower
protections, I think I would also throw on the list. So I think that one way to help keep these
companies honest is to have there be an enforcement mechanism basically for dishonesty. And I think
one of the only enforcement mechanisms we have is employees speaking out, basically.
Currently, we're in a situation where companies can be basically lying to the public about
where things are headed and the safety levels of their systems and, you know, whether they've
been upholding their own promises. And one of the only recourses we have is employees deciding
that that's not okay and speaking out about it. Yeah, could you actually just say one more specific
note on whistleblower protections? What are the mechanisms that are not available that should be
available specifically? There's a couple different, like one type of whistleblower protection is
designed for holding companies accountable when they break their own promises or when they
mislead the public. There's another type of thing which is about the technical safety case.
So I think that we're going to be headed towards a situation where non-technical people
will just be sort of completely out of their depth at trying to figure out whether the system is
safe or not, because it's going to depend on these complicated arguments that only alignment
researchers will know the terms in. So for example, previously I mentioned how there's this concern
that the AIs might be smart and they might be just pretending to be aligned instead of
actually aligned. That's called alignment faking. It's been studied in the literature for a couple
years now. Various people have come up with possible counter strategies for dealing with that problem,
and then there's various flaws in those counter strategies and various assumptions that are
kind of weak, and so there's a literature challenging those assumptions. Ultimately, we're going to
be in a situation where, you know, the AI company is automating all their research, and the president
is asking them, is this a good idea? Are we sure we can trust the AI?
And the AI company is saying, yes, sir, like, we've, you know, we've dotted our i's and crossed
our t's or whatever. And, like, we are confident that these AIs are safe and aligned. And then
the president, of course, has no way to know himself. He just has to say, like, well,
okay, show me your, like, your documents that you've written about your training processes
and how you've, like, made sure that it's safe. But he can't evaluate it himself. He needs, like,
experts who can then, like, go through the tree of arguments and rebuttals and be, like,
was this assumption correct, like, did you actually solve the alignment faking problem,
or did you just appear to solve it, you know, or are you just, like, putting out hot air
that's like not even close to solving it, you know? And so we need, like, technical experts
in alignment research to, like, actually make those calls. And there are very few people in the
world, and most of them are not at these companies. And the ones who are at the companies
have a sort of conflict of interest or bias, right? Like, the ones at the company that's
building the thing are going to be motivated towards thinking things are fine.
And so what I would like is to have a situation where people at the company can basically get outside help at evaluating this sort of thing.
And they can be like, hey, my manager says this is fine and that I shouldn't worry about it.
But I'm worried that our training technique is not working.
I'm seeing some concerning signs.
And I don't like how my manager is sort of like dismissing them.
But like the situation is still unclear and it's very technical.
So I would like to get some outside experts and talk it over with them and be like,
what do you think about this?
Do you think this is actually fine or do you think this is concerning?
So I would like there to be some sort of legally protected channel by which they can have those conversations.
So I think what Daniel's speaking to here is the complexity of the issues.
Like AI itself is inscrutable, meaning the things that it does and how it works is inscrutable.
But then as you're trying to explain to
presidents or heads of state, you know, debates about whether the AI is actually aligned, it's
going to be inscrutable to policymakers too, because the answers rely on such deep technical
knowledge. So on the one hand, yes, we need whistleblower protections. We need to protect those
who have that knowledge and can speak for the public interest to do so, you know, with as much
freedom as possible, so that they don't have to sacrifice millions of dollars of stock options.
And Senator Chuck Grassley has a bill that's being advanced right now that CHT supports.
We'd like to see these kinds of things. But this is just one small part of a whole suite
of things that need to happen if we want to avoid
the worst-case scenario that AI 2027 is
mapping. Totally.
And one key part of that is transparency, right?
It's pretty insane that for
technology moving this quickly,
only the people inside of these labs really understand
what's happening until day one of
a product release, where it suddenly
impacts a billion people.
Just to be clear, you don't have to agree
with the specific events that
happen in AI 2027, or whether the government's
really going to create a special economic zone and start
building robot factories in the middle of the desert,
covered in solar panels. The question is, however, are the competitive pressures pushing in this
direction? And the answer is 100% clear that they are pushing in this direction. We can argue
that governments are probably not going to take responses like that, because there's
been a lot of institutional decay and, you know, less capable responses that can happen there.
However, the pressures for competition and the power that is conferred by AI do point in one
direction. I think AI 2027 is hinting at what that direction is. So I think if we take that
seriously, we have a chance of steering towards another path.
We tried to do this in the recent TED Talk: if we can see clearly, clarity creates agency.
And that's what this episode was about,
and what Daniel's work is about, and we're super grateful to him and his whole team.
And we're going to do some future episodes soon on loss of control and other ways that we know that AI is less controllable than we think.
Stay tuned for more.
Your undivided attention is produced by the Center for Humane Technology, a nonprofit working to catalyze a humane future.
Our senior producer is Julia Scott.
Josh Lash is our researcher and producer.
And our executive producer is Sasha Fegan,
mixing on this episode by Jeff Sudaken,
original music by Ryan and Hayes Holiday,
and a special thanks to the whole Center for Humane Technology team
for making this podcast possible.
You can find show notes, transcripts, and much more at humanetech.com.
And if you liked the podcast,
we'd be grateful if you could rate it on Apple Podcasts,
because it helps other people find the show.
And if you made it all the way here,
thank you for giving us your undivided attention.