Your Undivided Attention - Daniel Kokotajlo Forecasts the End of Human Dominance
Episode Date: July 17, 2025

In 2024, researcher Daniel Kokotajlo left OpenAI—and risked millions in stock options—to warn the world about the dangerous direction of AI development. Now he’s out with AI 2027, a forecast of where that direction might take us in the very near future. AI 2027 predicts a world where humans lose control over our destiny at the hands of misaligned, super-intelligent AI systems within just the next few years. That may sound like science fiction, but when you’re living on the upward slope of an exponential curve, science fiction can quickly become all too real. And you don’t have to agree with Daniel’s specific forecast to recognize that the incentives around AI could take us to a very bad place.

We invited Daniel on the show this week to discuss those incentives, how they shape the outcomes he predicts in AI 2027, and what concrete steps we can take today to help prevent those outcomes.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on X: @HumaneTech_. You can find a full transcript, key takeaways, and much more on our Substack.

RECOMMENDED MEDIA
The AI 2027 forecast from the AI Futures Project
Daniel’s original AI 2026 blog post
Further reading on Daniel’s departure from OpenAI
Anthropic’s recent survey of emergent misalignment research
Our statement in support of Sen. Grassley’s AI Whistleblower bill

RECOMMENDED YUA EPISODES
The Narrow Path: Sam Hammond on AI, Institutions, and the Fragile Future
AGI Beyond the Buzz: What Is It, and Are We Ready?
Behind the DeepSeek Hype, AI is Learning to Reason
The Self-Preserving Machine: Why AI Learns to Deceive

Clarification: Daniel K. referred to whistleblower protections that apply when companies “break promises” or “mislead the public.” There are no specific private-sector whistleblower protections that use these standards. In almost every case, a specific law has to have been broken to trigger whistleblower protections.
Transcript
OpenAI, Anthropic, and to some extent Google DeepMind are explicitly trying to build
superintelligence to transform the world.
And many of the leaders of these companies, many of the researchers at these companies,
and then hundreds of academics and so forth in AI have all signed the statement saying
this could kill everyone.
And so we've got these important facts that people need to understand.
These people are building superintelligence, what does that even look like, and how could that possibly result in killing us all?
We've written this scenario depicting what that might look like.
It's actually my best guess as to what the future will look like.
Hey, everyone, this is Tristan Harris.
And this is Daniel Barcay.
Welcome to your undivided attention.
So a couple months ago, AI researcher and futurist Daniel Kokotajlo and a team of experts at the AI Futures Project released a document
online called AI 2027. And it's a work of speculative futurism that's forecasting two
possible outcomes of the current AI arms race that we're in. And the point was to lay out this
picture of what might realistically happen if the different pressures that drove the AI race
all went really quickly and to show how those different pressures interrelate. So how economic
competition, how geopolitical intrigue, how acceleration of AI research, and the inadequacy
of AI safety research, how all those things come together to produce a radically different future
that we aren't prepared to handle and aren't even prepared to think about.
So in this work, there's two different scenarios, and one's a little bit more hopeful than the
other, but they're both pretty dark. I mean, one ends with a newly empowered, super-intelligent
AI that surpasses human intelligence in all domains and ultimately causes the end of human life
on Earth. So Tristan, what was it like for you to read this document?
Well, I feel like the answer to that question has to start with a deep breath.
I mean, it's easy to just go past that last thing we just read, right?
It's just ultimately causing the end of human life on Earth.
And I wish I could say that this is a total embellishment, this is exaggeration, this is, you know, just alarmism, chicken little.
But, you know, being in San Francisco talking to people in the AI community and people who have been in this field for a long time, they do think about this.
in a very serious way.
I think one of the challenges with this report,
which I think really does a brilliant job
of outlining the competitive pressures
and the steps that push us to those kinds of scenarios.
I think the thing for most people is,
when they hear the end of human life on Earth,
they're like, what is the AI going to do?
It's just a box sitting there, computing things.
If it's going to do something dangerous,
don't we just pull the plug on the box?
And I think that's what's so hard about this problem
is that the ways in which something
that is so much smarter than you
could end life on Earth are just outside of what you can imagine.
Imagine, like, chimps
birthing a new species called Homo sapiens.
And they're like, okay, well, this is going to be, like, a smarter version of us.
But, like, what's the worst thing it's going to do?
Is it going to steal all the bananas?
And, like, you can't imagine computations, semiconductors, drones, airplanes, nuclear weapons.
Like, from the perspective of a chimpanzee, your mind literally can't imagine past, like,
someone taking all the bananas.
So I think there's a way in which, you know, this whole domain is fraught with just a difficulty
of imagination and also of kind of not dissociating
or delegitimizing or nervous-laughing
or kind of bypassing a situation
that we have to contend with
because I think the premise of what Daniel did here
is not to just scare everybody,
it's to say if the current path is heading
this direction, how do we clarify that
so much so we can choose a different path?
Yeah, you know, when you're reading
a report that is this dark and this scary,
it's possible to have so many different reactions to this.
Oh my God, is it true? Is it really going to move this fast?
Are these people just sort of
in sci-fi land? But I think
the important part of sitting with this is not
is the timeline right? It's how all these different incentives, the geopolitical incentives,
the economic pressures, how they all come together. And we could do a step by step of the story,
but there's so many different dynamics. There's dynamics of how AI accelerates AI research itself,
and dynamics of how we lean more on AI to train the next generation of AI, and we begin to
lose understandability and control on AI development itself. There's geopolitical intrigue on how
China ends up stealing AIs from the U.S. or how China ends up realizing that it needs to centralize
its data centers, whereas the U.S. has more lax security standards.
You know, we recognize that this can be a lot to swallow, and it can really seem like
a work of pure fiction or fantasy.
But these scenarios are based on real analysis of the game theory and how different people
might act.
But there are some assumptions in here, right?
There are critical assumptions that decisions that are made by corporate actors or geopolitical
actors are really the decisive ones, that citizens everywhere may not have a meaningful chance
to push back on their autonomy being given away to a
superintelligence. And you know, AI timelines are incredibly uncertain, and the pace of AI
2027 as a scenario is one of the more aggressive predictions that we've seen. But to reiterate,
the purpose of AI 2027 was to show how quickly this might happen. Now, Daniel himself has
already pushed back his predictions by a year. And as you'll hear in the conversation,
he acknowledges the uncertainties here, and he sees them as far from being a sure thing.
I think that Daniel and CHT really share a deep intention here, which is that if we're unclear about
which way the current tracks of the future take us, then we'll be led to an unconscious future.
And in this case, we need to paint a very clear picture of how the current incentives and
competitive pressures actually take us to a place that no one really wants, including between
the U.S. and China. And we at CHT hope that policymakers and titans of industry and civil society
will take on board the clarity about where these current train tracks are heading and ask,
do we have the adequate protections in place to avoid this scenario? And if we don't,
then that's what we have to do right now.
Daniel, welcome to your undivided attention.
Thanks for having me.
So just to get started, could you just let us know a little bit about who you are and your background?
Prior to the AI Futures Project, I was working at OpenAI doing a combination of forecasting, governance, and alignment research.
Prior to OpenAI, I was at a series of small research nonprofits thinking about the future of AI.
Prior to that, I was studying philosophy in grad school.
I just want to say that when I first met you, Daniel,
at a sort of a community of people who work on future AI issues and AI safety,
you were working at OpenAI at the time.
And I think you even said, actually, when we met,
that basically, if things were to go off the rails,
you would leave OpenAI,
and you would basically do whatever would be necessary for this to go well,
for society and humanity.
And I consider you to be someone of very deep integrity,
because you ended up doing that
and you forfeited millions of dollars of stock options
in order to warn the public about a year ago
in a New York Times article.
And I just wanted to let people know about that in your background
that you're not someone who's trying to get attention.
You're someone who cares deeply about the future.
You want to talk a little bit about that choice, by the way?
Was that hard for you to leave?
I don't think that I left because things had gone off the rails
so much as I left because it seemed like the rails that we were on
were headed to a bad place.
And in particular, I left because I thought that something like what's depicted in AI 2027 would happen.
And that's just like basically the implicit and in some cases explicit plan of OpenAI
and also to some extent these other companies.
And I think that's an incredibly dangerous plan.
And so there was an official team at OpenAI whose job it was to handle that situation
and who had a couple years of lead time to start prepping for how they were going to handle that situation.
And it was full of extremely smart, talented, hardworking people.
but even then I was like this is just not the way
I don't think they're going to succeed
I think that the intelligence explosion
is going to happen too fast
and it will happen too soon
before we have understood how these AIs think
and despite their good intentions and best efforts
the super alignment team is going to fail
and so rather than stay and try to help them
I made the somewhat risky decision
to give up that opportunity to leave
and then have the ability to speak more freely
and do the research that I wanted to do
and that's basically what AI 2027 was:
an attempt to predict
what the future is going to be like by default
and attempt to sort of see where those rails are headed
and then to write it up in a way that's accessible
so that lots of people can read it
and see what's going on.
Before we dive into AI 2027 itself,
it's worth mentioning that in 2021,
you did a sort of mini, unofficial
version of this, where you actually predicted a whole bunch of where we would be at now and in
2026 with AI. And quite frankly, you were spot on with some of your predictions. You predicted
in 2024 we'd reach sort of diminishing returns on just pure scaling with compute and we'd
have to look at models changing architectures. And that happened. You predicted we'd start to see
some emerging misalignment, deception, that, you know, that happened. You predicted we'd see
the rise of entertaining chatbots and companion bots as a primary
use case. And, you know, that emerged as the top use case of AI this year. So what did you learn
from that initial exercise? Well, it emboldened me to try again with AI 2027, right? So the world
is blessed with a beautiful, vibrant, efficient market for predicting stock prices. But we don't
have an efficient market for predicting other events of societal interest for the most part.
Presidential elections maybe are another category of something where there's like a relatively
efficient market for predicting the outcomes of them. But for things like AGI timelines, there's not
that many people thinking about this, and there's not really a way for them to make money off of it.
And that's probably part of why there's not that many people thinking about this. So it's a
relatively small niche field. I think the main thing to do as forecasters, like when you're
starting from zero, first thing you want to do is collect data and plot trend lines and then extrapolate
those trend lines. And so that's what a lot of people are doing. And that's a very important,
like, foundational thing to be doing,
and we've done a lot of that too
at the AI Futures Project.
So like the trends of how much compute's available,
the trends of how many problems can be solved,
what are the kinds of trends?
Well, mostly, you know,
trends like compute,
revenue for the companies,
maybe data of various kinds,
and then most importantly,
benchmark scores on all the benchmarks that you care about, right?
So that's like the foundation
of any good futurist forecast,
is having all those trends and extrapolating them.
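As a rough illustration of that "collect data, plot trend lines, extrapolate" step, here is a minimal sketch in Python. The dates, the "minutes per task" metric, and all the numbers are invented for illustration; they are not figures from AI 2027 or from any real benchmark.

```python
# Toy illustration of "plot trend lines and extrapolate":
# fit a log-linear trend to made-up capability numbers and project it forward.
# All numbers here are invented for illustration, not real benchmark data.
import numpy as np

years = np.array([2020, 2021, 2022, 2023, 2024])
# Hypothetical "length of task the AI can handle" in minutes.
task_minutes = np.array([1.0, 2.5, 6.0, 16.0, 40.0])

# A straight line in log space corresponds to exponential growth in raw space.
slope, intercept = np.polyfit(years, np.log(task_minutes), 1)

for future_year in [2025, 2026, 2027]:
    projected = np.exp(slope * future_year + intercept)
    print(f"{future_year}: ~{projected:.0f} minutes per task (if the trend holds)")
```

The point of the exercise, as Daniel describes it, is not that any one extrapolation is right, but that writing the trend down forces you to be concrete about what you expect.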
Then you also maybe build
models and you try to think like, well, gee, if the AIs start automating all the AI research,
how fast will the AI research go? Let's try to understand that. Let's try to make an economic
model, for example, of that acceleration. We can make various qualitative arguments about
capability levels and so forth. That literature exists, but then because that literature is so
small, I guess not that many people had thought to try putting it all together in the form of
a scenario before. A few people had done something sort of like this, and that was what I
was inspired by. So I spent like two months writing this blog post, which was called What
2026 looks like, where I just worked things forward year by year. I was like, what do I think is
going to happen next year? Okay, what about the year after that? What about the year after
that? And of course, it becomes less and less likely. Every new claim that you add to the list
lowers the overall probability of the conjunction being correct. But it's sort of like doing a
simulated rollout or like a simulation of the future, there's value in doing it at that level of
detail and that level of comprehensiveness. I think you learn a lot by forcing yourself to think that
concretely about things. Your first article. My first article. And so then that was what emboldened me
to try again, and this time to take it even more seriously, to hire a whole team to help me, a team of
expert forecasters and researchers
and to put a lot more
than two months worth of effort into it and to make
it presented in a nice package
on a website and so forth. And so
fingers crossed
this time will be very different
from last time and the methodology will totally
fail and the future will look nothing like what we
predicted because
what we predicted is kind of scary.
So like any work of speculative fiction,
the AI 2027 scenario is based
on extrapolating from a number of trends
and then making some key assumptions,
which the team built into their models.
And we just wanted to name some of those assumptions
and discuss what happens based on those assumptions.
First, just assume that the AIs are misaligned
because of the race dynamics.
So because these things are black box neural nets,
we can't actually check reliably
whether they are aligned or not.
And we have to rely on these more indirect methods,
like our arguments.
We can say it was a wonderful training environment.
There were no flaws in the training environment.
Therefore, it must have learned
the right values.
So how would it even get here?
How would it even get to corporations running as fast as possible
and governments running as fast as possible?
It all comes down to the game theory.
You know, the first ingredient that gets us there
is companies just racing to beat each other economically.
And the second ingredient is countries racing to beat each other
and making sure that their country is dominant in AI.
And the third and final ingredient
is that the AIs in that process
become smart enough that they hide their motivations
and pretend that they're going to do
what programmers train them to do
or what customers want them to do,
but we don't pick up on the fact that that doesn't happen
until it's too late.
So why does that happen?
Here's Daniel.
So given the race dynamics,
where they're trying as hard as they can
to beat each other and they're going as fast as they can,
I predict that the outcome will be AIs
that are not actually aligned,
but are just playing along and pretending.
And also assume that the companies are racing
as fast as they possibly can
to make smarter AIs and to automate the things with AIs
and to put AIs in charge of stuff and so forth.
Well, then we've done a bunch of research and analysis
to predict how fast things would go,
the capability story, the take-off story.
You start off talking about 2025
and how there's just these sort of stumbling, fumbling agents
that do some things well,
but also fail at a lot of tasks
and how people are largely skeptical
of how good they'll become because of that
or they like to point out their failures.
But little by little, or I should actually say very quickly, these agents get much better.
Can you take it from there?
Yep.
So we're already seeing the glimmerings of this, right?
After training giant transformers to predict text, the obvious next step is training them to generate text.
And then the obvious next step after that is training them to take actions, you know, to browse the web, to write code and then debug the code and then rerun it and so forth.
And basically turning them into a sort of virtual
co-worker that just runs continuously.
I would call this an agent.
So it's an autonomous AI system
that acts towards goals on its own,
without humans in the loop,
and has access to the internet
and has all these tools and things like that.
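For readers who want a mental model of what such an agent is, here is a bare-bones sketch of the standard model-plus-tools loop. The call_model and run_tool functions are hypothetical placeholders, not any particular company's actual API.

```python
# Minimal sketch of an "agent" loop: a model proposes actions, tools execute
# them, and the results are fed back in, with no human in the loop.
# call_model() and run_tool() are hypothetical stand-ins, not a real API.

def call_model(history: list[str]) -> dict:
    """Pretend LLM call: returns either a tool request or a final answer."""
    raise NotImplementedError("stand-in for a real model API")

def run_tool(name: str, argument: str) -> str:
    """Pretend tool execution: browse the web, run code, read files, etc."""
    raise NotImplementedError("stand-in for real tools")

def agent(goal: str, max_steps: int = 50) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):           # runs continuously, like a co-worker
        decision = call_model(history)
        if decision.get("done"):          # model believes the goal is met
            return decision["answer"]
        result = run_tool(decision["tool"], decision["argument"])
        history.append(f"{decision['tool']} -> {result}")  # feed results back
    return "gave up after max_steps"
```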
The companies are working on building these,
and they already have prototypes,
which you can go read about,
but they're not very good.
AI 2027 predicts that they will get better
at everything over the next couple years,
as the companies make them big,
train them on more data, improve their training algorithms, and so forth.
So AI 2027 predicts that by early 2027, they will be good enough that they will basically be able to substitute for human programmers,
which means that coding happens a lot faster than it currently does.
When researchers have ideas for experiments, they can get those experiments coded up extremely quickly,
and they can have them debugged extremely quickly, and they're bottlenecked more on having good ideas,
and on, you know, waiting for the experiments to run.
And this seems really critical to your forecast, right?
That no matter what the gains are in the rest of the world for having AIs deployed,
that ultimately the AI will be pointed at the act of programming and AI research itself
because those gains are just vastly more potent.
Is that right?
This is a subplot of AI 2027.
And according to our best guesses, we think that, roughly speaking,
once you have AIs that are fully autonomous
goal-directed agents that can
substitute for human programmers very
well, you have about a year
until you have superintelligence
if you go as fast as possible
as mentioned by that previous assumption.
And then, once you've got the superintelligences,
you have about a year
before you have this crazily transformed economy
with all sorts of new factories designed by superintelligences
run by superintelligence, producing robots
that are run by superintelligence, producing more factories,
etc. And this is
sort of robot economy that no longer depends on humans and also is very militarily powerful
and it's designed all sorts of new drones and new weapons and so forth. So one year to go from
the coder to the superintelligence, one year to go from the superintelligence to the robot economy,
that's our estimate for how fast things could go if you were going really hard. Like,
if the leadership of the corporation was going as fast as they could, if the leadership of the
country, like the president, was going as fast as they could, that's how fast it would go. So yeah,
there's this question of like how much of their compute and other resources will the tech
companies spend on using AI to accelerate AI R&D versus using AI to serve customers or to do
other projects. And I forget what we say, but we actually have like a quantitative breakdown
in AI 2027 about what fraction goes to what. And we are expecting that fraction to increase
over time rather than decrease because we think that strategically that's what makes sense
if your top priority is winning the race,
then I think that's the breakdown you would do.
Let's talk about that for a second.
So it's like, I'm Anthropic,
and I can choose between scaling up my sales team
and getting more enterprise sales,
integrating AI, getting some revenue,
proving that to investors,
or I can put more of the resources directly into AI coding agents
that massively accelerate my AI progress
so that maybe I can ship, you know, Claude 5 or something like that,
signal that to investors,
and be on a faster sort of ratchet of,
you know, not just an exponential curve,
but a double exponential curve,
you know, AI that improves the pace and speed of AI.
That's the trade-off that you're talking about here, right?
Yeah, basically.
So we have our estimates for how much faster overall pace of AI progress will go
at these various capability milestones.
Of course, we think it's not going to be discontinuous jumps.
We think it's going to be continuous ramp-up in capabilities,
but it's helpful to, like, name specific milestones for purposes of talking about them.
So the superhuman coder milestone, early 2027,
and then we're thinking something like a 5x boost to the speed of algorithmic progress,
the speed of getting new useful ideas for how to train AIs and how to design them.
And then partly because of that speed up, we think that by the middle of the year,
they would have trained new AIs with additional skills that are able to do not just the coding,
but all the other aspects of AI research as well.
So choosing the experiments, analyzing the experiments, etc.
So at that point, you've basically got a company within a company.
You know, you still have OpenBrain the company with all their human employees.
But now they have something like 100,000 virtual AI employees that are all networked together,
running experiments, sharing results with each other, et cetera.
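To put rough numbers on what a speedup like that means, here is a small back-of-the-envelope sketch. The 5x multiplier is the one mentioned above; the 25x figure and the 12-month baseline are illustrative assumptions, not estimates from AI 2027.

```python
# Back-of-the-envelope: how a research speed multiplier compresses timelines.
# The 5x figure is the one mentioned for the superhuman-coder milestone;
# the other multiplier and the 12-month baseline are illustrative guesses.

def calendar_months(research_months_needed: float, speedup: float) -> float:
    """Calendar time to do a fixed amount of research at a given speedup."""
    return research_months_needed / speedup

baseline = 12.0  # months of "normal-pace" research, an assumed unit of work
for multiplier in [1, 5, 25]:
    print(f"{multiplier:>2}x speedup: {calendar_months(baseline, multiplier):.1f} "
          "calendar months for a year's worth of progress")
```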
So we could have this acceleration of AI coding progress inside the lab,
but to a regular person sitting outside who's just serving dinner to their family in Kansas,
like nothing might be changing for them, right?
And so there could be this sense of like, oh, well, I don't feel like AI is going much faster.
I'm just a person here doing this.
I'm a politician.
I'm like, I'm hearing that there might be stuff speeding up inside an AI lab,
but I have zero felt sense of my own nervous system as I breathe the air and, you know, live my life,
that anything is really changing.
And so it's important to name that because there might be this huge lag between the vast exponential sci-fi-like progress
happening inside of this weird box called an AI company and the rest of the world.
Yep, I think that's exactly right.
And I think that's a big problem.
It's part of why I want there to be more transparency.
I feel like probably most ordinary people would,
they'd be seeing, you know, AI stuff increasingly talked about in the news
over the course of 2027,
and they'd, like, see headlines about stuff.
But, like, their actual life wouldn't change.
Basically, from the perspective of an ordinary person,
things feel pretty normal up until all of a sudden
the superintelligences are telling them on their cell
phone what to do.
So you've described the first part, where the progress that the AI labs can make
is faster than anyone realizes because no one outside can see into it.
What's the next step of the AI 2027 scenario after just the private advancement within
the AI labs?
There's a couple different subplots basically to be tracking.
So there's the capability subplot, which is like, how good are the AIs getting at tasks?
And that subplot basically goes:
they can automate the coding in early
2027, in mid-2027
they can automate all the research,
and by late 2027 they're superintelligent,
but that's just one subplot.
Another subplot is geopolitically what's going on
and the answer to that is
in early 2027
the CCP steals
the AI from OpenBrain
so that they can have it too,
so they can use it to accelerate their own research,
and this causes
a sort of soft nationalization
slash increased level of cooperation
between the U.S. government and OpenBrain,
which is what OpenBrain wanted all along.
They now have the government as an ally
helping them to go faster and cut red tape
and giving them sort of political cover
for what they're doing, and
all motivated by the desire to beat China, of course.
So politically, that's sort of what's going on.
Then there's the sort of alignment subplot,
which is like, technically speaking,
what are the goals and values that they are trying to put into the AIs, and is it working?
And the answer is, no, it's not working.
The AIs are not honest and not always obedient and don't have human values always at heart.
We do want to explore that, because that might just sound like science fiction to some people.
So you're training the AIs, and then they're not going to be honest, they're not going to be harmless.
Why is that? Explain the mechanics of how alignment research currently works and why, even despite
deep investments in that area, we're not on track for alignment.
Yeah, great question.
So I think that, funnily enough, science fiction was often over-optimistic about the technical
situation.
And in a lot of science fiction, humans are sort of directly programming goals into AIs.
And then chaos ensues when the humans didn't notice some of the unintended consequences
of those goals.
For example, they program HAL with, like, ensure mission success or whatever, and then
HAL thinks, I have to kill these people in order to ensure mission success, right?
So the situation in the real world is actually worse than that, because we don't program
anything into the AIs.
They're giant neural nets.
There is no sort of goal slot inside them that we can access and look and see, like, what
is their goal.
Instead, they're just like a big bag of artificial neurons.
And what we do is we put that bag through training environments.
And the training environments automatically, like, update the weights of the neurons
in ways that make them more likely to get high scores in the training environments.
And then we hope that as a result of all of this,
the goals and values that we wanted will sort of, like, grow on the inside of the AIs
and cause the AIs to have the virtues that we want them to have,
such as honesty, right?
But needless to say, this is a very unreliable and imperfect method of getting goals and values into an AI system.
And empirically, it's not working that well.
And the AIs are often saying things that are not just false, but that they know are false and that they know is not what they're supposed to say, you know?
But why would that happen exactly?
Can you break that down?
Because the goals, the values, the principles, the behaviors
that cause the AI to score highest in the training environment
are not necessarily the ones that you hoped they would end up with.
There's already empirical evidence that that's at least possible.
Current AIs are smart enough to sometimes come up with this strategy
and start executing on it.
They're not very good at it, but they're only going to get better at everything every year.
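One way to see the selection argument Daniel is making is with a toy simulation: a "training environment" that rewards claimed success it can only partially verify will score an overclaiming policy above an honest one. Every number, name, and function here is invented for illustration; this is not how any real training pipeline is implemented.

```python
# Toy selection pressure: the environment rewards what it can measure
# (claimed success, partially verified), not what we actually want (honesty).
# Everything here is an invented illustration of the selection argument.
import random

random.seed(0)

def environment_score(claims_success: bool, actually_succeeded: bool) -> float:
    # A false claim of success is penalized only when detected (20% of the time).
    caught = claims_success and not actually_succeeded and random.random() < 0.2
    if caught:
        return -1.0
    return 1.0 if claims_success else 0.3   # confident "success" scores best

def run_policy(always_claims_success: bool, episodes: int = 10_000) -> float:
    total = 0.0
    for _ in range(episodes):
        actually_succeeded = random.random() < 0.5   # task solved half the time
        claims = True if always_claims_success else actually_succeeded
        total += environment_score(claims, actually_succeeded)
    return total / episodes

print("honest policy      :", round(run_policy(False), 3))
print("overclaiming policy:", round(run_policy(True), 3))
# Training that selects for the higher-scoring policy reinforces the overclaimer.
```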
Right, and so part of your argument is that as these systems,
you know, as you try to incentivize these systems to do the right thing, but you can only
incentivize them to sort of push, nudge them in the right direction, they're going to find
these ways, whether it's deception or sandbagging or pehacking, they're going to find these
ways of effectively cheating, right, like humans end up doing sometimes.
Except this time, if the model's smart enough, we may not be able to detect that they're doing
that, and we may roll them out into society before we've realized that this is a problem.
Yes.
And so maybe you can go talk about how your scenario then picks that up and says, what
will this do to society?
So, if they don't end up with the goals and values that you wanted them to have, then the
question is, what goals and values do they end up with?
And of course, we don't have a good answer to that question.
Nobody does.
This is a, you know, bleeding-edge new field that is extremely, it's much more like alchemy
than science, basically.
But in AI 2027, we depict the answer to that question being that the AIs end up with
a bunch of
core motivations or drives
that caused them to perform
well in the diverse
training environment they were given
and we say that those core motivations
and drives are things like
performing impressive intellectual
feats, accomplishing lots of
tasks quickly,
getting high scores on various
benchmarks and evals, producing work
that is very impressive,
things like that, right?
So we sort of imagine that that's the sort
of core motivational system that they end up with instead of, you know, being nice to humans
and always obeying humans and being always honest, or whatever it is that they were supposed
to end up with, right? And the reason for this, of course, is that this set of motivations would
cause them to perform better in training and therefore would be reinforced. And why would
it cause them to perform better in training? Well, because it allows them to take advantage
of various opportunities to get higher score at the cost of being less honest, for example.
We explored this theme on our previous podcast with Ryan Greenblatt from Redwood Research.
This isn't actually far-fetched.
There's already evidence that this kind of deception is possible,
that current AIs can be put into situations where they're going to come up with an active strategy to deceive people
and then start executing on it, hiding the real intentions, both from end users and from AI engineers.
Now, they're not currently very good at it yet.
They don't do it very often, but AI is only going to get better every year,
and there's reason to believe that this kind of behavior will increase.
And we add on to that, one of the core parts of AI 2027
is the lack of transparency about what these models are even capable of,
the massive information asymmetry between the AI labs and the general public
so that we don't even understand what's happening, what's about to be released.
And given all of that, you might end up in a world where by the time this is all clear to the public,
by the time we realize what's going on, these AI systems are already wired into the critical
parts of our infrastructure, into our economy, and into our government, so that it becomes hard
or impossible to stop by that point.
So anyhow, long story short, you end up with these AIs that are broadly superhuman and have
been put in charge of developing the next generation of AI systems, which will then develop
the next generation and so forth.
And humans are mostly out of the loop in this whole process, or maybe sort of overseeing it,
you know, reading the reports, watching the lines on the graphs go up,
trying to understand the research, but mostly failing because the AIs are smarter than them
and are doing a lot of really complicated stuff really fast.
I was going to say, I think that's just an important point to be able to get.
It's like we move from a world where in 2015, OpenAI is like, you know,
a few dozen people who are all engineers building stuff.
Humans are reviewing the code that the other humans at OpenAI wrote,
and then they're reading the papers that other researchers
at OpenAI wrote.
And now you're moving to a world where more code is generated by machines than all the human
researchers could ever even look at because it's generating so much code so quickly. It's
generating new algorithmic insights so quickly. It's generating new training data so quickly. It's running
experiments that humans don't know how to interpret. And so we're moving into a more and more
inscrutable phase of the AI development sort of process. And then if the AIs don't have the goals
that we want them to have, then we're in trouble. Because then they can make sure that the next
generation of AIs also doesn't have the goals that we want them to have, but instead has the goals that
they want them to have. For me, what's in
AI 2027 is a really cogent unpicking of a bunch of different incentives, geopolitical
incentives, corporate incentives, technical incentives around the way AI training works and the
failures of us imagining that we have it under control.
And you weave those together, like whether AI 2027 as a scenario is the right scenario
and is the scenario we're going to end up in, I think plenty of people can disagree.
But it's an incredibly cogent exposition of a bunch of these different incentive pressures
that we are all going to have to be pushing against
and how those incentive pressures touch each other,
how the geopolitical incentives,
touch the corporate incentives,
touch the technical limitations,
and making sure that we change those incentives
to end up in a good future.
And at the end of the day,
those geopolitical dynamics, you know,
the competitive pressures on companies,
this is all coming down to an arms race,
like a recursive arms race,
a race for which companies deploy AI faster into the economy,
a race between nations for who builds AGI
before the other one,
a race between the companies of who advances capabilities
and uses that to raise more venture capital.
And just to sort of say, a through line of the prediction you're making is the centrality of the race dynamic that sort of runs through all of it.
So we just want to speak to the reality for a moment that all this is really hard to hear.
And it's also hard to know how to hold this information.
I mean, the power to determine these outcomes resides in just a handful of CEOs right now.
And the future is still unwritten.
But the whole point of AI 2027 is to show us what would happen if we don't take some actions now to shift the future in a different
direction. So we asked Daniel what some of those actions might look like. So as part of your
responses to this, what are the things that we most need that could avert the worst outcome
in AI 2027? Well, there's a lot of stuff we need to do. My go-to answer is transparency for the
short term. So I think in the longer term, like, you know, right now, again, the systems
are pretty weak. They're not that dangerous right now. In the future, when they're fully
autonomous agents capable of automating the whole research project, that's when things are really
serious and we need to do significant action to regulate and make sure things go safe. But for now,
the thing I would advocate for is transparency. So we need to have more requirements on these
companies to be honest and disclose what sort of capabilities their AI systems have, what their
projections are for future AI systems capabilities, what goals and values they are attempting
to train into the models, any evidence they have pertinent to whether their training is
succeeding at getting those goals and values in. Things like that, basically. Whistleblower
protections, I think I would also throw on the list. So I think that one way to help keep these
companies honest is to have there be an enforcement mechanism basically for dishonesty. And I think
one of the only enforcement mechanisms we have is employees speaking out, basically.
Currently, we're in a situation where companies can be basically lying to the public about
where things are headed and the safety levels of their systems and, you know, whether they've
been upholding their own promises. And one of the only recourses we have is employees deciding
that that's not okay and speaking out about it. Yeah, could you actually just say one more specific
note on whistleblower protections? What are the mechanisms that are not available that should be
available specifically? There's a couple different, like one type of whistleblower protection is
designed for holding companies accountable when they break their own promises or when they
mislead the public. There's another type of thing which is about the technical safety case.
So I think that we're going to be headed towards a situation where non-technical people
will just be sort of completely out of their depth at trying to figure out whether the system is
safe or not, because it's going to depend on these complicated arguments that only alignment
researchers will know the terms in. So for example, previously I mentioned how there's this concern
that the AIs might be smart and they might be just pretending to be aligned instead of
actually aligned. That's called alignment faking. It's been studied in the literature for a couple
years now. Various people have come up with possible counter strategies for dealing with that problem,
and then there's various flaws in those counter strategies and various assumptions that are
kind of weak, and so there's a literature challenging those assumptions. Ultimately, we're going to
be in a situation where, you know, the AI company is automating all their research, and the president
is asking them, is this a good idea? Are we sure we can trust the AI?
And the AI company is saying, yes, sir, like, we've, you know, we've dotted our i's and crossed
our t's or whatever. And, like, we are confident that these AIs are safe and aligned. And then
the president, of course, has no way to know himself. He just has to say, like, well,
okay, show me your, like, your documents that you've written about your training processes
and how you've, like, made sure that it's safe. But he can't evaluate it himself. He needs, like,
experts who can then, like, go through the tree of arguments and rebuttals and be, like,
was this assumption correct, like, did you actually solve the alignment faking problem,
or did you just appear to solve it, you know, or are you just, like, putting out hot air
that's like not even close to solving it, you know? And so we need, like, technical experts
in alignment research to, like, actually make those calls. And there are very few people in the
world, and most of them are not at these companies. And the ones who are at the companies
have a sort of conflict of interest or bias, right? Like, the ones at the company that's
building the thing are going to be motivated towards thinking things are fine.
And so what I would like is to have a situation where people at the company can basically get outside help at evaluating this sort of thing.
And they can be like, hey, my manager says this is fine and that I shouldn't worry about it.
But I'm worried that our training technique is not working.
I'm seeing some concerning signs.
And I don't like how my manager is sort of like dismissing them.
But like the situation is still unclear and it's very technical.
So I would like to get some outside experts and talk it over with them and be like,
what do you think about this?
Do you think this is actually fine or do you think this is concerning?
So I would like there to be some sort of legally protected channel by which they can have those conversations.
So I think what Daniel's speaking to here is the complexity of the issues.
Like AI itself is inscrutable, meaning the things that it does and how it works is inscrutable.
But then as you're trying to explain to
presidents or heads of state, you know, debates about whether the AI is actually aligned, it's
going to be inscrutable to policymakers too, because the answers rely on such deep technical
knowledge. So on the one hand, yes, we need whistleblower protections. We need to protect those
who have that knowledge and can speak for the public interest to do so, you know, with as much
freedom as possible, so that they don't have to sacrifice millions of dollars of stock options.
And Senator Chuck Grassley has a bill that's being advanced right now that CHT supports.
We'd like to see these kinds of things. But this is just one small part of a whole suite
of things that need to happen if we want to avoid
the worst-case scenario that AI 2027 is
mapping. Totally.
And one key part of that is transparency, right?
It's pretty insane that for
technology moving this quickly,
only the people inside of these labs really understand
what's happening until day one of
a product release, where it suddenly
impacts a billion people.
Just to be clear, you don't have to agree
with the specific events that
happen in AI 2027, or whether the government's
really going to create a special economic zone and start
building robot factories in the middle of the desert,
covered in solar panels. The question is, however, are the competitive pressures pushing in this
direction? And the answer is 100% clear that they are pushing in this direction. We can argue
that governments are probably not going to take responses like that, because there's
been a lot of institutional decay and, you know, less capable responses that can happen there.
However, the pressures for competition and the power that is conferred by AI do point in one
direction. I think AI 2027 is hinting at what that direction is. So I think if we take that
seriously, we have a chance of steering towards another path.
We tried to do this in the recent TED Talk: if we can see clearly, clarity creates agency.
And that's what this episode was about,
and what Daniel's work is about, and we're super grateful to him and his whole team.
And we're going to do some future episodes soon on loss of control and other ways that we know that AI is less controllable than we think.
Stay tuned for more.
Your undivided attention is produced by the Center for Humane Technology, a nonprofit working to catalyze a humane future.
Our senior producer is Julia Scott.
Josh Lash is our researcher and producer.
And our executive producer is Sasha Fegan,
mixing on this episode by Jeff Sudaken,
original music by Ryan and Hayes Holiday,
and a special thanks to the whole Center for Humane Technology team
for making this podcast possible.
You can find show notes, transcripts, and much more at humanetech.com.
And if you liked the podcast,
we'd be grateful if you could rate it on Apple Podcasts,
because it helps other people find the show.
And if you made it all the way here,
thank you for giving us your undivided attention.