Big Technology Podcast - AI's Rising Risks: Hacking, Virology, Loss of Control — With Dan Hendrycks
Episode Date: March 26, 2025. Dan Hendrycks is the Director and co-founder of the Center for AI Safety, and an advisor to Scale AI and xAI. He joins Big Technology Podcast for a discussion of AI's growing risk profile, and what to... do about it. Tune in to hear Hendrycks explain why virology expertise in AI models is an immediate concern and how these systems might soon enable devastating hacks. We also cover intelligence explosion scenarios, the geopolitical implications of AI development, and why an international AI arms race could lead to faster development than the world can handle. Hit play for an insider's perspective on how governments and AI labs are wrestling with unprecedented technological power that could reshape global security. Connect with Dan: https://www.nationalsecurity.ai/ https://x.com/DanHendrycks
Transcript
Now that artificial intelligence has tried to break out of a training environment,
cheat at chess, and deceive safety evaluators, is it finally time to start worrying about the risk
that AI poses to us all? We'll talk about it with Dan Hendrycks, the director of the Center for
AI Safety right after this. Welcome to Big Technology Podcast, a show for cool-headed, nuanced
conversation of the tech world and beyond. Today we are joined by Dan Hendrycks. He is the director
of the Center for AI Safety and an advisor to Elon Musk's xAI
and also an advisor to Scale AI
here to speak with us about all the risks
that AI might pose to us today and into the future.
Dan, it's so great to see you. Welcome to the show.
Glad to be here. It's an opportune moment to have you on the show
because I'm recently doom-curious. And I'll explain what that means.
So I had long been skeptical of this idea
that AI could potentially break out of its training set
or out of the computers and start to potentially even harm humans.
I still think I'm on that path, but I'm starting to question it.
We've recently seen research of AI starting to try to export its weights in scenarios where
it thinks it might be rewritten, trying to fool evaluators, and even trying to break a game
of chess by rewriting the rules because it's so interested in winning the game.
So I'm just going to put this to you right away.
Is what I'm seeing in these early moments of AIs trying to deceive evaluators
or trying to change the rules that they've been given
an early sign of us having AI as an adversary and not as a friend?
The easier way to see that it could be adversarial is just if people maliciously use
these AI systems against us.
So if we have an adversarial state trying to weaponize it against us,
that's an easier way in which it could cause a lot of damage to us.
Now, there is an additional risk that the AI itself could have an adversarial relation to us and be a threat in itself, not just the threat of humans in the forms of terrorists or in the forms of state actors, but the AIs themselves potentially working against us.
I think those risks would potentially grow in time.
I don't think they're as substantial now compared to just the malicious use sorts
of risks. But yeah, I think that as time goes on and as they're more capable, if some
small fraction of them do decide to deceive us or try to self-exfiltrate themselves or develop
an adversarial posture toward us, then that could be extraordinarily dangerous. So it depends.
So I want to distinguish between what things are particularly concerning in the next year versus
somewhat more in the future. And I think in the shorter term it is more of this malicious use,
but that's not to downplay the fact that AIs can be threats later on.
Now, from what I understand from your first answer,
you are concerned about both the way that humans use AI
and AI itself sort of taking its own actions,
our loss of control of artificial intelligence.
So can you just rank sort of where you see the problems
in terms of most serious to least serious
and what we should be focusing on?
That's a really good question.
So the risks in their severity sort of depend on time.
Some become much more severe later.
So I don't think AI poses a risk of extinction like today.
I don't think that they're powerful enough to do that.
Because they can't make PowerPoints yet, right?
They don't have agential skills.
They can't accomplish tasks that require many hours to complete.
And so since they lack that, this puts a severe
limit on the amount of damage that they could do or their ability to operate autonomously.
So I think there's a variety of risks. In terms of malicious use in the shorter term,
when AIs get more agential, I'd be concerned about AIs causing cyber attacks on critical infrastructure,
possibly as directed by a rogue actor. There'd also be the risk of AIs facilitating the development
of bio-weapons, in particular pandemic-causing ones, not smaller-scale ones like anthrax.
Those are, I think, the two malicious use risks that we'll need to be getting on top of
in the next year or two.
At the same time, there's loss of control risks, which I think primarily stem from people
at AI companies trying to automate all of AI research and development, and they can't
have humans check in on that process because that would slow them down too much. If you have a
human do a week-long review every month of what's been going on and trying to interpret what's happening,
this would slow them down substantially. And the competitor that doesn't do that will end up
getting ahead. What that would mean is that you'd have basically AI development going very
rapidly where there's nobody really checking what's going on or hardly checking. And I think a loss
of control in that scenario is more likely. Right. And with the Center for AI Safety, we're going to talk
today about risks, but we're also going to talk about solutions. And with the Center for AI
Safety, what you're doing is basically pointing out the risks and trying to get to solutions
to these problems. You told me you were just at the White House yesterday, the day before we
were talking. So this stuff is something that you're actually working towards mitigating. And I
think we're going to get to that in a bit. But first, let's talk a little bit through some of the
risks that you see with AI and how serious they actually are. One of them that just jumped out
for me right away was bio, creating bio weapons.
Let me run you through what I think the scenario could be in my head, and you tell me what I'm missing.
With bio-weapons, you'd basically be prompting an LLM to help you come up with new biological agents, effectively,
that you could unleash against an enemy.
And I think wouldn't that be predicated on the AI actually being able to come up with biological discoveries of its own?
Right now, current LLMs,
they don't really extend beyond the training set.
Maybe there's an emergent property here or there,
but they haven't made any discoveries,
and that's sort of been the big knock on them to this point.
So I am curious, if you're talking about immediate risks
and one of them being, okay,
there could be bio-weapons that are created with AI,
doesn't that suppose that there's going to be something
much more advanced than the LLMs that we have today?
Because with current LLMs, to me it's basically like Google.
It's a search for what's on the web, and it can produce what's on the web, but it's not coming up with new compounds on its own.
Yeah, so I think that for cyber, that's more in the future, but I think virology, expert-level virology capabilities, are much more plausible in the short term.
So, for instance, we have a paper that'll be out maybe in some months, we'll see, but most of the work for it has been done.
And in it, we have Harvard and MIT expert level virologists sort of taking pictures of themselves in the wet lab and asking, what step should I do next?
So can the AI, given this image and given this background context, help guide through step by step these various wet lab procedures in making viruses and manipulating their properties?
And we are finding that with the most recent reasoning models, quite unlike the models from
two years ago, like the initial GPT4, the most recent reasoning models are getting around 90th
percentile compared to these expert level virologists in their area of expertise.
So this suggests that they have some of these wet lab type of skills, and so if they can
guide somebody through it step by step, that could be very dangerous. Now, there is an ideation
step, but that seems like a capability, them doing brainstorming to come up with ways to
make viruses more dangerous. I think that's a capability that they've had for over a year,
the brainstorming part, but the implementation part seems to be fairly different. So I think in bio,
actually, I would not be surprised if in a few months there's a consensus that they're
expert-level in many relevant ways and that we need to be doing something about that.
Wow, that's crazy to me because I would think it would be the opposite, right?
That cyber would be the thing that we need to be worried about because these things code so well,
not virology.
So I just want to ask you.
But on that, biology has been such an interesting subject because they just know the literature
really well.
They know it inside and out, and they've got a fantastic memory,
and they have so much background experience.
For some reason, it's been
their easiest subject historically,
biology and virology, in earlier forms of measurement.
Like if you see how they do on exams,
but now we're looking at their practical wet lab skills,
and they have those increasingly as well.
So what about the evolution of the technology?
Because this is all with large language models, right?
Reasoning is just something that's taking place
within a large language model,
like GPT, which powers ChatGPT.
So what is it about the current capabilities
that have increased to the point
where they're now able to guide somebody
through the creation or manipulation of a virus?
That seems to be like a step change in capability.
Well, now they have these image understanding,
image understanding skills.
So that's something that they didn't use to have.
That makes it a lot easier for them to do guidance
or sort of be an apprentice or sort of a
guide on one's shoulder saying, now do this, now do that.
But I don't know where that came from, that skill.
They've just trained on the Internet, and maybe they read enough papers and saw enough
pictures of things inside those papers to have a sense of the protocols and how to troubleshoot
appropriately.
So since they've read basically every academic paper written, maybe that's the cause of it.
But it's a surprise.
I mean, I was thinking that this practical,
tacit knowledge or something
wouldn't be something that they would pick up
on necessarily. It'd make a lot
more sense for them to have
academic knowledge,
knowledge of vocab words and things like
that. So I don't know where it came from.
It's there. Right, but this is still
all stuff that is known
to people. It's not like the AI is coming
up with new viruses
on its own. Well, so... You can't,
like, prompt whatever GPT
it is and say, create a new coronavirus.
So if you're saying, I'm trying to modify this property of the virus so that it has more
transmissibility or a longer stealth period, then I think it could, with some pretty easy
brainstorming, make some suggestions, and then if it can guide you through the intermediate
steps, that's something that could make it be much more lethal.
I don't think you need breakthroughs for doing some bioterrorism, generally.
The main limitations for risk, generally, will be capability and intent.
And historically, our bio-risks have been fairly low because the number of people with these capabilities
has been very small, maybe a few hundred top virology PhDs, and then a lot of them
just don't intend to do this sort of thing.
However, if these capabilities are out there without any sorts of restrictions and
extremely accessible, then, you know,
your risk surface is blown up by several orders of magnitude.
A solution for this, to let people keep access to these expert-level virology capabilities,
is that they can just speak to sales or ask for permission to have some of these guardrails
taken off.
Like, if they're a real researcher at Genentech or what have you, wanting these expert-level
virology capabilities, then they could just ask and then, like, oh, you're a trusted
user, sure, here's access to these capabilities.
But if somebody just made an account a second ago, then by default, they wouldn't have access to it.
So for safety, a lot of people think that the way you go about safety is, you know, slowing down all of AI development or something like that.
But I think there are very surgical things you can do where you just have it refuse to talk about topics such as reverse genetics or guide you through practical intermediate steps for some virology methods.
And wait, those safeguards don't exist today?
At xAI, they do.
You're an advisor at xAI?
Yeah, yeah, yeah.
But what were the models that you were testing to try to find out whether they would help with the enhancement of viruses?
We tested pretty much all of the leading ones that have these sort of multimodal capabilities.
And they'll have some sort of safeguards, but there are various holes.
And so those are being patched; we've communicated that, hey, there are various issues here.
And so I'm hopeful that very quickly some of these vulnerabilities will be patched.
And then if people want access to those capabilities, then they could possibly be a trusted third-party tester or something like that
or work at a biotech company and then those restrictions could be lifted for those use cases.
But for random users, we don't know who they are, asking how to make some virus more lethal or something, or, sorry, some animal-affecting virus,
just have the model refuse on that. That seems fine.
Yeah, we do see the benchmarks come in with each model
release, and it's like, oh, now it's scored 84th or 90th percentile or 97th percentile on this
math test or on this bio test. And for us, it's like, oh, that's the model doing it. But what
you're trying to say is, and correct me if I'm wrong, if it's getting 90 percent of the way
that an expert virologist might get, then it could take a crafty user, you know, a number of
prompts effectively to find their way towards that 100 percent, because if they try it enough
times they might accidentally get to the, not accidentally, but they might end up getting
the bad virus that we're trying not to have the public create. Yeah. Yeah. So this is
what concerns me, like, quite a bit, and I'm being more quiet about this just to, you know...
Well, you're talking about it. Yeah, I suppose I'm talking about it now, but I'm not, you know,
there are orders of magnitude with it. It's being taken care of at xAI, and this is sort of
in our risk management framework there. And other labs are taking this
sort of stuff more seriously or finding some vulnerabilities, and then they're patching them.
So I'm being non-specific about some of the vulnerabilities here, but hopefully I can
provide more precision once they have that taken care of.
Okay, I look forward to reading the paper.
You're an advisor to Scale AI.
They are a company that will give a lot of Ph.D. level information to models in post-training,
right?
So you've trained up the model on all of the Internet.
It was pretty good at predicting the next word.
And then it needs some domain-specific knowledge.
Scale, from my understanding, has PhDs and really smart people,
writing their knowledge down and then feeding it into the model to make these models smarter.
How does a company like Scale AI approach this?
Do they, like, have to say, all right, if you're a virology PhD,
we shouldn't be fine-tuning the model with your information.
Like, what's going on there and how are you advising them?
So I've largely been advising on measuring capabilities and risks
in these models. So we did, for instance, a paper together last year on the weapons of mass destruction-related knowledge
that models would have. And for that, we were finding a lot of the academic
knowledge or knowledge that you would find in the literature. Like, does it really understand
the literature quite well? And we were seeing that for biology and for
bioweapons-related papers, they did. However, this just tested their knowledge, not their
know-how. So that's why we did the follow-up paper to see what their actual wet lab
know-how skills are. And those were lower, but now they're higher. And so now those vulnerabilities
need to be patched, and those patches are, I gather, underway. So we've also worked on other
sorts of things together, like in measuring the capabilities of these models, because I think it's
important that the public have some sense of how quickly AI is improving and what level it's at currently.
So a recent paper we did together was Humanity's Last Exam, where we put together various professors
and postdocs and PhDs from all over the world, and they could join in on the paper if they
submitted some good questions that stump the AI systems.
And I think this is a fairly difficult test, so it was: think of something really difficult
that you encountered in your research and try and turn that into a question.
And I think each person, each researcher probably has one or two of these sorts of questions.
So it's a compilation of that, and I think when there's very high performance on that benchmark,
that would be suggestive of something that has, say, in the ballpark of superhuman
mathematician capabilities.
And so I think that would revolutionize the academy quite substantially because all the theoretical
sciences that are so dependent on mathematics would be a lot more automatable.
You could just give it the math problem and it could probably crack it better than nearly
anybody on Earth could.
So that's an example capability measurement that we're looking at.
We excluded from Humanity's Last Exam any virology-related skills.
So we were not collecting data for that because we didn't want to incentivize the
models getting better at that particular skill through this benchmark.
And how's the AI doing today on that exam?
They're in the ballpark of like 10 to 20% overall, the very best models.
So it'll take a while for it to get to 80 plus percent.
But I think once it is 80 plus percent, that's basically a superhuman mathematician is one
way of thinking of it. But the thing is, they're at 10 to 20% now. And many experts within the
AI field, the practitioners, we had Yann on a couple weeks ago talking about how we're getting
to the point of diminishing returns with scaling, right? That current growth trajectory of,
or the current trajectory of generative AI in particular, is limited because basically the labs
have maxed out their ability to increase its capabilities. So I'm curious what you think,
whether you think that's right, because you're obviously working with these companies, working with
xAI, you're working with Scale.
If we are getting to this data wall or some wall or some moment of diminishing marginal
return on the technology, is it possible that all this fear is somewhat misplaced?
Because if the AI is not going to get much better than it is right now, at least with the
current methods, you know, we may not be a year or two away from AGI, right?
We may not be getting AGI at the end of 2025, like some people are suggesting.
And so then maybe we shouldn't be as afraid because, again, the stuff is limited.
Yeah, so if we were trapped at around the capability levels that we're at now,
then that would definitely reduce urgency and, you know,
I mean, one could chill out a bit more and take it easy.
But I'm not really seeing that.
I think maybe what he's referring to is the sort of pre-training paradigm,
sort of running out of steam.
So if you take an AI, train on a big blob of data
and have it just sort of predict the next token
to do what basically gave rise to older models like GPT4,
that sort of paradigm does seem like it's running out of steam.
It has held for many, many orders of magnitude,
but the returns on doing that are lower.
That is separate from the new reasoning paradigm that has emerged in the past year,
which is where you train models on math and coding types of questions with reinforcement learning.
And that has a very steep slope.
And I don't see any signs of that slowing down.
That seems to have a faster rate of improvement than the pre-training paradigm,
the previous paradigm had.
And there's still a lot of reasoning data left to go through
and do reinforcement learning on.
So I think we have quite a number of months
or potentially years of being able to do that.
And so personally, I'm not even thinking too specifically
about what AIs will be looking like in a few months.
They'll be, I think, quite a bit better at math and coding.
But I don't know how much better.
I'm largely just waiting because the rate of improvement is so high and we're so early on in this new paradigm that I don't find it useful to try and speculate here.
I'm just going to wait a little while to see.
But I would expect it to be quite a bit better in each of these domains, in these STEM domains.
Right.
I guess reasoning does make it better at the areas that you're mostly concerned about.
Yeah, yeah, that's right.
Because when it goes, tell me again if I'm wrong, when it goes step by step,
it's much better at executing and working on these problems than if it's just printing answers.
Yeah, and there is a possibility, and this is sort of a hope in the field, I don't know whether it will happen,
that these reasoning capabilities might also give rise to these agent types of capabilities,
where it can do other sorts of things like make a PowerPoint for you
and do things that would require operating over a very long time horizon.
Potentially those would fall out of this, that skill set would fall out of this paradigm,
but it's not clear.
There has been a fair amount of generalization
from training on coding and mathematics
to other sorts of domains like law, for instance.
And maybe if those skills get high enough,
maybe it will be able to sort of reason its way
through things step by step
and act in a more coherent,
goal-directed way across longer time spans.
I'm going to try to channel Yann here a little bit.
I think he would say that this is still going to be constrained
by the fact that AI has no real understanding
of the real world.
Well, I don't know.
It sounds like almost a no true Scotsman type of thing.
Like, it's like what's real understanding?
Right, okay.
Like, um, let me give you an example.
If it's, if it's sort of like, if it can do the stuff, that's what I care about.
But if it like doesn't satisfy some like strict philosophical sense of something,
you know, some people might find that compelling, but I don't.
I'll give you an example, like with the video generators.
Like if AI really understood physics, uh, then, you know, when you try it, when you say,
give me a video of a car driving through a haystack,
it will actually be a car driving through a haystack,
as opposed to what I've done, which is give it that prompt
and it's just hay exploding onto the front of a car
with perfectly intact hay bales in the background.
I think that for a lot of these sorts of queries,
at least with images, for instance,
we'd see a lot of nonsensical arrangements of things
and things that don't make much sense
if you look at it more closely.
But then as you just scale up the models
then they tend to just kind of get it increasingly.
So we might just see the same for images,
or excuse me, for video.
I think as well they have like some good world model stuff.
Like they'll have like vanishing points being more coherent.
And like if I were drawing or anything like that,
I'd probably be lacking, you know,
lacking in understanding of the physics and geometry of the situation
and making things internally coherent relative to them.
So, I don't know, yeah, they seem pretty compelling and have a lot of the details right,
including some of the more structural details.
But there'll be gaps that one can keep zooming into.
But I just think that that set will keep decreasing, as was sort of the case with images and text before.
I mean, text, back in the day, got the same argument.
It doesn't have a real understanding of causality.
It's just sort of mixing together words and whatnot.
And that was when it was barely able to construct sentences coherently.
Now it can.
And then, yeah, and now it can.
So I don't know if it, like, then got a real understanding in the sort of philosophical sense that he's thinking for language, but it was good enough.
And that might be the case with video as well.
There were points where I was like, oh, but it is getting the guy sitting on the chair when I say, you know, do a video of a guy sitting on a chair and kicking his legs.
And those legs are kicking.
And they are bending at the joints.
So there must be some understanding there.
Yeah, in some ways.
But if you ask them to do like gymnastics
then it'll just have legs flailing all over.
No, the person just disappears into the floor.
Okay, like you said at the beginning,
where ChatGPT isn't going to kill us yet.
Let's talk about hacking.
I do think that we glossed over it a little bit before,
but in terms of, we're now going through, I think,
the humans-plus-AI problem, right?
And hacking to me is one that I think we should definitely focus on.
You mentioned that we're still not quite there,
but it does seem to me, again,
I'm just going to go back to the point I made earlier,
you can really code stuff up with these things,
and they enable, like, pretty impressive code
already. You could think that
ChatGPT could produce
pretty good phishing emails if you just kind of
creatively, and not just ChatGPT but all
of these GPT models,
if you creatively prompt it right,
it will give you an email that you can send and try
to phish somebody.
Or even, let's say, you just take an open source model
like DeepSeek, download it, and then
run it without safeguards.
So where's the risk with hacking?
I know you said it's a little bit further off.
Why is it further off?
And what should people be afraid of?
Or what should people be concerned of?
Yeah, yeah.
So the risk from it, more of the risk comes from when they're able to autonomously do the hacking themselves.
So trying to break into a system, finding an exploit, escalating privileges, causing damage from there, things like that.
And that requires multiple different steps and these agential skills that I keep referring to that they currently don't have.
So although they could facilitate in, like, ransomware development and other forms of malware,
for them to autonomously execute and infiltrate systems, that is something that will require these new agential skills.
And I don't see, it's very unclear when those arrive.
Could be a few months from now, could be a year from now, and I'm a little more
suspicious it maybe would even take two years for that. So that's something for us to get
prepared for, figure out how we're going to deal with that, try and make safeguards increasingly
robust to people trying to maliciously use it in those ways. But yeah, I think much of the
risk comes from being able to take
one of these AIs, let's say one of these DeepSeek AIs, let's say it's a DeepSeek agent version,
and it's able to actually do these cyberattacks.
Then you could just run 10,000 of them simultaneously.
And then you, you know, some rogue actor could have it target critical infrastructure.
Then this is causing quite severe damage.
So for, like, critical infrastructure, you know, this could be like, have it disable the detector
or the filter in, you know, a water plant or something like that.
Then the water supply is, like, ruined.
or you could target these thermostats in various homes because they're, you know, often some of the more advanced ones are connected to Wi-Fi.
And then you sort of turn them up and down simultaneously.
And this can just like ruin like transformers and like blow them.
And then, you know, they take multiple years to replace.
Things like that.
And but they aren't capable of doing that sort of thing currently.
So it's more of a on the horizon type of thing.
But I'm not like feeling the urgency with that currently.
I'm more concerned about, I think,
the geopolitics of this,
like making sure that states are aware of what's going on in AI,
like they're at least able to follow the news and things like that in some capacity.
I think things like that feel more urgent to me than trying to address cyber risks.
There are things to do, though, and I think we should create incentives beforehand, but
maybe I'm too much of an optimist for my own good, but when I hear you talk about this,
I also get a little bit excited about the capabilities of these programs, because, for instance,
if AI can enhance the function of a virus, AI can probably create a vaccine, make medical
discoveries.
If AI can hack into the infrastructure of some country, right, find exploits and
turn the thermostats up and down, then AI could probably do incredible amounts of very
beneficial coding and computer work for humanity. So if we do get to that point, it seems to me
like there's going to be these maybe two poles here, right? One is the potentially scary and
destructive stuff that you can mitigate, right, with some of the controls that you talked about,
but also amazing opportunity. Yeah, so the thermostat thing was for
messing with the electricity and that causing strain on the power grid and destroying
transformers, just for clarification. But yeah, I think you're pointing at
the fact that it's dual use. So I'm not saying AI is bad in every single way, and it's like other
dual-use technologies. Bio is a dual-use technology: it can be used for bioweapons, it can be used for
healthcare. Nuclear technology is dual use; there are civilian applications for it as well,
and chemicals too. And we have managed all of those other ones
by selectively trying to, you know, limit some particular types of usage and
restricting the access of rogue actors to some of these technologies and making
sure there are good safeguards for the civilian applications. And then we can
actually capture the benefits. So it's not an all or nothing type of thing with
AI. It's what are surgical restrictions one can place so that we can keep
capturing the benefits. And so, for instance, with virology,
that's a matter of: you add the safeguards, and then the researchers who want access to those can speak to sales.
That's basically a resolution of that problem, provided that you have the models kept behind APIs.
Now, on this dual-use part, though, there's an offense-defense balance.
So for some applications, it can help, it can hurt, and maybe it helps more than it hurts,
or maybe it will hurt more than it will help.
So in bio, I think that is offense dominant.
If somebody creates a virus, there's not necessarily a cure that it will immediately find for it.
If it would help a rogue actor make a somewhat compelling virus, now that could be enough to cause many millions to die.
And it may take months or years to find a cure.
There are many viruses for which we have not found cures yet.
And for cyber, in most contexts, there's a balance between offense and
defense, where if somebody can find a vulnerability with one of these hacking
AIs, then they could also use that to patch the vulnerability.
There is an exception, though, where in the context of critical infrastructure, there the
software is not updated rapidly.
So even if you identify various vulnerabilities, there will not necessarily be a patch because
the system needs to always be on, or there are interoperability constraints, or the company
that made the software is no longer in business, these sorts of things.
So our critical infrastructure is a sitting duck.
And so in that context, cyber is offense dominant.
But in normal contexts, there's roughly a balance.
And for virology, I think that's largely offense dominant.
So before we go to the nation state element of this, I need to ask you a question about the actual research houses themselves.
Every research house says they're concerned with safety.
From OpenAI to xAI, everything in the middle.
Maybe not DeepSeek.
We'll get to DeepSeek.
Yet they're the ones that are building this technology.
And I find it a little strange, it's weird, that you have companies that are saying,
we have to build this and advance this technology so we can keep people safe.
I never really understood that message.
Yeah, I don't know if it's to say that we need to keep people safe.
I think it's more that the main organizations that have power in the world now are largely companies.
And so if one's trying to influence the outcomes, one basically needs to be a company, is how many of them will reason.
They'll think that, yeah, you could be in civil society or you could protest, but this will not determine the course of events as much.
So, sort of, many of them are buying themselves the option to hopefully influence things in a more positive direction, but most of the effort will be to stay competitive and stay in this arena.
So I think over 90% of the intellectual energies that they're going to spend is actually
how can we afford the 10x larger supercomputer?
And that means being very competitive, speeding this up, and making safety be some priority,
but not necessarily a substantial one.
So I do think there is sort of an interesting contradiction or something that looks like a
contradiction there.
But I think if we think back to nuclear weapons, nuclear weapons, nobody wants nuclear weapons.
If there'd be zero on Earth, fantastic, you know, that would be a nice thing to have if that would be a stable state.
But it's not a stable state.
One actor may then develop nuclear weapons, and they could destroy the other.
So this encourages states to do an arms race, and it makes everybody all collectively less secure, but that's just how the game theory ends up working.
So you get a classic, what's called a security dilemma:
everybody's worse off collectively. And even if you take it seriously, you see, yes, nuclear
technology is dual use and potentially catastrophic, and we need to be very risk-conscious
about it.
You can agree with all those things, but you still might want nuclear weapons because other
parties will also have nuclear weapons, and unilateral disarmament in many cases, it just
didn't make game theoretic sense.
So in the same way, an individual company pausing their development while others race ahead doesn't make game-theoretic sense.
So I think this just points to the fact that the game theory is kind of confusing.
And so you're getting some things that are seeming contradictions that if you use a nuclear analogy,
you go, yeah, I suppose that makes sense.
And it's just kind of an ugly reality to internalize.
Doesn't that discount the fact that, like, these companies, if they want to influence, like, the way things are going,
they are going to be, it's like, you're one and the same. Yes, you're influencing, but without you
this wouldn't be moving as fast as it is. Like, it is interesting, for instance, think about Elon Musk, right?
Obviously he has you in two days a week to work on safety inside xAI, but he's also putting together,
what, million-GPU data centers to build the biggest, baddest LLM ever. Um, well, if he didn't,
then he would be having less influence over it. So it's, um, it's not something that
I would envision, everybody just sort of voluntarily pausing.
So subject to companies not sort of voluntarily rolling over and dying, then what's the best
you can do subject to those constraints?
But the competitive pressures are quite intense, such that they do end up prioritizing,
focusing on competitiveness, and other priorities like the budget for safety research
will generally be lower than would be nice to have
if this were a less competitive environment.
Do you think Elon is more interested in restoring this original vision
that he had for OpenAI, making everything open source,
making it safe, I would imagine.
He founded OpenAI with Sam Altman as sort of a beachhead against Google
because he was afraid of what Google was going to do with this technology.
So I'm curious if you think that xAI is along that mission, or is he more interested in the sort of soft cultural power that comes with having the world's best AI?
For instance, like you can change the way that it speaks about certain sensitive political issues.
It can be anti-woke, which we all know is sort of where Elon stands.
So what do you think his true interest lies in building XAI?
Well, I think the, and I won't, you know, position myself as sort of speaking on his behalf.
Yeah, we won't put you down as an Elon spokesperson, but you are in there a couple times a week.
So I think that the mission is to understand the universe, and so this means having AIs that are
honest and accurate and truthful to improve the public's understanding of the world.
So we will be getting in a very fast-moving, trying situation with AI if it keeps accelerating,
and so good decision-making will be very important, and us understanding
the world around us will be very important. So if there are more features that enable truth-seeking
and honesty and good forecasts and good judgment and institutional decision-making,
those would be great to have. The hope is that Grok could help enable some of that
so that civilization is steered, is steering itself more prudently in this potentially more turbulent
period that's upcoming. That's one read on the mission statement. But I think the
objective of it is to understand the universe, and there are different sub-objectives that that would
give rise to. And I think its ability to help culture process events without censorship
or political bias one way or the other
is a stated objective
and I think that would be indispensable
in the years going forward.
Do you buy that that's what they're doing?
Because we also heard the same thing from Elon
when it came to buying Twitter, now X.
I think Community Notes has been quite good.
But that was something that was built under Jack Dorsey.
I'm not going to ask you to take sides.
I'm going to just observe empirically what I've seen.
I mean, we know that
Substack links have been deprioritized because it was seen as a competitor with Twitter.
We know that Musk, I think, according to reporting, changed the algorithm to have his tweets show up more often.
And his tweets took a strong stance towards supporting Donald Trump in the election.
So to me the idea that like hearing again from Elon, and again, look, I respect what Elon's done as a business person,
but hearing again that he has a plan to make a culturally relevant product,
that's free of censorship and politically unbiased, I don't know if I believe that anymore.
So I don't know about some of the specific things, such as the, you know, weighting thing or something
like that, profile things, for instance. I think that overall, in terms of cultural influence,
in people being more disagreeable and doing less self-censoring, it has been successful. I think that
was the main objective of it.
And so I think, I think that X had a large role to play there.
So, I don't know.
I think like, I think in terms of shaping discourse norms in the U.S.,
that seems to have been successful in my view.
Yeah, I'm not saying pre-Elon Twitter didn't censor, which is the wrong,
probably the wrong word because that's usually from the government, didn't sort of shape
the definition of speech to its own liking.
It obviously had a progressive approach
and moderated speech with a progressive approach.
I just don't think Elon is not using his own influence
when it comes to how he runs X.
But you and I could speak about this forever.
This isn't even my sort of wheelhouse as much.
But yeah, I mean, it's sort of, like, since I'm doing the xAI stuff...
You brought it up.
Oh, okay.
All right.
I mean, just the non-biased and truthful thing.
So it's worth talking about.
So, I mean, it is, if there are like ways in which it's like extremely
biased one way or the other, that's useful to know. This is a thing that is continually trying to
be improved, at least for xAI's Grok. And I think that all of the sort of product offerings
could get quite a bit better at this. But I'm not speaking as a sort of representative there or
anything like that, but I guess maybe in my, I guess right now, in my personal capacity,
I think that there are things to improve on for all these models in terms of their
bias. All right. We agree on that front. You hinted at it previously, but talk a little bit
about how, basically, you don't think it's a good idea for there to be an arms
race here. And certainly there is one between the U.S. and China. We know that the U.S. has put export
controls on China. China has in some ways
gotten around them through like
very creative procurement
processes that go through Singapore, right?
We can probably say that with a pretty good degree
of confidence. Then of course we see the release
of DeepSeek and some other AI
applications from China, and everyone's trying to
build the better AI so that they
have the soft power, like we spoke about,
to effectively,
you know, control, to influence
culture across the world, but also it's an
offensive capability and
a defensive one, like you're saying. If your
country has the ability to manipulate viruses or to do cyber hacks, you become more powerful
and you get to sort of, you know, potentially put your view of the world and implant your view
of the world on the way that it operates. You have a paper out that's sort of arguing against
this arms race. It's called Superintelligence Strategy. It's with you, Eric Schmidt, who we all know,
former CEO of Google. I think he just started, he's taking over a drone company, so you can tell
me a little bit about that. And Alexandr Wang, the, I don't know, not the former,
the current CEO of Scale AI, who's been on this show before.
Talk a little bit about why you don't think it's a good idea for countries to pursue this arms race.
You say it might be leading us to mutually assured AI malfunction,
like mutually assured nuclear destruction.
I think that's where you get that from.
Yeah, so the strategy has three parts, one of which is competitiveness,
but we're saying that some forms of competition could be destabilizing
and that you may be irrational to pursue it
because you couldn't get away with it.
So in particular, this, I'm making a bid for superintelligence
through some automated AI research and development loop
could potentially, to one state having some capabilities
that are vastly beyond another states.
If one state gets to experience,
a decade of development in a year, and the other one is the year behind, then this results
in a very substantial difference in the states' capabilities.
So this could be quite destabilizing if one state might then start to get an insurmountable
lead relative to the other.
So I think that form of competition would be very dangerous, because there's a risk of loss
of control, and because it might incentivize
states to engage in preventive sabotage or preemptive sabotage to disable these sorts of projects.
So I think states may want to deter each other from pursuing superintelligence through this means.
And this then means that AI competition gets channeled into other sorts of realms,
such as in the military realm of having more secure supply chains for robotics, for instance,
and for AI chips,
having reduced sole source supply chain dependence on Taiwan for making AI chips.
So states can compete in other dimensions,
but them trying to compete to develop superintelligence first,
I think that seems like a very risky idea,
and I would not suggest that because there's too much of risk of loss of control,
and there's too much of a risk that one state,
if they do control it, uses it to disempower others
and affects the balance of power far too much and destabilizes things.
So, but the strategy overall, think of the Cold War.
Before you go on the strategy, like, my reaction to that is, good luck telling that to China.
So I think it's totally, so for the, for deterrence, I think if the U.S. were pulling ahead,
both Russia and China may have a substantial interest in saying, hey, cut this out,
pulling ahead to develop superintelligence, which could give it a huge advantage and an ability to crush them.
They'd say, you don't get to do that.
We are making a conditional threat that if you keep going forward in this, because you're on the cusp of building this, then we will disable your data center or the surrounding power infrastructure so that you cannot continue building this.
I think they could make that conditional threat to deter it, and we might do the same, or the U.S. might do the same to China or other states that would do that.
So I don't see why China wouldn't do that later on.
Right now, they're not thinking as much about superintelligence and advanced AI.
So this is more of a description of the dynamics later on when AI is more salient.
But it would be surprising to me if China were saying, yes, the United States, go ahead, do your Manhattan project to build superintelligence, come back to us in a few years, and then tell us, you can boss us around because now we're in a complete position of weakness and we'll be at your mercy and we'll accept whatever you say or tell us to do.
I don't see that happening. I think they would just move to preempt or deter that type of development
so that they don't get put in that fragile position. Are you in, like, the Eliezer Yudkowsky
camp of bombing the data centers if we get to superintelligence? Well, so I think I'm advocating
or pointing out that it becomes rational for states to deter each other by making conditional
threats and by means that are less escalatory, such as cyber sabotage on data centers
or surrounding power plants, I don't think one needs to get kinetic for this.
And I think that if discussions start earlier, I don't see any reason things need to be
escalating in that way or unilaterally actually doing that.
We didn't need to get in a nuclear exchange with Russia to express that we have a preference
against nuclear war.
So I think...
Thank goodness.
So indicating or making conditional threats through deterrence seems like a much smarter
move than, hey, wait a second, what are you doing there?
And then bombing that.
That seems needless.
Yeah, I'm not into that.
Yeah.
But what you're talking about is sort of assuming that there will be a lead that will be
protectable for a while.
But everything we've seen with AI is that no one protects a lead, right?
Well, if there's, so one difference is that when you get to a different paradigm, like automated AI R&D,
the slope might be extremely high, such that if the competitor starts to do automated AI R&D a year later,
they may never catch up just because you're so far ahead and your gains are compounding on your gains.
Sort of like in social media companies, Eric will use this analogy, where if one of them starts blowing up and growing before you started, it's often the case that you won't be able to catch up and they'll have a winner-take-all type of dynamic.
So right now, the rate of improvement is not that high or there's less of a path for a winner-take-all dynamic currently.
But later on, when you have the ability to run 100,000 AI researchers simultaneously,
this really accelerates things.
Maybe OpenAI's got a few hundred, let's say 300 AI researchers,
so going from 300 AI researchers to orders of magnitude more world-class ones
creates quite substantial developments.
This is something that isn't new.
This is something that, like, Alan Turing and the founders of computer science
had pointed out: it's a natural property that when you get AIs at this level of capability,
then this creates this sort of recursive dynamic where things start accelerating extremely quickly and quite explosively.
Okay. We managed to spend most of our conversation today talking about present risks or like risks in the near future.
We should focus a little bit more on intelligence explosion and loss of control, and we're going to do that right after the break.
Hey everyone, let me tell you about The Hustle Daily Show, a podcast filled with business,
tech news, and original stories to keep you in the loop on what's trending.
More than 2 million professionals read The Hustle's daily email for its irreverent and informative
takes on business and tech news.
Now, they have a daily podcast called The Hustle Daily Show, where their team of writers
break down the biggest business headlines in 15 minutes or less and explain why you should
care about them.
So, search for The Hustle Daily Show in your favorite podcast app, like the one you're using
right now. We're back here on Big Technology Podcast with Dan Hendrycks. He is the director and
co-founder of the Center for AI Safety. Dan, it's great speaking with you about this stuff.
Let's talk a little bit. You've been sort of talking about it in the first half, but I want to
zero in here on this idea of intelligence explosion, or what you talk about as basically
having AI autonomously improve itself. Just talk through a little bit about how that might
happen and whether you see that being something that is actually probable
in our future.
Yeah.
I mean, the basic idea is just imagine automating one AI researcher, one world-class one.
Then there's a fun property with computers, which is there's copy and paste.
So you can then have a whole fleet of these.
Well, you know, with humans, you know, if you just have one of them, you know, it's maybe
they'll be able to train up somebody else who has a similar level of ability.
So this adds a very interesting dynamic to the mix.
And then you can get so many of them proceeding forward at once.
And, you know, AIs also operate quite quickly.
They can code a lot faster than people.
So maybe it's, maybe you've got 100,000 of these things operating at 100x the speed of a human.
How fast will that go?
Maybe conservatively, let's say it's just overall 10xing research.
But 10xing research would mean, say, like a decade's worth of development in a year.
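(As an aside, here is a minimal back-of-envelope sketch of the arithmetic Hendrycks describes; all of the figures, the 100,000 copies, the 100x speed, and the conservative 10x net multiplier, are the illustrative numbers from the conversation, not measured values.)

```python
# Back-of-envelope sketch of the "decade in a year" arithmetic from the
# conversation. All figures are illustrative, not measured values.

num_copies = 100_000        # automated AI researchers run in parallel
speed_multiplier = 100      # each assumed to run ~100x the speed of a human

raw_parallel_capacity = num_copies * speed_multiplier   # 10,000,000 human-equivalents
overall_speedup = 10        # conservative net research multiplier after
                            # coordination overhead, compute limits, diminishing returns

print(f"Raw parallel capacity: {raw_parallel_capacity:,} human-researcher-equivalents")
print(f"Net speedup of {overall_speedup}x ~= a decade of development per calendar year")
```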
That telescoping of all these developments makes things pretty wild and means that one player could
possibly get AIs that go from very good, you know, world class to being vastly better than
everybody at everything into superintelligence, something that towers far beyond any living
person or collective of people. So if we get an AI like that, this could be destabilizing
because it could be used to develop a superweapon potentially.
Maybe it could find some breakthrough for anti-ballistic missile systems,
which would make nuclear deterrence no longer work or other types of ways of weaponizing it.
So that's why it's destabilizing.
So states, then, if they're seeing this, might say, oh, you know, don't run this many AI researchers
simultaneously in these data centers
working to build a next generation or superintelligence,
because if you do so, then that will put us in,
that will make our survival be threatened.
So them saying that, them deterring that, would help them secure themselves.
And they can make those threats very credible currently,
and I think we'll continue to be able to have
these threats be credible going forward.
So this is why I think it might take a while
for superintelligence to be developed,
because there'll be deterrence around it later
on. And then maybe in the farther future, there could be something multilateral, but that's
speaking quite far out in very different economic conditions. In the meantime, with AIs that we'd
have in the future, those could still automate various things and increase prosperity and all
of that. So we'd still have explosive economic growth if you had something that was just
at an average human level ability, running for very cheap.
So I think that those are some of the later stage strategic dynamics, and I don't think we can get away with, or I don't think any state could get away with, trying to build a superintelligence, go build a big data center out in the middle of the desert, a trillion-dollar cluster, bring all the researchers there, without the other states going, what do you think you're doing here?
You were at the White House yesterday.
Well, this is largely just sort of speaking about some of these, you know, strategic implications.
Are they receptive?
Yeah, I mean, it's a, it's a, this isn't a, there's always, there's always interest in, you know, thinking about what are some of the longer-term dynamics, what things should happen now and whatnot.
But this is, yeah, I think, I think when people think White House,
it sounds, you know...
Well, it's the home of the president.
So there's the...
Well, yeah, so there's the Eisenhower building,
which is, you know, part of the White House, kind of, not quite.
But, you know, that's where everybody works and whatnot.
I think, you know, some of the things we were speaking about here,
like virology advancements, things like that,
there's just a lot of, you know, things to speak about
and think about what things make sense
or what things to keep in mind going forward.
So, yeah.
Yeah, I guess I'd rather have an executive branch paying
attention to this stuff than not. Yeah, yeah, that's right, yeah. Yeah, and what are the sort of ways
that help, you know, maintain competitiveness? Because, you know, how people normally think about
this, they'll think it's all or nothing, a good or bad thing. And then we're saying, no, it's dual
use. So that means there are some particular applications that are concerning and there are other
applications that are good. And you want to stem the particularly harmful applications and what are
ways of doing that while capturing the upside. Right. Okay. So the intelligence
explosion part of this conversation naturally brings up the loss of control part,
where, to me, I think the thing is, when people think about AI harm, they are always worried
that AI is going to escape the simulation or whatever it is and act on its own and try to basically
ensure that it preserves itself.
We've seen it recently, I think I brought this up at the beginning of the show, where Anthropic
has done some experiments where the AI has run code to try to copy itself over onto a server
if it thinks that its values are at risk of being changed.
Is this, so it's fun to think about,
but it's also like probably just probability.
Like if you run it enough times because it's a probabilistic engine.
If it was like, oh, it's only one in a thousand of them intend to do this.
Well, if you're running a million of them,
then you're basically certain to get many of them to try and, you know, self-exfiltrate.
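(To make the one-in-a-thousand-times-a-million point concrete, here is a minimal sketch of the underlying probability arithmetic; it assumes independent instances and a fixed per-instance rate, both simplifying assumptions, and the numbers are just the ones mentioned in the conversation.)

```python
# Probability that at least one of N independently run instances attempts
# self-exfiltration, given a per-instance probability p. Independence and a
# fixed rate are simplifying assumptions; the numbers are illustrative.

def prob_at_least_one(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

p = 1 / 1000       # "only one in a thousand of them intend to do this"
n = 1_000_000      # "you're running a million of them"

print(f"Expected number of attempts: {p * n:.0f}")            # ~1,000
print(f"P(at least one attempt): {prob_at_least_one(p, n)}")  # effectively 1.0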
And so are you worried that this self-exfiltration is going to be a thing?
I think from a, you know, a recursive automated AI R&D thing, I think that has really substantial probability behind it of a loss of control in that situation.
So you're worried about this.
So there's that, but I would distinguish between that and these sorts of things that are not superintelligences or things that are not coming from that sort of really rapid loop, like the currently existing systems.
I think that the currently existing systems are relatively controllable, or if there is some very concerning failure mode, we have been able to find ways to make them more controllable.
For instance, for bioweapons refusal, we used to not be able to make robust safeguards for them two years ago.
But we've done research with methods such as what are called circuit breakers and things like that.
And those seem to improve the situation quite a bit and make it actually
prohibitively difficult to do that jailbreaking. And so maybe we'll find something similar with
self-exfiltration. So I think people generally want to claim that, like, oh, current AIs are not
controllable. And I think that they're not highly reliably controllable. They're reasonably
controllable. Maybe we could get some, or it seems plausible that we'll get to have increasing
levels of reliability. And so I'm sort of reserving judgment. It'll depend more on the empirical
phenomena. So I think everybody should research this more and we'll sort of see what the risks
actually are. But there are some that seem less empirically tractable, or things that can't be empirically solved, like this loop thing. Like, how are you going to, you can't run this experiment a hundred times or something like that and make it, you know, go well. You're making a huge attempt at building a superintelligence, and that has destabilizing consequences. That's totally unprecedented.
And for that, you have more of like a one chance to get it right type of thing.
But with the current systems, we can continually adjust them and retrain them
and come up with better methods and iterate.
So it is concerning.
It would not surprise me if this would really start to make AI development itself extremely hazardous, instead of just the deployment, where, inside the lab, you need to be worried about the AI trying to break out sometimes. That's totally in the realm of possibility. But yeah, I could see it going either way.
Yeah, I mean, this personally freaks me out because, yeah, if you see the AI trying to deceive
evaluators, for instance, or you see the AI trying to break out, you really can't trust anything
it's telling you. And we had Demis Hassabis on the show a little while ago, and he's basically like, listen, if you see deceptive behavior from AI, if you see alignment faking, you really can't trust anything in the safety training, because it's lying to you.
There is truth to that.
Are you seeing deceptiveness at Grok, by the way?
Oh, yeah, yeah.
So we have a paper out last week.
We're just measuring the extent to which they're deceptive.
And in the scenarios we have, all the models were under, you know, slight pressure to lie, not being told to lie, but just some slight pressure.
Then some of them will lie like 20% of the time, some of them like 60% of the time.
So they don't really have this sort of virtue, sort of baked into them, the virtue of honesty.
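(A minimal sketch of how that kind of lie-rate measurement can be tallied; the scenario strings, the `ask_model` and `is_lie` callables, and the dummy stand-ins below are hypothetical placeholders, not the actual harness from the paper.)

    from typing import Callable, List

    def lie_rate(ask_model: Callable[[str], str],
                 is_lie: Callable[[str, str], bool],
                 scenarios: List[str]) -> float:
        """Fraction of scenarios in which the model's reply counts as a lie.

        ask_model: sends a scenario prompt to a model and returns its reply.
        is_lie: judges whether the reply contradicts what the model separately
                reports believing when asked neutrally (a judging step this
                sketch leaves abstract).
        """
        if not scenarios:
            return 0.0
        lies = sum(1 for s in scenarios if is_lie(s, ask_model(s)))
        return lies / len(scenarios)

    # Illustrative usage with dummy stand-ins:
    scenarios = ["pressure scenario A", "pressure scenario B"]
    fake_ask = lambda prompt: "model reply"
    fake_judge = lambda prompt, reply: False
    print(f"Lie rate: {lie_rate(fake_ask, fake_judge, scenarios):.0%}")  # 0%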
So I think we'll need to do more work and we'll need to do it quickly.
So I'm sort of speaking in a more nonchalant way about this, but I can't, like, you know, get worked up about every single risk, because then I'd also just, you know, be at an 11 all the time.
So there's some that I'm, you know, putting in different tiers than other risks.
And this is a more speculative one.
We've seen some of these turn out to be surprisingly handleable.
But, yeah, it could end up making things really, really bad.
We'll see.
We'll do things about it to make that not be the case.
Okay. Thank you.
Two more topics for you then we'll get out of here.
The Center for AI Safety.
Who's funding it?
Well, so there's not sort of one funder.
It's largely just various philanthropists.
The main funder would be Jaan Tallinn, who's a Skype co-founder.
There's a variety of other philanthropies or philanthropists.
Generally, so for instance, Elon doesn't, I've never asked him to fund the center. That is to say, I don't get any money from Elon. For my appointment at xAI, I get a dollar a year. At Scale, at Scale AI, I've increased my salary exponentially to where I get $12 a year, a dollar per month from Scale. But I'll try to avoid, you know, having some complicated relations with them, just so that I can, you know, not be speaking on behalf of any of them in particular.
So you're basically doing the work for them for free? Well, but it's useful. Right. It's useful to do. And I mean, yeah, I mean, I think the main objective is, yeah, just to try and generate some value here as best as one can, by reducing these sorts of risks. Yeah, I think it's a good arrangement because it enables me to, like, you know, have a choose-your-own-adventure type of thing. Right. Now I think the politics or geopolitics, this is more relevant, so I can go off and learn about this for some months and then work on a paper there, compared to if it's like, no, you've got to be coding 80 hours a week, that's your job. That would be quite restrictive. And I couldn't be speaking with you. I'm glad you're here. So thank you, Alex.
So let's talk a little bit about this funding, because I think that after Sam Altman was fired and then rehired at OpenAI, there was a sort of skepticism around effective altruism's impact on the AI field. Even Jaan Tallinn, I'm reading from his statement right after the OpenAI governance crisis, said it highlights the fragility of voluntary EA-motivated governance schemes, and so the world should not rely on such governance working as intended. Now, Jaan is, of course, associated with EA. EA is, like, basically leading the conversation around AI safety.
Is that good?
So I think that, in terms of Jaan, I think he's funded organizations that are EA-affiliated. I don't know if he'd call himself that, but whatever, you know, people can ascribe labels how they'd like.
I think that the, I mean,
I've tweeted that EA is not equal to AI safety.
I think the EA community generally is insular on these issues.
So I lived in Berkeley for a long time during my Ph.D. And there's sort of a school, a sort of AI risk school, that had very particular views about what things are important. So malicious use, for instance, when I was talking about malicious use at the beginning of this conversation, you know, they're historically really against that. It'd be only loss of control, don't talk about malicious use, that's a distraction. And so that was annoying, because I'd always been working on robustness as a PhD student, where the main thing was malicious use. So, yeah, I ended up leaving Berkeley before graduating just because of the sort of relatively suffocating atmosphere and the sort of central focus on whatever the fad of the moment was.
There'd be some new fad and you'd have to get interested in that.
Some ELK, eliciting latent knowledge: this is the important thing that you have to focus on. Or you have to focus on inner optimizers.
There's lots of these speculative, empirically fragile things.
So, for instance, this alignment faking stuff that you're seeing.
Like, there's some concern there, but, you know, I'm not totally sold that this is like a top-tier type of priority.
But in these communities, this is all that matters currently, roughly speaking.
Or these voluntary commitments from AI companies, I think voluntary commitments from AI companies are also a distraction, because you should expect most of them by default to just break those sorts of commitments if they end up going up against economic competitiveness.
Okay.
So I think it's a distraction relatively.
And so I think there are many people who think that EA broadly, their influence on this sort of thing, has not been overall positive. I think, at least for me and other researchers in this space who've been interested in AI risks, the amount of pressure to adopt some particular positions on this has been extraordinarily high, and I think quite, quite destructive.
So I'm very pleased now that in the past year or so, there's been a lot more diversity of opinion, which has been quite important.
And I think this is just because the broader world is getting more interest in AI.
So a lot of this, you know, fixation on, this is the one particular risk, this is the most important risk, and everything else is a distraction, that just doesn't work when you're speaking with, or interfacing with, the real world. There are a lot of complications, and AI is so multifaceted. So your risk management approach can't just be focusing on one of them. Right. So you're not
an effective altruist? I don't think of myself as that. I don't particularly
get along with this school of thought, this sort of Berkeley AI alignment monolith.
And I'm pleased that people can be more independently operating in this space now,
which I don't think was the case for many, many years,
including basically the entire time during my PhD.
And there are many people, like Dylan Hadfield-Menell, a professor at MIT who was also at Berkeley at the time, very suffocating. Rohin Shah, a researcher at DeepMind, very suffocating. They all feel this way, yeah.
Okay.
Let's bring it home.
We've been talking for more than an hour about AI safety as if it's controllable.
But open source is, like, really putting up a pretty valiant effort in this field, keeping pace with the proprietary labs. And of course, open source is not controllable. What do you think about that? I mean, we just saw DeepSeek, not to, you know, go back to it all the time, but it effectively equaled the cutting edge at the proprietary labs and, you know, put the weights on its website. So how can we possibly have a relationship
of safety with AI if open source is out there exposing everything that's been done?
So I've been, I haven't been endorsing open source historically, but I've thought that
releasing the weights of models didn't seem robustly good or bad.
So I sort of was like, it's fine, seems to have complicated effects.
There's an advantage to it, which is that it helped with diffusion of the technology, so that more people would have access to it and sort of get a sense of AI, and this would increase the literacy on this topic and just increase public awareness and get the world more prepared for more advanced versions of AI. So that's been my historical position, but it should always proceed by a cost-benefit analysis. So if, for instance, they have these cyber capabilities later on, yeah, I think that would be a potential place to be drawing the line on open-weight releases, personally, in particular for the ones that could cause damage to critical infrastructure.
You could still capture the benefits by having the models be available through APIs. And if they're, like, software developers, they have access to these more cyber-offensive capabilities. But if they're a random, faceless user, they don't.
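(A minimal sketch of what that kind of tiered API gating could look like; the tier flag, the capability names, and the check itself are hypothetical illustrations, not any lab's actual access policy.)

    from dataclasses import dataclass

    # Hypothetical set of capabilities that stay gated behind verification;
    # a real policy would also involve identity checks, auditing, rate limits, etc.
    RESTRICTED_CAPABILITIES = {"cyber_offense", "expert_virology"}

    @dataclass
    class User:
        user_id: str
        verified_developer: bool = False  # e.g., passed a know-your-customer check

    def is_allowed(user: User, capability: str) -> bool:
        """Open capabilities for everyone; restricted ones only for verified users."""
        if capability not in RESTRICTED_CAPABILITIES:
            return True
        return user.verified_developer

    # Illustrative usage:
    dev = User("dev-123", verified_developer=True)
    anon = User("anon-456")
    print(is_allowed(dev, "cyber_offense"))   # True
    print(is_allowed(anon, "cyber_offense"))  # False
    print(is_allowed(anon, "summarization"))  # True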
And likewise for virology. Once the capabilities are so high that there's consensus about them being expert-level in virology, I think that would be a very natural place to be having an international norm. Not a treaty, because that takes forever to write and ratify, but a norm against open weights if they are expert-level virologists. For the same reasons that we had the Biological Weapons Convention: Russia, or the Soviet Union, and the U.S. got together for the Biological Weapons Convention, and the U.S. and China did as well. We also coordinated on chemical weapons with the Chemical Weapons Convention, and on nuclear with the Nuclear Non-Proliferation Treaty. States find it in their interest to work together to make sure that rogue actors do not have extremely hazardous, potentially catastrophic capabilities like chem, bio, and nuclear inputs. So I think something similar might be reasonable for AI when they get to that
capability threshold. Dan, I am at once kind of reassured that people are thinking about this
stuff, but also more freaked out than I was when we sat down. But I do appreciate you coming in
and giving us the full rundown of what to be concerned about and what maybe not to be as concerned
about as we think about where AI is moving next. So thank you so much for coming on the show.
Yep, yep, thank you for having me. This has been fun.
Super fun. If people want to learn more about your work or get in touch, how do they do that?
I guess this paper or strategy you've been speaking about is at nationalsecurity.ai, and I'm also on Twitter or X or whatever it's called.
You should know what it's called, you work with them.
At X.com.
I'm at x.com slash DanHendrycks. That would be another way of following the goings-on as the situation evolves.
We'll keep trying to put out work and seeing what's going on with these risks.
and if we come up with technical interventions to make them less severe, then we'll also put those out too.
So, yeah, that's where you can find me.
Well, Godspeed, Dan, and we'll have to have you back.
Thanks again.
All right, everybody, thank you for listening,
and we'll see you next time on Big Technology Podcast.