Dwarkesh Podcast - Ilya Sutskever – We're moving from the age of scaling to the age of research
Episode Date: November 25, 2025

Ilya & I discuss SSI's strategy, the problems with pre-training, how to improve the generalization of AI models, and how to ensure AGI goes well. Watch on YouTube; read the transcript.

Sponsors
* Gemini 3 is the first model I've used that can find connections I haven't anticipated. I recently wrote a blog post on RL's information efficiency, and Gemini 3 helped me think it all through. It also generated the relevant charts and ran toy ML experiments for me with zero bugs. Try Gemini 3 today at gemini.google
* Labelbox helped me create a tool to transcribe our episodes! I've struggled with transcription in the past because I don't just want verbatim transcripts, I want transcripts reworded to read like essays. Labelbox helped me generate the exact data I needed for this. If you want to learn how Labelbox can help you (or if you want to try out the transcriber tool yourself), go to labelbox.com/dwarkesh
* Sardine is an AI risk management platform that brings together thousands of device, behavior, and identity signals to help you assess a user's risk of fraud & abuse. Sardine also offers a suite of agents to automate investigations so that as fraudsters use AI to scale their attacks, you can use AI to scale your defenses. Learn more at sardine.ai/dwarkesh

To sponsor a future episode, visit dwarkesh.com/advertise.

Timestamps
(00:00:00) – Explaining model jaggedness
(00:09:39) – Emotions and value functions
(00:18:49) – What are we scaling?
(00:25:13) – Why humans generalize better than models
(00:35:45) – SSI's plan to straight-shot superintelligence
(00:46:47) – SSI's model will learn from deployment
(00:55:07) – How to think about powerful AGIs
(01:18:13) – "We are squarely an age of research company"
(01:30:26) – Self-play and multi-agent
(01:32:42) – Research taste

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Transcript
You know, it's crazy that all of this is real.
Yeah, meaning what?
Don't you think so?
Meaning what?
Like all this AI stuff and all this Bay Area, yeah, that it's happened.
Like, isn't it straight out of science fiction?
Yeah.
Another thing that's crazy is like how normal the slow takeoff feels.
The idea that we'd be investing one percent of GDP in AI, like I feel like it felt like a bigger deal.
You know, where right now it just feels like...
And we get used to things pretty fast, turns out, yeah.
But also, it's kind of abstract. Like, what does it mean? It means that you see it in the news, that such-and-such company announced such-and-such dollar amount. That's all you see, right? It's not really felt in any other way so far.
Yeah. Should we actually begin here? I think this is an interesting discussion.
Sure.
I think your point that, from the average person's point of view, nothing is that different will continue being true even into the singularity.
No, I don't think so.
Okay. Interesting.
So the thing which I was referring to not feeling different is,
okay, so such and such company announced some difficult to comprehend dollar amount of investment.
Right. I don't think anyone knows what to do with that. Yeah. But I think that the impact of AI is
going to be felt. AI is going to be diffused through the economy. There are very strong economic
forces for this. And I think the impact is going to be felt very strongly. When do you expect that
impact? I think the models seem smarter than their economic impact would imply. Yeah. This is
one of the very confusing things about the models right now. How to reconcile the fact that they are
doing so well on evals. And you look at the evals and you go, those are pretty
hard evals. Right. They're doing so well, but the economic impact seems to be dramatically
behind. And it's almost like, it's very difficult to make sense of how can the model, on
the one hand, do these amazing things, and then on the other hand, like, repeat itself twice
in some situation. An example would be, let's say you use vibe coding to do something,
and you go to some place, and then you get a bug, and then you tell the model, can you please fix the bug? And the model says, oh my God, you're so right, I have a bug, let me go fix that. And it introduces a second bug. And then you tell it, you have this new bug, the second bug, and it tells you, oh my God, how could I have done that, you're so right again, and brings back the first bug. And you can alternate between those. And it's like, how is that possible?
It's like... I'm not sure, but it does suggest that something strange is going on.
I have two possible explanations.
So here, this is the more kind of a whimsical explanation:
maybe RL training makes the models
a little bit too single-minded and narrowly focused,
a little bit too, I don't know, unaware,
even though it also makes them aware in some other ways.
And because of this, they can't do basic things.
But there is another explanation,
which is back when people were doing pre-training,
the question of what data to train on was answered
because that answer was everything.
When you do pre-training, you need all the data.
So you don't have to think,
is it going to be this data or that data?
But when people do RL training,
they do need to think.
They say, okay, we want to have this kind of RL training
for this thing and that kind of RL training for that thing. And from what I hear, all the companies
have teams that just produce new RL environments and just add them to the training mix. And then the
question is, well, what are those? There are so many degrees of freedom. There is such a huge variety
of RL environments you could produce. And one thing you could do, and I think that's
something that is done inadvertently, is that people take inspiration from the evals.
You say, hey, I would love our model to do really well when we release it.
I want the evals to look great.
What would be RL training that could help on this task, right?
I think that is something that happens,
and I think it could explain a lot of what's going on.
If you combine this with generalization of the models actually being inadequate,
that has the potential to explain a lot of what we are seeing,
this disconnect between eval performance
and actual real-world performance,
which is something that we don't today
even exactly understand what we mean by.
I like this idea that the real reward hacking
is the human researchers who are too focused on the evals.
I think there's two ways to understand
or to try to think about what you have just pointed out.
One is, look, if it's the case that
simply by becoming superhuman at coding competitions,
a model will not automatically become more tasteful
and exercise better judgment about how to improve your code base.
Well, then you should expand the suite of environments
such that you're not just testing it on having the best performance in a coding competition.
It should also be able to make the best kind of application for X thing or Y thing or Z thing.
And another, maybe this is what you're hinting at,
is to say, why should it be the case in the first place
that becoming superhuman at coding competitions
doesn't make you a more tasteful programmer more generally.
Maybe the thing to do is not to keep stacking up
the amount of environments and the diversity of environments
but to figure out an approach that would let you learn from one environment
and improve your performance on something else.
So I have a human analogy, which might be helpful.
So even the case, let's take the case of competitive programming
since you mentioned that.
And suppose you have two students.
One of them
decided they want to be
the best competitive programmer, so they will
practice 10,000 hours
for that domain.
They will solve all the problems,
memorize all the proof techniques, and be very,
you know,
be very skilled
at quickly and correctly implementing
all the algorithms, and by doing
so they became the best,
one of the best. Student
number two thought, oh, competitive
programming is cool. Maybe they practiced for a hundred hours, much, much less, and they also did really
well. Which one do you think is going to do better in their career later on? The second. Right? And I think
that's basically what's going on. The models are much more like the first student, but even more,
because then we say, okay, so the model should be good with competitive programming. So let's get
every single competitive programming problem ever. And then let's do some data augmentation. So we have
even more competitive programming problems. Yes. And we train on that. And so now,
now you've got this great competitive programmer. And with this analogy, I think it's more intuitive.
I think it's more intuitive with this analogy that, yeah, okay, so if it's so well trained,
okay, it's like all the different algorithms and all the different proof techniques are like right at its fingertips.
And it's more intuitive that with this level of preparation, it would not necessarily generalize to other things.
But then what is the analogy for what the second student is doing before they do the 100 hours of
fine-tuning? I think it's that they have it. I think it's the "it" factor. Yeah. Right. And I know, like,
when I was an undergrad, I remember there was a student like this that studied with me. So I know
it exists. Yeah. I think it's interesting to distinguish it from whatever pre-training does. So
one way to understand what you just said about we don't have to choose the data in pre-training is to say
actually it's not dissimilar to the 10,000 hours of practice. It's just that you get that 10,000
hours of practice for free because it's already somewhere in the pre-training distribution.
But maybe you're suggesting actually there's not that much generalization in
pre-training. There's just so much data in pre-training. But it's not necessarily
generalizing better than RL.
Like the main strength of pre-training is that there is A, so much of it.
And B, you don't have to think hard about what data to put into pre-training.
And it's a very kind of natural data. And it includes
a lot of what people do, people's thoughts, and a lot of the features of, you know,
it's like the whole world as projected by people onto text.
Yeah.
And pre-training tries to capture that using a huge amount of data.
Pre-training is very difficult to reason about because it's so hard to
understand the manner in which the model relies on pre-training
data. And whenever the model makes a mistake, could it be because something by chance is not
as supported by the pre-training data? You know, "supported by pre-training" is maybe a loose term.
I don't know if I can add anything more useful on this, but I don't think there is a human
analog to pre-training. Here are analogies that people have proposed for what the human analogy
to pre-training is, and I'm curious to get your thoughts on why they're
potentially wrong. One is to think about the first 18 or 15 or 13 years of a person's life
when they aren't necessarily economically productive, but they are doing something that is
making them understand the world better and so forth. And the other is to think about evolution
as doing some kind of search for three billion years, which then results in a human lifetime
instance. And then I'm curious if you think either of these are actually analogous to pre-training or
how would you think about at least what lifetime human learning is like, if not pre-training?
I think there are some similarities between both of these to pre-training and pre-training
tries to play the role of both of these. But I think there are some big differences as well.
The amount of pre-training data is very, very staggering. Yes. And somehow,
a human being, after even 15 years, with a tiny fraction of that pre-training data, knows much less, but whatever they do know, they know much more deeply somehow. And the mistakes: like, already at that age you would not make mistakes that our AIs make.
There is another thing. You might say, could it be something like evolution? And the answer is maybe, but in this case I think evolution might actually have an edge. Like, there is this case I remember reading about.
You know, one thing that neuroscientists do, or rather one way in which
neuroscientists can learn about the brain, is by studying people with brain damage to different
parts of the brain. And some people have the most strange symptoms you could imagine. It's actually
really, really interesting. And there was one case that comes to mind that's relevant. I read about
this person who had some kind of brain damage, I think
a stroke or an accident, that took out his emotional processing.
So he stopped feeling any emotion.
And as a result of that, he still remained very articulate
and he could solve little puzzles and on tests he seemed to be just fine.
But he felt no emotion.
He didn't feel sad.
He didn't feel angry.
He didn't feel animated.
And he became somehow extremely bad at making any decisions at all.
It would take him hours to decide on which socks to wear, and he would make very bad financial decisions.
And that's very... what does it say about the role of our built-in emotions in making us, like, a viable agent, essentially?
And I guess to connect to your question about pre-training, maybe if you are good enough at getting everything out of pre-training, you could get that as well. But that's the kind of thing which seems... well, it may not be possible to get that from pre-training.
What is that? Clearly not just directly emotion. It seems like some almost value-function-like thing, which is telling you what the end reward for any decision should be. And you think that doesn't sort of implicitly come from pre-training?
I think it could. I'm just saying it's not 100% obvious.
Yeah. But what is that? Like, what do you think about emotions, and what is the ML analogy for emotions? It should be some kind of a value function thing.
Yeah, but I don't think there is a great ML analogy, because right now value functions don't play a very prominent role in the things people do.
It might be worth defining for the audience what a value function is, if you want to do that.
I mean, certainly, I'll be very happy to do that. So when people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have a neural net and you give it a problem, and then you tell the model, go solve it. And the model takes maybe thousands, hundreds of thousands of actions or thoughts or something, and then it produces a solution. The solution is graded,
and then the score is used to provide a training signal for every single action in your trajectory.
So that means that if you are doing something that goes for a long time,
if you're training on a task that takes a long time to solve,
you will do no learning at all until you come up with the proposed solution.
That's how reinforcement learning is done naively.
That's how o1, R1 ostensibly are done.
The value function says something like, okay, look, maybe I could sometimes, not always, tell you if you are doing well or badly. The notion of a value function is more useful in some domains than others. So for example, when you play chess and you lose a piece, you know, I messed up. You don't need to play the whole game to know that what I just did was bad, and therefore whatever preceded it was also bad. So the value function
lets you short-circuit the wait until the very end.
Like, let's suppose that you started to pursue some kind of,
okay, let's suppose that you are doing some kind of a math thing or a programming thing,
and you're trying to explore a particular solution direction.
And after, let's say, after a thousand steps of thinking,
you concluded that this direction is unpromising.
As soon as you conclude this,
you could already get a reward signal
a thousand time steps previously,
when you decided to pursue down this path
you say, oh, next time
I shouldn't pursue this path
in a similar situation
long before you actually
came up with the proposed solution.
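To make the contrast concrete, here is a minimal toy sketch in Python (an illustration added for clarity, not anything from the interview or from SSI): the naive scheme assigns the final score to every step of the trajectory, while a value-function scheme can assign credit to a step as soon as the estimated value of the next state changes, so a step into a clearly bad state is penalized immediately, like the lost-chess-piece example above.

```python
# Toy illustration (hypothetical): two ways to assign credit to the steps of one trajectory.

def outcome_only_credit(final_reward, num_steps):
    """Naive RL: every action in the trajectory receives the same final score.
    No learning signal exists until the whole solution is produced and graded."""
    return [final_reward] * num_steps

def value_function_credit(values, final_reward, gamma=1.0):
    """Value-function (TD-style) credit: each step is judged by how the estimated
    value of the next state compares to the current one, so a step that leads into
    a clearly bad position gets negative credit right away."""
    credits = []
    for t in range(len(values) - 1):
        # TD error mid-trajectory: discounted value of next state minus value of this state.
        credits.append(gamma * values[t + 1] - values[t])
    # The last step is judged against the actual final reward.
    credits.append(final_reward - values[-1])
    return credits

# Example: a 5-step trajectory where the value estimate collapses at step 3
# (the model realizes the direction is unpromising).
values = [0.5, 0.55, 0.6, 0.1, 0.05]
print(outcome_only_credit(final_reward=0.0, num_steps=5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(value_function_credit(values, final_reward=0.0))     # the step into the bad state gets ~-0.5
```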
This was in the DeepSeek R1 paper: that the space of trajectories is so wide that maybe it's hard to learn a mapping from an intermediate trajectory to a value, and also given that, you know, in coding, for example, you'll have the wrong idea, then you'll go back, then you'll change something.
This sounds like such lack of faith in deep learning.
Like, I mean, sure, it might be difficult, but nothing deep learning can't do.
Yeah.
So my expectation is that, like, value function should be useful, and I fully expect that they will
be used in the future, if not already.
What I was alluding to with the person whose emotional center got
damaged is more that maybe what it suggests is that the value function of humans is modulated
by emotions in some important way that's hard-coded by evolution, and maybe that is important
for people to be effective in the world.
That's the thing I was actually planning on asking you. There's something really interesting about emotions as the value function, which is
that it's impressive that they have this much utility
while still being rather simple to understand.
So I have two responses.
I do agree that compared to
the kind of things that we learn
and the kind of things we are talking about,
emotions are relatively simple.
They might even be so simple
that maybe you could map them out
in a human understandable way.
I think it would be cool to do.
In terms of utility, though,
I think there is a thing where,
you know, there is this complexity-robustness trade-off,
where complex things can be very useful,
but simple things are very useful
in a very broad range of situations.
And so I think one way to interpret
what we are seeing is that we've got these emotions
that essentially evolved mostly from our mammal ancestors
and then fine-tuned a little bit while we were hominids, just a bit.
We do have a decent amount of social emotions, though, which mammals may lack.
But they're not very sophisticated.
And because they're not sophisticated, they serve us so well
in this very different world compared to the one that we've been living in.
Actually, they also make mistakes.
For example, our emotions, well, I don't know, does hunger count as an emotion?
It's debatable, but I think, for example, our intuitive feeling of hunger is not succeeding
in guiding us correctly in this world with an abundance of food. Yeah. People have been talking
about scaling data, scaling parameters, scaling compute. Is there a more general way to think
about scaling? What are the other scaling axes?
So here is a perspective.
Here's a perspective that I think might be true.
So the way ML used to work is that people would just tinker with stuff and try to get interesting results.
That's what's been going on in the past.
Then the scaling insight arrived, right?
Scaling laws.
GPT3
and suddenly everyone realized
we should scale
and it's just
this is an example
of how language affects thought
scaling is just one word
but it's such a powerful word
because it informs people what to do
they say okay let's try to scale things
and so you say okay so what are we scaling
and pre-training was a thing to scale
it was a particular scaling recipe
the big breakthrough of pre-training
is the realization that this recipe is good
so you say hey
if you mix some compute
with some data into a neural net
of a certain size
you will get results
and you will know that it will be better
if you just scale the recipe up
and this is also great. Companies love this
because it gives you a very
low risk way
of investing your resources
Right? It's much harder to invest your resources in research. Compare that: if you do research, you need to have researchers go forth and research and come up with something, versus get more data, get more compute, and you know you'll get something from pre-training. And indeed, based on various things people say on Twitter, it appears that Gemini may have found a way to get more out of pre-training.
At some point, though, pre-training will run out of data.
The data is very clearly finite.
And so then, okay, what do you do next?
Either you do some kind of a souped-up pre-training,
different recipe from the one we've done before,
or you're doing RL, or maybe something else.
But compute is now big,
compute is now very big.
In some sense, we are back to the age of research.
So maybe here's another way to put it.
Up until 2020, from 2012 to 2020,
it was the age of research.
Now, from 2020 to 2025, it was the age of scaling,
or maybe plus or minus, let's add the error bars to those years.
Because people say, this is amazing, you've got to scale more,
keep scaling, the one word, scaling.
But now the scale is so big.
Like, is the belief really that, oh, it's so big,
but if you had 100x more, everything would be so different.
Like, it would be different, for sure.
But, like, is the belief that if you just 100x the scale,
everything would be transformed
I don't think that's true
so it's back to the age of research again
just with big computers
that's a very interesting way to put it
But let me ask you the question you just posed, then:
what are we scaling, and what
would it mean to have a recipe?
Because I guess I'm not aware
of a very clean
relationship that almost looks like a law of physics,
which existed in pre-training:
there was a power law between
data or compute or parameters
and loss. What is the kind of relationship we should be seeking, and how should we think about
what this new recipe might look like?
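For context, the "law of physics"-like relationship being referred to is the empirical pre-training scaling law, usually sketched as a power law relating loss to parameters, data, or compute; the form below follows the commonly cited published fits, with the constants and exponents left symbolic rather than taken from this interview:

```latex
% Rough form of the pre-training scaling laws being alluded to
% (constants N_c, D_c, C_c and exponents \alpha_* are empirical fits, shown only as a sketch):
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}
```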
So we've already witnessed a transition from one type of scaling to a different type of
scaling, from pre-training to RL.
Now people are scaling RL, now based on what people say on Twitter, they spend more
compute on RL than on pre-training at this point,
because RL can actually
consume quite a bit of compute.
You know, you do very, very long
rollouts. Yes.
So it takes a lot of compute to produce those rollouts.
And then you get a relatively small amount of
learning per rollout, so you really can spend
a lot of compute.
And I could imagine...
like, at this point I wouldn't even call it
scaling. I would say, hey, like,
what are you doing? And is
the thing you are doing the most productive thing you could be doing?
Can you find a more productive way of using your compute?
We've discussed the value function business earlier,
and maybe once people get good at value functions,
they will be using their resources more productively.
And if you find a whole other way of training models,
you could say, is this scaling or is it just using your resources?
I think it becomes a little bit ambiguous
in a sense that when people were in the age of research
back then, it was like people say,
hey, let's try this and this and this,
and this. Let's try that and that. Oh look,
something interesting is happening.
And I think there will be a return to that.
So if we're back in the era of research,
stepping back, what is the part of the recipe
that we need to think most about?
When you say value function,
people are already trying the current recipe
but then having an LLM as a judge and so forth.
You can say that's a value function,
but it sounds like you have something much more fundamental in mind.
Do we need to go back to, should we even rethink pre-training at all
and not just add more steps to the end of that process?
Yeah.
So the discussion about value function, I think it was interesting.
I want to emphasize that I think the value function is something like,
it's going to make RL more efficient.
And I think that makes a difference.
But I think that anything you can do
with the value function,
you can do without it, just more slowly.
The thing which I think is the most fundamental
is that these models somehow just generalize
dramatically worse than people
and it's super obvious
that seems like a very fundamental thing
okay so this is the crux generalization
and there's two
sub-questions
there's one which is about
sample efficiency, which is, why should it take so much more data for these models to learn than
humans? There's a second about, even separate from the amount of data it takes, there's a question
of why is it so hard to teach the thing we want to a model than to a human, which is to say,
to a human, we don't necessarily need a verifiable reward to be able to, you're probably mentoring
a bunch of researchers right now, and you're talking with them, you're showing them your code,
and you're showing them how you think. And from that, they're picking
up your way of thinking and how they should do research, you don't have to set like a verifiable
reward for them that's like, okay, this is the next part of your curriculum, and now this is the
next part of your curriculum, and oh, this training was unstable. There's not this
schleppy bespoke process. So perhaps these two issues are actually related in some way,
but I'd be curious to explore this second thing, which sounds more like continual learning,
and this first thing, which feels just like sample efficiency.
Yeah. So, you know, you could actually wonder...
One possible explanation for the human sample efficiency that needs to be considered is evolution.
And evolution has given us a small amount of the most useful information possible.
And for things like vision, hearing, and locomotion, I think there is a pretty strong case that evolution actually has given us a lot.
So for example, human dexterity far exceeds... I mean, robots can become dexterous too if you subject them to, like, a huge amount of training in simulation, but to train a robot in the real world to quickly pick up a new skill like a person does seems very out of reach. And here you could say, oh yeah, locomotion: all our ancestors needed great locomotion, squirrels and so on. So for locomotion, maybe we've got some
unbelievable prior. You could make the same case for vision. You know, I believe Yann LeCun
made the point, oh, like children learn to drive after 16 hours, after 10 hours of practice,
which is true. But our vision is so good. At least for me, when I remember myself being five-year-old,
I was very excited about cars back then. And I'm pretty sure my car recognition was
more than adequate for self-driving already as a five-year-old. You don't get to see that much
data as a five-year-old. You spend most of your time in your parents' house. So you have very
low data diversity. But you could say maybe that's evolution too. But then language and math
and coding, probably not. It still seems better than models. I mean, obviously models are
better than the average human at language and math and coding. But are they better than the average
human at learning? Oh yeah. Oh, yeah. Absolutely. What I meant to say is that language, math,
and coding, and especially math and coding,
suggests that whatever it is that makes people
good at learning
is probably not so much a complicated prior
but something more, some fundamental thing.
Wait, I'm not sure I understood. Why should that be the case?
So consider a skill
that people exhibit some kind of great reliability
or, you know,
if the skill is one that was very
useful to our ancestors for many millions of years, hundreds of millions of years. You could say,
you could argue that maybe humans are good at it because of evolution, because we have a prior,
an evolutionary prior that's encoded in some very non-obvious way that somehow makes us so good
at it. But if people exhibit great ability, reliability, robustness, an ability
to learn in a domain that really did not exist until recently, then this is more an indication
that people might have just better machine learning period.
But then how should we think about what that is?
Is it a matter of... yeah, what is the ML analogy for it?
There's a couple interesting things about it.
It takes fewer samples.
It's more unsupervised.
You don't have to set a very... like, a child learning to drive a car,
a teenager learning to drive a car,
is not exactly getting some pre-built
verifiable reward.
It comes from their interaction with the machine and with the environment.
And yeah, it takes much fewer samples.
It seems more unsupervised.
It seems more robust.
Much more robust.
The robustness of people is really staggering.
Yeah. So, okay, do you have a unified way of thinking about why all these things are happening at once? What is the ML analogy that could realize something like this?
So this is where, you know, one of the things that you've been asking about is how can, you know, the teenage driver kind of self-correct and learn from their experience without an external teacher? And the answer is, well, they have their value function. Right? They have a sense, a
general sense, which is also, by the way, extremely robust in people. Like, whatever it is,
the human value function, whatever the human value function is, with a few exceptions around
addiction, it's actually very, very robust. And so for something like a teenager that's
learning to drive, they start to drive, and they already have a sense of how they're driving
immediately, how badly or how unconfidently they're driving, and then they see, okay.
And then, of course, the learning speed of any teenager is so fast, after 10 hours, you're good to go.
Yeah.
It seems like humans have some solution, but I'm curious about, like, well, how are they doing it?
And, like, why is it so hard to, like, how do we need to reconceptualize the way we're training models to make something like this possible?
You know, that is a great question to ask.
And it's a question I have a lot of opinions about.
But, unfortunately, we live in a world where not all machine
learning ideas are discussed freely, and this is one of them. So there's probably a way to do it.
I think it can be done. The fact that people are like that, I think it's a proof that it can be
done. There may be another blocker though, which is there is a possibility that the human neurons
actually do more compute than we think. And if that is true and if that plays an important role,
then things might be more difficult.
But regardless, I do think it points to the existence
of some machine learning principle
that I have opinions on,
but unfortunately, circumstances make it hard to discuss in detail.
Nobody listens to this podcast, Ilya.
Yeah.
So I have to say that prepping for Ilya was pretty tough
because neither I nor anybody else
had any idea what he's working on
and what SSI is trying to do.
I had no basis to come up with my questions.
And the only thing I could go off, honestly,
was trying to think from first principles
about what are the bottlenecks to AGI?
Because clearly, Ilya is working on them in some way.
Part of this question involved thinking about RL scaling
because everybody's asking how well RL will generalize
and how we can make it generalize better.
As part of this, I was reading this paper
that came out recently on RL scaling,
and it showed that actually the learning curve in RL looks like a sigmoid.
I found this very curious. Why should it be a sigmoid?
Where it learns very little for a long time, and then it quickly learns a lot, and then it
asymptotes. This is very different from the power law you see in pre-training, where the model
learns a bunch at the very beginning, and then less and less over time. And it actually
reminded me of a note that I had written down after I had a conversation with a researcher friend,
where he pointed out that the number of samples that you need to take in order to find a
correct answer scales exponentially with how different your current probability distribution is from
the target probability distribution. And I was thinking about how these two ideas are related. I had
this vague idea that they should be connected, but I really didn't know how. I don't have a math
background, so I couldn't really formalize it. But I wondered if Gemini 3 could help me out here.
And so I took a picture of my notebook and I took the paper and I put them both in the context
of Gemini 3 and I asked it to find the connection. And it thought a bunch. And then it realized that
the correct way to model the information you gain from a single yes or no outcome in
RL is as the entropy of a random binary variable.
It made a graph which showed how the bits you gain per sample in RL versus
supervised learning scale as the pass rate increases.
And as soon as I saw the graph that Gemini 3 made, immediately a ton of things started
making sense to me.
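Here is a rough reconstruction of the kind of calculation being described (a sketch written for this transcript, not the actual code Gemini produced; the 16-bits-per-supervised-sample figure is an arbitrary placeholder): a single pass/fail RL rollout can carry at most the binary entropy of the pass rate, which collapses toward zero when the pass rate is very low or very high, whereas a supervised label supplies a roughly constant number of bits per sample.

```python
import math

def binary_entropy_bits(p):
    """Entropy of a Bernoulli(p) outcome in bits: the most information a single
    pass/fail RL rollout can carry."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical comparison: assume a supervised example always supplies the full
# target answer, taken here (arbitrarily) to be worth ~16 bits per sample.
SUPERVISED_BITS_PER_SAMPLE = 16.0

for pass_rate in [0.001, 0.01, 0.1, 0.5, 0.9, 0.99]:
    rl_bits = binary_entropy_bits(pass_rate)
    print(f"pass rate {pass_rate:>5}: RL ~{rl_bits:.3f} bits/sample, "
          f"supervised ~{SUPERVISED_BITS_PER_SAMPLE:.0f} bits/sample")
```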
Then I wanted to see if there was any empirical basis to this theory.
So I asked Gemini to code an experiment to show whether
the improvement in loss scales in this way with pass rate.
I just took the code that Gemini outputted.
I copy-pasted it into a Google Colab notebook.
And I was able to run this toy ML experiment
and visualize its results without a single bug.
It's interesting because the results look similar
but not identical to what we should have expected.
And so I downloaded this chart and I put it into Gemini
and I asked it, what is going on here?
It came up with a hypothesis that I think is actually correct,
which is that we're capping how much supervised learning
can improve in the beginning by having a fixed learning rate,
and in fact, we should decrease the learning rate over time.
It actually gives us an intuitive understanding
for why in practice we have learning rate schedulers
that decrease the learning rate over time.
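For readers unfamiliar with learning rate schedulers, here is a minimal example of the kind of decaying schedule being referred to (a generic cosine decay written from scratch; the base and minimum rates are arbitrary placeholders, not values from the experiment):

```python
import math

def cosine_lr(step, total_steps, base_lr=3e-4, min_lr=3e-5):
    """Generic cosine learning-rate decay: start at base_lr and decay smoothly
    to min_lr over total_steps, so early updates are large and late updates small."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for step in [0, 250, 500, 750, 1000]:
    print(f"step {step:>4}: lr = {cosine_lr(step, total_steps=1000):.2e}")
```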
I did this entire flow from coming up with this vague initial question
to building a theoretical understanding,
to running some toy ML experiments, all with Gemini 3.
This feels like the first model where it can actually come up
with new connections that I wouldn't have anticipated.
It's actually now become the default place I go to when I want to brainstorm new ways to think about a problem.
If you want to read more about RL scaling, you can check out the blog post that I wrote with a little help from Gemini 3.
And if you want to check out Gemini 3 yourself, go to gemini.google.
I am curious, if you say we are back in an era of research, you were there from 2012 to 2020.
And, yeah, what is the vibe now going to be if we go back to the era
of research? For example, even after AlexNet, the amount of compute that was used to run
experiments kept increasing and the size of frontier systems kept increasing. And do you think
now that this era of research will still require tremendous amounts of compute? Do you think
it will require going back into the archives and reading old papers? What was the vibe
like when you were at Google and OpenAI and Stanford,
at these places, when there was more of a vibe of research?
What kind of thing should we be expecting in the community?
So one consequence of the age of scaling
is that scaling sucked out all the air in the room.
And so because scaling sucked out all the air in the room,
everyone started to do the same thing. We got to the point where we are in a world where there are
more companies than ideas, by quite a bit, actually. On that, you know, there is the Silicon Valley
saying that says that ideas are cheap, execution is everything. And people say that a lot. Yeah.
And there is truth to that. But then I saw someone say on Twitter
something like, if ideas are so cheap, how come no one's having any ideas?
And I think it's true, too. I think, like, if you think about research progress in terms
of bottlenecks, there are several bottlenecks. One of them is
ideas, and one of them is your ability to bring them to life, which might be compute, but also
engineering. So if you go back to the 90s, let's say, you had people who had
pretty good ideas. And if they had much larger computers, maybe they could demonstrate that
their ideas were viable, but they could not. So they could only have very, very small
demonstrations and did not convince anyone. Yeah. So the bottleneck was compute. Then in the age
of scaling, compute increased a lot. And of course, there is a question of how much compute is
needed, but compute is large. So compute is large enough such that
it's like not obvious that you need that much more compute to prove some idea.
Like, I'll give you an analogy.
AlexNet was built on two GPUs.
That was the total amount of compute use for it.
The transformer was built on 8 to 64 GPUs.
No single transformer paper experiment used more than 64 GPUs of 2017,
which would be like what, two GPUs of today?
So the ResNet, right, and many others. You could argue that even, like, o1 reasoning was not the most compute-heavy thing in the world.
So definitely, for research, you need some amount of compute, but it's far from obvious that you need the absolutely largest amount of compute ever for research.
You might argue, and I think it is true,
that if you want to build the absolutely best system,
if you want to build the absolutely best system,
then it helps to have much more compute,
and especially if everyone is within the same paradigm,
then compute becomes one of the big differentiators.
Yeah, I guess while it was possible to develop these ideas,
I'm asking you for the history because you were actually there.
I'm not sure what actually happened,
but it sounds like it was possible to develop these ideas
using minimal amounts of compute.
But it wasn't, the transformer didn't immediately become famous.
It became the thing everybody started doing
and then started experimenting on top of
and building on top of because it was validated
at higher and higher levels of compute.
Correct.
And if you at SSI have 50 different ideas,
how will you know which one is the next transformer
and which one is, you know, brittle
without having the kinds of compute
that other frontier labs have?
So I can comment on that, which is, the short comment is that, you know, you mentioned SSI.
Specifically for us, the amount of compute that SSI has for research is really not that small.
And I want to explain why. Some simple math can explain why the amount of compute that we have
is actually a lot more comparable for research than one might think.
I will now explain. So SSI has raised $3 billion, which is not small; it's a lot by any absolute sense. But you could say, look at the other companies raising much more. But a lot of their compute goes for inference. Like, these big numbers, these big loans, it's earmarked for inference. That's number one.
Number two, if you want to have a product on which you do inference, you need to have a big staff of engineers, of salespeople. A lot of the research needs to be dedicated to producing all kinds of product-related features. So then when you look at what's actually left for research, the difference becomes a lot smaller.
Now, the other thing is that if you are doing something different, do you really need the absolute maximum scale to prove it? I don't think that's true at all. I think that in our case, we have sufficient compute to convince ourselves and anyone else that what we're doing is
correct. There have been public estimates that, you know, companies like OpenAI spend on the
order of five, six billion dollars a year, even just so far, on experiments. This is separate
from the amount of money they're spending on inference and so forth. So it seems like they're spending
more in a year running research experiments than you guys have in total funding?
I think it's a question of what you do with it. It's a question of what you do with it.
Like, I think in their case, in the case of others, I think
there's a lot more demand on the training compute. There's a lot more different work streams.
There are different modalities. There is just more stuff. And so it becomes fragmented.
How will SSI make money?
You know, my answer to this question is something like: for right now, we just focus on the
research, and then the answer to that question will reveal itself. I think there will be lots of
possible answers.
Is SSI's plan still to straight-shot superintelligence?
Maybe. I think that there is merit to it. I think there's a lot of merit, because I think that it's very nice to not be
affected by the day-to-day market competition. But I think there are two reasons that may cause us to
change the plan. One is pragmatic if timelines turned out to be long, which they might. And second,
I think there is a lot of value in the best and most powerful AI being out there impacting the world.
I think this is a meaningfully valuable thing.
But then, so why is your default plan to straight-shot superintelligence?
Because it sounds like, you know, OpenAI, Anthropic, all these other companies,
their explicit thinking is, look, we put out weaker intelligences that the public can get used to and prepare for.
And why is it potentially better to build a superintelligence directly?
So I'll make the case for and against.
Yeah.
The case for is that, so, one of the challenges
that people face when they're in the market
is that they have to participate in the rat race
and the rat race is quite difficult
in that it exposes you to difficult trade-offs
which you need to make
and there is, it is nice to say
we'll insulate ourselves from all this
and just focus on the research
and come out only when we are ready and not before
but the counterpoint is valid too
and those are opposing
forces. The counterpoint is, hey, it is useful for the world to see powerful AI. It is useful for
the world to see powerful AI, because that's the only way you can communicate it. Well, I guess not
even just that you can communicate the idea, but... Communicate the AI. Not the idea. Communicate the
AI. What do you mean communicate the AI? So, okay, so let's suppose you read an essay about AI.
Yeah. And the essay says, AI is going to be this and AI is going to be that and it's going to be this.
And you read it and you say, okay, this is an interesting essay.
Now suppose you see an AI doing this, an AI doing that, it is incomparable.
Like, basically, I think that there is a big benefit from AI being in the public.
And that would be a reason for us to not be quite straight shot.
Yeah.
Well, I guess it's not even that, but I do think that is an important part of it.
The other big thing is I can't think of another discipline in human engineering and research
where the end artifact was made safer mostly through just thinking about how to make it safe
as opposed to why are airplane crashes per mile so much lower today than they were decades
ago? Why is it so much harder to find a bug in Linux than it would have been decades ago?
And I think it's mostly because these systems were deployed to the world, you noticed failure,
those failures were corrected
and the systems became more robust.
And I'm not sure why
AGI and superhuman intelligence
would be any different, especially given
and I hope we can talk about this, we're going to get to this.
It seems like the harms of superintelligence
are not just about like having some malevolent
paperclipper out there,
but it's just like, this is a really powerful thing
and we don't even know how to conceptualize
how people will interact with it, what people will do with it.
And having gradual access to it seems like
a better way to maybe spread out the impact of it and to help people prepare for it.
Well, I think on this point, even in the straight shot scenario, you would still do a gradual
release of it. It's how I would imagine it. The gradualism would be an inherent component of any
plan. It's just a question of what is the first thing that you get out of the door. That's number one.
Number two, I also think, you know, I believe you have advocated for continual learning more than other people.
And I actually think that this is an important and correct thing.
And here is why.
So one of the things, so I'll give you another example of how thinking, how language affects thinking.
And in this case, this will be two words, two words that have shaped everyone's thinking.
I maintain.
First word AGI.
Second word, pre-training.
Let me explain.
So the term AGI,
why does this term exist?
It's a very particular term.
Why does it exist?
There's a reason.
The reason that the term AGI exists
is, in my opinion,
not so much because it's like a very
important essential descriptor
of some end-state
of intelligence, but because it is a reaction to a different term that existed, and the term is
narrow AI. If you go back to ancient history of gameplay AI, of checkers AI, chess AI,
computer games AI, everyone would say, look at this narrow intelligence. Sure, the chess AI can
beat Kasparov, but it can't do anything else. It is so narrow, artificial narrow intelligence.
in response as a reaction to this
some people said
well this is not good
it is so narrow
what we need is general AI
general AI,
an AI that can just do all the things
the second
and that term
just got a lot of traction
the second thing that got a lot of traction
is pre-training
specifically the recipe of pre-training
I think the current the way
people do RL now is maybe
undoing the
conceptual imprint of pre-training
but pre-training had the property
you do more pre-training
and the model gets better at everything
more or less uniformly
general AI
pre-training gives AGI
but
the thing that happened
with AGI and pre-training
is that in some sense they overshot the
target, because if you think about the term AGI, you will realize, and especially
in the context of pre-training, you will realize that a human being is not an AGI.
Because a human being, yes, there is definitely a foundation of skills, but
a human being lacks a huge amount of knowledge. Instead, we rely on continual learning. We rely on
continual learning. And so then when you think about, okay, so let's suppose that we achieve
success and we produce some kind of safe super intelligence. The question is, but how do you
define it? Where on the curve of continual learning is it going to be? It will produce, like, a super
intelligent 15-year-old that's very eager to go, and you say, okay. They don't
know very much at all. A great student, very eager. You go and be a programmer. You go and be a
doctor. Go and learn. So you could imagine that the deployment itself will involve some kind
of a learning trial and error period. It's a process as opposed to you drop the finished thing.
Okay. I see. So you're suggesting that the thing you're pointing out with super
intelligence is not some finished mind which knows how to do every single job in the economy.
Because the way, say, the original, I think, OpenAI Charter or whatever defines AGI is like, it can do every single job, every single thing a human can do.
You're proposing instead a mind which can learn to do every single job.
Yes.
And that is super intelligence.
And then, but once you have the learning algorithm, it gets deployed into the world the same way a human laborer might join an organization.
And it seems like one of these two things might happen.
Maybe neither of these happens.
One, this super efficient learning algorithm becomes superhuman, becomes as good as you
and potentially even better at the task of ML research.
And as a result, the algorithm itself becomes more and more superhuman.
The other is, even if that doesn't happen, if you have a single model, I mean, this is explicitly
your vision.
If you have a single model where instances of a model, which are deployed,
through the economy, doing different jobs, learning how to do those jobs, continually learning on the job,
picking up all the skills that any human could pick up, but actually picking them all up at the same time
and then amalgamating the learnings. You basically have a model which functionally becomes super
intelligent even without any sort of recursive self-improvement in software, right? Because you now
have one model that can do every single job in the economy, and humans can't merge our minds in the
same way. And so do you expect some sort of like intelligence explosion from broad
deployment? I think that it is likely that we will have rapid economic growth. I think the
broad deployment, like there are two arguments you could make which are conflicting. One is that,
look, once you indeed get to a point where you have an AI
that can learn to do things quickly, and you have many of them, then there will be a strong
force to deploy them in the economy unless there will be some kind of a regulation that
stops it, which, by the way, there might be. But I think the idea of very rapid economic growth
for some time, I think it's very possible from broad deployment. The other question is how rapid
it's going to be.
So I think this is hard to know
because on the one hand,
you have this very efficient worker.
On the other hand, the world is just really big
and there's a lot of stuff
and that stuff moves at a different speed.
But then on the other hand, now the AI could...
you know, so I think very rapid
economic growth is possible. And we will see
like all kinds of things like
different countries with different rules
and the ones which have the friendlier rules
the economic growth will be faster.
Hard to predict.
Some people in our audience like to read the transcripts
instead of listening to the episode.
And so we put a ton of effort into making the transcripts
read like they are standalone essays.
The problem is that if you just transcribe
a conversation verbatim using a speech-to-text model,
it'll be full of all kinds of fits and starts
and confusing phrasing.
We mentioned this problem to Labelbox,
and they asked if they could take a stab at it.
Working with them on this is probably the reason
that I'm most excited to recommend
Labelbox to people. It wasn't just, oh, hey, tell us what kind of data you need and we'll go get it.
They walked us through the entire process, from helping us identify what kind of data we needed
in the first place to assembling a team of expert aligners to generate it. Even after we got all
the data back, Labelbox stayed involved. They helped us choose the right base model and set up
auto QA on the model's output so that we could tweak and refine it. And now we have a new
transcriber tool that we can use for all our episodes moving forward. This is just one example
of how Labelbox meets their customers at the ideas level
and partners with them through their entire journey.
If you want to learn more,
or if you want to try out the transcriber tool yourself,
go to labelbox.com/dwarkesh.
It seems to me that this is a very precarious situation to be in,
where look in the limit, we know that this should be possible
because if you have something that is as good as a human at learning,
but which can merge its brains,
merge its different instances,
in a way that humans can't.
Already, this seems like a thing
that should physically be possible.
Humans are possible.
Digital computers are possible.
You just need both of those combined
to produce this thing.
And it also seems like this kind of thing
is extremely powerful
and economic growth is one way to put it.
I mean, Dyson spheres are a lot of economic growth.
But another way to put it is just like
you will have potentially a very short period of time.
Because a human on the job can, you know...
you're hiring people at SSI, and in six months
they're, like, net productive, probably, right?
A human like learns really fast.
And so this thing is becoming smarter and smarter very fast.
How do you think about making that go well?
And why is SSI positioned to do that well?
What is SSI's plan there, basically, is what I'm trying to ask.
Yeah.
So one of the ways in which my thinking has been changing
is that I now place more importance on AI being deployed incrementally and in advance.
One very difficult thing about AI is that we are talking about systems that don't yet exist.
And it's hard to imagine them.
I think that one of the things that's happening,
is that in practice, it's very hard to feel the AGI.
It's very hard to feel the AGI.
We can talk about it,
but it's like talking about, like, the long future,
like imagine, like having a conversation about, like,
what it is like to be old, when you're, like, old and frail,
and you can have a conversation, you can try to imagine it,
but it's just hard, and you come back to reality,
where that's not the case.
And I think that a lot of the issues around AGI and its future power stem from the fact that it's very difficult to imagine.
Future AI is going to be different.
It's going to be powerful.
Indeed, the whole problem, what is the problem of AI and AGI?
The whole problem is the power.
The whole problem is the power.
When the power is really big, what's going to happen?
And one of the ways in which I've changed my mind over the past year,
and that change of mind may, I'll say, I'll hedge a little bit,
may back-propagate into the plans of our company,
is that, so if it's hard to imagine,
what do you do?
You've got to be showing the thing.
You gotta be showing the thing.
And I maintain that, I think most people
who work on AI also can't imagine it.
Because it's too different from what people see
on a day-to-day basis.
I do maintain, here's something which I predict will happen,
that's a prediction.
I maintain that as AI becomes more powerful
than people will change their behaviors.
And we will see all kinds of unprecedented things which are not happening right now.
And I'll give some examples.
I do like, I think for better or worse, the frontier companies will play a very important role in what happens, as will the government.
And the kind of thing that I think we'll see, which you see the beginnings of, is companies that are fierce competitors starting to collaborate on AI
safety. You may have seen OpenAI and Anthropic doing a first small step, but that did not exist.
That's actually something which I predicted in one of my talks about three years ago, that such a
thing will happen. I also maintain that as AI continues to become more powerful, more visibly
powerful, there will also be a desire from governments and the public to do something.
And I think that this is a very important force
of showing the AI.
That's number one.
Number two, okay, so then the AI is being built.
What needs to be done?
So one thing that I maintain will happen:
right now, for people who are working on AI,
the AI doesn't feel powerful
because of its mistakes.
I do think that at some point
the AI will start to feel powerful, actually.
And I think when that happens, we will see a big change in the way all AI companies approach safety.
They'll become much more paranoid.
I say this as a prediction that we will see happen.
We'll see if I'm right.
But I think this is something that will happen because they will see the AI becoming more powerful.
Everything that's happening right now I maintain is because people look at today's AI,
and it's hard to imagine the future AI.
And there is a third thing which needs to happen.
And I think it's this, and I'm talking about it in broader terms, not just from the
perspective of SSI, because you ask me about our company.
But the question is, okay, so then what should the companies aspire to build?
Yeah.
What should they aspire to build?
And there has been one big idea that everyone has been locked
into, which is the self-improving AI.
And why did it happen?
Because there are fewer ideas than companies.
But I maintain that there is something that's better to build.
And I think that everyone will actually want that.
It's like the AI that's robustly aligned to care about sentient life specifically.
I think in particular it will be,
there's a case to be made that it will be easier to build an AI that cares about sentient life
than an AI that cares about human life alone.
because the AI itself will be sentient.
And if you think about things like mirror neurons
and human empathy for animals,
which is, you know, you might argue it's not big enough,
but it exists.
I think it's an emergent property
from the fact that we model others
with the same circuit that we use to model ourselves
because that's the most efficient thing to do.
So even if you got an AI to care about sentient beings, and it's not actually clear to me that that's what you should try to do if you solve alignment, it would still be the case
that most sentient beings will be AIs. There will be trillions, eventually quadrillions of AIs.
Humans will be a very small fraction of sentient beings. So it's not clear to me if the goal
is some kind of human control over this future civilization, that this is the best criterion.
It's true. I think that it's possible it's not the best criterion. I'll say two things. Thing number one: I think that care for sentient life, I think there is merit to it. I think it should be considered. I think it would be helpful if there was some kind of a shortlist of ideas that the companies, when they are in this situation, could use. That's one.
Number two, I think it would be really materially helpful if the power of the most powerful superintelligence was somehow capped, because it would address a lot of these concerns.
The question of how to do it, I'm not sure, but I think that would be materially helpful when you're talking about really, really powerful systems.
Before we continue the alignment discussion, I want to double click on that.
How much room is there at the top?
How do you think about superintelligence?
Do you think, I mean, using this learning efficiency idea, maybe it's just extremely fast at learning new skills or new knowledge.
And does it just have a bigger pool of strategies?
Is there a single cohesive it in the center that's more powerful or bigger?
And if so, do you imagine that this will be sort of godlike in comparison to the rest of human civilization or does it just feel like another agent or another cluster of agents?
So this is an area where different people have different intuitions.
I think it will be very powerful for sure.
I think that what I think is most likely to happen
is that there will be multiple such AIs being created roughly at the same time.
I think that if the cluster is big enough,
like if the cluster is literally continent-sized,
that thing would be really powerful indeed.
If you literally have a continent-sized cluster,
those AIs can be very powerful.
And all I can tell you
is that if you're talking about extremely powerful AIs, truly dramatically powerful, then yeah, it would be nice if they could be restrained in some ways, or if there was some kind of an agreement or something. Because, think about it:
What is the concern of superintelligence?
What is one way to explain the concern?
If you imagine a system that is sufficiently powerful,
like really sufficiently powerful,
and you could say, okay, you need to do something sensible, like care for sentient life, let's say, and it pursues that in a very single-minded way, we might not like the results.
That's really what it is.
And so maybe, by the way, the answer is that you do not build a single RL agent in the usual sense. And actually, I'll point several things out. I think human beings are a semi-RL agent: you know, we pursue a reward, and then the emotions or whatever make us tire of the reward, and we pursue a different reward. The market is like a very short-sighted kind of agent. Evolution is the same: evolution is very intelligent in some ways, but very dumb in other ways. The government has been designed to be a never-ending fight between three parts, which has an effect. So I think things like this.
Another thing that makes this discussion difficult is that we are talking about systems that don't exist, that we don't know how to build. Right, that's the other thing, and that's actually my belief. I think what people are doing right now will go some distance and then peter out. It will continue to improve, but it will also not be it. So the "it" we don't know how to build, and I think that a lot hinges on understanding reliable generalization.
And I'll say another thing. You know, one of the things you could say causes alignment to be difficult is that human values are fragile: your ability to learn human values is fragile, and then your ability to optimize them, to actually learn to optimize them, is fragile. And then can't you say, are these not all instances of unreliable generalization? Why is it that human beings appear to generalize so much better? What if generalization was much better? What would happen in this case? What would be the effect? But those we can't answer; those questions right now are still open.
How does one think about what AI going well looks like? Because I think you've scoped out how AI might evolve: we'll have these sort of continual-learning agents, AI will be very powerful, maybe there will be many different AIs. How do you think about lots of continent-sized-compute intelligences going around? How dangerous is that? How do we make that less dangerous? And how do we do that in a way that protects an equilibrium where there might be misaligned AIs out there and bad actors out there?
So one reason why I liked the AI that cares for
sentient life, you know, and we can debate whether it's good or bad. But if the first N of these dramatically powerful systems actually do care for, you know, love humanity or something, care for sentient life, and obviously this also needs to be achieved, so if this is achieved by the first N of those systems, then I can see it going well at least for quite some time.
And then there is the question of what happens in the long run.
What happens in the long run?
How do you achieve a long run equilibrium?
And I think that there, there is an answer as well.
And I don't like this answer, but it needs to be considered.
In the long run, you might say, okay, so if you have a world where powerful AIs exist, in the short term you could say, okay, you have universal high income and we're all doing well. But we know that, what do the Buddhists say, change is the only constant. And so things change, and there is some kind of government political structure thing, and it changes, because these things have a shelf life. You know, some new government thing comes up and it functions, and then after some time it stops functioning. That's something that we see happening all the time. And so I think that
for the long-run equilibrium, one approach, you could say, is okay, maybe every person will have an AI that will do their bidding, and that's good. And if that could be maintained indefinitely, great. But the downside with that is, okay, so then the AI goes and earns money for the person and advocates for their needs in, like, the political sphere, and maybe then writes a little report saying, okay, here's what I've done, here's the situation, and the person says, great, keep it up. But the person is no longer a participant. And then you can say that's a precarious place to be in. So I'm going to preface this by saying I don't like this solution, but it is a solution. And the solution is if people become part AI, with some kind of a Neuralink++. Because what will happen as a result is that now the AI understands something and we understand it too, because now the understanding is transmitted wholesale. So now if the AI is in some situation, it's like you are involved in that situation yourself, fully. And I think this is the answer to the equilibrium.
I wonder if the fact that emotions, which were
developed millions or in many cases billions of years ago in a totally different environment
are still guiding our actions so strongly, is an example of alignment success. To maybe spell out what I mean: the brain stem has these, I don't know if it's more accurate to call it a value function or a reward function, but the brain stem has a directive where it's saying, mate with somebody who's more successful. The cortex is the part that understands what success means in the modern context. But the brainstem is able to align the cortex and say, however you recognize success to be, and I'm not smart enough to understand what that is, you're still going to pursue this
directive.
I think there's a more general point. I think it's actually really mysterious how the brain encodes high-level desires, sorry, how evolution encodes high-level
desires. Like, it's pretty easy to understand how evolution would endow us with the desire for
food that smells good, because smell is a chemical, and so just pursue that chemical. It's very
easy to imagine such an evolution doing such a thing. But evolution also has endowed us with
all these social desires. Like, we really care about being seen positively by society. We care
about being in a good standing.
Like, all these social intuitions that we have, I feel strongly that they are baked in. And I don't know how evolution did it. Because it's a high-level concept that's represented in the brain. Like, let's say you care about some social thing. It's not a low-level signal like smell. It's not something for which there is a sensor. The brain needs to do a lot of processing, to piece together lots of bits of information, to understand what's going on socially. And somehow evolution said, that's what you should care about. How did it do it? And it did it quickly too, because I think all these sophisticated social things that we care about evolved pretty recently. So evolution somehow had an easy time hard-coding this high-level desire. And I maintain, or at least I'll say, I'm unaware of a good hypothesis for how it's done. I had some ideas that I was kicking around, but none of them are satisfying.
Yeah. And what's especially impressive is if it was a desire that you
learned in your lifetime, it kind of makes sense because your brain is intelligent. It makes
sense why we're able to learn intelligent desires. But your point, or maybe this is not your point, but one way to understand it, is that the desire is built into the genome. And the genome is not intelligent, right? But it's somehow able to describe this feature, where it's not even clear how you define that feature, and you can build it into the genes.
Yeah, essentially. Or maybe I'll put it differently. If you think
about the tools that are available to the genome, it says, okay, here's a recipe for building
a brain. And you could say, here is a recipe for connecting the dopamine neurons to like the smell
sensor. Yeah. And if the smell is a certain kind of, you know, good smell, you want to eat that.
I could imagine the genome doing that.
I'm claiming that it is harder to imagine.
It's harder to imagine the genome saying
you should care about some complicated computation
that a big chunk of your entire brain does.
That's all I'm claiming.
I can tell you like a speculation.
I was wondering how it could be done.
And let me offer a speculation
and I'll explain why the speculation is probably false.
So the speculation is, okay.
So the brain has those regions, you know, the brain regions. We have our cortex, right, and it has all those brain regions. And the cortex is uniform, but the neurons in the cortex, they kind of speak to their neighbors mostly, and that explains why you get brain regions: because if you want to do some kind of speech processing, all the neurons that do speech need to talk to each other, and because neurons can only speak to their nearby neighbors for the most part, it has to be a region. All the regions are mostly located in the same place from person to person. So maybe evolution hard-coded literally a location on the brain. So it says, oh, when, you know, the GPS of the brain, GPS coordinates such and such, when that fires, that's what you should care about. Like, maybe that's what evolution did, because that would be within the toolkit of evolution.
Yeah, although there are examples where, for example, people who are born blind have that area of their cortex adopted by another sense. And I have no idea, but I'd be surprised if the desires or the reward functions which require visual signal no longer worked in people who have different areas of their cortex co-opted. For example, if you no longer have vision, can you still feel the sense that I want people around me to like me and so forth, which usually involves visual cues too?
So I actually fully agree with that.
I think there's an even stronger counterargument to this theory, which is, if you think about people, there are people who get half of their brains removed in childhood.
And they still have all their brain regions, but they all somehow move to just one hemisphere,
which suggests that the brain regions, the location is not fixed.
And so that theory is not true.
It would have been cool if it was true, but it's not.
And so I think that's a mystery, but it's an interesting mystery.
Like, the fact is, somehow evolution was able to endow us to care about social stuff very, very reliably.
And even people who have, like, all kinds of strange mental conditions and deficiencies and emotional problems tend to care about this also.
AI tools like deepfakes, voice clones, and agents have dramatically increased the sophistication of fraud and abuse.
So it's more important than ever to actually understand the identity and intent of whoever, or whatever, is using your platform.
That's exactly what Sardine helps you do.
Sardine brings together thousands of device, behavior, and identity signals to help you assess risk.
Everything from how a user types or moves their mouse or holds their device to whether they're
hiding their true location behind a VPN to whether they're injecting a fake camera feed
during KYC selfie checks.
Sardine combines these signals with insights from their network of almost 4 billion devices, things like a user's history of fraud or their associations with other high-risk accounts,
so you can spot bad actors before they do damage.
This would literally be impossible if you only used data from your own application.
Sardine doesn't stop at detection.
They offer a suite of agents to streamline onboarding checks and automate investigations.
So as fraudsters use AI to scale their attacks, you can use AI to scale your defenses.
Go to sardine.ai/dwarkesh to learn more and download their guide on AI fraud detection.
What is SSI planning on doing differently?
So presumably your plan is to be one of the frontier companies when this time arrives.
And then what is, presumably you started SSI because you're like,
I think I have a way of approaching how to do this safely in a way that the other companies don't.
What is that difference?
So the way I would describe it is that there are some ideas that I think are promising,
and I want to investigate them and see if they are indeed promising or not.
It's really that simple.
It's an attempt.
I think that if the ideas turn out to be correct,
these ideas that we discussed around understanding generalization,
if these ideas turn out to be correct,
then I think we will have something worthy.
Will they turn out to be correct? We are doing research. We are squarely an age-of-research company. We are making progress. We've actually made quite good progress over the past year, but we need to keep making more progress, more research. And that's how I see it.
I see it as an attempt, an attempt to be a voice and a participant.
People have asked: your co-founder and previous CEO left to go to Meta recently, and people have asked, well, if there were a lot of breakthroughs being made, that seems like a thing that should have been unlikely. I wonder how you respond.
Yeah, so for this I will simply remind people of a few facts that may have been forgotten, and I think these facts, which provide the context, explain the situation. So the context was that we were fundraising at a $32 billion valuation, and then Meta came in and offered to acquire us. And I said no, but my former co-founder, in some sense, said yes. And as a result, he also was able to enjoy a lot of near-term liquidity.
And he was the only person from SSI to join Meta.
It sounds like SSI's plan is to be a company that is at the frontier when you get to this very important period in human history, where you have superhuman intelligence, and you have these ideas about how to make superhuman intelligence go well. But other companies will be trying their own ideas. What distinguishes SSI's approach to making superintelligence go well?
The main thing that distinguishes SSI
is its technical approach
so we have a different technical approach
that I think is worthy
and we are pursuing it.
I maintain that in the end
there will be a convergence of strategies.
So I think there will be a convergence of strategies
where at some point
as AI becomes more powerful,
it's going to become more or less clear
to everyone what the strategy should be.
And it should be something like, yeah, you need to find some way to talk to each other, and you want your first actual, like, real superintelligent AI to be aligned and somehow, you know, care for sentient life, care for people, be democratic, one of those, some combination thereof. And I think this is the condition that everyone should strive for, and that's what society is striving for. And I think that by that time, if not already, all the other companies will realize that they're striving towards the same thing. And we'll see, I think that the world will truly change as AI becomes more powerful. And I think a lot of these forecasts will, like, I think things will be really different, and people will be acting really differently.
What, speaking of
forecasts, what are your forecasts for
this system you're describing, which can
learn as well as a human?
And
subsequently, as a result, becomes superhuman.
I think like 5 to 20 years.
So I just want to unroll how you might see the world going. It's like, we have a couple more years where these other companies are continuing the current approach, and it stalls out. And stalls out here meaning they earn no more than low hundreds of billions in revenue? Or how do you think about what stalling out means?
Yeah
I think it could stall out, and I think stalling out will look like it will all look very similar among all the different companies, something like this. I'm not sure, because I think even if we stall out, I think these companies could make a stupendous revenue. Maybe not profits, because they will need to work hard to differentiate themselves from each other, but revenue definitely.
But there's something in your model that
implies that when the correct solution does emerge, there will be convergence between all the
companies. And I'm curious why you think that's the case. Well, I was talking more about
convergence on their larger strategies. I think eventual convergence on the technical approach is probably going to happen as well. But I was alluding to convergence on the larger strategies, what exactly is the thing that should be done.
I just want to better understand how you see the future unrolling. So currently we have these different companies, and you expect
their approach to continue generating revenue, but not get to this human-like learner.
Yes.
So now we have these different forks of companies.
We have you, we have Thinking Machines, there's a bunch of other labs.
Yes.
And maybe one of them figures out the correct approach.
But then the release of their product makes it clear to other people how to do this thing.
I think it won't be clear how to do the thing, but it will be clear that something different
is possible.
Right.
And that is information.
And I think people will then be trying to figure out how that's,
how that works.
I do think, though, that one of the things not addressed here, not discussed, is that with each increase in the AI's capabilities, I think there will be some kind of changes in how things are being done, but I don't know exactly which ones. And so I think it's going to be important, yet I can't spell out what that is exactly.
By default, you would expect the model company that has that model to be getting all these gains, because they have the model that is learning how to do it all, that has the skills and knowledge that it's building up in the world. What is the reason to think that the benefits of that would be widely distributed, and not just end up at whatever model company gets this continuous-learning loop going first?
Like, I think that, empirically, here is what I think is going to happen. Number one, let's look at how things have gone so far with the AIs of the past. So one company produced an advance, and the other company scrambled and produced some similar things after some amount of time, and they started to compete in the market and push the prices down. And so I think, from the market perspective, something similar will happen there as well.
Even if someone, okay, we're talking about the good world, by the way. What's the good world? Where we have these powerful human-like learners. And by the way, maybe there's another thing we haven't discussed on the spec of the superintelligent AI that I think is worth considering, which is that you can make it narrow. It can be useful and narrow at the same time, so you can have lots of narrow superintelligent AIs. But suppose you have many of them, and you have some company that's producing a lot of profits from it, and then you have another company that comes in and starts to compete, and the way the competition is going to work is through specialization.
I think what's going to happen is that competition loves specialization, and you see it in the market, you see it in evolution as well,
so you're going to have lots of different niches
and you're going to have lots of different companies
who are occupying different niches
in this kind of world
we might say yeah like one
AI company is really quite a bit better
at some area of really complicated economic activity
and a different company is better at another area
and the third company is really good at litigation
and that's the way it goes.
Is this contradicted by what human-like learning implies, that it can learn any of this?
It can, but you have accumulated learning,
you have a big investment. You spent a lot of compute to become really, really good, really phenomenal at this thing, and someone else spent a huge amount of compute and a huge amount of experience to get really, really good at some other thing. Right, you apply a lot of human-like learning to get there, but now you are at this high point, where someone else would say, look, I don't want to start learning what you've learned, to go through all this.
That would require many different companies to begin with the human-like continual-learning agent at the same time, so that they can start their different research in different branches. But if one company gets that agent first, or gets that learner first, it does then seem like, well, if you just think about every single job in the economy, you just have instances learning each one; that seems tractable for a company.
Yeah, that's a valid argument. My strong intuition is that it's not how it's going to go.
My strong intuition is that, yeah, the argument says it will go this way, but my strong intuition is that it will not go this way. This is the, you know, "in theory, there is no difference between theory and practice; in practice, there is." And I think that's going to be one of those.
A lot of people's models of recursive self-improvement literally, explicitly state we will have a million Ilyas in a server that are coming in with different ideas, and this will lead to a superintelligence emerging very fast. Do you have some intuition about how parallelizable the thing you are doing is? What are the gains from making copies of Ilya?
I don't know. I think there'll definitely be diminishing returns, because you want people who think differently rather than the same. I think that if they were literal copies of me, I'm not sure how much more incremental value you'd get. But people who think differently, that's what you want.
Why is it that, if you look at different models, even ones released by totally different companies, trained on potentially non-overlapping datasets, it's actually crazy how similar LLMs are to each other?
Maybe the datasets are not as non-overlapping as it seems.
But there's some sense that it's like, even if an individual human might be less productive
than the future AI, maybe there's something to the fact that human teams have more diversity
than teams of AIs might have.
But how do we elicit meaningful diversity among AIs? I think just raising the temperature results in gibberish. I think you want something more like different scientists with different prejudices or different ideas. How do you get that kind of diversity among AI agents?
So the reason there has been no diversity, I believe, is because of pre-training. All the pre-trained models are pretty much the same, because they pre-trained on the same data. Now, RL and post-training is where some differentiation starts to emerge, because different people come up with different RL training.
Yeah.
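To make the temperature point concrete, here is a minimal toy sketch (not from the conversation; the vocabulary and logits are invented) of why cranking sampling temperature adds noise rather than meaningful diversity: as temperature rises, the softmax over next-token logits flattens toward uniform, so junk tokens get sampled nearly as often as sensible ones, whereas a genuinely different prior, from a different prompt or different post-training, changes the logits themselves.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn raw next-token logits into a sampling distribution at a given temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

# Invented logits for a model that strongly prefers a coherent continuation;
# the last two tokens stand in for "gibberish".
vocab = ["the", "a", "cat", "qzx", "##?"]
logits = [5.0, 3.5, 3.0, -2.0, -3.0]

for T in (0.7, 1.0, 2.0, 5.0):
    p = softmax_with_temperature(logits, T)
    print(f"T={T}: " + ", ".join(f"{tok}={pr:.2f}" for tok, pr in zip(vocab, p)))

# At high temperature the distribution approaches uniform, so the junk tokens are
# sampled almost as often as the sensible ones: noise, not a different "scientist".
# Diversity of the kind discussed here would instead come from changing the logits,
# e.g. via different prompts or different RL post-training.
```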
And then I've heard you hint in the past about self-play as a way to either get data or match
agents to other agents of equivalent intelligence to kick off learning.
How should we think about why there are no public proposals of this kind of thinking working with LLMs?
I would say there are two things to say.
I would say that the reason why I thought self-play is interesting
is because it offered a way to create models using compute only without data, right?
And if you think that data is the ultimate bottleneck, then using compute only is very interesting.
So that's what makes it interesting.
Now, the thing is that self-play,
at least the way it was done in the past,
when you have agents which somehow compete with each other,
it's only good for developing a certain set of skills.
It is too narrow.
It's only good for like negotiation, conflict,
certain social skills, strategizing that kind of stuff.
And so if you care about those skills, then self-play will be useful.
Now, actually, I think that self-play did find a home, but just in a different form.
So things like debate, prover-verifier: you have some kind of an LLM as a judge, which is also incentivized to find mistakes in your work.
You could say this is not exactly self-play, but this is, you know, a related adversarial setup that people are doing, I believe.
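As a rough illustration of the adversarial setup being described, here is a minimal sketch; the `prover` and `judge` callables are hypothetical stand-ins for two models, and nothing about this loop is taken from any specific lab's method.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Critique:
    found_mistake: bool
    note: str

def adversarial_round(problem: str,
                      prover: Callable[[str, str], str],
                      judge: Callable[[str, str], Critique],
                      max_revisions: int = 3) -> str:
    """The prover drafts an answer; the judge, incentivized to find mistakes, critiques it;
    the prover revises until the judge has no concrete objection or the rounds run out."""
    answer = prover(problem, "")
    for _ in range(max_revisions):
        critique = judge(problem, answer)
        if not critique.found_mistake:
            break
        # In a real setup both sides would be trained on these interactions:
        # the judge rewarded for valid objections, the prover for surviving them.
        answer = prover(problem, critique.note)
    return answer

# Toy usage with trivial stand-ins:
drafts = iter(["2 + 2 = 5", "2 + 2 = 4"])
prover = lambda problem, feedback: next(drafts)
judge = lambda problem, ans: Critique("5" in ans, "Arithmetic error: recheck the sum.")
print(adversarial_round("What is 2 + 2?", prover, judge))  # -> 2 + 2 = 4
```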
And really, self-play is a special case of more general competition between agents.
The response, the natural response
to competition is to try to be different.
And so if you were to put multiple agents
and you tell them, you know, you all need to
work on some problem, and you are an agent
and you're inspecting what everyone else is working on,
you're going to say, well, if they're already
taking this approach, it's not clear
I should pursue it. I should pursue something differentiated.
And so I think that something like this could also create an incentive for a diversity of approaches.
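A minimal sketch of that differentiation incentive (the approach names, "promise" scores, and crowding penalty are all invented for illustration): each agent looks at what its peers have already claimed and discounts crowded approaches, so a diversity of approaches emerges without explicit coordination.

```python
def pick_approach(promise: dict, taken: list) -> str:
    """Prefer the most promising approach, discounted by how many peers already pursue it."""
    crowding_penalty = 0.5
    def score(name: str) -> float:
        return promise[name] - crowding_penalty * taken.count(name)
    return max(promise, key=score)

# Invented "promise" scores for a few candidate research directions.
promise = {"scale_rl": 0.9, "new_architecture": 0.7, "better_data": 0.6}
claimed = []
for agent in range(4):
    choice = pick_approach(promise, claimed)
    claimed.append(choice)
    print(f"agent {agent} pursues {choice}")

# The first agent takes the single most promising direction; later agents see that
# it is crowded and spread out across the other niches.
```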
Final question.
What is research taste?
You're obviously the person in the world who is considered to have the best taste in doing research in AI.
You were the co-author on many of the biggest things that have happened in the history of deep learning, from AlexNet to GPT-3 and so on.
What is it that, how do you characterize
how you come up with these ideas?
I can answer.
So I can comment on this for myself.
I think different people do it differently.
But one thing that guides me personally
is an aesthetic
of how AI should be
by thinking about how people are.
But thinking correctly.
Like, it's very easy to think about how people are incorrectly.
But what does it mean to think about people correctly?
So I'll give you some examples.
The idea of the artificial neuron is directly inspired by the brain.
And it's a great idea.
Why?
Because you say, sure, the brain has all these different organs.
It has the folds.
But the folds probably don't matter.
Why do we think that the neurons matter?
Because there are many of them.
It kind of feels right.
So you want the neuron.
Yeah.
You want some kind of local learning rule that will change the connections between the neurons. Right, it feels plausible that the brain does it. The idea of the distributed representation. The idea that the brain learns from experience, so the neural net should learn from experience. And you kind of ask yourself, is something fundamental or not fundamental, how things should be.
Yeah.
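To ground the neuron-plus-local-learning-rule aesthetic, here is a minimal toy sketch (mine, not something from the episode): an artificial neuron whose weights change under a purely local Hebbian-style rule, where each connection's update depends only on its own input and the neuron's output, the kind of rule it is at least plausible a brain could implement.

```python
import numpy as np

rng = np.random.default_rng(0)

def neuron(x: np.ndarray, w: np.ndarray) -> float:
    """A single artificial neuron: weighted sum of inputs through a nonlinearity."""
    return float(np.tanh(w @ x))

def hebbian_update(w: np.ndarray, x: np.ndarray, y: float,
                   lr: float = 0.01, decay: float = 0.001) -> np.ndarray:
    """Local learning rule: each weight changes based only on its own input and the
    neuron's output ("fire together, wire together"), plus a small decay term."""
    return w + lr * y * x - decay * w

w = rng.normal(scale=0.1, size=4)
for _ in range(200):
    # Inputs with one dominant correlated direction across the first two components.
    shared = rng.normal()
    x = rng.normal(size=4) + np.array([1.0, 1.0, 0.0, 0.0]) * shared
    y = neuron(x, w)
    w = hebbian_update(w, x, y)

print(np.round(w, 2))  # the weights tend to align with the correlated input direction
```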
And I think that's been guiding me a fair bit, kind of thinking from multiple angles and looking for almost beauty: beauty, simplicity. Ugliness, there's no room for ugliness. It's just beauty, simplicity, elegance, correct inspiration from the brain.
And all of those things need to be present at the same time.
And the more they are present, the more confident you can be in a top-down belief.
And then the top-down belief is the thing that sustains you when the experiments contradict you.
Because if you just trust the data all the time, well, sometimes you can be doing a correct thing, but there's a bug. But you don't know that there is a bug. How can you tell that there is a bug? How do you know if you should keep debugging or conclude it's the wrong direction? Well, it's the top-down: you can say the things have to be this way, something like this has to work, therefore we've got to keep going. That's the top-down, and it's based on this multifaceted beauty and inspiration by the brain.
All right, we'll leave it there.
Thank you so much. Thank you so much.
All right. Appreciate it.
That was great.
Yeah. I enjoyed it.
Yes, me too.
Hey, everybody. I hope you enjoyed that episode.
If you did, the most helpful thing you can do is just share it with other people who you think might enjoy it.
It's also helpful if you leave a rating or a comment on whatever platform you're listening on.
If you're interested in sponsoring the podcast, you can reach out at dwarkesh.com/advertise.
Otherwise, I'll see you on the next one.
