Dwarkesh Podcast - Adam Marblestone – AI is missing something fundamental about the brain
Episode Date: December 30, 2025

Adam Marblestone has worked on brain-computer interfaces, quantum computing, formal mathematics, nanotech, and AI research. And he thinks AI is missing something fundamental about the brain. Why are humans so much more sample efficient than AIs? How is the brain able to encode desires for things evolution has never seen before (and therefore could not have hard-wired into the genome)? What do human loss functions actually look like? Adam walks me through some potential answers to these questions as we discuss what human learning can tell us about the future of AI.

Watch on YouTube; read the transcript.

Sponsors

* Gemini 3 Pro recently helped me run an experiment to test multi-agent scaling: basically, if you have a fixed budget of compute, what is the optimal way to split it up across agents? Gemini was my colleague throughout the process — honestly, I couldn’t have investigated this question without it. Try Gemini 3 Pro today at gemini.google.com

* Labelbox helps you train agents to do economically valuable, real-world tasks. Labelbox’s network of subject-matter experts ensures you get hyper-realistic RL environments, and their custom tooling lets you generate the highest-quality training data possible from those environments. Learn more at labelbox.com/dwarkesh

To sponsor a future episode, visit dwarkesh.com/advertise.

Timestamps
(00:00:00) – The brain’s secret sauce is the reward functions, not the architecture
(00:22:20) – Amortized inference and what the genome actually stores
(00:42:42) – Model-based vs model-free RL in the brain
(00:50:31) – Is biological hardware a limitation or an advantage?
(01:03:59) – Why a map of the human brain is important
(01:23:28) – What value will automating math have?
(01:38:18) – Architecture of the brain

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Transcript
The big million-dollar question that I've been trying to get the answer to through all these interviews with AI researchers: how does the brain do it?
Right?
Like, we're throwing way more data at these LLMs and they still have a small fraction of the total capabilities that a human does.
So what's going on?
Yeah.
I mean, this might be the quadrillion dollar question or something like that.
It's arguably, you can make an argument this is the most important, you know, question in science.
I don't claim to know the answer.
I also don't really think that the answer will necessarily come, even from a lot of smart people thinking about it as much as they are.
My overall, like, meta-level take is that we have to empower the field of neuroscience to just make neuroscience a more powerful field technologically and otherwise to actually be able to crack a question like this.
But maybe the way that we would think about this now, with modern AI, neural nets, deep learning, is that there are certain key components. There's the architecture; there are maybe hyperparameters of the architecture, how many layers do you have, or sort of properties of that architecture. There is the learning algorithm itself: how do you train it, you know, backprop, gradient descent, is it something else? There is how it's initialized: okay, so if we take the learning part of the system, it still may have some initialization of the weights. And then there are also cost functions: what is it being trained to do? What's the reward signal? What are the loss functions? Supervision signals.
My personal hunch within that framework is that the field has neglected the role of these very specific loss functions, very specific cost functions. Machine learning tends to like mathematically simple loss functions: predict the next token, you know, cross-entropy. These simple kinds of computer-scientist loss functions.
I think evolution may have built a lot of complexity into the loss functions.
Actually, many different loss functions for different areas, turned on at different stages
of development, a lot of Python code, basically, generating a specific curriculum for what
different parts of the brain need to learn.
Because evolution has seen many times what was successful and unsuccessful, and evolution
could encode the knowledge of the learning curriculum.
So in the machine learning framework, maybe we can come back and talk about: where do the loss functions of the brain come from? Can different loss functions lead to different efficiency of learning?
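To make the "a lot of Python code generating a curriculum" picture concrete, here is a toy, purely illustrative sketch. The region names, developmental windows, and loss terms are all invented for illustration, not a claim about actual brain areas:

```python
import numpy as np

# Toy sketch: instead of one global objective, a library of bespoke loss
# functions, each tied to a (hypothetical) region and switched on only during
# a particular developmental window. All names and numbers are invented.

def predictive_loss(pred, target):
    return float(np.mean((pred - target) ** 2))        # generic prediction term

def innate_agreement_loss(pred, innate_signal):
    # push a learned detector to agree with a hard-wired heuristic detector
    return float(np.mean((pred - innate_signal) ** 2))

CURRICULUM = [
    # (region,        loss_fn,                developmental window it is active in)
    ("visual area",   predictive_loss,        (0.0, 1.0)),
    ("face area",     innate_agreement_loss,  (0.0, 0.3)),   # early critical period
    ("speech area",   predictive_loss,        (0.2, 0.8)),   # switches on later
]

def total_loss(age, predictions, targets):
    """Sum only the loss terms whose developmental window contains `age`."""
    total = 0.0
    for region, loss_fn, (start, end) in CURRICULUM:
        if start <= age <= end:
            total += loss_fn(predictions[region], targets[region])
    return total

# Example usage with made-up data:
rng = np.random.default_rng(0)
preds = {r: rng.random(4) for r, _, _ in CURRICULUM}
targs = {r: rng.random(4) for r, _, _ in CURRICULUM}
print(total_loss(age=0.1, predictions=preds, targets=targs))   # visual + face terms active
print(total_loss(age=0.9, predictions=preds, targets=targs))   # only the visual term
```

The point of the sketch is just that a "curriculum" like this is cheap to specify: a short table of rules, rather than a big pretrained network.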
You know, people say, like, the cortex has got the universal human learning algorithm,
the special sauce that humans have.
What's up with that?
This is a huge question, and we don't know.
I've seen models where, you know, the cortex has typically this six-layered structure, layers in a slightly different sense than layers of a neural net. Any one location in the cortex has six physical layers of tissue as you go down through the depth of the sheet,
and then those areas then connect to each other,
and that's more like the layers of a network.
I've seen versions of that
where what you're trying to explain
is actually just how does it approximate back prop.
And what is the cost function for that?
What is the network being asked to do?
If you sort of are trying to say it's something like backprop,
is it doing backprop on next-token prediction,
or is it doing backprop on classifying images,
or what is it doing?
And no one knows.
But I think one thought about it, one possibility about it, is that it's just this incredibly general prediction engine.
So any one area of cortex is just trying to predict... basically, can it learn to predict any subset of all the variables it sees from any other subset? So, like, omnidirectional inference or omnidirectional prediction. Whereas an LLM just sees everything in the context window and then it computes a very particular conditional probability, which is: given all the last thousands of things, what are the probabilities for the next token? But it would be awkward for a large language model to be given, you know, "the quick brown fox, blank, blank, the lazy dog," and fill in the middle, versus doing the next token.
If it's doing just forward prediction, it can learn how to do that stuff at this emergent level of in-context learning, but natively it's just predicting the next token. What if the cortex is just
natively made so that any area of cortex can predict any pattern in any subset of its inputs
given any other missing subset? That is a little bit more like, quote unquote, probabilistic
AI. I think a lot of the things I'm saying, by the way, are extremely similar to what Yann LeCun would say. He's really interested in these energy-based models, and something like that is the joint distribution of all the variables: what is the likelihood or unlikelihood of just any combination of variables?
And if I clamp some of them, I say, well, definitely these variables are in these states, then I can compute, with probabilistic sampling for example: okay, conditioned on these being set in this state, and these could be any arbitrary subset of variables in the model, can I predict what any other subset is going to do, and sample from any other subset given clamping this subset? And you could choose a totally different subset and sample from that subset. So it's omnidirectional inference.
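To make the clamping idea concrete, here is a toy Gibbs-sampling sketch over a small binary energy-based model: clamp any subset of variables and sample the rest from the conditional. This is a generic textbook construction, not a claim about what cortex literally implements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary energy-based model ("Boltzmann machine"-style), purely illustrative.
n = 6
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2                 # symmetric couplings
np.fill_diagonal(W, 0.0)
b = rng.normal(scale=0.1, size=n)

def gibbs_conditional(clamped: dict, steps: int = 500):
    """Sample the unclamped variables given the clamped ones.
    `clamped` maps variable index -> 0/1. Any subset can be clamped,
    and any other subset read out: 'omnidirectional' inference."""
    x = rng.integers(0, 2, size=n).astype(float)
    for i, v in clamped.items():
        x[i] = v
    for _ in range(steps):
        for i in range(n):
            if i in clamped:
                continue
            p_on = 1.0 / (1.0 + np.exp(-(W[i] @ x + b[i])))
            x[i] = float(rng.random() < p_on)
    return x

# Clamp variables 0 and 3, infer the rest...
sample_a = gibbs_conditional({0: 1, 3: 0})
# ...or clamp a completely different subset and read out variables 0 and 3 instead.
sample_b = gibbs_conditional({1: 1, 4: 1, 5: 0})
```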
And so, you know, there could be some parts of cortex, like association areas, that may, you know, predict vision from audition. There might be areas that predict things that the more innate part of the brain is going to do. Because remember, this whole thing is basically riding on top of sort of a lizard brain and lizard body, if you will. And that thing is a thing that's worth predicting too. So you're not just predicting, do I see this or do I see that? But is this muscle about to tense? Am I about to have a reflex where I laugh? You know, is my heart rate about to go up? Am I about to activate this instinctive behavior? Based on my higher-level understanding, I can match somebody telling me there's a spider on my back to this lizard part that would activate if I was literally seeing a spider in front of me. And you learn to associate the two, so that the response can trigger even just from hearing somebody say there's a spider on your back.
Yeah, well, let's come back to this. This is partly having to do with Steve Byrnes' theories, which I'm recently obsessed with. But on your podcast with Ilya, he said, look, I'm not aware of any good theory of how evolution encodes high-level desires or intentions. I think this is very connected to all of these questions about the loss
And it's a really profound question, right?
Like, let's say that I am embarrassed for saying the wrong thing on your podcast
because I'm imagining that Yann LeCun is listening.
He says, that's not my theory.
You described energy-based models really badly.
That's going to activate in me innate embarrassment and shame,
and I'm going to want to go hide and whatever.
And that's going to activate these innate reflexes.
And that's important because I might otherwise get killed by Yann LeCun's marauding army of...
The French AI researchers coming for you, Adam.
And so it's important that I have that instinctual response.
But of course, evolution has never seen Yann LeCun or known about energy-based models or known what an important scientist or a podcast is.
And so somehow the brain has to encode this desire to, you know, not piss off really important people in the tribe or something like this.
in a very robust way, without knowing in advance all the things that the learning subsystem of the brain, the part that is learning, cortex and other parts, is going to learn. The cortex is going to learn this world model that's going to include things like Yann LeCun and podcasts. And evolution has to make sure that those neurons, whatever the "Yann LeCun being upset with me" neurons are, get properly wired up to the shame response, or to this part of the reward function.
And this is important, right?
Because if we're going to be able to seek status in the tribe or learn from knowledgeable people, as you said, or things like that,
exchange knowledge and skills with friends, but not with enemies, I mean, we have to learn all this stuff.
So it has to be able to robustly wire these learned features of the world, learn parts of the world model up to these innate reward functions,
and then actually use that to then learn more, right?
Because next time I'm not going to try to piss off Yann LeCun if he emails me that I got this wrong.
And so we're going to do further learning based on that.
So in constructing the reward function, it has to use learned information.
But how can evolution do that? Evolution didn't know about Yann LeCun.
And so the basic idea that Steve Burns is proposing is that, well, part of the cortex
or other areas like the amygdala that learn, what they're doing is they're modeling
the steering subsystem.
The steering subsystem is the part with these more innately programmed responses and the innate programming of this series of reward functions, cost functions, bootstrapping functions that exist. So there are parts of the amygdala, for example, that are able to monitor what those parts do and predict what those parts do. So how do you find the neurons that are important for social status? Well, you have some innate heuristics of social status, for example, or you have some innate heuristics of friendliness, that the steering subsystem can use.
And the steering subsystem actually has its own sensory system, which is kind of crazy.
So we think of vision as being something that the cortex does.
But there's also a steering subsystem, subcortical visual system called the superior colliculus
with innate ability to detect faces, for example, or threats.
So there's a visual system that has innate heuristics, and that the steering subsystem has
its own responses. So there will be parts of the amygdala or parts of the cortex that are learning to predict those responses. And so what are the neurons that matter in the cortex for social status or for friendship? They're the ones that predict those innate heuristics for friendship. So you train a predictor in the cortex, and you ask which neurons are part of the predictor. Those are the ones, and now you've actually managed to wire it up.
Yeah. This is fascinating. I feel like I still don't understand. I understand how the cortex could learn how this primitive part of the brain would
respond to, because it obviously has these labels on it: here's literally a picture of a spider, and this is bad, like, be scared of this, right? And then the cortex learns that this is bad because the innate part tells it that. But then it has to generalize to: okay, the spider's on my back...
Yes.
...and somebody's telling me the spider's on your back, that's also bad.
Yes.
But it never got supervision on that, right? So how does it...
Well, it's because the learning subsystem
is a powerful learning algorithm that does have generalization,
that is capable of generalization.
So the steering subsystem, these are the innate responses.
So you're going to have some, let's say, built into your steering subsystem,
these lower brain areas, hypothalamus, brainstem, et cetera.
And again, they have their own primitive sensory systems.
So there may be an innate response.
If I see something that's kind of moving fast toward my body that I didn't previously see was there,
is kind of small and dark and high contrast,
that might be an insect
kind of skittering onto my body,
I am going to, like, flinch, right?
And so there are these innate responses,
and so there's going to be some group of neurons,
let's say, in the hypothalamus, that represents: I am flinching.
Yeah.
Or I just flinched, right?
Right, "I just flinched" neurons in the hypothalamus.
So when you flinch, first of all,
that's a negative contribution
with the reward function,
you didn't want that to happen, perhaps.
But that's only happened.
that's a reward function then that doesn't have any generalization in it.
So I'm going to avoid that exact situation of the thing skittering toward me.
And maybe I'm going to avoid some actions that lead to the thing skittering.
So that's something, a generalization you can get.
That's what Steve calls being downstream of the reward function.
So I'm going to avoid the situation where the spider was skittering toward me.
But you're also going to do something else.
So there's going to be, like, a part of your amygdala, say, that is asking: okay, a few milliseconds, you know, hundreds of milliseconds or seconds earlier, could I have predicted that flinching response?
It's going to be a group of neurons
that is essentially a classifier of,
am I about to flinch?
And I'm going to have classifiers for that
for every important steering subsystem variable
that evolution needs to take care of.
Am I about to flinch?
Am I talking to a friend?
Should I laugh now?
Is the friend high status?
Whatever variables
the hypothalamus brainstem contain,
am I about to taste salt?
So it's going to have all these variables.
And for each one, it's going to have a predictor.
It's going to train that predictor.
Now, the predictor that it trains, that can have some generalization.
And the reason it can have some generalization is because it just has a totally different input.
So its input data might be things like the word spider.
But the word spider can activate in all sorts of situations
that lead to the word spider activating in your world model.
So, you know, if you have a complex world model, with really complex features,
that inherently gives you some generalization.
It's not just the thing skittering toward me.
It's even the word spider or the concept of spider
is going to cause that to trigger.
And this predictor can learn that.
So whatever spider neurons are in my world model,
which could even be a book about spiders
or somewhere, a room where there are spiders
or whatever that is.
The amount of heebie-jeebies that this conversation is eliciting in the audience is, like...
So now I'm activating your steering subsystem. Your steering subsystem's spider neurons, a hypothalamic subgroup of neurons for skittering insects, are activating based on these very abstract concepts in the conversation.
I'm going to have to put in a trigger warning.
That's because you learned this. And because the cortex inherently has the ability to generalize, because it's just predicting based on these very abstract variables and all this integrated information that it has, whereas the steering subsystem can only use whatever the superior colliculus and a few other sensors can spit out.
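A cartoon of this "thought assessor" move, as I understand Byrnes's framing: the steering subsystem emits a crude innate signal, and a small learned predictor, fed the rich world-model features available just beforehand, is trained to anticipate that signal; its output can then serve as a reward term that generalizes to things like the word "spider." Everything below is a stand-in, not a model of real circuitry.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_episode():
    spider = rng.random() < 0.3                      # latent cause
    # Rich world-model features available *before* any reflex fires:
    rich = np.array([
        float(rng.random() < (0.6 if spider else 0.05)),  # glimpse something skittering
        float(rng.random() < (0.7 if spider else 0.02)),  # someone says "spider"
        float(rng.random() < (0.5 if spider else 0.10)),  # spider-ish context (webs, etc.)
    ])
    # Innate steering-subsystem signal a moment later: crude sensors trigger a flinch.
    flinch = float(spider and rng.random() < 0.8)
    return rich, flinch

# Tiny logistic "thought assessor": predict the upcoming innate signal from rich features.
w, b, lr = np.zeros(3), 0.0, 0.1
for _ in range(30_000):
    rich, label = make_episode()
    p = 1.0 / (1.0 + np.exp(-(w @ rich + b)))
    w += lr * (label - p) * rich                     # simple SGD on log-likelihood
    b += lr * (label - p)

# The word "spider" alone (no literal sighting) now yields an elevated predicted
# flinch probability, and that prediction can be folded into the reward function.
word_only = np.array([0.0, 1.0, 0.0])
p_flinch = 1.0 / (1.0 + np.exp(-(w @ word_only + b)))
print(p_flinch)
```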
So, by the way, it's remarkable that the person who's made this connection between different pieces of neuroscience, Steven Byrnes, a former physicist, has for the last few years been trying to synthesize this. He's an AI safety researcher. He's just synthesizing. This comes back to the academic
incentives thing. I think that this is
a little bit hard to say, what is the exact next
experiment? How am I going to publish a paper on this?
What am I going to train my grad student to do? It's very
speculative. But there's a lot in the
neuroscience literature, and Stephen has been able to pull this together.
And I think that Steve has an answer to Ilya's
question, essentially, which is, how does
the brain ultimately code for these
higher level desires and link them up
to the more primitive rewards? Yeah.
A question, though: why can't we achieve this omnidirectional inference by just training the model to not just map from a token to the next token, but remove the masks in the training so it maps every token to every token, or come up with more labels between video and audio and text so that it's forced to map each one to each one?
I mean, that may be the way. So it's not clear to me. Some people think that there's sort of a different way that it does probabilistic inference, or a different learning algorithm that isn't backprop, that might be like other ways of learning energy-based models or other things like that you can imagine, but that is involved in being able to do this, and that the brain has that.
But I think there's a version of it
where what the brain does
is like crappy versions of back-prop
to learn to predict
through a few layers
and that yeah, it's kind of like
a multimodal foundation model.
Yeah, so maybe the cortex
is just kind of like certain kinds of foundation models.
LLMs are maybe just predicting the next token,
but vision models maybe are trained learning to fill in the blanks
or reconstruct different pieces or combinations.
But I think that it does it in an extremely flexible way.
So if you train a model to just fill in this blank at the center,
okay, that's great.
But if you didn't train it to fill in this other blank over to the left,
then it doesn't know how to do that.
It's not part of its repertoire of predictions that are, like, amortized into the network.
Whereas with a really powerful inference system, you could choose at test time, you know, what is the subset of variables it needs to infer and which ones are clamped.
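The question maps roughly onto the difference between a fixed causal objective and masked or any-order prediction, where the subset to predict is chosen randomly per example. A toy illustration with made-up tokens, not a description of how any particular lab's training code works:

```python
import numpy as np

rng = np.random.default_rng(0)
seq = np.array([5, 12, 7, 3, 9, 1])        # a toy token sequence

# Fixed causal objective: every position predicts only the next token.
causal_pairs = [(seq[:i], seq[i]) for i in range(1, len(seq))]
# e.g. ([5], 12), ([5, 12], 7), ...

# "Any-subset" objective: on each example, clamp a random subset of positions
# and ask the model to predict the rest -- closer to omnidirectional prediction.
def any_subset_example(seq):
    mask = rng.random(len(seq)) < 0.4          # positions to hide
    visible = np.where(~mask, seq, -1)         # -1 marks a blank to fill in
    targets = {int(i): int(seq[i]) for i in np.where(mask)[0]}
    return visible, targets

visible, targets = any_subset_example(seq)
# A model trained this second way has "fill in any blank given any other blanks"
# amortized into its weights, rather than only next-token prediction.
```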
Okay, two sub-questions.
One, it makes you wonder whether the thing that is lacking in artificial neural networks is less about the reward function and more about the encoder or the embedding. Like, maybe the issue is that you're not representing video and audio and text in the right latent abstraction such that they could intermingle and conflict. Maybe this is also related to why LLMs seem bad at drawing connections between different ideas. Like, are the ideas represented at a level of generality at which you could notice different connections?
Well, the problem is these questions are all commingled. We don't know if it's doing backprop-like learning, and we don't know if it's doing energy-based models, and we don't know how these areas are even connected in the first place. It's very hard to
really get to the ground truth of this. But yeah, it's possible. I mean, I think that people
have done some work. My friend Joel Dapello actually did something some years ago where I think he put a model, I think it was a model of V1, of sort of specifically how the early visual cortex represents images, and put that as an input into a convnet, and that improved something. So there could be differences like that. The retina is also doing, you know, motion detection
and certain things are kind of getting filtered out. So there may be some pre-processing of the
sensory data. There may be some clever combinations of which modalities are predicting which or so on
that that lead to better representation. There may be much more clever things than that. Some people
certainly do think that there's inductive biases built in the architecture that will shape
the representations, you know, differently or that there are clever things that you can do.
So Astera, which is the same organization that employs Steve Byrnes, just launched this neuroscience project based on Doris Tsao's work, and she has some ideas about how you can build vision systems that basically require less training. They build into the assumptions of the design of the architecture that things like objects are bounded by surfaces, and the surfaces have certain types of shapes and relationships of how they occlude each other, and stuff like that. So it may be possible to build more assumptions into the network.
Evolution may have also put some changes of architecture. It's just, I think, that also the
cost functions and so on may be a key thing that it does.
Andy Jones has this amazing 2021 paper where he uses AlphaZero to show that you can trade off test-time compute and training compute. And while that might seem obvious now, this was three years before people were talking about inference scaling. So this got me thinking: is there an experiment you could run today, even if it's a toy experiment, which would help you anticipate the next scaling paradigm?
One idea I had was to see if there was anything to multi-agent scaling.
Basically, if you have a fixed budget of training compute,
are you going to get the smartest agent by dumping all of it into training one single agent? Or by splitting that compute up amongst a bunch of models, resulting in a diversity of strategies that get to play off each other? I didn't know how to turn this question into a concrete experiment, though. So I started brainstorming with Gemini 3 Pro in the Gemini app.
Gemini helped me think through a bunch of different judgment calls.
For example, how do you turn the training loop from self-play to this kind of co-evolutionary
league training?
How do you initialize and then maintain diversity amongst different Alpha Zero agents?
How do you even split up the compute between these agents in the first place?
I found this clean implementation of AlphaGo Zero, which I then forked and opened up in Antigravity, which is Google's agent-first IDE.
The code was originally written in 2017, and it was meant to be trained on a
single GPU of that time, but I needed to train multiple whole separate populations of Alpha
Zero agents, so I needed to speed things up.
I rented a beefcake of a GPU node, but I needed to refactor the whole implementation
to take advantage of all this scale and parallelism.
Gemini suggested two different ways to parallelize self-play, one which would involve higher
GPU context switching, and the other would involve higher communication overhead.
I wasn't sure which one to pick, so I just asked Gemini.
Not only did it get both of them working in minutes, but it autonomously created and then ran a benchmark to see which one was best.
It would have taken me a week to implement either one of these options.
Think about how many judgment calls a software engineer working on an actually complex project has to make.
If they have to spend weeks architecting some optimization or feature before they can see whether it will work out,
they will just get to test out so many fewer ideas.
Anyways, with all this help from Gemini, I actually ran the experiment and got some results.
Now, please keep in mind that I'm running this experiment on an anemic budget of compute,
and it's very possible I made some mistakes in implementation.
But it looks like there can be gains from splitting up a fixed budget of training compute amongst multiple agents rather than just dumping it all into one. Just to reiterate how surprising this is: the best agent in the population of 16 is getting 1/16th the amount of training compute as the agent trained on self-play alone, and yet it still outperforms the agent that is hogging all of the compute.
The whole process of vibe coding this experiment with Gemini was really absorbing and fun.
It gave me the chance to actually understand how Alpha Zero works
and to understand the design space around decisions about the hyperparameters
and how search is done and how you do this kind of co-evolutionary training
rather than getting bogged down in my very novice abilities as an engineer.
Go to gemini.google.com to try it out.
I want to talk about this idea that you just glanced off of, which was amortized inference.
And maybe I should try to explain what I think it means, because I think it's probably wrong, and this will help you correct.
It's been a few years for me, too.
Okay.
Right now, the way the models work is you have an input, and it maps it to an output. And this is amortizing the real process, which we think is what intelligence is, which is: you have some prior over how the world could be. Like, what are the causes that make the world the way that it is? And then, when you see some observation, you should be like, okay, here's all the ways the world could be; this cause explains what's happening best. Now, doing this calculation
over every possible cause is computationally intractable.
So then you just have to sample like, oh, here's a potential cause.
Does this explain this observation?
No, forget it.
Let's keep sampling.
And then eventually you get the cause.
The cause, then the cause explains the observation.
And then this becomes your posterior.
That's actually pretty good, I think, sort of, yeah. Yeah, this is Bayesian inference in general, this very intractable thing.
Right.
The algorithms that we have for doing that tend to require taking a lot of samples, Monte Carlo methods, taking a lot of samples. Yeah. And taking samples takes time. I mean, the original Boltzmann machines and stuff were using techniques like this. And it's still used with probabilistic programming and other types of methods, often. And so, yeah, the Bayesian inference problem, which is basically the problem of perception: given some model of the world and given some data, how should I update... what are the missing variables in my internal
model? And I guess the idea is that neural networks are, hopefully... obviously, mechanistically the neural network is not starting with "here is my model of the world and I'm going to try to explain this data." But the hope is that instead of starting with "hey, does this cause explain this observation? No. Does this cause explain this observation? Yes," what you do is just go straight from the observation to whatever cause the neural net thinks is the best one. The feedforward pass goes observation to cause, observation to cause, to then the output.
Yeah, so you don't have to evaluate all these energy values or whatever and sample around to make them higher and lower. You just say, approximately, that process would result in this being the top one, or something like that.
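A toy contrast between the two strategies being described: explicit inference that samples candidate causes and weighs them by how well they explain the observation, versus an amortized shortcut that pays the cost up front and then maps observation to posterior in one cheap pass. The numbers and the lookup-table "network" are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: a discrete hidden cause z produces a noisy observation x.
causes = np.array([0.0, 2.0, 4.0])            # possible "causes" (means)
prior = np.array([0.5, 0.3, 0.2])

def likelihood(x, z_mean):
    return np.exp(-0.5 * (x - z_mean) ** 2)   # unnormalized Gaussian likelihood

# 1) Explicit inference: sample candidate causes, weight by how well each explains x.
def sampled_posterior(x, n_samples=10_000):
    z_idx = rng.choice(len(causes), size=n_samples, p=prior)
    weights = likelihood(x, causes[z_idx])
    post = np.bincount(z_idx, weights=weights, minlength=len(causes))
    return post / post.sum()

# 2) Amortized inference: pre-compute a direct mapping x -> posterior once,
#    then answer at "test time" with a single cheap lookup (a stand-in for a network).
grid = np.linspace(-3, 8, 200)
amortized_table = np.stack([sampled_posterior(x) for x in grid])   # paid for up front

def amortized_posterior(x):
    return amortized_table[np.argmin(np.abs(grid - x))]

# Both give roughly the same answer; one spends compute per query, the other baked it in.
print(sampled_posterior(1.2), amortized_posterior(1.2))
```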
One way to think about it might be that test-time compute, inference-time compute, is actually doing this sampling again, because you literally read its chain of thought; it's actually doing this toy example we're talking about, where it's like, oh, can I solve this problem by doing X? Yeah... I need a different approach. And this raises the question... I mean, over time, it is the case that the capabilities which required inference-time compute to elicit get distilled into the model. So you're amortizing the thing which previously you needed to do these rollouts, like Monte Carlo rollouts, to figure out. And so in general, maybe there's this principle that digital minds, which can be copied, have different tradeoffs than biological minds, which cannot. And so in general it should make sense to amortize more things, because you can literally copy the amortization, right?
Or copy the things that you have sort of built in. Yeah. And maybe this is a tangential question, but it might be interesting to speculate about: in the future, as these things become more intelligent and the way we train them becomes more economically rational, what will it make sense to amortize into these minds, which evolution did not think it was worth amortizing into biological minds? You have to retrain every time. Right. I mean, first of all, I think the
probabilistic AI people would be like, of course, you need test time compute because this inference
problem is really hard. And the only ways we know how to do it involve lots of test time compute,
otherwise it's just a crappy approximation that's never going to... you'd have to do infinite data or something to make this work. So I think some of the probabilistic people will be like, no, it's like
inherently probabilistic and like amortizing it in this way like just doesn't make sense.
And so, and they might then also point to the brain and say, okay, well, the brain, the neurons are
kind of stochastic and they're sampling and they're doing things. And so maybe the brain actually
is doing more like the non-amortized inference, the real inference. But it's also kind of strange
how perception can work in just milliseconds or whatever. It doesn't seem like it uses that much sampling. So it's clearly also doing some kind of baking things into approximate forward passes or something like that to do this. And yeah, so in the future, you know, I don't know. I mean, is it already a trend to some degree that things people were having to use test-time compute for are getting used to train back the base model? Right, yeah. So now it can do it in one pass. Right, yeah. So, I mean, I think, yeah, you know, maybe evolution did or didn't do that. I think evolution still has to pass everything through the genome, right, to build the network. And the environment in
which humans are living is very dynamic, right? And so maybe, if we believe this is true, that there's a learning subsystem, per Steve Byrnes, and a steering subsystem, the learning subsystem doesn't have a lot of pre-initialization or pre-training. It has a certain architecture, but then within a lifetime it learns. Then evolution didn't, you know, actually amortize that much into that network. Right. It amortized it instead into a set of innate behaviors and a set of these bootstrapping cost functions, or ways of building up very particular reward signals.
Yeah.
This framework helps explain this mystery that people have pointed out and I've asked a few
guests about, which is if you want to analogize evolution to pre-training, well, how do you
explain the fact that so little information is conveyed through the genome?
So three gigabytes is the size of the total human genome.
Obviously, a small fraction of that is actually relevant to coding for the brain.
And if previously people made this analogy that actually evolution has found the hyperparameters of the model, the numbers which tell you how many layers should there be, the architecture basically, right?
Like how should things be wired together?
But if a big part of the story that increases sample efficiency, aids learning, generally makes systems more performant, is the reward function, is the loss function.
Yeah.
And if evolution found those loss functions which aid learning, then it actually kind of makes sense how you can build an intelligence with so little information, because the reward function, you write it in Python, right? The reward function is literally a line.
And so you just like have like a thousand lines like this.
And that doesn't take up that much space.
Yes.
And it also gets to do this generalization thing, with the thing I was describing when we were talking about the spider, right, where it learns that just the word spider, you know, triggers the spider reflex or whatever. It gets to exploit that too, right? So it gets to build a reward function that actually has a bunch of generalization in it, just by specifying this innate spider stuff and the thought assessors, as Steve calls them, that do the learning. So that's potentially a really compact solution to building up these more complex reward functions that you need. So it doesn't have to anticipate everything about the future of the reward function, just to anticipate what variables are relevant and what the heuristics are for finding what those variables are. And then, yeah, so then it has to
have like a very compact specification for like the learning algorithm and basic architecture of
the learning subsystem. And then it has to specify all this Python code of like all the stuff
about the spiders and all the stuff about friends and all the stuff about your mother and all
the stuff about mating and social groups and joint eye contact. It has to specify all that stuff. And so, is this really true? I think that there is some evidence for it. Fei Chen and Evan Macosko and various other researchers have been doing these single-cell atlases.
So one of the things that neuroscience technology or some of scaling up neuroscience technology, again, this is kind of like one of my obsessions, has done through the brain initiative, a big neuroscience funding program.
They've basically gone through different areas, especially the mouse brain and mapped, like where are the different cell types?
How many different types of cells are there in different areas of cortex? Are they the same across different areas? And then you look at these subcortical regions, which are more like the steering subsystem or reward-function-generating regions: how many different types of cells do they have, and which neuron types do they have? We don't know how they're all connected and exactly what they do, or what the circuits are or what they mean, but you can just quantify how many different kinds of cells there are by sequencing the RNA. And
there are a lot more weird and diverse and bespoke cell types in the steering subsystem,
basically, than there are in the learning subsystem.
Like the cortical cell types, there's enough to build.
It seems like there's enough to build a learning algorithm up there and specify some hyperparameters.
And in the steering subsystem, there's like a gazillion, you know, thousands of really weird cells,
which might be like the one for the spider flinch reflex and the one for I'm about to taste salt.
Sorry, why would each reward function need a different cell type?
Well, so this is where you get innately wired circuits, right?
So in the learning algorithm part, in the learning subsystem, you set up, specify the initial
architecture, you specify a learning algorithm.
It's all the, all the juices is happening through plasticity of the synapses, changes
of the synapses within that big network.
But it's kind of like a relatively repeating architecture, how it's initialized.
It's just like the amount of Python code needed to make, you know, an eight-layer transformer is
not that different from one to make a three-layer transformer, right? You're just replicating.
Whereas all this Python code for the reward function, you know, if the superior colliculus sees something that's skittering along, and you're feeling goosebumps on your skin or whatever, then trigger the spider reflex. That's just a bunch of bespoke, species-specific, situation-specific crap. The cortex doesn't know about spiders. It just knows about layers, right?
But you're saying that the only way to write this reward
function...
Yeah.
...you'd have to have a special cell type.
Yeah, yeah. Well, I think so. I think you either have to have special cell types, or you have to somehow otherwise get special wiring rules, so that evolution can say this neuron needs to wire to this neuron without any learning. And the way that that is most likely to happen, I think, is that those cells express different receptors and proteins that say, okay, when this one comes in contact with this one, let's form a synapse.
So it's genetic wiring.
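A cartoon of the "wiring by cell-type chemistry" point: bespoke cell types carry marker proteins, and a small, genetically fixed rule table decides which markers synapse onto which, so specific circuits form with no learning. All type and marker names below are invented.

```python
# Cartoon of innate, learning-free wiring: cell types carry marker proteins, and a
# fixed (genetically specified) rule table says which markers synapse onto which.

CELL_TYPES = {
    "SC_looming_detector":   {"markers": ["EphrinX"]},   # superior-colliculus-ish detector
    "Hyp_flinch_driver":     {"markers": ["EphR_X"]},    # hypothalamus-ish flinch neuron
    "Hyp_salt_predictor":    {"markers": ["CadY"]},
    "Amyg_thought_assessor": {"markers": ["CadY", "EphR_X"]},
}

# "If a cell expressing marker A touches a cell expressing marker B, form a synapse."
WIRING_RULES = {("EphrinX", "EphR_X"), ("CadY", "CadY")}

def innately_wired_synapses(cell_types, rules):
    synapses = []
    for pre, pre_info in cell_types.items():
        for post, post_info in cell_types.items():
            if pre == post:
                continue
            if any((a, b) in rules
                   for a in pre_info["markers"]
                   for b in post_info["markers"]):
                synapses.append((pre, post))
    return synapses

print(innately_wired_synapses(CELL_TYPES, WIRING_RULES))
```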
Yeah. And that needs cell types to do it.
Yeah. I'm sure this would make a lot more sense if I knew neuroscience 101, but it seems like there's still a lot of complexity, or generality rather, in the steering subsystem. So the steering subsystem has its own visual system that's separate from the visual cortex. Yeah. Different features still need to plug into that vision system.
And so the spider thing needs to plug into it, and also the love thing needs to plug into it, et cetera, et cetera. Yes. So it seems complicated.
No, it's still complicated, and that's all the more reason why a lot of the genomic real estate, in terms of these different cell types and so on, would go into wiring up, pre-wiring, the steering subsystem.
And can we tell how much of the genome is clearly working on this? I guess you could tell how many genes are relevant to producing the RNA, or the epigenetics, that manifest in different cell types in the brain, right?
Yeah, this is what the cell types help you get at. I don't think it's exactly like, oh, this percent of the genome is doing this. But you could say, okay, in all these steering-subsystem subtypes, you know, how many different genes are involved in specifying which is which and how they wire, and how much genomic real estate do those genes take up, versus the ones that specify, you know, visual cortex versus auditory cortex, where you're kind of just reusing the same genes to do the same thing twice.
Whereas the spider reflex hooking up, yes, you're right.
They have to build a vision system
and they have to build some auditory systems
and touch systems and navigation type systems.
So even feeding into the hippocampus and stuff like that,
there's head direction cells.
Even the fly brain, it has innate circuits
that figure out its orientation
and help it navigate in the world
and it uses vision to figure out the optical flow of how it's flying, and, you know, how its flight is related to the wind direction.
It has all these innate stuff that I think in the mammal brain,
we would all put that and lump that into the steering subsystem.
So there's a lot of work.
So all the genes, basically, that go into specifying all the things a fly has to do,
we're going to have stuff like that, too, just all in the steering subsystem.
But do we have some estimate of, like, here's how many nucleotides, here's how many megabases it takes to do this?
I don't know.
I mean, I think you might be able to talk to biologists about this, you know, to some degree, because you can say, well, we just have a ton in common. I mean, we have a lot
in common with yeast from a genes perspective. Yeast is still used as a model for, you know, some
amount of drug development and stuff like that in biology. And so so much of the genome is just going
towards, you have a cell at all. It can recycle waste. It can get energy. It can replicate.
And then there's what we have in common with a mouse. And so we do know at some level that, you know, the difference between us and a chimpanzee or something, and that includes the social instincts and the more advanced differences in cortex and so on, is a tiny number of genes that go into this additional amount of making the eight-layer transformer instead of the six-layer
transformer or tweaking that reward function. This would help explain why the hominid brain
exploded in size so fast, which is presumably, like, tell me this is correct, but under the story,
social learning or some other thing
increased the ability to learn
from the environment, like increased our sample efficiency, right?
Instead of having to go and kill the boar yourself and figure out how to do that, you can just be like,
the elder told me this is how you make a spear,
and then now it increases the incentive to have a bigger cortex,
which can like learn these things.
Yes.
And that can be done with relatively few genes, because it's really replicating what the mouse already has and making more of it. It's maybe not exactly the same, and there may be tweaks, but from that perspective, you don't have to reinvent all this stuff, right?
So then how far back in the history of the evolution of the brain does the cortex go? Is the idea that, like, the cortex has always figured out this omnidirectional inference thing, that's been a solved problem for a long time, and then the big unlock with primates is that we got the reward function which increased the returns to having omnidirectional inference? Or...
It's a good question.
Or is the omnidirectional inference also something that took a while?
I'm not sure that there's agreement about that.
I think there might be specific questions about language, you know, are there tweaks, you know, whether that's through auditory and memory, some combination of auditory and memory regions.
There may also be like macro wiring, right, of like you need to wire auditory regions into memory regions or something like that and into some of these social instincts to get language, for example, to happen.
So there might be, but that might be also a small number of gene changes.
to be able to say,
oh, I just need from my temporal lobe over here
going over to the auditory cortex, something, right?
And there is some evidence for that, you know: Broca's area, Wernicke's area, they're connected with the hippocampus and so on, and the prefrontal cortex.
So there's like some small number of genes
maybe for like enabling humans
to really properly do language.
That could be a big one.
But, yeah, I mean, I think: is it that something changed about the cortex and it became possible to do these things, or was that potential already there, but there wasn't the incentive to expand that capability and then use it, wire it to these social instincts, and use it more? I mean, I would lean somewhat toward the latter.
I mean, I think a mouse
has a lot of similarity
in terms of cortex as a human.
Although there's that Suzana Herculano-Houzel work, where the number of neurons scales better with weight in primate brains than it does in rodent brains, right? So
does that suggest that there actually was some improvement
in the scalability of the cortex?
Maybe, maybe.
I'm not super deep on this. There may
have been
yeah, changes in architecture, changes
in the folding, changes in
neuron properties and stuff that somehow
slightly tweak this, but there's still a scaling.
That's right. Either way, right.
And so I was not saying there isn't something special about humans in the architecture of the learning subsystem at all.
But, yeah, I mean, I think it's pretty widely thought that this has expanded.
But then the question is, okay, well, how does that fit in also with the steering subsystem changes
and the instincts that make use of this and allow you to bootstrap using this effectively?
But, I mean, just to say a few other things, I mean, so even the fly brain has some
amount of, for example... even very far back. I mean, I think you've read this great book, A Brief History of Intelligence, right? I think this is a really good book. Lots of AI researchers think it's a really good book, it seems like. Yeah, you have some amount of learning going back
all the way to anything that has a brain, basically. You have something kind of like primitive
reinforcement learning, at least, going back at least to like vertebrates. Like imagine like a zebrafish
just, like, a...
Other branches: birds maybe kind of reinvented something kind of cortex-like. It doesn't have the six layers, but they have something a little bit cortex-like. So some of those things, after reptiles, in some sense birds and mammals both kind of made a somewhat cortex-like but differently organized thing. But even a fly brain has, like, associative learning centers that actually do things that maybe look a little bit like this thought assessor concept from Byrnes,
where there's like a specific dopamine signal
to train specific subgroups of neurons
in the fly mushroom body
to associate different sensory information
with, am I going to get food now
or am I going to get hurt now?
Yeah.
Brief tangent.
I remember reading in one blog post that Beren Millidge wrote that the parts of the cortex which are associated with audio and vision have scaled disproportionately between other primates and humans, whereas the parts associated, say, with odor, have not. And I remember him saying something like, this is explained by that kind of data having worse scaling-law properties. But I think, and maybe he meant this, another interpretation of what's actually happening there is that these social reward functions that are built into the steering subsystem needed to make more use of being able to see your elders and see what the visual cues are and hear what they're saying. In order to make sense of these cues, which guide learning, you needed to activate the vision and audio more than odor.
I mean, there's all this stuff. I feel like it's come up in your shows before, actually.
But like, even like the design of the human eye where you have like the pupil and the white and everything.
Like we are designed to be able to establish relationships based on joint eye contact.
And maybe this came up in the Sutton episode, I can't remember.
Yeah, we have to bootstrap to the point where we can detect eye contact and where we can communicate by language, right?
And that's like what the first couple years of life are trying to do.
Okay.
I want to ask you about RL.
So currently the way these LLMs are trained, you know, if they solve the unit test or solve a math problem, that whole trajectory, every token in that trajectory, is upweighted.
And what's going on with humans?
Are there different types of model-based versus model-free RL that are happening in different parts of the brain?
Yeah. I mean, this is another one of these things. I mean, again, all my
answers to these questions, any specific thing I say is all just kind of like directionally,
this is we can kind of explore around this. I find this interesting. Maybe I feel like the
literature points in these directions in some very broad way. What I actually want to do is like go
and map the entire mouse brain and like figure this out comprehensively and like make neuroscience
a ground truth science. So I don't know, basically. But, but yeah, I mean, there, so first of all,
I mean, I think with Ilya on the podcast, he was like, it's weird that you don't use value functions, right?
You use, like, the dumbest form of RL, basically.
And of course, these people are incredibly smart,
and they're optimizing for how to do it on GPUs,
and it's really incredible what they're achieving.
But conceptually, it's a really dumb form of RL,
even compared to what was being done, like, 10 years ago.
Like even the Atari game playing stuff, right,
was using like Q learning, which is basically
like it's a kind of temporal difference learning, right?
And the temporal difference learning basically
means you have some kind of a value function
like, what action I choose now doesn't just tell me literally what happens immediately after this. It tells me what the long-run consequence of that is for my expected, you know, total reward, or something like that. And so you have value functions. The fact that we don't have value functions at all in the LLMs is crazy. I mean, I think because he said it, I can say it. I know one one-hundredth of what he does about AI. But it's kind of crazy that this is working.
But, yeah, I mean, in terms of the brain, well, so I think there are some parts of the brain
that are thought to do something that's very much like model-free RL, that sort of parts of
the basal ganglia, sort of striatum and basal ganglia, they have like a certain finite, like,
it is thought that they have a certain, like, finite, relatively small action space.
and the types of actions they could take
first of all might be like
tell the spinal cord
or tell the brain stem and spinal cord
to do this motor action.
Yes, no.
Or it might be more complicated,
cognitive type actions
like tell the thalamus
to allow this part of the cortex
to talk to this other part
or release the memory
of this in the hippocampus
and start a new one or something, right?
But there's some finite set of actions
that kind of come out of the basal ganglia
and it's just a very simple RL. So there are probably other parts of our brain that are just doing very simple, naive-type RL algorithms.
Layer one thing on top of that: some of the major work in neuroscience, like Peter Dayan's work, and a bunch of work that is part of why I think DeepMind did the temporal difference learning stuff in the first place, is they were very interested in neuroscience. And there's a lot of neuroscience evidence that dopamine is giving this reward prediction error signal, rather than just reward yes or no, you know, a gazillion time steps in the future. It's a prediction error. And that's consistent with learning these value functions. So there's that.
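The dopamine result maps onto textbook temporal-difference learning, where the teaching signal is a reward prediction error rather than raw reward. A minimal tabular TD(0) sketch on a made-up chain of states:

```python
import numpy as np

# Minimal temporal-difference (TD(0)) value learning on a toy 5-state chain.
# Reward arrives only at the end; the TD error (the "dopamine-like" signal)
# propagates value backward to earlier states over repeated episodes.
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)                 # learned value function

for _ in range(2000):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        reward = 1.0 if s_next == n_states - 1 else 0.0
        td_error = reward + gamma * V[s_next] - V[s]   # reward *prediction error*
        V[s] += alpha * td_error
        s = s_next

print(V)   # early states acquire discounted value long before any reward is seen
```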
And then there's maybe, like, higher-order stuff. So we have the cortex making this world model. Well, one of the things the cortex world model can contain is a model of when you do and don't get rewards, right? Again, it's predicting what the steering subsystem will do. It could be predicting what the basal ganglia will do. And so you have a model
in your cortex that has more generalization and more concepts and all this stuff that says,
okay, these types of plans, these types of actions, will lead in these types of circumstances
to reward. So I have a model of my reward. Some people also think that you can go the other way.
And so this is part of the inference picture. There's this idea of RL as inference. You could say,
well, conditional on my having a high reward, sample a plan that I would have had to get there.
That's inference of the plan part from the reward part. I'm clamping the reward as high and inferring the plan, sampling from plans that could lead to that.
And so if you have this very general cortical thing, it can just do that. If you have this very general model-based system
and the model, among other things,
includes plans and rewards,
then you just get it for free, basically.
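The "clamp the reward high and infer the plan" move is essentially planning as inference: condition on getting reward and sample plans in proportion to how likely they were to produce it. A toy rejection-sampling version with an invented model of plans:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "world model": each plan (a length-3 action sequence) has some probability
# of leading to reward. The actions and numbers are invented.
actions = ["rest", "forage", "ask_elder"]

def reward_prob(plan):
    return 0.05 + 0.25 * plan.count("ask_elder") + 0.1 * plan.count("forage")

def sample_plan_given_high_reward(n=50_000):
    """Planning as inference: condition on reward = 1 and keep plans in
    proportion to P(plan) * P(reward | plan)."""
    plans = [tuple(rng.choice(actions, size=3)) for _ in range(n)]
    rewarded = [p for p in plans if rng.random() < reward_prob(p)]
    # The surviving samples are draws from P(plan | reward = 1).
    return rewarded[rng.integers(len(rewarded))]

print(sample_plan_given_high_reward())
```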
So like in neural network parlance,
there's a value head associated to the omnidirectional inference
that's happening in the...
Yes, yeah. Or there's a value input.
Yeah.
Oh, okay.
Yeah.
And one of the almost-sensory variables it can predict is what rewards it's going to get.
Yeah.
But speaking of this thing about amortizing things, yeah, obviously value is like amortized
rollouts of looking up reward.
Yeah, something like that.
Yeah.
Yeah.
It's like a statistical average or prediction of it.
Yeah.
Right.
Tangential thought.
You know, Joe Henrik and others have this idea that.
The way human societies have learned to do things is just like, how do you figure out that, you know, this kind of bean which actually just almost always poisons you is edible if you do this 10-step incredibly complicated process.
Any one of which, if you fail at, the bean will be poisonous.
How do you figure out how to hunt the seal in this particular way with this like particular weapon at this particular time of the year, et cetera?
there's no way but just, like, trying shit over generations. And it's actually very much like model-free RL happening at, like, a civilizational level.
No, not exactly. Evolution is the simplest algorithm in some sense, right? And if we believe that all of this can come from evolution, then the outer loop can be, like, extremely not foresighted.
Yeah, right. That's interesting. Just, like, hierarchies of evolution, model-free; culture; evolution, model-free...
So what does that tell you?
Maybe the simple algorithms can just get you anything if you do it enough first.
Right, right, yeah.
Yeah, I don't know.
But yeah, so you have like maybe this evolution model free, basal ganglia model free, cortex, model based, culture, model free potentially.
I mean, you pay attention to your elders or whatever.
Yeah, maybe this like group selection or whatever of these things is like more model free.
Yeah.
But now I think culture, well.
it stores some of the model.
Yeah, right.
So let's say you want to train an agent to help you with something like processing loan applications.
Training an agent to do this requires more than just giving the model access to the right tools,
things like browsers and PDF readers and risk models.
There's this level of tacit knowledge that you can only get by actually working in an industry.
For example, certain loan applications will pass every single automated check despite being super risky.
Every single individual part of the application might look safe,
But experienced underwriters know to compare across documents to find subtle patterns that signal risk.
Labelbox has experts like this in whatever domain you're focused on,
and they will set up highly realistic training environments that include whatever subtle nuances and watchouts you need to look out for.
Beyond just building the environment itself, Labelbox provides all the scaffolding you need to capture training data for your agent.
They give you the tools to grade agent performance and capture the video of each session
and to reset the entire environment to a clean state between every episode.
So whatever domain you're working in, Labelbox can help you train reliable real-world agents.
Learn more at labelbox.com/dwarkesh.
Stepping back, how is it a disadvantage or an advantage for humans that we get to use biological hardware
in comparison to computers as they exist now.
So what I mean by this question is like
if there's the algorithm,
would the algorithm just qualitatively perform much worse
or much better if inscribed in the hardware of today?
And the reason to think it might, like, here's what I mean.
Like, you know, obviously the brain has had to make
a bunch of tradeoffs which are not relevant
to computing hardware.
It has to be much more energetically efficient.
Maybe as a result, it has to run on slower speeds
so that there's going to be a smaller voltage gap.
And so the brain runs at 200 hertz.
and has to run on 20 watts. On the other hand, with robotics we've clearly experienced that fingers are way more nimble than we can make motors so far. And maybe there's something in the brain that is the equivalent of, like, cognitive dexterity, which is maybe due to the fact that we can do unstructured sparsity, we can co-locate the memory and the compute. Where does this all net out? Are you like, fuck, we would be so smart if we didn't have to deal with these brains? Or are you like, oh...
I mean, I think in the end we will get the best of both worlds somehow, right? I think
I think an obvious downside of the brain
is it cannot be copied.
You don't have external read-write access
to every neuron and synapse.
Whereas you do, I can just edit something
in the weight matrix in Python or whatever
and load that up and copy that
in principle, right?
So the fact that it can't be copied
and kind of random accessed is like very annoying.
But otherwise, maybe these are,
it has a lot of advantages.
So it also tells you that you want to somehow do the co-design of the algorithm and the hardware. Maybe it even doesn't change it that much from all of what we discussed, but you want to somehow do this co-design. So, um, yeah, how do you do it with really slow, low-voltage switches? That's going to be really important for the energy consumption, the co-locating of memory and compute. So I think that probably hardware companies will try to co-locate memory and compute, they will try to use lower voltages, allow some stochastic stuff.
There are some people that think that all this probabilistic stuff that we were talking about, oh, it's actually energy-based models and so on, is doing lots of sampling. It's not just amortizing everything. And the neurons are also very natural for that, because they're naturally stochastic. So you don't have to do a random number generator and a bunch of Python code, basically, to generate a sample. The neuron just generates samples, and it can tune what the different probabilities are, and learn those tunings. And so it could be that it's very co-designed with some kind of inference method or something. Yeah.
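A toy contrast, as a hypothetical sketch (not anything said in the conversation): sampling done explicitly in code versus a stochastic unit whose firing is itself the sample, with the firing probability as the thing learning tunes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Explicit sampling: generate random numbers in code, then threshold them.
def sample_explicit(p, n):
    return (rng.random(n) < p).astype(int)

# "Neuron-like" sampling: the unit's output is inherently a draw from
# Bernoulli(sigmoid(w)); learning just nudges w, i.e. tunes the firing probability.
def stochastic_neuron(w, n):
    p = 1.0 / (1.0 + np.exp(-w))
    return rng.binomial(1, p, size=n)

print(sample_explicit(0.3, 10))
print(stochastic_neuron(-0.8, 10))
```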
It'd be hilarious. I mean, the meta-take I'd take from your interview is like, you know, all these people that folks make fun of on Twitter, you know, Yann LeCun and Beff Jezos and whatever, they're like, no... like, yeah, maybe. I don't know.
That is actually one read of me. And granted, you know, I haven't really worked on AI at all since LLMs, you know, took off. So I'm just, like, out of the loop. But I'm surprised.
And I think it's amazing how the scaling is working and everything.
But yeah, I think Yann LeCun and Beff Jezos are kind of onto something about the probabilistic models, or at least possibly.
And in fact, that's what, you know, all the neuroscientists and all the AI people thought, like until 2021 or something, right?
So there's a bunch of cellular stuff happening in the brain that is not just about neuron-to-neuron synaptic connections.
How much of that is functionally doing more work?
than the synapses themselves are doing, versus it's just a bunch of kludge that you have to do in order to make the synaptic thing work.
So the way you need to, you know, with a digital mind,
you can nudge the synapse,
sorry, the parameter extremely easily,
but with a cell to modulate a synapse
according to the gradient signal,
it just takes all of this crazy machinery.
So like, is it actually doing more
than it takes extremely little code to do?
So I don't know,
but I'm not a believer in the, like, radical view that, oh, actually memory is not mostly synapses, or learning is mostly genetic changes or something like that. I think it would just make a lot of sense, I think you put it really well, for it to be more like the second thing you said. Like, let's say you want to do weight normalization across all the weights coming out of your neuron, right, or into your neuron. Well, you probably have to somehow tell the nucleus of the cell about this, and then have that kind of send everything back out to the synapses or something, right? And so there's going to be a lot of cellular changes, right? Or let's say that, you know, you just had a lot of plasticity and, like, you're part of this memory. And now that's gotten consolidated into the cortex or whatever. And now we want to reuse you as, like, a new one that can learn again. It's going to be a ton of cellular changes. So there's going to be tons of stuff happening in the cell. But algorithmically, it's not really adding something beyond these algorithms, right? It's just implementing something that in a digital computer is very easy for us to go and just find the weights and change them.
And it is a cell.
It just literally has to do all this with molecular machines itself without any central controller, right?
It's kind of incredible.
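As a concrete illustration of that asymmetry, a minimal sketch (hypothetical, not from the episode): renormalizing all the outgoing weights of one "neuron" in a weight matrix is a one-liner, whereas a real cell has to route the same adjustment through its own molecular machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 100))   # W[i, j]: weight from neuron j onto neuron i

j = 7                                  # pick one presynaptic neuron
W[:, j] /= np.linalg.norm(W[:, j])     # renormalize all its outgoing synapses in one step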
There are some things that cells do, I think, that seem like more convincing.
So in the cerebellum, so one of the things the cerebellum has to do is, like, predict over time.
Like, predict what is the time delay?
You know, let's say that, you know, I see a flash and then, you know, some number of milliseconds later.
I'm going to get like a puff of air in my eyelid or something, right?
The cerebellum can be very good at predicting what's the timing between the flash and the air puff, so that now your eye will just close automatically. The cerebellum is involved in that type of reflex, learned reflex. And there are some cells in the cerebellum where it seems like the cell body is playing a role in storing that time constant, changing that time constant of delay, versus that all being somehow done with, like, I'm going to make a longer ring of synapses to make that delay longer. It's like, no, the cell body will just, like, store that time delay for you.
So there are some examples. But my default is still essentially the theory that what's happening is changes in connections between neurons.
Yeah. And that's, like, the main algorithmic thing that's going on. Like, I think there's very good reason to still believe that it's that rather than some, like, crazy cellular stuff. Yeah.
Going back to this whole perspective that our intelligence is not just this omnidirectional inference thing that builds a world model, but really this system that teaches us what to pay attention to, what are the important salient factors to learn from, et cetera.
I want to see if there's some intuition we can derive from this about what different kinds of intelligence might be like.
So it seems like AGI or superhuman intelligence should still have this ability to learn a world model that's quite general, but then it might be incentivized to pay attention to different things that are relevant for, you know, the modern post-singularity environment. How different should we expect different intelligences to be, basically?
Yeah, I mean, I think one way of putting this question is like, is it actually possible to make the paperclip maximizer or whatever, right? If you try to make the paperclip maximizer, does that end up just not being smart or something like that, because the only reward function it had was, like, make paperclips? If I channel Steve Byrnes more, I mean,
I think he's very concerned that the sort of minimum viable things in the steering subsystem that you
need to get something smart is way less than the minimum viable set of things you need for it
to have human like social instincts and ethics and stuff like that. So a lot of what you want to know
about the steering subsystem is actually the specifics of how you do alignment essentially or
what human behavior and social instincts is versus just what you need for capabilities.
And we talked about it in a slightly different way because we were sort of saying,
well, in order for humans to learn socially, they need to make eye contact and learn from others.
But we already know from LLMs, right, that depending on your starting point, you can learn language
without that stuff, right?
And so, yeah.
And so I think that it probably is possible to make like super powerful, you know,
model-based RL, you know, optimizing systems and stuff like that, that don't have most of what we have in the human brain reward functions, and as a consequence might want to maximize paperclips, and that's a concern.
Yeah, right. But you're pointing out that in order to make a competent paperclip maximizer, the kind of thing that can build the spaceships and learn the physics and whatever, it needs to have some drives which elicit learning, including, say, curiosity and exploration.
Yeah, curiosity and interest in others, or interest in social interactions, curiosity. Yeah, but that's pretty minimal, I think.
And that's true for humans, but it might be less true for like something that's already
pre-trained as an LLM or something. Right. And so, so most of why we want to know the steering
subsystem, I think, if I'm channeling Steve, is alignment reasons. Yeah. Right. How confident
are we that we even have the right algorithmic conceptual vocabulary
to think about what the brain is doing.
And what I mean by this is, you know,
there was one big contribution to AI from neuroscience, which was the idea of the neuron itself.
Yeah.
You know, 1940s, 1950s, just this original contribution.
But then it seems like a lot of what we've learned afterwards about what high-level algorithm the brain is implementing, from backprop (is there something analogous to backprop happening in the brain?) to, oh, is V1 doing something like CNNs, to TD learning and Bellman equations, actor-critic, whatever, seems inspired by AI. Like, we come up with some
idea, like maybe we can make AI neural networks work this way.
And then we notice that's something in the brain also works that way.
So why not think there's more things like this?
There may be, yeah.
I think the reason that I'm not, I think that we might be onto something is that like
the AIs we're making based on these ideas are working surprisingly well.
There's also a bunch of just empirical stuff. Like, convolutional nets and variants of convolutional neural nets, and I'm not sure what the absolute latest is, but compared to other models in computational neuroscience of what the visual system is doing, they're just more predictive, right? So you can just score CNNs, even ones pre-trained on cat pictures and stuff: what is the representational similarity that they have on some arbitrary other image versus, you know, the brain activations, measured in different ways?
Jim DiCarlo's lab has Brain-Score, and with the AI models there seems to be some relevance there, in the sense that even neuroscientists don't necessarily have something better than that.
So yes, I mean, that's just kind of recapitulating what you're saying, which is that the best computational neuroscience theories we have seem to have been invented largely as a result of AI models: find things that work, so find that backprop works, and then say, can we approximate backprop with cortical circuits or something? And there have kind of been things like that.
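For flavor, here is a rough, hypothetical sketch of the kind of representational-similarity comparison behind efforts like Brain-Score (not their actual pipeline): build pairwise-dissimilarity matrices for model features and for recorded brain responses to the same images, then correlate them.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    # responses: (n_images, n_units) -> condensed matrix of pairwise dissimilarities
    return pdist(responses, metric="correlation")

model_feats = np.random.randn(50, 512)   # stand-in for CNN activations on 50 images
brain_resps = np.random.randn(50, 200)   # stand-in for recorded neural data on the same images

score, _ = spearmanr(rdm(model_feats), rdm(brain_resps))
print(f"representational similarity (Spearman): {score:.3f}")
```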
Now, some people totally disagree with this, right? So György Buzsáki is a neuroscientist who has a book called The Brain from Inside Out, where he basically says all our psychology concepts, AI concepts, all that stuff is just made-up stuff. What we actually have to do is figure out what is the actual set of primitives that the brain actually uses, and our vocabulary is not going to be adequate to that. We have to start with the brain and make new vocabulary, rather than saying backprop and then trying to apply that to the brain or something like that. And, you know, he studies a lot of oscillations and stuff in the brain as opposed to
individual neurons and what they do. And, you know, I don't know, I think there's a case to be made for that. And from a kind of research program design perspective, I think one thing we should be trying to do is just simulate a tiny worm or a tiny zebrafish, almost as biophysically, as bottom-up as possible: get the connectome, molecules, activity, and just study it as a physical dynamical system and look at what it does.
But I don't know.
I mean, just when I like, it just feels like the AI is really good fodder for computational
neuroscience.
Like, those might actually be pretty good models.
We should look at that.
So I'm not a person who thinks that. I think both that there should be a part of the research portfolio that is totally bottom-up and not trying to apply the vocabulary that we learn from AI onto these systems, and that there should be another big part of this that's kind of trying to reverse-engineer it using that vocabulary, or variants of that vocabulary, and that we should just be pursuing both. And my guess is that the reverse-engineering one is actually going to kind of work-ish or something. Like, we do see things like TD learning, which, you know, Sutton also invented separately.
Right. That must be a crazy feeling, to just be like, this equation I wrote down is in the brain.
It seems like the dopamine is doing some of that.
Yeah.
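For readers who want the equation behind that exchange, a minimal temporal-difference sketch (mine, not Adam's words): the TD error, delta = r + gamma * V(s') - V(s), is the quantity dopamine neurons appear to report under the reward-prediction-error interpretation.

```python
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)                      # value estimates per state

def td_update(s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]    # prediction error ("dopamine-like" signal)
    V[s] += alpha * delta
    return delta

# One step of experience: move from state 0 to state 1 and receive reward 1.
print(td_update(0, 1.0, 1))
```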
So let me ask you about this. You know, you guys are funding different groups that are trying to figure out what's up in the brain. If we had a perfect representation, however you define that, of the brain, why think it would actually let us figure out the answer to these questions?
We have neural networks, which are way more interpretable, not just because we understand what's in the weight matrices, but because there are weight matrices.
there are these boxes with numbers in them.
Right.
And even then we can tell very basic things.
We can kind of see circuits for very basic pattern matching of following one token with another.
I feel like we don't really have an explanation of why LLMs are intelligent just because they're interpretable.
I would somewhat dispute it.
I think we have some architectural, we have some description of what the LLM is fundamentally doing. And what that is, is that I have an architecture and I have a learning rule and I have hyperparameters and I have initialization and I have training data.
But those are things we learned because we built them, not because we interpreted them from seeing the weights.
We built them.
Whereas the analogous thing for the connectome is like seeing the weights...
What I think we should do is we should describe the brain more in that language of things like
architectures learning rules, initializations, rather than trying to find the golden gate
bridge circuit and saying exactly how does this neuron actually, you know, that's going to be
some incredibly complicated learned pattern.
Yeah, Konrad Kording and Tim Lillicrap have this paper from a while ago, maybe five years ago, called "What does it mean to understand a neural network?" or "What would it mean to understand a neural network?"
And what they say is, yeah, basically that.
Like, you can imagine you train a neural network to, like, compute the digits of pi or something. Well, like, some crazy, you know, it's like this crazy pattern.
And then you also train that thing to, like, predict the most complicated thing you find,
predict stock prices, basically predict the really complex systems, right?
Computationally complete systems.
I could predict, I could train a neural network to do cellular automata or whatever crazy thing.
And it's like we're never going to be able to fully capture that with interpretability, I think.
It's just going to just be doing really complicated computations internally.
But we can still say that the way it got that way is that it had an architecture and we gave it this training data and it had this loss function.
And so I want to describe the brain in the same way.
And I think that this framework that I've been kind of laying out is like, we need to understand the cortex and how it embodies a learning algorithm. I don't need to understand how it computes the Golden Gate Bridge.
But if you can see all the neurons, if you have the connectome, why does that teach you what the learning algorithm is?
Well, I guess there are a couple different views of it. So it depends on the different parts of this portfolio. On the totally bottom-up, we-have-to-simulate-everything
portfolio. It kind of just doesn't. You have to just like see what are the, you have to make a
simulation of the zebrafish brain or something. And then you like see what are the like
emergent dynamics in this. And you come up with new names and new concepts and all that.
That's like that's like the most extreme bottom up neuroscience view. But even there, the connectome
is really important for doing that biophysical or bottom-up simulation.
But on the other hand, you can say, well, what if we can actually apply some ideas from AI?
We basically need to figure out: is it an energy-based model, or is it, you know, an amortized, VAE-type model? Is it doing backprop, or is it doing something else? Are the learning rules local or global? I mean, if we have some repertoire of possible ideas about this, can we just think of the connectome as a huge number of additional constraints that will help us refine things, to ultimately have a consistent picture of that?
I think about this for the steering subsystem stuff, too,
just very basic things about it.
How many different types of dopamine signal
or of steering subsystem signal
or thought assessor or so on,
how many different types of what broad categories are there?
Like, even this very basic information
that there's more cell types in the hypothalamus
than there are in the cortex.
Like, that's new information, right?
About how much structure is built there
versus somewhere else?
Yeah, how many different dopamine neurons are there?
Is the wiring between prefrontal
and auditory the same as the wiring
between prefrontal and visual, you know, it's like the most basic things we don't know.
And the problem is learning even the most basic things by a series of bespoke experiments
takes an incredibly long time, whereas just learning all that at once by getting a connectome
is just like way more efficient.
What is the timeline on this?
Because presumably the idea of this is to, well, first, inform the development of AI.
You want to be able to figure out how we do the, how we get AI stories to,
want to care about what other people think of its internal thought pattern. But
Interp researchers are making progress on this question just by inspecting, you know, normal neural networks. There must be some future there. You can do interp on LLMs that exist. You can't do interp on a hypothetical model-based reinforcement learning algorithm like the brain that we will eventually converge to when we do AGI.
Fair, sure. But, you know, what timelines on AI do you need for this research to be practical and relevant?
I think it's fair to say it's not super practical and relevant
if you're in like an AI 2027 scenario.
Yeah.
You know, and so what science I'm doing now is not going to affect the science of, like, 10 years from now, because what's going to affect the science of 10 years from now is the outcome of this AI 2027 scenario, right? It kind of doesn't matter that much, probably, if I have the connectome.
Maybe it slightly tweaks certain things.
But I think there's a lot of reason to think
maybe that we will get a lot out of this paradigm, but then the real thing, the thing that is
like the single event that is like transformative for the entire future or something type
event is still like, you know, more than five years away or something.
Sorry, is that because we haven't captured omnidirectional inference? We haven't figured out the right ways to get a mind to pay attention to things in a way that makes it...
I would take the entirety of your, like, collective podcast with everyone
as, like, showing, like, the distribution of these things, right?
I don't know, right?
I mean, what was Karpathy's timeline, right?
You know, what's Demis's timeline, right?
So, then, not everybody has a three-year timeline.
And so I think if-
But there's different reasons, and I'm curious what are yours?
What are mine?
I don't know.
I'm just watching your podcast.
I'm probably trying to understand the distribution.
I don't have a super strong claim that LLMs can't do it.
But is it across the data efficiency?
Or is it the...
I think part of it is just
it is weirdly different
than all this brain stuff.
Yeah, yeah, yeah.
And so intuitively,
it's just weirdly different
than all this brain stuff.
And I'm kind of waiting
for like the thing
that starts to look more like brain.
Like, I think if Alpha Zero
and model-based RL
and all these other things
that were being worked on 10 years ago
had been giving us
the GPT-5 type capabilities,
then I would be like,
oh, wow,
we're both in the right paradigm
and seeing the results.
Right.
A priori,
so my prior and my data
are agreeing.
Yeah, yeah.
Right.
And now it's like, I don't know what exactly. My data looks pretty good, but my prior is sort of weird.
So, yeah, so I don't have a super strong opinion on it.
So I think there is a possibility that essentially all other scientific research that is being done is like not, it's somehow obviated, but I don't put a huge amount of probability on that.
I think my timelines might be more in the like, yeah, 10-year-ish range.
And if that's the case, I mean, I think there, yeah, there is probably a different world in that scenario, where we have connectomes on hard drives and we have
understanding of steering subsystem architecture,
we've compared the, you know,
even the most basic properties of what are the reward functions,
cost function architecture, et cetera,
of mouse versus a shrew versus a small primate, et cetera.
This is practical in 10 years?
I think it has to be a really big push.
Like, how much funding?
How does it compare it to where we are now?
It's like billion, low billions dollar scale funding
in a very concerted way, I would say.
And how much is on it now?
Well, so if I just talk about some of the specific things we have going: so with connectomics, E11 Bio is kind of the main thing on connectomics. They are basically trying to make the technology of connectomic brain mapping several orders of magnitude cheaper. The Wellcome Trust put out a report a year or two ago that basically said that to get one mouse brain, the first mouse brain connectome, would be a several-billion-dollar project. E11's technology, and sort of the suite of efforts in the field, are trying to get a single mouse connectome down to like low tens of millions of dollars. Okay, so that's a mammal brain. Now, a human brain is about a thousand times bigger. So if a mouse brain you can get to 10 million or 20 million or 30 million with technology, you know, if you just naively scale that, okay, a human brain is still billions of dollars to do just one human brain. Can you go beyond that? So can you get a human brain for less than a billion? But I'm not sure you need every neuron in a human brain. I think we want to, for example, do an entire mouse brain, and a human steering subsystem, and the entire brains of several different mammals with different social instincts. And I think that, with a bunch of technology push and a bunch of concerted effort, really significant progress, if it's a focused effort, can be done in the kind of hundreds of millions to low billions.
What is the definition of a connectome?
Presumably it's not a bottom-up biophysics model. So is it just that it can estimate the input-output of a brain? Like, what is the level of abstraction?
So you can give different definitions. And one of the things that's cool about this: the kind of standard approach to connectomics uses the electron microscope and very, very thin slices of brain tissue. It's basically relying on the cell membranes scattering electrons a lot and everything else scattering electrons less. But you don't see a lot of details of the molecules: which types of synapses, different synapses with different molecular combinations and properties. E11 and some other research in the field has switched to an optical microscope paradigm. With optical, the photons don't damage the tissue as much, so you can kind of wash it and look at fragile molecules gently. So with the E11 approach, you can get a quote-unquote molecularly annotated connectome. So that's not just who is connected to whom by some kind of synapse, but what are the molecules that are present at the synapse, what type of cell is that.
So a molecular annotated connectome,
that's not exactly the same as having synaptic weights.
That's not exactly the same as being able to simulate the neurons
and say what's the functional consequence
of having these molecules and connections.
But you can also do some amount of activity mapping
and try to correlate structure to function.
Yeah, so...
Interesting.
Train an ML model to basically predict the activity from the connectome.
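As a toy illustration of that last idea, and entirely hypothetical (made-up shapes and a deliberately simple linear model): treat the connectome as a sparsity mask on a dynamics model and fit the remaining weights to recorded activity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 500
connectome = rng.random((n, n)) < 0.1          # binary wiring diagram: who synapses onto whom
true_W = connectome * rng.standard_normal((n, n)) * 0.1

# "Recorded" activity under simple linear dynamics x_{t+1} = W x_t + noise.
X = np.zeros((T, n))
X[0] = rng.standard_normal(n)
for t in range(T - 1):
    X[t + 1] = true_W @ X[t] + 0.01 * rng.standard_normal(n)

# Fit a weight matrix from activity alone, then constrain it to the observed wiring.
W_hat, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)   # X[:-1] @ W_hat ≈ X[1:]
W_hat = W_hat.T * connectome                              # zero out connections the connectome rules out
```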
What are the lessons
to be taken away from the human genome project,
because one way you could look at it
is that it was actually a mistake,
and you shouldn't have spent whatever billions of dollars
getting one genome mapped.
Rather, you should have just invested in technologies,
which have, and now allows to map genomes for hundreds of dollars.
Yeah, well, yeah.
So George Church was my PhD advisor.
And basically, yeah, I mean, what he's pointed out
is that, yeah, it was $3 billion or something,
roughly $1 per base pair for the first genome.
And then the National Human Genome Research Institute basically structured the funding process right, and they got a bunch of companies competing to lower the cost, and then the cost dropped like a million-fold in 10 years, because they changed the paradigm from kind of macroscopic chemical techniques to these individual DNA molecules: you make a little cluster of DNA molecules on the microscope, and each pixel of the camera would basically give you a different read in parallel, looking at different fragments of DNA. So you parallelize the thing by, like, millions-fold, and that's what reduced the cost by millions-fold. And, yeah, so, I mean,
essentially with switching from electron microscopy to optical connectomics, and potentially even future types of connectomics technology, we think there should be a similar pattern. That's why E11, as a focused research organization, started with technology development rather than starting with saying, we're going to do a human brain or something, let's just brute-force it. We said, let's get the cost down with new technology. But it's still a big thing. Even with new, next-generation technology, you still need to spend hundreds of millions on data collection.
Is this going to be funded by philanthropy, by governments, by investors?
This is very TBD and very much evolving in some
sense as we speak. I'm hearing some rumors going around of connectomics-related companies potentially forming. So far, E11 has been philanthropy. The National Science Foundation just put out this call for tech labs, which is somewhat FRO-inspired
or related. I think you could have a tech lab for actually going and mapping the mouse brain
with us, and that would be sort of philanthropy plus government still in a nonprofit kind of open
source framework. But can companies accelerate that? Can you credibly link connectomics to
AI in the context of a company and get investment for that? It's possible.
I mean, the cost of training these AIs is increasing so much. If you could tell some story of, not only are we going to figure out some safety thing, but in fact, once we do that, we'll also be able to tell you how AI works, I mean, all these questions... You should just go to these AI labs and be like, give me one one-hundredth of your projected budget in 2030.
I sort of tried
a little bit like seven or eight years ago and there was not a lot of interest and maybe now
there would be. But yeah, I mean, I think all the things that we've been
talking about, like, I think it's really fun to talk about,
but it's ultimately speculation.
What is the actual reason for the energy efficiency
of the brain, for example, right?
Is it doing real inference or amortized inference
or something else?
This is all going to be answerable by neuroscience.
It's going to be hard, but it's actually answerable.
And so if you can only do that for low billions of dollars
or something to really comprehensively solve that,
it seems to me, in the grand scheme of trillions of dollars
of GPUs and stuff, it actually makes sense to do
that investment.
And I think investors also just, there's been many labs that have been launched in the last
year where they're raising on the valuation of billions.
Yes.
Where the things are quite credible, but are not like, our ARR next quarter is going to be whatever. It's like, we're going to discover materials and dot, dot, dot, right?
Yes, yes.
Moonshot startups, or billionaire-backed startups, I see as kind of on a continuum with FROs. FROs are a way of channeling philanthropic support and ensuring that it's open source, public benefit, various other things that may be properties of a given FRO. But yes, billionaire-backed startups, if they can target the right science, the exact right science... I think there's a lot of ways to do moonshot neuroscience companies that would never get you the connectome. It would be like, oh, we're going to upload the brain or something, but never actually get the mouse connectome or something, these fundamental things that you need to get ground truth in the science. There are lots of ways to have a moonshot company kind of go wrong and not do the actual science.
But there are also maybe ways to have companies or big corporate labs get involved and actually do it correctly, yeah.
This brings to mind an idea that you had in a lecture you gave five years ago. Yeah, do you want to explain the behavior cloning idea?
Right. Yeah. I mean, actually, this is funny, because I think that the first time I saw this idea, it actually might have been in a blog post by Gwern. There's always a Gwern blog post. And there are now academic research efforts
and some amount of emerging company-type efforts to try to do this. So yeah. Normally, let's say I'm training an image classifier or something like that. I show it pictures of cats and dogs or whatever, and they have the label cat or dog, and I have a neural network that's supposed to predict the label cat or dog. That is a limited amount of information: the label you're putting in is just cat or dog. What if I also had it predict, what is my neural activity pattern when I see a cat, or when I see a dog, and all the other things? If you add that as an auxiliary loss function, or an auxiliary prediction task, does that sculpt the network to know the information that humans know about cats and dogs, and to represent it in a way that's consistent with how the brain represents it, the kind of representational dimensions or geometry of how the brain represents things, as opposed to just having these labels? Does that let it generalize better? Does that let it have just richer labeling? And of course, that sounds really challenging. It's very easy to generate lots and lots of labeled cat pictures; you know, Scale AI or whatever can do this. It is harder to generate lots and lots of brain activity
patterns that correspond to things that you want to train the AI to do. But again, this is just
a technological limitation of neuroscience. If every iPhone was also a brain scanner, you know, you would
not have this problem. Maybe we would be training AI with the brain signals. And it's just the
order in which technology is developed is that we got GPUs before we got portable brain scanners
or whatever, right? And that kind of thing.
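Here is a hypothetical sketch of what that auxiliary-loss setup could look like in code (not a proposal from the episode; all names and shapes are made up for illustration): a classifier with a second head that also has to predict a recorded brain-activity vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BrainRegularizedClassifier(nn.Module):
    def __init__(self, n_features=512, n_classes=2, n_brain_channels=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU())
        self.class_head = nn.Linear(256, n_classes)         # predicts the cat/dog label
        self.brain_head = nn.Linear(256, n_brain_channels)  # predicts the recorded brain activity

    def forward(self, x):
        h = self.backbone(x)
        return self.class_head(h), self.brain_head(h)

def loss_fn(logits, brain_pred, labels, brain_target, aux_weight=0.1):
    # Main task loss plus an auxiliary "match the brain's representation" term.
    return F.cross_entropy(logits, labels) + aux_weight * F.mse_loss(brain_pred, brain_target)
```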
What is the ML analog of what you'd be doing here? Because when you distill models, you're still looking at the final layer, like the log probs across all tokens.
If you do distillation of one model into another,
that is a certain thing.
You are just trying to copy one model into another.
Yeah.
I think that we don't really have a perfect proposal
to like distill the brain.
I think to distill the brain,
you need like a much more complex brain interface.
Like maybe you could also do that.
You could make surrogate models.
Andreas Tolias and people like that are doing some amount of neural network surrogate models of brain activity data: instead of having your visual cortex do the computation, just have the surrogate model. You're basically distilling your visual cortex into a neural network to some degree. That's a kind of distillation. This is doing something a little different. This is basically just saying, I'm adding an auxiliary... I think of it as regularization, or I think of it as adding an auxiliary loss function that's sort of smoothing out the prediction task to also always be consistent with how the brain represents it. Like, what exactly it might...
You're predicting like adversarial examples, for example, right?
But you're predicting the internal state of the brain.
Yes.
So in addition to predicting the label, the vector of labels, like, yes cat, not dog, not boat, you know, a one-hot vector of, yes, it's cat instead of these gazillion other categories, let's say, in this simple example, you're also predicting a vector which is all these brain signal measurements.
Right.
Yeah.
Interesting.
And so Gwern, anyway, had this long-ago blog post of, like, oh, this is an intermediate thing: we talk about whole brain emulation, we talk about AGI, we talk about brain-computer interfaces, we should also be talking about this brain-data-augmented thing that's trained on all your behavior but is also trained on predicting some of your neural patterns.
Right. And you're saying the learning subsystem is already doing this for the steering subsystem?
Yeah, our learning subsystem also has predicting the steering subsystem as an auxiliary task. Yeah. And that helps the steering subsystem; now the steering subsystem can access that predictor and build a cool reward function
using it, yes.
Okay, separately, you're on the board of Lean, which is this formal math language that mathematicians use to prove theorems and so forth. And obviously, there's a bunch of conversation right now about AI automating math. What's your take?
Yeah, well, I think that there are parts of math that it seems like it's pretty well on track
to automate.
And that has to do with, like, so first of all, Lean. So Lean had been developed for a number of years at Microsoft and other places. It has become one of the Convergent Research focused research organizations, to kind of drive more engineering and focus onto it.
So Lean is like this language, programming language,
where instead of expressing your math proof on pen and paper,
you express it in this programming language, Lean.
and then at the end, if you do that that way, it is a verifiable language so that you can basically click verify, and Lean will tell you whether the conclusions of your proof actually follow perfectly from your assumptions of your proof.
So it checks whether the proof is correct automatically.
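For anyone who hasn't seen it, a tiny, illustrative snippet of what that looks like (a toy example, not anything discussed in the episode): a statement plus a proof that Lean's kernel checks mechanically, and rejects if the conclusion doesn't follow.

```lean
-- A toy illustration: a statement plus a proof that Lean's kernel checks mechanically.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Even a concrete arithmetic fact is checked, not trusted.
example : 2 + 2 = 4 := rfl
```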
Just like by itself, this is useful for mathematicians collaborating and stuff like that.
Like if I'm some amateur mathematician, I want to add to a proof, you know, Terry Tao is not going to like believe my result.
But if Lean says it's correct, it's just correct.
So it makes it easy for like collaboration to happen.
But it also makes it easy for correctness of proofs to be an RL signal, very much, yeah, RLVR. If math proving is now formalized math proving, formal meaning expressed in something like Lean and mechanically verifiable, that becomes a perfect RLVR task.
Yeah. And I think that is going to just keep working. There's the couple-billion-dollar, at least one-billion-dollar valuation company Harmonic based on this. AlphaProof is based on this. A couple of other emerging, really interesting companies. I think that this problem of RLVRing the crap out of math proving is basically going to work, and we will be able to have things that search for proofs
and find them
in the same way that we have AlphaGo
or what have you that can search for
ways of playing the game of Go
and with that verifiable signal works.
So does this solve math?
There is still the part that has to do
with conjecturing new interesting ideas.
There is still the kind of conceptual organization
of math of what is interesting,
how do you come up with new theorem statements
in the first place,
or even like the very high-level breakdown
of what strategies you use to do proofs.
I mean, I think this will shift the burden of that so that humans don't have to do a lot of
the mechanical parts of math, validating lemmas and proofs and checking if the statement in this paper is exactly the same as in that paper and stuff like that.
It will just, that will just work.
You know, if you really think we're going to get all these things we've been talking about
real AGI, it would also be able to make conjectures.
And, you know, Bengio has a paper, a more theoretical paper, and there are probably a bunch of other papers emerging about this: is there a loss function for good explanations or good conjectures? That's a pretty profound question, right? A really interesting math proof or statement might be one that compresses lots of information, you know, has lots of implications for lots of other theorems. Otherwise, you would have to prove those theorems using long, complex paths of inference. Here, if you have this theorem, and this theorem is correct, you have short paths of inference to all the other ones. And it's a short, compact statement. So it's like a powerful
explanation that explains all the rest of math, and part of what math is doing is making these compact things that explain the other things. So, the Kolmogorov complexity of this statement or something, yeah, of generating all the other statements given that you know this one, or stuff like that. Or, if you add this, how does it affect the complexity of the rest of the kind of network of proofs? So can you make a loss function that says, oh, I want this proof to be a really highly powerful proof? I think some people are trying to work on that. So maybe you can automate the creativity part.
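One hedged way to write down that intuition (a sketch in the spirit of description-length ideas, not a formula from the conversation): score a candidate theorem t by how much it shortens the proofs of everything else in a corpus S, minus its own statement length, where pi(s) is the shortest known proof of s, pi(s | t) is the shortest proof allowed to cite t, and l measures length.

```latex
\[
  \mathrm{Value}(t) \;=\; \sum_{s \in \mathcal{S}} \Big( \ell\big(\pi(s)\big) - \ell\big(\pi(s \mid t)\big) \Big) \;-\; \ell(t)
\]
```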
If you had true AGI, it would do everything a human can do.
So it would also do the things that the creative mathematicians do.
But barring that, I think just RLVRing the crap out of proofs...
Well, I think that's going to be just a really useful tool for mathematicians.
It's going to accelerate math a lot and change it a lot, but not necessarily immediately change everything about it.
Will we get a mechanical proof of the Riemann hypothesis or things like that?
Maybe.
I don't know.
I don't know enough details of how hard these things are to search for.
And I'm not sure anyone can fully predict that, just as we couldn't exactly predict when Go would be solved or something like that.
And I think it's going to have lots of really cool applied applications.
So one of the things you want to do is you want to have provably stable, secure, unhackable, et cetera, software.
So you can write math proofs about software and say, this code not only passes these unit tests, but I can mathematically prove that there's no way to hack it in these ways, or no way to mess with the memory, or these types of things that hackers use, or it has these properties. You can use the same Lean and the same proving to do formally verified software. I think that's going to be a really powerful piece of cybersecurity that's relevant for all sorts of other AI-hacking-the-world stuff. And yeah, if you can prove the Riemann hypothesis, you're also going to be able to prove insanely complex things about very complex software, and then you'll be able to ask the LLM, synthesize me software that I can prove is correct.
Right. Why hasn't provable programming taken off as a result of LLMs? You would think that this would...
I think it's starting to, yeah. I think that one challenge, and we are actually incubating a potential
focus research organization on this
is the specification problem.
So mathematicians are kind of know
what interesting theorems they want to formalize.
If I have like some code,
let's say I have some code that like is involved
in running the power grid or something
and it has some security properties,
well, what is the formal spec of those properties?
The power grid engineers just made this thing,
but they don't necessarily know how to lift
the formal spec from that.
And it's not necessarily easy to come up with the spec, the spec that you want for your code. People aren't used to coming up with formal specs, and there aren't a lot of tools for it.
So you also have this kind of user interface plus AI problem
of like, what security specs should I be specifying?
Is this the spec that I wanted?
So there's a spec problem.
And it's just been really complex and hard,
but it's only just in the last very short time
that the LLMs are able to generate,
you know, verifiable proofs of, you know,
things that are useful to mathematicians.
starting to be able to do some amount of that
for software verification, hardware verification.
But I think if you project the trends over the next couple of years, it's possible that it just flips the tide for formal methods, this whole field of formal methods or formal verification, provable software, which is kind of this weird, almost backwater of the more theoretical part of programming languages and stuff.
Very academically flavored often.
Although there was like this DARPA program
that made a provably secure quadcopter, helicopter, and stuff like that.
Secure against, like, what is the property that is exactly proved? Not for that particular project, but just in general. Because obviously these things malfunction for all kinds of reasons.
You could say that what's going on in this part of the memory over here, which is supposed to be the part the user can access, can't in any way affect what's going on in the memory over there, or something like that. Or...
Or, yeah, things like that.
Yeah.
Got it.
Yeah.
So there's two questions.
One is how useful is this?
Yeah.
And two is like how satisfying as a mathematician would it be?
And the fact that there's this application towards proving that software has certain properties or hardware certain properties,
obviously, like if that works, that would obviously be very useful.
But from a pure, are-we-going-to-figure-out-more-mathematics standpoint? Yeah, is your sense that there's something about finding that one construction maps across to another construction in a different domain, or finding that, oh, this lemma, if you reconfigure it, if you redefine this term, it still kind of satisfies what I meant by this term, but a counterexample that previously knocked it down no longer applies. That kind of dialectical thing that happens in mathematics, will the software replace that? And how much of the value of this sort of pure mathematics just comes from coming up with entirely new ways of thinking about a problem, like mapping into a totally different representation? And, yeah, do we have examples of...
I don't know. I think of it maybe a little bit like when everybody had to write assembly code or something like that. The amount of, like, cool startups that got created was a lot less or something, right? And so fewer people could do it, progress was more grinding and slow and lonely and so on. You had more false failures, because you didn't get something about the assembly code right, rather than the essential thing, which was your concept, right? Harder to collaborate and stuff like that. And so I think it will, like, be really good. There is some worry that by not learning to do the mechanical parts
of the proof that you fail to generate the intuitions that inform the more conceptual part,
creative part, right?
Yeah, same with assembly.
Right.
And so at what point does that apply to vibe coding? Are people not learning computer science, right? Or actually, are they vibe coding and also simultaneously looking at the LLM explaining them these abstract computer science concepts, and it's all just happening faster? Their feedback loop is faster and they're learning way more abstract computer science and algorithms stuff because they're vibe coding.
You know, I don't know.
It's not obvious.
that might be something the user interface and the human infrastructure around it.
But I guess there's some worry that people don't learn the mechanics
and therefore don't build like the grounded intuitions or something.
But my hunch is it's like super positive.
Exactly how useful that will be on net, or how many overall math breakthroughs, even math breakthroughs that we care about, will happen,
I don't know.
I mean, one other thing that I think is cool is actually the accessibility question.
It's like, okay, that sounds a little bit corny.
Okay, yeah, and more people can do math.
but who cares? But I think there's actually lots of people that could have interesting ideas, like maybe the quantum theory of gravity or something. Like, yeah, one of us will come up with the quantum theory of gravity instead of a card-carrying physicist, in the same way that Steve Byrnes is reading the neuroscience literature, and he hasn't been in a neuroscience lab that much, but he's able to synthesize across the neuroscience literature and be like, oh, learning subsystem, steering subsystem, does this all make sense? It's kind of like he's an outsider neuroscientist in some ways.
Can you have outsider, you know, string theorists or something because the math is just done
for them by the computer?
And does that lead to more innovation in the string theory?
Right?
Maybe yes.
Interesting.
So.
Okay.
So if this approach works and you're right that LLMs are not the final paradigm, and suppose it takes at least 10 years to get to the final paradigm.
Yeah.
In that world, there's this fun sci-fi premise. There was a tweet where someone said these models are like automated cleverness, but not automated intelligence. And you can quibble with the definitions there. But, yeah, if you have automated cleverness and you have some way of filtering, which you could do if you can formalize and prove the things that the LLMs are saying,
then you could have this situation where quantity has a quality all of its own.
Yes.
And so, what are the domains of the world which could be put in this provable symbolic representation? Yeah. Furthermore, okay, in the world where AGI is just super far away, maybe it makes sense to literally turn everything the LLMs ever do, or almost everything they do, into super provable statements, and so LLMs can actually build on top of each other, because everything they do is super provable. Yeah, maybe. Maybe this is just necessary, because you have billions of intelligences running around. Even if they are superintelligent, the only way the future AGI civilization can collaborate with each other is if they can prove each step,
and they're just like
brute force churning out
this is what the Jupiter brains are doing
it's a universal it's a universal language
it's provable and it's also provable from like
are you trying to exploit me or are you sending me
some message that's actually trying to like
sort of hack into my brain effectively
are you trying to socially influence me
are you actually just like sending me
just the information that I need
and no more right for this
And yeah, so Davidad, who's this program director at ARIA now in the UK, I mean, he has this whole design of a kind of ARPA-style program of sort of safeguarded AI that very heavily leverages provable safety properties. And can you apply proofs to, can you have a world model, but that world model is actually not specified just in neuron activations, but it's specified in, you know, equations? Those might be very complex equations, but if you can just get insanely good at auto-proving these things with cleverness, auto-cleverness, can you have explicitly interpretable world models,
you know, as opposed to neural net world models
and like move back basically the symbolic method
just because you can just have insane amount of ability
to prove things.
Yeah, I mean, that's an interesting vision.
I don't know how, you know, in the next 10 years,
like whether that will be the vision that plays out,
but I think it's really interesting to think about, yeah,
and even for math, I mean, I think Terry Tao
is like doing some amount of stuff
where it's like it's not about whether you can prove
the individual theorems.
It's like, let's prove all the theorems en masse. And then, like, study the properties of the aggregate set of proved theorems, right?
Which are the ones that got proved and which are the ones that didn't?
Okay, well, that's like the landscape of all the theorems instead of one theorem at a time, right?
Speaking of symbolic representations, one question I was meaning to ask you is, how does the brain represent the world model?
Like, obviously, that's all in neurons, but I don't mean it sort of extremely functionally.
I mean sort of conceptually, is it in something that's analogous to the hidden state of a neural network, or is it something that's closer?
to a symbolic language?
We don't know.
I mean, I think there's some amount of study of this.
I mean, there's these things like, you know,
face patch neurons that represent certain parts of the face
that geometrically combine in interesting ways.
That's sort of with geometry and vision.
Is that true for, like, other more abstract things?
There's, like, this idea of cognitive maps.
Like, a lot of the stuff that a rodent hippocampus has to learn
is, like, place cells, and, like, where is the rodent going to go next?
And is it going to get a reward there?
is very geometric. And do we organize concepts with an abstract version of a spatial map? There are some questions of, can we do true symbolic operations? Like, can I have a register in my brain that copies a variable to another register regardless of what the content of that variable is? That's this variable binding problem. And basically, I just don't know if we have that machinery, or if it's more like cost functions and architectures that make some of that approximately emerge, but that maybe would
also emerge in a neural net. There's a bunch of interesting neuroscience research trying to study
this, what the representations look like.
But what's your hunch?
Yeah. My hunch is it's going to be a huge mess, and we should look at the architectures, the loss functions, and the learning rules, and we shouldn't really... I don't expect it to be pretty in there.
Yeah. Which is, it's not a symbolic language, I think.
Yeah, probably. Probably it's not that symbolic. Yeah, but other people think very differently, you know, yeah.
Other random questions, speaking of binding:
Yeah, what is up with feeling like there's an experience? That it's like both, all the parts of your brain which are modeling very different things and have different drives, at least presumably, feel like there's an experience happening right now, and also that across time you feel like... What is that?
Yeah, I'm pretty much at a loss on this one. I don't know. I mean, Max Hodak has been giving talks
about this recently. He's another really
hardcore neuroscience
person,
neurotechnology person.
And the thing I mentioned with Dorso
is maybe also, it sounds like it
might have some touching on this question. But
yeah, I think this
I don't think anyone has any idea.
It might even involve new physics.
It's like, you know,
yeah.
Another question which might not have an answer yet.
What?
So continual learning.
Is that the product of something extremely fundamental at the level of even the learning algorithm, where you could say, look, at least the way we do backprop in neural networks is that there's a training period and then you freeze the weights, and so you just need active inference or some other learning rule in order to do continual learning? Or do you think it's more a matter of architecture, and how exactly memory is stored, and what kind of associative memory you have, basically?
So, continual learning. I don't know. I think there's probably some stuff at the architectural level; there's probably some interesting stuff that the hippocampus is doing, and people have long thought this: what kinds of sequences is it storing, how is it organizing and representing that, how is it replaying it back, what is it replaying back, how exactly does that memory consolidation work in sort of training the cortex using replays or memories from the hippocampus, or something like that. There's probably some of that stuff. There might be multiple timescales of plasticity, or sort of clever learning rules that can, I don't know, sort of simultaneously be storing short-term information and also doing backprop with it. And neurons might be doing a couple of things, you know, some fast-weight plasticity and some slower plasticity at the same time, or synapses that have many states. I mean, I don't know. I mean, I think
that from a neuroscience perspective, I'm not sure that I've seen something that's super clear
on what continual learning, what causes it, except maybe to say that this systems-consolidation idea, of sort of the hippocampus consolidating into the cortex, some people think is a big piece of this, and we still don't fully understand the details.
Yeah. Speaking of fast weights,
is there something in the brain, which is the equivalent of this distinction between parameters
and activations that we see in neural networks? And specifically, in Transformers, we have this idea: some of the activations are the key and value vectors of previous tokens that you build up over time. And there are the so-called fast weights: whenever you have a new token, you query it against these activations, but you also obviously run it against all the other parameters in the network, which are the actual built-in weights. Is there some such distinction that's analogous?
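To make the distinction in the question concrete, a bare-bones, hypothetical sketch (not a description of any real model): fixed learned projection matrices, the "slow" weights, versus a growing cache of per-token keys and values that each new token attends over.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))  # learned "slow" weights

kv_cache = {"K": [], "V": []}        # per-token activations built up over the sequence

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(x):
    q, k, v = W_q @ x, W_k @ x, W_v @ x
    kv_cache["K"].append(k)
    kv_cache["V"].append(v)
    K, V = np.stack(kv_cache["K"]), np.stack(kv_cache["V"])
    weights = softmax(K @ q / np.sqrt(d))   # query the new token against every cached key
    return weights @ V                       # weighted sum of cached values

for token in rng.standard_normal((5, d)):    # process a short "sequence" token by token
    out = attend(token)
```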
I don't know. I mean, we definitely have weights and activations. Whether you can use the activations in these clever ways, different forms of actual attention... Like, attention in the brain, is that based on, I'm trying to pay attention? I think there's probably several different kinds of actual attention in the brain: I want to pay attention to this area of visual cortex, I want to pay attention to the content in other areas that is triggered by the content in this area, right, attention that's just based on kind of reflexes and stuff
like that. So I don't know. I mean, I think that there's not just the cortex. There's also the thalamus.
The thalamus is also involved in kind of somehow relaying or gating information. There's cortical,
cortical connections. There's also some amount of connection between cortical areas that goes through
the thalamus. Is it possible that this is doing some sort of matching or kind of constraint satisfaction
or matching across, you know, keys over here and values over there? Is it possible that it can do
stuff like that maybe I don't know this is all part of what's the architecture of this
cortical thalamic yeah system I don't know I don't know how transformer like it is or
if there's anything analogous to that attention. It would be interesting to find out.
We're going to give you a billion dollars so you can come on the podcast again and then tell me exactly how.
Yeah, mostly I just do data collection. It's like really, really unbiased data collection, so all the other people can figure out these questions.
Yeah. Maybe the final question to go off on
is what was the most interesting thing you learned from the gap map?
And maybe you want to explain what the gap map is.
So the gap map. In the process of incubating and coming up with these focused research organizations, these sort of nonprofit, startup-like moonshots that we've been getting philanthropists and now government agencies to fund, we talked to a lot of scientists. And some of the scientists were just like, here's the next thing my graduate student will do, here's what I find interesting, exploring these really interesting hypothesis spaces, like all the types of things we've been talking about. And some of them are like, here's this gap. I need this piece of infrastructure, and there's no combination of grad students in my lab, or me loosely collaborating with other labs on traditional grants, that could ever get me that. I need an organized engineering team that builds, you know, the miniature equivalent of the Hubble Space Telescope. And if I can build that Hubble Space Telescope, then I will unblock all the other researchers in my field, or some path of technological progress, in the way that the Hubble Space Telescope lifted all boats and improved the life of every astronomer, but wasn't really an astronomy discovery in itself. It was just that you had to put this giant mirror in space with a CCD camera and organize all the people and engineering to do that. So some of the things we talked to scientists about look like that. And the gap map is basically just a list of a lot of those things. We call it a gap map; I think it's actually more like a fundamental capabilities map. Like, what are all these things, these mini Hubble Space Telescopes? And then we kind of organize that into gaps to help people understand it or search it.
And what was the most surprising thing you found?
So, I mean, I think I've talked about this before, but one thing is just the overall size or shape of it: it's a few hundred fundamental capabilities. So if each of those was a deep-tech-startup-sized project, that's only a few billion dollars or something, like each one of those was a Series A. It's not a trillion dollars to solve these gaps; it's lower than that.
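For rough scale, with illustrative numbers that are mine rather than Adam's: if there are on the order of 300 such capabilities and each needs roughly a Series-A-sized effort of, say, $10M to $30M, that works out to about $3B to $9B in total, orders of magnitude short of a trillion.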
So that's one thing. Maybe we assumed that, but we also came to it; that's what we got. It's not really comprehensive. It's really just a way of summarizing a lot of conversations we've had with scientists.
I do think that, in the aggregate process, things like Lean are actually surprising, because I did start from sort of neuroscience and biology, and there it's very obvious that there are these omics: we need genomics, we also need connectomics. And, you know, we can engineer E. coli, but we also need to engineer the other cells. There are somewhat obvious pieces of biological infrastructure. I did not realize that math-proving infrastructure was a thing, and that was kind of emergent from trying to do this. So I'm looking forward to seeing other things where it's not actually a hard intellectual problem to solve. It's maybe slightly the equivalent of AI researchers just needing GPUs, or something like that, and focus, and really good PyTorch code to start doing this. What is the full diversity of fields in which that exists? Which are the fields that do or don't need that? Fields that have had gazillions of dollars of investment, do they still have some of those gaps, or is it only the more neglected fields? We're even finding some interesting ones in actual astronomy, actual telescopes, that have not been explored, maybe because once you get above a critical-mass-sized project you have to have a really big project, and that's a more bureaucratic process with the federal agencies.
Yes, I guess you just kind of need scale in every single domain of science these days.
Yeah, I think you need scale in many of the domains of science. And that does not mean that the small-scale work is not important. It does not mean the kind of creativity, serendipity, et cetera, of each student pursuing a totally different direction or thesis that you see in universities is not also really key. But, yeah, I think some amount of scalable infrastructure is missing in essentially every area of science, even math, which is crazy, because I thought mathematicians just needed whiteboards.
Right, yeah.
Right, but they actually need Lean. They actually need verifiable programming languages and stuff. Like, I didn't know that.
Yeah.
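For context on what "verifiable" means here: in a proof assistant like Lean, a theorem is a statement the compiler checks, so the file only builds if the proof actually establishes the claim. A toy example, not from the conversation, just an illustration:

```lean
-- A toy machine-checked statement: Lean only accepts this if the proof
-- really does establish that addition of natural numbers commutes.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```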
Cool.
Adam, this is super fun.
Thanks for coming on.
Thank you so much.
Where can people find your stuff?
Pleasure.
The easiest way right now: my adammarblestone.org website is currently down, I guess, but convergentresearch.org links to a lot of the stuff we've been doing, yeah.
And then you have a great blog, Longitudinal Science.
Yes, Longitudinal Science, yes, on WordPress, yeah.
Cool.
Thank you so much.
Pleasure, yeah.
Hey, everybody.
I hope you enjoyed that episode.
If you did, the most helpful thing you can do is just share it with other people who you think might enjoy it.
It's also helpful if you leave a rating or comment on whatever platform you're listening on.
If you're interested in sponsoring the podcast, you can reach out at dwarkesh.com/advertise. Otherwise, I'll see you at the next one.
