Theories of Everything with Curt Jaimungal - Vitaly Vanchurin: This Cosmologist Discovered Something Strange...
Episode Date: February 9, 2026
What if physics is just the universe learning? Most Theories of Everything episodes are mind‑bending for their math, physics, philosophy, or consciousness implications. This one hits all four simultaneously. Professor Vitaly Vanchurin joins me to argue the cosmos isn't just modeled by neural networks—it literally is one. Learning dynamics aren't a metaphor for physics; they are the physics. Vanchurin shows why we need a three‑way unification: quantum mechanics, general relativity, and observers. As a listener of TOE you can get a special 20% off discount to The Economist and all it has to offer! Visit https://www.economist.com/toe TIMESTAMPS: - 00:00:00 - The Neural Network Universe - 00:05:48 - Learning Dynamics as Physics - 00:11:52 - Optimization and Variational Principles - 00:21:17 - Deriving Fundamental Field Equations - 00:28:47 - Fermions and Particle Emergence - 00:37:17 - Geometry of Learning Algorithms - 00:44:53 - Emergent Quantum Mechanics - 00:50:01 - Renormalization and Interpretability - 00:57:00 - Second Law of Learning - 01:05:10 - Subatomic Natural Selection - 01:15:40 - Consciousness and Learning Efficiency - 01:24:09 - Unifying Physics and Observers - 01:31:01 - Qualia and Hidden Variables - 01:40:24 - Free Energy Principle Integration - 01:46:04 - Epistemological Doubt and Advice LINKS MENTIONED: - Vitaly's Papers: https://inspirebeta.net/literature?sort=mostrecent&size=25&page=1&q=find%20author%20vanchurin - Vitaly's Lecture: https://youtu.be/TagDLiLb2VQ - Vitaly's Website: https://cosmos.phy.tufts.edu/~vitaly/ - Towards A Theory Of Machine Learning [Paper]: https://arxiv.org/pdf/2004.09280 - Autonomous Particles [Paper]: https://arxiv.org/pdf/2301.10077 - Emergent Field Theories From Neural Networks [Paper]: https://arxiv.org/pdf/2411.08138 - Covariant Gradient Descent [Paper]: https://arxiv.org/pdf/2504.05279 - A Quantum-Classical Duality And Emergent Spacetime [Paper]: https://arxiv.org/abs/1903.06083 - 
Emergent Quantumness In Neural Networks [Paper]: https://arxiv.org/abs/2012.05082 - Predictability Crisis In Inflationary Cosmology And Its Resolution [Paper]: https://arxiv.org/abs/gr-qc/9905097 - Stationary Measure In The Multiverse [Paper]: https://arxiv.org/abs/0812.0005 - The World As A Neural Network [Paper]: https://arxiv.org/pdf/2008.01540 - Self-Organized Criticality In Neural Networks [Paper]: https://arxiv.org/pdf/2107.03402v1 - One Hundred Authors Against Einstein [Book]: https://amazon.com/dp/B09PHH7KC8?tag=toe08-20 - Geocentric Cosmology: A New Look At The Measure Problem [Paper]: https://arxiv.org/abs/1006.4148 - Jacob Barandes [TOE]: https://youtu.be/gEK4-XtMwro - Yang-Hui He [TOE]: https://youtu.be/spIquD_mBFk - Eva Miranda [TOE]: https://youtu.be/6XyMepn-AZo - Felix Finster [TOE]: https://youtu.be/fXzO_KAqrh0 - Stephen Wolfram [TOE]: https://youtu.be/FkYer0xP37E - Stephen Wolfram 2 [TOE]: https://youtu.be/0YRlQQw0d-4 - Avshalom Elitzur [TOE]: https://youtu.be/pWRAaimQT1E - Ted Jacobson [TOE]: https://youtu.be/3mhctWlXyV8 - Geoffrey Hinton [TOE]: https://youtu.be/b_DUft-BdIE - Wayne Myrvold [TOE]: https://youtu.be/HIoviZe14pY - Cumrun Vafa [TOE]: https://youtu.be/kUHOoMX4Bqw - Claudia De Rham [TOE]: https://youtu.be/Ve_Mpd6dGv8 - Lee Smolin [TOE]: https://youtu.be/uOKOodQXjhc - Consciousness Iceberg [TOE]: https://youtu.be/65yjqIDghEk - Matthew Segall [TOE]: https://youtu.be/DeTm4fSXpbM - Andres Emilsson [TOE]: https://youtu.be/BBP8WZpYp0Y - Will Hahn [TOE]: https://youtu.be/3fkg0uTA3qU - David Wallace [TOE]: https://youtu.be/4MjNuJK5RzM - Karl Friston [TOE]: https://youtu.be/uk4NZorRjCo Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Most of my best ideas don't happen during interviews.
They come spontaneously, maybe in the shower or while I'm walking.
And until Plaud, I kept losing them because by the time I write it down, half of it's gone.
I've tried voice capture before, like Google Home, and it just cuts me off in the middle of a thought.
And I don't know about you, but my ideas don't come in these 10-second short sound bites.
They're ponderous, they wind, they're often five minutes long, and with Apple Notes or Google Keep,
the transcription is quite horrible, and you even have to go through multiple steps
to get to it. Plaud lets me talk for as long as I want to. There are no interruptions. It's accurate
capture and it organizes everything into clear summaries, key takeaways, and action items. I can even
come back later and say, okay, what was that thread I was talking about, about consciousness and
information? My personal workflow is that I have their auto flow feature enabled, and it sends me
an email whenever I take a note. I have their NotePin for the shower, and then I carry this one
around with me in the apartment, and I love them both very much,
especially this one. The fact that I can just press it and it turns on instantly and starts
recording without a delay is an extremely underrated feature. And its battery? I haven't had to
charge this since I received it. Over one and a half million people use Plaud around the world.
If your work depends on conversations or the ideas that come after them, it's worth checking out.
That's plaud.ai/TOE. Use code TOE for 10% off at checkout.
The universe is self-tuning itself.
It likes to be observed.
And so observers emerge, not because there are carefully chosen constants of nature,
but because if they were not carefully chosen,
then they would learn to evolve towards being carefully chosen.
Five years ago, an unintuitive and startling result was dropped like a bombshell.
Cosmologist Professor Vitaly Vanchurin found a way to model
the universe as a neural network where the learning dynamics are the physics.
This has huge implications for what the cosmos is, what you are, and potentially what consciousness is, and its relationship to everything.
As you'll see in this conversation, this is not another way of saying that you can use neural networks to simulate general relativity or the standard model.
That's been done. Instead, the professor shows that the universe's own learning is
the physics. What happens is gravity falls out. The Dirac equation falls out. Klein-Gordon falls
out. The algorithm behind most modern AI, the aptly named Adam optimizer, implicitly carries
a curved metric on parameter space. The presence of the curved space is essential,
essential for convergence. Space-time curvature is actually there precisely because
it makes the universe's learning efficient. This conversation spans natural selection,
at the subatomic scale, the Boltzmann brain paradox, Karl Friston's free energy principle,
consciousness as learning efficiency, and the ramshackle state of observer physics,
which Vanchurin argues demands a three-way unification of quantum mechanics,
general relativity, and observers. My name's Curt Jaimungal, and on this channel I interview
researchers about their theories of reality with rigor and technical depth, even at the risk
of limiting the audience, because the slow, meticulous, candid
approach is superior to a fast, flashy, potentially misleading approach.
The universe is a black box, but today Vanchurin opens it.
We'll definitely have a part two, so leave your questions in the comments section.
There's plenty more to explore.
Enjoy today's episode of Theories of Everything.
Professor, you claim the universe is literally a neural net,
so not that it's a useful model.
It literally is, ontologically.
Justify yourself, young man.
Okay, not so young anymore, but I'll try to do my best.
Now, when you're saying that I claim the universe is a neural network
and not just a model, if I did say that at some point,
I want to take this back, okay?
As a physicist, I am not, as a theoretical physicist or as a physicist,
I am not allowed to say what the universe actually is.
What I am allowed to say is what is a good way to model it,
because at the end of the day,
I cannot really know or check or test,
prove or disprove, whether this is how the universe works.
But I can test and check whether a given
mathematical model is good for modeling
some phenomenon.
So if I did say that at some point, or somebody misinterpreted it,
Now, I'm always talking about what's a good model of describing it.
And at that point, yeah, I have to say it looks like it's a promising candidate.
It's not a final, there's no final verdict yet.
But it's a promising candidate that should be explored, whether it is a good way,
a convenient way, compact way of describing phenomena in the universe using neural networks
in perhaps how exactly I want to do it.
We can discuss later,
but I just want to open all my cards and say,
I would never claim that this is how the universe works.
Now, if we get to philosophical questions,
of course we can say, what if, right?
What does it mean?
If this is really the universe, you know,
how the universe is,
what kind of philosophical conclusions we can reach out of that.
But as a physicist with my physicist's head on, I can only say this is an interesting, good model,
and it works remarkably well in the places where I wouldn't expect it to work well.
Okay, now people who know some things about neural nets know that they're universal function approximators.
So why would it be surprising that neural nets can approximate the functions the universe is described by?
It would be more surprising if you had the counterclaim:
the universe cannot be modeled by a neural net.
I think this is an excellent question, okay?
And it actually, now I can actually put my finger on exactly what I mean, right?
It is true, right?
Why should we be surprised, you know, neural networks are universal approximators?
Why should we be surprised that you can use,
you know, well-trained neural networks to reproduce the
dynamics that we observe in classical or quantum systems?
With quantum systems, I'll have to take it back.
It's not so obvious.
But at least for classical systems, yeah, I would say, why should we be surprised?
The difference that I'm proposing is that I'm not only saying that the trained network
is good at describing a given function or a given dynamics, but actually the process
of training, the process of learning is a part of dynamics.
So it is different, right?
So now it's not enough for me
to just show you, here it is.
Here's a train network.
It describes well harmonic oscillator.
No.
Because to train it, I've used trainable variables.
I use some kind of learning algorithms.
And that dynamics didn't disappear.
It must be there.
It must be part of me telling you
why harmonic oscillator can be described by a neural network.
So if we remove learning, it's an almost trivial statement.
If we say, no, no, no, let's talk about the entire dynamical system of neural network
that comes with learning dynamics, that comes with activation dynamics,
and can that thing together, combine thing, can that be useful model for describing the universe?
Okay, and what learning algorithm are you using?
Right. So at the very beginning, when I started, you know, studying this subject,
I just took the most popular one, stochastic gradient descent, and, you know, it looks like
the one that uses little resources and still does an amazing thing. For me, it was interesting
that even a very simple algorithm can produce behaviors that, you know, we physicists don't
have tools to describe, okay, just because it's a learning dynamics.
So, so that was like five years ago.
And already some very spectacular results had
come out of this with collaborators.
We've showed that, you know, quantum behavior can emerge.
We can discuss all that later.
Now, now I've learned more.
I knew about this, but now I actually understood more that it isn't just stochastic gradient
descent that is interesting and important for modeling the phenomena.
But there are other well-known algorithms that anybody working in machine learning know about
and use them daily, atom optimization optimizer.
This is one example.
And for many, many problems, it works much better.
It has much better learning efficiency.
You know, you train your model and loss function goes down much faster, right?
So I wanted to understand more recently why is that the case?
What's the physical reason?
What is the physical reason why it works better?
But now it isn't just stochastic gradient descent,
but a whole class of learning algorithms.
We call it covariant gradient descent.
Covariant comes from the physics definition of covariance
that we can again discuss.
And those algorithms, atom-like algorithms and their generalizations,
give something,
something I, again, hadn't expected.
It's like always: you study something
in machine learning, you don't expect it, and you get it.
and so in particular
the emergence of curved
space and space-time
comes naturally
if you are actually thinking about
covariant gradient descent algorithms
such as Adam.
So the presence of a metric
there,
you know, whether the people
in the machine learning community
know that there is a curved metric or don't know about this,
the presence of the curved space is essential,
essential for convergence,
essential for the algorithm to be efficient.
And so, yes, originally it was stochastic gradient descent,
but there are new things coming up like this,
every month that I learn about, yeah.
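As a rough illustration of the point about Adam carrying an implicit metric, here is a hypothetical toy sketch (my own example, not Vanchurin's covariant gradient descent): on a badly conditioned quadratic loss, plain gradient descent implicitly uses a flat metric, while Adam's per-parameter second-moment rescaling acts like a diagonal inverse metric on parameter space.

```python
import numpy as np

# Hypothetical toy example: badly conditioned quadratic loss
# L(theta) = 0.5 * theta^T H theta, with very different curvature per direction.
H = np.diag([100.0, 1.0])

def grad(theta):
    return H @ theta

def loss(theta):
    return 0.5 * theta @ H @ theta

def sgd(theta, steps=200, lr=1e-2):
    """Plain gradient descent: an implicitly flat (Euclidean) metric."""
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

def adam(theta, steps=200, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: the 1/sqrt(v) factor rescales each coordinate,
    which can be read as a diagonal inverse metric on parameter space."""
    theta = theta.copy()
    m = np.zeros_like(theta)  # first-moment (momentum) estimate
    v = np.zeros_like(theta)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mhat = m / (1 - b1 ** t)  # bias corrections
        vhat = v / (1 - b2 ** t)
        theta -= lr * mhat / (np.sqrt(vhat) + eps)
    return theta

theta0 = np.array([1.0, 1.0])
print(loss(theta0), loss(sgd(theta0)), loss(adam(theta0)))
```

Both optimizers drive the loss down from its initial value; the point of the sketch is only that Adam's update is gradient descent through a coordinate-dependent rescaling, i.e. a metric.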
So the audience of this podcast are researchers in computer science,
but also researchers in physics and math,
and that's more on the hardcore stem side.
But then there's a large swath of artists
and miscellaneous laypeople.
It's quite interesting because it's one lump here
that's quite hardcore,
and then another lump on the softcore,
let's call it side.
And it's interesting that the overlap is small.
It just goes extremely nitty-gritty PhD level
or much more, much, much more layman
wondering about ontology and philosophy and so forth.
I forgot to mention there's also researchers in philosophy.
Okay.
To those who are not computer scientists,
a neural net is what?
What is the minimum someone needs to know?
Right.
So I mentioned it earlier,
but let me just discuss it one more time.
Neural networks come with this one feature that even I as a physicist, as a researcher in physics, wouldn't know.
And that's the learning dynamics.
And that's the dynamics that is taking some function that machine learning researchers call loss function,
some cost function you can call it, some function that you're trying to optimize.
What is it that you're trying to do?
You don't have to be a scientist or a researcher to understand that there is some kind of optimization.
There is something that you optimize.
And so the neural network dynamics comes with that something, that goal, you can say.
What's the goal of the system?
What is it trying to do?
It's trying to speak the English language without mistakes.
Or it's trying to do speech-to-text
recognition. Whatever it is trying to do, this is the difference. The presence of that
objective function, the loss function, is something essential, and that is, well, I'll
say it once again, it isn't something we're used to in physics. It is
something that machine learning people are used to in machine learning research, but not
us. So I had to try to take all of the nice empirical results one
obtains from machine learning,
try to use the toolbox that we use in physics,
and try to understand it.
But yes, we are trying to tell this story
to people who are not running models every day
or writing equations every day.
Then that's the difference.
So you have a system that has this one boring dynamics
that we knew about before: activation dynamics.
There's some state and it keeps changing
according to some law.
And then there is this learning dynamics,
that there is some objective function
that the system is trying to optimize.
So the presence of those, you know, two things is essential.
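To make the two dynamics concrete, here is a minimal hypothetical sketch (my own illustration, not Vanchurin's model): a tiny recurrent network whose neuron states evolve under activation dynamics while, at the same time, its weights and biases drift downhill on a loss.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8
x = rng.normal(size=n)             # neuron states (non-trainable variables)
W = 0.1 * rng.normal(size=(n, n))  # connection weights (trainable variables)
b = np.zeros(n)                    # biases (trainable variables)
target = 0.9 * np.ones(n)          # the objective the system optimizes toward

def loss(state):
    return 0.5 * np.sum((state - target) ** 2)

initial_loss = loss(x)
lr = 0.05
for step in range(300):
    # activation dynamics: the state evolves through weights and biases
    x_prev = x
    x = np.tanh(W @ x_prev + b)
    # learning dynamics: trainables drift downhill on the loss
    dpre = (x - target) * (1.0 - x ** 2)  # dLoss/d(pre-activation), via tanh
    W -= lr * np.outer(dpre, x_prev)      # dLoss/dW
    b -= lr * dpre                        # dLoss/db

final_loss = loss(x)
print(initial_loss, final_loss)  # the loss should have decreased
```

Both updates run in the same loop: neither the state nor the weights is "the input" alone, which is the distinction being drawn here.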
Now for neural net, okay, well, first the optimization,
physicists do know about it
if they're doing any minimization of the Lagrangian
or extremal point of a Lagrangian.
So is there something particular
about the technique
of optimization from neural nets,
compared to other optimizations,
that's well suited for describing
the fundamental laws in such generality
that you've been able to find out?
Good point, yeah. So we do use
variational principle, right?
So we study the extrema
of Lagrangians, right? Action,
actually, right? So we are
interested to find certain
solutions, we take this
beast, which is called action,
varied with respect to degrees of freedom, and we are
interested in its minima or maxima.
Now, what is new here is that you are not only interested in the minimum or maximum,
you are interested in the entire, you know, trajectory from whatever you started with
to whatever complicated state you're going to get.
And that complicated state will certainly satisfy some variational principle in some sense.
And that's where you will kind of, because of that,
see the emergence of some kind of classical-like behavior.
Even out of equilibrium, right, there is this whole evolution that takes your learning optimization to reach the minimum or maximum.
That is present in optimizing machine learning systems and isn't present in physics.
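The contrast just drawn can be written compactly. In physics, the variational principle cares only about the extremum of the action; in learning, the entire gradient-flow trajectory toward the minimum of the loss is part of the dynamics. The symbols here are generic textbook notation, not taken from a specific paper:

```latex
% Physics: variational principle -- only the extremum of the action matters
S[\phi] = \int \mathcal{L}(\phi, \partial_\mu \phi)\, d^4x, \qquad \delta S = 0

% Learning: gradient flow -- the whole trajectory toward the minimum matters
\frac{d\theta_k}{dt} = -\gamma\, \frac{\partial L(\theta)}{\partial \theta_k}
```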
Now, I have to correct a little bit because, you know, right now physicists adapt machine learning to solve lots of problems as a tool, okay?
Not as a model of physics, but as a tool, right?
Yes.
So let's say you have a very complex quantum many-body system.
You're trying to find its ground state.
You're doing some difficult problem.
And so, of course, you're going to use all the tools there are,
all the computational tools there are, including machine learning.
Now, when I'm saying that physicists aren't used to optimization, it's as a model of a system that they're trying to describe.
But as a tool, absolutely.
This is a great tool and it's been used by physicists
and used by physicists more and more now.
What is the input into this neural net?
Right.
Okay, so if we are talking about the neural network
as a model of the entire universe, let's say,
then that's all there is.
This is the state.
You know, you describe the state of all neurons,
you describe the state of all connection weights,
and that's the state of the system.
That's your input.
This is your initial state.
So in physics, we actually have a very nice setup for modeling everything.
We say, well, you need two things.
You need to know the state and how it goes, right?
Now, quantum mechanics, again, putting aside,
those are the only two things you need.
So the state or the input of this neural network
is the state of all neurons, right?
And then, and the state of all of the trainable,
trainable and non-trainable variables.
And then they evolve according to the one side activation dynamics
and the other side of the line.
and, on the other side, learning dynamics.
So, this setup that physicists came up with;
actually, mathematicians have a much more general setup,
the dynamical systems setup, right?
There, they don't even bother whether the dynamics is Hamiltonian
or, you know, satisfies some kind of
constraint, like an energy-like function. They don't care about that.
So what I'm talking about is a dynamical system, but it isn't a dynamical system in the sense
where you restrict yourself to classical Hamiltonian-like dynamics.
In traditional physics, the input may be the state at time zero, and then the output may be
the state of time T. In neural nets, let me just talk about an image classifier. So an image, let's
say you're given an image of a dog and it's just a square image and maybe it's 20 by 20 pixels,
and so it's 400 pixels. Then you have 400 numbers as your input. I mean, if it's a gray scale.
And then at the end, you want to know, is it a dog, is it a cat, is it a flower, or what have you?
So however many categories you have here is your output. So what is the input on this side?
Is it the whole state of the whole universe? And then the output is the whole state of the whole universe again.
Very good. No. No, but very good question.
So now, we'll keep switching heads.
So now you put your machine learning head on and say,
okay, here's like I understand what you're talking about.
And I'm saying no.
So in this sense, the entire network with the input and the output
before you even started propagating your image through
and figuring out whether it's cat or a dog,
this whole thing, the state of all of the degrees of freedom,
is the state of the system, not just input, the whole thing.
Now, in the case of the cats and dogs classifier,
it happened to be that there is, in your problem,
there is a clear distinction.
What is it you're calling input, and what is it that you're calling output?
So there is a kind of flow of information in this direction.
But this is just because you set up your network that way.
You didn't have to do it that way.
You could have used the recurrent networks.
You could have used a lot more complicated loss functions.
For example, in this case, your loss function would be, well, did I get it right?
Is it a dog, you know, zero or one at the end?
But it's just, you're talking about a restricted class of machine learning problems.
In this case, the information really flows in one way.
Now, if we have the entire network, an entire universe described by a neural network,
it may happen that at some place there is, like, only a left-going wave or a right-going wave,
where information only goes one direction.
But that's because of the initial conditions that you set up,
not because your network cannot start up with some other states.
So imagine, in your example with the cats and dogs,
imagine that the zero and one that you got at the end,
you loop back to the input, right?
And then it may not do something that you wanted it to do,
but it will run.
You will get this, you know, pixel change, changing one of the pixels in what you call the input,
and then going through.
So in this case, still, the whole thing is the input at the previous time step.
So it's probably better to call it the state of the system at the previous time step.
And then once, you know, one step of activation took place, that's the next time step,
and then another step of activation gives the third time step, and so forth.
And that's kind of the time evolution of the activation dynamics.
And then there is learning, right?
So then you have to update your weights,
which is, again, the learning.
And you can just keep going,
keep activating and learning.
Boarding for flight 246 to Toronto is delayed 50 minutes.
Ugh, what?
Sounds like Ojo time.
Play Ojo? Great idea.
Feel the fun with all the latest slots in live casino games
and with no wagering requirements.
What you win is yours to keep. Groovy!
Hey, I won!
Feel the fun.
Play Ojo.
when passenger Fisher is done celebrating.
19 plus Ontario only.
Please play responsibly.
Concerned about your gambling, or that of someone close to you?
Call 1-866-531-2600 or visit ConnexOntario.ca.
Some of the key equations in physics are general relativity.
So Einstein's field equations or Dirac or Klein Gordon.
I know that you're not able to, with your words,
say how you derive them exactly in such a way that is rigorous.
But we can, of course, point to your papers and lectures on screen right now.
But either way, can you just walk us through as much as you can with your words as to what you started as your input and how were you able to get these as outputs?
Sure.
So let's start with the field theories.
So we know very well that the standard model, the standard model of high-energy physics, is very well described by a collection of fields.
And so if you want to get that physics out of your framework, mathematical framework,
you want to show how fields will emerge, how we would get fields out of it.
Now, it's a difficult task.
So let me just put it right away.
And it's not something that I can say, well, here it is.
I get, you know, quarks, three generations.
I get everything.
And it's simple.
And I can, like, write one paper and go home.
No, it's not even close.
It took years to get the Dirac equation out of it.
Okay?
So Klein-Gordon was easier.
Hamiltonian mechanics was easier.
Getting fermions, getting the Dirac equation,
it turned out to be a difficult task.
So just since we talked about this direction of information flow,
it turns out that for the Dirac field,
there's some tensor factor, something in your
neural network setup, that has to have an
anti-symmetry in it, so it has to be anti-symmetric.
And so if you put that in, if you put this constraint,
now why would you put this constraint?
I don't know.
Is it that this constraint was learned
because of some kind of microscopic optimization
algorithm that's running?
Great. Can I show it? No.
What I can show is that if I assume a certain constraint,
if I only take into account
certain trainable degrees of freedom
(that's essential, so we cannot throw away the trainables)
and certain non-trainable ones,
then the dynamics resembles, you know, lattice field theory,
where, you know, individual nodes would be like neurons,
and they would have some very precisely defined, like,
connections to each other.
It's not like, you know, any connections would do the trick.
So, as I said, getting Klein Gordon's scalar field equations was easier.
It's, like, kind of more generic.
Getting something like Dirac is hard.
And we're not there.
I'm not ready to write down the standard model Lagrangian
and say, well, here it is.
So that's for the field theory.
Now, the other part is Einstein equation.
Once again, telling you that I have finalized my understanding of
how the Einstein equation emerges from this framework would be a lie.
This is not true.
But what I do know, I do know how to get emergent
curved space from it.
I also know how to get emergent space time from it.
That's again, I mean, it's like...
And that's a subtle difference that most people wouldn't pick up on.
Okay.
So expand on that, please.
Sure, sure.
So, you know, space you can probably understand by showing, like, the surface of an apple, right?
Or a chip, a potato chip.
And you'll say, well, it looks like a two-dimensional surface.
And, you know, since we are three-dimensional beings,
it's easy for us to look and say, well, yeah, there it is...
It's curved.
It's not something flat.
I cannot put it on my table, which is flat.
And the same for the potato chip.
If I take a potato chip, it has negative curvature;
an apple would have positive
curvature. If I put it on the table, it wouldn't be lying flat.
So that's kind of our understanding as three-dimensional creatures of what curvature looks like.
Now, this concept can be generalized to 3D.
Now, I cannot actually, you know, draw it or move my hands,
because I am in three dimensions,
but we know the tricks.
We know the tricks of how to do these calculations,
how to imagine.
We even know how to draw three-dimensional objects
on two-dimensional pieces of paper.
So it's not so surprising that we are able to carry out calculations in 3D.
And so when I'm saying the 3-D curvature,
I mean the three-dimensional space, which is curved.
And that turns out to be actually not, you know,
just some feature of this theory.
It should be a feature of any theory
of everything.
So if your theory doesn't produce
in some limit
emergence of the curved space,
then you are against Einstein
and of course,
this is one of the most beautiful theories
that we have, and we cannot
just throw it out of
our considerations.
Okay, so that's three-dimensional space.
Now, for the space-time,
that again
involves a little bit
of math,
if you want to
understand it correctly; you have to write equations.
But since the audience is
a bimodal distribution, we should
try to explain what space-time means
even
in that sense. So it
turns out that
for the space-time,
or space, when you're
talking about that, you have
to tell how you measure distances.
So what do you mean by
distances between two points? If you
have that definition of how to measure
distances, which has to satisfy certain
requirements, then you know
what kind of
space you are dealing with. And then the
apparatus for
that, we call it the metric tensor.
The name is not very important. And space-
time comes with a very,
very strange one
at first. You tell
any student that this is how
distances should be measured and they will
question why. This looks bizarre.
It turns out that
to measure distances in Euclidean space, or in just space,
you take, like, X squared plus Y squared and take the square root of this: the Pythagorean theorem.
Well, it turns out that if you are working in space time, this isn't true.
You should not be adding the two squares and taking a square root.
You should be subtracting.
So one of those squares which corresponds to time coordinates has to be subtracted.
And because of this stupid sign difference, there is a huge difference between
space and the space time.
And so it took, you know, some time to get the curved space,
but if you cannot get space time,
then again, your theory is not in agreement with observation.
And we do observe a curved space time.
You know, my background is in cosmology,
and the space time is important there.
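The distance prescriptions described above can be summarized as follows (these are standard textbook formulas, not specific to the neural network framework):

```latex
% Euclidean space: the Pythagorean rule -- all squares added
ds^2 = dx^2 + dy^2 + dz^2

% Minkowski space-time: the time coordinate enters with the opposite sign
ds^2 = -c^2\,dt^2 + dx^2 + dy^2 + dz^2

% Curved space-time: distances are encoded in the metric tensor g_{\mu\nu}
ds^2 = g_{\mu\nu}\, dx^\mu\, dx^\nu
```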
I hope I wasn't too technical because...
No, no, no, no.
And I have a technical question.
You're absolutely right.
Actually, I love that you said that because this is always true.
There are those who actually know the terminology and would appreciate me speaking using the physics or machine learning terms, and those who don't, and you don't want to bore either one of them.
So this podcast is aimed toward researchers, toward postdocs and graduate-level PhDs and professors and so forth,
and that the advantage of this podcast or the niche part of it,
the difference in it is that it's as if for that other distribution of people,
they finally get to peer into what it looks like when professors are talking.
I'm not a professor, but you get the idea.
So, okay, you mentioned lattice field theory,
and lattice field theory has a problem with Fermion doubling.
So I'm curious if anything about your approach helps solve that
problem. No, and we're not there yet. Not even close to actually fitting the lattice-like field theory. I shouldn't say lattice field theory, because it isn't, but it is in the sense of how the weight matrix is arranged. So you have a lattice. Now, this is actually why I am not happy with this particular model of how fields emerge. Okay. There is
now another one, another approach, which I wasn't able to take as far as getting fermions out of this,
but the approach is that it's closer to particles as opposed to fields.
Now, we do know that fields work better, and particles are kind of only a good description in certain limits, right?
But so, speaking of that, the second approach is that neurons or some sub-networks behave like particles in an emergent
space, and that emergent space is actually the space of
trainable variables. And I already mentioned the Adam-like
algorithm, and so, you know, machine learning people would say, okay,
now I know that Adam comes with metric and there is curvature.
Now, but from the point of view of the physicist,
it's more like there is a second approach. Again, as I said, you know, the theory is not
final. And so you take all approaches you can and you're just trying to say,
okay, well, what can I say? Can I get
fermions? And in this second approach,
it is as if
you have some kind of sub-network
of neurons that
are doing their usual business,
activation, learning dynamics,
but their motion is considered
in the space of trainable variables.
And that space does
not have any lattice structure.
It is just a completely continuous
space. And then you do have
places in that space where no
states are occupied.
like vacuum.
And even if there are, you know, once in a while, certain neurons appear to have such
and such configurations of the trainable variables.
This is not a field.
It's kind of discretized, more similar to particles, as I said, but also strings, right?
Strings are assumed not to actually be fields in the sense of occupying space.
They're like, you know, one-dimensional objects.
So I think it's probably a good idea to say, like, you know, the fields that work extremely well are three-dimensional objects plus one time dimension.
Strings are one-dimensional objects plus time.
And these neurons in this second picture are like one-dimensional objects plus time.
Oh, zero-dimension.
Sorry.
You're right.
Okay.
Okay.
I think it will be super useful for people.
if on screen right now the video editor places in what a neural net looks like,
rather than us giving tutorials on what neural nets are.
And then I think what's useful would be for you to say what your theory is not saying.
So for instance, in the beginning I said, why at all is this surprising if a neural net can
approximate any function?
You're like, well, but that's not it.
We're not saying that.
Okay.
Something else you're not saying, and again, referencing this image, is you're not saying
that each one of these nodes is somehow space discretized.
Exactly.
Because there are other causal set models and
causal dynamical triangulation models
and other discretized forms.
Okay, you're not saying that.
Neither are you saying that this is a hypergraph model
like a Wolfram model.
Okay, so when you start to talk about this
with your colleagues, what else do they think you're saying?
But you're like, no, no, no, that's not what I'm saying.
It's this.
Yeah.
So the first thing you identified right away
and you're absolutely right.
That's what people think, and I'm saying,
no.
Learning must be there and you're absolutely right.
Now, the second thing is,
I am in the superposition
of saying and not saying it.
So I'm saying there are two possibilities
and both of them are being explored.
One possibility is that it is like
a lattice space,
whether it is a square lattice
or some other lattice
where it has triangulations,
some hypergraph-like model
which is a possibility,
and that is a possibility
that I'm exploring.
It comes naturally
because with neural networks
you can easily get a graph out of this.
Now, the distinction from the models,
other models where you have this network or graph-like structure,
is that I am constrained in how this network will evolve.
I'm not able to just say, look, you know, you have a graph,
now I want it to form a torus, or I want it to be flat.
I'm not able to just impose rules
without saying where they come from.
So where they can come from,
for me, is by specifying actually the one most important object
in this entire theory.
Like, you know, in physics,
we have one object that kind of describes the entire theory,
the action or the Lagrangian.
You give the Hamiltonian and you're done.
So here, you have to specify loss function.
And the loss function is a very strict object.
I cannot write just anything; you know, it's a scalar, right?
So you have to pay attention to that.
And so if you want to use this hypergraph-like structure
and you want to see how it evolves,
and you know that experiments suggest that you have to form such and such
approximated geometry, you have to go back and say,
all right, what loss function would give you that?
And that kind of puts a limit.
So this is one approach, which again, I say,
and don't say because in this approach,
I do say that.
In this alternative approach,
it's like two types of neural network theory,
if you wish, right?
Type one is that, yes,
you discretize it and you work with it.
Type two, your space
is the space of trainable variables
and things evolve in that space.
And there are pros and cons of both approaches.
And in the first approach,
in the first approach, your space is discrete.
There is nothing between the nodes.
In the second approach, your space is continuous.
Trainable variables are continuous; you know, if they were not continuous,
you wouldn't be able to use gradient descent or Adam, whatever.
And so, and you try both things.
And you try, and you see that one approach helps you.
And it's very similar to what we do, particles versus, you know,
lattice field theory, which comes with its own problems;
both come with
problems. But yeah, I don't want to say that I don't say that, but I say that in addition to that,
I also, you know, investigate this other possibility that actually recently proved to work
better in a sense because the curvature emerges not because I've assembled my graph in a certain
way, but because it is an algorithm which is more efficient. So the curvature is a way for the
system to learn faster.
Not for, because of, so it's like a
one direct way of saying where the
curvature, where the geometry comes
from. I do know that
in machine learning, literature people are using
Adam, they're not using the terminology
of the curved
space of trainable variables which
emerge from learning. Again, it's not
something you specify ahead of time.
It emerges as an efficient algorithm.
But this is
what I'm saying here.
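For machine-learning readers, the remark that Adam "comes with a metric" can be illustrated with a toy sketch: the second-moment estimate acts as a diagonal metric that rescales each direction of the space of trainable variables. This is a generic illustration on a made-up quadratic loss, not Vanchurin's construction; all constants are invented for the sketch.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. The second-moment estimate v plays the role of a
    diagonal metric on the space of trainable variables: steps are taken
    with respect to the gradient rescaled by 1/sqrt(v), not the raw gradient."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy quadratic loss with very different curvatures per direction.
H = np.diag([100.0, 1.0])
theta, m, v = np.array([1.0, 1.0]), np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, H @ theta, m, v, t)
final_loss = 0.5 * theta @ H @ theta     # started at 50.5
print(final_loss)
```

The rescaling by the second moment is the sense in which the optimizer carries its own history-dependent geometry, rather than a geometry specified ahead of time.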
So it's not the curvature of the lost landscape
that corresponds to the curved space time of our universe?
Oh, absolutely no.
No.
It's like you were saying,
I think it's a very useful analogy
when you're talking about loss function
is to think about Lagrangian.
So it's not the curvature of the Lagrangian landscape.
Yes, okay.
That gives you the curvature of space.
No.
It is the degrees of freedom
in the Lagrangian, which we call the metric,
which describes a space which is curved.
So, yeah, there is a big, big distinction here.
And same here.
Now, maybe this is a good point to,
since I'm drawing this connection between Lagrangian
and loss function,
originally I was associating the loss function
with more of the energy,
because there was, like,
a stochastic thermodynamic description
where a canonical ensemble naturally
would emerge from that picture.
Later I understood that adding a kinetic-like term
to the Lagrangian, to the loss function,
actually makes learning in certain situations better.
So like, you add one more term,
which isn't maybe the term that you are trying to optimize,
but once you add it to the loss function,
and once the loss function uses this term,
it learns faster.
So there is this little bit of a new twist.
If you want to minimize something,
you may actually use the gradient of,
in a sense, something else,
something with an additional term.
And so in this case,
I think it's a very good analogy
to think about loss function,
as a Lagrangian, although they are different objects, of course.
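One conventional reading of the "kinetic-like term" remark is momentum (heavy-ball) descent, where a velocity variable stores something like kinetic energy between steps and, on badly conditioned landscapes, speeds up learning. A toy comparison, offered as an analogy only; the quadratic landscape and all constants are invented for the sketch, and this is not Vanchurin's actual construction.

```python
import numpy as np

# Ill-conditioned quadratic "loss landscape": L(w) = 0.5 * w @ H @ w.
H = np.diag([100.0, 1.0])
loss = lambda w: 0.5 * w @ H @ w

def descend(use_velocity, steps=300, lr=0.01, beta=0.9):
    w = np.array([1.0, 1.0])
    vel = np.zeros(2)                     # the "kinetic" degree of freedom
    for _ in range(steps):
        grad = H @ w
        vel = (beta * vel - lr * grad) if use_velocity else (-lr * grad)
        w = w + vel
    return loss(w)

plain = descend(False)                    # plain gradient descent
with_kinetic = descend(True)              # same step budget, velocity term added
print(plain, with_kinetic)
```

With the velocity term, the same step budget reaches a much lower loss along the slowly curving direction, which is the sense in which adding a kinetic-like piece can make learning faster.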
Earlier, you said that the quantum dynamics were extremely difficult, non-trivial, or what have you.
So walk us through the insight when you were studying machine learning.
Why were you even studying machine learning?
You mentioned cosmology.
I don't know about its connections there for you in your particular use case.
Anyhow, walk us through you as...
Circa six years ago or so.
Yeah.
Okay, so six years ago, I was on sabbatical leave.
So when you're on sabbatical, you can do whatever you want, right?
And first I finished the project that I was interested in at that time,
and it had to do with certain dualities of quantum mechanical, strongly coupled systems
that I thought would be a good candidate also for describing curved spaces,
and then, you know, quantum gravity aspects of that.
And then I had time.
And so I attended many, many talks by machine learning people
who would present nice slides, nice results,
and no formulas, no formulas, no equations apart from something, you know,
like a stochastic gradient in a sense,
something that kind of looks trivial.
But I knew that neural networks, and they would always say, well, it's like a black box.
Black box, meaning it works.
We don't really know why.
So I had time.
I have a few months, and I said, okay, why not just try to open this black box?
You know, because the universe is also a black box.
Nobody in the beginning told us that this is the standard model.
And somehow we came up with the tools of Lagrangian and Hamiltonian mechanics to actually
understand why it works, maybe not understand why this particular Lagrangian, but understand at least
how to model it. So that was my motivation. It had nothing to do with quantum
mechanics at this point. I see a system with many degrees of freedom. They evolve
according to learning and activation dynamics. So I knew that some of the physics will be relevant,
but not all of it, and because it would be more complex. But if you see a system with many
degrees of freedom, your first reaction, well, maybe you can discuss ensemble, statistical ensembles,
and maybe in some limit, you can understand how the system behaves in what we call an emergent
regime.
So something like, can we have a thermodynamics of machine learning?
So something along this line. I knew it was different because of the learning dynamics,
that I figured out sooner, but can we have a certain thermodynamic description,
which can be verified?
Now, is there a notion of temperature?
Is there a notion of entropy?
Is there the first, second law of thermodynamics,
would they still hold, or do they have to be modified?
So that was kind of, you know, you have a toolbox that you think should be the first thing
you try to model the system with.
And that direction I went.
As I said before, quantum mechanics was not on the horizon.
But then I saw that, because of the learning dynamics,
the system just doesn't go to a boring canonical ensemble distribution and stay there.
It has a very interesting behavior, even in equilibrium,
because of the presence of those two different dynamics, activation and learning.
And so the idea was to set up some kind of variational
principle that maybe describes it beyond thermodynamics.
So first, you know, thermodynamics:
are there any macroscopic objects, like temperature and entropy, that describe it?
But maybe we can go beyond it.
Since this equilibrium, I call it learning equilibrium,
it's kind of boiling and then, you know,
things fall out of the equilibrium and go back and it's kind of not.
Can you really, you know, zoom in and say,
well, I don't want to just calculate temperature, pressure,
volume, whatever you usually do, although you still have to define all those things in a machine learning
system. Can you zoom in and say, I'll pay attention to, let's say, only trainable variables?
I will still integrate out and kind of coarse grain over the non-trainable ones, but we'll pay attention to
the trainable ones. Well, the reason for that was that we know that the non-trainable ones flip very fast.
You know, you put in your input image and get cats or dogs out, in your example, right, zeros or ones.
So this activation goes fast, and the learning goes slowly, and then you calculate your loss function and gradually propagate changes.
So I knew there was two scales, and if anything we know in physics, that's what we should do.
We should integrate out, remove the irrelevant information and keep only the relevant.
And so that's what I did.
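The fast-activation, slow-learning separation he describes can be caricatured by a hypothetical two-variable toy system: a fast variable x relaxes quickly toward the slow variable w, and integrating x out (coarse-graining it away) leaves a simple effective equation for w alone. The dynamics and constants below are invented for the illustration, not Vanchurin's equations.

```python
import math

# x is "fast" (relaxation time tau); w is "slow" (rate eps, with eps * tau << 1).
dt, tau, eps, T = 0.001, 0.01, 0.5, 10.0
w, x = 1.0, 0.0
for _ in range(int(T / dt)):
    x += dt * (w - x) / tau        # fast variable chases the slow one
    w += dt * (-eps * x)           # slow variable driven by the fast one
# Integrating out the fast variable (setting x ≈ w) predicts dw/dt ≈ -eps * w:
effective = math.exp(-eps * T)
print(w, effective)                # the full and coarse-grained results closely agree
```

The point of the sketch is only that when the timescales are well separated, the closed effective dynamics of the slow, trainable variable is an accurate description.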
So I integrated out this and say, okay, how does this system behave?
And it turned out that, for the behavior of these trainable variables,
you have to assume a certain principle for how entropy changes.
So I assumed the maximum entropy production, the extremal or stationary entropy production principle.
But if you assume that, the equations you derive from it are the Madelung equations.
Now, the Madelung equations, again, those who know, know,
but for those who don't know: it's close to quantum, but it isn't quantum.
So there is still this step of, you know, quantum being this theory where
what propagates is not the probability, but the square root of
the probability, and that's the relevant degree of freedom.
And that comes with this quantum phase, complex numbers, right?
So we all know that, you know.
And so if you only pay attention to the Madelung equations as your limit, this isn't quantum
yet, but it gives you hope.
So maybe you can actually understand
why this complex phase would emerge.
What is the physical meaning of the complex phase?
Not like exactly quantum mechanics,
but maybe as an emerging quantum mechanics.
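For reference, the Madelung form being alluded to is textbook material: substituting \( \psi = \sqrt{\rho}\, e^{iS/\hbar} \) into the Schrödinger equation yields two real, fluid-like equations:

```latex
\partial_t \rho + \nabla \cdot \left( \rho \, \frac{\nabla S}{m} \right) = 0,
\qquad
\partial_t S + \frac{(\nabla S)^2}{2m} + V
  - \frac{\hbar^2}{2m} \frac{\nabla^2 \sqrt{\rho}}{\sqrt{\rho}} = 0 .
```

The first is a continuity equation for the probability density \( \rho \), and the second is a Hamilton–Jacobi equation with an extra "quantum potential" term; the phase \( S \) being defined only modulo \( 2\pi\hbar \) is the kind of discreteness discussed next.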
And then I collaborated with Katsnelson, and he correctly
pointed out that we need the discreteness
of the phase for this to work.
We don't have to call it a phase,
but something has to be discrete.
Something you change discretely
and your loss function, let's say, doesn't change much, right?
Or the dynamics doesn't change.
So that's the meaning of the complex phase:
you rotate by 2π and you come to the very same point.
Right.
So having this within the system,
it's like having h-bar,
having something that...
Yes.
It's like, without this, it isn't quite quantum yet,
although in certain regimes, your system...
And so it took this
little extension
to the original derivation of
the Madelung equations, like
almost-classical quantum
equations. And that came
from a very interesting
picture, a
suggestion that we made
that actually you can explain
to anyone.
So you have your system, a learning
system, and yes, you pay attention to
trainable variables, and then, yes, they follow
almost
a Schrödinger equation. But for you to
get a Schrödinger equation, the system has to have access to a bath, to a reservoir of neurons that it can borrow from.
It's like, you know, external resources.
You run your machine learning system, but you say, well, if you need, here is a few more neurons you can use.
You can plug in.
If you don't need it, just give it back.
And so if you have this access, in physics we call it the grand canonical ensemble; you move from the canonical ensemble to the grand canonical.
So if you do have that, in your algorithm, you provide that option.
Whether it is an option that you provided by actually programming it this way,
or whether it's an emergent phenomenon,
because there is an emerging phenomenon that certain neurons stop working and start working,
stop working and start working.
If you do that, then it turns out that you do get a Schrödinger equation.
In some limits, again, it's not an exact Schrödinger equation,
which means that in certain limits
it should be violated
but with that little twist
with this space
of neurons that you can
kind of hire
like you hire to do some work
and then once you don't need it
you put it back
with that your dynamics effectively
becomes linear
because the Schrödinger equation is linear,
and it's described by the Schrödinger equation,
and then everything fell into place,
and at that point it was actually
more than just a proposal
for some abstract theory
that in a certain limit can describe
the Madelung equations,
which aren't quite quantum,
but it is actually,
you could see that,
well, maybe this quantumness can emerge
from a completely classical,
it was a completely classical system.
You understand?
So there's no,
it's not the quantum machine learning.
It's a classical machine learning,
but with this little twist,
it acquires this quantum-like behavior.
I do want to get to consciousness.
Before we get to that, I have a sort of a silly question.
So in machine learning nowadays, there's a huge field of interpretability.
So people want to peer into the black box.
And it makes sense to some degree when it comes to LLMs because there's semantics
underneath of what LLMs are trying to capture.
So you're trying to understand what are the LLMs doing.
But when it comes to the universe, more abstractly here in your model,
is our universe interpretable?
Right.
So let me start with machine learning,
then move on to physics
and then maybe to more general answer the question.
So in machine learning,
the fact that we need to interpret
how a machine learning system works,
what is happening inside, is essential.
And I already said that physicists
had dealt with this problem.
And our tool was,
yeah, to model the dynamics
of this system,
but then model it as a Hamiltonian system or as a Lagrangian system.
So that was our way to dig into the black box.
Now, with LLM-like models or with any other machine learning systems,
you start with designing certain architecture.
So architecture can be written also in a certain way that is interpretable.
In a sense, you will know exactly which blocks do what.
and that would be similar to writing not just something that produces results,
but something that produces results, and you know why it produces results,
because there is a term in the loss function or in the Lagrangian,
and if you remove that term, then it would produce different results.
Maybe your LLM would not work, your ChatGPT would break.
And so in this sense, this is a way to model and understand
how the interiors work.
So maybe this is the term that is responsible for math being right in their large language
models.
This is a term that is responsible for a good translation between certain languages.
This is a term that's responsible.
And then you can kind of concentrate on the term and say, okay, well, my LLM
is not producing, you know, not doing good calculations of tensors,
which it isn't.
So if you look and try to kind of
do tensor calculus,
all the models I tried,
they all, you know,
at some point start producing
garbage results,
and so you have to kind of locate it.
So there are certain tasks
they don't know how to do,
and it may be just the term
that you have to tune
in your loss function,
in the Lagrangian, in the architecture.
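The "locate the responsible term" idea can be shown on a deliberately tiny example: fit a linear model by gradient descent with an extra L2 term in the loss, then ablate that term and watch the learned solution change. Everything here (the data, the model, the penalty) is a hypothetical stand-in, nothing like a real LLM loss.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                   # toy inputs
true_w = np.array([2.0, -1.0, 0.5, 3.0, 0.0])
y = X @ true_w                                 # toy targets

def fit(lam, steps=2000, lr=0.01):
    """Gradient descent on loss = MSE + lam * ||w||^2 / 2."""
    w = np.zeros(5)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + lam * w   # data term + extra term
        w -= lr * grad
    return w

w_full = fit(lam=1.0)    # with the extra term: weights are shrunk
w_ablate = fit(lam=0.0)  # term ablated: noticeably larger weights
print(np.linalg.norm(w_full), np.linalg.norm(w_ablate))
```

Removing the term produces measurably different weights, which is the toy version of attributing a behavior to a specific term in the loss.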
Now, the same thing,
and this is interpretability.
How do we interpret?
Why is it, you know,
certain things work,
and certain things don't work?
What is it inside of the state
of the neural network
that is responsible for one thing?
How to do alignment, right?
What is it there that I can say to my neural network
and then in some sense it will align with my interest,
with my loss function?
Yes.
So we know how to do it with physics.
I think this toolbox in physics can be useful
to enhance our interpretability,
interpret how the LLMs work,
how machine learning systems work.
And coming back to the physics, well, we also should be able to dig deeper than just writing down a symmetry group and saying, okay, well, this is the standard model, this is it.
You know, we can start asking question, why is it that the case?
You know, what is it?
You know, because in this description, the field theories and everything we observe at microscopic levels are emergent phenomena.
They would come from some microscopic loss function where, just to throw one example, you know, maybe each neuron
wants to minimize entropy or optimize a certain local loss.
So if you want to minimize entropy, it's not a good idea to connect to all neurons,
because then it will be chaotic.
Every state, every next time step will be chaotic.
But not connecting to anything is also not good.
So maybe each neuron individually, at a microscopic level, will try to find some balance.
And then there'll be some kind of RG flow where this microscopic loss function would give rise to
some more macroscopic behavior,
and then we would say, okay, well, at this level,
it is described by the standard model.
But the flow doesn't stop there,
and then you go, okay, biology level.
And a lot of work that we've done, again,
we may touch upon this during discussion of consciousness,
but there again, I mean,
it doesn't mean that at a biological level,
it should be the standard model
that governs the correct description.
So if you can identify how the interiors work,
you can kind of
RG flow it and understand how things will behave.
Now, and for the more general audience,
this is just a way of saying,
we are describing the universe around us,
and depending on the scale we discuss,
different languages are appropriate, more or less.
So in a very microscopic level,
the language is maybe neural networks.
On the bigger level,
maybe the right languages are field theories.
Interesting.
Now even bigger,
genotype or phenotype and then even bigger.
And so you should not be surprised, in this approach, that at each level
there is just a different language which is correct for describing it.
At each scale, at each energy, there are different configurations,
and there will be different languages which are more appropriate.
That's kind of what we are saying here.
And it shouldn't stop here.
Once we move on to gravity, cosmological scales, yes, we are trying right
now to use the language of field theories to describe cosmology, inflation, gravity,
but things don't quite work on a cosmological level.
There is this dark energy problem, there is dark matter problem.
And so maybe once we adjust the language and start describing those scales using different
language, different modeling, then you may have more agreement with the experiments
than we have now.
So it would now be a good time to talk about your second law of learning?
Sure, sure, sure.
I mean, I've touched upon this.
So I think it's important to emphasize that just like the second law of thermodynamics,
we really like it.
But we should understand that it doesn't work.
It works all the time until it doesn't.
So any such macroscopic laws should be taken
with a grain of salt.
But nevertheless, it is useful for many, many different calculations.
And so second law of thermodynamics tells that the entropy should grow.
And then if you really take it literally, then you'll have hard time explaining the emergence of life.
You'll have hard time explaining many biological phenomena if you want to be honest.
Now, if you want to...
Wait a moment.
Why would you have a difficult time explaining the origins of life?
Because it's always that global entropy increases,
but you're going to have local decreases.
Right, but okay, but if you're only talking about global entropy,
there is just one equation, there's one number.
It's not very interesting.
What's interesting is what, and okay, it's increases or decreases.
It doesn't matter.
What really happens locally,
because, you know, in the thermodynamics,
it says you take any subsystem big enough,
and there, locally, entropy will grow.
And so instead of just having one number
in the entire universe that you
observe, that grows, what's more interesting is to pay attention to what happens in the different
subsystems, and then explain how the entropy behaves, and how you define the entropy. It's also, like, you know,
because if you talk about gravity, it's very important to think about how you actually define
entropy. This is, now you have a space that is curved. Now, if things, if gravity pulls things
together, wouldn't it mean that
the entropy decreases, right?
So you have, kind of
before they would be distributed everywhere
and they pulled together. Now, naively, you would
say it decreases, but then I say,
well, no, no, no, no, I'll define entropy in a different
way. I'll attack
the problem, so that it
doesn't. But again, if you
only pay attention to one
number that you want so badly
to increase, then
okay, let it be, I'm
saying that the usefulness
of the second law of thermodynamics
is that it can be applied to many subsystems,
very successfully.
But there are certain things
that you will have a hard time explaining with this.
And as a cosmologist, I have to say,
you know, we have one beautiful theory in cosmology
called the theory of cosmic inflation
that is kind of, is hopeless.
It doesn't know what to do
if you try to assign probabilities
for observers to emerge
and if you use the usual classical mechanical
or physical approaches and try to calculate
what's the probability for a certain observer to emerge.
Is it easier for us to go through this, you know,
inflation, then galaxy formation, then biology formation,
or is it just easier for us to form Boltzmann brains
that are floating in the empty space
with the memory of us
thinking that we are
in this meeting right now.
And you will see that
in these otherwise
very successful models of cosmology,
you will see that very often
you get the answer that, yeah,
all this,
what we think is correct,
just gives you a lower probability.
It's higher probability
for you just to nucleate out of nothing.
And that's called the Boltzmann brain problem.
So no, it's, once you, first of all, once you enter gravity territory, cosmology, you have to be careful about what you call entropy.
And then there is this problem of defining probabilities for us observing what we observe.
So, and people employ different ideas, one of which is the anthropic principle.
Well, let's keep that in mind.
So maybe we should not be just at a random point in space, because if
we were, we probably would not be observing what we're observing.
Now, maybe we should only look at the places that are actually tuned for life,
with smart enough observers who can ask those questions: the anthropic principle.
To be clear, the anthropic principle is different from an entropic principle, which is...
Good point. Very good point.
Actually, there's a lot of confusion about that, and partially
because entropy is used in those conclusions,
but you're absolutely right.
Anthropic, meaning with an A,
as distinct from an entropic principle.
And, you know, a
very large fraction of physicists
don't like the principle.
They disregard it as non-scientific.
Now, in cosmology, that was kind of
the only game in town,
until Lee Smolin proposed
his natural selection approach,
which is, I think, closely related to what I'm trying to say.
Interesting.
With the one twist that I'm actually giving a mechanism.
Yes, yes.
A mechanism of how,
instead of a universe being fine-tuned for life,
it is self-tuned for life.
because universe is learning.
So it's consistent of learning subsystems.
And the learning subsystem,
and they all try to learn what around them
in the most stupid way.
Because of that, you should not, you are not tuning anything.
It is self-tuning itself.
It likes to be observed.
And so observers emerge not because there's carefully chosen constants of nature,
but because if they were not carefully chosen,
then they would learn to evolve towards being carefully chosen.
And so this gives you a physical mechanism for
ideas that were there before. And actually,
Lee Smolin told me that this idea came
before him; there were philosophers.
There's always, any idea you described,
there's always some philosopher in the past who said the same.
Okay, so those ideas were there, of course,
but here you can actually say what the mechanism is
for this to take place.
And then in the comment section there'll be, oh, and this philosopher is predated by the Vedic texts.
Yeah, there's always someone before.
Yeah, but I don't think there is competition here.
Just, every time you rediscover something, what you're trying to do is
to make it more rigorous using the tools you currently have.
So, you know, there were no machine learning systems 100 years ago.
There were no neural network dynamics back then.
Yes, exactly.
And so those are the tools.
You know, can you use those tools?
Well, speaking about the tools, there's one little problem that is kind of maybe unique, maybe not.
Many times physicists came to realization that new tools are needed.
Einstein is a great example.
You know, who would have thought that curved spaces
or Riemannian geometry would be important for modeling the universe?
Nobody, but Einstein came around and said,
no, this is the mathematics you need.
You need differential geometry to describe.
So, but at least at that time, there was already a body of work
where you can just take this framework.
With neural networks the situation is a little bit different:
we have so many experiments,
so many experiments.
So every time you're amazed by what neural network does, it's an experiment.
And not so much of an actual theory, of being able to actually predict ahead of time:
yes, this architecture will work,
and no, that will not work, for such and such reasons.
So the theory is a bit behind here, which is like a perfect playground for theorists,
because I can set up an experiment, write down my theory,
and then test it experimentally and numerically right away.
So that's great.
Okay, but that's a diversion from.
I want to get to where you said that the universe likes to be observed.
We're going to get to that and we're going to get to consciousness.
But before we do, I recall you saying that natural selection operates on the level of subatomic particles.
Am I shaky in my recollection?
Not from this conversation, but somewhere else.
No, no, absolutely. It may not have been in this conversation that I said that, but I certainly wrote about it in the papers.
And yeah, so this is natural-selection-like, where by natural selection I mean the more useful configurations of networks survive, because they help the loss function to be minimized better.
And the other configurations will not survive.
So in this sense it's natural selection: those architectures that are useful for learning will stay.
And those that are not useful for learning will be removed, because their loss function is not as low as it should be.
So in this sense, yeah.
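The survival-of-useful-configurations picture can be caricatured as a toy evolutionary loop: candidate configurations (here just weight vectors) with lower loss survive and spawn mutated copies, while the rest are removed. The loss and all numbers below are invented for illustration and are not Vanchurin's model.

```python
import random

random.seed(0)

def loss(w):                                   # arbitrary target configuration
    return sum((wi - 0.7) ** 2 for wi in w)

# Population of random 4-parameter configurations.
pop = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(20)]
for generation in range(50):
    pop.sort(key=loss)
    survivors = pop[:10]                       # useful configurations stay
    children = [[wi + random.gauss(0, 0.05) for wi in w] for w in survivors]
    pop = survivors + children                 # the less fit half is replaced
best = min(loss(w) for w in pop)
print(best)
```

After a few dozen generations the surviving configurations sit near the loss minimum, which is the toy sense in which "selection" and "learning" coincide.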
But it doesn't work just on the level of particles, the fundamental particles.
Remember, again, about this analogy of the right language on different scales.
So if you want to talk about this at the level of particles,
Yes, you would say particles are such, you know, the way they are, because they
underwent a series of natural selection at their scales and figured out that this is the state,
the state of the neural networks that describe them, that allows their loss
function to be the smallest. But this argument, this natural selection
argument, or, as I call it more, you know, a learning argument, right?
So you're trying to minimize something.
It can be applied to any scales.
It can be applied to scales of biology, which we usually do.
When we say natural selection, we usually think about the scales of organisms, right?
You know, organisms are again some kind of configurations.
They are maybe more fluid configurations, because no two organisms are alike.
But so maybe particles, right?
So, yeah, they look very similar.
Like all electrons look very similar, but we don't know,
maybe there is some tiny difference.
And the way we are trying to understand it, the tiny difference,
or maybe it has already gone through this very long period of natural selection,
and that is the value.
That's what the electron mass should be.
And there's nothing else I can do at this point.
I imagine if you could put a mark on an electron and a boson
and distinguish them,
then the spin statistics theorem wouldn't apply anymore,
and we would see some effects of that.
Right.
So the loss function would be such that it's just not convenient not to have fermions.
And again, we kind of understand that.
I think one good example I can give, that maybe a very general audience will understand
and machine learning people will definitely understand, is cars, cars moving in traffic,
and self-driving cars.
So we assume we're 10 years from now where all the cars are, you know, all the cars are
self-driving, maybe sooner, maybe later, I don't know.
And then all of them are driving, and they're all trying to optimize their loss function.
But to optimize their loss function, they have to do some calculations about the environment.
They have to, you know, scan the environment, you know, find something, plug it into maybe
their network, and then network will say turn left, turn right.
And at this time, the information that each of the cars collects,
we actually call, in the physics of electrodynamics,
a bosonic field.
So it's the electromagnetic field, or the Green's function
of the other electrons that I scan around, that propagates to me
and gives me relevant information
for what to do as an electron.
Or cars scan around for other cars,
get that information and say,
okay, that's the relevant information
for me to make a left or right turn or accelerate.
So in this sense, cars are like fermions.
They are advanced enough to be able to process,
not the state of the entire universe around them,
because they are tiny,
but the relevant information for them to optimize their life.
And if they would be doing something else,
they would not behave as an electron,
and that would create some kind of unstable behavior,
and this whole system would not work as it should.
So just like all self-driving cars converge to the same software, just because that's useful,
electrons are all kind of using the same software for how to navigate in the electromagnetic field.
So maybe this analogy helps to think about electrons as, you know, self-driving particles that have already established what is best.
Maybe they haven't established it exactly, and so there's still internal degrees of freedom.
You know, there's spin and there are other things.
So, you know, in one circumstance, I will be doing this and other that.
And the same for the cars, right?
So you will have different cars driving in the UK and the US, right?
Because of left-side versus right-side driving.
So the states of the self-driving software in the cars
still have to be different;
they have to agree with their environment.
But other than that, there would be a lot of similarities in describing those cars, and the same here.
So, if it's useful, this is the correct analogy.
There's a structural similarity between the cosmos if you zoom out far enough and neurons,
and some people use this to suggest there's a cosmic brain.
Now, I want to talk about what you're not saying.
Are you not saying that, or are you saying that?
No, absolutely not.
I'm sorry if I interrupted you, but I'm not saying that.
Okay, it seems to me like this comports with your theory.
So it would seem like you'd be like, oh, that's great evidence for my theories.
Maybe, maybe not.
Yeah.
So both of the things that you said are true.
So first of all, I'm not saying that because I haven't confirmed it.
No, no, visually I've confirmed they look similar.
Right.
So we've all done that.
There are papers of people who actually try to do statistical analysis,
which is the right thing to do, and who statistically showed
that there are certain things that are similar.
There is a well-known critical-like phenomenon,
where you have some kind of scale invariance
that is observed in the cosmic web
and observed in biological networks.
Now, I haven't mapped out exactly the dynamics
of galaxy formation
and how all this would come around.
I've only done calculations suggesting
that self-organized criticality
or critical state is something
that you should expect to see in the learning system.
And it's a good thing for a learning system
to have criticality.
So there is this calculation that tells you,
yes, criticality is good.
And then you can take that and say,
okay, once it's good,
doesn't this confirm the observed criticality
in the brain activity, or the observed criticality
that we see in the cosmic web?
Yes.
But this is indirect evidence.
So maybe this is actually...
we are talking about
this cosmological-like scale,
where the system is performing, very slowly, maybe some kind of very complex calculations.
Or maybe we just see it as slow,
but it's actually doing some important learning task.
So, because of that, I do not say that, usually.
I show those pictures when I give public talks,
but I do not say that I have done enough rigorous calculations
of structure formation to claim this.
I know how to do those calculations, but there are only 24 hours in a day.
Yes.
You have a rare quality where you will assert something and then say,
and here's why it's either not fully the case or I don't believe it,
or here's the limitation of my own model, here's the counter evidence.
I haven't seen that in almost any of the people that I interview.
Okay, so I also hated that when I was a student,
because even when you're a student
and you come to class
and they tell you something
that they've been taught,
and they take it
without actually trying to question it,
I think this is a horrible quality.
We physicists actually can do better.
I think, I don't remember who exactly said that,
but we should be doubting everything.
So we should be doubting our own models,
our own calculations,
calculations that other people have done.
Even if a hundred people come to you and say
that general relativity is wrong,
it doesn't mean it's wrong.
And we all know that story, when a hundred
physicists wrote a letter against Einstein,
signing a statement saying that general relativity is wrong,
and his reply was brilliant:
you don't need a hundred;
if you have a point,
show me the point and I'll
consider it. So yeah,
I think this is the quality that you absolutely must have:
you have to show all the good and bad things,
because I've thought about this.
I mean, of course, I've tried to answer it.
I don't have all the answers.
So I think this is more honest and correct way of doing this.
And we should be doing it not just with new theories.
There are a lot of problems with existing theories.
Classical theories are not as pure as people think.
There are divergences and things we don't fully understand.
And we should be telling people and students about them
when we discuss all this.
So don't put anything under the rock.
That's kind of my approach.
Because sooner or later, the smarter people
will find what's under the rock.
And that's certainly very, very important.
With that out of the way, and thank you for that, by the way.
Let's get to consciousness.
Is the universe itself, in your model,
with the universe being the neural net,
conscious?
Okay.
Very good question, and of course I get this question all the time.
Now, here is my maybe a bit longer answer, but I think I need to say that.
You come with a new mathematical framework, which is very rich,
which relies on neural networks and the learning dynamics,
and you're trying to use that to describe some phenomenon.
These phenomena may be physical phenomena, like we talked about,
or phenomena that people are discussing
in other branches of science, like consciousness.
So they already have a term for something,
and you're bringing a toolbox.
In this toolbox of learning things that I have,
there is nothing that I would call consciousness.
But I'm trying to use it to describe
what people mean by consciousness,
and I can have many attempts.
So I may suggest something,
and they will say, well, this looks like
not a good definition of consciousness,
because here's a system
that we all agreed,
a hundred of us agreed, is conscious,
but your definition tells us it's not.
Then okay, then either I say,
well, maybe you should adjust your notion
of what consciousness is,
or maybe I should adjust my definition of consciousness.
And both ways are fun.
Now, my definition of consciousness comes
from how I understand it,
how I can build it within a framework
that I understand, a mathematical framework.
And in this mathematical framework,
the system undergoes learning dynamics,
and there are three macroscopic things
that are directly related to learning
that I can calculate.
So one of the things is how fast the system adapts
to the new data set, to the new environment,
how fast it learns.
This I can actually calculate: it is the decay rate of the loss function.
That sounds to me more like intelligence than consciousness.
Right.
And well, okay, but then I say, hold on with the intelligence, because I have a comment about that as well.
And then I'll say: as a hypothesis, I want to call this rate of decay, how fast it learns, consciousness.
Maybe it will be wrong, but I will call it that, because I come with a new framework, and in this framework I get to name things.
But I say right away there are two more things that are also macroscopic, and some people may relate them to consciousness, but I would relate them to intelligence.
And I would say actually three things contribute to intelligence.
If you judge how well a learning system behaves, there are three quantities you have to calculate.
One is how fast it learns, and I call it consciousness.
Maybe you would call it intelligence.
The second thing is how low the loss function goes.
Asymptotically.
If I had infinite time, how low would it go?
Yeah, it may learn fast, but then just halt, stop.
So I would say, okay, that's another ingredient that is important:
how low the loss function goes.
And the third one is, once it reaches this asymptotic loss,
it's not going to stay there.
It's going to be fluctuating, because that's what learning dynamics does.
You just don't stop.
You never stop.
You're always in this learning equilibrium,
and sometimes the loss goes a little bit up, down, up, down,
so you always fluctuate.
So how big are the fluctuations?
So then I have a learning system,
and I can calculate three things.
How fast you learn, how well you learn if you had infinite time,
and how stable is what you learned.
And so I would say that, because of
those three things, I actually describe what I think is intelligence,
not just one IQ number.
Three things.
And then you can have, you know, different people.
Some people learn very fast, but then they stop.
They are not learning more and more;
their loss function has halted.
Other people may take a very long time to learn,
and then eventually they end up knowing all of differential geometry and, you know,
quantum field theory, what not.
And then there's the third type of people, who maybe also learn fast,
and maybe they know all of the advanced mathematics,
but they're very unstable.
Unless they keep repeating it,
they keep forgetting and then opening the book again.
We all forget stuff, right?
You learn something and then forget.
I forget the things I wrote in my papers, right?
There are many papers;
I have to look them up.
So there is some degree to which my knowledge,
my loss function, actually fluctuates.
So I would put those three things together and say,
that's intelligence. At least three things, maybe more,
because actually there are more: it's a stochastic variable,
so there are statistical moments, first, second, third.
So I'm simplifying things.
You can describe these fluctuations with the infinite number of parameters.
But at least those three things are very important.
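Concretely, the three quantities just described can be read off from an ordinary training curve. Here is a minimal sketch on a synthetic loss trajectory; the curve shape, window sizes, and variable names are my own illustration, not Vanchurin's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training curve: exponential decay toward an asymptote plus
# small equilibrium fluctuations (a stand-in for a real training run).
t = np.arange(2000)
loss = 0.1 + 0.9 * np.exp(-t / 150) + 0.01 * rng.standard_normal(t.size)

# (2) Asymptotic loss: average over the late, near-equilibrium window.
tail = loss[-400:]
asymptotic_loss = tail.mean()

# (3) Fluctuation size: how much the loss jitters once equilibrated.
fluctuation = tail.std()

# (1) Decay rate ("consciousness" in Vanchurin's hypothetical naming):
# slope of log(excess loss) over the early transient.
early = slice(0, 300)
excess = loss[early] - asymptotic_loss
mask = excess > 0                # guard against noise dipping below the floor
slope, _ = np.polyfit(t[early][mask], np.log(excess[mask]), 1)
decay_rate = -slope
```

On this synthetic run the three estimates recover roughly the planted values: a decay time constant around 150 steps, a loss floor near 0.1, and fluctuations at the 0.01 noise scale.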
And I think when you look at different systems, different people, you can say students, right? You can say, well, yeah, this one has a very good learning efficiency.
I would say he's more conscious.
And this one has a very bad learning efficiency,
less conscious. But again, this is just a definition. If somebody tells me that, even within my
framework, it has to be, you know, the square root of two times the first number
plus the square root of two, I'm okay with that. As long as I declare what I mean,
then I'm happy.
I understand the first two. But the last one, about stability:
why does that have anything to do with your intelligence, when it could be that the universe is changing?
That doesn't seem to have anything to do with you.
Yeah, so under an assumption:
if the universe doesn't change at all...
well, it's always changing, okay, so it's always changing.
This is you processing the data set:
in your data set,
you always get different images of cats and dogs.
You look around, and every day
there is a new shape of trees, and the leaves
arrange themselves differently, so you always have that.
But I say, let's integrate that out,
so it will be just some
statistical state of the universe.
So no major events.
There is no, you know, major,
nothing hits the earth
and creates a sequence of earthquakes
and there's no nuclear wars.
Nothing major.
So I'm more or less in this learning equilibrium.
So if that's the case,
if nothing major changes,
maybe a good example would be,
you know,
you take a bacterium as an observer and place it in some kind of controlled environment,
where you keep maybe the temperature the same, the same amount of light, and so on.
So if you work with this ensemble, then the loss function will still change.
It will still change because of, you know, stochastic gradient descent.
Now, sometimes, for our bacterium, the light appears on the left,
and it moves to the left;
maybe it appears on the right,
and it moves in the opposite direction, to the right.
So there are some changes,
and it always updates its
trainable variables.
Why would it do that?
Because if the statistics change, it would be able to adapt.
So your ability to adapt actually, you know, backfires on you
and creates more fluctuations.
So there is a trade-off.
And people actually know about that:
if you set the learning rate
to be smaller in some algorithm,
then it will go to a very stable minimum,
but it will not be as good a minimum.
So these fluctuations should not be treated as a bug.
They're actually a feature, to get out of the local equilibrium.
And that happens all the time.
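The trade-off being described, that a small learning rate gives a stable but possibly worse minimum while fluctuations help escape local equilibria, can be demonstrated on a toy double-well loss. This is an illustrative sketch with made-up numbers, not anything from the papers:

```python
import numpy as np

# Toy double-well loss: a shallow local minimum near x = -1 and a deeper
# global minimum near x = +1 (a stand-in for a rugged loss landscape).
def f(x):
    return (x**2 - 1)**2 - 0.3 * x

def grad(x):
    return 4 * x * (x**2 - 1) - 0.3

def descend(noise, steps=20000, lr=0.01, seed=0):
    """Gradient descent started inside the shallow well; `noise` is the std
    of the stochastic-gradient noise, annealed to zero over the run."""
    rng = np.random.default_rng(seed)
    x = -1.0                               # start trapped in the shallow well
    for i in range(steps):
        sigma = noise * (1 - i / steps)    # slowly cool the fluctuations
        x -= lr * (grad(x) + sigma * rng.standard_normal())
    return x

x_quiet = descend(noise=0.0)               # stable, but stuck near x = -1
x_noisy = min((descend(noise=8.0, seed=s) for s in range(5)), key=f)
# The fluctuations act as a feature, not a bug: they carry the walker over
# the barrier, so the noisy runs can settle into the deeper minimum.
```

With no noise, the walker converges to the shallow well it started in; with annealed gradient noise, it typically crosses the barrier and settles near the deeper minimum, trading short-term stability for a lower final loss.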
Am I correct in saying that you said at some point that we need to unify not two,
not just quantum theory and general relativity,
but quantum theory, general relativity, and observers?
Okay, so most physicists tend to think of observers as coming from the physics,
something emergent.
Why do you think that we have to unify these three at the same time?
Right.
And most physicists will tell you, biology will somehow, you know, emerge
once I have string theory completely done
and quantum gravity quantized.
I call it wishful thinking.
There is no evidence for that,
other than, we kind of put things, one way of saying it is, under the rug;
we just say, if something is complex, well, yeah, yeah, yeah, but if I do long enough calculations,
if I have long enough time, that's how it's going to work. One example:
most physicists are convinced that quantum mechanics plays no role in
consciousness, in how we function, how our brain works, right? So, yeah, it makes sense: these are macroscopic objects,
why would quantum mechanics matter?
But we have no, you know, proof of that.
We have none, and I think it is more wishful thinking,
again, related to how the second law of thermodynamics
is a wishful thinking that it has to really be working.
So I don't think so.
But the other answer to that is that the fact that observers are very special
and should somehow be understood
is, I think, realized by most
physicists who pay attention to two important problems in physics.
One important problem is the measurement problem.
So every single physicist who has actually seriously thought about
the foundations of quantum mechanics,
not the person who is just doing shut-up-and-calculate type of things,
just following the manual,
but who is trying to understand the problem, will necessarily realize that
there is a measurement problem.
The measurement problem is about the third postulate of quantum mechanics, which is really new, because in classical physics all we need is the state and how it evolves.
Here you need the state, how it evolves, and how it is observed.
So that's one.
And then you kind of have to say that maybe there is something additional in quantum mechanics that you have to describe.
Maybe it is an observer.
Observer may play a special role.
And if that's the case, if you realize that quantum mechanics is incomplete and the measurement seems to be playing a special role, then you are stuck with trying to describe it.
Now, another problem in physics that comes around and also has to do with observers is in cosmology. It's called the measure problem.
Essentially, it's a different problem, but more or less the same complication coming from observers.
So if you're trying to assign probabilities to different observations in cosmology,
what should be the right probability measure?
We discussed the Boltzmann brain problem.
It's a part of it.
So you have to specify the rules.
You have to describe how to deal with observers.
And in both cases, if you actually think about this,
the complexity comes from the fact that we are trying to put observer into the system.
So when the observer is outside of the system, we all know what to do.
We know, you know, there is a Hamiltonian that describes it.
But once you put the observer into the system,
so this is, you know, for a more general audience:
people know about the Schrödinger's cat problem,
and then there is the Wigner's friend problem,
and once you start putting observers inside, things start to break.
And is it important that the observer is conscious?
Right.
So at this point, no.
There's something fishy about observers,
but we don't have a model of observers.
Some people would claim, yes, consciousness is important,
and they have their own definition.
I say it's important to model the observer.
So if you want to put it inside the system,
then you really want to model how it behaves,
and model it, not just say, well,
maybe some kind of emergent phenomenon of biology will happen,
and maybe some kind of wave function collapse will happen.
This isn't going to work if you're trying to do the calculation.
So my answer is: yes, observers are very important,
and that's why you really have to describe them
if you want to do calculations,
even separately, in quantum mechanics and gravity.
And more so, maybe,
because those two problems persist, in cosmology,
where there is gravity,
and in quantum mechanics,
where there is the measurement problem,
maybe the solution is actually to
try to understand how observers work.
And then once you understand that,
both theories may somehow be unified,
and the observer would be modeled as well,
because it seems to be the problem
with both theories.
Which again, you know, you can certainly put it under the rock;
you can continue to ignore it.
Maybe the elephant in the room is a better analogy:
everybody knows it's there,
but we are trying to look the other way.
And I'm, as you said,
I see this elephant, I say, well, it's there.
At what point do you imagine
observers entered into physics?
Is it at the Planck epoch?
Is it prior?
So in this model, everything is conscious.
There are observers everywhere.
Every subsystem is an observer.
Some of them are efficient observers.
They have efficient architecture,
so the loss function falls down.
Some of them are stable observers.
They've already reached the very low value
of the loss function.
And some of them are not stable,
and they always fluctuate.
So, any subsystem:
because the building blocks in this model are neurons,
and they come with trainable and non-trainable variables,
everything is learning.
So everything, in this sense, is an observer;
just some observers are capable of
asking perhaps more complex questions than others.
So maybe, although we don't know, right?
Maybe inside an electron there is a whole complex neural network
that has already solved the problem
of quantum gravity, and it's just looking at us and laughing, saying,
well, guys, I mean, it's simple.
Yes.
Maybe, maybe.
We don't know about that.
But in this model, as a model, we started with this.
I'm not saying this is how it works.
But as a model, that could very well be.
As far as I understand, there's a number that you can associate with consciousness:
how conscious is this subsystem?
But consciousness to us is far more than just a number.
We care about how conscious someone is, most of the time, when it comes
to health: are they alive, should we pull the plug, and are they going to wake up? But
our consciousness, we're conscious of so much. So in your model, do you have any qualia?
Right. So I was at one of the FQXi conferences, and there was a heated discussion.
Every time consciousness is discussed, and if there are physicists and non-physicists in the room, it's always a heated discussion.
So, should we call conscious the person who is actually conscious in the sense of, you know, talking and interacting with you, another conscious observer?
Is it like a discrete test? I talk to you, then I call you conscious. And you reply.
I talk to a dog and it replies; it's conscious. So yeah, absolutely, you can then say consciousness will be
defined as a coupling,
the strength of the coupling of the organism
with the sound wave or light or some electromagnetic phenomenon.
You could do that.
You can do that.
And that would be your definition of consciousness.
Maybe it's better than mine, right?
And then we would not be arguing to say,
okay, look, this is a person, he's not conscious.
But then there will be people who say, yes,
I have such and such friend
or relative who is in a coma,
but he is conscious.
And he would say, no,
the fact that the person is in a coma
and isn't interacting with you
the way you want to interact,
it doesn't mean that he's not interacting
with you in some other way.
And actually, I have to mention this speculative idea,
because, you know,
maybe philosophers will love it,
or maybe not.
Sure.
But we discussed that, for quantum mechanics,
you need this bath of hidden neurons.
Otherwise, it just doesn't behave like...
So it could be that this bath of neurons is always there,
but it's not in our physical space.
It's in what I call the hidden space.
And if it's there, then nothing stops a person
who's not interacting with you in physical space
from interacting through the hidden space.
Maybe that's what you do in your dreams,
or maybe, you know, people are interacting
through this hidden space all the time,
and people do claim that.
So there are a lot of people who claim that,
these special abilities, right, to interact.
And I think the reason we physicists don't take it seriously
is for two reasons: we don't have a good enough framework for modeling this,
and the second reason, we don't have controlled enough experiments.
But I think we should not be disregarding it once we become equipped with a better
mathematical model and with better experiments.
So, yeah, I wouldn't like this definition
where the person is conscious
only if he can hear you and reply back.
But, you know, ChatGPT would be conscious
according to that definition,
because it certainly replies when you write.
So, again, there's a lot of discussion,
maybe not very important, about definitions,
but we need to do this.
We need to define terms before we can make statements,
and this is just my attempt to do that.
What distinguishes trainable from hidden variables?
Like, do physical entities correspond to one,
or mental entities to the other, or what?
Right.
So the hidden variables in this case are like hidden neurons,
and their states are described by non-trainable variables.
But all of the non-trainable variables in your network
are connected by trainable variables, called weights.
So you cannot just draw
a sharp line and say here is the
trainable and here is the non-trainable. Very much
like in physics, you cannot say
here is the electromagnetic wave and here
are the electrons. They're coupled. They're all
communicating with
each other. Now, the
difference between the hidden
non-trainable variables
and the physical non-trainable
variables is that the physical ones
have organized themselves
into these three-dimensional structures,
and they've kind of discovered
the
effectiveness of using three-dimensional space
for exchanging information and minimizing their loss function.
The hidden ones, at this point,
can have arbitrary connections to each other.
Maybe it's not a bad idea to think about it this way:
initially you have a soup of neurons,
everything is connected to everything,
and it's all hidden, in the sense that no physical space
has yet emerged.
And then there is, like, a bubble, from a big bang,
and then a certain number of those neurons
figured out that they can learn a lot more,
like a phase transition,
and can minimize their microscopic loss function
if they arrange themselves in the three-dimensional space.
So once that phase transition took place,
you still have the hidden variables,
which you can always hire if you need to do calculations.
But they need not be present in your physical space;
they need not interact with the classical degrees of freedom.
They still interact by providing you with this quantumness,
but they need not be directly observable.
So that's the model for them.
And it is correct to call them hidden variables,
because hidden variables is one of those
so-called interpretations of quantum mechanics.
But it's an attempt to actually make quantum mechanics
less mysterious.
It comes with its own problems,
and we can certainly talk about them,
but at least it tries to put less stuff under the rock.
We physicists keep doing that,
like, we keep lying, without saying that we are lying, in a good sense.
We're not doing it intentionally.
We're simplifying things, right?
But one of those things is that
we can say something like,
I believe in the many-worlds interpretation.
But once you corner the person, he will admit that it's not as clear as it's portrayed.
So, yeah.
If the universe is learning, what is it learning toward?
Right.
So if the universe is learning, and there is nothing but the universe, then that's it.
That's all there is.
It's unsupervised learning.
And so the only thing any subsystem
can learn is the rest of the universe.
So you put an arbitrary boundary:
me and the rest.
So I'm a subsystem.
The only thing I can learn...
well, I can try to learn about myself;
if I'm unconscious,
like, you know, in a coma,
maybe I can do that.
But learning the rest is what I will be interested in doing.
And that will actually help me also
to survive.
The more I learn...
now we are moving to the biology level,
where we've done a lot of work
trying to understand how this exactly works.
But basically, you know, an organism has to learn its environment,
model its environment,
in order to better predict how the environment will behave.
And then once it's able to predict it,
it's more likely to survive.
So this is what you have to do in order to survive.
This is actually, first of all, similar,
of course, to natural selection,
but it's also similar to the phenomenological model
that Karl Friston is constructing,
where he's trying to say,
okay, well, let's define
some phenomenological function,
so that maybe our ability
to predict the state of the environment
is what is being optimized.
And what I'm adding to this story
is that, yes, that's great,
but let's actually dig deeper
and give a microscopic
interpretation of that.
It's like, you know,
you can have thermodynamics,
but then you can have a derivation of thermodynamics from statistical mechanics.
So I'm saying, well, you can also derive it through the statistical mechanics of how neural networks work.
So that's the idea.
So then, coming back to your question: every subsystem is learning the rest, right?
And we are not different.
Our cells are not different:
each cell is learning
how to fit best into the organism,
so as to optimize its own loss function.
And society isn't different;
it's still learning, but on different scales,
and so the language in which we describe it
changes. You know, addressing the physicist audience:
There is an RG flow.
The loss function changes
as you start
renormalizing
and generalizing the concept of neurons.
So on a small scale, it can be fundamental neurons.
On a bigger scale, it can be sub-networks, like particles.
Then you can have something like cells, people,
civilizations, and societies, like that.
If I recall correctly, your second law of learning is that learning efficiency
is proportional to the Laplacian of the free energy.
Is that correct?
So, in very simple limits. It's just like in standard physics: we can derive thermodynamics only in very simple limits.
Unfortunately, we physicists are only good at doing Gaussian integrals and doing calculations for very simplified systems.
And for those systems, where you can simplify, you can do those calculations.
You can show that it's related to the Laplacian of the free energy, where the free energy is actually defined
microscopically: you start with an energy-like function, which is the loss function,
and then from that, you define the free energy. You do not start from the
phenomenology; you start from the microscopic side. But that was the answer in this case, in that particular limit.
In more complex systems, we're not dealing with Gaussians,
and in the critical systems that we discussed, we are not dealing with Gaussians.
There, many, many scales are important,
and in those limits things are much more complex, and you cannot really give this
formula and say it's exact.
I see.
So yeah, in this limit, the Laplacian of the free energy was important. More
generally, it may, you know, have lots of possible corrections, or, as you know,
in perturbation theory it may happen that they are not just corrections, but they are
dominating everything.
And then, yes:
your zeroth order is wrong.
It's a non-perturbative limit, and then the answer is completely wrong.
And there are reasons
to believe that the system is, in this sense, non-perturbative, because of the criticality that
we observe.
So there are symmetry-breaking transitions taking place.
There are lots of complex things that, of course, I won't be able to talk about here.
But, yeah, analytical calculations are hard; and I guess without them, we are not going to
understand what's actually happening and what's the relevant language for describing different phenomena.
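For readers who want this exchange in symbols: the microscopic construction being described, loss as an energy-like function from which a free energy is defined, can be sketched as below. This is my paraphrase of the conversation, not the exact notation of the papers:

```latex
% Treat the loss over trainable variables q as an energy-like function H(q),
% and define the partition function and free energy at a learning
% temperature T = 1/\beta, in analogy with statistical mechanics:
\[
  Z(\beta) \;=\; \int \! dq \; e^{-\beta H(q)},
  \qquad
  F(\beta) \;=\; -\,\beta^{-1} \log Z(\beta).
\]
% In the simple (Gaussian) limit discussed above, the learning efficiency
% is stated to be proportional to the Laplacian of the free energy,
\[
  \text{learning efficiency} \;\propto\; \nabla^{2} F,
\]
% with possibly large, even dominant non-perturbative corrections away
% from the Gaussian limit, as the conversation goes on to note.
```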
Does Karl Friston's free energy, is it an independent claim from yours, or does it emerge from your framework?
No, I completely agree with him.
So there has to be a free energy in this setup.
It's like a phenomenological way of saying that there is a function that you will be, you know, minimizing, optimizing.
And that's right.
I mean, you stated it at the beginning:
how come classical mechanics also optimizes?
But that only deals with the regime
when you're very close to the equilibrium in the sense of the learning dynamics.
And the same here: he has a phenomenological model
that is very intuitive and very nice,
and it says that such a function must exist.
It's a kind of existence statement.
Now, he doesn't start with a learning theory,
but he starts with his understanding of how, you know,
organisms behave. It makes sense.
Makes total sense.
Now, I'll give you an example of how
his free energy may be corrected
to be much better. For example, you can have an organism
that isn't interested in predicting the environment
but is interested in quantizing gravity.
So that organism will spend all its resources,
maybe locked in a room with no windows,
trying to quantize gravity and writing equations.
This organism will have its own
free energy. It will not be the one that tries to predict the environment,
or maybe only a little bit: I want to make sure that I survive.
So on the higher levels, the free energy can be different for different organisms.
The question is whether you can always derive them from the microscopic dynamics,
whether you can do this RG flow starting from some microscopic loss function
and actually derive them.
And that's an open question.
The only thing I can really add to the free energy principle that Karl Friston is advocating
is that we can model it using these trainable and non-trainable variables, and think
about what you get once the non-trainable variables are integrated out and you pay attention
to a handful of trainable ones.
Then the system becomes something you can calculate and model.
And you can model it phenomenologically.
So if you have a controlled-enough experiment, you don't care what microscopic physics gives rise to this free energy.
You can just calculate it by seeing how the system responds to a changing environment.
For example, you say, I'm interested in how the system responds to sound, or to light, or to temperature,
and then you model it as a function of those parameters.
And that may hint at how such a system would emerge from some microscopic description.
So I don't think there's a contradiction with what Karl Friston says.
I'm saying that we can dig deeper if we assume that this learning dynamics is happening on all scales.
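The "integrate out the non-trainable variables" step can be illustrated in a toy model. Everything here is my own construction for illustration, not taken from Vanchurin's papers: a quadratic loss over one trainable parameter `w` coupled to one non-trainable Gaussian variable `x`. Integrating `x` out of the partition function leaves an effective free energy `F(w)` over the trainable variable alone, and for a quadratic loss the numerical integral matches the closed form:

```python
import numpy as np

def loss(w, x):
    # Hypothetical microscopic loss: trainable w coupled to non-trainable x.
    return 0.5 * (x - w) ** 2 + 0.5 * w ** 2

def free_energy_numeric(w, T=1.0):
    """Integrate out the non-trainable x numerically: F(w) = -T * ln Z(w)."""
    dx = 1e-4
    xs = np.arange(-20.0, 20.0, dx)
    Z = np.sum(np.exp(-loss(w, xs) / T)) * dx  # Riemann sum over x
    return -T * np.log(Z)

def free_energy_exact(w, T=1.0):
    """Gaussian integral in closed form: Z(w) = sqrt(2*pi*T) * exp(-w^2 / (2T))."""
    return 0.5 * w ** 2 - 0.5 * T * np.log(2.0 * np.pi * T)

# The effective free energy over the trainable variable agrees with the
# microscopic calculation for each value of w.
for w in (0.0, 1.0, 2.5):
    print(w, free_energy_numeric(w), free_energy_exact(w))
```

Gradient descent on `w` would then be descending `F(w)` rather than the full microscopic loss, which is one sense in which the effective dynamics of the trainable variables "minimizes a free energy."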
TheScore Bet app: here with trusted stats and real-time sports news.
Yeah, hey, who should I take in the Boston game?
Well, statistically speaking.
Nah, no more statistically speaking.
I want hot takes.
I want knee-jerk reactions.
That's not really what I do.
Is that because you don't have any knees?
The score bet.
Trusted sports content, seamless sports betting.
Download today.
19 plus Ontario only.
If you have questions or concerns about your gambling
or the gambling of someone close to you,
please go to connexontario.ca.
What's a piece of advice that you found inspirational
that you keep coming back to?
Advice that somebody gave me?
It could be also that you read in your book.
It could be from an advisor.
It could be from a movie.
Something you found that's helped you.
Yeah.
Well, one piece of advice, and it was very, very useful to me,
and actually maybe not something I would advise students to do,
but it worked for me, is to doubt everything.
So do not trust
anything that is relevant for your work until you try to do
as much of the calculation yourself, as much of the understanding yourself.
Now, why is this bad advice? Because it may not be optimal for a student who is
trying to get a professor position, a tenure position, or whatever.
If you are going to be doubting everything and doing all calculations yourself,
you may just not publish; publish or perish, right?
So that's bad advice, but
it is something I couldn't avoid; I could not not do it.
So once I figured out that there is no problem that I cannot solve myself,
I said, okay, I'll be doing that.
And of course, I haven't done all the calculations.
The arXiv is full of calculations others have done, but I do as much as I can.
So doubting is, I think, one piece of advice.
And then there is a more concrete piece of advice
that was given to me by my advisor, Alex Vilenkin.
So I would come in every day with some new idea,
and he gave me advice that somebody had given to him;
I don't know how far back it goes.
The advice was: you come up with some idea,
some theory with some equations,
and then the next day
you should try to criticize it as much as you can.
So you flip:
you try to act as if you are an opponent
of that idea.
And alternating between those two modes on different days
is very helpful.
You look really objectively at what you have done and say, no, no, no, I don't like how you've done it.
You play devil's advocate, right? You try to disprove it and find all of the problems with it.
And that's why, whenever I'm talking to you, I say, look, I know why it may or may not work.
I'm not trying to sell you a used car without telling you that something is broken, because that wouldn't be fair.
I wouldn't feel right.
And I would be confident about the calculations that I had done.
So, yeah, constantly flipping: why is this wrong?
One day, come up with ideas, do calculations.
The next day, try to criticize as much as you can.
And maybe one last statement here.
ChatGPT is horrible at doing that, as are the other language models.
They tend to agree with everything you say.
So my advice is not to use ChatGPT on your work without verifying it yourself.
Using it for correcting your work is fine.
Using it for suggesting ideas is fine.
It's an excellent tool.
We just don't know how to use it yet.
We are students of LLMs.
Once we learn how to use it... but trust, but verify, as we say in Russian.
You should always verify it,
to the point that you redo the same calculations many times,
because honestly, how many times do we make mistakes when we do calculations?
I make mistakes all the time.
We all make mistakes.
So you should keep questioning.
It's related to doubt, but doubt even your own ideas.
I guess that's my advice.
It was given to me.
Professor, thank you so much for spending two hours with me.
It was two hours.
Yes.
It went by like that.
It went by so quick.
But space and time don't exist anyhow; at least not fundamentally.
Right, right.
Well, Kurt, that was fun.
That was a lot of fun.
I appreciate you having me on your podcast.
It was very nice talking.
Very nice questions, by the way.
I have so many more.
Let me just give full disclosure.
In addition to providing information, this was an experiment for me,
because from the time I conjectured that
the world is a neural network,
every time I talk to a person
I conduct an experiment: how does this person
react to what I say?
And so you've been a great
opponent, a great person to talk to,
to actually... A great guinea pig.
Yeah, to actually...
So I've been experimenting with you,
whether you know it or not.
And at some point, when I'm constructing
not just a theory of biology, which we've
done, but a theory of psychology, I might use some of this discussion as experimental
evidence of certain psychological phenomena. All right, I'll take that as a compliment. It is. It is. No,
no, it was really great. I mean, it was exceptional. I'm very happy with the questions. It was very good.
Thank you. I'll be honest in telling you that I've given interviews to podcasts where the people
were not equipped at all with any physics lingo,
and they just pushed their own worldviews
without trying to understand what I was trying to say.
And that was really torture for me.
Because the point of a podcast,
I think, is to try to do both:
to try to understand what I'm trying to say, and then to try to point me in the right direction.
So that's what I appreciate. And I've had good experiences with people who have actually
done their homework, and it was obvious. You have a physics degree.
That helps a lot, because certain things I may say between the lines, and you would
pull me back and say, okay, clarify that. So that was very useful.
Thank you for that. It was a great experience.
All right.
Okay, take care, sir.
And I'm sure we'll talk again.
The audience is going to love you.
I guarantee you.
Hi there.
Kurt here.
If you'd like more content from theories of everything
and the very best listening experience,
then be sure to check out my substack at curtjaimungal.org.
Some of the top perks are that every week you get brand new episodes ahead of time.
You also get bonus written content exclusively for our members.
That's C-U-R-T-J-A-I-M-U-N-G-A-L.org.
You can also just search my name and the word substack on Google.
Since I started that substack, it somehow already became number two in the science category.
Now, substack for those who are unfamiliar is like a newsletter, one that's beautifully formatted, there's zero spam.
This is the best place to follow the content of this channel that isn't anywhere else.
It's not on YouTube. It's not on Patreon. It's exclusive to the substack. It's free.
There are ways for you to support me on substack if you want, and you'll get special bonuses if you do.
Several people ask me like, hey, Kurt, you've spoken to so many people in the fields of theoretical physics,
of philosophy, of consciousness. What are your thoughts, man?
Well, while I remain impartial in interviews, this substack is a way to peer into my present
deliberations on these topics. And it's the perfect way to support me directly.
Curtjaimungal.org, or search Curt Jaimungal substack on Google. Oh, and I've received several messages,
emails, and comments from professors and researchers saying that they recommend theories of
everything to their students. That's fantastic. If you're a professor or a lecturer or what have you,
and there's a particular standout episode
that students can benefit from
or your friends, please do share.
And of course, a huge thank you
to our advertising sponsor, The Economist.
Visit Economist.com slash Toe,
to get a massive discount on their annual subscription.
I subscribe to The Economist,
and you'll love it as well.
Toe is actually the only podcast
that they currently partner with.
So it's a huge honor for me,
and for you, you're getting
an exclusive discount. That's
Economist.com
slash toe, T-O-E. And
finally, you should know this podcast
is on iTunes, it's on Spotify,
it's on all the audio platforms.
All you have to do is type in theories
of everything and you'll find it.
I know my last name is complicated,
so maybe you don't want to type in Jaimungal,
but you can type in theories of everything
and you'll find it.
Personally, I gain from re-watching lectures
and podcasts. I also read in the comments that TOE listeners
also gain from replaying.
So how about instead you relisten on one of those platforms like iTunes, Spotify, Google
podcasts?
Whatever podcast catcher you use, I'm there with you.
Thank you for listening.
