Theories of Everything with Curt Jaimungal - Vitaly Vanchurin: This Cosmologist Discovered Something Strange...

Episode Date: February 9, 2026

What if physics is just the universe learning? Most Theories of Everything episodes are mind-bending for their math, physics, philosophy, or consciousness implications. This one hits all four simultaneously. Professor Vitaly Vanchurin joins me to argue the cosmos isn't just modeled by neural networks; it literally is one. Learning dynamics aren't a metaphor for physics; they are the physics. Vanchurin shows why we need a three-way unification: quantum mechanics, general relativity, and observers. As a listener of TOE you can get a special 20% off discount to The Economist and all it has to offer! Visit https://www.economist.com/toe

TIMESTAMPS:
- 00:00:00 - The Neural Network Universe
- 00:05:48 - Learning Dynamics as Physics
- 00:11:52 - Optimization and Variational Principles
- 00:21:17 - Deriving Fundamental Field Equations
- 00:28:47 - Fermions and Particle Emergence
- 00:37:17 - Geometry of Learning Algorithms
- 00:44:53 - Emergent Quantum Mechanics
- 00:50:01 - Renormalization and Interpretability
- 00:57:00 - Second Law of Learning
- 01:05:10 - Subatomic Natural Selection
- 01:15:40 - Consciousness and Learning Efficiency
- 01:24:09 - Unifying Physics and Observers
- 01:31:01 - Qualia and Hidden Variables
- 01:40:24 - Free Energy Principle Integration
- 01:46:04 - Epistemological Doubt and Advice

LINKS MENTIONED:
- Vitaly's Papers: https://inspirebeta.net/literature?sort=mostrecent&size=25&page=1&q=find%20author%20vanchurin
- Vitaly's Lecture: https://youtu.be/TagDLiLb2VQ
- Vitaly's Website: https://cosmos.phy.tufts.edu/~vitaly/
- Towards A Theory Of Machine Learning [Paper]: https://arxiv.org/pdf/2004.09280
- Autonomous Particles [Paper]: https://arxiv.org/pdf/2301.10077
- Emergent Field Theories From Neural Networks [Paper]: https://arxiv.org/pdf/2411.08138
- Covariant Gradient Descent [Paper]: https://arxiv.org/pdf/2504.05279
- A Quantum-Classical Duality And Emergent Spacetime [Paper]: https://arxiv.org/abs/1903.06083
- Emergent Quantumness In Neural Networks [Paper]: https://arxiv.org/abs/2012.05082
- Predictability Crisis In Inflationary Cosmology And Its Resolution [Paper]: https://arxiv.org/abs/gr-qc/9905097
- Stationary Measure In The Multiverse [Paper]: https://arxiv.org/abs/0812.0005
- The World As A Neural Network [Paper]: https://arxiv.org/pdf/2008.01540
- Self-Organized Criticality In Neural Networks [Paper]: https://arxiv.org/pdf/2107.03402v1
- One Hundred Authors Against Einstein [Book]: https://amazon.com/dp/B09PHH7KC8?tag=toe08-20
- Geocentric Cosmology: A New Look At The Measure Problem [Paper]: https://arxiv.org/abs/1006.4148
- Jacob Barandes [TOE]: https://youtu.be/gEK4-XtMwro
- Yang-Hui He [TOE]: https://youtu.be/spIquD_mBFk
- Eva Miranda [TOE]: https://youtu.be/6XyMepn-AZo
- Felix Finster [TOE]: https://youtu.be/fXzO_KAqrh0
- Stephen Wolfram [TOE]: https://youtu.be/FkYer0xP37E
- Stephen Wolfram 2 [TOE]: https://youtu.be/0YRlQQw0d-4
- Avshalom Elitzur [TOE]: https://youtu.be/pWRAaimQT1E
- Ted Jacobson [TOE]: https://youtu.be/3mhctWlXyV8
- Geoffrey Hinton [TOE]: https://youtu.be/b_DUft-BdIE
- Wayne Myrvold [TOE]: https://youtu.be/HIoviZe14pY
- Cumrun Vafa [TOE]: https://youtu.be/kUHOoMX4Bqw
- Claudia De Rham [TOE]: https://youtu.be/Ve_Mpd6dGv8
- Lee Smolin [TOE]: https://youtu.be/uOKOodQXjhc
- Consciousness Iceberg [TOE]: https://youtu.be/65yjqIDghEk
- Matthew Segall [TOE]: https://youtu.be/DeTm4fSXpbM
- Andres Emilsson [TOE]: https://youtu.be/BBP8WZpYp0Y
- Will Hahn [TOE]: https://youtu.be/3fkg0uTA3qU
- David Wallace [TOE]: https://youtu.be/4MjNuJK5RzM
- Karl Friston [TOE]: https://youtu.be/uk4NZorRjCo

Transcript
Starting point is 00:00:00 Most of my best ideas don't happen during interviews. They come spontaneously, maybe in the shower or while I'm walking. And until Plaud, I kept losing them, because by the time I write it down, half of it's gone. I've tried voice capture before, like Google Home, and it just cuts me off in the middle of a thought. And I don't know about you, but my ideas don't come in these 10-second sound bites. They're ponderous, they wind, they're often five minutes long. And with Apple Notes and Google Keep, the transcription is quite horrible, and you even have to take multiple steps to get to it. Plaud lets me talk for as long as I want to. There's no interruption. It's accurate

Starting point is 00:00:36 capture, and it organizes everything into clear summaries, key takeaways, and action items. I can even come back later and say, okay, what was that thread I was talking about, about consciousness and information? My personal workflow is that I have their auto flow feature enabled, and it sends me an email whenever I take a note. I have the NotePin for the shower, and then I carry this one around the apartment, and I love them both very much, especially this one. The fact that I can just press it and it turns on instantly and starts recording without a delay is an extremely underrated feature. And its battery: I haven't had to charge this since I received it. Over one and a half million people use Plaud around the world.

Starting point is 00:01:15 If your work depends on conversations or the ideas that come after them, it's worth checking out. That's p-l-a-u-d dot a-i slash T-O-E. Use code T-O-E for 10% off at checkout. The universe is self-tuning itself. It likes to be observed. And so observers emerge, not because there are carefully chosen constants of nature, but because if they were not carefully chosen, then they would learn to evolve towards being carefully chosen.
Starting point is 00:01:53 Five years ago, an unintuitive and startling result was dropped like a bombshell. Professor Vitaly Vanchurin, a cosmologist, found a way to model the universe as a neural network, where the learning dynamics are the physics. This has huge implications for what the cosmos is, what you are, and potentially what consciousness is and its relationship to everything. As you'll see in this conversation, this is not another way of saying that you can use neural networks to simulate general relativity or the standard model. That's been done. Instead, the professor shows that the universe's own learning is the physics. What happens is gravity falls out. The Dirac equation falls out. Klein-Gordon falls out. The algorithm behind most modern AI, the Adam optimizer, implicitly carries a curved metric on parameter space. The presence of the curved space is essential,

Starting point is 00:02:47 essential for convergence. Spacetime curvature is actually there precisely because it makes the universe's learning efficient. This conversation spans natural selection at the subatomic scale, the Boltzmann brain paradox, Karl Friston's free energy principle, consciousness as learning efficiency, and the ramshackle state of observer physics, which Vanchurin argues demands a three-way unification of quantum mechanics, general relativity, and observers. My name's Curt Jaimungal, and on this channel I interview researchers about their theories of reality with rigor and technical depth, even at the risk of limiting the audience, because the slow, meticulous, candid

Starting point is 00:03:28 approach is superior to a fast, flashy, potentially misleading approach. The universe is a black box, but today Vanchurin opens it. We'll definitely have a part two, so leave your questions in the comments section. There's plenty more to explore. Enjoy today's episode of Theories of Everything. Professor, you claim the universe is literally a neural net, so not that it's a useful model. It literally is, ontologically.
Starting point is 00:03:58 Justify yourself, young man. Okay, not so young anymore, but I'll try to do my best. Now, when you're saying that I claim the universe is a neural network and not just a model: if I did say that at some point, I want to take it back, okay? As a theoretical physicist, or as a physicist, I am not allowed to say what the universe actually is. What I am allowed to say is what is a good way to model it,

Starting point is 00:04:29 because at the end of the day I cannot really know or check or test, prove or disprove, whether this is how the universe works. But I can test and check whether any given mathematical model is good for modeling some phenomenon. So if I did say that at some point, or somebody misinterpreted it:

Starting point is 00:05:00 no, I'm always talking about what's a good model for describing it. And at that point, yeah, I have to say it looks like a promising candidate. There's no final verdict yet. But it's a promising candidate that should be explored, whether it is a good way, a convenient way, a compact way of describing phenomena in the universe using neural networks. Perhaps how exactly I want to do it we can discuss later, but I just want to open all my cards and say, I would never claim that this is how the universe works.

Starting point is 00:05:36 Now, if we get to philosophical questions, of course we can say, what if, right? What does it mean if this is really how the universe is? What kind of philosophical conclusions can we reach from that? But as a physicist, with my physicist's head on, I can only say this is an interesting, good model, and it works remarkably well in places where I wouldn't expect it to work well.
Starting point is 00:06:06 Okay, now people who know some things about neural nets know that they're universal function approximators. So why would it be surprising that neural nets can satisfy functions, given that the universe is described by functions? It would be more surprising if you had the counterclaim: the universe cannot be modeled by a neural net. I think this is an excellent question, okay? And now I can actually put my finger on exactly what I mean, right? It is true, right? Why should we be surprised? Neural networks are universal approximators.

Starting point is 00:06:39 Why should we be surprised that you can use well-trained neural networks to, you know, reproduce the dynamics that we observe in classical or quantum systems? With quantum systems, I'll have to take that back; it's not so obvious. But at least for classical systems, yeah, I would say, why should we be surprised? The difference in what I'm proposing is that I'm not only saying that the trained network is good at describing a given function or a given dynamics, but that the process

Starting point is 00:07:13 of training, the process of learning, is a part of the dynamics. So it is different, right? So now it's not enough for me to just show you: here it is, here's a trained network, and it describes a harmonic oscillator well. No. Because to train it, I've used trainable variables.

Starting point is 00:07:36 I used some kind of learning algorithm. And that dynamics didn't disappear. It must be there. It must be part of me telling you why a harmonic oscillator can be described by a neural network. So if we remove learning, it's an almost trivial statement. If we say, no, no, no, let's talk about the entire dynamical system of a neural network that comes with learning dynamics, that comes with activation dynamics, can that combined thing be a useful model for describing the universe?
Starting point is 00:08:03 Okay, and what learning algorithm are you using? Right. So at the very beginning, when I started, you know, studying this subject, I just took the most popular one, stochastic gradient descent, and said, let's just see. It looks like the one that uses little resources and still does amazing things. For me, it was interesting that even a very simple algorithm can produce behaviors that, you know, we physicists don't have the tools to describe, okay, just because it's a learning dynamics. So that was like five years ago.
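(Aside for readers who want this on the page: a minimal sketch of plain stochastic gradient descent, the algorithm Vanchurin says he started from. The toy quadratic loss, the noisy data, and the learning rate are illustrative assumptions, not anything from the conversation.)

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=4)                     # trainable variables (the weights)
target = np.array([1.0, -2.0, 0.5, 3.0])       # what the toy loss pulls theta toward
lr = 0.1                                       # learning rate

def grad(theta, batch):
    # gradient of the toy quadratic loss 0.5 * ||theta - batch||^2
    return theta - batch

for step in range(100):
    batch = target + 0.1 * rng.normal(size=4)  # noisy sample: the "stochastic" part
    theta -= lr * grad(theta, batch)           # the gradient descent update
```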
Starting point is 00:08:50 And already some very spectacular results came out of this with collaborators. We showed that, you know, quantum behavior can emerge; we can discuss all that later. Now I've learned more. I knew about this, but now I actually understand more that it isn't just stochastic gradient descent that is interesting and important for modeling the phenomena. There are other well-known algorithms that anybody working in machine learning knows about

Starting point is 00:09:22 and uses daily. The Adam optimizer is one example. And for many, many problems it works much better. It has much better learning efficiency: you know, you train your model and the loss function goes down much faster, right? So more recently I wanted to understand why that is the case. What's the physical reason? What is the physical reason why it works better?

Starting point is 00:09:49 But now it isn't just stochastic gradient descent; it's a whole class of learning algorithms. We call it covariant gradient descent. Covariant comes from the physics definition of covariance, which we can again discuss. And those algorithms, Adam-like algorithms and their generalizations, give something I, again, hadn't expected.
Starting point is 00:10:12 It's like always: you study something in machine learning, you don't expect it, and you get it. And so, in particular, the emergence of curved space and spacetime comes naturally if you are actually thinking about covariant gradient descent algorithms

Starting point is 00:10:29 such as Adam. And so there is the presence of a metric there. Whether people in the machine learning community know that there is a curved metric or don't know about it, the presence of the curved space is essential, essential for convergence,

Starting point is 00:10:50 essential for the algorithm to be efficient. And so, yes, originally it was stochastic gradient descent, but new things like this come up every month that I learn about, yeah. So the audience of this podcast includes researchers in computer science, but also researchers in physics and math, and that's more on the hardcore STEM side. But then there's a large swath of artists
Starting point is 00:11:17 and miscellaneous laypeople. It's quite interesting, because there's one lump here that's quite hardcore, and then another lump on the, let's call it, softcore side. And it's interesting that the overlap is small. It goes from extremely nitty-gritty PhD level or much more, to much, much more layman,

Starting point is 00:11:37 wondering about ontology and philosophy and so forth. I forgot to mention there are also researchers in philosophy. Okay. To those who are not computer scientists, a neural net is what? What is the minimum someone needs to know? Right. So I mentioned it earlier,

Starting point is 00:11:58 but let me just discuss it one more time. Neural networks come with this one feature that even I as a physicist, as a researcher in physics, wouldn't know. And that's the learning dynamics. And that's the dynamics that takes some function that machine learning researchers call the loss function, some cost function you can call it, some function that you're trying to optimize. What is it that you're trying to do? You don't have to be a scientist or a researcher to understand that there is some kind of optimization. There is something that you optimize.

Starting point is 00:12:37 And so the neural network dynamics comes with that something, that goal, you can say. What's the goal of the system? What is it trying to do? It's trying to speak the English language without mistakes, or it's trying to do speech-to-text recognition. Whatever it is trying to do, this is the difference. The presence of that objective function, this loss function, is essential, and, well, once again, it isn't something we're used to in physics. And it is

Starting point is 00:13:15 something that machine learning people are used to in machine learning research, but not us. So I had to try to take all of the nice experimental results that one obtains from machine learning, use the toolbox that we use in physics, and try to understand it. But yes, we are trying to tell this story to people who are not running models every day or writing equations every day.

Starting point is 00:13:45 So that's the difference. You have a system that has this one boring dynamics that we knew about before, the activation dynamics: there's some state, and it keeps changing
Starting point is 00:14:06 according to some law. And then there is this learning dynamics: there is some objective function that the system is trying to optimize. So the presence of those two things is essential. Now, for a neural net, okay, well, first, the optimization: physicists do know about it if they're doing any minimization of a Lagrangian, or finding an extremal point of a Lagrangian. So is there something particular about the technique of optimization from neural nets,

Starting point is 00:14:29 compared to other optimizations, that's well suited for describing the fundamental laws in such generality as you've been able to find? Good point, yeah. So we do use the variational principle, right? We study the extrema of Lagrangians, right? Of the action,

Starting point is 00:14:45 actually, right? So when we are interested in finding certain solutions, we take this beast called the action, vary it with respect to the degrees of freedom, and we are interested in its minima or maxima. Now, what is new here is that you are not only interested in the minimum or maximum; you are interested in the entire, you know, trajectory, from whatever you started with
Starting point is 00:15:08 to whatever complicated state you're going to get. And that complicated state will certainly satisfy some variational principle in some sense. And that's where you will, because of that, see the emergence of some kind of classical-like behavior. Even out of equilibrium, right, there is this whole evolution that takes your learning optimization to reach the minimum or maximum. That is present in optimizing machine learning systems and isn't present in physics. Now, I have to correct a little bit, because, you know, right now physicists adopt machine learning to solve lots of problems as a tool, okay? Not as a model of physics, but as a tool, right?

Starting point is 00:15:58 Yes. So let's say you have a very complex quantum many-body system. You're trying to find its ground state. You're doing some difficult problem. And so, of course, you're going to use all the tools there are, all the computational tools there are, including machine learning. What I'm saying is that physicists aren't used to optimization as a model of the system that they're trying to describe. But as a tool, absolutely.

Starting point is 00:16:23 This is a great tool, and it's being used by physicists more and more now. What is the input into this neural net? Right. Okay, so if we are talking about the neural network as a model of the entire universe, let's say, then that's all there is. This is the state.
Starting point is 00:16:48 You know, you describe the state of all neurons, you describe the state of all connection weights, and that's the state of the system. That's your input. This is your initial state. So in physics we actually have a very nice setup for modeling everything. We say, well, you need two things: you need to know the state and how it evolves, right?

Starting point is 00:17:13 Now, quantum mechanics, again, putting that aside, those are the only two things you need. So the state, or the input, of this neural network is the state of all neurons, right? And the state of all of the trainable and non-trainable variables. And then they evolve according to, on one side, the activation dynamics, and on the other side, the learning dynamics.

Starting point is 00:17:34 So this setup that physicists came up with, well, mathematicians actually have a much more general setup, the dynamical systems setup, right? There they don't even bother with whether the dynamics is Hamiltonian or, you know, satisfies some kind of constraints, whether there's an energy-like function. They don't care about that. So what I'm talking about is a dynamical system, but it isn't a dynamical system in the sense where you restrict yourself to classical Hamiltonian-like dynamics.
Starting point is 00:18:08 In traditional physics, the input may be the state at time zero, and then the output may be the state at time T. In neural nets, let me just talk about an image classifier. So, an image: let's say you're given an image of a dog, and it's just a square image, maybe 20 by 20 pixels, so it's 400 pixels. Then you have 400 numbers as your input, I mean, if it's grayscale. And then at the end, you want to know: is it a dog, is it a cat, is it a flower, or what have you? So however many categories you have, that is your output. So what is the input on this side? Is it the whole state of the whole universe? And then the output is the whole state of the whole universe again? Very good. No. No, but very good question. Again, this is,

Starting point is 00:18:50 so now, we'll keep switching heads, now you put your machine learning head on and say, okay, I understand what you're talking about. And I'm saying no. So in this sense, the entire network with the input and the output, before you even started propagating your image through and figuring out whether it's a cat or a dog, this whole thing, the state of all of the degrees of freedom,

Starting point is 00:19:16 is the state of the system, not just the input. The whole thing. Now, in the case of the cats and dogs classifier, it happens that in your problem there is a clear distinction between what you are calling input and what you are calling output. So there is a kind of flow of information in one direction. But this is just because you set up your network that way. You didn't have to.
Starting point is 00:19:44 You could have used recurrent networks. You could have used much more complicated loss functions. For example, in this case, your loss function would be, well, did I get it right? Is it a dog, you know, zero or one at the end? But you're talking about a restricted class of machine learning problems. In this case, the information really flows one way. Now, if we have the entire network, an entire universe, described by a neural network, it may happen that at some place there is, like, only a left-going wave or a right-going wave,

Starting point is 00:20:18 where information only goes in one direction. But that's because of the initial conditions that you set up, not because your network cannot start from some other state. So imagine, in your example with the cats and dogs, imagine that the zero or one that you got at the end, you loop it back to the input, right? And then it may not do something that you wanted it to do, but it will run.

Starting point is 00:20:45 You will get this, you know, pixel changing, one of the pixels in what you call the input, and then going through. So in this case, still, the whole thing is the input at the previous time step. So it's probably better to call it the state of the system at the previous time step. And then once one step of activation takes place, that's the next time step, and then another step of activation is the third time step, and so forth. And that's the time evolution of the activation dynamics. And then there is learning, right?
Starting point is 00:21:17 So then you have to update your weights, which is, again, learning. And you can just keep going, keep activating and learning.
Starting point is 00:21:33 Feel the fun with all the latest slots in live casino games and with no wagering requirements. What you win is yours to keep groovy. Hey, I won! Beal the fun. Play Ojo. when passenger Fisher is done celebrating. 19 plus Ontario only.
Starting point is 00:21:49 Please play responsibly. Concerned by your gambling or that if someone close you, call 18665331-2-60 or visitcomex-Ontario.ca. Some of the key equations in physics are general relativity. So Einstein's field equations or Dirac or Klein Gordon. I know that you're not able to, with your words, say how you derive them exactly in such a way that is rigorous. But we can, of course, point to your papers and lectures on screen right now.
Starting point is 00:22:14 But either way, can you just walk us through as much as you can with your words as to what you started as your input and how were you able to get these as outputs? Sure. So let's start with the field theories. So we'll know very well that the standard model, standard model of high energy particles, high energy physics is very well described by the collection of fields. And so if you want to get that physics out of your framework, mathematical framework, you want to show how fields will emerge, how we would get fields out of it. Now, it's a difficult task. So let me just put it right away.
Starting point is 00:22:58 And it's not something where I can say, well, here it is: I get, you know, quarks, three generations, I get everything, and it's simple, and I can write one paper and go home. No, it's not even close. It took years to get the Dirac equation out of it. Okay?

Starting point is 00:23:18 So Klein-Gordon was easier. Hamiltonian mechanics was easier. Getting fermions, getting the Dirac equation, turned out to be a difficult task. And since we talked about this direction of information flow: it turns out that for the Dirac field, there's some tensor factor, something in your neural network setup, that has to have an

Starting point is 00:23:45 anti-symmetry in it, so it has to be anti-symmetric. And so you put that in, you put in this constraint. Now, why would you put in this constraint? I don't know. Is this constraint something that was learned because of some kind of microscopic optimization algorithm that's running? Great. Can I show it? No.
Starting point is 00:24:04 What I can show is that if I assume a certain constraint, if I only take into account certain trainable degrees of freedom, that's essential, so we cannot throw the trainable ones away, and certain non-trainable ones, then the dynamics resembles, you know, lattice field theory, where individual nodes would be like neurons, and they would have some very precisely defined

Starting point is 00:24:29 connections to each other. It's not like, you know, any connections would do the trick. So, as I said, getting the Klein-Gordon scalar field equations was easier; it's kind of more generic. Getting something like Dirac is hard. And we're not there. I'm not ready to write down the standard model Lagrangian and say, well, here it is.

Starting point is 00:24:48 So that's for the field theory. Now, the other part is the Einstein equation. Once again, telling you that I have finalized my understanding of how the Einstein equation emerges from this framework would be a lie. This is not true. But what I do know is how to get emergent curved space from it. I also know how to get emergent spacetime from it.
Starting point is 00:25:17 That's, again, I mean... And that's a subtle difference that most people wouldn't pick up on. Okay, so expand on that, please. Sure, sure. So, you know, space you can probably understand by showing, like, the surface of an apple, right? Or a chip, a potato chip. And you'll say, well, it looks like a two-dimensional surface.

Starting point is 00:25:35 And, you know, since we are three-dimensional beings, it's easy for us to look and say, well, yeah, it's curved. It's not something flat. I cannot lay it flat on my table, which is flat. And the same for the potato chip. A potato chip has negative curvature; an apple would have positive curvature.

Starting point is 00:25:54 If I put it on the table, it wouldn't be lying flat. So that's kind of our understanding, as three-dimensional creatures, of what curvature looks like. Now, this concept can be generalized to 3D. I cannot actually, you know, draw it or show it with my hands, because I am in three dimensions, but we know the tricks. We know the tricks for how to do these calculations, how to imagine it.
Starting point is 00:26:17 We even know how to draw three-dimensional objects on two-dimensional pieces of paper. So it's not so surprising that we are able to carry out calculations in 3D. And so when I say 3D curvature, I mean three-dimensional space which is curved. And that turns out to be not, you know, some special feature of this theory. It should be a feature of any theory

Starting point is 00:26:40 of everything. So if your theory doesn't produce, in some limit, the emergence of curved space, then you are against Einstein, and of course this is one of the most beautiful theories that we have, and we cannot

Starting point is 00:26:55 just throw it out of our considerations. Okay, so that's three-dimensional space. Now, spacetime, that again involves a little bit more. If you want to
Starting point is 00:27:12 understand it correctly, you have to write equations, but since the audience is a bimodal distribution, we should try to explain what spacetime means even in that sense. So what it turns out to be: for spacetime,

Starting point is 00:27:27 or space, when you're talking about that, you have to say how you measure distances. What do you mean by the distance between two points? If you have that definition of how to measure distances, which has to satisfy certain requirements, then you know

Starting point is 00:27:43 what kind of space you are dealing with. And the apparatus for that, we call it the metric tensor; the name is not very important. And spacetime comes with one that is very, very strange at first. You tell

Starting point is 00:27:59 any student that this is how distances should be measured, and they will ask why; it looks bizarre. To measure distances in Euclidean space, or just in space, you take x squared plus y squared and take the square root of this, the Pythagorean theorem. Well, it turns out that if you are working in spacetime, this isn't true. You should not be adding the two squares and taking a square root.

Starting point is 00:28:25 You should be subtracting. One of those squares, the one which corresponds to the time coordinate, has to be subtracted. And because of this stupid sign difference, there is a huge difference between space and spacetime. And so it took, you know, some time to get curved space, but if you cannot get spacetime, then, again, your theory is not in agreement with observation. And we do observe a curved spacetime.
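(Aside: the sign difference being described, written out in standard notation; the (-,+,+,+) signature is one common convention.)

```latex
% Euclidean distance (Pythagorean theorem):
ds^2 = dx^2 + dy^2 + dz^2
% Minkowski spacetime interval: the time term enters with the opposite sign
ds^2 = -c^2\,dt^2 + dx^2 + dy^2 + dz^2
```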
Starting point is 00:28:55 You know, my background is in cosmology, and spacetime is important there. I hope I wasn't too technical because... No, no, no, no. And I have a technical question. You're absolutely right. Actually, I love that you said that, because this is always true: there are those who actually know the terminology and would appreciate me speaking more in physicist or machine learning terms, and those who don't, and you don't want to bore either one of those.

Starting point is 00:29:25 So the aim of this podcast is toward researchers, toward postdocs, graduate-level PhDs, professors, and so forth. And the advantage of this podcast, the niche part of it, the difference in it, is that for that other distribution of people, it's as if they finally get to peer into what it looks like when professors are talking. I'm not a professor, but you get the idea. So, okay, you mentioned lattice field theory, and lattice field theory has a problem with fermion doubling. So I'm curious if anything about your approach helps solve that
Starting point is 00:30:04 problem. No, and we're not there yet. Not even close, actually, to fitting the lattice-like field theory. I shouldn't call it lattice field theory, because it isn't, but it is in the sense of how the weight matrix is arranged. So you have a lattice. Now, this is actually why I am not happy with this particular model of how fields emerge. Okay. There is now another approach, which I wasn't able to take as far as getting fermions out of it, but the approach is closer to particles as opposed to fields. Now, we do know that fields work better, and particles are only a good description in certain limits, right? But speaking of that, in the second approach, neurons or some sub-networks behave like particles in an emergent space, and that emergent space is the space of the trainable variables. And I already mentioned Adam-like algorithms, and so, you know, machine learning people would say, okay,

Starting point is 00:31:16 now I know that Adam comes with a metric and there is curvature. But from the point of view of the physicist, it's more like: there is a second approach, again. As I said, you know, the theory is not final. And so you take all the approaches you can, and you're just trying to say, okay, well, what can I say? Can I get fermions? And in this second approach, it is as if you have some kind of sub-network
Starting point is 00:31:40 of neurons that are doing their usual business, activation and learning dynamics, but their motion is considered in the space of trainable variables. And that space does not have any lattice structure. It is just a completely continuous

Starting point is 00:31:56 space. And then you do have places in that space where no states are occupied, like a vacuum. And once in a while certain neurons appear to have such and such configurations of the trainable variables. This is not a field. It's kind of discretized, more similar to particles, as I said, but also to strings, right?

Starting point is 00:32:21 Strings are assumed not to actually be fields in the sense of occupying space; they're, you know, one-dimensional objects. So I think it's probably a good idea to put it like this: fields, which work extremely well, are three-dimensional objects plus time. Strings are one-dimensional objects plus time. And these neurons, in this second picture, are like one-dimensional objects plus time. Oh, zero-dimensional. Sorry. You're right.
Starting point is 00:32:51 Okay. Okay. I think it would be super useful for people if, on screen right now, the video editor places what a neural net looks like, rather than us giving tutorials on what neural nets are. And then I think what would be useful is for you to say what your theory is not saying. So for instance, in the beginning I said, why at all is this surprising, if a neural net can approximate any function?

Starting point is 00:33:16 And you're like, well, but that's not it. We're not saying that. Okay. Something else you're not saying, and again, referencing this image, is that each one of these nodes is somehow a discretized piece of space. Exactly. Because there are other causal set models and causal dynamical triangulation models

Starting point is 00:33:31 and other discretized forms. Okay, you're not saying that. Neither are you saying that this is a hypergraph model like a Wolfram model. Okay, so when you start to talk about this with your colleagues, what else do they think you're saying, where you're like, no, no, no, that's not what I'm saying, it's this?
Starting point is 00:33:46 Yeah. So the first thing you identified right away, and you're absolutely right: that's what people think, and I'm saying no, learning must be there. You're absolutely right. Now, the second thing is, I am in a superposition

Starting point is 00:33:59 of saying and not saying it. So I'm saying there are two possibilities, and both of them are being explored. One possibility is that it is like a lattice space, whether it is a square lattice or some other lattice, where it has triangulations,

Starting point is 00:34:18 some hypergraph-like model. And that is a possibility that I'm exploring. It comes naturally, because neural networks are natural for this: you can easily get a graph out of them. Now, the distinction from
Starting point is 00:34:38 other models where you have this network or graph-like structure is that I am constrained in how this network can evolve. I'm not able to just say, look, you know, you have a graph, now I want it to form a torus, or I want it to be flat. I'm not able to just impose rules without saying where they come from. And where they come from is, for me, to specify actually the one most important object

Starting point is 00:35:07 in this entire theory. Like, you know, in physics we have one object that kind of describes the entire theory, the action or the Lagrangian. You give the Hamiltonian and you're done. So here, you have to specify the loss function. And the loss function is a very strict object. I cannot write just anything, you know; it's a scalar, right?

Starting point is 00:35:29 So you have to pay attention to that. And so if you want to use this hypergraph-like structure and you want to see how it evolves, and you know that experiments suggest you have to form such and such approximate geometry, you have to go back and say, all right, what loss function would give you that? And that kind of puts a limit on it. So this is one approach, which, again, I both say
Starting point is 00:35:54 and don't say, because in this approach I do say that. Then there's the alternative approach. It's like two types of neural network theory, if you wish, right? Type one is that, yes, you discretize it and you work with it. Type two, your space

Starting point is 00:36:12 is the space of trainable variables, and things evolve in that space. And there are pros and cons to both approaches. In the first approach, your space is discrete; there is nothing between the nodes. In the second approach, your space is continuous. The trainable variables are continuous; if they were not continuous,

Starting point is 00:36:34 you wouldn't be able to use gradient descent or Adam or whatever. And so you try both things, and you see which approach helps you. And it's very similar to what we do with particles versus, you know, lattice field theory; both come with their own problems. But yeah, I don't want to say that I don't say that; I say that in addition to that, I also, you know, investigate this other possibility, which actually recently proved to work
Starting point is 00:37:06 better, in a sense, because the curvature emerges not because I've assembled my graph in a certain way, but because it is an algorithm which is more efficient. So the curvature is a way for the system to learn faster. So it's like a direct way of saying where the curvature, where the geometry, comes from. I do know that in the machine learning literature people are using

Starting point is 00:37:39 Adam, but they're not using the terminology of a curved space of trainable variables which emerges from learning. Again, it's not something you specify ahead of time. It emerges as an efficient algorithm. But this is what I'm saying here.
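(Aside: a minimal sketch of that reading of Adam. The update below is the standard Adam step; the comments reinterpret its denominator as a diagonal inverse metric on the space of trainable variables, in the spirit of the covariant gradient descent framing. The hyperparameters are Adam's usual defaults, not values from the conversation.)

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update, read as metric-preconditioned gradient descent."""
    m = b1 * m + (1 - b1) * grad          # first moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad**2       # second moment estimate
    m_hat = m / (1 - b1**t)               # bias corrections
    v_hat = v / (1 - b2**t)
    # covariant reading: theta <- theta - lr * g^{-1} m_hat,
    # where g = diag(sqrt(v_hat) + eps) acts as an emergent (diagonal) metric
    g_inv = 1.0 / (np.sqrt(v_hat) + eps)
    return theta - lr * g_inv * m_hat, m, v

# usage on a toy quadratic loss 0.5 * ||theta||^2, whose gradient is theta itself
theta = np.array([1.0, -3.0, 2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adam_step(theta, theta, m, v, t)
```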
Starting point is 00:37:56 So it's not the curvature of the loss landscape that corresponds to the curved spacetime of our universe? Oh, absolutely not. No. Like you were saying, I think a very useful analogy when you're talking about the loss function is to think about the Lagrangian.

Starting point is 00:38:13 So it's not the curvature of the Lagrangian landscape. Yes, okay. That gives you the curvature of space. No. It is the degrees of freedom in the Lagrangian, which we call the metric, that describe a space which is curved. So, yeah, there is a big, big key distinction.
Starting point is 00:38:35 And same here. Now, maybe this is a good point, since I'm drawing this connection between Lagrangian and loss function: originally I was associating the loss function more with the energy, because there was a stochastic thermodynamic description

Starting point is 00:38:57 where a canonical ensemble would naturally emerge from that picture. Later I understood that adding a kinetic-like term to the loss function actually makes learning better in certain situations. So you add one more term, which maybe isn't the term that you are trying to optimize, but once you add it to the loss function,

Starting point is 00:39:25 and once the loss function uses this term, it learns faster. So there is this little bit of a new twist: if you want to minimize something, you may actually descend on something else, something with an additional term. And so in this case, I think it's a very good analogy to think about the loss function
Starting point is 00:39:46 as a Lagrangian, although they are different objects, of course.
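(Aside: schematically, and only in the spirit of the analogy drawn here; his papers give the precise construction. The coefficient mu and the exact form of the kinetic-like term are illustrative.)

```latex
% bare objective: just a potential over the trainable variables \theta
H(\theta) = V(\theta)
% Lagrangian-like objective with an added kinetic-like term in \dot\theta
L(\theta, \dot\theta) = \frac{\mu}{2}\,\|\dot\theta\|^2 + V(\theta)
```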
Starting point is 00:40:16 Earlier, you said that the quantum dynamics were extremely difficult, non-trivial, or what have you. So walk us through the insight when you were studying machine learning. Why were you even studying machine learning? You mentioned cosmology, and I don't know about its connections there for you, in your particular use case. Anyhow, walk us through you, circa six years ago or so.
Starting point is 00:40:44 Yeah. Okay, so six years ago, I was on sabbatical leave. When you're on sabbatical, you can do whatever you want, right? And first I finished the project I was interested in at that time, and it had to do with certain dualities of quantum mechanical, strongly coupled systems that I thought would also be a good candidate for describing curved spaces,

Starting point is 00:41:18 and then, you know, quantum gravity aspects of that. And then I had time. And so I attended many, many talks by machine learning people who would present nice slides, nice results, and no formulas, no equations, apart from something, you know, like stochastic gradient descent, something that kind of looks trivial. But I knew that neural networks, well, they would always say, it's like a black box.

Starting point is 00:41:54 Black box, meaning it works, we don't really know why. So I had time, I had a few months, and I said, okay, why not just try to open this black box? You know, because the universe is also a black box. Nobody in the beginning told us that this is the standard model. And somehow we came up with the tools of Lagrangian and Hamiltonian mechanics to actually understand why it works, maybe not understand why this particular Lagrangian, but understand at least
Starting point is 00:42:23 how to model it. So that was my motivation, and it had nothing to do with quantum mechanics. At this point, I see a system with many degrees of freedom. They evolve according to learning and activation dynamics. I knew that some of the physics would be relevant, but not all of it, because it would be more complex. But if you see a system with many degrees of freedom, your first reaction is, well, maybe you can discuss statistical ensembles, and maybe in some limit you can understand how the system behaves in what we call an emergent regime. So something like, can we have a thermodynamics of machine learning?

Starting point is 00:43:10 Something along this line. I knew it was different because of the learning dynamics, that I figured out sooner, but can we have a certain thermodynamic description which can be verified? Is there a notion of temperature? Is there a notion of entropy? Are there the first and second laws of thermodynamics? Would they still hold, or do they have to be modified? So that was kind of, you know, you have a toolbox that you think should be the first thing you try to model the system with. And in that direction I went,
Starting point is 00:43:36 and, as I said, quantum mechanics was not on the horizon. But then I saw that because of the learning dynamics, the system just doesn't go to a boring canonical ensemble distribution and stay there. It has very interesting behavior, even in equilibrium, because of the presence of those two different dynamics, activation and learning. And so the idea was to set up some kind of variational

Starting point is 00:44:10 principle that maybe describes it beyond thermodynamics. So first, thermodynamics: are there any macroscopic objects, like temperature and entropy, with which we can describe it? But maybe we can go beyond that. Since this equilibrium, I call it learning equilibrium, is kind of boiling, and, you know, things fall out of the equilibrium and go back, and it's kind of not static. Can you really, you know, zoom in and say,

Starting point is 00:44:36 well, I don't want to just calculate temperature, pressure, volume, whatever you usually do, although you still have to define all those things for a machine learning system. Can you zoom in and say, I'll pay attention to, let's say, only the trainable variables. I will still integrate out and kind of coarse-grain over the non-trainable, but we'll pay attention to the trainable ones. Well, the reason for that was that we know the non-trainable ones flip very fast: you know, you put your input image in and get cats or dogs out, in your example, right, zeros or ones. So this activation goes fast, and the learning goes slowly, and then you calculate your loss function and gradually propagate changes. So I knew there were two scales, and if there's anything we know in physics, that's what we should do. We should integrate out, remove the irrelevant information
Starting point is 00:45:25 and keep only the relevant. And so that's what I did. I integrated this out and said, okay, how does this system behave? And it turned out that, for the behavior of these trainable variables, you have to assume a certain principle for how entropy changes. So I assumed a maximum, or extremal, stationary entropy production principle. But if you assume that, the equations you derive from it are the Madelung equations. Now, the Madelung equations, again, those who know, know,

Starting point is 00:46:01 but for those who don't: it's close to quantum, but it isn't quantum. There is still this step, you know: quantum is this theory where what propagates is not the probability but the square root of the probability, and that's the relevant degree of freedom. And it comes with this quantum phase, complex numbers, right? We all know that. And so if you only pay attention to the Madelung equations as your limit, this isn't quantum yet, but it gives you hope.
Starting point is 00:46:33 So maybe you can actually understand why this complex phase would emerge. What is the physical meaning of the complex phase? Not exactly quantum mechanics, but maybe an emergent quantum mechanics. And then I collaborated with Katsnelson, and he correctly pointed out that we need the discreteness of the phase for this to work.

Starting point is 00:46:58 We don't have to call it a phase, but something has to be discrete. Something you change discretely and your loss function, let's say, doesn't change much, right? Or the dynamics doesn't change. So that's the meaning of the complex phase: you rotate by 2 pi and you come back to the very same point.
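(Aside: the Madelung form being referred to, in standard textbook notation. Writing the wavefunction as a density times a phase splits the Schrödinger equation into two real, fluid-like equations; without the 2-pi discreteness of the phase just described, these alone are not yet equivalent to quantum mechanics.)

```latex
\psi = \sqrt{\rho}\, e^{iS/\hbar}
% continuity equation for the probability density \rho:
\partial_t \rho + \nabla \cdot \left( \rho\, \frac{\nabla S}{m} \right) = 0
% Hamilton-Jacobi equation with an extra quantum potential Q:
\partial_t S + \frac{|\nabla S|^2}{2m} + V + Q = 0,
\qquad Q = -\frac{\hbar^2}{2m}\, \frac{\nabla^2 \sqrt{\rho}}{\sqrt{\rho}}
```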
Starting point is 00:47:14 Right. So having this within the system, it's like having h-bar, having something that... Yes. Without this, it isn't quite quantum yet, although in certain regimes your system behaves that way. And so it took this

Starting point is 00:47:29 little extension to the original derivation of the Madelung equations, these almost-classical quantum equations. And that came from a very interesting picture, a suggestion that we made

Starting point is 00:47:45 that actually you can explain to anyone. So you have your system, a learning system, and yes, you pay attention to trainable variables, and then, yes, they follow almost a Schrödinger equation. But for a Schrödinger equation, the system has to have access to a bath, to a reservoir of neurons that it can borrow from.
Starting point is 00:48:08 It's like, you know, external resources. You run your machine learning system, but you say, well, if you need them, here are a few more neurons you can use. You can plug them in. If you don't need them, just give them back. And if you have this access, in physics we call it the grand canonical ensemble: we move from the canonical ensemble to the grand canonical. So if you do have that, in your algorithm, you provide that option. Whether it is an option that you provided by actually programming it this way, or whether it's an emergent phenomenon,

Starting point is 00:48:43 because there is an emergent phenomenon where certain neurons stop working and start working, stop working and start working. If you do that, then it turns out that you do get a Schrödinger equation. In some limits, again, it's not an exact Schrödinger equation, which means that in certain limits it should be violated. But with that little twist, with this reservoir

Starting point is 00:49:05 of neurons that you can kind of hire, like you hire someone to do some work, and then once you don't need them you put them back, with that, your dynamics effectively becomes linear, because Schrödinger is linear,
Starting point is 00:49:20 and is described by a Schrödinger equation. And then everything fell into place. And at that point it was actually more than just a proposal for some abstract theory that in a certain limit can reproduce the Madelung equations, which isn't quite it,

Starting point is 00:49:35 but you could actually see that, well, maybe this quantumness can emerge from a completely classical system, you understand? So it's not quantum machine learning. It's classical machine learning, but with this little twist, it acquires this quantum-like behavior.
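(Aside: a very loose toy of the "reservoir of neurons" idea: a tiny regressor that borrows a hidden neuron from a pool when the loss stalls and returns idle ones. This is an editorial illustration of the grand canonical picture, not Vanchurin's construction; every name and number in it is made up for the sketch.)

```python
import numpy as np

rng = np.random.default_rng(2)

class GrandCanonicalNet:
    """Toy 1-D regressor that can borrow hidden neurons from a reservoir and give them back."""
    def __init__(self, n_hidden=2, reservoir=16):
        self.W = 0.1 * rng.normal(size=(n_hidden,))    # one weight per hidden unit
        self.reservoir = reservoir                     # neurons available to borrow

    def predict(self, x):
        return np.tanh(np.outer(x, self.W)).sum(axis=1)

    def borrow_neuron(self):
        if self.reservoir > 0:                         # hire a neuron from the bath
            self.W = np.append(self.W, 0.1 * rng.normal())
            self.reservoir -= 1

    def return_idle(self, tol=1e-3):
        keep = np.abs(self.W) > tol                    # give back neurons doing no work
        self.reservoir += int((~keep).sum())
        self.W = self.W[keep]

x = rng.uniform(-2, 2, size=64)
y = np.sin(2 * x)
net = GrandCanonicalNet()
prev_loss = np.inf
for epoch in range(500):
    err = net.predict(x) - y
    # gradient of the mean squared error w.r.t. each hidden weight
    grad = np.mean(2 * err[:, None] * x[:, None] * (1 - np.tanh(np.outer(x, net.W))**2), axis=0)
    net.W -= 0.05 * grad
    loss = np.mean(err**2)
    if prev_loss - loss < 1e-5:                        # loss stalled: borrow capacity
        net.borrow_neuron()
    net.return_idle()
    prev_loss = loss
```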
Starting point is 00:49:46 I do want to get to consciousness. Before we get to that, I have a sort of silly question. So in machine learning nowadays, there's a huge field of interpretability. People want to peer into the black box. And it makes sense to some degree when it comes to LLMs, because there are semantics

Starting point is 00:50:13 underneath what LLMs are trying to capture, so you're trying to understand what the LLMs are doing. But when it comes to the universe, more abstractly here in your model, is our universe interpretable? Right. So let me start with machine learning, then move on to physics, and then maybe to a more general answer to the question.
Starting point is 00:50:35 So in machine learning, the fact that we need to interpret how a machine learning system works, what's happening inside, is essential. And I already said that physicists have dealt with this problem. And our tool was, yeah, to model the dynamics of the system,

Starting point is 00:50:51 but to model it as a Hamiltonian system or as a Lagrangian system. So that was our way to dig into the black box. Now, with LLM-like models, or with any other machine learning systems, you start by designing a certain architecture. And the architecture can also be written in a certain way that is interpretable, in the sense that you will know exactly which blocks do what. And that would be similar to writing not just something that produces results,

Starting point is 00:51:29 but something that produces results where you know why it produces results, because there is a term in the loss function or in the Lagrangian, and if you remove that term, then it would produce different results. Maybe your LLM would not work; your ChatGPT would break. And so in this sense, this is a way to model and understand how the internals work. So maybe this is the term that is responsible for the math being right in these large language models.
Starting point is 00:52:02 This is the term that is responsible for good translation between certain languages. This is the term that's responsible for something else. And then you can concentrate on that term and say, okay, well, I don't like that my LLM is not, you know, doing good calculations of tensors,

Starting point is 00:52:29 which it isn't. If you try to do tensor calculus, all the models I tried, they all, you know, at some point start producing garbage results, and so you have to locate the problem. So there are certain tasks they don't know how to do, and it may be just a term that you have to tune

Starting point is 00:52:38 in your loss function, in the Lagrangian, in the architecture. Now, the same thing, and this is interpretability. How do we interpret why certain things work and certain things don't work?

Starting point is 00:52:49 What is it inside the state of the neural network that is responsible for one thing or another? How do we do alignment, right? What is it that I can say to my neural network so that, in some sense, it will align with my interests, with my loss function? Yes.
Starting point is 00:53:06 So we know how to do it in physics. I think this physics toolbox can be useful to enhance interpretability, to interpret how LLMs work, how machine learning systems work. And coming back to physics, well, we also should be able to dig deeper than just writing down a symmetry group and saying, okay, well, this is the standard model, this is it. You know, we can start asking questions: why is that the case? What is it?

Starting point is 00:53:39 You know, because in this description, the field theories and everything we observe at microscopic levels are emergent phenomena. They would come from some microscopic loss function where, just to throw one example, maybe each neuron wants to minimize entropy or optimize a certain local loss. So if you want to minimize entropy, it's not a good idea to connect to all neurons, because then it will be chaotic: every state, at every next time step, will be chaotic. But not connecting to anything is no good either. So maybe each neuron individually, following a microscopic law, will try to find some balance.

Starting point is 00:54:16 And then there'll be some kind of RG flow where this microscopic loss function would give rise to some more macroscopic behavior, and then we would say, okay, well, at this level, it is described as the standard model. But the flow doesn't stop there, and then you go, okay, the biology level. And a lot of work that we've done, again, we may touch upon this during the discussion of consciousness,

Starting point is 00:54:39 but there again, I mean, it doesn't mean that at a biological level it should be the standard model that governs the correct description. So if you can identify how the internals work, you can kind of RG-flow it and understand how things evolve. Now, for the more general audience,
Starting point is 00:54:58 this is just a way of saying: we are describing the universe around us, and depending on the scale we discuss, different languages are more or less appropriate. So at a very microscopic level, the language is maybe neural networks. At a bigger level, maybe the right languages are field theories.

Starting point is 00:55:20 Interesting. Then even bigger, genotype or phenotype, and then even bigger. And so you should not be surprised, in this approach, that on each level there is just a different language which is correct for describing it. On each scale, at each energy, for different configurations, there will be different languages which are more appropriate. That's kind of what we are saying here.

Starting point is 00:55:44 And it shouldn't stop here. Once we move on to gravity, to cosmological scales, yes, we are trying right now to use the language of field theories to describe cosmology, inflation, gravity, but things don't quite work on a cosmological level. There is this dark energy problem, there is the dark matter problem. And so maybe once we adjust the language and start describing those scales using a different language, different modeling, then you may have more agreement with the experiments than we have now.
Starting point is 00:56:23 Would now be a good time to talk about your second law of learning? Sure, sure, sure. I mean, I've touched upon this. So I think it's important to emphasize: just like the second law of thermodynamics, we really like it, but we should understand that it doesn't always work. It works all the time until it doesn't. So such laws should be taken

Starting point is 00:56:51 with a grain of salt. But nevertheless, it is useful for many, many different calculations. And so the second law of thermodynamics tells us that the entropy should grow. And if you really take it literally, then you'll have a hard time explaining the emergence of life. You'll have a hard time explaining many biological phenomena, if you want to be honest. Now, if you want to... Wait a moment. Why would you have a difficult time explaining the origins of life?
Starting point is 00:57:19 Because it's always that global entropy increases, but you can have local decreases. Right, but okay: if you're only talking about global entropy, there is just one equation, there's one number. It's not very interesting. And okay, whether it increases or decreases, it doesn't matter. What's interesting is what really happens locally,

Starting point is 00:57:36 because, you know, thermodynamics says that if you take any subsystem big enough, then locally entropy will grow. And so instead of just having one number for the entire universe that you observe grow, what's more interesting is to pay attention to what happens in the different subsystems, and then explain how the entropy behaves, and how you define the entropy. Because if you talk about gravity, it's very important to think about how you actually define
Starting point is 00:58:06 entropy. Now you have a space that is curved. Now, if gravity pulls things together, wouldn't that mean that the entropy decreases, right? Before, things would be distributed everywhere, and then they're pulled together. Naively, you would say it decreases, but then I say, well, no, no, no, I'll define entropy in a different

Starting point is 00:58:29 way. I'll patch the problem so that it doesn't. But again, if you only pay attention to one number that you want so badly to increase, then okay, let it be. I'm saying that the usefulness

Starting point is 00:58:45 of the second law of thermodynamics is that it can be applied to many subsystems, very successfully. But there are certain things that you will have a hard time explaining with it. And as a cosmologist, I have to say, you know, we have one beautiful theory in cosmology called the theory of cosmic inflation
Starting point is 00:59:03 that is kind of hopeless. It doesn't know what to do if you try to assign probabilities for observers to emerge, and if you use the usual classical mechanical or physical approaches and try to calculate: what's the probability of a certain observer emerging? Is it easier for us to go through this, you know,

Starting point is 00:59:30 inflation, then galaxy formation, then the formation of biology? Or is it just easier for us to form Boltzmann brains that are floating in empty space with the memory of us thinking that we are in this meeting right now? And you will see that in these

Starting point is 00:59:52 otherwise very successful models of cosmology, you will very often get the answer that, yeah, all of this is correct, what we think is correct; it just gives you a lower probability. There's a higher probability

Starting point is 01:00:05 for you just to nucleate out of nothing. And that's called the Boltzmann brain problem. So, first of all, once you enter gravity territory, cosmology, you have to be careful about what you call entropy. And then there is this problem of defining probabilities for us observing what we observe. And people employ different ideas, one of which is the anthropic principle. Well, let's keep that in mind. So maybe we should not be just a random point in space, because if we were, we probably would not be observing what we're observing.
Starting point is 01:00:49 Now, maybe we should only look at the places that are actually tuned for life, with observers smart enough to ask those questions: entropic principles. To be clear, the anthropic principle is different from an entropic principle, which is... Good point. Very good point. Actually, there's a lot of confusion about that, partially because entropy is used in those discussions, but you're absolutely right.

Starting point is 01:01:17 Anthropic, with an A; not an entropic principle. And, you know, a very large fraction of physicists don't like the principle. They disregard it as non-scientific. Now, in cosmology, that was kind of the only game in town
Starting point is 01:01:35 until Lee Smolin proposed his natural selection approach, which is, I think, closely related to what I'm trying to say. Interesting. With the only twist that I'm actually giving a mechanism. Yes, yes. A mechanism of how, instead of a universe being fine-tuned for life,

Starting point is 01:01:57 it is self-tuned for life. So you start with whatever you want, because the universe is learning. It consists of learning subsystems, and they all try to learn what's around them in the most stupid way. Because of that, you are not tuning anything.
Starting point is 01:02:14 It is tuning itself. It likes to be observed. And so observers emerge not because there are carefully chosen constants of nature, but because, if they were not carefully chosen, then they would learn to evolve towards being carefully chosen. And so you're given a physical mechanism. And actually, Lee Smolin told me that this idea came

Starting point is 01:02:47 before him; there were philosophers. There's always, for any idea you describe, some philosopher in the past who said the same. Okay, so those ideas were there, of course, but here you can actually say what the mechanism is for this to take place.

Starting point is 01:03:10 And then in the comment section there'll be: oh, and this philosopher is predated by the Vedic texts. Yeah, there's always someone before. But I don't think there is competition here. Every time you rediscover something, what you're trying to do is make it more rigorous using the tools you currently have. So, you know, there were no machine learning systems 100 years ago.
Starting point is 01:03:45 There were no neural network dynamics back then. Yes, exactly. And so those are the tools. You know, can you use those tools? Well, speaking about the tools, there's one little problem that is maybe unique, maybe not. Many times physicists came to the realization that new tools are needed. Einstein is a great example. You know, who would have thought that curved spaces

Starting point is 01:04:10 or Riemannian geometry would be important for modeling the universe? Nobody. But Einstein came around and said, no, this is the mathematics you need: you need differential geometry to describe it. But at least at that time, there was already a body of work where you could just take the framework. With neural networks the situation is a little bit different: we have so many experiments,

Starting point is 01:04:35 so many experiments. Every time you're amazed by what a neural network does, it's an experiment. And not so much of an actual theory that lets you predict ahead of time: yes, this architecture will work, and no, that will not work, for such and such reasons. So the theory is a bit behind here, which is like a perfect playground for theorists, because I can set up an experiment, write down my theory, and then test it experimentally and numerically right away.
Starting point is 01:05:09 So that's great. Okay, but that's a diversion. I want to get to where you said that the universe likes to be observed. We're going to get to that, and we're going to get to consciousness. But before we do, I recall you saying that natural selection operates on the level of subatomic particles. Am I shaky in my recollection? Not from this conversation, but somewhere else. No, no, it may not have been in this conversation, but I certainly wrote about it in the papers.
Starting point is 01:05:39 And yeah, so this natural selection, where by natural selection I mean that the more useful configurations of networks survive because they help the loss function to be minimized better, and the other configurations will not survive. So in this sense it is natural selection: those architectures that are useful for learning will stay, and those that are not useful for learning will be removed, because their loss function is not as low as it should be. So in this sense, yeah. But it doesn't work just on the level of particles, the fundamental particles. Remember, again, this analogy of the right language at different scales. So if you want to talk about this at the level of particles,

Starting point is 01:06:26 yes, you would say particles are the way they are because they underwent a series of natural selection at their scales and figured out that this is the state of the neural networks describing them that allows their loss function to be the smallest. But this argument, this natural selection argument, or as I would rather call it, learning argument (you're trying to minimize something), can be applied at any scale. It can be applied at the scales of biology, which is what we usually do.
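To make "the more useful configurations survive" concrete, here is a minimal sketch; the task (a linear map) and the selection rule are my own illustrative choices, not anything from the papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w, x, y):
    """Task loss of one candidate configuration (here just a linear map)."""
    return np.mean((x @ w - y) ** 2)

# A population of candidate configurations competing on the same data.
x = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y = x @ w_true
population = [rng.normal(size=8) for _ in range(32)]

for generation in range(100):
    ranked = sorted(population, key=lambda w: loss(w, x, y))
    survivors = ranked[: len(ranked) // 2]      # low-loss configurations persist
    # removed configurations are replaced by perturbed copies of survivors
    children = [w + 0.05 * rng.normal(size=w.shape) for w in survivors]
    population = survivors + children

best = min(population, key=lambda w: loss(w, x, y))
print("best loss after selection:", loss(best, x, y))
```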
Starting point is 01:07:02 When we say natural selection, we usually think about the scales of organisms, right? You know, organisms are again some kind of configurations, maybe more fluid configurations, because no two organisms are alike. But so maybe particles, right? Yeah, they look very similar; all electrons look very similar. But we don't know, maybe there is some tiny difference. And the way we are trying to understand the tiny difference,

Starting point is 01:07:32 or maybe it already went through this very long period of natural selection, and that is the value; that's what the electron mass should be, and there's nothing else I can do at this point. I imagine if you could put a mark on an electron and a boson and distinguish them, then the spin-statistics theorem wouldn't apply anymore, and we would see some effects of that.
Starting point is 01:07:54 Right. So the loss function would be such that it's just not convenient not to have fermions. And again, we kind of understand that. I think one good example I can give, that maybe a very general audience will understand and machine learning people will definitely understand, is cars: cars moving in traffic, and they are trying to make self-driving cars. So assume we're 10 years from now, where all the cars are self-driving; maybe sooner, maybe later, I don't know.
Starting point is 01:08:28 And then all of them are driving, and they're all trying to optimize their loss function. But to optimize their loss function, they have to do some calculations about the environment. They have to, you know, scan the environment, find something, plug it into their network, and then the network will say turn left, turn right. And the information that each of the cars collects is what, in the physics of electrodynamics, we would call a bosonic field. So it's the electromagnetic field, or a Green's function:

Starting point is 01:09:06 something from the other electrons that I scan around me propagates to me, and that gives me the relevant information about what to do as an electron. Or cars scan around for other cars, get that information, and say, okay, that's the relevant information for me to make a left or right turn or accelerate.
Starting point is 01:09:25 So in this sense, cars are like fermions. They are advanced enough to be able to process, not the state of the entire universe around them, because they are tiny, but the relevant information for them to optimize their life. And if they were doing something else, they would not behave like an electron, and that would create some kind of unstable behavior,

Starting point is 01:09:49 and this whole system would not work as it should. So just like all self-driving cars converge to using the same software, just because that's useful, electrons are kind of all using the same software for how to navigate in the electromagnetic field. So this analogy maybe helps: think about electrons as, you know, self-driving particles that have already established what is best. Maybe they haven't established it exactly, and so there are still internal degrees of freedom; there's spin and there are other things. So, you know, in one circumstance I will be doing this, and in another, that. And the same for the cars, right?

Starting point is 01:10:35 So you will have different cars driving in the UK and the US, right? Because of left-side versus right-side driving. So the states of the self-driving software in the cars would still have to be different; they'd have to agree locally. But other than that, there would be a lot of similarities in describing those cars. And so, if it's useful, this is a correct analogy. There's a structural similarity between the cosmos, if you zoom out far enough, and neurons,
Starting point is 01:11:08 and some people use this to suggest there's a cosmic brain. Now, I want to talk about what you're not saying. Are you not saying that, or are you saying that? No, absolutely not. I'm sorry if I interrupted you, but I'm not saying that. Okay, it seems to me like this comports with your theory. So it would seem like you'd be like, oh, that's great evidence for my theories. Maybe, maybe not.
Starting point is 01:11:29 Yeah. So both of the things that you said are true. So first of all, I'm not saying that, because I haven't confirmed it. No, no, visually I've confirmed they look similar. Right. So we've all done that. There are papers by people who actually try to do statistical analysis, which is the right thing to do, and statistically showed that there are

Starting point is 01:11:50 certain things that are similar. There is a well-known critical-like phenomenon where you have some kind of scale invariance that is observed in the cosmic web and observed in biological networks. But I haven't mapped out exactly the dynamics of galaxy formation and how all this would come around.
Starting point is 01:12:14 I've only done calculations suggesting that self-organized criticality, or a critical state, is something that you should expect to see in a learning system. And it's a good thing for a learning system to have criticality. So there is this calculation that tells you: yes, criticality is good.
Starting point is 01:12:30 And then you can take that and say, okay, once it's good, doesn't this confirm the observed criticality in brain activity, or the observed criticality that we see? Yes. But this is indirect evidence. So maybe this is actually,

Starting point is 01:12:46 we are talking about this cosmological-like scale where the system is performing, very slowly, maybe some kind of very complex calculations. Or maybe it's just doing it slowly but actually doing some important learning task. So, because of that, I do not say that, usually. I show those pictures when I give public talks, but I do not say that I have done enough rigorous calculations

Starting point is 01:13:17 for the structure formation to say that this is the case. I know how to do those calculations, but there are only 24 hours in a day. Yes. You have a rare quality where you will assert something and then say: and here's why it's either not fully the case, or I don't believe it, or here's the limitation of my own model, here's the counter-evidence. I haven't seen that in almost any of the people that I interview. Okay, so I also hated that when I was a student,
Starting point is 01:13:46 because even when you're a student and you come to the class, they tell you something that they've been taught, and people take it without actually trying to question it. I think this is a horrible quality. We physicists actually can do better.
Starting point is 01:14:04 I don't remember who exactly said that, but we should be doubting everything. So we should be doubting our own models, our own calculations, calculations that other people have done. Even if a hundred people come to you and say that general relativity is wrong, it doesn't mean it's wrong.

Starting point is 01:14:27 And we all know that story, when a hundred physicists wrote a letter against Einstein, signing a statement that general relativity is wrong, and his reply was brilliant: you don't need a hundred; if you have a point, show me the point and I'll consider it. So yeah, I
Starting point is 01:14:46 think this is the quality that you absolutely must have: you have to show all the good and bad things about it. Because I've thought about this; I mean, of course, I've tried to answer it. I don't have all the answers. So I think this is a more honest and correct way of doing this. And we should be doing it not just with new theories. There are a lot of problems with existing theories.

Starting point is 01:15:08 Classical theories are not as pure as we thought. There are divergences and things we don't fully understand. And we should be telling people and students about them when we discuss all this. So don't put anything under the rock; that's kind of my approach. Because sooner or later, smarter people will find what's under the rock.
Starting point is 01:15:29 And that's certainly very, very important. With that out of the way, and thank you for that, by the way, let's get to consciousness. Is the universe itself, in your model, with the universe being the neural network, conscious? Okay. Very good question, and of course I get this question all the time.
Starting point is 01:15:50 Now, here is my maybe a bit longer answer, but I think I need to say this. You come with a mathematical framework, a new mathematical framework, which is very rich, which relies on neural networks and learning dynamics, and you're trying to use it to describe some phenomenon. This phenomenon may be a physical phenomenon, like we talked about, or a phenomenon that people are discussing in another branch of science, like consciousness. So they already have a term for something,

Starting point is 01:16:21 and you're bringing a toolbox. In this toolbox of learning things that I have, there is nothing that I would call consciousness. But I'm trying to use it to describe what people mean by consciousness, and I can have many attempts. So I may suggest something, and they will say, well, this looks like
Starting point is 01:16:43 not a good definition of consciousness, because here's a system that we all agreed, a hundred of us agreed, is conscious, but your system, your definition, tells us it's not. Then okay, either I say, well, maybe you should adjust your notion of what consciousness is,

Starting point is 01:17:04 or maybe I should adjust my definition of consciousness. And both ways are fun. Now, my definition of consciousness comes from how I understand it, how I can build it within a framework that I understand, a mathematical framework. And in this mathematical framework, the system undergoes learning dynamics,
Starting point is 01:17:23 and there are three macroscopic things that are directly related to learning that I can calculate. So one of the things is how fast the system adapts to the new data set, to the new environment: how fast it learns. This I can actually calculate: it is the decay rate of the loss function. That sounds to me more like intelligence than consciousness.
Starting point is 01:17:47 Right. But then I say, hold on with the intelligence, because I have a comment about that as well. And then I'll say: as a hypothesis, I want to call this rate of decay, how fast it learns, consciousness. Maybe it will be wrong, but I will call it that, because I come with a new framework, and in this framework I can name things. But I say right away that there are two more things that are also macroscopic, and some people may relate them to things related to consciousness, but I would relate them to intelligence. And I would say actually three things contribute to intelligence. If you judge a learning system by how well it behaves, then there are three quantities you have to calculate. One is how fast it learns, and I call it consciousness.

Starting point is 01:18:36 Maybe you would call it intelligence. The second thing is: how low does the loss function go, asymptotically? If I had infinite time, how low would it go? Yeah, it may learn fast, but then just halt, stop. So I would say, okay, that's another ingredient that is important: how low the loss function goes. And the third one is: once it has reached this asymptotic loss,

Starting point is 01:19:00 it's not going to stay there. It's going to be fluctuating, because that's what learning dynamics does. You just don't stop; you never stop. You're always in this learning equilibrium, and sometimes the loss goes a little bit up, down, up, down, so you always fluctuate. So how big are the fluctuations?

Starting point is 01:19:17 So then I have a learning system, and I can calculate three things: how fast you learn, how well you learn if you had infinite time, and how stable what you learn is. And so I would say that with those three things, I actually describe what I think is intelligence, not just one IQ number. Three things.
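For concreteness, here is one way to read those three quantities off a recorded loss trajectory. The estimators (an exponential fit for the decay rate, the tail mean for the asymptotic loss, the tail standard deviation for the fluctuations) are my own illustrative choices, not definitions from the papers.

```python
import numpy as np

def learning_metrics(loss_history, tail_fraction=0.2):
    """Estimate the three macroscopic quantities from a loss trajectory."""
    loss = np.asarray(loss_history, dtype=float)
    tail = loss[int(len(loss) * (1 - tail_fraction)):]
    asymptotic = tail.mean()        # how low the loss settles
    fluctuation = tail.std()        # how stable the learned state is
    # decay rate: slope of log(loss - asymptote) over the early transient
    transient = np.clip(loss[: len(loss) // 4] - asymptotic, 1e-12, None)
    t = np.arange(len(transient))
    decay_rate = -np.polyfit(t, np.log(transient), 1)[0]   # how fast it learns
    return decay_rate, asymptotic, fluctuation

# toy trajectory: exponential decay to a noisy plateau
t = np.arange(2000)
noise = 0.01 * np.random.default_rng(2).normal(size=t.size)
history = np.exp(-t / 150) + 0.05 + noise
print(learning_metrics(history))   # decay rate should come out near 1/150
```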
Starting point is 01:19:36 And then you can have, you know, different people. Some people learn very fast, but then they stop; they are not learning more and more; their loss function has halted. Other people may take a very long time to learn, and then eventually they end up knowing all of differential geometry and, you know, quantum field theory, what not. And then there's the third type of people, who maybe also learn fast, and maybe they

Starting point is 01:20:05 know all of the advanced mathematics, but they're very unstable. Unless they keep repeating it, they keep forgetting and then opening the book again. We all forget stuff, right? You learn something... I forget the things I wrote in my papers, right? There are many papers.

Starting point is 01:20:19 I have to look them up. So there is some degree to which my knowledge, my loss function, actually fluctuates. So I would put those three things together and say: that's intelligence, at least three things. It's at least three, maybe more, because actually there are more: you know, it's a stochastic variable.
Starting point is 01:20:38 There are statistical moments: first, second, third. So I'm simplifying things. You can describe these fluctuations with an infinite number of parameters. But at least those three things are very important. And I think when you look at different systems, different people, say students, you can actually say, well, yeah, this one has a very good learning efficiency; I would say he's more conscious. And this one has very bad learning efficiency:

Starting point is 01:21:04 less conscious. But again, this is just a definition. If somebody tells me that even within my framework it has to be, you know, the square root of two times the first number plus the square of the second, I'm okay with that. As long as I declare what I mean, then I'm happy. I understand the first two. But the last one, about stability: why does that have anything to do with your intelligence, when it could be the universe that's changing? That doesn't seem to have anything to do with you.
Starting point is 01:21:40 Yeah, so under the assumption, if the universe doesn't change at all... okay, it's always changing. This is what processing a data set is: you always get different images of cats and dogs. You look around, and every day there is a new shape of the trees, and the leaves arrange themselves differently, so you always have that.

Starting point is 01:22:01 But I say, let's integrate over that, so it will be just some statistical state of the universe. So no major events. There is nothing major: nothing hits the Earth and creates a sequence of earthquakes, and there are no nuclear wars.

Starting point is 01:22:25 Nothing major. So I'm more or less in this learning equilibrium. So if that's the case, if nothing major changes, maybe a good example would be, you know, you take a bacterium as an observer, placed in some kind of controlled environment, where you keep maybe the temperature the same, the same amount of light, and so on.
Starting point is 01:22:45 So if you work with this ensemble, then the loss function will still change. It will still change because of, you know, stochastic gradient descent. Sometimes, for our bacterium, the light appears on the left, and it moves to the left; maybe it appears on the right, in the opposite direction, and it moves to the right. So there are some changes, and it always updates its

Starting point is 01:23:11 trainable variables. Why would it do that? Because if there is a statistical change, it will be able to adapt. So your ability to adapt actually, you know, backfires on you: it creates more fluctuations.

Starting point is 01:23:28 And people actually know about that: if you set the learning rate to be smaller in some algorithm, then it will go to a very stable minimum, but it will not be as good a minimum. So these fluctuations should not be treated as a bug. They're actually a feature, for getting out of a local equilibrium. And that happens all the time.
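The learning-rate tradeoff he mentions is easy to reproduce numerically. A minimal sketch, assuming a made-up tilted double-well loss: a small learning rate settles stably into whichever basin it starts in, while a larger one fluctuates enough to cross the barrier into a deeper minimum.

```python
import numpy as np

# Toy loss landscape (my own choice): a tilted double well with a shallow
# minimum near x = +1 and a deeper one near x = -1.
loss = lambda x: x**4 / 4 - x**2 / 2 + 0.1 * x
grad = lambda x: x**3 - x + 0.1

def run_sgd(lr, steps=50_000, seed=3):
    """Gradient descent with noise standing in for minibatch stochasticity."""
    rng = np.random.default_rng(seed)
    x = 1.0                                # start in the shallow basin
    for _ in range(steps):
        x -= lr * (grad(x) + rng.normal())
    return x

for lr in (0.001, 0.05):
    x = run_sgd(lr)
    print(f"lr={lr}: ends near x={x:+.2f}, loss={loss(x):+.3f}")
# Tendency: the small learning rate stays trapped in the shallow basin
# (stable but a worse minimum); the larger one fluctuates more and can
# escape to the deeper minimum. Fluctuation as a feature, not a bug.
```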
Starting point is 01:24:16 Am I correct in saying that you said at some point that we need to unify not two, not quantum theory and general relativity, but quantum theory, general relativity, and observers? Okay, so most physicists tend to think of observers as coming from the physics, something emergent.
Starting point is 01:24:41 Why do you think that we have to unify these three at the same time? Right. And most physicists will tell you: biology will somehow, you know, emerge once I have string theory completely done and quantum gravity quantized. I call it wishful thinking. There is no evidence for that, other than that we keep putting things, in a way, under the rug.

Starting point is 01:25:08 We're just saying: if something is complex, well, yeah, yeah, yeah, but if I do long enough calculations, if I have long enough time, that's how it's going to work. For example, most physicists are convinced that quantum mechanics plays no role in consciousness, in how we function, how our brain works, right? So, yeah, it makes sense: macroscopic objects, why would quantum mechanics matter? But we have no, you know, proof of that. And I think it is more wishful thinking, again related to how the second law of thermodynamics
Starting point is 01:25:42 is a kind of wishful thinking, that it has to always be working. So I don't think so. But the other answer is that the fact that observers are very special and should somehow be understood is, I think, realized by most physicists who pay attention to two important problems in physics. One important problem is the measurement problem. So every single physicist who has actually seriously thought about

Starting point is 01:26:13 the foundations of quantum mechanics (not the person who is just doing shut-up-and-calculate type things, following the manual, but one who is trying to actually study the problem) will necessarily realize that there is a measurement problem. The measurement problem is about this third postulate of quantum mechanics, which is very new, because in classical physics all we need is a state and how it evolves. Here you need the state, how it evolves, and how it is observed. So that's one.
Starting point is 01:26:43 And then you kind of have to say that maybe there is something additional to quantum mechanics that you have to describe. Maybe it is the observer; the observer may play a special role. And if that's the case, if you realize that quantum mechanics is incomplete and measurement seems to be playing a special role, then you are stuck with trying to describe it. Now, another problem in physics that comes around and also has to do with observers is in cosmology. It's called the measure problem. Essentially it is a different problem, but more or less the same complication, coming from observers. So if you're trying to assign probabilities to different observations in cosmology, what should be the right probability?
Starting point is 01:27:34 We discussed the Boltzmann brain problem; it's a part of it. So you have to specify the rules. You have to describe how to deal with observers. And in both cases, if you actually think about this, the complexity comes from the fact that we are trying to put the observer into the system. When the observer is outside of the system, we all know what to do; we know, you know, there is a Hamiltonian and how it describes things.

Starting point is 01:27:56 But once you put the observer into the system (for the more general audience, people know about the Schrödinger's cat problem, and then there's the Wigner's friend problem), once you start putting observers inside, things start to break. And is it important that the observer is conscious? Right. At this point, no.
Starting point is 01:28:20 There's something fishy about observers, but we don't have a model of observers. So some people would claim, yes, consciousness is important, and they have their own definition. I say it's important to model the observer. If you want to put it inside the system, then you really want to model how it behaves,

Starting point is 01:28:39 and model it, not just say, well, maybe some kind of emergent phenomenon of biology will happen, and maybe some kind of wave function collapse will happen. That isn't going to work if you're trying to do the calculation. So my answer is: yes, observers are very important. And that's why you really have to describe them if you want to do calculations, even separately, in quantum mechanics and gravity.
Starting point is 01:29:03 And more so: maybe, because those two problems persist (in cosmology, where there is gravity, and in quantum mechanics, where there is the measurement problem), the solution is actually to try to understand how observers work. And then, once you understand that,

Starting point is 01:29:20 both theories may somehow be unified, and the observer would be modeled as well, because it seems to be the problem with both theories. Which, again, you know, you can certainly put under the rock; you can just ignore it. Maybe it's the elephant in the room:

Starting point is 01:29:40 everybody knows it's there, but we are trying to look the opposite way. And I, as you said, I see this elephant, and I say, well, it's there. At what point do you imagine observers entered into physics? Is it at the Planck epoch? Is it prior?
Starting point is 01:29:58 So in this model, everything is conscious. There are observers everywhere; every subsystem is an observer. Some of them are efficient observers: they have an efficient architecture, so the loss function falls fast. Some of them are stable observers: they've already reached a very low value

Starting point is 01:30:16 of the loss function. And some of them are not stable; they always fluctuate out of it. So any subsystem, because the building blocks in this model are neurons, and they come with the trainable and non-trainable variables, because of that, everything is learning. So everything, in this sense, is an observer;

Starting point is 01:30:35 just some observers are capable of asking perhaps more complex questions than others. Although we don't know, right? Maybe inside an electron there is a whole complex neural network that has already solved the problem of quantum gravity, and is just looking at us and laughing, saying: well, guys, I mean, it's simple. Yes.
Starting point is 01:30:59 Maybe, maybe. We don't know about that. But in this model, as a model, we started with this. I'm not saying this is how it works. But as a model, that could very well be. As far as I understand, there's a number that you can associate with consciousness. How conscious is this subsystem? But consciousness to us is far more than just a number.
Starting point is 01:31:19 We care about how conscious someone is, most of the time, when it comes to health: are they alive? Should we remove the plug? Are they going to wake up? But our consciousness, we're conscious of so much. So in your model, do you have any qualia? Right. So I was at one of the FQXi conferences, and there was a heated discussion. Every time consciousness is discussed, if there are physicists and non-physicists in the room, it's always a heated discussion. So should we ascribe consciousness to the person who is actually conscious in the sense of, you know, talking and interacting with you, another conscious observer? Or should we call consciousness something else? Is it like a discrete thing: I talk to you, then I call you conscious, and you reply; I talk to a dog and he replies, so he's conscious? So yeah, absolutely, you can then say consciousness will be

Starting point is 01:32:20 defined as a coupling, the strength of the coupling of the organism with a sound wave, or light, or some electromagnetic phenomenon. You could do that. And that would be your definition of consciousness. Maybe it's better than mine, right? And then we would not be arguing, saying,
Starting point is 01:32:40 okay, look, this is a person, he's not conscious. But then there will be people who say: yes, I have such and such friend or relative who is in a coma, but he is conscious. And he would say: no, the fact that the person is in a coma and isn't interacting with you
Starting point is 01:33:00 the way you wanted to interact, it doesn't mean that he's not interacting with you in some other way. And actually, I have to mention this speculative idea, because, you know, maybe philosophers will love it, or maybe not.
Starting point is 01:33:14 Sure. But we discussed that for quantum mechanics, you need this bath of neurons; otherwise it just doesn't behave like... So it could be that this bath of neurons is always there, but it's not in our physical space. It's in what I call the hidden space. And if it's there, then nothing stops a person

Starting point is 01:33:34 who's not interacting with you in physical space from interacting through the hidden space. Maybe that's what you do in your dreams, or maybe, you know, people are interacting through this hidden space all the time, and people do claim that. There are a lot of people who claim to have these special abilities, right, to interact.
Starting point is 01:33:55 And I think we physicists don't take it seriously for two reasons: we don't have a good enough framework for modeling this, and, second, we don't have controlled enough experiments. But I think we should not keep disregarding it once we become equipped with a better mathematical model and with better experiments. So, yeah, I wouldn't like a definition where the person is conscious only if they can hear you and reply back.

Starting point is 01:34:27 But, you know, ChatGPT would be conscious according to that definition, because it certainly replies when you write. So again, there's a lot of discussion, maybe not very important, about definitions, but we need to do this. We need to define terms before we can make statements, and this is just an attempt to do that.
Starting point is 01:34:45 What distinguishes trainable from hidden variables? Like, do physical entities correspond to one, or do mental entities correspond to one and not the other, or what? Right. So the hidden variables in this case are like hidden neurons, and their states are described by non-trainable variables. But all of the non-trainable variables in your network are connected by trainable variables; they're called weights.

Starting point is 01:35:12 So you cannot just draw a sharp line and say: here is the trainable and here is the non-trainable. Very much like in physics, you cannot say: here is the electromagnetic wave and here are the electrons. They're coupled; they're all communicating with each other. Now, the
Starting point is 01:35:28 difference between the hidden non-trainable variables and the physical non-trainable variables is that the ones that are physical have organized themselves into these three-dimensional structures; they've kind of discovered the

Starting point is 01:35:43 effectiveness of using three-dimensional space for exchanging information and minimizing their loss function. The hidden ones, at this point, can have arbitrary connections to each other. Maybe it's a good idea to think of it this way: initially you have a soup of neurons, everything connected to everything. And it's all hidden, in the sense that no physical space

Starting point is 01:36:08 has yet emerged. And then there is like a bubble, from a Big Bang, and a certain number of those neurons figure out that they can learn a lot more, like a phase transition, and can minimize their microscopic loss function if they arrange themselves in three-dimensional space. So if that phase transition took place,
Starting point is 01:36:29 then you still have the hidden variables, which you can always invoke if you need to do calculations. But they need not be present in your physical space; they need not interact with the classical degrees of freedom. They still interact by providing you with this quantumness, but they need not be directly observable. So that's the model for them.
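A minimal caricature of this split in code, with fast non-trainable neuron states and slow trainable weights. The specific dynamics, loss, and time-scale separation are my own toy choices, loosely in the spirit of the setup in "The World As A Neural Network", not a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 16
w = rng.normal(size=(n, n)) / np.sqrt(n)   # trainable variables: the weights
x = np.zeros(n)                            # non-trainable variables: neuron states
x_in = rng.normal(size=4)                  # data clamped onto four boundary neurons
y = rng.uniform(-0.8, 0.8, size=4)         # target for four output neurons

def relax(x, w, steps=10):
    """Fast activation dynamics: states settle while weights stay fixed."""
    for _ in range(steps):
        x = np.tanh(w @ x)
        x[:4] = x_in                       # clamp the boundary to the data
    return x

def learn(x, w, lr=0.05):
    """Slow learning dynamics: one gradient step on the weights only."""
    h = np.tanh(w @ x)
    delta = np.zeros(n)
    delta[-4:] = (h[-4:] - y) * (1 - h[-4:] ** 2)   # chain rule through tanh
    return w - lr * np.outer(delta, x)

for _ in range(500):
    x = relax(x, w)       # non-trainable variables move on the fast scale
    w = learn(x, w)       # trainable variables move on the slow scale
print("output error:", float(np.sum((np.tanh(w @ x)[-4:] - y) ** 2)))
```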
Starting point is 01:36:59 And it is correct to call them hidden variables, because hidden variables is one of those so-called interpretations of quantum mechanics; it's an attempt to actually make quantum mechanics less mysterious. It comes with its own problems, and we can certainly talk about them,

Starting point is 01:37:17 but at least it tries to put less stuff under the rock. We physicists keep doing that: keep lying, in a good sense, without saying that we are lying. We're not doing it intentionally; we're simplifying things, right? But one of those things is that we can say something like:
Starting point is 01:37:35 I believe in the many-worlds interpretation. But once you corner the person, he will admit that it's not as clear and that there are problems. So, yeah. If the universe is learning, what is it learning toward? Right. So if the universe is learning and there is nothing but the universe, that's it, that's all there is; it's unsupervised learning.
Starting point is 01:38:05 And so the only thing it can learn: every subsystem can learn the rest of the universe. So you put an arbitrary boundary: me and the rest. So I am a subsystem. Well, I can try to learn about myself if I'm unconscious,

Starting point is 01:38:21 like, you know, in a coma; maybe I can do that. But learning the rest is what I will be interested in doing. And that will actually help me to survive. The more I learn... now we are moving to the biology level, where we've done a lot of work
Starting point is 01:38:39 trying to understand how exactly this works. But basically, you know, an organism has to learn its environment, model its environment, in order to better predict how the environment will behave. And once it's able to predict it, it's more likely to survive. So this is what you have to do in order to survive. This is actually,
Starting point is 01:38:59 well, first of all, it's similar, of course, to natural selection, but it's also similar to the phenomenological model that Karl Friston is constructing, where he's trying to say: okay, well, let's define some phenomenological function; maybe our ability

Starting point is 01:39:19 to predict the state of the environment is what matters. And what I'm adding to this story is: yes, that's great, but let's actually dig deeper and give a microscopic interpretation of that. It's like, you know,
Starting point is 01:39:35 you can have thermodynamics, but then you can have a derivation of thermodynamics from statistical mechanics. So I'm going to say: well, you can also derive it through the statistical mechanics of how neural networks work. So that's the idea. Then, coming back to your question: every subsystem is learning the rest, right? And we are not different; our cells are not different: each cell is learning

Starting point is 01:40:03 how to fit best into the organism, so as to optimize its own loss function. And society isn't different. It's still learning, just on different scales, and so the language in which we describe it changes. You know, addressing the physicist audience: there is an RG flow. The loss function changes

Starting point is 01:40:25 as you start renormalizing and generalizing the concept of neurons. So at a small scale, it can be fundamental neurons. At a bigger scale, it can be sub-networks, like particles. Then you can have something like cells, people, you know, civilizations and societies, like that.
Starting point is 01:40:49 If I recall correctly, your second law of learning is that learning efficiency is proportional to the Laplacian of the free energy. Is that correct? So, it's just like in standard physics: we can derive thermodynamics in very, very simple limits. Unfortunately, we physicists are only good at doing Gaussian integrals and doing calculations for very simplified systems. And for those systems, where you can simplify, you can do those calculations. You can show that it's related to the Laplacian of the free energy, where the free energy is actually defined microscopically: you start with an energy-like function, which is the loss function,

Starting point is 01:41:36 okay, and then from that, you define the free energy. You do not start from the phenomenology; you start from the microscopic side. But in that particular limit, that was the answer. In more complex systems we're not dealing with Gaussians, and in the critical systems that we discussed, we are not dealing with Gaussians. So there are many, many scales that are important, and in those limits things are much more complex, and you cannot really give this formula and say it's exact. I see. I see.
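Spelling out the statement as given here, in my own notation (the precise definitions live in the papers): the loss plays the role of an energy function, the free energy is built from it by integrating out the non-trainable variables, and in the Gaussian limit the learning efficiency goes as its Laplacian with respect to the trainable variables.

```latex
% Hedged sketch of the Gaussian-limit statement; the notation is illustrative.
% H(x, q): loss ("energy") over non-trainable states x and trainable variables q.
Z(q) = \int e^{-\beta H(x,\,q)} \, dx ,
\qquad
F(q) = -\frac{1}{\beta} \log Z(q) ,
\qquad
\text{learning efficiency} \;\propto\; \nabla_q^2 F(q) .
```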
Starting point is 01:42:08 So yeah, in this limit, the Laplacian of the free energy was the answer, and more generally it may, you know, have lots of possible corrections. Or, as happens in perturbation theory, it may be that they're not just corrections but dominate everything. And then, yes, your zeroth order is wrong: it's a non-perturbative limit, and then the answer is completely wrong. And there are reasons

Starting point is 01:42:29 to believe that the system is, in this sense, non-perturbative, because of the criticality that we observe. So there are symmetry-breaking transitions taking place; there are lots of complex things that, of course, I won't be able to talk about here. But yeah, analytical calculations are hard, and I guess without them we are not going to understand what's actually happening and what the relevant language is for describing different phenomena. Is Karl Friston's free energy an independent claim from yours, or does it emerge from your framework? No, I completely agree with him.
Starting point is 01:43:06 So there has to be a free energy in this setup. It's a phenomenological way of saying that there is a function that you will be, you know, minimizing, optimizing. And that's right; I mean, you started at the beginning with: how come classical mechanics also optimizes? Yeah, but that only applies when you're close to the equilibrium, in the sense of the learning dynamics. And it's the same here. So he has a phenomenological model

Starting point is 01:43:37 that is very intuitive and very nice, and it says that such a function must exist; it's a kind of existence statement. Now, he doesn't start with a learning theory; he starts with his understanding of how organisms behave. It makes sense. Makes total sense. Now, I'll give you an example of how
Starting point is 01:43:58 his free energy may be corrected to be much better. For example, you can have an organism that isn't interested in predicting the environment but is interested in quantizing gravity. Okay, so that organism will spend all its resources, maybe locked in a room with no windows, trying to quantize gravity, writing equations.

Starting point is 01:44:20 So this organism will have its own free energy. It will not be the one that tries to predict the environment; well, maybe a little bit, I mean, it wants to make sure that it survives. So at the higher levels, the free energy can be different for different organisms. The question is whether you can always derive it from the microscopic dynamics. So can you do this RG flow and actually derive it, starting from some microscopic loss function? It's an open question.
Starting point is 01:44:55 The only thing I can really add to the free energy principle that Karl Friston is advocating is that we can model it using these trainable and non-trainable variables, and think about what you get once the non-trainable ones are integrated out and you pay attention to a handful of trainable variables. And then the system becomes something you can calculate, and then you can model. And you can model it phenomenologically: if you have a controlled enough experiment, you don't care what microscopic physics gives rise to this free energy. You can just calculate it by seeing how the system responds to a changing environment.

Starting point is 01:45:35 For example, you say: I'm interested in how the system responds to sound, or to light, or to temperature. And then you just model it as a function of those parameters. And that may hint at how such a system would emerge from some microscopic dynamics. So I don't think it's a contradiction of what Karl Friston says. I'm saying that we can dig deeper if we assume that there is this learning dynamics happening on all scales.
Starting point is 01:46:30 What's a piece of advice that you found inspirational, that you keep coming back to? Advice that somebody gave me? It could also be something you read in a book.
Starting point is 01:46:50 It could be from an advisor. It could be from a movie. Something you found that's helped you. Yeah. Well, one piece of advice, as I said, and it was very, very useful to me, and actually maybe not something I would advise students to do, but it worked for me, is to doubt everything. So do not trust

Starting point is 01:47:18 anything that is relevant for your work until you try to do as much of the calculation yourself, as much of the understanding yourself, as you can. Now, why is this bad advice? Because it may not be optimal for a student who is, you know, trying to get a professor position, a tenure position or whatever. If you are going to be doubting everything and doing all the calculations yourself, you may just not publish; publish or perish, right? So that's bad advice in that sense, but it is something I couldn't avoid, could not not do.
Starting point is 01:47:51 So once I figured out that there is no problem that I cannot solve myself, I said, okay, I'll be doing that. And of course, I haven't done all the calculations; the arXiv is full of calculations people have done. But as much as I can. So doubting is, I think, one piece of advice. And then there's a more concrete piece of advice that was given to me by my advisor, Alex Vilenkin. I would come every day with some new idea,
Starting point is 01:48:22 and he gave me advice that somebody gave to him; I don't know how long the chain was. And the advice was: you come up with some idea, some theory, with some equations, and then the next day you should try to criticize it as much as you can. So you flip, and, you know, try to act as if you are an opponent

Starting point is 01:48:42 to that idea, and keep this odd-and-even-days state of mind. It's very helpful to look really objectively at what you have done and say: no, no, no, I don't like how you've done this. I'll play devil's advocate, right? I'll try to disprove it and find all of the problems with it. And that's why, whenever I'm talking to you, I'm saying: look, I know why this may or may not work. I'm not trying to sell you a used car without telling you that something is broken, because that wouldn't be fair. It wouldn't feel right,
Starting point is 01:49:18 and I wouldn't be confident about the calculation that I had done. So, yeah: constantly flipping like this. Why is this wrong? One day you come up with ideas, do calculations; the next day, try to criticize them as much as you can. And maybe one last statement here. ChatGPT is horrible at doing that, you know, and the other language models too. They tend to agree with everything you say. And so my advice is

Starting point is 01:49:46 not to use ChatGPT for correcting your work without you verifying it. Using it for correcting your work is fine; using it for suggesting ideas is fine. It's an excellent tool; we just don't know how to use it yet. We are students of the LLMs. Once we learn how to use it... but trust, but verify, as we say in Russian. You should always verify,
Starting point is 01:50:15 to the point that you redo the same calculations many times, because, honestly, how many times do we make mistakes when we do calculations? Well, I make mistakes all the time. You know, we all make mistakes. So you should keep questioning. And so it's related to doubt, but doubt even your own ideas. I guess that's my advice. It was given to me.
Starting point is 01:50:39 Professor, thank you so much for spending two hours with me. It was two hours. Yes. It went by like that. It went by so quick. But space and time don't exist anyhow, at least not fundamentally. Right, right. Well, Curt, that was fun.
Starting point is 01:50:54 That was a lot of fun. I appreciate you having me on your podcast. It was very nice talking; very nice questions, by the way. I have so many more. Let me just, full disclosure: in addition to providing information, it was an experiment for me, because from the time I conjectured that

Starting point is 01:51:16 the world is a neural network, every time I talk to a person I conduct an experiment: how does this person react to what I say? And so you've been a great opponent, a great person to talk to, to actually... A great guinea pig. Yeah, to actually...
Starting point is 01:51:32 So I've been experimenting with you, whether you know it or not. And at some point, when I'm constructing not just a theory of biology, which we've done, but a theory of psychology, I might be using some of this discussion as experimental evidence of certain psychological phenomena. All right. I'll take that as a compliment. It is. It is. No, no, it was really great. I mean, it was exceptional. I'm very happy with the questions. It was very good.
Starting point is 01:52:03 Thank you. I'm being honest when I tell you this: I've given interviews on podcasts where the people were not equipped at all with any of the physics lingo, and they just pushed their own worldviews without trying to understand what I am trying to say. And that was really torture for me. Because, you know, the point of the podcast, well, I think the point, is to try to do both:

Starting point is 01:52:39 try to understand what I'm trying to say, and then try to point me in the right direction. So that's what I appreciate. And I've had good experiences with people who have actually, you know, done their homework, and it was obvious. You have a physics degree. That helps a lot, because certain things I may say between the lines, and you would, you know, pull me back and say, okay, clarify that. So that was very useful. And I think, thank you for that. It was great. A great experience. All right.
Starting point is 01:53:09 Okay, take care, sir. And I'm sure we'll talk again. The audience is going to love you, I guarantee you. Hi there. Curt here. If you'd like more content from Theories of Everything and the very best listening experience,
Starting point is 01:53:22 then be sure to check out my Substack at curtjaimungal.org. Some of the top perks are that every week you get brand new episodes ahead of time. You also get bonus written content exclusively for our members. That's C-U-R-T-J-A-I-M-U-N-G-A-L dot org. You can also just search my name and the word Substack on Google. Since I started that Substack, it somehow already became number two in the science category. Now, Substack, for those who are unfamiliar, is like a newsletter, one that's beautifully formatted, with zero spam. This is the best place to follow the content of this channel that isn't anywhere else.

Starting point is 01:54:07 It's not on YouTube. It's not on Patreon. It's exclusive to the Substack. It's free. There are ways for you to support me on Substack if you want, and you'll get special bonuses if you do. Several people ask me, like: hey, Curt, you've spoken to so many people in the fields of theoretical physics, of philosophy, of consciousness. What are your thoughts, man? Well, while I remain impartial in interviews, this Substack is a way to peer into my present deliberations on these topics. And it's the perfect way to support me directly. curtjaimungal.org, or search Curt Jaimungal Substack on Google. Oh, and I've received several messages, emails, and comments from professors and researchers saying that they recommend Theories of
Starting point is 01:54:58 everything to their students. That's fantastic. If you're a professor or a lecturer or what have you, and there's a particular standout episode that students can benefit from or your friends, please do share. And of course, a huge thank you to our advertising sponsor, The Economist. Visit Economist.com slash Toe, to get a massive discount on their annual subscription.
Starting point is 01:55:23 I subscribe to The Economist, and you'll love it as well. Toe is actually the only podcast that they currently partner with. So it's a huge honor for me, and for you, you're getting an exclusive discount. That's Economist.com
Starting point is 01:55:37 slash tow, T-O-E. And finally, you should know this podcast is on iTunes, it's on Spotify, it's on all the audio platforms. All you have to do is type in theories of everything and you'll find it. I know my last name is complicated, so maybe you don't want to type in Jai Mungal,
Starting point is 01:55:55 but you can type in theories of everything and you'll find it. Personally, I gain from re-watching lectures and podcasts. I also read in the comment that Toe listeners, also gain from replaying. So how about instead you relisten on one of those platforms like iTunes, Spotify, Google podcasts?
Starting point is 01:56:11 Whatever podcast catcher you use, I'm there with you. Thank you for listening.
