Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 266 | Christoph Adami on How Information Makes Sense of Biology

Episode Date: February 19, 2024

Evolution is sometimes described -- not precisely, but with some justification -- as being about the "survival of the fittest." But that idea doesn't work unless there is some way for one generation to pass down information about how best to survive. We now know that such information is passed down in a variety of ways: through our inherited genome, through epigenetic factors, and of course through cultural transmission. Chris Adami suggests that we update Dobzhansky's maxim "Nothing in biology makes sense except in the light of evolution" to "... except in the light of information." We talk about information theory as a subject in its own right, and how it helps us to understand organisms, evolution, and the origin of life. Blog post with transcript: https://www.preposterousuniverse.com/podcast/2024/02/19/266-christoph-adami-on-how-information-makes-sense-of-biology/ Support Mindscape on Patreon. Christoph Adami received his Ph.D. in physics from Stony Brook University. He is currently professor of Microbiology and Molecular Genetics as well as Physics and Astronomy at Michigan State University. Among his awards are the NASA Exceptional Achievement Medal and the Lifetime Achievement Award from the International Society for Artificial Life. His new book is The Evolution of Biological Information: How Evolution Creates Complexity, from Viruses to Brains.

Transcript
Starting point is 00:00:34 Hello, everyone. Welcome to the Mindscape podcast. I'm your host, Sean Carroll. Long-time listeners will know that I've long been a fan of Las Vegas. I'd like to go to Vegas, play poker, eat, shop, whatever, just relax and have a good time.
Starting point is 00:00:59 Haven't been able to do it quite as much. Now that we're on the East Coast, that's okay. There's always pluses and minuses. Now we can go to New York. It's like the Las Vegas of the East. But anyway, one of the reasons why I like Las Vegas is in addition to the stereotypes. So the stereotypes are absolutely there, right? There are long rows of slot machines with dull-eyed people, mindlessly pressing buttons hoping to get rich someday.
Starting point is 00:01:25 It's kind of depressing, that part. But there's a lot of weird stuff in Vegas if you look closely enough. So by Vegas standards, one of the weird things that I really like is there's an outpost of Bauman Rare Books. This is a well-known rare book store with headquarters in New York, but they have a Vegas store in the Palazzo Hotel Casino, where I used to stay sometimes. And it's just weird, among all the glitz and glamour, here's rare books. And they actually have a lot of first editions of things, but also some weird sciencey books. They have a signed Albert Einstein, for example. And I was very amused to see in
Starting point is 00:02:03 Bauman Rare Books in Vegas, a copy of Claude Shannon's master's thesis. Claude Shannon, famous 20th century scientist for many reasons, and his master's thesis, I didn't actually know this before I saw it in the bookstore, is very famous. It's been called by Howard Gardner, the most influential master's thesis of all 20th century science. This is a master's thesis that has its own Wikipedia page, okay? And it's about basically using Boolean analysis to model and improve the efficiency of circuits and things like that. Pretty awesome for a master's thesis kind of thing. But for my purposes today, I'm just using this as a somewhat strained segue into, of course,
Starting point is 00:02:48 Claude Shannon's big contribution, even bigger than the master's thesis, came 10 years after, when he essentially invented information theory. He wrote a paper called "A Mathematical Theory of Communication," and like many great scientific breakthroughs, theoretical breakthroughs, this was driven by a down-to-earth technological practical need. Shannon was interested in sending signals across the Atlantic Ocean in wires, and you want to send a signal in a way that is efficient, right? That is least likely to be garbled or to lose the information, and to do that, to figure out how to do that, you need to have a mathematical way of characterizing what you mean by information. And it turns out, as he discovered, by doing the math, that it's very, very similar to the formula for entropy in statistical mechanics. So there was clearly some interesting hidden relationship here between information and communication on the one side and statistical mechanics and thermodynamics on the other side.
Starting point is 00:03:50 And of course, so this is the middle of the century, 1948. This idea has blossomed quite a bit. You hear about information all the time. You hear about it in the political arena, right? Misinformation, disinformation. You hear about it in technology. Remember the information superhighway? That was a big thing.
Starting point is 00:04:08 And you must have heard that information as a concept is useful in biology, in neuroscience, quantum mechanics, quantum information, and so on. So today on the podcast, we have Christoph Adami, who is like many Mindscape guests: he started out his career as a relatively traditional physicist doing nuclear and particle physics. And now he's a professor at Michigan State, not only in physics, but also in, I think, the molecular biology department, but he studies evolution and life, including artificial life. And he has a new book out called The Evolution of Biological Information: How Evolution Creates Complexity, from Viruses to Brains. So I'm going to use this as an excuse to really get clear on what is this information theory stuff. I mean, by the way, the book is a little technical. I'm not going to advocate it for people who don't like equations. There are some equations there, but the equations are quite mild. You know, if you're the kind of person who doesn't read equations every day, but doesn't blanch when you see one, it might be a good book to pick up. And we're going to talk
Starting point is 00:05:13 about what information theory is in general, and specifically how biologists or people who care about biology, even if they're physicists, are going to use information theory to better understand things like evolution and even the origin of life. I would say that 75 years later, the idea of information theory we got from Claude Shannon is still not settled in. You know, we're still working through the basic implications of this. So it's very exciting. It's fun. It's a new lens with which to look at the world in a detailed way. And, you know, that's all we like to do here at the Mindscape Podcast.
Starting point is 00:05:46 So let's go. Chris Adami, welcome to the Mindscape Podcast. Glad you're here. You know, let's start with this idea of information theory. It's always going to be a challenge when you have an idea like information theory that uses a natural language word that people are familiar with, right? Information. They think they know what information is. But now we have math behind it. We've talked about information theory a lot on the podcast without ever just taking a breath and saying what it is. So do you have a best way, a favorite way of explaining information theory to the person on the street? I do. It's actually strange that it's a bit of. In this case, the mathematical understanding of the word information is actually very similar or very close to our understanding, generally speaking, in our normal everyday usage. So to really define information mathematically or even in words, we can just say that information is that which allows someone who has that information to make predictions with accuracy better than chance.
Starting point is 00:07:09 Right. So there's a system that I know something about, and that means I can actually predict some of its states. For example, you know, where I can find coffee with accuracy better than chance. If I would just randomly go somewhere, well, I would most likely not end up in a coffee place. But if I have information, that means I can actually say, hey, I have to go to this particular place. And that's how we understand it normally in our everyday language. And it is also exactly how it is defined mathematically. Mathematically information is that which is used to make predictions. But it has to be better than chance to be information.
Starting point is 00:07:51 Otherwise, it's just a guess, right? And so because making predictions with accuracy better than chance is really powerful, that means it's also powerful in biology. But of course, it's also powerful, let's say, in the stock market. If I have inside information, I can get rich using information because I can make predictions about, for example, the stock price with accuracy better than chance. Right. And Shannon, when he defined this concept in 1948, he defined it precisely that way, even though he didn't use exactly those words. And as a consequence, there's a lot of misunderstanding about what information really is.
Starting point is 00:08:28 It's often confounded with entropy, which is in a sense the absolute opposite of information, namely it is about what we don't know. Right. And so if you then confound information and entropy, then you're going to get a lot of misunderstandings. But didn't, I completely agree with you, by the way, but didn't Shannon himself do that? I mean, didn't he, you know, use the word information, like maximal information for him in a code or an alphabet is maximum entropy, right, when every symbol is equally likely? You are right.
Starting point is 00:09:01 He has done that. I should say not so much in his original article, but you know, there was a book written or co-written, but I forget if it's Shannon and McMillan or something, Shannon and Weaver. In particular, in the book, Shannon and Weaver, the words entropy and information get mixed up. And I have a suspicion that this was mostly Weaver doing that. I would have to actually look at the original article if Shannon himself does that, because in a sense, you know, he should know better. You know, just because I saw so much of the confounding between entropy and information, I felt compelled to just simply write an article called What is Information?
Starting point is 00:09:40 Its main purpose was to say, hey, guys, you know, you have to be careful. One is not the other. No, that's great. But I do sometimes wonder, you know, I agree with everything you just said. And I wonder if it's just my physics training. Because to me, thinking like Boltzmann would have thought, a low-entropy state, if you tell me that you're in a low-entropy macrostate, all the molecules are in one corner of the room, you've given me a lot of information, right? Low entropy is a lot of information, and that makes perfect sense to me. That's right. Low entropy is a lot of information. You can think of information as simply a difference between two entropies, namely the maximum entropy and the actual entropy. Okay, good. So that's why when your actual entropy is low, then your information is high,
Starting point is 00:10:29 because it's this difference. And the fact that it's a difference, by the way, is a fundamentally important thing, namely entropy itself, in a sense, has no real meaning or existence, just like energy doesn't have in physics. Only differences of entropy have a meaning, just like in physics, only differences between energies, you know, have any real life, you know, effect. And so the fact that it's a difference is important, but it's important that the first term is a maximum measurement. Like, how much there is to know about the system. And if you forget about that, then your information is just minus entropy or what a bunch of people have called negentropy, right? Which is nonsense. It's forgetting the first term, which is the maximum entropy.
Starting point is 00:11:10 So that this thing, which is information, is always positive. So basically, in this way of talking, you would say, when I have some configuration and I have some statistical knowledge about it, I can divide the state into sort of the entropy part, which I don't understand very well. and the information part which specifies something. Right, or very simply speaking, the, you can, for every system, write down or understand what is the maximum uncertainty that you have. It's just really counting the number of degrees of freedom that there are and then taking the log of that, right?
Starting point is 00:11:48 And then you ask yourself, well, how much entropy do I actually have? In other words, do I have a probability distribution that is different from the maximum entropy distribution? And if the answer is yes, then that means already you know something. That means that the actual distribution has a lower entropy and then you have some information. However, you may not know at that point exactly what this information is about. In a gas, of course, very often you do, just like you said, when you have a bunch of molecules in a corner, well, then you have information about where these molecules are.
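The recipe just described (count the states, take the log to get the maximum entropy, then subtract the entropy of the distribution you actually have) can be written out in a few lines. This is only a minimal sketch, not anything from the book: the function names `entropy_bits` and `information_bits` and the four-state example distributions are made up for illustration, and the state space is assumed to be finite and discrete.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_bits(probs):
    """Information as (maximum entropy) minus (actual entropy).

    The maximum entropy is the log of the number of possible states,
    i.e. the entropy of the uniform distribution over those states.
    """
    h_max = math.log2(len(probs))
    return h_max - entropy_bits(probs)

# Hypothetical four-state system (illustrative numbers only).
uniform = [0.25, 0.25, 0.25, 0.25]   # we know nothing: actual entropy is maximal
peaked = [0.97, 0.01, 0.01, 0.01]    # we are fairly sure which state it is in
certain = [1.0, 0.0, 0.0, 0.0]       # we know the state exactly

info_uniform = information_bits(uniform)
info_peaked = information_bits(peaked)
info_certain = information_bits(certain)

print(f"uniform: {info_uniform:.3f} bits")
print(f"peaked:  {info_peaked:.3f} bits")
print(f"certain: {info_certain:.3f} bits")
```

Because the actual entropy can never exceed the maximum entropy, the difference is never negative, which matches the remark that information defined this way is always positive.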
Starting point is 00:12:21 Of course, if you allow the molecules to disperse or to sort of like do their thing, what happens is that your knowledge about the positions of the molecules starts decreasing. And at the same time, of course, the actual entropy starts increasing, right? That is what we call the second law of thermodynamics. It's literally just the loss of information that you have. And of course, during that equilibration phase, the system is returning to its equilibrium, but it's not actually at this equilibrium. So in other words, whenever you have information, it gives you a hint that the system is not at equilibrium. It is away from equilibrium, and the amount of information is in fact characterizing how far away from equilibrium you are.
Starting point is 00:13:08 So when you, you know, when I'm speaking to you right now, you're actually an ordered state, a very ordered state, as far as the molecules are concerned. And I can tell you in a sense how much ordered state there is. It's about half a million bits. Why? Because in fact, that is the amount of information that's actually stored in your genome. Ah, okay.
Starting point is 00:13:29 This is probably going to be leaping ahead, but what about the particular configuration of my body? Certainly that is more than half a million bits. That's a good question. But since, in fact, that organization is done by the half a million bits,
Starting point is 00:13:46 my suspicion is that in fact that must be the same because how else would you achieve this ordered state? So this is actually an interesting question. I would have to sit down and think a little bit. My intuition is that even though it seems that, you know, that would be more than half a million bits, that in fact it is not. However, it's such a good question that I cannot actually tell you right now why. Good final exam question for the next time you teach a course based on your book. Yeah, that's actually a good point. I mean, after this interview, I'll try to sit down and figure this out.
Starting point is 00:14:19 Okay, that's a good one. And it's simply because the information cannot be anywhere else. The fact that you're ordered, the only thing that makes that possible is the information in your genome. And as a consequence, in a sense, mathematically it has to be like that, even though just like you said, that seems low. It does seem low. I like to think I convey more information. I mean, certainly in my brain, there are neurons with certain weights and things like that, and there's probably more than a half a million bits there.
Starting point is 00:14:53 Now, you're making another point. Of course, there's information that you have acquired during your lifetime that is not being counted. That is definitely there. Good. However, if you take, let's say, a brain, before it learned something, let's say a baby brain, physically speaking, of course there are differences. Yeah, but I mean, like the information is stored in the weights of the neurons in the brain in some way. So definitely that is in addition, right?
Starting point is 00:15:26 However, that, of course, does not give you your physical appearance. It doesn't change the organization of the rest of your body. It's an interesting train of thought, no doubt. Either a final exam question or a paper to be written here, I think, about this. How many bits am I?
Starting point is 00:16:27 discrete packaging. Do your research. Ask questions. Then visit orderlymeds.com slash podcast for an exclusive offer. That's orderlymeds.com slash podcast. Individual results may vary now. Medical advice, eligibility, required C-Sight for details. But, okay, something very interesting and important happened just now, you know, in your discussion of information, because I think that if you asked people on the street, you know, do you have information about where your car keys are?
Starting point is 00:16:55 They would think about that as a yes, no question. I know where they are or I don't. And you've sneaked in the idea of a probability distribution, which is kind of central here. Right. So how does that come in? How central is probability to information theory? Well, actually, it is the central aspect of it. So before you can really understand information theory,
Starting point is 00:17:19 you should have a very basic course in probability and statistics. Now, to tell you the truth, not everybody who's working in the field did have that, and that shows. But simply because the concept of probability, actually, to fully understand it, is not as simple as just writing down, you know, P sub I or some distribution, you know, what a probability is, and I know you're
Starting point is 00:17:43 familiar with this because you've thought about this in the context of quantum mechanics for a long time, is, you know, not immediately trivial. Some people think, oh, a probability exists by itself. Or other people say, no, a probability is just simply something that you can estimate, but that itself in the sense really doesn't exist. But the idea of prediction, of course, is central to that. So when you are asking yourself, can I predict with accuracy better than chance, then you have to, in a sense, have an idea of, well, what is the chance prediction here? And for that, in a sense, you need a prior.
Starting point is 00:18:28 You need an assumption about what is expected, right? And so, if you're going to talk about a probability distribution, then you're really talking about what am I expecting. So let's go to the example of the car keys, right? So generally speaking, we could say, okay, here's a person who's looking for their car keys. They know, with 90% of the chance from past experience, right, we're setting up the priors now. From past experience, 90% of the time it's in my pocket. But if it's not in my pocket, it could be in 10 different places. Yeah.
Starting point is 00:19:09 Right. If that is the case, you can set up, I just set up the probability distribution for you. Because that means that there are 11 possible places that the keys could be in, right? And the chance to find it in your pocket is 10%, sorry, 90%. And then in each of the 10 other places, the chance to find it is one out of a hundred. Right. So now I've set the expectation before I've done any experiment.
Starting point is 00:19:36 Right. Now I'm doing an experiment. And of course, what experiment are you going to be doing? You're going to check your pocket first. Yes. Right. And now what happens is that
Starting point is 00:19:47 this measurement is going to give you information, right? And in fact, we call this, you know, this outcome of this measurement, it's called a specific information. Okay.
Starting point is 00:20:01 It's a specific information because I can do 11 measurements, and the average of the outcomes of that, that's actually the information gained from the measurement of the locations. The interesting thing here is that if I don't find the key in my pocket, then I have a problem. Now it could be in 10 different places. So it turns out that the specific information of this measurement, namely that the key is not in your pocket, it's negative. Yeah. It's a negative specific information,
Starting point is 00:20:33 even though the average information gained from the car key searching measurement is still positive. Because you're now less sure of where the keys are than you were before. Exactly, exactly. So it's intuitively clear. It's like, oh my God. It's like, I thought I knew. See, that's the thing.
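The car-key numbers above can be checked directly. The sketch below is an illustration under stated assumptions rather than the book's own exercise: it assumes that the specific information of an outcome is the prior entropy minus the entropy that remains after seeing that outcome, and it models only the single "check the pocket" measurement (not all 11 possible measurements). The variable names are hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Prior from the conversation: 90% pocket, 1% for each of 10 other places.
prior = [0.9] + [0.01] * 10
h_prior = entropy_bits(prior)

# Outcome 1: the key is in the pocket, so no uncertainty remains.
posterior_found = [1.0]
# Outcome 2: the key is not in the pocket, so it is equally likely
# to be in any of the 10 remaining places.
posterior_not_found = [0.1] * 10

# Assumed definition: specific information = prior entropy - posterior entropy.
specific_found = h_prior - entropy_bits(posterior_found)
specific_not_found = h_prior - entropy_bits(posterior_not_found)

# Average over the two outcomes, weighted by how likely each one is.
average_information = 0.9 * specific_found + 0.1 * specific_not_found

print(f"prior entropy:             {h_prior:.3f} bits")
print(f"specific info (found):     {specific_found:.3f} bits")
print(f"specific info (not found): {specific_not_found:.3f} bits")
print(f"average information:       {average_information:.3f} bits")
```

Under these assumptions the "not in my pocket" outcome comes out negative, because the remaining uncertainty (ten equally likely places) is larger than the uncertainty you started with, while the weighted average over both outcomes stays positive and equals the entropy of the 90/10 yes-or-no question itself.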
Starting point is 00:20:47 I thought I knew means that's the priors. From the priors, you had a pretty good idea. Then you make your measurement and you go like, oh, my God, now I don't know anything. And this is mathematically. In fact, it's an exercise in my book in chapter three. Very good. And let me just raise, again, I'm super on your side, but it's my job to, you know, play the devil's advocate sometimes. People worry about this way of thinking about information because it sounds a little subjective.
Starting point is 00:21:17 It's not out there in the world. You're talking about information as an ability for some agent to make predictions, right? Is it truly objective or is it only relative to the capacity? of some thinking being? So you're asking, of course, the most important question about information. And the answer is information is contextual, always. It is never absolute.
Starting point is 00:21:40 And now I'm going to give you a great example that will get you to think about this. Suppose you are a virus, and you are, in fact, wreaking havoc inside of a patient. You are replicating fast, everything is going good for you, right? Let's imagine your mutation rate
Starting point is 00:21:58 is actually kind of low, so you don't change much while, so the information that you have about doing your job is doing its job, right? Now the patient takes an antiviral, right? The patient takes an antiviral and suddenly this virus goes like, man, you know, this replicating thing is now really, really, really hard, right? And what you can now do is you can calculate the information about how to do this job, and it has changed dramatically. In fact, now the virus does not have this amount of information about doing its job anymore. And you go like, but the sequence is the same. How can that be if the sequence is the same?
Starting point is 00:22:39 Earlier, I had lots of information and now I have very little information. The answer is because the context changed. Because that which you have information about has changed. So what was the environment yesterday isn't the environment today. You have information about yesterday's world, not about today's world. And that is the important thing. Information is completely contextual. The same sequence evaluated in different environments will have different information and therefore different meaning.
Starting point is 00:23:11 A lot of time people are saying like, oh, information theory can't deal with meaning. It's absolutely untrue. My molecular sequence means something here on earth actually, you know, above the surface of the water, for example. You put me underwater and my sequence is making wrong predictions, right? Namely about how to breathe. No, underwater, that thing doesn't work. Your fitness is zero. I will have zero offspring underwater
Starting point is 00:23:38 unless I actually am in a submarine or something like that, where I'm taking the environment with me, right? Then I could still have information. So very, you know, very nearly so. In biology,
Starting point is 00:23:54 information and fitness are exchangeable. You have high information about how to do something, how to survive in the environment, means having high fitness. They are related. In fact, they are mathematically related, and we don't have to talk about what that relationship is, but you can read about it in my book. So, in other words, what happened in evolution is that fitness and information are
Starting point is 00:24:15 essentially intertwined. They are almost the same thing. And so the way you are increasing your fitness is you're increasing the information about how to live in that environment. I guess I was trying to make a similar point to what you just said in my book, The Big Picture. I mentioned the Voynich manuscript, right? This wonderful old book full of symbols.
Starting point is 00:24:38 And nobody knows whether it's nonsense or whether it refers to something. And so I asked the question, how much information is there in the Voynich manuscript? How would you answer that question? The answer is unless we know what the information is about, this is entropy for us. Right. So if you have text or any data, that is not information. Okay. In fact, it is literally just entropy in the sense that you are talking about,
Starting point is 00:25:07 oh, it represents an example out of another similar set of things, right? And so you're setting up your hypothetical ensemble so that you can talk about an entropy. The moment that you know what things you can predict with it, then it becomes information. If you do not know this, the same piece of information, or the same text, should I say, the same symbolic sequence would just be entropy to you. In other words, just to repeat it, because information allows, is that which allows you to make predictions about a particular system with accuracy better than chance. If you don't know what this is, it's not information. Another example is somebody, you know, hands you a subway chart, right? And he goes like, well, that's fantastic.
Starting point is 00:25:51 except I don't know which city this is. Well, then it's useless to you. And if it's useless, so in other words, useless information is entropy. Got it. The other idea at the technical level that I have found super important in my tiny little forays into information theory
Starting point is 00:26:07 is mutual information. I love this concept and it never gets explained at a popular level, so maybe you can give it a shot. Right. So first of all, information is really a mutual entropy.
Starting point is 00:26:21 It's a shared entropy. So in other words, the reason why me, if I have information, can make predictions about another system, is because, in a sense, we have certain correlations. We know something about each other, right? And so information, even though I showed you earlier that it's really a difference of entropy, you can also think of it as a shared entropy. The mathematics to show that this is one and the same thing, we're not going to go into it. But just trust me here that, you know, if you think about a Venn diagram about a system and the one who's trying to measure the system, the intersection of that Venn diagram, that's the information that the measurement device has about the system.
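The Venn-diagram picture has a standard textbook form: the overlap between two systems X and Y is H(X) + H(Y) - H(X, Y). The sketch below computes that overlap for a few made-up joint distributions of a one-bit system and a one-bit measurement device; the function name `shared_entropy_bits` and the example tables are hypothetical and are not taken from the book.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def shared_entropy_bits(joint):
    """Overlap of the Venn diagram: H(X) + H(Y) - H(X, Y).

    joint[i][j] is the probability that the system is in state i
    while the measurement device reads j.
    """
    p_system = [sum(row) for row in joint]        # marginal over device readings
    p_device = [sum(col) for col in zip(*joint)]  # marginal over system states
    p_joint = [p for row in joint for p in row]   # flattened joint distribution
    return entropy_bits(p_system) + entropy_bits(p_device) - entropy_bits(p_joint)

# Illustrative joint distributions (made-up numbers).
perfect = [[0.5, 0.0], [0.0, 0.5]]          # device always reads the true state
independent = [[0.25, 0.25], [0.25, 0.25]]  # device reading is unrelated to the state
noisy = [[0.45, 0.05], [0.05, 0.45]]        # device is right 90% of the time

info_perfect = shared_entropy_bits(perfect)
info_independent = shared_entropy_bits(independent)
info_noisy = shared_entropy_bits(noisy)

print(f"perfect device:     {info_perfect:.3f} bits")
print(f"independent device: {info_independent:.3f} bits")
print(f"noisy device:       {info_noisy:.3f} bits")
```

A device that tracks the system perfectly shares the system's full one bit, a device that is uncorrelated with it shares nothing, and a noisy device falls in between, which is the sense in which what is "not random between them" is the information.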
Starting point is 00:27:06 Right. That's a shared entropy. That shared entropy is, of course, the information. So when people say mutual information, they mean mutual entropy. So, in fact, nobody should be using that word. Okay. Like in my book, for example, when, you know, I have like an index where it says mutual information, I say see information. Because, you know, they are, you know.
Starting point is 00:27:32 But it's just for the reason that we talked about earlier: because people are using information and entropy synonymously, they have taken mutual entropy, which is an information, and called that mutual information, right. But in principle, they are one and the same thing. We just should call it information. It is that, you know, which correlates the two systems. Because remember, we always have to talk about two systems when we are talking about information, one that makes the prediction and one that is being predicted. Right. Right. And what is shared, the shared correlation between them, you know, in other words, what's not random between them, that is information. Good. So now we have a pretty firm grounding on information theory. We can move on to applying it to biology, which is what your book is all about. But let me just first ask the background question. How popular in biology is this task of thinking about things in terms of information theory? Is it everyone does it or no one does it or it is the new hotness and it's sweeping the field? Let me put it this way. If this was something that everybody does, I wouldn't have had to write this book.
Starting point is 00:28:46 Okay. This book really came out of a frustration that I have this tool, I see how valuable it is to understand essentially anything in biology and nobody is using it. When I say nobody, that's not quite true. There are a few people who do. Bill Bialek at Princeton, for example, has certainly done this and he does know how to do this. There's a few people, often in fact in Bialek's orbit, who have looked at information transmission in cells, for example, in gene regulation,
Starting point is 00:29:21 and calculated the channel capacity of a cellular communication channel and did this very well. And I, in fact, have these examples in my book. But it is woefully underrepresented. There is something about information theory that presents a barrier, a hurdle to, in a sense, to acceptance. And, you know, I only have a vague memory of when I first learned about information theory. It was, in fact, at Caltech, when a postdoc that was placed into my office, I actually first started explaining classical information theory to me because we were actually working in quantum information. And it was like, well, let's do the classical theory first.
Starting point is 00:30:05 And I remember when he was first writing these formulas on the board, which have these vertical bars and these colons and semicolons, I was thinking, like, what is this? This is not my mathematics. My mathematics, there's integrals, there's differentials, you know, there are matrices, and now I'm seeing these weird symbols and my brain doesn't really understand them. In a sense, from the cognitive science perspective, you don't have these representations that allow you to recognize these things and they have to build over time. And I think this barrier that I felt at the time,
Starting point is 00:30:49 And you're like, how am I going to keep them apart? And you've got to get used to them just like you got used to integrals and differentials and things like that, right? Like, if I ask anyone who had algebra, I think calculus, it's like, would you ever confound differentials with integrals? And it goes like, of course not, right? This is yes, because you got used to this stuff, right? So there seems to be this barrier.
Starting point is 00:31:12 And in particular, very often information theory is perceived as being an engineering discipline. Right. And they go like, well, the engineers, you know, like, well, they want to do error correcting codes. Like, what does this have to do with biology? In fact, I've seen this in print saying, like, you know, the engineering discipline of information theory has nothing to say about biology because these are very different things. It's like saying, well, information theory can't apply to physics when, in fact, we do know very well that it does. In fact, I just explained the second law of thermodynamics in terms of
Starting point is 00:31:42 information theory, right? And so this barrier is very palpable. So if somebody is an established scientist and you tell them, well, you know what, you should really be using information theory, they're not going to do it. They're entrenched in their ways and they usually are not receptive to learning, in a sense, a whole new bag of tricks. Right. It is not that different from what happened in black hole physics, where they constantly talk about information loss in black holes and not a single paper is trying to calculate the capacity of the black hole channel, except, of course, me. That's because, you know, these people have been working in this area for 30 years
Starting point is 00:32:29 and not picked up a single book or article on information theory. Again, there's this barrier. There's something like, this concept is different from the concept I need. And the answer is no, it is the same concept. You know, I'm a little bit sympathetic because I know when I read papers in economics, they have this habit of denoting variables in equations by words rather than by single letters. And it's so trivial and silly, but it truly rubs me the wrong way and I have trouble wrapping my brain around it. No, I fully agree with you because that is not how we write equations. Yeah. Right. And I still almost vividly remember this aversion where I'm like, this is not my thing, right?
Starting point is 00:33:18 But you have to power through this, right? And in my lab, everybody gets, you know, the basic introduction to information theory. And then once you've sort of internalized it, you cannot see the world except through that lens of information theory. It's like the hammer that makes everything look like a nail. Right. And in my book, in fact, I sort of co-opt the famous saying, which we all know in biology, which says nothing in biology makes sense except in the light of evolution, which certainly is true. But I basically have changed it around to say nothing in biology makes sense except in the light of information. Then, of course, information is that which evolves. Really, I mean, people would say, organisms evolve. No, what evolves actually is information. The information is what is essential in an organism. The organism itself isn't really that essential.
Starting point is 00:34:09 In a sense, it's replaceable, right? Like if we have offspring, they carry most of the information with them. But only the encoded information. Of course, what we think of as specific to ourselves is the stuff in our brain that we talked about 25 minutes earlier. The information that we acquired over a lifetime is, of course, what makes us special, right? The information in your genome does not make you special, but it is what makes you alive. So is it fair to think of Charles Darwin as an early information theorist? Well, I would say no, simply because he didn't really, and he had no way of understanding
Starting point is 00:34:54 that the basis of inheritance was really the replication of information. He didn't even know how any of the stuff was encoded, because after all that was discovered in 1958. Yeah. Or we should say that John von Neumann kind of figured it out a little bit earlier when he came up with his theory of self-replicating machines, which essentially could have told, you know, Watson and Crick where to look for, you know, stored information. But so Darwin did not know that, but he had, via his, in a sense, going on boats and looking at stuff, figured out, hey, there's variation going on, there's selection going on, and there's inheritance going on. The fact is that these three things are properties of information. So, inheritance being the replication of information, variance being the mutation of information,
Starting point is 00:35:54 the changing of information, and selection is the meaning of information. In other words, those, you know, pieces of information that have a lot of information are fitter and therefore will have more offspring, and therefore, in fact, because I told you about the relationship between information and fitness, the meaning of information is selection, right? So we can think of the entire evolutionary process in terms of what happens to information, and we should, in fact. But Darwin did not think that way, even though, of course, I haven't read all of his works, even though many of them are, you know, in my little natural history collection in my bookshelf,
Starting point is 00:36:32 but they are big, big tomes. I mean, one of the things that is so astonishing to me is that Darwin had in his head essentially, you know, a 12-book treatise and then was forced to publish the abstract of it as the first book, which is now, you know, The Origin of Species. But he had so much more to say. And then over the rest of his life, in fact, he said many of these things. And of course most of those things, you know, like about worms
Starting point is 00:37:01 and about plants and things like that, barnacles, we don't usually read. But in there, if you study those volumes, maybe with more attention to the concept of information, is the idea that what makes these plants how they are, in fact,
Starting point is 00:37:22 is making predictions. Here's in fact a good example. Darwin himself at some point noticed a particular orchid which had a very, very long neck, which he knew had to be pollinated.
Starting point is 00:37:38 And it was about 30 centimeters. What is that? It's about a foot. Right? Yeah. Or even longer, in fact. But basically he said, now I will make a prediction.
Starting point is 00:37:53 I will predict that there's a pollinator that exists with a proboscis, or a nose, you know, whatever they're using for that, which is about that length, right? And even though he didn't live to see that prediction come true, in fact, his competitor in the evolutionary field who almost scooped him, he in fact basically said, oh, and I'm going to tell you that this must be a sphinx moth. And in fact, they later found it after Darwin died, and it was exactly, you know, as he had predicted it.
Starting point is 00:38:24 So to some extent, you know, he knew that evolution, this idea, the theory of evolution, is a predictive theory. Right. Even though he didn't think of it in terms of information theoretical terms.
Starting point is 00:38:55 So, yeah, a typical evolutionary biologist would try to explain things in terms of there's a population with certain traits and there's a fitness landscape and they, you know, move towards peaks of the fitness landscape. And so you're not undermining that. You're just saying that a useful way of thinking about fitness is having the information to successfully predict what's going to happen in your environment and therefore survive.
Starting point is 00:39:34 That's right. I don't undermine any of the standard population genetics. It is really just a very different way of understanding what fitness is. After all, fitness, you know, it is not the same word as when we talk about physical fitness, of course, right? In biology, it means fitting your environment well, right? It means being adapted to your environment, which means corresponding to it, right? And see, we're saying, aha, that means in a sense that your body structure, you know, is predictive about what world you live in, right?
Starting point is 00:40:10 So, for example, E. coli bacteria, they grow best at 37 degrees Celsius. Well, that's weird, isn't it? Well, no, it's not weird. That's the temperature of our stomach. So basically, their molecular biology makes a prediction about what environment they're in, namely one at 37 degrees Celsius. If their prediction is off, well, that means, in fact, they don't have as much information as they think, and they're actually not going to grow as fast.
Starting point is 00:40:39 So this correlation between the genome and the environment gives you the fitness, because it tells you you're fitting this environment well, you're well adapted to it. And the concept of information is precisely that, this correlation. And this is a very important point because it reminds us that, just so people don't get the wrong idea, information isn't necessarily conscious, right? It's not something that you might say you know. The information in your genome is very well adapted to your environment, even though you personally might have no idea what the arrangement of nucleotides in your DNA is. That's right.
Starting point is 00:41:19 So obviously, you know, when we're talking about the fact that cells make predictions about their environment, which they do all the time because cells have to make decisions, it's not like they have a brain. Yeah. But to some extent, we also know that because once you understand that information is just a correlation, which is a non-random correlation, because sometimes you get correlation by chance. But no, we're looking for correlations that are not by chance. And that are on top of that, being maintained, right?
Starting point is 00:41:47 Because in thermodynamics, you might have correlations by chance, but they are going to disappear very soon, right? The genome is making sure that these correlations are being maintained so that we can continue using, you know, what we have to make predictions about the environment. If you would be loosening this continuous maintenance, then information would go away, and we call that death. Is that what we call death? Good, now that we know. But this does bring us to a very fun point you make in the book, which is that we can think of evolution as a kind of Maxwell's demon. And maybe it'd be fun to explain what Maxwell's demon is,
Starting point is 00:42:29 since not everyone knows, and what it has to do with evolution? Right, so that's a good point, and thank you for leading me into that, because it's one of my favorite parts of the book, actually. So Maxwell's demon, let's not talk about why this sort of devil was invented. But let's focus on what he does. So Maxwell's demon basically is sitting at the intersection of two, let's say, boxes.
Starting point is 00:42:57 And there's a little window between the boxes. And now both boxes have gas in them, gas molecules that have different speeds, you know, in fact described by the Maxwell distribution, right? Turns out. And now imagine that this demon who sits there operates sort of the door between the two boxes. And he also has a measurement device that he uses in order to measure the speed of a molecule, for example, one that is about to go through the door.
Starting point is 00:43:30 And then he goes like, okay, if this is a fast molecule, I'm going to let it go through, but if it's a slow molecule, I'm going to shut the door so that it is in fact going to be reflected. If he does that a lot, he's going to have one of the two boxes, I mean, or I should say the left box as opposed to the right box, with lots of fast molecules, and the other one will be, you know, stuck with all the slow molecules. That is in violation, in apparent violation, I should say, of the second law of thermodynamics, which basically says, no, that can't happen, namely the formation of a non-equilibrium
Starting point is 00:44:06 situation from an equilibrium. So mathematically, it looks like what the demon has achieved is, in fact, creating order or violating the second law of thermodynamics. The fact that he didn't do such a thing was in fact proven mathematically and fully correctly later by Rolf, what's his name? Landauer. Landauer, Rolf Landauer, a German-American physicist. And we're not going to go into how that proof goes, but it is not a violation of the second law of thermodynamics. But this idea that via measurement, you can actually reduce entropy is, of course, a very common one, because measurements give you information, and information is sort of the opposite of entropy.
Starting point is 00:44:56 So, yeah, if you, for example, look into a room of molecules and then essentially measure the speed and position of all the molecules you're looking at, you could, in principle, achieve a lot of order because you could punch them, so to speak, with a laser, and then get them all to go into one corner. So in other words, measurements do allow you to decrease entropy. Now that we have described the Maxwell demon, let's think about the Darwin demon. So in order to understand this, let's imagine that instead of these measurements that the demon does, we're basically doing mutations.
Starting point is 00:45:32 So the molecules are now mutations. Okay. So you have a genome and a mutation happens. The mutation happens. It's completely random. It could actually increase your fitness, or it could decrease your fitness. And now the demon basically says,
Starting point is 00:45:47 look, I'm actually closing the door and not allowing the mutations that decrease fitness to persist, but I'm going to keep those that actually are increasing my fitness. Right. So in a sense, the organism now performs a measurement, but it turns out, of course, that the mechanism of evolution
Starting point is 00:46:10 is precisely that way, namely the deleterious mutation will make it such that the organism carrying it doesn't have as many offspring, right? It's less fit. And as a consequence, its frequency in the population will go down, or else, if it's a lethal mutation, it will even disappear. Whereas the beneficial mutations will in fact be enhanced, right? So because the beneficial mutations create order, because there's now a type that increases
Starting point is 00:46:40 in frequency, maybe even very fast, to the detriment of all of the other sequences, well, that creates order in the population. And it's essentially because you learned something about the environment. You've extracted information about the environment by having this measurement of the environment. So in other words, these mutations that are being kept, they are measurements of the environment. So it's like, ah, 37 degrees Celsius.
Starting point is 00:47:06 Okay, so let's adjust our genome in such a way that we grow well, you know, at 37 degrees Celsius. Whereas the deleterious mutations, of course, are misinformation, right? And they reduce your fitness and as a consequence are thrown out of the window. That's the Maxwell demon closing the door on them. And what that means is that, well, if this is a continuous process, then it should lead to a constant increase of information inside of an organism over time as evolution proceeds. And therefore, we can actually formulate this as a law, namely the law of increasing information in evolution, which actually predicts that if you start out with a low-information beginning sequence,
Starting point is 00:47:56 that over time the information must increase, but not all the time. For example, if the environment changes, well, we already discussed that. Then your information drops, right? And you have to sort of relearn things, right? There are other ways in which information can get lost. For example, recombination can actually destroy information, right? It can also lead to a good thing. So this natural demon, this Darwin demon, is not perfect.
Starting point is 00:48:28 In a sense, it's leaky. It sometimes makes wrong decisions where information can actually go down instead of up. But overall, on average, this theory of evolution predicts that the amount of information that is stored in a population of genomes has to be going up. And that is very interesting because we've been searching for a way to understand the bio-complexity that we are seeing around us and asking, how on earth is all of this possible? And can I understand if one organism is more complex than another? And this idea of complexity, of course, is difficult to put in mathematics.
Starting point is 00:49:06 But in fact, it turns out that complexity is literally just information. In other words, complexity is just a proxy for information. Why? Complexity essentially is stuff that is complicated but helps you do something interesting, right? We don't associate complexity with something that is just structurally interesting, but actually doesn't do anything, right? What was called a spandrel by Gould and Lewontin in biology, right? Something that looks very complicated.
Starting point is 00:49:37 Once we understand it's actually just, let's say, an icicle, it's like, oh, it's not really that complex. But there are really complex things, for example, our brains. But that is in fact reflected by the information necessary to make it. So it turns out, therefore, that information is really the correct way of measuring complexity. And therefore, the question of whether complexity is increasing in evolution is simply answered as a yes, as long as you understand that complexity is really information, and the natural demon is responsible for that law of increasing information.
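The Darwin demon described above can be caricatured in a few lines of code. This is only a toy sketch under strong assumptions that are not from the episode: fitness is taken to be the number of genome positions matching a fixed target string standing in for the environment, there is a single genome rather than a population, and the demon is perfect, whereas the conversation stresses that the real one is leaky. All names (`darwin_demon`, `environment`, `history`) are hypothetical.

```python
import random


def darwin_demon(genome, environment, steps, rng):
    """Propose random point mutations; keep only those that do not lower fitness."""

    def fitness(g):
        # Toy fitness: number of positions where the genome matches the environment.
        return sum(a == b for a, b in zip(g, environment))

    genome = list(genome)
    history = [fitness(genome)]
    for _ in range(steps):
        mutant = genome.copy()
        mutant[rng.randrange(len(genome))] = rng.choice("ACGT")
        if fitness(mutant) >= fitness(genome):
            genome = mutant  # the demon lets this mutation through the door
        # otherwise the door stays shut and the mutation is discarded
        history.append(fitness(genome))
    return "".join(genome), history


rng = random.Random(0)
environment = "ACGTACGTAC"  # hypothetical stand-in for "the world to be predicted"
start = "".join(rng.choice("ACGT") for _ in environment)
final, history = darwin_demon(start, environment, steps=500, rng=rng)
print(start, "->", final)
print("fitness went from", history[0], "to", history[-1])
```

Because the filter only ever rejects harmful changes, the match between genome and environment can never go down in this sketch. A leaky demon, a changing environment, or recombination would each need extra machinery that is deliberately left out here.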
Starting point is 00:50:16 Good, because naively we might look at an organism and try to figure out, oh, that looks complicated, that looks pretty simple, and we wonder why all this complexity has grown, but you're saying that that's just a pale reflection of what's really going on underneath the hood. That's right. But it's almost like saying, like if I look at a sequence, right, somebody's genome or some animal's genome or a bacterial genome, I can't immediately see, well, this is information, this is not information,
Starting point is 00:50:44 here's some information, they look the same, right? So we need to figure out how they're making predictions, or in other words, how are they functioning in the world? So if you have something that looks complex, then you have to ask yourself, let's observe this thing and see whether what looks like such an intricate mechanism is in fact necessary for survival,
Starting point is 00:51:13 or whether it is just something that is a consequence of something else. We see sometimes these complex display behaviors in animals, and they are obviously important in mating, and if you would remove them, in fact, your fitness drops to zero, because you don't get to mate: zero offspring, zero fitness. But then there are other things that are in a sense just not important for the actual survival. You would take them away, and it doesn't change the fitness.
Starting point is 00:51:37 It's like saying you're removing a section of the genomic code, but if that is not predictive of anything, if it does not carry any information, it is not important for fitness. You can take it away. And so, if I understood what you said correctly, we don't right now have a clear-cut way
Starting point is 00:51:53 of looking at a genome and saying this part contains information, this part doesn't. We actually do, but only if we're looking at many of them, not a single one, right? If I have a hundred versions of a gene, there are regions in it that are unimportant for survival. If I then make an alignment of them, then I can recognize, oh, look, this changes all the time at this position. Clearly, that doesn't mean anything because you can just willy-nilly change it.
Starting point is 00:52:24 But then you go like, oh, but look at this section. It's the same. Everywhere the same for every organism. I bet you it is because when you change it, you die. And that's why I haven't seen it. If you would make it, it would simply die, and therefore it doesn't enter your database. So you have to do these multiple sequence alignments, as they're called in bioinformatics. And when you're looking across, you know, column by column, you see some columns being completely conserved by evolution, even though evolution constantly tries to change them, right, through the mutational process.
Starting point is 00:52:58 And then you see columns which are willy-nilly. Like on the nucleotide level, you see As, Cs, Gs, and Ts with 25% frequency. And you go, like, clearly not information, right? And so that is, in fact, the basis of the algorithm that I describe in the book that allows you to measure the amount of information in a sequence. But if you do not have a multiple sequence alignment, in other words, a bunch of examples, then you cannot understand what is information, because everything looks the same.
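The column-by-column idea can be sketched as follows. This is a minimal illustration, not the algorithm from the book: it assumes each column's information is the maximum entropy for a four-letter alphabet (2 bits) minus the observed entropy of that column, and it ignores finite-sample corrections and correlations between columns. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Bits of information at one alignment column: log2(alphabet) - H(column)."""
    counts = Counter(column)
    n = len(column)
    observed_entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - observed_entropy


def alignment_information(sequences):
    """Per-column information for equal-length, already aligned sequences."""
    return [column_information(col) for col in zip(*sequences)]


# Hypothetical toy alignment: the first three columns are perfectly conserved,
# the last one shows A, C, G, and T at 25% each.
toy_alignment = ["ACGA", "ACGC", "ACGG", "ACGT"]
per_site = alignment_information(toy_alignment)
total_bits = sum(per_site)
print(per_site, total_bits)
```

A conserved column contributes the full 2 bits, and a column where all four nucleotides appear equally often contributes nothing. With a single sequence every column would look conserved, which is why a bunch of examples is needed.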
Starting point is 00:54:14 And this reminds me that you, of course, have collaborated with your colleague,
Starting point is 00:54:32 Richard Lenski, who is the boss, or was, of the long-term evolution experiment. You've actually seen data like this. That's right. In fact, we constantly see this. We see this more easily in digital life experiments where we have self-replicating computer programs that live inside of a computer just like the one I'm staring at right now and that you're staring at right now. And there we can extract these sequences and we can look precisely where the information is. And in fact, we can measure the information and see that as the fitness increases, the information increases.
Starting point is 00:55:05 It's literally, if you take the log of the fitness and superimpose it on the calculated information, they're almost on top of each other. Right? Because it's the log fitness, which is basically the growth rate, which is in a sense mathematically equivalent to information. So if information is zero, your growth rate is zero, right? So my own biological knowledge is not very big, but I do recall understanding that the human genome is not the longest out there in the ecosystem, right? Is that something we understand? So you're addressing an important question. In the biological literature, it is called the C-value paradox.
Starting point is 00:55:47 If you just look at the length of genomes, there is an amoeba out there which has 200 times more DNA than we have. It's called Amoeba dubia, actually. So it's almost like the "Dubya" in a former president. Yeah. So, you know, you can make of that what you may. So clearly genomic content isn't information, right? But if you would take the genomes of, let's say, a thousand of these amoebas and then align them, then you would see that most of the stuff is completely meaningless, or else, in fact, it might not be haploid. In other words, you have 100 copies of the same small piece of information. And here's another thing that you need to understand about information. If I have one book that, let's say, is predictive about certain things, and then you have a
Starting point is 00:56:39 copy of the book, that's not twice as much information. It is exactly one time as much information. So if you have 100 copies because you have 100 copies of your genome, like we, for example, have two copies. We are diploid. There are many other organisms, in particular plants, which are tetraploid, or, you know, they have 16 copies and so on. So this amoeba probably has over 100 copies of its own genome.
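The point about copies can be checked numerically. In this small sketch, under the assumption that "a book and its copy" can be modeled as a variable paired with an exact duplicate of itself, the joint entropy of the pair equals the entropy of the original, so the duplicate adds zero bits. The toy "book" and the function name `entropy_bits` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical "book": a short stretch of symbols.
book = ["A", "C", "G", "T", "A", "C", "G", "T"]

one_copy = entropy_bits(book)
# A second copy is perfectly determined by the first, so pairing each symbol
# with its duplicate creates no new distinguishable outcomes.
two_copies = entropy_bits(list(zip(book, book)))
print(one_copy, two_copies)
```

Both values come out the same, which is the reason a polyploid genome's raw DNA content has to be divided by its copy number before it says anything about information.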
Starting point is 00:57:03 So in other words, clearly this amount of DNA, you have to divide it by 100. That still wouldn't get us to the 3 billion that we have, 3 billion per chromosome, right? And it's not 6 billion, as in 6 billion base pairs. But in fact, out of the 3 billion nucleotides, which in a sense give you an entropy or potential information of 6 billion bits, but it's potential information, which is the same thing as entropy, only a small fraction of this actually encodes information.
Starting point is 00:57:41 And from the ENCODE project, it is probably about 8%. And that's how you're getting to that number of 500 million bits, which is about 8%, I think, of the 3.1 billion or so. Right? And so there are some plant genomes that are enormous because of this, you know, explosion. But, you know, if you have several copies, and if you have in plants, for example, lots and lots of intergenic regions which are meaningless, in other words, there's no information in there. Plants are sort of spectacular for that. They have far more intergenic stuff than humans, for example. Some of them have up to 70% intergenic stuff. And this intergenic stuff is probably also riddled with transposons.
Starting point is 00:58:25 So literally, if you would be taking it out or mutating it, it would not change what the plant does at all. So there is no information encoded there. So if you really want to plot just information content, in fact, Eugene Koonin, who works at the NIH, has done that. In fact, his plot is in my book. And you find out that, yeah, you know, bacteria are sort of very low there. And humans, as far as we can measure, in fact, have the most information.
Starting point is 00:58:54 Mammals generally, but humans the most. So this idea that humans are in a sense the apex of complexity, which some people think is sort of very self-serving, that might not be wrong. Certainly if we look at it through the lens of information theory. Yeah, I believe that. That number went by pretty quickly, so I want to get it right. 8% of the human genome is information-bearing? Yeah. Okay.
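The numbers quoted above fit together as a back-of-the-envelope calculation. The sketch below only uses the figures mentioned in the conversation (about 3.1 billion nucleotides, about 8% informative) plus the standard fact that one of four equally likely nucleotides corresponds to log2(4) = 2 bits; the variable names are illustrative.

```python
import math

nucleotides = 3.1e9                   # approximate human genome length quoted above
bits_per_nucleotide = math.log2(4)    # 2 bits for a four-letter alphabet
informative_fraction = 0.08           # the roughly 8% figure quoted above

# Potential information (equivalently, maximal entropy) of the whole genome.
potential_bits = nucleotides * bits_per_nucleotide

# Estimated information actually encoded, under the 8% assumption.
informative_bits = potential_bits * informative_fraction

print(f"potential:   {potential_bits:.3g} bits")
print(f"informative: {informative_bits:.3g} bits")
```

This gives roughly 6.2 billion bits of potential information and just under 500 million bits of estimated actual information, matching the figures in the discussion.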
Starting point is 00:59:22 And the rest is intergenic stuff or repetitive stuff, in other words, is uninformative. It seems a little inefficient, I don't know. Well, I mean, it's because there is so much, for example, you know, if you have a gene, there are regions, you know, that are called introns, which just have junk in it. And they're being excised before, you know, this stuff is being translated into proteins. Right. And so there's a vast amount of that. And then inside of chromosomes, there's vast regions. of repetitive stuff.
Starting point is 00:59:53 Some of it might even serve structural features so that molecules can attach to it, but it does not actually provide information about surviving. So obviously it is difficult to measure the information content because we would have to have thousands of human genomes
Starting point is 01:00:15 that you can align. We have, in fact, now thousands of human genomes, but nobody has actually tried to do that. And I should say that in order for an ensemble to really be reflective of the information, we need to have a good amount of variation. In other words, those nucleotides that are mutable, they should have had a chance to mutate. But because we humans have a relatively recent common ancestor,
Starting point is 01:00:47 there's a lot of conservation in the genome that is through common descent. And that looks like it might be information if I'm making such an alignment of a thousand genomes, but in fact might not be. So one of the things you might want to do is take a lot of African genomes for this alignment, because there's much more variation in the African genomes, because they have in a sense been evolving longer and their common ancestor is much, much earlier, which is why there is so much more variation. And that would be a much better estimate than if you take European genomes and make that alignment, because really, you know, we're looking at like 100,000 years of variation, which given the population sizes might not be enough to really reveal the information content. So if you have a population that has a fairly recent common ancestor, then even this alignment is not going to do a very good job of giving you the information.
Starting point is 01:01:45 But very often in proteins, single proteins, for example, or viral proteins, which are evolving very quickly, you can actually measure this information. And I've done this, for example, for the HIV protease, which is a 99 amino acid sequence, and then you see, for example, like the example with drugs that I gave you, that when you introduce anti-HIV drugs in the population, you see the information dropping very rapidly. And then because we have data over time,
Starting point is 01:02:13 you can see how, as the virus evolves drug resistance, it really learns about this new world that it is being cast into. And you see the information over 10 years really approaching the pre-introduction levels. Right. So for certain proteins and molecules, it's much easier to actually do this analysis and measure information than, for example, to do it for a whole genome. In principle, it can be done. But ideally, you would do assays where you're trying to mutate every nucleotide
Starting point is 01:02:48 and see how it reacts. And that's, of course, not feasible right now. And it would be unethical, by the way, also. By the way, yeah, that happens in other podcasts I do. It'd be fun to do certain experiments. You just can't really do it. But one way of saying what you just said a little while ago is that the human genome has not had time to equilibrate in some sense.
Starting point is 01:03:09 That is exactly what I say, yeah. Good. So there's plenty of unexplored variation in it. Because I was going to ask, are there sections of the human genome or for that matter, other advanced genomes, where there is some continuity over time. So clearly this section of the DNA is carrying information, but we don't know what it does.
Starting point is 01:03:32 Absolutely. One of my favorite examples is that there are untranslated regions in the genome that seem to be completely conserved. And it's untranslated, which means no protein is being made. Well, an RNA is actually being made, right? So, in fact, these are long stretches of RNA, and what they do is they can form molecules called ribozymes. And it's a fairly recent discovery that these, in fact,
Starting point is 01:04:03 are actually very important in understanding brain function. So there are these long non-coding RNAs. And remember, DNA has this neutrality in the third position of a codon. So for many of the amino acids, when you mutate the third position, because every amino acid is encoded by a triplet, you can change it willy-nilly; it doesn't change the amino acid. So there should be neutrality in that position. And you see this throughout the genome, right?
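The third-position neutrality mentioned here can be checked directly against the standard genetic code. The sketch below builds the standard codon table from its usual compact 64-letter encoding (bases ordered T, C, A, G) and counts, for each codon position, what fraction of single-nucleotide changes leave the amino acid unchanged. The function name is hypothetical, and stop codons are excluded as starting points, which is a choice made for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonymous_fraction(position):
    """Fraction of point mutations at a codon position (0, 1, or 2)
    that leave the encoded amino acid unchanged."""
    same = 0
    total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons as starting points
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


for pos in range(3):
    print(f"codon position {pos + 1}: {synonymous_fraction(pos):.3f} synonymous")
```

The third position comes out far more tolerant of change than the first two, which is why a protein-coding region is expected to show variation there, and why a region with no such variation stands out.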
Starting point is 01:04:30 But then you have these non-coding regions where there's no neutrality whatsoever. Well, yes, because if you translate it into an RNA, RNA doesn't, the ribozyme, there's no codon translation. It's not made from amino acids. So in fact, every nucleotide is important. That's how it comes that these vast, long, non-coding RNA regions are completely conserved, which means that they carry 100% information or close to that. They're not 100% conserved.
Starting point is 01:04:59 But that's because sometimes in these ribozymes, because they are folding into structures, you can make a mutation on one nucleotide as long as you are making the complementary mutation at the corresponding one. But this is a fundamental discovery because it means that these ribozymes are super important in the survival of the organism. And in fact, they may explain, so they did this study in brains of flies, that they are super important in controlling mating behavior of the fly, which was previously not understood. And you can't find a protein that does it.
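The compensatory mutations described above, where one nucleotide may change as long as its pairing partner changes too, leave a statistical signature: each site looks variable on its own, yet the pair is tightly constrained. One hedged way to see this is the mutual information between two alignment columns. The function names and the four-sequence toy columns below are hypothetical, chosen only to illustrate the idea.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    symbols = list(symbols)
    total = len(symbols)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(symbols).values()
    )


def mutual_information(col_x, col_y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    joint = zip(col_x, col_y)
    return entropy_bits(col_x) + entropy_bits(col_y) - entropy_bits(joint)


# Hypothetical columns from a folded RNA: the two sites always form a
# Watson-Crick pair (G-C, A-U, C-G, U-A), so they covary perfectly.
paired_x, paired_y = "GACU", "CUGA"
# Hypothetical columns that vary independently of each other.
free_x, free_y = "GGAA", "CUCU"

print("paired sites:     ", mutual_information(paired_x, paired_y))
print("independent sites:", mutual_information(free_x, free_y))
```

A per-site entropy calculation alone would score the paired columns as carrying nothing, so pairwise terms like this are one way to account for structure-preserving variation.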
Starting point is 01:05:36 Well, it's done by these RNA sequences. That's actually super interesting, and I'm pretty sure I didn't know this. So the central dogma in molecular biology is that the DNA contains information that gets transcribed into RNA that tells the ribosome which proteins to make. And what you're saying is that that's not all the DNA does. There's a separate function where parts of it get transcribed into RNA and then don't make proteins. They make ribozymes instead, and those also play a crucial role. That's absolutely right.
Starting point is 01:06:09 So whenever you hear the word dogma, then you're always going like, well, okay, so I'm sure it's not really a dogma, so let's actually look for the violations of it, and that's always a fruitful avenue of investigation. So, yeah, in fact, the role that RNAs play in the molecular biology of the cell cannot be overestimated. I mean, these are highly functional molecules. They modify DNA. If you didn't know that, there's a whole bunch of gene editing going on,
Starting point is 01:06:42 certainly in single-celled eukaryotes, and I talk a little bit about that in my book about the trypanosomes, a fantastic, absolutely stunning story about how these RNAs are ensuring, in a sense, the survival of genetic information that should have died 100 million years ago in a sense. And so these molecules, they're also regulators. So the RNA molecules, the ribozymes,
Starting point is 01:07:10 which are untranslated, are so functionally important that we cannot understand really the molecular biology of the cell without their action. Interesting. And that's actually a great segue into sort of the final topic that I wanted to hit here. It would be a shame if we didn't talk about the origin of life. And of course, you know that there's a lot of people who put RNA front and center in that story, but we don't know what the right story is for the origin of life. So how can information theory help us here? Right. So that's a great segue because, in fact, the current, you know, thinking is that prior to DNA, we know that DNA and proteins cannot have been at the very origin of life because you need proteins to, in a sense, understand how DNA is acting. And you cannot understand one without the other. So it cannot be, you cannot have this separation between storage material, which is DNA, and machine, which is the proteins,
Starting point is 01:08:11 at the origin. Now, it turns out that RNAs, they can store information because really the RNA nucleotides are cousins of DNA. They just basically have one more oxygen, you know. That's why the RNAs are called ribonucleotides and the DNA is called deoxy, namely, ribonucleotides with an oxygen removed, right? That difference is actually very important because it affects the stability of the
Starting point is 01:08:43 molecule. And of course, if you're going to do information storage, you want to have a stable one, and that's the DNA. However, the RNA molecule, which, like, it's almost like a DNA, but can actually conform in all kinds of different ways, they can play the role of a machine, the ribozymes that we just talked about. So according to the RNA world framework, you're going like, hey, how do you have something where information and machine is one and the same, and that would be the RNA world. So in other words, self-replicating RNA molecules, like where they store information and are folding into the machinery that replicates the information. We, as you correctly pointed out, we don't know if that is the ancestor,
Starting point is 01:09:29 because the biochemists tell me that there are some fundamental problems of, you know, how this whole stuff could have worked out. And that's a problem in biochemistry, which I only lightly touch upon in the book, because I'm certainly not an expert on that. But the important thing to understand the origin of life is to first understand what life is. And what life is is basically information, a string of information, or a string that contains information about how to make copies of that information.
Starting point is 01:10:03 Right? You might think, oh, but that's not so difficult. I make copies of information all the time. I have a copy machine. It's like, yes, but who built the copy machine? Right. The copy machine isn't some random thing, right? If you just throw a bunch of piles of metal together,
Starting point is 01:10:22 it's not going to form a copy machine. In fact, it takes an enormous, complex set of instructions to make that copy machine. So, yes, it makes copies and it's great at doing that. And you could, in fact, make copies of the blueprint of a copy machine, right? But that copy machine isn't going to make the copy machine from that blueprint. It's not. It makes copies.
Starting point is 01:10:47 That's it. Yeah. But if you had a machine that is, in fact, built from information, and therefore, then, once it's crumpled up, in a sense, into this ribozyme, then makes copies of the actual information, well, that's life. So life is information that copies itself. But if you just have a little bit of information, it is not going to do that. Because, like I said, to create a machine that can make copies of information, you need a minimum amount, right?
Starting point is 01:11:17 Five bits is not going to do it. Maybe 100 bits is not going to do it. In fact, we don't know what the smallest amount of information is that, you know, has the capacity to replicate. You know why? Because we also don't know what the environment is within which this replication happened. Because I told you that the information
Starting point is 01:11:38 is really information about the environment. So a string of 100 bits, that's 100 bits of potential information. But depending on the environment, it could be a lot and it could be a little, right? And so we don't know a whole lot about the very early Earth, but there could be very many different environments, right?
Starting point is 01:11:57 But what we do know, and I can say that unequivocally, is that there is a minimum amount of information, and that amount of information cannot be obtained by chance, under any circumstance. Okay. Right? Well, I should say, mathematically speaking, if the likelihood of finding any of these molecules would be chance, right?
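To put rough numbers on the chance argument, here is a small sketch. It assumes, purely for illustration, that each bit of information halves the fraction of random sequences that would qualify, so a replicator needing I bits turns up with probability about 2^-I per random draw. The function names are hypothetical; 5 and 100 bits are the figures mentioned in the conversation, not measured thresholds.

```python
from fractions import Fraction


def chance_probability(bits):
    """Probability that one random draw carries the required information,
    under the assumption that each bit halves the qualifying fraction."""
    return Fraction(1, 2 ** bits)


def expected_draws(bits):
    """Expected number of independent random draws before the first success."""
    return 2 ** bits


for bits in (5, 100):
    p = chance_probability(bits)
    print(f"{bits} bits: p = {float(p):.3g}, expected draws = {expected_draws(bits):.3g}")
```

Under this assumption the required number of draws doubles with every additional bit, which is why a gradual accumulation mechanism is needed rather than a single lucky draw.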
Starting point is 01:12:24 However, there are in fact mechanisms that could make an avenue where information can actually gradually accumulate without a Darwinian process. This is speculation. But think of it this way. We can imagine a process where a molecule that carries no information but is in fact a sequence in some alphabet. Let's imagine even it to be an RNA. We can imagine that there is a process by which it can be replicated, but in a sense, passively. For example, it could be lying on a clay surface, and then you can basically have a process that will make sure that only complementary bases are being added, right? Then basically it becomes copied, but there's no information in it.
Starting point is 01:13:19 So this copying of information or of a sequence, you're copying entropy really, right? But now, because the copying machinery is actually in the hardware, not in the software, right? It's this clay and it's the process of, you know, of complementation. Now, imagine that, in fact, the rate of making errors in the complementarity can sort of vary. So it turns out that the speed of this operation of copying varies depending on how many errors you are making in that complementarity. So if for some reason you have something that makes this faster, well, that's actually a good thing and you're going to make more of these copies, even though we might think there's no information. But now imagine that what makes this faster, which means makes it more accurate, because you have the same thing, is actually something that is part of that information that sort of like helps and binds.
Starting point is 01:14:23 Like something breaks off and then sort of goes like, hey, I'm using that broken off piece to actually make that process more error free and therefore faster. What just happened then is that information just seeped into the genome. Because remember, that piece that broke off is actually one of your own. And so you can imagine a process like that, in which slowly but surely information about how to do the replication moves from the hardware into the software. It's a non-equilibrium process, but it is at least imaginable, right? Because you have to reach this limit until there's enough information in the sequence that it autonomously replicates without the support of the hardware. That is my sort of imaginary view of the origin of life, but it is a hard thing because you have to reach this limit. Let's say the limit is 200 bits, right?
Starting point is 01:15:18 Only after 200 bits can evolution take hold, which then means I can have the evolutionary process increase this amount of information, you know, to infinity, right, theoretically. But before you have achieved Darwinian evolution, any information that is in the sequence is now highly fragile. It is basically prone to deteriorating due to the second law of thermodynamics, right? So you basically have to have a process where you are constantly working against this law. Yeah. But it is because information from the replicative machinery, the hardware, is seeping into your sequence. And if this happens fast enough, you might actually get to this point where Darwinian evolution can take place,
Starting point is 01:16:08 and therefore you separated yourself from the second law of thermodynamics. Okay, good. I actually think I understand that. Let me try to run it by you again and see if I got the right idea. Right. You know, we talked about Maxwell's demon and evolution, and that was pretty clear to me. If you just had a set of genomes and you just acted randomly on them, they would randomly walk through the space of all possible genomes, and the entropy of that distribution would go up. That's the second law of thermodynamics.
Starting point is 01:16:35 But you're saying that selection kicks in in the case of evolution, and that basically acts like Maxwell's demon to lower the entropy by sort of nudging all of the genomes towards areas of higher fitness. And that's pretty darn efficient. And you're suggesting that maybe the origin of life was a similar mechanism, not quite the same and not quite as efficient, but something that was a little Maxwell demon-y to allow the distribution to become lower entropy. That's exactly right. So if you think about this idea that if you're making fewer mistakes in the copying process, that the copying goes faster and you're therefore making more copies, that means that there is a Darwin-y mechanism at work, even though there is no self-replication. But there is replication, right?
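The Maxwell's demon picture summarized above, where mutation alone lets a population wander and raises entropy while selection holds it down, can be illustrated with a toy simulation. Everything here is an assumption made for the sketch: genomes are bit strings, fitness grows exponentially with the number of 1s, reproduction is asexual with a fixed population size, and the parameter values and function names are hypothetical rather than taken from Adami's work.

```python
import math
import random


def per_site_entropy(population):
    """Mean Shannon entropy per site, in bits, across a population of bit lists."""
    n = len(population)
    total = 0.0
    for column in zip(*population):
        p = sum(column) / n
        for q in (p, 1.0 - p):
            if q > 0:
                total -= q * math.log2(q)
    return total / len(population[0])


def evolve(select, generations=200, pop_size=200, length=20,
           mu=0.02, strength=0.5, seed=0):
    """Run mutation with or without selection; return the final per-site entropy."""
    rng = random.Random(seed)
    # Start fully ordered: every genome is all 1s, so the entropy is zero.
    population = [[1] * length for _ in range(pop_size)]
    for _ in range(generations):
        if select:
            # Fitness rises with the number of 1s (the "demon" favors them).
            weights = [math.exp(strength * sum(g)) for g in population]
        else:
            weights = [1.0] * pop_size  # pure random walk, no demon
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Copy each parent, flipping every bit independently with probability mu.
        population = [
            [bit ^ (rng.random() < mu) for bit in parent] for parent in parents
        ]
    return per_site_entropy(population)


print("mutation only:       ", round(evolve(select=False), 3))
print("mutation + selection:", round(evolve(select=True), 3))
```

With these settings the mutation-only population should drift toward high entropy per site, while the selected population stays at a much lower value, which is the sense in which selection acts like the demon in the conversation.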
Starting point is 01:17:31 But it is replication of non-informative things, which by the mechanism, well, some of the stuff by chance happens to increase your speed of replication, well, those are being maintained, right? And that's this idea, well, yes, the information about how to replicate is in the hardware, not in the software, but it slowly starts seeping in. Slowly or fast, it really depends on the environment that you're in. And of course, we have no glimpse of any of that. Because if you imagine that, you know, let's say it's actually a process that isn't that difficult to achieve, well, wherever it constantly keeps occurring,
Starting point is 01:18:10 it would be dwarfed by any form of life that is actually there that has perfected this for four billion years, right? So there are these black smokers and like these underground, what are they called? Hydrothermal vents, basically. They're these underground volcanoes almost. And there is all kinds of interesting bacterial life going on, and who knows what else goes on there.
Starting point is 01:18:35 There are some theorists that claim that, yes, that is, you know, these type of environments are perfect for the origin of life. But even if it would be constantly reoccurring there, we wouldn't see it because bacteria have already colonized. Right. Okay, that is great. That's a very interesting set of ideas. I hope that it's followed up on.
Starting point is 01:18:55 So, but we're near the end of the podcast, so we are allowed to think big now. And let me, it does lead into, I guess, the last question, which is, you know, human beings have developed capacities for manipulating information that didn't exist before human beings came along, right? We can not only think, but we can learn, we can teach in ways that other species don't, we can store information and replicate it in books and on computers and so forth. Is this analogous to the origin of life? Is this another way that information is reproducing itself? Are we seeing a phase transition to a new kind of information phase?
Starting point is 01:19:36 I think you're right about that. I mean, just imagine the advance of being able to write down what you have learned so that other people can build upon that. That obviously, in a sense, I'm saying obviously, but maybe that's not obvious to everyone. But obviously, that is the magic power that we have acquired as a human species that no other species has. The best that other species can do is to somehow teach young generations about certain behaviors, hunting behaviors, or maybe even tricks that you can use with rocks and sticks and things like that. But to actually figure out mathematics and write it down into a book and then
Starting point is 01:20:19 teach that to other people so that they can, for example, predict the orbit of a planet 100 million years from now. I mean, if that is not mind-boggling, then I don't know what is. Now, if you think about it, what we call our ability to make predictions over long distances as far as humans are concerned, in a sense, we call that intelligence, right?
Starting point is 01:20:44 We're making intelligent decisions so that we, in the sense, can survive better, right? But in fact, clearly, making predictions about what the environment is tomorrow is going to be an important element in my survival. It was an important element for survival for any hunting species that goes like, oh, I predict that this prey is going to be at this location in about 10 seconds. And if I can make that prediction very accurate, I will catch that prey, right?
Starting point is 01:21:10 Now, if you think about intelligence as the temporal radius at which you can make predictions with accuracy better than chance, well, then you realize, well, bacteria make predictions all the time, but they make predictions about the next second, right, because they're making predictions about where the nutrient is, but that could change very quickly. But for example, you know, this behavior of finding food, chemotaxis, well, that's clearly, in a sense, an intelligent behavior in the sense that it makes certain predictions that, you know, are going to come true or not within a second or so. But other animals, for example, let's say chipmunks, they make predictions about, well,
Starting point is 01:21:53 the next season will be such that, you know, there will be winter, so I may have to actually store some food, right? But we as humans, by having developed writing systems, by having developed mathematics, can now make predictions in time over a far, far longer period that are accurate, better than chance. The fact that we are talking about global warming and how that could affect us as a species is an obvious, intelligent concept because we have understood that if we don't do anything, then we might be in big, big trouble. That is making predictions on a time scale far removed from our day-to-day activity, right? And our brain does that. But our brain, of course, has been doing this before we had figured out writing systems, right? But now the conjunction of
Starting point is 01:22:43 our brain, where information during a lifetime is stored, and then we can make it permanent in books. That is the real secret sauce to the success of this species, as far as I'm concerned. Well, we'll see whether or not we have the collective strength of will to actually act on these predictions that we're making or not. Yeah, that's right. Of course, that's the big political question. Unfortunately, it is a political one and not a mathematical one.
Starting point is 01:23:07 The mathematical one is very obvious. Good. That helps us a little bit. Chris Adami, thanks so much for being on the Mindscape podcast. It was really my pleasure. Thank you, Sean.