Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas - 196 | Judea Pearl on Cause and Effect

Starting point is 00:01:01 Hello, everyone, and welcome to the Mindscape podcast. I'm your host, Sean Carroll. As we go through life, one of the things that we're inevitably going to do all the time is to assign credit or blame for things that happen in the world, either to people or to other events that are happening, right? We have effects, things that happen, and we have the causes for those effects, the reasons why those things happen.

Starting point is 00:01:24 This idea of a structure of reality based on causes and effects and their relationships is perfectly obvious, right? I mean, it's something that is completely evident to us ever since we were little kids. The ancients talked about it. Aristotle famously kind of organized a whole categorization of causes and different kinds of causes and their effects. But like many such ideas, when you think about it a little bit more deeply, it becomes tricky. What exactly is going on?

Starting point is 00:01:52 If I say, I got sick because of a virus, what do I really mean? I mean, there's a kind of simple answer, which is, if it weren't for the virus, I wouldn't have gotten sick, right? If the virus weren't there. But you try to implement that in some systematic way, and what you find is it's much trickier than that. For example, what if you had gotten the virus, but you were also vaccinated, so therefore you were protected against it? Or what if you didn't get the virus, but you got something else, so you got sick anyway? Furthermore, that kind of reasoning isn't limited to just the virus, right? I mean, Darwin's theory of evolution is responsible for viruses in the first place, in some sense.

Starting point is 00:02:29 Did you get sick because of Darwin's theory of evolution? Did you get sick because space time is four-dimensional, without which maybe there wouldn't be such a thing as viruses? It's hard to pin down exactly what's going on. And this difficulty is not just for physicists or philosophers or other kind of scientists. It's becoming increasingly important in artificial intelligence research, because computers don't have this immediate, obvious feeling that there are causes and effects in the world like human beings do.

Starting point is 00:02:57 So we really do need to get at the guts of what's going on when we talk about causes and effects. And modern scientists and philosophers and mathematicians and computer scientists have done this. No one has been more influential than today's guest, Judea Pearl. He's done foundational research on understanding what is meant by causes and effects, and he's written about it. He has a popular book from a few years ago, 2018, with Dana Mackenzie called The Book of Why: The New Science of Cause and Effect.

Starting point is 00:03:23 You can read about it there, but you can also get it here on this podcast. So we're going to talk about exactly this set of questions. Just to give you a little bit of a hint, the idea is we think about probabilities of things happening. And even if there's definite things that happen, even if it's not about randomness, maybe you don't know what's happening, right? So for a statistician, if you say there are people, a set of things called people, and some of those people drink alcohol and some are teetotal, some don't drink alcohol. So what that will mean is there's a fraction of the people who drink, and that's saying that the probability that a randomly selected person will drink is a known quantity. So there's a probability involved even if everything is perfectly deterministic. And then also for people, there's a probability that they own a cat or own a dog or have no pets at all or both or whatever.

Starting point is 00:04:07 And then you can ask questions about, well, okay, given that you drink or you don't drink, what is more likely? Do you own a cat, a dog, or no pet at all? And then you can say, what causes what? Are people who drink more likely to own pets? Or if you own a cat, does that force you to drink? That's the kind of question that this new science of causality is designed to answer. I'm not going to give away all the ways that it happens, but again, crucially important for computer science and artificial intelligence, also for areas

Starting point is 00:04:38 like medicine. You want to know what medical intervention is giving some effect in the patients. For politics or economics, what policy changes lead to what effects. This idea of cause and effect and getting it right pervades how we think about the world. As a physicist, of course, there's a whole other dialogue to have about the fact that in Newton's laws of motion, there are no causes and effects. So how do you recover them at the macroscopic level? We get into that a little bit and many more interesting ideas. So let's go. Welcome to the Mindscape podcast.

Starting point is 00:05:27 Glad to be here. So causality is one of my favorite topics to think about, both as a human being and as an academic researcher. So this is going to be a great thrill for me to talk to the world's master. Let's try to get on the table how the typical person out there should think about causality. We're both right now in Los Angeles as we're recording this. We'll hear people say things like, I was late because there was a traffic jam on the 405, attributing to the fact that they are late a cause, namely there's a traffic jam on the 405. So I guess the first question is, does that make sense?

Starting point is 00:06:07 Are these good ways of thinking? Is that causal language? Oh, that's the best way, because that's the way people talk. And to distinguish my profession or my hobby from yours, I'm interested in capturing the way people think and not the way nature is constructed. So this is because I am in the circle of AI people, and we have a certain mission.

Starting point is 00:06:37 We want to capture how you and I think so that a robot can communicate with us in a natural way, regardless of how the molecules move. And I think it's a great fact that AI helps us understand not only the world, but how we think. Because we take so many things for granted, right? And the computers don't. Absolutely. And that is a real test. And that's why I'm accused many times that I didn't pay attention to the great philosophers, to what Kant said, and Hegel and Aristotle. They didn't have a pressure

Starting point is 00:07:17 to build a robot that behaved like us, and they didn't have a metaphor of thinking. That's important, yeah. So explain more: what do you mean by that, the metaphor of thinking? Well,

Starting point is 00:07:31 Descartes had a metaphor. We have gears in our mind and they turn and that's what makes us deduce one thing from another. Why? He needed to have a metaphor because he was familiar with gears.

Starting point is 00:07:50 He wasn't familiar with neurons. He wasn't familiar with even logic circuits. He had only one metaphor for a deductive machine, or at least a machine that has output based on input. And that was the gear.

Starting point is 00:08:09 system that Archimedes maybe invented, right? So he put it together. Once you have, I call it a laboratory or a playground. Right. You have to have a playground for your own ideas. So you can take them apart and try different combinations. So philosophers did not have a playground for ideas about thinking. So that...

Starting point is 00:08:34 And we have. We have. The computers are forcing us. And that is why I don't see that I can learn much from philosophers. Good. Perfectly fair. Now, I will mention Aristotle once, and not because I learn a lot from his theories of causation, but because, you know, he did sort of try to divide up different kinds of causes.

Starting point is 00:08:58 And I think he went too far into things that we don't even call causes. But let me just distinguish between the kind of thing I said, I was late because there was a traffic jam versus something like, why is the sky blue? Well, it's blue because short wavelengths of light scatter off of air preferentially. But that's not an event, right? I mean, the traffic jam is an event in spacetime. The properties of the air are properties that are more or less permanent. Are those the same kind of cause-effect relationships from your point of view, or do you distinguish

Starting point is 00:09:30 between those? We distinguish between them. Okay. One is called the actual cause, and I think philosophers call it token. Okay. Token versus, I forgot the other one. Type. Type.

Starting point is 00:09:49 One is based on variables, and the other one is based on events. Right. If I say, I was late because of the traffic jam, I'm talking about one specific event, at one specific time, one individual, that's me, in one situation. Okay, that's token. One event caused another one. And the contrast to that is variable-based causation. Like, careless driving causes accidents, okay?

Starting point is 00:10:28 which means it's in the variable, the driving type, which has many values, depending on how you drive, and tends to cause a higher risk of accidents. The philosopher used the example of drinking hemlock caused death, or Socrates died because he drank this hemlock. It's a different. That's singular. Singular versus token. Okay.

Starting point is 00:11:00 That's how they use it. Right. Okay. So we have different names for them, and we have different algorithms for identifying each one. Good. Do you think that causality is something that is fundamental in nature, or is it something that is helpful to us human beings to describe what's going on in nature? Is it more emergent or is it built in to the fabric of reality? There's no cause and effect in physics, as you know it.

Starting point is 00:11:34 Because all the equations of physics are symmetrical in time. And they are built around algebra. Algebra is tied to a connective called equality, and equality is symmetric. So if F is equal to ma, then a is equal to F over m. Okay, physics doesn't distinguish between the two. However, as an emergent property, we perceive certain things as directional. We say that the rooster's crow does not cause the sunrise, but the other way around, even though it occurs earlier and is highly, highly correlated. I will be forced sometimes to ask questions that I think I know the answer to, just because

Starting point is 00:12:26 it will help the audience along. So don't be surprised. No, no, no, no. I like those because it gives me a chance to repeat myself. That's good. We'll both be doing that. Okay. So, no, I mean, I like this distinction that you're drawing,

Starting point is 00:12:41 and I just want to emphasize how very profound it is, right? I mean, there was this giant revolution with Newton and Descartes and Galileo in constructing mathematical theories of physics. And you're right, the most commonly appearing symbol in a mathematical equation, set of mathematical equations, is the equal sign, right? And there's no arrow on it. And so in some sense, you're doing something absolutely audacious,

Starting point is 00:13:07 or at least going back to a previous time when these audacious things were commonplace, when you're saying, I want to know not just equal signs, but arrows. Which is the cause and which is the effect? Yeah, correct. I'm glad that you share with me the astonishment about what Galileo did. Yeah.

Starting point is 00:13:29 That he chose algebra and said nature speaks algebra, which wasn't clear at the time. And it enabled him to do so many things that weren't done before. Just 50 years after the invention of algebra by Viète. Right? Okay. I think students today should appreciate this revolution, and now take the analogy, draw a parallel to what Sewall Wright did.

Starting point is 00:14:03 Sewall Wright was a geneticist in the 1920s who got sick and tired of working with equations and said, it doesn't represent what I want. I want to have an assignment operator. I put an arrow. He didn't think about assignment, but we in computer science know that there is a difference between assignment and equality. I take the content of register A and assign it to be the content of register B.

Starting point is 00:14:34 It's a different operator, which is asymmetric, and he was the first to put a symbol for it. The symbol was an arrow, and he built his path diagrams, which everybody attacked him for. But at least he said it represents what he wants. Right.

Starting point is 00:14:55 And before, okay. So that's why I admire him for having the audacity to put a new symbol for something that he understood is needed. And I'm trying to understand. So, all right, we're done with the questions that I think I know the answers to. Now we're already moving on to questions I don't know the answers to. What is the relationship of this new way of thinking, which again, it's an old way of thinking, right? Aristotle would have been perfectly happy with these arrows,

Starting point is 00:15:25 but we sort of got rid of them and we're bringing them back. What is the relationship between this and the idea of counterfactuals or possible worlds? I mean, a very simple-minded guess as to what causes are is that A causes B, if had A not happened, B would not have happened. And so already you're talking about a whole different universe where different things happen. Right. And that was a glitch of Hume. He used almost this term.

Starting point is 00:15:57 Had A not happened, B would not have happened. What the relation is, is that we have a calculus of counterfactuals. Very simple, which means we take path diagrams the way that Sewall Wright put them down, and we can define what a counterfactual is. For every two variables or for every two events, we can assign a truth value to every counterfactual you can think of based on the path diagram. So we have a calculus. Plus, we have an understanding of what you need to build those path diagrams.

Starting point is 00:16:39 And it's built on one relationship. Listens to. Everything is built on knowledge. You have to combine data with knowledge. So somebody has to build those path diagrams. What do you need to think of when you decide whether to put an arrow or not to put an arrow between A and B? That is a question. What comes to your mind? And the only primitive we need is the primitive of listens to. Okay. The barometer deflection listens to the atmospheric pressure. The rooster listens to the glare in the sky.

Starting point is 00:17:16 Okay, that's only a relationship. It's a very primitive one, and it's a very natural one, because I believe this is the most rudimentary. You cannot ask for more rudimentary and simple relationship, for a scientist or for a robot to think about. The barometer example is a very good one. We think that, you know, the barometer is telling us the atmospheric pressure, and if all we had was the data,

Starting point is 00:17:48 if all we knew was what the pressure was and what the barometer reading was, we're back in Galileo land and there's an equal sign and there's no arrow going either way. And what you're saying is there's something extra that that is missing. It's not just a correlation. There is a clear fact of the matter

Starting point is 00:18:05 that the pressure is causing the barometer, not the other way around. And that's what we would like to understand. And we only become aware of it when we have to program a stupid robot. Because the stupid robot, if it looks only at the equation, will try to move the barometer and hope to prevent the rain tomorrow. Right. Good.

Starting point is 00:18:27 And let me also just give the audience a visual here because you and I have in mind these diagrams. You've already referred to Sewall Wright's diagrams, and you and your collaborators have built these into a wonderful tool. So these diagrams represent what? Let me let you say what they are. The collection of judgments about who listens to whom. I'll put an arrow between the barometer and atmospheric pressure

Starting point is 00:18:55 if I think the barometer listens to the atmospheric pressure. And I would not put an arrow there if I think that the barometer between the barometer deflection and the price of beans in China tomorrow. Okay. Even though there could be indirect connection between the two. Good. So we have in mind a bunch of facts or a bunch of things in the world that could take on different values, right? The barometer could have different readings.

Starting point is 00:19:29 The pressure could be different. Yeah, variables. Right, good. But it's a man-made, you know. Variable is a man-made entity. And then we put all of these variables in circles, and we draw arrows connecting ones, where we think that one thing listens to another thing. Okay.
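To make the picture concrete, here is a minimal Python sketch of such a diagram as a plain data structure. Everything beyond the barometer and the pressure is an assumption made for illustration: the `rain` and `bean_price` links and the helper names `has_arrow` and `influences` are hypothetical and do not come from the conversation.

```python
# Hypothetical "who listens to whom" diagram: each variable maps to the
# set of variables it listens to directly (its parents in the diagram).
LISTENS_TO = {
    "pressure": set(),           # listens to nothing inside this toy model
    "barometer": {"pressure"},   # the barometer listens to the pressure
    "rain": {"pressure"},        # assumed link, for illustration only
    "bean_price": {"rain"},      # assumed indirect link, for illustration only
}


def has_arrow(cause, effect):
    """True if `effect` listens directly to `cause` (one arrow)."""
    return cause in LISTENS_TO[effect]


def influences(cause, effect):
    """True if a directed chain of arrows leads from `cause` to `effect`."""
    stack = list(LISTENS_TO[effect])
    seen = set()
    while stack:
        node = stack.pop()
        if node == cause:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(LISTENS_TO[node])
    return False


print(has_arrow("pressure", "barometer"))    # arrow drawn
print(has_arrow("barometer", "bean_price"))  # no arrow drawn
print(influences("pressure", "bean_price"))  # indirect chain exists
```

The asymmetry is the point of the representation: the arrow from pressure to barometer is stored, and nothing is stored in the opposite direction.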

Starting point is 00:19:46 And the, I do have one philosophy question about the counterfactuals. We want to say, you know, if A hadn't happened, then B wouldn't happen. If all, I mean, all the robot knows is the world. All it knows is the data, right? I mean, what gives us the license to talk about what would have happened in a different world where all we have is what did happen in our world? Okay.

Starting point is 00:20:10 Assuming that you are willing to make assumptions in the form of who listens to whom. When you feed the robot this collection of assumptions in terms of a path diagram, and that's enough. From now on, the robot can reason counterfactually because all the knowledge about counterfactual is contained in that diagram. So the game is to use things we observe about the world. to construct this kind of diagram telling us what listens to what and then we can

Starting point is 00:20:46 deduce what would happen counterfactually. Absolutely. That's the whole point. Yeah. Good. So this is a parsimonious representation of counterfactuals of super exponentially

Starting point is 00:21:02 large number of counterfactuals. Sure. Sure. Yeah. And that's something, by the way, if you compare it to what philosophers did about the closest-world semantics. Like David Lewis, right? He had the idea that A is counterfactually related to B if B is true in all the worlds which are closest to A or something like that, right?

Starting point is 00:21:31 So as a philosopher, he didn't care about computer representation or mental representation. How would you ever write down, how does the mind represent all the infinite relationships between who is closer, which world is closer to which world, given another world, okay? This is an enormously large set. But we computer scientists cannot deal with super exponential storage. We must do two things. As psychologists, we must agree that we have to face the problem of representation. If you have a theory and the theory does not allow for parsimonious representation, scrap the theory.

Starting point is 00:22:22 It cannot work in our mind, right? And the other consideration is practical. We cannot feed the robot super exponentially large memory. That's a very good point. And, you know, I guess as a physicist slash philosopher, I'm guilty of thinking like Lewis sometimes and just letting ourselves imagine arbitrarily complicated different situations. But you're making the point that if what we care about is teaching a robot, our understanding had better be simple in the sense that there's only a tiny number of rules and choices that need to be implemented to say something about what happens next. And something else to enforce it is the fact that we form a consensus. We, human beings, society, we do form a consensus about counterfactuals. How is that possible? If each one of us had a different notion of which world is closest to which, we wouldn't form a consensus.

Starting point is 00:23:31 Yeah, and it makes me wonder, okay, so why can we do that? I mean, much like why can we talk a language of causality at all if it's not to be found in physics? And I think that ultimately, physics is where our description of the world bottoms out. Are there special features of physics that allow us to talk about these ideas at the higher emergent levels? I am not sure I understand the question. Yes, we have to ask, what is it about the mechanism and the motion of the molecules that makes us believe that the barometer is a result of the atmospheric pressure and not the other way around? Now, Herbert Simon talked about it, right? He had some conjectures about it. He said it's probably consideration of power, okay, energy. The sun doesn't care about the rooster,

Starting point is 00:24:31 okay? And because of the enormous difference in mass, in energy involved here. So we rather give the sun the power of influencing the rooster and not the other way around. So we have all kinds of clues. Time progression is also important, right? So both time and energy and mass give us the clues about directionality. Good, good. Okay, that is very helpful. But now let's roll up our sleeves a little bit and get into the nitty-gritty of how this works. So we have a diagram. We imagine a diagram between all sorts of different variables and what listens to what. But just because B listens to A doesn't necessarily mean that A is the cause of what's happening in B. It's more subtle than that, right? Well, yeah, because B is listening to many other things. Some of them are influenced by A negatively. So, yeah, it's a combination of listening for which we have algorithms that unpack it and give you answers to questions on three levels of the reasoning hierarchy.

Starting point is 00:25:48 And that is the organization in which I like to think. Is the ability to answer questions of a certain type. So, sorry, what are the three levels of the reasoning hierarchy? That sounds important. I thought everybody knows by today, okay? Well, they should all buy your book, but let's assume they haven't yet. Okay, the lowest level is statistics. I hope you don't have many statisticians among your audience

Starting point is 00:26:16 because they get extremely insulted when you put them on the lowest level. The lowest level, I know. It's just when you look at one event, then you infer the likelihood that another one occurs. Will occur or had occurred? It doesn't matter. But it's still the association between events.

Starting point is 00:26:41 This is correlation. This is statistics. This is, by the way, also machine learning, unfortunately. Or at least 99% of machine learning. So this is what we get on the lowest level. For that, we don't even need a diagram. We can do that. Actually, we need it too, for parsimonious representation of conditional probabilities.

Starting point is 00:27:11 We need a diagram too, but it's a purely probabilistic diagram. We call it Bayesian Network in my language. So that's a level one. Level two is action. What if I do that? And the reason it's totally different than the first one, because you're talking about changing the probability space. You're talking about a new environment. Things have changed in the world.

Starting point is 00:27:40 Okay, I don't wait for the sprinkler to be turned on. I apply my muscle and make sure that the sprinkler is on. Looking at the sprinkler, the sprinkler is on, I can infer it must be summer. But if I turn the sprinkler on, I can no longer infer that it must be summer. Yeah. Okay. So that is a simple example of why it changes the world. So the world of intervention, this is something that we can learn from randomized experiments.
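The sprinkler example can be checked with a few lines of exact arithmetic. This is a minimal Python sketch under invented assumptions: the two-season model and all probabilities (`P_SUMMER`, `P_ON_GIVEN_SEASON`) are made up for illustration and are not figures from the conversation.

```python
from fractions import Fraction as F

# Assumed toy numbers: the automatic controller switches the sprinkler on
# far more often in summer than in winter.
P_SUMMER = F(1, 2)
P_ON_GIVEN_SEASON = {"summer": F(4, 5), "winter": F(1, 10)}
SEASONS = ("summer", "winter")


def p_season(season):
    return P_SUMMER if season == "summer" else 1 - P_SUMMER


def p_summer_after_seeing_on():
    """Level one: I observe the sprinkler on and update my belief (Bayes)."""
    joint = {s: p_season(s) * P_ON_GIVEN_SEASON[s] for s in SEASONS}
    return joint["summer"] / sum(joint.values())


def p_summer_after_doing_on():
    """Level two: I switch the sprinkler on myself. The sprinkler no longer
    listens to the season, so it is on with probability 1 in every season."""
    joint = {s: p_season(s) * 1 for s in SEASONS}
    return joint["summer"] / sum(joint.values())


print(p_summer_after_seeing_on())  # 8/9: seeing it on suggests summer
print(p_summer_after_doing_on())   # 1/2: doing it tells me nothing new
```

With these numbers, observing the sprinkler raises the probability of summer from 1/2 to 8/9, while turning it on by hand leaves it at 1/2.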

Starting point is 00:28:14 Okay. And that was the greatness of Fisher, Ronald Fisher, to introduce a randomized experiment into statistics. Statistics before Fisher was all about correlation, and that changed the practice of statistics, not the philosophy, because Fisher refused to talk counterfactuals, so he couldn't, he couldn't prove that his randomized experiment gives you what you really want. What the farmer really wanted in his station

Starting point is 00:28:57 was to know whether to use fertilizer A or fertilizer B and what will be the yield if I do one or the other. The farmer did not care about randomization. But Fisher

Starting point is 00:29:16 convinced the community that if you randomize, you get rid of all the other factors. And what you have is an answer to the farmer. He couldn't prove it. Neyman could prove it, Neyman at the time, but he, Fisher, locked horns with Neyman, and he refused to use his notation. So, without the mathematics, he was able to convince the community that a randomized experiment gives you an answer to your question, which fertilizer should they use? Good. Now we are going to a third level,

Starting point is 00:29:53 and this is counterfactual or understanding or explanation, unit level, retrospection, and imagination. It's the highest level of reasoning that I can think of. Perhaps I'm missing a fourth level. But this is what we mean by explanation. Why things happened the way they did. Was it the aspirin that removed my headache or other factors? As you mentioned, it's event-based.

Starting point is 00:30:33 It has to do with individuals. In a particular situation, one event happened and another one happened. Was the second one because of the first one? And you cannot even answer that in a randomized experiment. Right. So I guess I'm getting confused between the second level of action and the third level of counterfactuals. Why aren't counterfactuals in the second level? I mean, the action is if we did this, what would happen, yeah? In the future, right. There's no contradiction. There's no contradiction about me going to take an aspirin now. I want to know if my headache will go away. But if I say I did take the aspirin, my headache did go away. What if I didn't? Now I have a

Starting point is 00:31:28 contradiction between what was actually observed, an event that occurred, and one that I hypothesized. What if I didn't take aspirin? Good. So now I do understand. So the second level is moving forward in time, if I do this, what will happen next? Whereas the third level lets us go, had I not done something, how would things be different? Right. It's undoing events that took place. Good. Okay. Let's look at one of the classic examples, which is always used in these discussions, which is, does smoking cause cancer, right?
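The difference between the second and third levels in the aspirin example can be sketched in a few lines of Python. The approach below follows the three-step recipe Pearl describes in his published work (abduction, action, prediction); the hidden "response types" and their prior probabilities are assumptions invented for illustration, as are the function names.

```python
from fractions import Fraction as F

# Assumed prior over hidden response types of a person (illustrative only).
PRIOR = {
    "helped_by_aspirin": F(1, 2),  # headache goes away only with aspirin
    "recovers_anyway": F(3, 10),   # headache goes away regardless
    "never_recovers": F(1, 5),     # headache stays regardless
}


def headache_gone(aspirin, response_type):
    """Structural equation: the outcome listens to the action and to the
    hidden response type."""
    if response_type == "recovers_anyway":
        return True
    if response_type == "helped_by_aspirin":
        return aspirin
    return False


def interventional_gone(aspirin):
    """Level two: probability the headache goes if I set `aspirin` now."""
    return sum(p for u, p in PRIOR.items() if headache_gone(aspirin, u))


def counterfactual_gone(seen_aspirin, seen_gone, imagined_aspirin):
    """Level three: probability the headache would have gone under
    `imagined_aspirin`, given what was actually observed."""
    # Abduction: keep only hidden types consistent with what happened.
    consistent = {u: p for u, p in PRIOR.items()
                  if headache_gone(seen_aspirin, u) == seen_gone}
    total = sum(consistent.values())
    # Action and prediction: rerun the equation with the imagined action.
    return sum(p for u, p in consistent.items()
               if headache_gone(imagined_aspirin, u)) / total


print(interventional_gone(False))              # 3/10, looking forward
print(counterfactual_gone(True, True, False))  # 3/8, looking back
```

The two answers differ because the retrospective question uses the evidence: having seen the headache go away rules out the "never recovers" type before the imagined world is evaluated.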

Starting point is 00:32:01 This was, we know the answer, but, you know, back in the 60s, there was a debate, and one of the ideas was that there were, there might be just something else, some genetic effect or something like that that could cause cancer. And the question is, how can you tell? And the answer is, you can't, unless you make some, you can, unless you make an experiment. Sure. You could, but it's unethical and probably practically impossible to force a guy to smoke two packs a day, even though he's not inclined to do that.

Starting point is 00:32:37 Probably bad, yeah. Yeah, so it's hard to do, although conceptually it's doable, but it cannot answer the question without randomized experiments. But at least we have one technique to answer it. Randomized experiment. Yeah? Yeah. You said, okay, good.

Starting point is 00:32:57 At the time, given that we cannot run the randomized experiment, it developed into a fierce discussion or controversy, an argument between pro-tobacco and anti-tobacco camps. And Fisher was a heavy smoker. He argued for, he argued they cannot rule out the possibility that there is a genetic factor that makes people crave nicotine and puts you into a cancer risk.

Starting point is 00:33:36 And indeed, we cannot rule it out, except we can bring to bear some knowledge about plausibility in the world. And the way it was resolved was by thinking, how strong should that tobacco gene be in order to account for the observation? It turned out it had to be quite strong. It had to have a strength that the presence

Starting point is 00:34:08 or non-presence of that gene would make you eight times more likely to smoke than not smoke, okay? And that was just implausible on the basis of what? Yeah, the whole legal battle there was resolved by appealing to

Starting point is 00:34:26 plausibility. Wow, okay. But let's put aside ethics and human beings and what we're allowed to do and things like that and just wonder intellectually about this question because it's just a paradigm for other kinds of questions. I mean, the two

Starting point is 00:34:42 possibilities are smoking causes cancer or some genetic factor causes both smoking and cancer. And if I understand the move that you and your collaborators want to make, it's to say the difference is that if we force you to smoke, you will probably get cancer. And therefore, it doesn't matter. Yeah, the difference is shown in a model very nicely. You have a model and the difference will be in predicting the effect of action. If I force you to smoke, I can predict the likelihood of cancer. If I force you to evade smoking, to refrain from smoking, then I can predict the likelihood of cancer under those circumstances.

Starting point is 00:35:27 Okay, so it's a, however, there's another element here. If you bring to bear knowledge in the form of a diagram, then I can do more than that. And I can say, perhaps you should adjust for gender, or adjust for family history, or adjust for addiction in the family. Okay, to prevent that kind of thing. And so the diagram also tells you what factors you must adjust for in order to get an answer without experiment. So now we are talking about replacing the experiments, which many times cannot be done, by a piece of knowledge. The diagram. And the content of that piece of knowledge is the collection of who listens to whom.
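Here is a minimal Python sketch of what "adjusting for" a factor buys you, using the standard adjustment formula from Pearl's work, P(y | do(x)) = Σ_z P(y | x, z) P(z), on a toy gene/smoking/cancer model. All probabilities are invented for illustration, and the assumed diagram is gene → smoking, gene → cancer, smoking → cancer; nothing here is real epidemiological data.

```python
from fractions import Fraction as F
from itertools import product

# Assumed toy model (all numbers invented): z = gene, x = smoking, y = cancer.
P_Z1 = F(1, 5)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}


def p_y1(x, z):
    return F(1, 20) + F(x, 10) + F(z, 5)


def bern(p, value):
    return p if value == 1 else 1 - p


# Joint distribution over (z, x, y): this plays the role of observed data.
JOINT = {
    (z, x, y): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(p_y1(x, z), y)
    for z, x, y in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(x=1, y=1)."""
    return sum(
        p for (z, x, y), p in JOINT.items()
        if all({"z": z, "x": x, "y": y}[k] == v for k, v in fixed.items())
    )


def naive(x):
    """Plain conditioning: P(y=1 | x). Mixes in the gene's influence."""
    return prob(x=x, y=1) / prob(x=x)


def adjusted(x):
    """Adjustment for z: sum over z of P(y=1 | x, z) * P(z)."""
    return sum(prob(z=z, x=x, y=1) / prob(z=z, x=x) * prob(z=z)
               for z in (0, 1))


print(naive(1) - naive(0))        # 16/85, inflated by the gene
print(adjusted(1) - adjusted(0))  # 1/10, the effect built into the model
```

In this toy model the adjusted contrast recovers the 1/10 effect of smoking that was written into `p_y1`, while the unadjusted contrast overstates it. Which variables are safe to adjust for is exactly what the diagram is needed to decide.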

Starting point is 00:36:27 And it tells you now what factors you should adjust for if you want to get the answer. Replacing the randomized experiment. Good. And at the level of this new calculus that you want to talk about, rather than just equal signs, we have arrows now, you invented an operation in this world of diagrams called the do operator that sort of implements this idea. So tell us what a do operator is, that's a DO, as in I'm going to do it. It's very simple. A do operator just simulates on the diagram what an action will be in the real world. For instance, if I want to say, I turn the sprinkler on, you can simulate it on the diagram. Right now, the sprinkler is enslaved to the climate, to the season, because I connected it to an automatic controller. If I do sprinkler on, I subject the sprinkler to a new master.

Starting point is 00:37:32 That's my muscle. and I dislodge it from the influence of all previous influences. I can do it on a diagram. I remove the arrows from his previous masters and I subject it to a new master, which is my muscle

Starting point is 00:37:50 on a diagram, and I set the value of the sprinkler to on, a Boolean 1. Okay, bing. I have a new diagram. I can solve it. And that's the do operator: a simple simulation of action. I remember when Simon DeDeo, I don't know if you know Simon, but he was a previous guest on the podcast.
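
The surgery described here, cutting the arrows into the sprinkler and pinning its value, can be sketched in a few lines. This is a minimal sketch under assumed names: the variables (`season`, `sprinkler`, `rain`, `wet`), the 0.8 rain probability, and the helpers `sample` and `do` are all hypothetical, not taken from any causal inference library.

```python
import random


def sample(mechanisms, rng):
    """Draw one world by evaluating each variable from its parents, in order."""
    values = {}
    for name, mechanism in mechanisms.items():
        values[name] = mechanism(values, rng)
    return values


# Hypothetical diagram: season -> sprinkler, season -> rain, both -> wet.
model = {
    "season": lambda v, rng: rng.choice(["dry", "rainy"]),
    "sprinkler": lambda v, rng: v["season"] == "dry",  # the automatic controller
    "rain": lambda v, rng: v["season"] == "rainy" and rng.random() < 0.8,
    "wet": lambda v, rng: v["sprinkler"] or v["rain"],
}


def do(mechanisms, variable, value):
    """Return a copy of the model in which `variable` ignores its parents."""
    surgered = dict(mechanisms)                  # same variables, same order
    surgered[variable] = lambda v, rng: value    # the new master: a constant
    return surgered


rng = random.Random(0)
observed = [sample(model, rng) for _ in range(1000)]
forced = [sample(do(model, "sprinkler", True), rng) for _ in range(1000)]

# Seeing the sprinkler on tells you the season; forcing it on tells you nothing.
dry_when_seen_on = {w["season"] for w in observed if w["sprinkler"]}
dry_when_forced_on = {w["season"] for w in forced if w["sprinkler"]}
print(dry_when_seen_on, dry_when_forced_on)
```

In the observed worlds a running sprinkler only ever appears in the dry season, while in the surgered model it runs in both seasons: the action does not propagate backward to the sprinkler's former masters, but it still propagates forward to `wet`.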

Starting point is 00:38:15 And he mentioned in conversation that something that he wanted to do was like an implementation of Judea Pearl's do operator. And I had never heard of this concept before. But instantly, I had the impression this was something very, very, very important. So I ran out and learned about it. And it's clear that this is going to be crucial to how we're thinking about artificial intelligence and complicated science questions going forward. I'm very happy with the new operator. Yeah, but it's so simple.

Starting point is 00:38:46 It's so simple. That's good. That's not bad. Only statisticians get irritated by the do operator. You know why? Because they realize that they should have invented it 500 years before Pearl, and they didn't. Well, it's simple in implementation, but there's something subtle here.

Starting point is 00:39:11 Let me just sort of say it in my own words to see if I'm right. Another example is exactly the same structure as what you've been saying, but my favorite example is windshield wipers on cars being on and people having their umbrellas up, right? In the data, whenever people have their umbrellas up, the windshield wipers are going on cars, and when they're not, they're not. And so there's a correlation there in the data, and this is why you say the statisticians, you know, this is what they do.

Starting point is 00:39:39 They go, look, there's a correlation. But what you're saying is you can implement in your diagram, do umbrella. So walk outside on a sunny day and put up your umbrella and see if all the windshield wipers start moving on cars, and they don't. And that's the sort of physical implementation of what you can do in a Bayesian diagram. But this will not convince a statistician. I tell you why. Because the do operator doesn't exist in probability theory.

Starting point is 00:40:13 Okay? So it's not an operator in probability. Where does it exist? On the diagram. And where does the diagram come from? It's a piece of knowledge that is brought to bear from outside the data. Exactly. And that is what statisticians resist. We don't want a new opinion.

Starting point is 00:40:36 Well, that was the manifesto of the Royal Statistical Society in 1833. We are not going to publish anything which has to do with opinion. Data and only data. Right. Yes. But it's not a matter of opinion. I mean, so I think I'm totally on your side here. But it is not in the data either, right?

Starting point is 00:41:05 It's counterfactual data. It's the question if I walk outside and put up my umbrella, what would happen? And so you need to go, like you say, beyond the data, but you can by doing experiments, right? One way is by doing the experiment. As far as the windshield wiper, yes, you can do it by experiments, right? Right. It's a better example than smoking for that reason. We don't have to hurt anybody by making them smoke.

Starting point is 00:41:31 Yeah, statisticians will be happy with it. Do experiment. And how, explain a little bit how this helps us tackle classic conundrums of causality, like the firing squad, right? You know, when you have a firing squad where there's 10 people shooting at a death penalty victim and one bullet hits first and you say that causes the person to die. But if your definition of cause was had that not happened, the effect wouldn't have happened, that's wrong, because the next bullet would have come and hit them, right? So how does this help us understand cases like that? That is a difference between necessary cause and sufficient cause.

Starting point is 00:42:15 So we can compute how sufficient was rifleman A's bullet as opposed to rifleman 12's bullet. And that's why we put the squad there. So no one could be blamed as an individual. Blame is a necessary cause. We are saying if it wasn't for your bullet, the guy would be alive. So the responsibility is divided here equally. Not even equally. I think if you compute it, including all kinds of noises.

Starting point is 00:42:55 For instance, you have all kinds of trigger-happy guys and things like that, the probability will be minimal for each rifleman as a responsible necessary cause for the death.
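
To make "minimal for each rifleman" concrete, here is a small sketch of a probability-of-necessity calculation for a toy firing squad. The model is an assumption made purely for illustration: every rifleman fires, each shot is independently lethal with probability `p_hit`, and the victim dies if any shot is lethal. The function name and parameters are hypothetical. It follows the three counterfactual steps: keep only the worlds consistent with what happened, hold one rifleman back, and re-evaluate the outcome.

```python
from itertools import product


def probability_of_necessity(n_riflemen, p_hit, shooter=0):
    """P(victim would have lived had `shooter` not fired | all fired, victim died).

    Assumes 0 < p_hit < 1 so that the evidence has nonzero probability.
    """
    evidence = 0.0   # probability that the victim dies when everyone fires
    necessary = 0.0  # ...and would have lived with `shooter` held back
    # u[i] == 1 means rifleman i's shot is lethal if fired (hidden background factor).
    for u in product((0, 1), repeat=n_riflemen):
        weight = 1.0
        for ui in u:
            weight *= p_hit if ui else 1 - p_hit
        if not any(u):
            continue  # abduction: discard worlds in which the victim survived
        evidence += weight
        # action: do(shooter does not fire); prediction: does anyone else kill?
        died_without = any(ui for i, ui in enumerate(u) if i != shooter)
        if not died_without:
            necessary += weight
    return necessary / evidence


print(probability_of_necessity(2, 0.5))    # two riflemen, coin-flip accuracy
print(probability_of_necessity(10, 0.9))   # a ten-man squad of good shots
```

Under these assumptions the value for one rifleman in a ten-man squad is far below the 50% threshold, since some other bullet would almost always have done the job. A full treatment would also model who fires first and the sufficiency side of the question, which this sketch leaves out.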

Starting point is 00:43:15 But at least we have a calculus to compute it. That's right. The degree to which your bullet was a necessary cause for the death. And likewise, now we can also talk about sufficient cause

Starting point is 00:43:27 and compute that. And the combination of two plays a role in responsibility. It's still not part of standard court procedure to compute the necessary. Not yet, not yet. But I think when we are advancing now with the causal AI, I think the legal profession will listen to us. Because they are dealing now with very critical issue of fairness. To what degree the algorithm was unfair to gender, to women,

Starting point is 00:44:12 or to minority groups in their request for loans and so forth. So the idea of responsibility and sufficient and necessary cause play a very critical role. They will have to listen to those philosophical definitions, I call it computer science definition. Sure. Yes, and listen to them and implement them in some procedure. They already did because according to the court of law, the but-for is a standard criterion.

Starting point is 00:44:53 You don't pay compensation unless you apply the but-for criterion. That is, the victim would be alive but for the actions of the defendant. And is that, sorry, is that compatible with your definition of causality? Oh, absolutely. Okay. But-for is, no, but-for has no meaning in colloquial conversations among lawyers, unless you put it in a firm scientific basis.

Starting point is 00:45:27 And the algorithmic definition of but-for is necessary cause, the degree to which the action of the defendant is necessary for the death of the victim. And it has to be greater than 50% according to the court of law before the guy is declared guilty or before he's forced to pay compensation. Okay, so let me just try to summarize because I think that we didn't quite end up with the definition of causality yet. And maybe there isn't one, but I think it... No, no, no, I like it because I like the fact that we didn't conclude that because it has different shades. Sure. You have direct cause, indirect cause, necessary cause, sufficient cause, necessary and sufficient. They come in all kinds.

Starting point is 00:46:21 You want to quantify all those. So I'm glad we didn't come out with a single one. Well, it's not a single one, but there is an insight, which I think is crucial. And you've said it, but I just want to say it again because sometimes we say lots of things, and it's hard to take away the message. At least I have this idea that you have this Bayesian network, this graph of all the probabilities, of all these things that can happen, and then all these arrows from one thing to another if one thing listens to it.

Starting point is 00:46:49 I like that formulation. And then there's this extra statement that you need to go a little bit beyond the data you have to say, if I put a do on one of the variables and just force it to do something without propagating backward in the graph, right, without saying why it's doing that. I'm just forcing it to do it. I'm going outside and putting up my umbrella or turning on my sprinkler. If doing that causes, if that leads to some effect down the chain, then that action is a cause of those effects, yes? Correct. Yeah. Good. And one of the, and again, you said this again, but I just want to sort of rub it in because it is so profound. For the purposes of learning and robots and AI, it's a lesson that the data or a comprehensive set of data might not be enough. You have to go play. You have to do some experiments to really learn why things are happening.

Starting point is 00:47:46 And that's why you find toddlers and babies are constantly playing around in the crib. And they are not pacified and they are restless until they get a state of understanding why this toy makes noise and this toy doesn't make noise. And that is why this restlessness is the craving for a story. Establishing the diagram. So babies in the crib, they're drawing a Bayesian diagram. I'm serious. Babies in the crib are striving to construct a causal diagram for the crib world.

Starting point is 00:48:33 I like that. I'm just going to sit in silence and contemplate that because they don't know it, but that's what they're doing. Just like Euclid didn't know he was using a metric. No, no, no, I'm sorry. They are born with a craving. Okay. Okay.

Starting point is 00:48:43 I'm serious about it, which means it explains why babies remain restless, regardless if you reward them for the right action or not, they are reward neutral, and they have this curiosity to find out how things work, regardless of the payoff, as opposed to monkeys and other animals, which are driven by reward,

Starting point is 00:49:12 and not by curiosity. Okay, so this raises a whole bunch of questions about at what point in evolution did we become motivated to just learn the causal network rather than just get rewards? Yeah, I leave it to anthropology, and I leave it, but I pose to them the question in more concrete terms, right? At what point in evolution did this transition occur?

Starting point is 00:49:41 or what kind of computational facilities we acquire to enable us to do that kind of things. I believe it was the invention of the counterfactual. And if you read the first chapter of The Book of Why, it is the Harari hypothesis that the artist was able to construct things that have no physical reality, that could not happen in the physical reality. Like the lion man, the head of a person with a body, sorry, the body of a person with the head of a lion.

Starting point is 00:50:21 Put them together. This ability to construct things which do not exist in reality, but exist in one's imagination. That was the key cognitive transition, or cognitive revolution that enables the homo sapiens to

Starting point is 00:50:44 dominate the planet. So I will mention something that maybe you have not heard much earlier in evolution, but it wouldn't be doing what you say. It's the first step toward doing what you say. Malcolm MacIver, who's a neuroscientist at

Starting point is 00:50:59 Northwestern, was a previous guest on the show and he studies fish climbing onto land for the first time, okay? And he makes the claim that if you're underwater, you're swimming along at meters per second, but you can only see meters in front of you. And all of the evolutionary optimization is to react instantly to whatever you see. But when you're on land, now you can see forever. Now you can see to the horizon. And there's a new modality that opens up, namely, seeing something far away and contemplating different hypothetical responses to it. You have time to do that. And he makes predictions on the basis of this theory about the development of brains and bones and sensory organs in the fish as they climb onto land. And I get it.

Starting point is 00:51:45 I think that that's, you know, number one, I don't know if it's true or not. I don't know the qualifications. But in some sense, that'll be the birth of imagination. But it's only imagination within the template of what you already know is possible. But you could see how evolution would, in the long term, develop that up into something much more creative. Yeah, I can see that. I haven't heard about this experiment.

Starting point is 00:52:08 Interesting. I'm surprised that fish can see outside the water on land. Well, in fact, as soon as they start peeking up onto land, their eyes move. They evolve so their eyes become frog-like, right, and peek up above their head instead of being on the side so they can see better. Interesting. It is, it is. Okay, good. So babies constructing Bayesian networks.

Starting point is 00:52:33 But again, I want to sort of re-ask a question that came in at the very beginning, but now we're more sophisticated so we can ask it again. When we're drawing these arrows, we do it on the basis of data, even though it's supposed to be something that says more than the data are telling us, ideally, all we have is the data that we have to construct them. How objective is that? Can we write down a methodology for saying, here's a bunch of data, therefore, here are the arrows you should draw? Or is that completely coming in from our judgment or something like that? The easy answer: it completely comes from our judgment. However, our judgment has also evolved, and in general contains a condensation, compilation of ideas, tradition, knowledge that came to us by social evolution as well as by biological evolution.

Starting point is 00:53:40 So our judgment is also based on a stream of data that took billions of years to evolve, to empower, to impact us. So, indeed, the controversy in data science is whether we should build an Einstein from an amoeba, by simulating the stream of data that our ancestors had from the time they were amoebas until they became Einstein, which is essentially what data science is today. Let's learn everything from data, because that's all we have. Or the alternative is we are already ahead.

Starting point is 00:54:27 Our ancestors have worked for us and have compiled a bunch of knowledge that we call plausible, a plausibility. Who listens to whom? We have it already compiled, and we know that the rooster, the sun does not listen to the rooster.

Starting point is 00:54:46 Let's use it. Both sides have arguments on their side. Because after all, everything we know originally came from sense data. But the question is, can we afford to wait zillions of years to replicate it? Will we ever replicate it the way we evolved? Because the knowledge that we have is subject to incidents such as meteor rains and something we cannot duplicate. Anyhow, that's a philosophical question.

Starting point is 00:55:33 And I'm for using compiled knowledge that we already have. Also, the argument is like that. Suppose you are successful in discovering the causal graph from pure data. And by the way, there is a bunch of activity called causal discovery, which is based also on the idea of what should the graph look like in order to be compatible with the data. Let's rule out all the incompatible. I'm leaving it on the side now.

Starting point is 00:56:08 It has picked up quite a momentum in the past few years. But aside from, if you are successful to learn the causal graph from data, you have to learn it, how to use it. And that's why it's important to keep it in mind, to know how to use it. And so you remember that you have to communicate with a human being, the end user, who is enslaved to this structure. You have to be compatible with the way the human being has structured his or her causal graph in order to build trust between the computer and the user. No, I like that because I think that in a lot of philosophy of science, for example, people pretend that we should aspire to be some objective receiver of data and develop hypotheses, but in fact, we carry around with us models of the world from the start, right? The manifest image or whatever you want to call it.

Starting point is 00:57:12 And I think we under-emphasize the importance of that built-in starting point in reasoning. Absolutely. As I say in one of my chapters, the physicists write equations, but they talk cause and effect in the cafeteria. Very true, very true. Speaking of which, I do want to talk about physics a little bit because, well, here's how I say it sometimes. The question that you and your friends are trying to ask is, does smoking cause cancer or is there some other variable that causes both smoking and cancer? You're not trying to answer the question, does cancer cause smoking? You've decided ahead of time the cancer doesn't cause smoking because, you know, like you said, it's heavier or whatever.

Starting point is 00:58:00 we know from the structure of the world that it's plausible that cancer listens to smoking and it's implausible the other way around. I want to derive that, though, on the basis of the laws of physics, and in particular, like you said, that might be too ambitious as a general rule, but I do want to derive the fact that the causes come before the effects in time. I want to derive the fact that the arrows have to point toward the future. Are you optimistic that that is something that can be derived on the basis of our physical understanding of the world, or do you think it's going to have to be taken for granted? You want to derive the time directionality from causes, as opposed to the way we normally think about it, that causes must be constrained by the flow of time, by temporal precedence. No, sorry, I think I misspoke.

Starting point is 00:58:50 I want to derive the fact that causes precede effects from the arrow of time. but by the arrow of time, I mean the increase of entropy since the Big Bang to today. That is the definition of time? That's the arrow of time. The arrow of time is an increase of entropy. Yes, that's right.

Starting point is 00:59:12 I cannot help you. Okay, good. I don't know. It's a nice challenge. I can maybe, if you convince me it's worth doing, I'll be happy to immerse myself in that question.

Starting point is 00:59:30 I need to write a paper here. Halfway done, yeah. But I'm not there. Good. I mean, I know that temporal precedence constrains the direction of cause and effect, but it's not sufficient. It's not sufficient.

Starting point is 00:59:46 Right. As we saw, for instance, in the rooster crow, it comes before the sunrise, but still we say it's not a cause. That's right. That's right. No, that I completely agree with. I'll just add one more claim that you can think about whether or not you think it's relevant or not, which is the following, that if I know the state of the world macroscopically right now, so what I mean by that is, I don't know where every atom is, et cetera, but I know where the people are and where the planets are. I can use, or even better, if I have a probability distribution over the macroscopic state of the world, I claim that from that, from that, plus the laws of physics, I can predict the probability distribution in the future. I can just use the laws of physics to move forward in time. But from that and the laws of physics, I can't retrodict the past all by themselves. I need an extra assumption, which is the low entropy boundary

Starting point is 01:00:42 condition near the Big Bang. And I think that's what's going to break the symmetry between going forward and going backward, that ability to predict the correct probability distribution of the world just based on current macroscopic probability data. Okay, why do we go to microscopic, where we cannot think too well? You're talking about molecules, and there are already zillions of them. Let's talk about something which is simpler. Let's talk about the billiard ball, a billiard table. And you have this nice triangle of the balls sitting there.

Starting point is 01:01:22 And what do you call this leading ball that bounces? What's the name? The cue ball. The cue or the cue? Okay. The cue ball comes. It hits the triangle. Everybody disperses and comes and hits the walls of the table.

Starting point is 01:01:39 Okay. Now, we run it backward. Take a movie. We run it backward, okay? Can you tell, can you tell with a bare eye whether you run the movie forward or backward? Yes. By what? But you see, there's no entropy here.

Starting point is 01:01:59 Well, that initial... It's just F equals ma for every billiard ball. True, but you chose to begin with a low entropy configuration in the triangle, yeah? What makes it... Why is the triangle lower entropy than the state of the balls a second later where each one of them bounces? Why is it low entropy? Well, you have some... I don't think it's...

Starting point is 01:02:24 It is. It is just coarse-graining. No, it's just the fact that you have a name for the nicely arranged balls when they are in the triangle, and you don't have a name for their state a minute later. Just a matter of a name. No, I think it's actually, I think there is something objective about the measure on the space of ways to be in the triangle versus scattered around the... Just because it's simpler. Yeah.

Starting point is 01:02:54 But if I remember my thermodynamics correctly, it is not a progression from an ordered state to a disordered one, but the natural escape from a narrow region in phase space to a wider region. Exactly, yes. That's right. Okay. Okay. So the order is a subjective perception. We have a name for the triangle.

Starting point is 01:03:24 We don't have a name for the state of the balls a minute later. We don't have it. It's just we have to keep on saying, ball number one has momentum 23 and so on. It doesn't have a single name. So we are biased by our language. I don't think it is just our language.

Starting point is 01:03:45 I think that there is something about how we observe the system that lets us coarse grain in some ways and not others. When we say all the cream is separated from all the coffee and a cup of coffee is low entropy versus when they're all mixed together, it's high entropy. That's because we can see the difference pretty immediately. You say we are biased immediately, yes. Okay, anyway, this is my responsibility to work further than this.
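
The "narrow region versus wider region" point can be put in numbers with a crude counting sketch. Everything here is an assumption for illustration: the table is cut into cells, balls are treated as indistinguishable, momenta are ignored, and the 200-cell table size and the function name are made up. The entropy of a macrostate is taken as the logarithm of the number of arrangements compatible with it, in units where Boltzmann's constant is 1.

```python
from math import comb, log


def log_multiplicity(n_cells, n_balls):
    """ln W for the macrostate 'n_balls somewhere among n_cells', one ball per cell."""
    return log(comb(n_cells, n_balls))


# "Racked": all 15 balls confined to the 15 cells of the triangle, so W = 1.
racked = log_multiplicity(15, 15)

# "Scattered": the same 15 balls anywhere on a hypothetical 200-cell table.
scattered = log_multiplicity(200, 15)

print(f"ln W racked = {racked:.1f}, ln W scattered = {scattered:.1f}")
```

The count depends on how the table is cut into cells, which is the coarse-graining the two speakers disagree about: one reads that choice as a bias of language, the other as reflecting something about how the system is actually observed.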

Starting point is 01:04:13 I just wondered if you had any strong opinions about it from the start. But I want to get back to this idea that human beings have this baked in manifest image, because this is really where it becomes important for the AI, right? I mean, the robots don't have any pre-existing image of the world. I just had Gary Marcus on the podcast. He was the one who connected us here. And Gary has been very strongly arguing that there is a roadblock to deep learning being too successful if it just does correlations between different things in the world.

Starting point is 01:04:46 You need some structure. You need some common sense. I'm betting that you agree. and causality is going to be part of that common sense, but how do we do it? How do we actually teach the robot how the arrows go between all the parts of our little network?

Starting point is 01:05:02 Okay, let me first answer. I agree. Not only I agree, but I have to supplement it by saying that it is not an opinion. It's a theorem. Okay. It's a theorem that there is a limit.

Starting point is 01:05:18 There are certain tasks you cannot do if you don't have this set of assumptions. So it is a mathematical constraint on the ability of a robot to do certain tasks. Okay, now we go to, how do we get

Starting point is 01:05:34 this model of the world into the robot? If you have the time, you can just feed it with a diagram, plus equip it with the techniques

Starting point is 01:05:50 to enrich the diagram. Enrich the diagram by thinking conjecturally, what experiments do I have to conduct in the future in order to answer a certain question, what additional variables I wish I could purchase so that I can observe them, enrich the diagram. So this is what we mean by automatic scientists. A scientist that can design the next experiment because it answers a question which currently cannot be answered on the basis of the existing diagram.

Starting point is 01:06:33 So the diagram must be vulnerable to, number one, refutation by the data and, number two, to enrichment. So that is a blue sky idea of the automated scientist, which I could elaborate on, but it's all built on the force of curiosity. We strive to obtain a state of deep understanding. And deep understanding is having the ability to answer questions in all three levels of the hierarchy. That gives us a sense of being in control and having an understanding of a domain, of crossing the street, of playing games and so on. And I'm not, I don't have any fish in this pan or whatever the metaphor is here,

Starting point is 01:07:30 dogs in this fight. But I'm betting that the reaction to that philosophy from a lot of people who are working in contemporary deep learning is, no, that's not what we do. You know, we just let the computer learn everything it can, do all the thought experiments it can do, collect all the data it can, and the computer will figure out what the patterns are. For that, we have a theorem. It's saying impossible.

Starting point is 01:07:58 Is that someone's name attached to that theorem, or whose theorem is that? That's the causal hierarchy. It's a ladder of causation. You cannot go from level i to level i plus one unless you have information or assumptions on level i plus one or higher. Okay, good. But I do get the impression, and again, correct me if I'm wrong, that this perspective is sort of a plucky minority in the field. It has not swept the consensus view. It has not swept the machine learning people.
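
The claim that data at one level cannot settle questions at the next can be illustrated, though not proven, with two toy models. The sketch below assumes two hypothetical structural models over binary variables X and Y: in one, X drives Y; in the other, Y drives X. All names are made up for the example. Both produce exactly the same observational distribution, yet they disagree about what happens under do(X = 1).

```python
from collections import defaultdict

NOISE = {0: 0.5, 1: 0.5}  # P(u): a fair hidden coin, shared by both toy models


def model_x_causes_y(u, do_x=None):
    """X is set by the coin (or by intervention); Y copies X."""
    x = u if do_x is None else do_x
    y = x
    return x, y


def model_y_causes_x(u, do_x=None):
    """Y is set by the coin; X copies Y unless an intervention overrides it."""
    y = u
    x = y if do_x is None else do_x
    return x, y


def distribution(model, do_x=None):
    """Exact distribution over (x, y), observational or under do(X = do_x)."""
    dist = defaultdict(float)
    for u, p in NOISE.items():
        dist[model(u, do_x)] += p
    return dict(dist)


# Level 1 (seeing): the two models cannot be told apart.
print(distribution(model_x_causes_y), distribution(model_y_causes_x))

# Level 2 (doing): forcing X = 1 separates them.
print(distribution(model_x_causes_y, do_x=1), distribution(model_y_causes_x, do_x=1))
```

No amount of observational data generated by these two models would distinguish them, so deciding between them requires either an experiment or an assumption about which way the arrow points.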

Starting point is 01:08:36 Okay. Or the deep learning people. It's the same way that it took 20 years to sweep the statistical society until they have accepted the do operator. Even that, it's still resistant. I still have an island of resistance there. You'll win. I think you'll win that one.

Starting point is 01:09:02 I can safely predict that. I know that I'm going to win. But okay, I mean, even if we're on board, even if we're on the train, it does sound hard. It's really unfair for me to fight against this huge industry, a huge industry called machine learning. Because I know that I have mathematics on my side, and they don't have this certainty. So I think it's unfair. I'm going to win. You're going to win. That's okay. It's okay to know you're going to win ahead of time.

Starting point is 01:09:30 There are worse positions to be in. But even if you're going to win, it still leaves us with hard problems about what to tell the computer. I mean, what is the diagram? What are the causal relationships that matter? Or even what's the stuff out there in the universe? Do you tell it that there are tables and chairs and people and cars rather than letting it learn that? I mean, how advanced is that program of sort of formalizing our common sense intuition about the causal structure of the world? I have neglected to talk about an object property relationship.

Starting point is 01:10:08 I've been very narrow in what I'm doing. I'm narrow-minded. So I only dealt with cause-effect relationships. I've neglected many other things which come to bear in natural language, in vision, interpretation, and so on. But I think what we have learned in the causal corner can be a role model for other areas. Okay, yeah. We are now working with propositional calculus. We have to go expand it to predicate calculus.

Starting point is 01:10:44 All kinds of things need to be done. And it will be done eventually. Yeah, we have to teach the robots about object-property relationships, chairs and tables and their functions. That is, I haven't done anything in this area. But I want to... Except what's me there. I mean, when you say that, just to be clear, because I think that some people might hear you say, we need to teach the robot about object property relations, and they'll say, well, sure.

Starting point is 01:11:12 But the important part of that phrase is we need to teach as opposed to let it figure it out. We can't wait for it to figure it out. Some things we need to teach; others we can let the robot figure out. At least in my corner, I have theorems which tell us what you must teach and what you can let the robot figure out by itself. I don't have those theorems applied to natural language and vision. Okay. And then I guess the final thing I just wanted to touch on, which you've already brought up, but is very exciting but also confusing to me is the whole set of applications to

Starting point is 01:11:53 the social sciences and even to law or moral philosophy, right? I mean, when we talk about right and wrong, blame and responsibility, punishment and reward, we're always assuming some causal structure, right? You are responsible for this happening. Do you expect that a more sophisticated, nuanced idea of how cause and effect structures work is going to have an effect downstream on how we think about these puzzles? Yes, I have some expectations, some excitements. I can see how we can build social intelligence on top of environment intelligence. So far, I've been talking about a robot learning about managing a domain or understanding a domain, a disease domain, so on. But now we can build on top of that the idea that robots can have a model

Starting point is 01:12:55 of another robot or of itself. If it has a blueprint of its own software, then it can reason about what made me do what I did. And then you can program compassion on top of that and say, I understand why you did what you did. Because you are like me. If I were in your situation, I would do the same thing. But are you aware of this and this? So all this relationship, awareness, compassion,

Starting point is 01:13:30 I understand you, trust me. All this relationship involves a robot having a model of another robot. And once we have it, we're going to have a nice conversation with our apprentice robot. Are you optimistic that artificial intelligence will reach human levels of intelligence at some point? I'm absolutely sure. How can one be absolutely sure on the basis of conjecture?

Starting point is 01:14:04 only on the basis that I don't see any impediments. But is it a sooner rather than later kind of question? Is this something we need to kind of contemplate? I refuse to answer. Okay, fine. I don't have any imagination that other people have, okay. I'm with you. That asks him of him.

Starting point is 01:14:25 I don't have it. Well, so you say that, but what you really mean is you have too much imagination because you can imagine many different possible things. It's hard to tell which is going to be true, right? Correct. Yeah, correct. Yeah. And then, okay, I'll just close with sort of a statement that you can reflect on if you want. But as we were having this conversation, I realized the following weird thing that I write books. I write books for sort of broad audiences on physics and other things.

Starting point is 01:14:57 And over and over again, in all of my books, I always start with the following idea, whether or not I like it or not. It helps to start with the idea that there was Aristotle who imputed natures and goals to objects in the world, right? Fire wants to rise up, rocks want to fall down. It was a teleological view, and causes and effects were front and center. He had this taxonomy of cause-effect relationships. And I say, we got rid of all that. Galileo and Newton came along, and we replaced this "this happens because of that" language with a language of patterns, right?

Starting point is 01:15:36 The equal sign in your mathematical representation. And that's been very helpful. All of modern physics is this sort of, it doesn't have any direction of time. There's no direction of causality. It's just this is happening and this is happening and this is happening. But of course, like you say, in the cafeteria, all we physicists talk about causes and effects all the time. And so it is crucially important to me to recover, to understand how to recover our ordinary everyday understanding of causality and goals and teleology in a way that

Starting point is 01:16:09 is compatible with that underlying view of fundamental physics. And so I guess all I'm saying is I'm glad to see you're doing it. I'm not sure that I'm capable of undertaking this major monumental goal that you mentioned. Okay. But I'm having fun just to capture the way you and I think, and I have a tremendous satisfaction from seeing myself replicated, amplified on a computer. And I get a better understanding of myself. Why I had this intuition?

Starting point is 01:16:54 Oh, because so and so. I have a playground for myself. I mean, maybe this is a paradigm. Maybe we should all have gigantic, huge aspirational goals and work to make progress on them in little tiny pieces step by step. I agree with you. All right.

Starting point is 01:17:14 That sounds like a wonderful little lesson for us all and a good place to stop. So Judea Pearl, thanks so much for being on the Mindscape podcast. Thank you, Sean. It's great having me, having you, having me on your show. All right.
