The Joe Walker Podcast - The Shape Of Probability - Harry Crane

Episode Date: May 14, 2018

Harry Crane is a scholar who specialises in statistics and probability. He is currently a professor of statistics...

Transcript
Starting point is 00:00:00 This episode of the Jolly Swagman podcast is proudly brought to you by Globite. Head to the website, globite.com, use our discount code SWAGMAN, and you'll get 15% off all items. Because with Globite, you go, we go. From Swagman Media, this is the Jolly Swagman podcast. Here are your hosts, Angus and Joe. Hey there. How's it going, ladies and gentlemen? Welcome back to the show. I'm Joe Walker, and I'm coming to you with another episode with an American guest. Oh, how I love America.
Starting point is 00:00:46 This is probably one of my favorite episodes ever, and I think that means something. I've done about 50 of these now, over 50, with my partner in crime, Angus. My guest this week is Harry Crane. Harry's a professor of statistics at Rutgers University in New Jersey, and he did his PhD in statistics at the University of Chicago. I first came across Harry on Nassim Taleb's Twitter feed, his infamous Twitter feed, and it was one of those occasional positive tweets. He said that one of the three smartest things he did in 2017 was attend a Harry Crane Foundations of Probability seminar. And Harry hosts the Foundations of Probability seminar at Rutgers, which has involved guests like Nassim Taleb and Daniel Kahneman. Harry was an incredibly gracious guest. He was well prepared. The table that we were speaking at, at his office at Rutgers,
Starting point is 00:01:39 was strewn with books and notes that we referred to throughout the episode. And this is an episode about probability and statistics. We talk about things like the first rule of probability, which you'll hear about more. We talk about the definition of probability. We talk about why the predictions expert Nate Silver can never be wrong, or perhaps more aptly why he is wrong. We talk about the replication crisis in science, and we talk about some of Harry's reactions to famous experiments done by Kahneman and Tversky, the two Israeli psychologists.
Starting point is 00:02:18 Statistics and probability probably aren't the first topics that you'd pick for a podcast. Needless to say, statistics is an incredibly important skill for the future, and it's only going to become more important for the decisions that our species makes in the future. Harry and I set ourselves the challenge of making these topics as enjoyable as possible for non-statisticians. And I think we succeeded in that. Of course, you can judge for yourself. So, without much further ado, please enjoy this conversation with Harry Crane.
Starting point is 00:02:55 Harry Crane, thank you for joining me. All right, thanks. Great to speak with you. As I mentioned when we were emailing, I first came across your name through Nassim Taleb. I think he tweeted out last year that one of the three smartest things he did in 2017 was to attend a Harry Crane Foundations of Probability seminar. Yeah, we had a great time.
Starting point is 00:03:17 Yeah. And you've hosted Nassim, you've hosted Daniel Kahneman. Yeah, we've had a great lineup. I think, by the way, one of the other things that Nassim said was his best thing in 2017 was getting a pretzel at Auntie Anne's, I think. So that's about the level that I'm at at this point. Yeah, in no particular order. Yeah, of course.
Starting point is 00:03:40 Tell us what you do. So right now I'm a professor of statistics at Rutgers. So I've been here for the past five, six years or so. And I guess mostly interested in problems having to do with probability. So I was more on the probability side of things than of statistics. But as I'm sure we'll get to, they're not so, well, in my mind, they're very closely related. And more recently, and this is referring to the thing that you just mentioned, over the past couple of years, I've been involved with the philosophy department at Rutgers running
Starting point is 00:04:13 a probability seminar where we invite speakers from, I guess, all over the place to come and talk about how probability comes up in their specific discipline. So I guess as a statistician, I'm mostly familiar with how probability comes up in statistics, but I've learned that it, of course, comes up in almost, well, in many other fields, whether it's psychology, finance, economics, and actually also a lot in philosophy. So that's been very interesting. Yeah. I think we're going to talk about the distinction between probability and statistics, as you say it, but maybe before we get there, how did you find your way into studying statistics?
Starting point is 00:04:58 Yeah. So, my interest in statistics is, it's a bit of a roundabout story, but really the way it came about was, well, I've always been involved ever since I was a little kid in probability, I guess, in some way. I guess it would be in applied probability in a very real sense, as I'm about to describe. So, and my kind of going in the direction of statistics has in some way been a little bit of an accident. So, before I tell the story, you know, you got to understand that I grew up in the Tacony neighborhood of Northeast Philadelphia, which was an Italian neighborhood. And pretty much, I guess at that time, at least as far as I was concerned, every adult that I was aware of either was a bookie or had a bookie, right? So for me, you know, this was kind of just a way of life, right?
Starting point is 00:05:53 So what was the ratio of bookies to people in this county? Yeah, well, no comment on that, right? But everybody was involved in one way or the other, at least so it seemed. And so when I was a kid, I remember my dad used to bowl in a bowling league on Monday nights. And every once in a while, I would go with him to play with the other kids. But there was this guy at the bowling alley every Monday. He was probably there every other day too. His name was Reds Gabriel. I remember his name. It's such a great name. But all Reds would do is he would walk around the bowling alley with a newspaper in one hand and a notebook in the other hand. He would walk up to people, look in the newspaper, write something down in his notepad and move on and talk to the next person.
Starting point is 00:06:34 Of course, what Reds was doing was he was taking bets on the lottery and on football games and on horse races and all that stuff. And I was fascinated by this as a kid, because here you have this guy who probably had a sixth grade education, right? I mean, he most likely doesn't have a very deep understanding of any kind of mathematical theory, right? But here's a guy who's making his living off of things that are, in principle, unpredictable. And he doesn't even care. You can go up to him and bet on whichever side of whichever game you want, and he'll take the bet.
Starting point is 00:07:11 And at the end, he's walking away with money. And so I remember as a kid being very kind of taken by that: there must be something going on here, because everybody's talking about this as gambling, but from Reds' perspective, it certainly doesn't look like gambling, right? And so then, I guess this is a story where Reds has really had a big influence on my life, little did I know. But when I was in sixth grade, in Mr. Bulligatz's sixth grade class at Our Lady of Consolation, me and a couple of my friends put together a syndicate, if you want to call it that. And I guess we were doing what Reds was doing. We were taking bets from other kids in the class on horse racing, on football games, on whatever it may be.
Starting point is 00:07:59 It didn't last very long, but the end of the story, I guess, the way it ends is the way that a lot of bookies end, is that we were wiretapped in some sense. I guess one of the, my partner, Nick, was talking on the phone. So at that time, there were no cell phones, or at least kids in sixth grade didn't have cell phones. The world's changed since then. But he was talking to another kid in the class and taking a bet from him from his home phone. And back at that time, the best way for parents to eavesdrop on their kids was to pick up another phone in the house and listen into the conversation. And so, while this other kid was placing a bet, probably a 10 cent exacta box at Aqueduct Racetrack or something like that,
Starting point is 00:08:44 his mom overheard this and she reported it to the principal, and that was pretty much the end of that. But anyway, that's how I got interested, that's how I got into this stuff. I've never really been somebody who was much about betting on sports or betting on anything like that, but I was always kind of interested in that aspect of it, and so I continued throughout college playing poker games and things like that. And I played online poker a lot. And then in college, I majored in actuarial science. I majored in math, but one of my minors was actuarial science. And really,
Starting point is 00:09:16 what I was interested in most was actuarial science. Because if you think about it, what actuarial science is, is the science of bookmaking. I mean, this is the science of what, you know, what is an insurance company doing except trying to price these bets effectively. You know, when you buy insurance, you're making a bet about whether or not you're going to get into a car accident or not. And, you know, the actuary has to determine what the right premium is to charge so that when those unfortunate events happen, that the insurance company can pay the claim and still walk away with a little bit of profit, right? And so, in some way, my study in college was even kind of influenced by this early childhood obsession in some way with probability or with gambling, yeah. What sort of money were you making in the poker? Was it pretty good money for a college kid?
Starting point is 00:10:07 It was definitely good money for a college kid. I mean, at that time, I was... So, just to put a time frame around it for anyone out there who knows how things have evolved: I graduated college in 2006. Chris Moneymaker won the World Series of Poker around 2003 or so, and that was when there was a big poker boom. Online poker shot through the roof in '03, '04, with sites like PokerStars, Full Tilt and all that. I was playing, you know,
Starting point is 00:10:39 several years before then, but at this time a lot of people were coming in, a lot of people were interested in it. And so by that time I was playing, I guess, what was at that time the highest stakes on the internet. Now it's kind of peanuts compared to what people play for now. But at that time, you know, the most you could play for was a few thousand dollars, five, ten thousand dollars in a given game. Now you can play for millions of dollars. I've certainly never played that high. Ever been tempted to try your hand again, now the stakes are so high? Well, I still do try to stay connected to it. I still do try to play. And I think we'll talk more about that in the discussion, because I think that even if it's
Starting point is 00:11:23 just recreationally for me now, it's important for me to kind of stay connected to these real uses of probability. I mean, I like to experience where probability comes into play in everyday life. I mean, we just had lunch, right? We just had lunch and I ended up paying the whole bill, right? I feel terrible still. No, you shouldn't. But I, so maybe I should, so at lunch, so at lunch we, so this is a good pastime that me and a couple of my friends like to do when we're out to dinner or lunch together, which is what we call flipping for the check. And so when the bill comes, you, it's important that you have a nice iPhone, I guess, like Joe does here. And you start a stopwatch, start the timer on the phone and let it run. And as it's running, I guess you have to let it run a little while just to make sure there's no
Starting point is 00:12:20 kind of foul play. I guess that's what it's for. But you calculate everybody's fair share of the check, and whatever percentage of the check is your fair share, you are given that proportion of the numbers between double zero and 99. And so then what happens is, after the timer's run for a long enough time, you stop the timer, and whichever number is showing on the milliseconds, whoever's number, whoever was assigned that number pays the
Starting point is 00:12:49 whole bill. Okay. And so we just played this at lunch. I assume it was the first time you've ever done this. Yes. But it won't be the last, that's for sure. Well, hopefully not. But you got out, well, you had a good run this time. So, you know, but I think it was fun, right? It was fun when you win. I'm still reveling in the beginner's luck.
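For readers who want to see the mechanics of the check-flipping game Harry describes, here is a rough Python sketch. It only illustrates the rule as stated in the conversation (proportional blocks of the numbers 00 to 99, decided by a random two-digit draw standing in for the stopwatch's milliseconds); the names and shares are made up.

```python
import random

def flip_for_check(shares):
    """Decide who pays the whole bill, as described in the conversation.

    shares maps each diner's name to their fair share of the check. Each diner
    gets a block of the numbers 00-99 proportional to that share; a uniform
    two-digit draw (standing in for the stopwatch's milliseconds) picks who
    pays everything.
    """
    names = list(shares)
    total = sum(shares.values())
    bounds, running = [], 0.0
    for name in names:
        running += 100 * shares[name] / total
        bounds.append(running)          # cumulative upper bound on the 00-99 scale
    bounds[-1] = 100                    # guard against floating-point shortfall

    draw = random.randrange(100)        # the "milliseconds" reading
    for name, upper in zip(names, bounds):
        if draw < upper:
            return name, draw

# Example with made-up diners and shares of the check
print(flip_for_check({"Harry": 40.0, "Joe": 35.0, "Angus": 25.0}))
```

As Harry notes next, each run is a zero-expected-value gamble for everyone involved; it only adds variance to who pays on a given day.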
Starting point is 00:13:54 But even that, which in principle is a neutral expected value play, is really just introducing volatility or variance into your everyday bankroll, which in the long run shouldn't make much difference. But it's a way of staying kind of close to where these probability models have real effects. In a lot of the situations where probability or statistics comes up, as in science for example, the probabilities aren't tethered to anything real, and if you treat them just like numbers on a page, then that's pretty much all they are. It's not until you put something real behind it, until it has real consequences, that you can really, I think, give meaning to these probability statements. We'll come back to that idea because it's something that's very profound, which I know we both want to discuss. But I thought now we could first give a definition of probability and then, secondly, a definition of statistics, because for a lot of people they're sort of either synonymous or at the very least very overlapping concepts. But if we begin with
Starting point is 00:14:38 probability, because you just described how you first became interested in the concept through gambling and betting, and if I'm not mistaken, the history of probability and the mathematics therein was intertwined with gambling from the earliest days. Yeah. So, I mean, interestingly enough, I was just rereading something on the history of probability. And two of the earliest applications of probability, and what drove people's interest in probability, one was certainly gambling and the other was insurance. So I realized that that was also kind of the path that I took. But as far as what probabilities are or what probability is, this is something that I'm grappling with.
Starting point is 00:15:23 I mean, that's part of my ongoing interest with this seminar that I run. But within statistics there has historically been a divide, two schools of thought on probability. One is called the frequentist view of probability, and the other is usually called the Bayesian view, or the subjectivist view of probability. So the frequentist interpretation of probability is kind of just what it sounds like, where you interpret these probabilities as frequencies. And so, you know, if I'm going to toss a coin, I imagine tossing a coin a very large number
Starting point is 00:16:00 of times, very long sequence of tosses, then the probability of heads is just the frequency of times at which heads comes up in this long sequence of tosses. And so this has a very natural, I think, mechanistic, I guess, interpretation in terms of statistical interpretation in the sense that if you imagine running an experiment repeatedly, repeating an experiment over and over and over again, then the probability of certain outcomes is just the frequency of times that you would see that outcome if you were able to hypothetically run this experiment over and over. Although, you know, that, I guess that experimental interpretation isn't necessary, but I think it's very helpful. And I think it's how a
Starting point is 00:16:41 lot of people think of probabilities. At least I know that that's how I thought about probability for a long time, because I was thinking about the probabilities of a roll of the dice, playing a game of craps or whatever. The probability of rolling a 7 is just the long-run frequency of times that a 7 is going to come up. And so it's very much tied to that process that generates the data. The Bayesian view is a bit different in that it doesn't, and I guess one of the benefits to it, or one of the arguments in favor of it, is that it doesn't depend on this repeatability of the experiment. So in the Bayesian view, the probability isn't a frequency, but it's thought of as what's called a subjective degree of belief, or really just what is the price of a fair bet, you know, what, or a fair price of a bet on a given outcome. So if I were to tell you that the probability that I assigned to the coin landing heads is 60%, all I'm telling you is that I'd be willing to take a bet on either
Starting point is 00:17:51 side of that price at the implied odds, the implied odds there being three to two. I'd give you three to two odds on tails, because I'm saying 60% heads. So that's just a subjective statement about my disposition towards the outcome. It doesn't say what the actual outcome is going to be, or what the actual probabilities or frequencies are, if those even mean anything; those don't really have a meaning in this context. So those are the two predominant views, I guess, within statistics, or at least within the philosophy of statistics, I should say.
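As a quick illustration of the two readings Harry has just contrasted, here is a small Python sketch: the frequentist reading checks the long-run frequency of heads by simulation, and the Bayesian reading converts the stated 60% into the implied betting odds of three to two on tails. The 60% figure is the one from the conversation; everything else is illustrative.

```python
import random

# Frequentist reading: probability as long-run frequency.
# Simulate many tosses of a coin that lands heads 60% of the time and watch
# the observed frequency settle near 0.6.
tosses = 100_000
heads = sum(random.random() < 0.6 for _ in range(tosses))
print(f"observed frequency of heads: {heads / tosses:.3f}")    # close to 0.600

# Bayesian / subjectivist reading: probability as the fair price of a bet.
# Stating "60% heads" amounts to offering 0.6-to-0.4, i.e. 3-to-2, odds on tails.
p_heads = 0.60
odds_on_tails = p_heads / (1 - p_heads)
print(f"odds offered on tails: {odds_on_tails:.1f} to 1")      # 1.5 to 1, i.e. 3 to 2
```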
Starting point is 00:18:33 They're not necessarily competing views there? I would agree that they're not competing views. I think some people think that they are competing views, and different people have different ways of thinking about this. It's best not to get into, only because this is a historical debate that's gone on for 50, 60 years, and it's not one that people are very interested in anymore. That's not to say that it's been resolved, but it's just to say that for the ways that statisticians actually use probabilities now, this philosophical argument over what the probabilities are isn't really their main interest anymore. But I guess what I just want to close out on, with these Bayesian probabilities or these subjective probabilities, and what the benefit is, is that there are a lot of situations in
Starting point is 00:19:09 which you might want to assign a probability to something that can't be repeated in principle. We were talking before about the election, for example. If I were to put a probability, as certainly people did, and Nate Silver, these election predictors do, that Trump has a 25% probability of winning the election. You can't really think of that as, it's not really right to think of that as, well, if we were to rerun the election a million times, then 25% of the time Trump would win, because the election's only going to happen once. So, maybe the better way to think about it or the way that the Bayesian view would think about it is that instead, what I'd be saying in that 25% probability is that I'd be willing to offer you three to one odds against Trump winning.
Starting point is 00:19:57 Then there's also other types of probability or other interpretations of probability that I think are interesting and relevant, not so much in statistics, but a lot of probability judgments that we make in everyday life are qualitative, based on some qualitative or ambiguous assessment of the evidence. And that's something that I've been thinking about a lot more recently, but those don't so much come up in statistics. And so what would an example of one of those be? Well, I think that good examples are when, I guess the example that comes to mind often is,
Starting point is 00:20:34 well, one is, you know, if we were trying to talk about whether we should meet on Saturday or on Sunday and you were to say, well, Sunday probably works better for me, you're making some kind of probabilistic judgment there, but you're not, of course you're not assigning a numerical value to it. But I think maybe in a more substantive context, in legal settings where a juror makes a determination
Starting point is 00:20:55 that a defendant is guilty or not guilty according to some standard of evidence or some standard of, you know, how probable it was that they committed the crime. At least legally speaking, there's a lot of resistance to putting any numerical value on that assessment because of how hard it is to really pin down these numbers. But certainly the system works. Well, not to say it works or doesn't, but the system operates, and it operates under this kind of intuitive understanding of probability, without any reference to the numerical or the mathematical theory of probability. Yeah. So to recap, then, we've got the frequentist view of probability, the Bayesian view of probability, and then probability sort of in the common parlance, which is, you know, your proclivity to assenting towards a belief or not. Yeah.
Starting point is 00:21:52 I would say that's good. Well, for the sake of this conversation, I think in most cases those are the three relevant concepts of probability. Yeah. So now tell us where statistics fits into all this, in terms of the concept. Yeah, so I think that one way of understanding what statistics is now is that it's pretty much a field of, I guess, analyzing data on the basis of probability, or using probability
Starting point is 00:22:22 as kind of a way of modeling the way that data behaves and drawing inferences from data based on those probability models. So, probability comes into statistics at a very foundational level, in that statistical models are based on the theory of probability nowadays. I think that there are other ways of understanding what statistics is that have nothing to do with probability. I mean, a statistic is more or less a number in a book, or, you know, the way that when you take the census, the national census, you measure all of these statistics of the population. That's a different use of the term statistics, although the field of statistics is in essence a way of gleaning information from these statistics, these numbers. Yeah.
Starting point is 00:23:12 Now, of course, there are lies, damned lies and statistics, and we were talking earlier about some of the misuses of statistics and the way that statistics can be used to sort of give a veneer of legitimacy to things that are unfalsifiable or outright wrong. And you mentioned the example of the elections and Nate Silver with his polling predictions, but could you talk about what some of the problems are that you see at the moment in terms of how probability is being used by people like Nate Silver? I don't want to single out Nate in particular, but I guess he's sort of representative of a certain class of people. Yeah, well, I guess there's a couple of contexts in which we can talk about this, one of which is the election predictions, and another of which is the way that probabilities are used, I guess, in mainstream scientific research to validate scientific conclusions. And that's something that's led to what's called the
Starting point is 00:24:14 replication crisis. I think both are certainly related and relevant to this discussion. I think that the latter has potential to have much more far-reaching impact and, you know, negative impact potentially as it already has in that, I guess, the conclusions that people are drawing from these scientific articles are, you know, are in a sense more consequential than Nate Silver's election prediction. But I guess on the topic of the election predictions, I guess what I would just say is that I found the whole experience of the most recent election in America to be very interesting from my perspective. Because at the time, I guess at that time, I had just started this seminar on probability. And so we were meeting every week and we were
Starting point is 00:25:02 talking about all these ways in which probability comes up. And here you have a real-life event that's going on where the news media is reporting things that are supposedly happening in the world, but they're also making a major part of their news operation just reporting these probabilities that are being calculated, in principle, based on the things that they're reporting. So it was really interesting to me. I guess the story that really sticks in my head was, I remember sometime in September, right after the first debate, Trump had, I think, outperformed what everybody was expecting in the first debate. And a lot of people were, well, at that time, regardless of what people were saying, Nate Silver, the 538 probability, took him from something like 20% to win to like 35%
Starting point is 00:25:59 to win all in the span of a day. And I just remember sitting around the room and hearing people talk about this. And they were legitimately worried. They were like shaking. And they weren't shaken by anything that actually happened in the debate. They were shaken by the fact that the probability changed and that now the probability was a lot higher than they wanted it to be because these were people who were, I guess, Clinton supporters, right? And so this really struck me that I had been thinking for a long time, I guess, in the kind of weeks and months leading up to that, what do these probabilities actually mean? You know, I have no idea what these probabilities actually mean. I mean, you know, if you think about it, and then I wrote something about this after the election happened, which was, you know, in 2008, I think, 538 gave Obama something like a
Starting point is 00:26:48 90% probability to win. And, you know, at that time, Silver was really heralded as this great prognosticator, that he got it right, right? He got everything right. And then in 2012, he gave a lower, but similarly high, probability for Obama, in the 80% range. And so again, he was said to have gotten it right. And then after the 2016 election, where he gave Trump a 25, 28% chance to win, you still had people coming out and saying that he got it right again. And what these people were pointing out was, well, when you say that something's going to happen 25% of the time, or when you say that something has a probability of 25%, that means it's going to happen 25% of the time. And so the fact that this thing happened
Starting point is 00:27:35 doesn't invalidate the probability. And so when you start to get into that realm of interpreting these probabilities, then it means that pretty much nothing that is said, other than an extreme 0% or 100% probability, is falsifiable, right? Because you can't replicate an election. You can't run multiple versions of the same election and see how often Trump wins. Right. And apparently, no matter what comes out, you're going to be able to say that it's consistent with whatever you predicted, right? So it really renders the probabilities meaningless, at least from my point of view, unless, and I guess this is something that I've been working on recently or thinking about recently, and I showed you an early draft of this, unless these probabilities are tied to something real, right? And so, this is something
Starting point is 00:28:29 that I'm calling the first rule of probability. And it should be the last rule of probability too, which is: if you state a probability on any outcome, then you should be forced to accept any bet on that outcome at the implied odds, right? So if I state a probability of something being 1%, then I should be forced to offer you 99 to 1 odds if you wanted to bet on the other side. And of course, if I lose that bet, I have to pay up, right?
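A minimal Python sketch of the "first rule" as a bet. The helper below is hypothetical (nothing like it appears in the conversation beyond the 1% and 99-to-1 example): it converts a stated probability into the implied odds and the amount the forecaster is on the hook for if the outcome happens.

```python
def implied_odds_bet(p_stated, challenger_stake):
    """If I state probability p_stated for an outcome, I must accept a bet on it
    at the implied odds: (1 - p) to p against. The challenger stakes
    challenger_stake on the outcome happening; I owe the matching payout if it does.
    """
    odds_against = (1 - p_stated) / p_stated       # e.g. p = 0.01 -> 99-to-1
    my_liability = challenger_stake * odds_against
    return odds_against, my_liability

odds, liability = implied_odds_bet(0.01, challenger_stake=1.0)
print(f"{odds:.0f}-to-1 against; I owe {liability:.2f} if the outcome happens")
```

The point, as Harry goes on to say, is that the number only means something once it is backed by a payout and attached to a decidable outcome.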
Starting point is 00:29:05 That's the only way, at least the only way that I can see. And this is something that, going back to Reds, the guy at the bowling alley, he knew very well, and all the people who were dealing with him knew very well: these probabilities that you state, or the odds that you state, don't mean anything unless they have real consequences. And this is where I get to this idea of what I would call a real probability. What does a probability need to be to make it a real probability? Well, one is that it needs to be backed by something real, and so in this case it would be backed by money. And it has to be about something real, meaning that it has to be decidable. I have to be able to determine whether or not
Starting point is 00:29:46 the thing that you said had this probability actually happened or not. But otherwise, we're just talking about numbers. Yeah. So, I mean, I could say, I think there's an 80% chance that there's a flying spaghetti monster orbiting Venus. That wouldn't be a real probability.
Starting point is 00:30:04 No, as of right now, no, I wouldn't say that, right. Yeah, those are complete speculation, and, well, it's just something that I don't understand; it's a meaningless statement. Yeah. Well, one, it is meaningless because I don't know what this spaghetti monster is, right? But also, even if you could define for me what it was, if you said it was a teacup instead of a monster, how are we going to actually go ahead and verify whether or not you're correct? Of course, we're not going to send spaceships to actually look for this thing, and so there's no way to actually decide the outcome. So then, what if I bet money that
Starting point is 00:30:42 there was a teacup orbiting Venus, and I said I think there's an 80% chance that that is true, does that become a real probability? Well, I guess it would become real in a situation like this where we would say, well, you're saying that there's an 80% chance, and you say, well, you would have to say, I think... It's a bad example because I would never actually make that bet. Of course.
Starting point is 00:31:04 But let's suppose that you said that. Then we would have to agree that you will pay me, well, that there is such a thing and that it will be discovered by January 1st, 2019. And so now you're actually putting kind of a finite time limit on it and actually saying that it will be discovered. So it could be out there in principle, but if we don't find it, then you win or lose the bet by this certain date. So there has to be kind of a clear delineation of what it is we're talking about and what the probability refers to in order for it to have any kind of real meaning. Yeah. So, essentially what you're saying is that prediction specialists and people in that prediction industry need to have skin in the game. I've read part of the book and I'm of course familiar with his Twitter account and his Twitter feuds and all of that stuff. But now that I'm familiar with the concept, I see it everywhere, right? And oftentimes when you see something not
Starting point is 00:32:17 working out the way that it should, it's because there's an asymmetry in this idea of skin in the game. And I think that this first rule of probability is just kind of a specific example of how, in order to, I say, put more meaning into the probabilities that people state, you force them to actually back it up. So let's give people an example of this. So, firstly, I suppose, when he talks about skin in the game, what he's saying is that you need to share the downside risk for your predictions or your actions. So it's not merely a matter of incentives, it's about that
Starting point is 00:32:54 symmetry of also owning some of the downside. Would a good example maybe be, you know, the financial crisis? How does that kind of fit into this idea? Well, I assume so. I guess I assume that, yeah, the financial crisis would certainly fit into this idea. But I guess an example that I think is maybe even better is to talk about what's happening in real time, which is the replication crisis in science. And it's a situation where people are not talking about anything along the lines of skin in the game, or what I'm talking about with how to give meaning to probabilities. But I think it's a situation in which this really could benefit. And so what is the idea? So the idea of sharing the downside.
Starting point is 00:33:46 So I guess before we go into that, should we talk about some of the, how much should I explain about what the replication crisis is? I suppose I should explain. Yeah, let's go into detail because I think it's really interesting. But to give people sort of a sense of how this connects to their daily lives: I'm sure people are familiar with
Starting point is 00:34:10 the sense of overwhelm in terms of different scientific studies. Like, you hear on the one hand that coffee is good for you, and on the other hand that it's bad for you, and then you hear about a study saying that two glasses of wine a day is good for you, and then you get another study saying it's bad for you. And we see this across so many domains, especially a lot in psychology as well. Kahneman wrote quite a famous letter back in, was it 2012 or 2013, criticizing the field of priming, the psychology of priming. But it's become, as you've said, it's now regarded as a quote-unquote crisis. So how did we get to that point,
Starting point is 00:34:55 and why is it a crisis? Yeah, so I guess the easiest way to explain this is to explain it in a very simple context. And it's a context that I think most of the problems are coming from, an application of statistics called hypothesis testing. This is not the only way that statistics are used, and not the only way the problem manifests itself, but it is kind of the lion's share of the issue. And so basically the intuition behind hypothesis testing is something like this. Just to put it in kind of layman's terms to start,
Starting point is 00:35:31 which is if I were to toss a coin 10 times right now and it were to land heads all 10 times, and then I were to come to you and I would say, I just tossed this coin 10 times. It was completely fair, legitimate, and it came up heads 10 times in a row. What, you know, what are you going to think? I guess, you know, what is your natural reaction? Well, one natural reaction is just to say, well, you got lucky, you know, it wasn't supposed to happen, you know, whatever. But another reaction you might have was, you know, there must have been something wrong with the way
Starting point is 00:36:01 that you were running this experiment. Are you sure that you were tossing the coin correctly? Are you sure the coin was fair? Are you sure that whatever conditions you assumed you were running this experiment under were the conditions that it was actually run under? And so that's pretty much how hypothesis testing works. So you would have some kind of scientific hypothesis, which might be something like you just mentioned: that drinking two cups of coffee a day, or drinking coffee, is good for you, or, you know, drinking coffee reduces your risk of getting some kind of cancer. Okay. And what you would do is, I guess, you would start with the hypothesis that you want to disprove. So you might say something,
Starting point is 00:36:42 I guess, if you believe that coffee is helpful, is good for you, then you would start with a hypothesis along the lines that there is no effect, that there is no relationship between drinking coffee and this particular form of cancer. And so you want to refute that hypothesis. And so in order to do that, you would collect some data. So you would run some kind of study on people under certain dosages of coffee, perhaps. And based on that data, you make certain measurements about their response to this, and I guess you could measure their risk factors for a certain type of cancer. And then you would calculate some statistics and measurements based on that data. And then you would compute a probability.
Starting point is 00:37:25 So you're assuming a model under this hypothesis. And you would compute a probability, which is what's called the p-value: the probability of having observed the data, or the statistics that you observed, assuming that your hypothesis was correct. So assuming there was no relationship between coffee and cancer, what is the probability that I would have observed what I ended up observing? If that probability, if that p-value, is small, then you would interpret that as saying
Starting point is 00:37:56 that there's evidence against your hypothesis, right? That would be like tossing the coin 10 heads in a row, right? So if the probability of having observed what you observed is very small under the assumptions that you're making, then you would interpret that to mean that maybe the assumptions that I'm making aren't such good assumptions. And so you would be inclined to reject your hypothesis under those conditions. The point is, I think something is true. I have a hypothesis. I collected data to test that hypothesis. If my hypothesis were true, then the data that I observed is very unlikely to have occurred. Therefore, the logic would be that it seems reasonable to maybe call into question some
Starting point is 00:38:42 of the assumptions that I've been making in my hypothesis. So, that's the basic structure of the argument. Now, what statistical significance is, is it sets a predetermined cutoff value, a predetermined threshold, which is usually called the alpha level, which is going to determine whether or not you reject this hypothesis. So the usual conventional cutoff value in a lot of fields is 0.05, a 5% probability. But in a lot of fields like experimental physics and genetics, these probabilities are much lower, where a lower probability means a much more stringent threshold to reject the hypothesis. So 5% is actually pretty high.
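A worked version of the coin example in Python, as a rough sketch: the p-value for seeing 10 heads in 10 tosses of a fair coin, compared against the conventional 0.05 cutoff and the stricter 0.005 cutoff discussed later. The numbers are just the ones from the conversation.

```python
from math import comb

# p-value for the most extreme outcome: 10 heads out of 10 tosses of a fair coin
n, k, p_null = 10, 10, 0.5
p_value = comb(n, k) * p_null**k * (1 - p_null)**(n - k)
print(f"p-value for 10 heads out of 10: {p_value:.5f}")    # about 0.00098

# Compare against two significance cutoffs (alpha levels)
for alpha in (0.05, 0.005):
    verdict = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: {verdict} the fair-coin hypothesis")
```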
Starting point is 00:39:26 But we'll talk about that in a bit, because there's some discussion about whether that 5% value is too high and should be changed as a default cutoff value. But anyway, you set this cutoff ahead of time, and if your p-value is less than the alpha level, less than the 0.05, then you would declare your result statistically significant. And so, where this notion of false positive comes in is that, well, what I've done is I've calculated a probability, I've assessed that probability as being small, and I've drawn a conclusion based on that probability being small, but the probability wasn't zero, right? So, even though what I observed was very unlikely under the assumed hypothesis, it was still possible, right? So, all that means is
Starting point is 00:40:14 that it's possible that I would be rejecting this hypothesis erroneously. That's due to randomness. Due to randomness, right? And so, this is what's called a type one error or a false positive. And so, what the replication crisis is, is one way of putting it is that the, so if you think about the scientific literature, the scientific literature assume everybody is applying the same, well, everybody's applying some kind of statistical technique or method in a lot of parts of the literature. People use statistics a lot. And so, they're oftentimes justifying their conclusions based on some statistical method and some probabilistic assessment. If each one of these has a small probability of being a false positive, but you have a very large sample of conclusions, then there's going to be some conclusions in the literature that are false positives. And so, according to the theory, the theory itself, if you correctly apply the statistical
Starting point is 00:41:16 theory, some percentage of the conclusions are going to be false. The question is, which ones? Okay. And now the way that you would find out which ones is then you try to replicate the study. So if you make a claim that coffee is beneficial for health, and you have some way of quantifying that, then what I could do, or some other scientists could do, is they could take your description of the experiment and rerun your experiment and try to see if they observe the same thing. If they observe the same thing, then that would be called a replication. And that would provide further evidence that your original experiment and your original conclusions were valid. If I fail to replicate, then that
Starting point is 00:42:01 would suggest that you experienced a false positive, that your original conclusions were due to randomness. And so, what the replication crisis is, is that while we would expect in the literature there to be some failure to replicate, the replication crisis is that the proportion of conclusions in certain fields that are replicable is very low. The replication rates are very low. A recent study in the psychology literature tried to replicate about 100 very well-known results in psychology, and they got about a 37 percent replication rate, which means that more than half of the conclusions did not replicate. And so a hundred of the most significant studies in psychology? I don't know, so I don't want to say most significant; I don't know
Starting point is 00:42:57 the field very well, but these were results taken from the well-respected journals, and these were results that were cited a number of times. And does this apply to other fields as well, or is it just psychology? So, psychology, especially I think social psychology, is often brought up as one of the main culprits of this. Biomedical research also has a very high false positive rate, and that's maybe more concerning, because you would think that these are studies that are having direct effects on medicine. And I guess, just like the coffee study that you mentioned, these are the types of things that are having a hard time replicating. Yeah. Other fields are better.
Starting point is 00:43:40 I think experimental physics is much better; they have more precision and more control over the conditions that they're running things under. But psychology, biomedical research, these things seem to have very low rates of replication. Yeah. And I mean, when you have a psychologist of the standing of Kahneman, who is widely regarded as probably the most influential living psychologist, seriously cautioning and chastening his colleagues, saying we need to do something or otherwise the entire field of priming, in this example, its credibility is at stake.
Starting point is 00:44:22 That indicates just how serious this crisis is. Well, it's definitely serious. And people are talking about it constantly, and there's a lot of efforts now. So I guess I should say, I just became kind of interested in this relatively recently, because it is fundamentally a problem of statistics. And
Starting point is 00:44:45 I'll explain in a second what actually got me thinking about this more, because I've been aware of it and I've heard about it for a long time. Even when I was in graduate school, almost 10 years ago, between six and ten years ago or so, people were already talking about this then. And so there have been a lot of proposals as to how we can fix it. I mean, I think it's one thing to say that it's there; it's another thing to fix it. And so there have been movements towards, well, what we need to do is make sure that people report their methods, their experimental methods, more clearly, that they report their
Starting point is 00:45:26 design, that they do all this, that they use these more sophisticated methods. But none of it seems to be working, and in fact I think there's reason to believe that it's only getting worse. So are they kind of saying that it's been a mistake in the replication of the experiment, that researchers haven't been following the exact way it was run in the original study, and that's why it's failed? Is that sort of where that argument goes? So, I guess, no, that's not what they're saying. But actually, that right there is something that, if I were to run an experiment and then you were to try to replicate it and you fail, a common response from me would be exactly that.
Starting point is 00:46:03 You didn't do it right. Which would be that you didn't do it right, yeah. And there are situations in which it seems like the people who ran the original experiment are able to get replications even though nobody else can. And it might be, assuming no kind of foul play, and there's plenty of that, but it might just be that they have the technique. I mean, a lot of this is physical stuff, right? It's people working with things in a lab and it's very sensitive. And so there might be things that they know or that they're able to do that other people aren't. But no, those aren't what people are pointing to. What people are pointing out is simply, I guess, for example, so the thing that
Starting point is 00:46:39 really got me into this and got me thinking about this was that there was this recent proposal. So, I mentioned that the default significance cutoff is 0.05 in a lot of fields, and there was this recent proposal. And does that mean there's a five percent chance that you'll get a false positive? Oh, okay. Yeah, so what that significance level means is that that's the type one error rate. The type one error is the probability of a false positive on any given trial, and this cutoff value, this 0.05, is controlling that type one error rate. So at five percent, it's saying that if your hypothesis is true, then you have only a 5% chance of wrongly rejecting it. So some people said we need to change that number.
Starting point is 00:47:30 Right. So a recent proposal, this is by a list of 72 authors, these are very well-known people in a wide number of fields, have suggested that we should lower this cutoff. We should lower the type one error probability by making the default cutoff 0.005. So half of a percent instead of 5%. Got it. And this got a lot of attention, and most of the attention I've seen has been negative attention. Now, why is it getting negative attention? I mean, on the one hand, you look at it, you think about it, and you say, well, if you're telling me that there's too many false
Starting point is 00:48:09 positives in the literature, and if this number 0.05 is the false positive probability, then if we lower that probability, that means we're going to get fewer false positives in the literature, so that's going to help with the replication crisis. And that's, in essence, I mean, I'm caricaturing it a bit. They do go into other arguments in favor of this proposal, but that's essentially the kind of argument that they put forward. Which is, well, first of all, just because, there's something called the false positive rate and the type one error probability. So, I should distinguish between the probability on any given trial of having a false positive, and then the overall rate of false positives in the population of all studies, over all scientific experiments.
Starting point is 00:48:58 So, before we move on, Harry, could you please clarify the difference between the false positive rate and the significance level. Yeah, okay. This is important. Yeah, these are related. The significance level, this was this 0.05, I guess, in a lot of fields, is that this is, you can think of this as controlling the probability of a type one error or of a false positive of getting a false positive on any given study, on any given experiment. The false positive rate is the rate at which these false positives show up in the literature over a large body, large collection of such studies. Now, why would these two things be different? It has to do with the fact that the rate at which they show up in the literature depends on what proportion of studies are in the
Starting point is 00:49:55 camp of having a true null hypothesis versus a false null hypothesis to begin with. And so, what we're finding is that, under certain empirical estimates of what these base rate probabilities are for true and false null hypotheses, at the 5% significance level and at a standard level of what's called statistical power (the specifics are not technical details you should worry about, but understand they're kind of assumptions on what those values typically are, what we shoot for), the false positive rate in the literature would be something like 36%. So, you know, something like a third of the findings that are reported to be significant at the 5% level would be false positives.
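For anyone who wants to see where a figure like 36% can come from, here is a hedged Python sketch of the calculation Harry is gesturing at. The inputs (5% significance, 80% power, nine out of ten tested hypotheses truly null) are illustrative assumptions, not necessarily the exact values behind the estimate he cites, but they do reproduce the "roughly a third" figure.

```python
def literature_false_positive_rate(alpha, power, p_null):
    """Share of 'significant' findings that are false positives.

    alpha  : significance level (type I error probability per study)
    power  : probability of detecting a real effect when there is one
    p_null : proportion of tested hypotheses where the null is actually true
    """
    false_pos = alpha * p_null           # true nulls wrongly rejected
    true_pos = power * (1 - p_null)      # real effects correctly detected
    return false_pos / (false_pos + true_pos)

rate = literature_false_positive_rate(alpha=0.05, power=0.80, p_null=0.90)
print(f"false positive rate in the literature: {rate:.0%}")   # 36%
```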
Starting point is 00:50:45 Got it. Yeah. So one of the terms that you hear a lot when people are talking about this replication crisis is p-hacking. What's that all about? Yeah. So p-hacking is, so we talked about the p-value. The p-value is this probability that you would have observed your measurements under the assumption that your initial hypothesis was true, right?
Starting point is 00:51:06 That's your p-value. The way that you use that p-value is that if the p-value is small, you would take that as evidence against your hypothesis, and you would reject the hypothesis if it was lower than the significance level, right? So now what p-hacking is, is that in a lot of fields, and this isn't the way that it's supposed to be used, and this isn't the way that it was intended to be used originally when this idea of statistical significance was first introduced, but the way that this significance level gets used is that
Starting point is 00:51:35 if you have a statistically significant p-value, if you have a p-value of 0.049, then you're in a much stronger position to get your result published than if you have a p-value of 0.051, right? So there's this very clear dividing line of what's in and what's out. Now, just because you have a significant p-value does not mean you get published. Just because you have an insignificant p-value does not mean you don't. But the chances really are different, even on either side of the line. And so what p-hacking is, is a natural reaction to this phenomenon, which is that if I have a p-value of 0.051, then what I'm going to do is I'm not going to report a p-value of 0.051. I'm going to change my data a little bit or change my model or change my method somehow. And I'm going to keep doing it until I get a p-value that's less than 0.05. And then that's the p-value that I'm going to report to you.
Starting point is 00:52:35 Because I know that you, being the editor of a journal who decides whether to publish my results or not, I know that that decision depends very heavily on whether or not my p-value is significant at the 5% level, or significant at whatever the prevailing significance level is for that field. And so p-hacking is just, I mean, the term is very descriptive in that respect, in that I'm literally gearing my analysis towards getting a significant result. I'm not doing the analysis and then seeing whether it's significant and using that result as a guideline as to whether or not I should conclude one way or the other. I'm already making the conclusion that I want it to be significant, and then I'm just reverse engineering the analysis to make it happen.
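Here is a toy Python simulation of the behaviour Harry is describing, under simplifying assumptions (a z-test on pure noise with known variance, which is not any particular study's method): the "experiment" is simply re-run until a p-value below 0.05 turns up, even though there is no effect at all.

```python
import random
from statistics import NormalDist

def one_study(n=30):
    """One study where the null is true: n observations of pure noise,
    tested against mean zero with a two-sided z-test (sigma = 1 assumed known)."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) * n**0.5                   # sample mean divided by 1/sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value

def p_hack(max_tries=100):
    """Keep re-running the 'experiment' until a significant p-value appears."""
    for attempt in range(1, max_tries + 1):
        p = one_study()
        if p < 0.05:
            return attempt, p
    return max_tries, p

attempt, p = p_hack()
status = "significant" if p < 0.05 else "still not significant"
print(f"attempt {attempt}: p = {p:.3f} ({status}), with no real effect in the data")
```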
Starting point is 00:53:27 Wow. So an experimenter could rerun an experiment, or keep running it, in order to get a statistically significant result, and just bury the evidence of the stuff that... Yeah, so I should say, I should make it clear that, I guess, the way that I just described it, it sounds like fraud, right? And it is fraud at the level that I just described it. But there's a lot of subtleties in this, in that a lot of times you can be p-hacking without knowing it, and that gets to the inherent difficulty of running these statistical analyses, or how to interpret the outcomes. So, something that I guess wouldn't necessarily fall under the terminology p-hacking specifically, but I think that I'm using the term p-hacking kind of more generically to mean kind of
Starting point is 00:54:18 any number of behaviors that go against standard statistical practice, whether it's intended or not. Because really, at the end of the day, whether it's intentional or not, to me, doesn't really affect the outcome. The outcome is that the results are still unreliable as they're reported. Of course, you want people to be acting in good faith, but even if they are acting in good faith, if they're doing a bad job of it, then the results aren't going to be very good. So, an example of how you could end up doing some kind of p-hacking inadvertently is, if we have a 5% significance level as our standard, then if I run any given study, if there's no effect,
Starting point is 00:54:58 right, if I run any given study, then I'm unlikely to find, I'm unlikely to reject that hypothesis. But what if I run 20 studies? What if I run 100 studies? Well, eventually I'm going to get lucky and I'm going to get a significant result, even if none of these hypotheses are, have any, you know, true kind of effect. And so, what happens is that over the course of, you know, in a scientist's career, scientists run experiments all the time, right? And so, what do you do when you run an experiment and it's not significant? Well, you might just throw it away. You ignore it. This is something that's called the file drawer effect, I guess, that people have started to call. Because what you do is I run an experiment, I test a hypothesis, the hypothesis turns out not to be significant. And so, I don't
Starting point is 00:55:46 report that. It goes into my file drawer, right? Over time, the pile in this file drawer is getting very large, right? A lot of things in there. They've never been reported. What gets reported is the one out of every so many studies that I run that actually turns out to be significant. I report that to you. And so, I mean, in fairness, that's the stuff that's going to get published. That's the stuff that's going to get published, right. So when you now go to consider my result, you would like to know how hard I had to work to get it, even though there are all these other studies which might have nothing to do, in principle, with the conclusion I'm drawing. But the effectiveness of the statistical method that's being applied to them does depend on how many times I've used it.
Starting point is 00:56:36 if you toss a coin long enough, you're eventually going to see a sequence of 10 heads in a row, even though any given sequence is very unlikely to turn up 10 heads in a row. And so the same thing's happening here. If I run enough experiments, I'm eventually going to find something significant, even if there's no effect whatsoever. And so the people who are evaluating these methods, they can look at the methods and they can say, the methods are completely sound for this particular study. What they don't know is that this is likely to be one of very many studies that you have run, and those other studies didn't turn out to be significant at all. And by the way, I mean, the question is, is this fraud or is this just kind of, you know, I don't think people really know exactly how to handle this at the moment.
Starting point is 00:57:27 Because if you're going to use statistics to evaluate the conclusions of your experiment, how are you going to account for the hundreds or thousands of studies you've run over the past 5, 10, 20 years, if you have a long career, right? How is that all supposed to factor into the evaluation of one single outcome of one single experiment? It's not clear at all. So, while it's clear what the effect is and why this is bad, it's not clear how to remedy the situation. So this p-hacking is what's driving the replication crisis? Well, the replication crisis, what's driving it is things like p-hacking, things like what I just said, which is effectively what's called multiple testing: if I run multiple tests, if I run enough of them, some of them are going to turn out to be significant.
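A minimal sketch of the arithmetic Harry is describing here, using the 5% threshold and the study counts from the example above; the numbers are purely illustrative:

```python
# Chance of at least one "significant" result when every hypothesis is null.
# Same phenomenon as the coin-flip analogy: run enough trials and a rare
# event (p < 0.05, or 10 heads in a row) eventually shows up by luck alone.

alpha = 0.05  # significance threshold

for n_studies in (1, 20, 100):
    p_at_least_one = 1 - (1 - alpha) ** n_studies
    print(f"{n_studies:>3} null studies -> P(at least one p < 0.05) = {p_at_least_one:.2f}")

# Output:
#   1 null studies -> P(at least one p < 0.05) = 0.05
#  20 null studies -> P(at least one p < 0.05) = 0.64
# 100 null studies -> P(at least one p < 0.05) = 0.99
```

That is the file-drawer effect in numbers: the one significant result gets written up, and the rest stay in the drawer.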
Starting point is 00:58:24 Another thing is what's called publication bias, which is what you just mentioned: there's this bias towards publishing things that are significant, so the only things that get reported are significant. And therefore that's what's encouraging p-hacking, but it's also just leading to bias in what the probabilities that appear in the published literature mean and how they should be interpreted. And so, all of these things play into the replication crisis. Now, is this undermining the reputation of the field of psychology? Well, you know, to some people,
Starting point is 00:59:05 psychology doesn't have much of a reputation to begin with. But I definitely do think that among a lot of statisticians, psychology doesn't have a very good reputation to begin with, in terms of how they run their experiments, how heavily they rely on these statistical measures, and how they interpret them, and things like that. Yeah. Do you have any proposals for how we could get out of the mess? Well, I guess I have somewhat of a radical proposal, which I think is necessary; I think in order to actually
Starting point is 00:59:42 fix things, you have to have something a bit radical. I don't think anybody will actually take what I'm saying and implement it, but I think it is food for thought. Before I get to that, there was this proposal I mentioned, I think, a little earlier, about 0.005, right? So, the proposal there is trying to say that, well, if we lower the type 1 error probability, then there are going to be fewer false positives. And if you do a theoretical calculation, a mathematical calculation, under the circumstances I mentioned earlier, the theoretical false positive rate is about 36%, about a third. If you run those same numbers under this 0.005 proposal, the theoretical false positive rate will be about 6%, or under 10%, something like that. So the argument that these people are making is that just by lowering the significance level, we can improve the false positive rate and therefore improve the replication crisis.
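The kind of calculation being referred to here looks roughly like the sketch below. The power and the prior odds are assumptions on my part, figures of the size used in the published 0.005 proposal; Harry's exact inputs aren't spelled out in this part of the conversation.

```python
# Rough sketch of the false-positive-rate arithmetic behind the 0.005 debate.
# The power and prior odds below are assumed, illustrative inputs.

def false_positive_rate(alpha, power, prior_odds_against):
    """P(no real effect | significant result), via Bayes' rule in odds form."""
    false_positives = alpha * prior_odds_against  # nulls that slip through
    true_positives = power                        # real effects that get detected
    return false_positives / (false_positives + true_positives)

power = 0.8              # assumed chance of detecting a real effect
prior_odds_against = 10  # assumed: 1 real effect for every 10 null hypotheses tested

for alpha in (0.05, 0.005):
    rate = false_positive_rate(alpha, power, prior_odds_against)
    print(f"alpha = {alpha}: theoretical false positive rate ~ {rate:.0%}")

# alpha = 0.05  -> roughly 38%
# alpha = 0.005 -> roughly 6%
```

With inputs like these, the two rates come out in the same ballpark as the figures quoted above; the critique that follows is that this calculation quietly assumes nobody is p-hacking.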
Starting point is 01:00:26 And so part of my retort to this, part of my argument against this proposal, isn't that the proposal itself is a bad proposal; kind of in a vacuum, 0.005 might be better than 0.05, I don't know. My problem is more with the argument that they gave in support of their proposal, which, I mean, is very misleading and actually ignores a major part of the problem, which is this idea of p-hacking. When you lower the cutoff value from 0.05 to 0.005, all you've done is change the target.
Starting point is 01:01:28 If you haven't done anything else to change people's behavior, then what they're going to do now is, if they were p-hacking before, they're going to still be p-hacking. They're just going to be p-hacking even harder. Right. Yeah. And there are people arguing that a lot of the people who right now aren't doing p-hacking might start to do it, because it's going to be harder for them to get results published under this more stringent cutoff value. So there are actually arguments saying that this could make things worse. And I think that what would happen is, if there is a benefit to this new proposal, it certainly isn't going to be as beneficial as the authors claim in this paper.
Starting point is 01:02:14 And so my main kind of gripe with the proposal wasn't what the proposal is. I think a lot of people, including myself, don't think that it'll have much of an effect, but that's more of an opinion than anything else. My issue with it was that here you have 72 very well-known, very well-respected people, some of whom have been involved in some capacity or another as leaders in addressing this replication crisis, trying to fix things for over a decade. And the way that they present this, first of all, after a decade, the solution they come up with
Starting point is 01:02:50 is to essentially add another zero, which I think is not the drastic, radical change that I think is needed. But even more importantly than that, the argument they give in favor of it is a pretty weak statistical argument, because it ignores a key part of the data, which is this idea of p-hacking. And so I wrote an article on this trying to explain that if you were to try to account for the fact that p-hacking exists, if you want your theoretical argument to be an argument that applies to reality, then you have to take reality into account, and p-hacking is a part of reality. And if you were to take that into account, then the effect that they're projecting, which is that replication rates would essentially double under their new proposal, is just not going to happen. So that was my argument against that specific proposal. Now, how do you possibly correct the situation? Well, in order to get rid
Starting point is 01:03:53 of the replication crisis, you have to somehow address this issue of p-hacking. And there have been proposals on the table to try to do that; there's a movement towards what's called pre-registration, which is that before I run my study, I'm going to report it to the journal first, have them vet the methods that I say I'm going to use. They accept it, and then they say, we will publish your work regardless of whether or not you get a significant p-value, as long as you apply these methods. I go, I apply the methods. And so I don't really have an incentive to p-hack now, because I've been told that I'm going to get published no matter what.
Starting point is 01:04:37 That's one proposal on the table, and there is support for that. I guess for me, I just don't think that the answer is having more bureaucratic oversight of this whole problem. I think that, arguably, the problem is in place because of a bureaucratic measure in the first place, which is this 0.05 publication threshold. If you just change the criterion, if you change it away from 0.05 to making sure that somebody vets and approves of your methods before you run the experiment, then am I now going to be hacking my methods? Just to say, well, I'm going to use these methods, and the only reason I'm going to use
Starting point is 01:05:17 these methods isn't because I think they're the best, it's because I know that you are going to approve of them ahead of time. It's still not what we want, right? So, what I've been thinking about is, well, why not tie this idea of replication to this first rule of probability, which is: when you publish a conclusion, when you make a scientific claim, in order for it to be scientific, it needs to be falsifiable. State the conditions under which it would be falsified. Lay them out in detail. Lay out exactly the experimental protocol, how I would go about falsifying your results if I was so inclined.
Starting point is 01:06:02 And tell me the probability that the result would be falsified. Well, I want to phrase it in terms of replication. So if something isn't falsified, then let's say it's replicated, right? Tell me all the circumstances under which it would be considered to be replicated, and tell me what the probability of replication is. So let's say I do all that. I say, this is my conclusion. Here is how you would go about replicating it. And the probability that this replicates is 80%. And what that means is, well, what we said earlier is that now somebody else can come along and say, well, I don't think that you're, I think that you're being a little overly ambitious with this 80% figure. So you're
Starting point is 01:06:44 offering me four to one odds to try to replicate your experiment. I'm going to try to do it, or I'm going to have some third party do it, ideally. And if it replicates, fine, you win. I'm going to put my money up. I'm going to put grant money up. I'm going to put something up to kind of confirm that I actually don't believe your probability. You're going to have to put up something to support your claim that this is going to be replicated. And we run the replication. And whatever happens, if it replicates, you win. If it doesn't, I win. At the very least, what does that do? Well, that suggests that you do not have the incentive to p-hack anymore. Because if you do p-hack, that means you're going
Starting point is 01:07:26 to be offering me better odds than I should be getting to try to replicate your study. Which means that eventually, all that's going to do is make you go broke, right? And so here there's a real kind of shift. You talked about skin in the game, and how it's not just about incentives, but about sharing the downside. Well, if I'm a scientist and I'm making a claim that something is true, I should have to share the downside if that claim is wrong. And this is maybe the most direct way I can think of to actually do that. It's quite radical. Yeah. I guess I don't think anyone's actually going to do it. I don't think that it's in the DNA of scientists.
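A toy simulation of the incentive Harry is describing. The 80% claim comes from the example above; the "true" post-p-hacking replication rate, the stake size, and the number of challenges are made-up numbers, just to show the direction of the effect.

```python
import random

random.seed(0)

# A scientist who claims an 80% replication probability is, in effect,
# offering 4-to-1 odds to anyone who wants to challenge the result.
# If the claim is honest, the bet is fair (expected change per bet is zero).
# If the result was p-hacked and the real replication rate is lower,
# challengers make money and the scientist's research fund drains away.

claimed_prob = 0.80   # stated replication probability
true_prob = 0.50      # assumed real rate after p-hacking (made up)
stake = 1.0           # challenger's stake per replication attempt
payout = stake * claimed_prob / (1 - claimed_prob)  # 4x the stake at 80%

fund = 100.0          # scientist's research fund, in stake units
for _ in range(200):  # 200 challenged claims over a career
    if random.random() < true_prob:
        fund += stake    # result replicates: scientist collects the stake
    else:
        fund -= payout   # result fails: scientist pays out at 4 to 1

print(f"fund after 200 challenges: {fund:.1f} stake units")
# With these made-up numbers the expected change per bet is
# 0.5 * 1 - 0.5 * 4 = -1.5 stake units, so the fund goes broke quickly.
```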
Starting point is 01:08:12 They're going to think of it in terms of gambling, which I think is the wrong way to think about it. I think you should think about it in terms of investing. That we're investing in good scientists because the good scientists are going to be the ones who win these bets. And when you win the bets, it's not like you go run away and take the money, put it in your pocket. You know, I'm thinking about this as kind of funds for research. So,
Starting point is 01:08:39 if I put my funds up against your funds and you win, that just means that I have less money to work with. You have more money to work with, but you were the better scientist, so you should have more money to work with. Yeah. Wow. I like it, I like it. Yeah, well, we'll see what happens. Watch this space. Let's talk about Kahneman and Tversky. So these names might be familiar to listeners: two very famous Israeli psychologists who sort of pioneered the field of cognitive biases and heuristics and, in combination with Richard Thaler, founded the field of behavioral economics. Of course, sadly, Amos passed away in the nineties, I believe. But for the work they did together,
Starting point is 01:09:21 Danny won the Nobel prize in economics and then published a very famous book, which kind of collected a lot of the work they did together and that he did as well called Thinking Fast and Slow. But one of the really famous experiments that he did was called the Linda problem. Do you want to just give us an outline of what that experiment was. Sure, yeah. The Linda problem, so this is a very interesting experiment. I'll actually read what it is directly from, this is from Kahneman's, I guess it's his most recent book, right? Thinking Fast and Slow. So here's the experiment. So this description was read to subjects of a psychology experiment. They were usually or oftentimes, I think, undergraduates at whatever schools Kahneman and Tversky were working at at the time.
Starting point is 01:10:14 It's a description of a person called Linda. And it goes like this. Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. And so then, based on that description, the subjects are asked to answer this question: which of the following alternatives is more probable? Linda is a bank teller, or Linda is a bank teller and is active in the feminist
Starting point is 01:10:46 movement. Okay. And so, what is famous about this result is that between 85 and 90% of the subjects in this study said that the more probable outcome was the second of the two, that Linda is more likely to be a bank teller and active in the feminist movement than she is to be just a bank teller. And so this is what's called the Linda problem, or the conjunction fallacy, by Kahneman and Tversky. And why is this a fallacy? According to the rules of probability, well, if you think about what I've just described about Linda, there are two options. She's either a bank teller or she's a feminist bank teller. And the second is a more specific version of the first. And so, it's a narrower description.
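In symbols (this is just the standard conjunction rule, nothing specific to the study): with A for "Linda is a bank teller" and B for "Linda is active in the feminist movement",

```latex
P(A \cap B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
```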
Starting point is 01:11:37 And so, necessarily, no matter what the probability is that she's a bank teller, the probability that she's a bank teller and a feminist has to be smaller, just by the rules of probability. Feminist bank tellers are a subset of bank tellers. That's right, yeah. And so the fact that an overwhelming percentage of people in this study said that the second was more likely was really kind of a gut-wrenching observation. Yeah. But this is what's called the conjunction fallacy. Us included, I guess. Yeah. So I guess what I find interesting about this is that the first time I heard it, I immediately jumped to the second one, that feminist bank teller is certainly a better description of Linda. And then, of course, I understand the rules of probability,
Starting point is 01:12:27 at least that's my job. And then it was explained to me that, well, that's a fallacy, at least that's wrong because of this. But I still, something in my gut suggests to me that this is still the right answer. And I guess that's what kind of, I guess that's what's interesting about this is that Kahneman even brings this up. He says that, remarkably, the sinners seem to have no shame. When I asked my large undergraduate class, do you realize that you violated an elementary logical rule, someone in the back row shouted, so what? And a graduate student who made the same error explained herself by saying, I thought you
Starting point is 01:13:04 just asked for my opinion. And I guess, you know, yeah, I think it's a really interesting observation. I don't know what more to say than that. I'm not so sure. I don't even know how to react to it. Well, so he also describes it in terms of the representativeness heuristic. Right. So a feminist bank teller seems
Starting point is 01:13:27 more plausible. Right, yeah. So I think a good explanation of what's more likely happening is that when people hear "which of these is more probable", the way they're interpreting that is: which of these is a more representative situation, which of these is a more representative description of this person named Linda? And the idea is that the description of a bank teller is not at all descriptive of the person you just described, but the idea of being a feminist is. And so the feminist bank teller kind of fits better with the picture that you have in your head about who this person might be. And so, this is one of a number of psychological or cognitive fallacies that they highlighted in their research. And ultimately, I think that, you know, this
Starting point is 01:14:26 got a lot of attention. But I guess what this got me thinking about, in addition to this observation and also some of the things that I was observing with the election probabilities and how people are interpreting those, is this: the fact that people are violating the rules of probability is one thing, but then when you point it out to them, they're indignant that they didn't do anything wrong, and I include myself in that category. To me, that suggests that, well, I guess the way that Kahneman and Tversky seem to be interpreting this is that there's a real kind of cognitive defect in people, that they're incapable of understanding or
Starting point is 01:15:05 thinking along the lines of probability. They're not living up to the paradigm of reason. Yeah. But I guess the way that I interpreted this was, well, if people in such large numbers are violating this rule, then it must be that this model of probability isn't a good model for how people think, because they're quite clearly violating that rule. I still can't think of a model that explains this, or a better way of explaining this, but I do think that it points to a question of which of the two things should take precedence. Should the theory that you're assuming, should the model
Starting point is 01:15:51 take precedence over the actual people who you're using to test the theory? Or should, if you're trying to model people's cognitive capabilities, should you use the way that people actually think to kind of model your theory after that. And I guess I would be more in the second camp in the sense that, you know, I guess what we were just talking about with hypothesis testing and p-values and all that, which was if you observe data and it seems to be in disagreement with your hypothesis or disagreement with the model that you assumed, the tendency is to reject the model, reject the hypothesis. In this case, it seems like the conclusion is quite the opposite, which is that you have a model,
Starting point is 01:16:36 you observe data on these people based on this Linda problem. The data is not in agreement with your model. And so you're taking that to mean that people have this cognitive defect, instead of taking that to be evidence that you need to find a better theory. So, that's just kind of an initial reaction to how this was originally presented and how this is presented by Kahneman, and I think still how he thinks about it. Yeah. So I don't quite understand your criticism of it yet, because there's a distinction between descriptive and normative models, and under a normative model you'd be saying what people ought to do if they were rational, and under a descriptive model you'd just be pointing out what they do do in reality, which they demonstrated with the Linda problem: that people don't make the rationally correct decision. But isn't that all they're doing there? Yeah. So it seemed to me that, from what I understand of Kahneman and Tversky,
Starting point is 01:17:35 what motivated them initially to study these types of things was the fact that a lot of economic theory is based on this assumption of rationality. And they, as psychologists, thought that this assumption was more or less bogus, because they know that real people are not rational, I guess. And so they ran these experiments, at least in part, to test this out. Well, this was also part of their other experiments and what led to the development of prospect theory, which is what Kahneman eventually received the Nobel Prize for, mostly, I think. But here they are running experiments which are showing that people are not behaving according to the rules of rationality, or at least according to the rules of probability. And so, that would seem to be a refutation of this assumption that people are behaving this way. And sure, that's fine, but I still think that, based on reading the interpretation, at least, and hearing
Starting point is 01:18:39 Kahneman talk about it, my understanding is that he still seems to see rationality as an end goal, in the sense that he talks about people as being bad intuitive statisticians. There's a value judgment in that. Yeah, yeah. So there's something about how people are bad intuitive statisticians, and so we need to kind of train them to behave correctly, right? And so, I think there is a sense in which he's saying that, yes, as a normative theory, this Bayesian model of rationality is perhaps not such a good one. But maybe we should still try to make it so that people behave closer to the way that the theory prescribes, because otherwise you end up
Starting point is 01:19:35 behaving in ways that could, I guess, cost you money in the long run, or things like that, according to certain economic theories. But there was something that I found interesting, which was a quote by Gigerenzer, or Gigerenzer and some co-authors. And I don't know if you've covered this before, or will in future episodes, but Gerd Gigerenzer is a psychologist who was pretty much the arch-nemesis of Kahneman and Tversky as they were running these experiments. And he has some very strong criticisms of their work and of their conclusions, and I do think that it's good to get the other side of things, for sure. But here I'm just going to quote
Starting point is 01:20:19 out of a book called The Empire of Chance, which is by Gigerenzer and some of his co-authors. It's about the 20th-century psychologists, and among those are Kahneman and Tversky, and this is in a paragraph that's talking specifically about that work: the 20th-century psychologists had come so to revere the mathematical theory of probability and statistics that they instead insisted that common reason be reformed to fit the mathematics. So, instead of the other way around, I guess, being the other side of the coin. I mean, just to point out that there are people on the other side of this debate. I'm not a psychologist, and I certainly don't know the whole 30-year history that's happened in between the original experiments and now, but my understanding in talking to psychologists is that there
Starting point is 01:21:05 has definitely been quite a bit of study of this topic, and there are opinions on all ends of the spectrum. So I don't think that Kahneman and Tversky's opinion, or their work on this, or the conclusions that they drew from it, is necessarily the paradigmatic view anymore. Are there any other of the famous conclusions that you don't necessarily agree with? Well, I think that another example is the base rate fallacy; that's the one that comes to mind. What was that again? So the base rate fallacy, and this can be described in a problem called the blue cab problem. The description of the problem goes
Starting point is 01:21:51 like this: a cab was involved in a hit-and-run accident at night. Two cab companies, the green and the blue, operate in the city, and you're given the following data: 85% of cabs in the city are green, 15% are blue. A witness identified the cab as being blue, and the court tested the reliability of this witness under the circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time. What is the probability that the cab involved in the accident was blue rather than green? Okay, so what is the probability? So a lot of people responded to that saying the probability is 80%,
Starting point is 01:22:34 because that's what the last thing said, which is that we tested the reliability. The witness said that it was blue and the court tested the reliability of the witness and showed that the witness is correct 80% of the time. What Kahneman and Tversky argue, and this is what they're calling the base rate fallacy, is that this 80% figure is ignoring the base rate, which is this percentage of cabs that are green and blue in the city, that 85% of the cabs in the city are green and 15% are blue. And if you take that
Starting point is 01:23:05 into account and you apply Bayes' rule, which is the way that you would update these probabilities in the Bayesian theory of probability, then according to the Bayesian paradigm, the correct answer would be 41%. And so they concluded that people are not taking account of base rates, which is true, they're not, or at least the base rates are not taken into account in this particular problem, and they're not taken into account in the way that Bayes' rule would tell us to. The question, of course, is whether or not they're wrong not to do that. And this is something that, well, the Linda problem isn't something that I have a very good response to. I still think that's a really interesting experimental outcome that, you know, still
Starting point is 01:23:54 kind of makes your head explode in some ways when you think about it. But this one, I think, is much less compelling just because the only reason you would say that 41% is the correct answer is if you buy into the Bayesian theory of probability. If I don't buy into it, then I don't necessarily think one way or the other about that calculation. And so to suggest that it's a fallacy not to come up with that number, that I'm wrong if I say 80% instead of 41%, is suggesting that I should be Bayesian. And so this is kind of, so I just don't think that the conclusion, I don't think the conclusion really works there. I think this is a pretty, I think this is maybe a clearer situation in which I would say that, sure, 41% could be the answer if you adopt this specific
Starting point is 01:24:47 way of interpreting the question. But even in interpreting the question that way, there are a lot of hidden assumptions that you're making in doing those calculations. One of which is: you're telling me that 85% of the cabs in the city are green and 15% are blue. If I'm going to use those as probabilities in this calculation, then I'm implicitly making the assumption that every cab in the city is equally likely to have been involved in this particular incident as every other cab in the city. And when I make that assumption, that means that the probability of the cab being green was 85%. But that's an assumption. Got it.
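For reference, the 41% Harry mentions is what falls out of Bayes' rule once you grant exactly that equal-likelihood assumption; a quick check:

```python
# Blue cab calculation, under the assumption flagged above: every cab is
# equally likely to be involved, so the base rates act as prior probabilities.

p_blue, p_green = 0.15, 0.85          # base rates of the two companies
p_says_blue_given_blue = 0.80         # witness accuracy
p_says_blue_given_green = 0.20        # witness error rate

numerator = p_says_blue_given_blue * p_blue
denominator = numerator + p_says_blue_given_green * p_green
p_blue_given_says_blue = numerator / denominator

print(f"P(cab was blue | witness says blue) = {p_blue_given_says_blue:.2f}")
# -> 0.41, the "correct" Bayesian answer under these assumptions
```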
Starting point is 01:25:27 So maybe, like, green paint makes cabs slower. Or it could just be that the people who drive for this blue cab company are bad drivers. There are a lot of reasons that could be there that we don't know about. But surely the way they constructed the experiment was to assume all things being equal, etc. I think that's how they would have liked to present it, yeah. But the question is, if I'm a subject in this experiment, if I haven't been exposed to this concept of all things being equal, even if I've never taken a statistics course, and even if I have, I guess I'm not inclined to necessarily make those assumptions on an intuitive basis. Of course, that could be one of the conclusions that they're making, which is
Starting point is 01:26:16 that people don't intuitively think in line with the statistical theory. But I still don't think that that's an argument that they should, in this particular case. So in the book, Kahneman repeatedly says that people are bad intuitive statisticians, but what you're saying is that he's actually holding people to a very particular standard of statistics that might not necessarily be the correct standard to apply. So I guess I would put it this way. On some way of interpreting it, I think Kahneman is right to say that people are bad intuitive statisticians. But I think there's still a question of whether or not that's a bad thing. So I guess if you were to ask, you know, are people bad at statistics? I mean, I think that
Starting point is 01:27:03 there's kind of a consensus that people are intuitively bad at statistics. A lot of people disliked statistics when they took it in college, myself included. So yeah, humans are in some way bad at thinking intuitively about statistics, but there are a lot of things that humans are good at that statistics is really bad at. And so I don't think it's necessarily a good thing to try to engineer people to think more statistically. I think there are kind of benefits to both. And what I mean by that is: what is statistics good at? Statistics is really good at working out the average case. So what statistics does is it takes a sample of data, and it kind of filters out the noise, and it gets rid of the noise, and it doesn't,
Starting point is 01:27:53 so it doesn't see those fine-grained details, but it gives you kind of the average case. And so as long as you're in the average case, you want to be working with statistics, right? Human beings are not good statisticians, I think, at least in part. You know, this is kind of just an opinion. Of course, this is not a scientific, I haven't tested this, but in part, it's because human beings see details at a level that the statistical methods can't, and oftentimes get distracted by those details. But sometimes those details are actually relevant, right? I mean, there's, how many times do you, you know, you kind of, you come across a situation that you've been in hundreds, you know, hundreds of times before, but something just doesn't feel right. You know,
Starting point is 01:28:32 you feel like there's just something off. You can't say what it is. And then it turns out that something actually happens that was kind of crazy, that you weren't expecting to happen, and it was just because you sensed something, something you were picking up that wouldn't necessarily show up in any kind of physical measurement of the situation. And so, in terms of trying to train people to think more statistically, I would say that it is important to be trained to, it is something you kind of need to force yourself to do, at least in my case. I mean, I'm inclined to follow my own gut instincts over statistics. But a lot of times, you know, it is better to go with the statistics, but you have to really kind of train yourself to trust the statistics, but not too much, right? You definitely don't want to be overly trusting
Starting point is 01:29:25 of the statistics, because I think that's what leads us into situations like the replication crisis, or like these election predictions, where we're putting too much into these numbers without thinking about what their limitations are, and without realizing that they are limited in what they can tell us. And I think the only way to really be able to tell the difference is to stay human, right? I mean, as humans, we're able to determine when there's something that's maybe slightly off about a given situation and when there isn't. And so we have to know when to apply the theory and when not to apply the theory. There was a quote, somebody said this to me once, that I think is kind of relevant here, which was that if the only way that you know how to handle uncertainty, the only way that you know how to operate in some kind of uncertain situation, is to run a statistical analysis and do whatever the statistics tell you to do, then you're kind of in deep trouble, because you're more or less a robot at that point, right? And I think that you've probably, or if you haven't, you probably should, talk to somebody who is studying AI and how that affects our lives in all different ways. But one of the scary parts about this is that we're becoming too dependent on these machines, or too
Starting point is 01:31:01 dependent on these statistics. And if we do that, then we're very vulnerable to kind of situations that the actual machinery can't anticipate. The black swans. Exactly. Yeah. Yeah. This has been so interesting.
Starting point is 01:31:17 Do you want to, I'm conscious of the time before we wrap, do you want to talk about some of the things you're working on at the moment? So, ideas you're working on around thinking of probability as a shape, and a couple of the other projects which we spoke about before we started recording. Yeah, sure. I mean, if we have a few minutes, I can just tell you about it. So the idea, I won't go into the details too much about probabilities being shapes. This is something that I've been very interested in over the past year or so,
Starting point is 01:31:49 which is trying to think about, well, if there's these situations in which the traditional probability theory doesn't apply to, or isn't a very good way of explaining. It doesn't explain very well. Then, you know, shouldn't we, you know, are there other ways of possibly explaining this? Are there better theories? And so, what got me thinking about this was in thinking about the election predictions and things like that. But which was, well, you know, so much is riding on this number, right? This 75% probability, which gives this illusion of kind of some, this illusion of precision that's just really not there. And most of the time when I make assessments of probability, I'm not thinking in such precise terms. I'm making a rather general, ambiguous claim that something is probably true or whatever. And so, I've been
Starting point is 01:32:47 thinking recently about how to kind of formalize the idea of probability as something other than a number, as something other than a numerical measure. And so, this has gotten me into an interesting area of math, which is called homotopy type theory. And without going into any details on that, uh, essentially what these homotopy, what homotopy types are, are, are abstract shapes in some sense or structures, if you want to think of it that way. And so what I want to think about is probabilities, not as being numbers, but as being structures in some way of thinking about it. And why, why do I want to think about probabilities as structures? Because I guess one way that I would think about doing, making decisions, making probabilistic type judgments
Starting point is 01:33:31 or doing plausible reasoning, is not by calculating a number. When I do it instinctively, I'm not doing it by calculating a number so much as by sizing up the situation that I'm in, taking into account the evidence, or whatever it is that I observed that I think is relevant, and seeing how that evidence fits into some mental structure, some mental picture that I have of how everything fits together. And so, I very much see probabilistic judgments as judgments about structure, about trying to fit things together in a way that makes the most sense. And if we think about probabilities in that way, then that would make a probabilistic judgment one in which, instead of calculating numbers, we're calculating the way in which these different shapes fit together,
Starting point is 01:34:22 and however they fit together best is the probabilistic judgment that we make. So that's a bit of an abstract description, but that is the way that I think about it, or the way that I'm trying to think about it. And does that have the potential to resolve some of these issues that we've been talking about? How does it connect to some of these problems?
Starting point is 01:34:45 Yeah, so that's something I've thought about, and I'm not exactly sure. I mean, I think that a lot of the situations where I can imagine this being a more relevant kind of model for probability are situations where we don't typically think of assigning numerical values to the probability. So, I think I mentioned the example earlier of a juror who makes an assessment in a courtroom. What a juror is doing is they're taking all of the evidence into account, they're assessing it on a holistic level, and they're making a judgment, right? Or a mathematician: when a mathematician makes a conjecture that a certain statement is true, they're basing that on their own intuition of how mathematical objects relate to one another and why, based on the evidence that they have, this new theorem should be true, even though they haven't proven it yet. As far as whether it relates to some of these Kahneman-Tversky things, I
Starting point is 01:35:46 guess I shouldn't speculate yet. I think it's possible, but I don't have the answer to that just yet. Yeah. So you've got a new book that's just come out. Yeah, it just came out about a week or so ago, and it ties in, in some way, with all the other things that we talked about. I mean, this was something that I've been working on for the past couple of years. So the name of the book is The Probabilistic Foundations of Statistical Network Analysis. It's a bit on the more technical side, although I've tried to mitigate the technical part as much as possible, because there's a lot
Starting point is 01:36:23 of people these days, in a lot of different fields, who are working on network analysis. There's social networks people, and then there's people in graph theory and math, so it goes all across the spectrum. But the main idea behind the book is this: if you think about classical statistics, the theory of statistics that's been developed over the past 50 to 100 years has been mostly geared towards understanding basically sets of measurements, what I might call more or less unstructured data. Even though there is structure in the data, it's very minimal. We're basically treating these measurements as individual numerical values, or collections of measurements taken for individual entities. And then we fit a model to that data.
Starting point is 01:37:08 And so we might impose some structure on that data by putting some kind of dependence into the model and all this. But for the most part, the structure itself is not built into the data. The data is just data points, data sets. But in a lot of more modern data sets and in these network data sets, we have much more complex data where the structure is actually built into the data. So when you have a social network, if I'm analyzing a social network there, I'm actually analyzing the structure. If I were to analyze Twitter, I actually have to take into account the structure and all of the interdependencies and all of the interactions that people have, right? And so the way that this book is structured is the way that I set it up is that I want to talk more about data that is of this form, data that comes from complex systems, or even data that is of the form of a complex system itself.
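A crude illustration of the contrast being drawn here (this is my own toy example, with made-up names and numbers, not anything from the book): classical methods see a flat list of measurements, while network data carries its structure with it.

```python
# Toy contrast between "unstructured" measurements and network data,
# where the structure itself is the thing being analyzed.

# Classical setting: one measurement per individual; order doesn't matter.
heights_cm = [172.0, 165.5, 180.2, 158.9]
avg_height = sum(heights_cm) / len(heights_cm)

# Network setting: the observations are the relationships themselves.
# Dropping or rewiring an edge changes what the data set is.
friendships = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}
avg_degree = sum(len(f) for f in friendships.values()) / len(friendships)

print(f"average height: {avg_height:.1f} cm, average degree: {avg_degree:.2f}")
```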
Starting point is 01:38:01 Classical statistics is not really suited to handle these types of data. And so I talk a bit about the limitations of the current approaches to network analysis, and I try to suggest a path forward: in order to improve upon the current limitations, we need to, I think, build up a new theory, or a separate theory, of statistics for these complex entities. For me, the way I see this, over the next 20 to 30 years, is as the future of data science. In order to analyze data, it is going to be necessary to be able to handle structure in a systematic and sophisticated way. And so I see it as a necessity several years down the line. Awesome. We're going to link to a few of these different papers and your book as well on the
Starting point is 01:38:58 show notes on our website. But is there another that you wanted to give a mention? Yeah, one quick thing I'll just mention is something that's kind of a non-scientific, a meta-scientific thing that I've gotten involved in, which is an initiative, a non-profit initiative, with a friend of mine, Ryan Martin, who's a statistician at North Carolina State. And this is called researchers.one. And you can go there if you want more information. Right now we have some information up, and there's going to be a lot more coming out in the coming weeks and months. But essentially, what we've been working on behind the scenes, and what I'm hoping will become operational very soon, is a platform, an alternative platform for academic publishing and scholarly publishing, which is geared towards putting a lot of the control back into the hands of the authors, of the people who are actually the stakeholders in
Starting point is 01:39:58 this game, instead of putting the control into the hands of editors and anonymous referees and, you know, otherwise other types of bureaucrats who, in a lot of ways, are responsible for things like this replication crisis. So I guess this is kind of a reaction. This is a response to a lot of conversations that I see people having on Twitter and people who come up to me at conferences constantly complaining about, you know about how we need to improve the way that articles get peer reviewed and improve the publication process. This is something people have been talking about for a very long time and do it in a way that emphasizes the quality of the work instead of falling prey to all the politics and the bureaucracy and the corruption that's actually happening in these journals. And so even with all of that complaining, there hasn't really been anything that's happened or nothing concrete,
Starting point is 01:40:54 no concrete alternative to the current model is really out there. So this is our effort to try to provide such an alternative. And so I would just say to, if anyone interested, go to the website. So it's researchers.one, it's dot O-N-E. And anyone interested can also contact us for more information. Absolutely. Well, I think we were speaking before we started that we wanted to set ourselves the challenge
Starting point is 01:41:19 of making statistics and probability into a very interesting conversation. And I think we've definitely achieved that. This has been so enjoyable. Yeah, so enjoyable. I guess one of the main messages for the audience is just to bring a healthy skepticism to statistics. Yeah, I think that's it. I think that's a good message for anything: take nothing for granted and question everything, I guess. And statistics is certainly something that deserves to be questioned.
Starting point is 01:41:51 Yeah. Harry Crane. Thank you very much. Thanks a lot. There you go. Harry Crane, ladies and gentlemen, did you enjoy that?
Starting point is 01:42:00 I enjoyed that. I enjoyed that a lot. I am a bit of a nerd, but I think Harry did an excellent job. He's definitely not your average academic. He is a gentleman and a scholar. And for everything relating to the projects of his that we discussed at the end just there, you can head to our website, which is www.thejollyswagman.com and we'll have everything up on the episode page.
Starting point is 01:42:31 So thanks again. If you enjoyed, then please tune in again next week where I speak with Stanford physicist Leonard Susskind. Until then, ciao.
