The Joe Walker Podcast - The Shape Of Probability - Harry Crane
Episode Date: May 14, 2018. Harry Crane is a scholar who specialises in statistics and probability. He is currently a professor of statistics...
Transcript
This episode of the Jolly Swagman podcast is proudly brought to you by Globite.
Head to the website, globite.com, use our discount code SWAGMAN,
and you'll get 15% off all items.
Because with Globite, you go, we go.
From Swagman Media, this is the Jolly Swagman podcast.
Here are your hosts, Angus and Joe.
Hey there. How's it going, ladies and gentlemen? Welcome back to the show. I'm Joe Walker,
and I'm coming to you with another episode with an American guest. Oh, how I love America.
This is probably one of my favorite episodes ever, and I think that means something.
I've done about 50 of these now, over 50, with my partner in crime, Angus. My guest this week is Harry Crane. Harry's a professor of statistics at Rutgers University in New
Jersey, and he did his PhD in statistics at the University of Chicago. I first came across Harry on Nassim Taleb's Twitter feed,
his infamous Twitter feed, and it was one of those occasional positive tweets. He said that one of
the three smartest things he did in 2017 was attend a Harry Crane Foundations of Probability
seminar. And Harry hosts the Foundations of Probability seminar
at Rutgers, and it has involved guests like Nassim Taleb and Daniel Kahneman. Harry was an incredibly
gracious guest. He was well prepared. The table that we were speaking at at his office at Rutgers
was strewn with books and notes that we referred to throughout the episode. And this is an episode about probability and statistics.
We talk about things like the first rule of probability, which you'll hear about more.
We talk about the definition of probability.
We talk about why the predictions expert Nate Silver can never be wrong, or perhaps more
aptly why he is wrong.
We talk about the replication crisis in science,
and we talk about some of Harry's reactions to famous experiments done by Kahneman and Tversky,
the two Israeli psychologists.
Statistics and probability probably aren't the first topics that you'd pick for a podcast.
Needless to say, statistics is
an incredibly important skill for the future, and it's only going to become more important
for the decisions that our species makes in the future. Harry and I set ourselves the challenge
of making these topics as enjoyable as possible for non-statisticians. And I think we succeeded in that.
Of course, you can judge for yourself.
So, without much further ado,
please enjoy this conversation with Harry Crane.
Harry Crane, thank you for joining me.
All right, thanks.
Great to speak with you.
As I mentioned when we were emailing,
I first came across your name through Nassim Taleb.
I think he tweeted out last year that one of the three smartest things he did in 2017
was to attend a Harry Crane Foundations of Probability seminar.
Yeah, we had a great time.
Yeah.
And you've hosted Nassim, you've hosted Daniel Kahneman.
Yeah, we've had a great lineup.
I think, by the way, one of the other things that Nassim said was his best thing in 2017
was getting a pretzel at Auntie Anne's, I think.
So that's about the level that I'm at at this point.
Yeah, in no particular order.
Yeah, of course.
Tell us what you do.
So right now I'm a professor of statistics at Rutgers.
So I've been here for the past five, six years or so.
And I guess mostly interested in problems having to do with probability.
So I was more on the probability side of things than of statistics.
But as I'm sure we'll get to, they're not so, well, in my mind, they're very closely related.
And more recently, and this is referring to the thing that you just mentioned, over the
past couple of years, I've been involved with the philosophy department at Rutgers running
a probability seminar where we invite speakers from, I guess, all over the place to come
and talk about how probability comes up in their specific discipline.
So I guess as a statistician, I'm mostly familiar with how
probability comes up in statistics, but I've learned that it, of course, comes up in almost,
well, in many other fields, whether it's psychology, finance, economics, and actually
also a lot in philosophy. So that's been very interesting. Yeah. I think we're going to talk about the
distinction between probability and statistics, as you say it, but maybe before we get there,
how did you find your way into studying statistics?
Yeah. So, my interest in statistics is, it's a bit of a roundabout story, but really the way it came about was, well, I've always been involved ever since I was a little kid in probability, I guess, in some way.
I guess it would be in applied probability in a very real sense, as I'm about to describe.
So, and my kind of going in the direction of statistics has in some way
been a little bit of an accident. So, before I tell the story, you know, you got to understand
that I grew up in the Taconi neighborhood of Northeast Philadelphia, which was a very,
which was an Italian neighborhood. And pretty much, I guess at that time, at least as far as
I was concerned, every adult that I was aware of either was a bookie or had a bookie, right?
So for me, the, you know, this was kind of just a way of life, right?
So what was the ratio of bookies to people in this county?
Yeah, well, no comment on that, right?
But everybody was involved in one way or the other, at least so it seemed. And so when I was a kid, I remember my dad used to bowl in a bowling league on Monday nights. And every once in a while, I would go
with him to play with the other kids. But there was this guy at the bowling alley every Monday.
He was probably there every other day too. His name was Reds Gabriel. I remember his name. It's
such a great name. But all Reds would do is he would walk around the bowling alley with a
newspaper in one hand and a notebook in the other hand. He would walk up to people,
look in the newspaper, write something down in his notepad and move on and talk to the next person.
Of course, what Reds was doing was he was taking bets on the lottery and on football games and on
horse races and all that stuff. And I was fascinated by this as a kid, because here you
have this guy who probably had a sixth grade education, right? I mean, he probably, he doesn't,
most likely doesn't have a very, you know, deep understanding of any kind of mathematical theory,
right? But here's a guy who's making his living off of things that are in principle unpredictable.
And he doesn't even care.
You can go up to him and bet on whichever side of whichever game you want, and he'll take the bet.
And at the end, he's walking away with money.
And so I remember as a kid being very kind of taken by that, which was that there must
be something going on here that everybody's talking about this as gambling.
But from Red's perspective, it certainly doesn't look like gambling, right?
And so then, I guess this is a story where Reds really had a big influence on my life, little did I know.
But when I was in sixth grade, in Mr. Bulligatz's sixth grade class at Our Lady of Consolation, me and a couple of my friends put together a syndicate,
if you want to call it that. And I guess we were doing what Reds was doing. We were
taking bets from other kids in the class on horse racing, on football games, on whatever it may be.
It didn't last very long, but the end of the story, I guess, the way it ends is the way that a lot of bookies end, is that we were wiretapped in some sense.
I guess one of the, my partner, Nick, was talking on the phone.
So at that time, there were no cell phones, or at least kids in sixth grade didn't have cell phones.
The world's changed since then.
But he was talking to another kid in the class and taking a bet from him from his home
phone. And back at that time, the best way for parents to eavesdrop on their kids was to pick up
another phone in the house and listen into the conversation. And so, while this other kid was
placing a bet, probably a 10 cent exacta box at Aqueduct Racetrack or something like that,
his mom overheard
this and she reported it to the principal, and that was pretty much the end of that. But anyway, that's how I got into this stuff. I've never really been somebody who was much about betting on sports or betting on anything like that, but I was always interested in that aspect of it. And so I continued throughout college playing poker games and things like that. And I played online poker a lot. And then in college, I majored in math, but one of my minors was actuarial science. And really,
what I was interested in most was actuarial science. Because if you think about it,
what actuarial science is, is the science of bookmaking. I mean, this is the science of what, you know, what is an insurance company doing except trying to price these bets effectively.
You know, when you buy insurance, you're making a bet about whether or not you're going to get
into a car accident or not. And, you know, the actuary has to determine what the right premium
is to charge so that when those unfortunate events happen, that the insurance company can pay the claim and still walk away with a little bit of profit,
right? And so, in some way, my study in college was even kind of influenced by this early childhood
obsession in some way with probability or with gambling, yeah.
What sort of money were you making in the poker? Was it pretty good money for a college kid?
It was definitely good money for a college kid.
I mean, at that time, I was... So, just to put a time frame around it for anyone out there who knows how things have evolved: I graduated college in 2006. Chris Moneymaker won the World Series of Poker around 2003, and that was when there was a big poker boom. Online poker shot through the roof in '03, '04, with sites like PokerStars, Full Tilt and all that. I had been playing several years before then, but by this time a lot of people were coming in, a lot of people were interested in it. And so by that time, I was playing what was at that time the highest stakes on the internet. Now it's kind of peanuts compared to what people play for now. But at that time, the most you could play for was a few thousand dollars, five, ten thousand dollars in a given game. Now you can play for millions of dollars. I've certainly never played that high.
Ever been tempted to try your hand again? Now the stakes are so high.
Well, I still do try to stay connected to it. I still do try to play. And I think that we'll
probably, we'll talk more about that in the discussion because I think that even if it's
just recreationally for me now, it's important for me to kind of stay connected to these real uses of probability. I mean,
I like to experience where probability comes into play in everyday life. I mean,
we just had lunch, right? We just had lunch and I ended up paying the whole bill, right?
I feel terrible still.
No, you shouldn't. So at lunch we played a good pastime that me and a couple of my friends like to do when we're out to dinner or lunch together, which is what we call flipping for the check.
And so when the bill comes, you, it's important that you have a nice iPhone, I guess,
like Joe does here. And you start a stopwatch, start the timer on the phone and let it run.
And as it's running, I guess you have to let it run a little while just to make sure there's no
kind of foul play. I guess that's what it's for. But you calculate everybody's fair share of the check,
and whatever percentage of the check is your fair share,
you are given that proportion of the numbers
between double zero and 99.
And so then what happens is,
after the timer's run for a long enough time,
you stop the timer,
and whoever was assigned the number showing in the milliseconds pays the whole bill. And so we just played this at lunch. I assume it was the first time you've ever done this.
Yes. But it won't be the last, that's for sure.
Well, hopefully not. But you had a good run this time. I think it was fun, right? It was fun when you win.
I'm still reveling in the beginner's luck.
But even that, which in principle is a neutral expected value play, it's really just
introducing volatility or variance into your everyday bankroll, which in the long run
shouldn't make much difference. But it's a way of staying close to where these probability models actually have consequences. In a lot of the situations where probability or statistics comes up, as in science for example, the probabilities aren't tethered to anything real. And if you treat them just like numbers on a page, then that's pretty much all they are. It's not until you put something real behind it, until it has real consequences,
that you can really, I think, give meaning to these probability statements.
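For readers who want to see the mechanics of the game, here is a minimal Python sketch of "flipping for the check" as Harry describes it; the function name and the use of random.randrange as a stand-in for the stopped iPhone timer are my own illustrative choices:

```python
import random

def flip_for_check(shares):
    """shares: dict of person -> their dollar share of the bill.
    Each person is assigned a block of the numbers 00-99 in proportion
    to their share; a two-digit 'milliseconds' reading then picks who
    pays the whole bill (a neutral-expected-value bet, as Harry notes)."""
    total = sum(shares.values())
    people = list(shares)
    assignments = {}
    start = 0
    for i, person in enumerate(people):
        if i == len(people) - 1:          # last person takes the remainder
            count = 100 - start           # so all of 00-99 is covered
        else:
            count = round(100 * shares[person] / total)
        assignments[person] = range(start, start + count)
        start += count
    reading = random.randrange(100)       # stand-in for the stopped timer
    payer = next(p for p, nums in assignments.items() if reading in nums)
    return payer, reading

# e.g. an even three-way split of a $60 bill
print(flip_for_check({"Harry": 20.0, "Joe": 20.0, "Angus": 20.0}))
```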
We'll come back to that idea because it's something that's very profound,
which I know we both want to discuss.
But I thought now we could first give a definition of probability and then, secondly, a definition of statistics, because for a lot of people they're either synonymous or, at the very least, very overlapping concepts. But if we begin with probability: you just described how you first became interested in the concept through gambling and betting, and if I'm not mistaken, the history of probability and the mathematics therein
was intertwined with gambling from the earliest days.
Yeah. So, I mean, interestingly enough, and I was just rereading something last
time on the history of probability. And two of the earliest applications of probability and what drove people's interest in probability,
one was certainly gambling and the other was insurance.
So I realized that that was also kind of the path that I took.
But as far as what probabilities are or what probability is, and this is something that I'm grappling with.
I mean, that's part of my ongoing interest with this seminar that I run. But within statistics there has historically been a divide between two schools of thought on probability. One is called the frequentist view of probability, and the other is usually called the Bayesian view, or the subjectivist view of probability.
The frequentist interpretation of probability is kind of just what it sounds
like, where you interpret these probabilities as frequencies.
And so, you know, if I'm going to toss a coin, I imagine tossing a coin a very large number
of times, very long sequence of tosses, then the probability of heads is just the
frequency of times at which heads comes up in this long sequence of tosses. And so this has a very
natural, I think, mechanistic interpretation in statistical terms, in the sense that if you imagine running an experiment repeatedly, repeating an experiment
over and over and over again, then the probability of certain
outcomes is just the frequency of times that you would see that outcome if you were able to
hypothetically run this experiment over and over. Although, you know, that, I guess that
experimental interpretation isn't necessary, but I think it's very helpful. And I think it's how a
lot of people think of probabilities. At least I know that that's how I thought about probability for a long time, because I was thinking about the probabilities of a roll of the dice, playing a game of craps or whatever.
The probability of rolling a 7 is just the long-run frequency of times that a 7 is going to come up.
And so it's very much tied to that process that generates the data.
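As a concrete illustration of that frequentist reading (a simulation sketch of my own, not something from the episode), rolling two dice a large number of times gives an observed frequency of sevens that settles near the theoretical 1/6:

```python
import random

def frequency_of_seven(n_rolls=1_000_000):
    """Roll two fair dice n_rolls times and return the observed
    frequency of totals equal to 7 -- the frequentist reading of
    'the probability of rolling a 7'."""
    sevens = sum(
        1 for _ in range(n_rolls)
        if random.randint(1, 6) + random.randint(1, 6) == 7
    )
    return sevens / n_rolls

print(frequency_of_seven())   # approaches 1/6 ~ 0.1667 as n grows
```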
The Bayesian view is a bit different in that it doesn't, and I guess one of the benefits to it, or one of the arguments in favor of it, is that it doesn't depend on this repeatability of the
experiment. So in the Bayesian view, the probability isn't a frequency, but it's thought of as what's called
a subjective degree of belief, or really just the fair price of a bet on a given outcome. So if I were to tell you that the probability that
I assigned to the coin landing heads is 60%, all I'm telling you is that I'd be willing to take a bet on either
side of that price at the implied odds, the implied odds there being three to two.
I'd give you three to two odds on tails because I'm saying the 60% heads.
So that's just a subjective statement about my disposition towards the
outcome. It doesn't say what the actual outcome is going to be, or what the actual probabilities or frequencies are, if those even mean anything; those don't really have a meaning in this context. So those are the two predominant views, I guess, within statistics, or at least within the philosophy of statistics, I should say.
They're not necessarily competing views, though.
I would agree that they're not competing views. I think some people think that they are competing views, and different people have different ways of thinking about this. It's best not to get into it, only because this is a historical debate that's gone on for 50, 60 years, and it's not one that people are very interested in anymore. That's not to say that it's been resolved, but it's just to say that for the ways that statisticians actually use probabilities now, this philosophical argument over what the probabilities are isn't really their main interest anymore. But I guess what I just want to close out
on these Bayesian probabilities
or these subjective probabilities and what the benefit is that there are a lot of situations in
which you might want to assign a probability to something that can't be repeated in principle.
We were talking before about the election, for example. If I were to put a probability,
as certainly people did, and Nate Silver, these election predictors do, that Trump has a 25% probability of winning
the election. You can't really think of that as, it's not really right to think of that as, well,
if we were to rerun the election a million times, then 25% of the time Trump would win,
because the election's only going to happen once. So, maybe the better way to think about it or the
way that the Bayesian view would think about it is that instead, what I'd be saying in that 25%
probability is that I'd be willing to offer you three to one odds against Trump winning.
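To make the betting interpretation concrete, here is a small worked sketch of my own, just converting the probabilities Harry mentions into their implied fair odds (the function names are hypothetical):

```python
def implied_odds_against(p):
    """Fair odds against an outcome with probability p, expressed as
    'payout per unit staked'.  E.g. p = 0.25 gives 3-to-1 against."""
    return (1 - p) / p

def fair_payout(p, stake):
    """Profit a fair bookmaker pays on a winning bet of `stake`
    placed on an outcome quoted at probability p."""
    return stake * implied_odds_against(p)

print(implied_odds_against(0.25))   # 3.0 -> 3-to-1 against Trump winning
print(implied_odds_against(0.60))   # ~0.667 -> 2-to-3 against heads,
                                    # i.e. 3-to-2 odds offered on tails
print(fair_payout(0.25, 100))       # a $100 bet at 3-to-1 profits $300 if it wins
```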
Then there's also other types of probability or other interpretations of probability that I think
are interesting and relevant, not so much in statistics, but a lot of probability judgments that we make in everyday life
are qualitative, based on some qualitative or ambiguous assessment of the evidence.
And that's something that I've been thinking about a lot more recently,
but those don't so much come up in statistics.
And so what would an example of one of those be?
Well, I think that good examples are when,
I guess the example that comes to mind often is,
well, one is, you know,
if we were trying to talk about
whether we should meet on Saturday or on Sunday
and you were to say,
well, Sunday probably works better for me,
you're making some kind of probabilistic judgment there,
but you're not, of course you're not assigning a numerical value to it. But I think
maybe in a more substantive context, in legal settings where a juror makes a determination
that a defendant is guilty or not guilty according to some standard of evidence or some standard of,
you know, how probable it was that they committed the crime. At least legally speaking, you know, there's a lot of resistance to putting any
numerical value on that assessment because of how kind of hard it is to really pin down these
numbers. But certainly the system works, or, well, not to say it works or doesn't, but the system operates, and it operates under this kind of intuitive understanding of probability without any reference to the numerical or the mathematical theory of probability.
Yeah. So, to recap, then, we've got the frequentist view of probability, the Bayesian view of probability, and then probability sort of in the common parlance, which is your proclivity to assent to a belief or not.
Yeah, I would say that's good. For the sake of this conversation, I think those are the three relevant concepts of probability in most cases.
Yeah. So now tell us where statistics fits into all this in terms of the concept.
Yeah, so I think that one way of understanding
what statistics is now is it's pretty much,
it's a field of, I guess, analyzing data
on the basis of probability or using probability
as kind of a way of modeling the way that data behaves and drawing inferences from data based on those probability models.
So, probability comes into statistics at a very foundational level, in that statistical models nowadays are based on the theory of probability.
There are other ways of understanding what statistics is that have nothing to do with probability. I mean, a statistic is more or less a number in a book, or, you know, when you take the national census, you measure all of these statistics of the population. That's a different use of the term statistics, although the field of statistics is, in essence, a way of gleaning information from these statistics, these numbers.
Now, of course, there are lies, damned lies, and statistics. And we were talking earlier about
some of the misuses of statistics and the way that statistics can be used to sort of give a
veneer of legitimacy to things that are unfalsifiable or outright wrong. And you mentioned
the example of the elections and Nate Silver with his polling predictions, but could you talk about,
you know, what some of the problems are that you see at the moment in terms of how probability is being used by people like Nate Silver.
I don't want to single out Nate in particular, but I guess he's sort of representative of a certain class of people.
Yeah, well, I guess there's a couple of contexts that we can talk about this, one of which is the election predictions, and another of which is the way that probabilities are used, I guess, in mainstream scientific research to, you know,
validate scientific conclusions. And that's something that's led to what's called the
replication crisis. I think both are certainly related and relevant to this discussion. I think
that the latter has potential to have much more far-reaching impact
and, you know, negative impact potentially as it already has in that, I guess, the conclusions
that people are drawing from these scientific articles are, you know, are in a sense more
consequential than Nate Silver's election prediction. But I guess on the topic of the election predictions, I guess
what I would just say is that I found the whole experience of the most recent election in America
to be very interesting from my perspective. Because at the time, I guess at that time,
I had just started this seminar on probability. And so we were meeting every week and we were
talking about all these ways in which probability comes up. And here you have a real life event that's going on that the news media
is reporting news that isn't, they're reporting things that are supposedly happening in the world,
but they're also making a major part of their news operation, just reporting these probabilities that are being calculated in principle based on the things that they're reporting. So it's really, you know, it was
interesting to me, especially, I guess, the story that really sticks in my head was, I remember
sometime in September, right after the first debate, Trump had, I think, outperformed what
everybody was expecting in the first debate. And a lot of people were kind of, well, at that time, regardless of what people
were saying, Nate Silver, the 538 probability took him from something like 20% to win to like 35%
to win all in the span of a day. And I just remember sitting around the room and hearing
people talk about this. And they were legitimately worried. They were like shaking. And they weren't shaken
by anything that actually happened in the debate. They were shaken by the fact that the probability
changed and that now the probability was a lot higher than they wanted it to be because these
were people who were, I guess, Clinton supporters, right? And so this really struck me that I had
been thinking for a long time, I guess, in the kind of weeks and months leading up to that, what do these probabilities
actually mean? You know, I have no idea what these probabilities actually mean. I mean,
you know, if you think about it, and then I wrote something about this after the election happened, which was, you know, in 2008, I think, 538 gave Obama something like a
90% probability to win. And, you know, at that time, Silver was really heralded as this great
prognosticator, and that he got it right, right? He got everything right. And then in 2012,
he gave a similarly high, a lower, but a similarly high probability for Obama in the 80%
range. And so again, he was said to have gotten it right. And then after the 2016 election,
where he gave Trump 25, 28% chance to win, you still had people coming out and saying that he
got it right again. And these people, what they were pointing out was, well, when you say that something's
going to happen 25% of the time, or when you say that something has a probability of 25%,
that means it's going to happen 25% of the time. And so the fact that this thing happened
doesn't invalidate the probability. And so when you start to get into that realm of interpreting
these probabilities, then it means that pretty much nothing that is said other than an extreme 0% or 100% probability is falsifiable, right?
Because you can't replicate the election. You can't run multiple versions of the same election and see how often Trump wins. And apparently, no matter what comes out, you're going to be able to say that it's consistent with whatever you predicted, right? So it really renders the probabilities meaningless, at least from my point of view, unless, and this is something that I've been working on recently, and I showed you an early draft of this, unless these probabilities are tied to something real, right? And so this is something
that I'm calling the first rule of probability. And it should be the last rule of probability too,
which is if you state a probability on any outcome, then you should be forced to accept
any bet on that outcome at the implied odds, right?
So if I state a probability of something being 1%,
then I should be forced to offer you 99 to 1 odds if you wanted to bet on the other side.
And of course, if I lose that bet, I have to pay up, right?
That's the only way, at least the only way that I can see,
and this is something that, going back to Reds, the guy at the bowling alley, this is something that he knew
very well, and all the people who were dealing with him knew very well. These probabilities
that you state or the odds that you state don't mean anything unless they have real consequences.
And this gets to the idea of what I would call a real probability. What does a probability need to be to make it a real probability? Well, one is that it needs to
be backed by something real. And so in this case, it would be backed by money and it has to be about
something real, uh, meaning that it has to be decidable. I have to be able to determine whether or not
the thing that you said had this probability
actually happened or not.
But otherwise, we're just talking about numbers.
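Here is a minimal sketch, in the same hypothetical spirit, of what the first rule asks of you: state a probability on a decidable outcome, accept a bet at the implied odds, and settle once the outcome is decided. The function and numbers are illustrative, not Harry's formalism:

```python
def settle_bet(stated_p, stake_on_outcome, outcome_happened):
    """If I state probability `stated_p` for an outcome, I must accept a
    bet of `stake_on_outcome` on that outcome at the implied odds.
    Returns my profit (+) or loss (-) once the outcome is decided."""
    odds_against = (1 - stated_p) / stated_p   # payout per unit staked against me
    if outcome_happened:
        return -stake_on_outcome * odds_against   # I pay out at the implied odds
    return stake_on_outcome                        # I keep the stake

# Claiming a 1% probability means laying 99-to-1: a $1 bet against me
# costs me $99 when the thing happens, and wins me $1 when it doesn't.
print(settle_bet(0.01, 1.0, True))    # -99.0
print(settle_bet(0.01, 1.0, False))   #  +1.0
```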
Yeah.
So, I mean, I could say,
I think there's an 80% chance
that there's a flying spaghetti monster orbiting Venus.
That wouldn't be a real probability.
No, as of right now, no, I wouldn't say that. That's complete speculation; it's a meaningless statement. Well, for one, it is meaningless because I don't know what this spaghetti monster is, right? But also, even if you could define for me what it was, if you said it was a teacup instead of a monster, how are we going to actually verify whether or not you're correct? Of course, we're not going to send spaceships to actually look for this thing, and so there's no way to actually decide the outcome.
So then, if I bet money that
there was a teacup orbiting Venus,
and I said I think there's an 80% chance that that is true,
does that become a real probability?
Well, I guess it would become real in a situation like this
where we would say, well, you're saying that there's an 80% chance,
and you say, well, you would have to say, I think...
It's a bad example because I would never actually make that bet.
Of course.
But let's suppose that you said that, then we would have to agree that you will pay me, well, that there is such a thing and that it will be discovered by January 1st, 2019.
And so now you're actually putting a finite time limit on it and actually saying that it will be discovered.
So it could be out there in principle, but if we don't find it, then you win or you lose the bet by this certain date. So, there has to be kind of a
clear delineation of what it is we're talking about and what the probability refers to in
order for it to have any kind of real meaning. Yeah.
So, essentially what you're saying is that prediction specialists and people in that prediction industry need to have skin in the game.
Right, and skin in the game is Nassim's idea. It's only recently that I'm familiar with it and I see it. I've read part of the book, and I'm of course familiar with his Twitter account and his Twitter feuds and all of that stuff. But now that
I'm familiar with the concept, I see it everywhere, right? And oftentimes when you see something not
working out the way that it should, it's because there's an asymmetry in this idea of skin in the
game. And I think that this first rule of
probability is just a specific example of how to put more meaning into the probabilities that people state: force them to actually back it up.
So let's give people an example of this. Firstly, I suppose, when he talks about skin in the game, what he's saying is that you need to share the downside risk for your predictions or your actions. So it's not merely a matter of incentives; it's about that symmetry of also owning some of the downside. Would a good example maybe be the financial crisis? How does that fit into this idea?
Well, I assume that the financial crisis would certainly fit into this idea. But I guess an example that I think is maybe even better to talk about is what's happening in real time, which is the replication crisis in
science. And it's a situation where people are not talking about anything along the lines of
skin in the game or what I'm talking about with how to give meaning to probabilities. But I think
it's a situation in which this really could benefit. And so what is the idea? So the idea of sharing the downside.
So I guess before we go into that,
should we talk about, well, how much should I explain about what the replication crisis is? I suppose I should explain.
Yeah, let's go into detail
because I think it's really interesting.
But to give people sort of a sense
of how this connects to their daily lives: I'm sure people are familiar with the sense of overwhelm at the different scientific studies, like you hear on the one hand that coffee is good for you and on the other hand that it's bad for you, and
then you hear about a study saying that two glasses of wine a day is good for you.
And then you get another study saying it's bad for you.
And we see this across so many domains, especially a lot in psychology as well.
Kahneman wrote quite a famous letter back in, was it 2012 or 2013, criticizing the field
of priming, the psychology of priming. But it's, as you've said, now regarded as a quote-unquote crisis. So how did we get to that point, and why is it a crisis?
Yeah. So I guess the easiest way to explain this is to explain it
in a very simple context.
And it's the context that I think most of the problems are coming from, an application of statistics called hypothesis testing. This is not the only way that statistics are used, nor the only way the problem manifests itself,
but it is kind of the lion's share of the issue.
And so basically the intuition behind hypothesis testing is something like this.
Just to put it in kind of layman's terms to start,
which is if I were to toss a coin 10 times right now
and it were to land heads all 10 times,
and then I were to come to you and I would say,
I just tossed this coin 10 times.
It was completely fair, legitimate, and it came up heads 10 times in a row. What, you know, what are you going to think? I guess,
you know, what is your natural reaction? Well, one natural reaction is just to say,
well, you got lucky, you know, it wasn't supposed to happen, you know, whatever.
But another reaction you might have was, you know, there must have been something wrong with the way
that you were running this experiment. Are you sure that you were tossing the coin correctly? Are you sure the coin was fair? Are you sure that whatever conditions you assumed you were running this experiment under were the conditions that it was actually run under? And so that's pretty much how hypothesis testing works. So you would have some kind of scientific hypothesis, which might be something
like you just mentioned it,
that drinking two cups of coffee a day or drinking coffee is good for you, or, you know,
drinking coffee reduces your risk of getting some kind of cancer. Okay. And what you would do is,
I guess you would start with the hypothesis that you want to disprove. So you might say something,
I guess, if you believe that coffee is helpful, is good for you, then you would start with a hypothesis along the lines that
there is no effect, that there is no relationship between drinking coffee and this particular form
of cancer. And so you want to refute that hypothesis. And so in order to do that,
you would collect some data. And so you would run some kind of study on people under certain dosages of coffee, perhaps.
And based on that data, so you make certain measurements about their response to this.
And I guess you could measure their risk factors for a certain type of cancer.
And then you would calculate some statistics and measurements based on that data.
And then you would compute a probability.
So you're assuming a model under this hypothesis.
And so you would compute a probability, which is what's called the p-value. You would calculate the probability of having observed the data, or having observed the statistics that you observed, assuming that your hypothesis was
correct.
So assuming there was no relationship
between coffee and cancer, what is the probability that I would have observed what I ended up
observing? If the probability, if that p-value is small, then you would interpret that as saying
that there's evidence against your hypothesis, right? That would be like tossing the coin 10
heads in a row, right? So if the probability of having observed what you
observed is very small under the assumptions that you're making, then you would interpret that to
mean that maybe the assumptions that I'm making aren't such good assumptions. And so you would
be inclined to reject your hypothesis under those conditions. The point is, I think something is
true. I have a hypothesis. I collected data to test that hypothesis.
If my hypothesis were true, then the data that I observed is very unlikely to have occurred.
Therefore, the logic would be that it seems reasonable to maybe call into question some
of the assumptions that I've been making in my hypothesis. So, that's the basic structure of the argument. Now, what statistical significance is, is it sets a
predetermined cutoff value, a predetermined threshold, which is usually called the alpha
level, which is going to determine whether or not you reject this hypothesis. The usual conventional cutoff value in a lot of fields is 0.05, a 5% probability.
But in a lot of fields like physics, experimental physics, and genetics, these probabilities
are much lower, where a lower probability means a much more stringent threshold to reject
the hypothesis.
So 5% is actually pretty high.
But we'll talk about that in a bit because
there's some discussion about whether that 5% value is actually, you know, is too high and
should be changed as a default cutoff value. But anyway, you set this cutoff ahead of time,
and if your p-value is less than the alpha level, is less than the 0.05, then you would declare your
result statistically significant. And so,
where this idea, this notion of false positive comes in is that, well, what I've done is that
I've calculated a probability, I've assessed that probability as being small, and I've drawn a
conclusion based on that probability being small, but the probability wasn't zero, right? So, even though what I observed was very
unlikely under the assumed hypothesis, it was still possible, right? So, all that means is
that it's possible that I would be rejecting this hypothesis erroneously.
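To pin down the coin-toss intuition with numbers (a minimal sketch of my own, not a method from the episode): under the null hypothesis of a fair coin, the chance of 10 heads in 10 tosses is 0.5 to the power 10, about 1 in 1,000, which is the sense in which such data cast doubt on the assumption.

```python
from math import comb

def p_value_at_least_k_heads(k, n, p_null=0.5):
    """One-sided p-value: probability of observing k or more heads in
    n tosses if the null hypothesis (a fair coin, p = p_null) is true."""
    return sum(comb(n, j) * p_null**j * (1 - p_null)**(n - j)
               for j in range(k, n + 1))

p = p_value_at_least_k_heads(10, 10)
print(p)            # 0.0009765625, i.e. about 1 in 1024
print(p < 0.05)     # True -> 'statistically significant' at the 5% level
```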
That's due to randomness.
Due to randomness, right? And so this is what's called a type one error, or a false positive. And so one way of putting what the replication crisis is: if you think about the scientific literature, everybody's applying some kind of statistical technique or method in a lot of parts of the literature. People use statistics a
lot. And so, they're oftentimes justifying their conclusions based on some statistical method and
some probabilistic assessment. If each one of these has a small probability of being a false
positive, but you have a very large sample of conclusions,
then there's going to be some conclusions in the literature that are false positives.
And so, according to the theory, the theory itself, if you correctly apply the statistical
theory, some percentage of the conclusions are going to be false. The question is, which ones?
Okay. And now the way that you would find
out which ones is to try to replicate the study. So if you make a claim that coffee is
beneficial for health, and you have some way of quantifying that, then what I could do,
or some other scientists could do, is they could take your description of the experiment
and rerun your experiment and try to see if they observe the same thing. If they observe the same
thing, then that would be called a replication. And that would provide further evidence that your
original experiment and your original conclusions were valid. If I fail to replicate, then that
would suggest that you experienced
a false positive, that your original conclusions were due to randomness. And so, what the replication
crisis is, is that while we would expect in the literature there to be some failure to replicate,
the replication crisis is that the proportion of conclusions in certain fields that are replicable is very low.
The replication rates are very low.
A recent study in the psychology literature tried to replicate about 100 very well-known results in psychology, and they got about a 37 percent replication rate, which means that more than half of the conclusions did not replicate.
So a hundred of the most significant studies in psychology?
I don't want to say the most significant, I don't know the field very well, but these were results taken from well-respected journals, and these were results that were cited a number of times.
And does this apply to other fields as well, or is it just psychology?
So psychology, especially social psychology, is often brought up as one of the main culprits of this. Biomedical research also has a very high false positive rate, and that's maybe more concerning, because you would think that these are studies that are having direct effects on medicine. And I guess, just like the coffee study that you mentioned, these are the types of things that are having a hard time replicating. Other fields are better. I think experimental physics is much better; they have more precision and more control over the conditions that they're running things under. But psychology, biomedical research, these things seem to have very low rates of replication.
And I mean, when you have a psychologist of the standing of Kahneman, who's widely regarded as probably the most influential living psychologist, seriously cautioning and chastening his colleagues, saying we need to do something or otherwise the entire field of priming, in this example, has its credibility at stake.
That indicates just how serious this crisis is.
Well, it's definitely serious.
And people are talking about it constantly,
and there's a lot of efforts now.
So I guess I should say,
I just became kind of interested in this relatively recently.
Because it is fundamentally, it's a problem of statistics.
It's a problem of statistics, and I'll explain in a second what actually got me thinking about this more, because I've been aware of it and I've heard about it for a long time. Even when I was in graduate school, between six and ten years ago or so, people were already talking about this then. And so there have been a lot of proposals as to how we can fix it. I mean, it's one thing to say that it's there; it's another thing to fix it. And so there's been movements towards, well, what we need to do is make sure that people report their experimental methods more clearly, that they report their design, that they use these more sophisticated methods. But none of it seems to be working, and in fact I think that there's reason to believe that it's only getting worse.
So they're kind of saying that there's been a mistake in the replication of the experiment, that researchers haven't been following the exact way it was run in the original study, and that's why it fails? Is that sort of how that argument goes?
So, no, that's not what they're saying, but actually that right there is something that, if I were to run an experiment and then you were to try to replicate it and you fail, a common response from me would be exactly that: you didn't do it right. And there are situations in which it seems like the people who ran the original experiment are able to get replications even though nobody else can. And it might be, assuming no kind of foul play, which there's plenty of, that they just have the technique. I mean, a lot of this is physical stuff, right? It's people working with things in a lab and it's very sensitive. And so there might be things that they know, or that they're able to do, that other people aren't. But no, those aren't what people are pointing to. What people are pointing out is simply, I guess, for example: the thing that really got me into this and got me thinking about this was that I mentioned the default significance cutoff is 0.05 in a lot of fields, and there was this recent proposal.
And does that mean there's a five percent chance that you'll get a false positive?
Yeah, so what that significance level means is that that's the type one error rate. The type one error is the probability of a false positive on any given trial, and this cutoff value, this 0.05, is controlling that type one error rate. So at five percent, it's saying that if your hypothesis is true, then you have only a 5% chance of wrongly rejecting it.
So some people said we need to change that number.
Right. So a recent proposal, this is by a list of 72 authors, these are very well-known people
in a wide number of fields, have suggested that we should lower this cutoff. We should lower the type one error probability
by making the default cutoff 0.005. So half of a percent instead of 5%.
Got it.
And this got a lot of attention. And most of the attention I've seen has been negative attention.
Now, why is it getting negative attention? I mean, on the one hand, you think about it and you say, well, if you're telling me that there's too many false positives in the literature, and if this number 0.05 is the false positive probability, then if we lower that probability, that means we're going to get fewer false positives in the literature, so that's going to help with the replication crisis.
And that's, in essence, I mean, that's a caricature of it a bit. They do go into other arguments in favor of this proposal, but that's essentially the kind of argument that they put forward. Well, first of all, there's something called the false positive rate and something called the type one error probability. So I should distinguish between the probability on any given trial of having a false positive, and then the overall rate of false positives in the population of all studies, over all scientific experiments.
So, before we move on, Harry, could you please clarify the difference between
the false positive rate and the significance level?
Yeah, okay. This is important. These are related. The significance level, this 0.05 in a lot of fields, you can think of as controlling the probability of a type one error, of getting a false positive, on any given study, on any given experiment.
The false positive rate is the rate at which these false positives show up in the literature over a large body, large collection of such studies.
Now, why would these two things be different? It has to do with the fact that
the rate at which they show up in the literature depends on what proportion of studies are in the
camp of having a true null hypothesis versus a false null hypothesis to begin with. And so,
what we're finding is that, under certain empirical estimates of what these base rate probabilities are for true and false null hypotheses, at the 5% significance level and at a standard level of what's called statistical power (the technical details aren't something you should worry about, but there are standard assumptions on what those typically are, what we shoot for), the false positive rate in the literature would be something like 36%.
So, you know, something like a third of the findings that are reported to be significant at the 5% level would be false positives.
Okay, got it, got it. Yeah.
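For readers curious where a figure like 36% can come from, here is a back-of-the-envelope sketch; the 10% base rate of true effects and 80% power are illustrative assumptions consistent with the kind of calculation Harry describes, not numbers he states on air:

```python
def literature_false_positive_rate(alpha, power, p_true_effect):
    """Share of 'significant' findings that are false positives, given:
    alpha          - significance level (per-study type I error rate)
    power          - probability of detecting a real effect when there is one
    p_true_effect  - base rate of hypotheses that are genuinely true effects.
    The power and base-rate values below are illustrative assumptions."""
    p_null = 1 - p_true_effect
    false_pos = alpha * p_null            # nulls wrongly declared significant
    true_pos = power * p_true_effect      # real effects correctly detected
    return false_pos / (false_pos + true_pos)

print(literature_false_positive_rate(0.05, 0.8, 0.1))   # 0.36 -> about a third
```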
So one of the terms that you hear a lot when people are talking about this replication crisis is p-hacking.
What's that all about?
Yeah.
So p-hacking is, so we talked about the p-value.
The p-value is this probability that you would have observed your measurements under the
assumption that your initial hypothesis was true, right?
That's your p-value.
The way that you use that p-value is that if the p-value is small,
you would take that as evidence against your hypothesis,
and you would reject the hypothesis if it was lower than the significance level, right?
So now what p-hacking is, is that in a lot of fields,
and this isn't the way that it's supposed to be used, and this isn't the way that it was intended to be used originally when this idea of statistical
significance was first introduced. But the way that this significance level gets used is that
if you have a statistically significant p-value, if you have a p-value of 0.049,
then you're in a much stronger position to get your result published than if you have a
p-value of 0.051, right? So there's this very clear dividing line of what's in and what's out.
Now, just because you have a significant p-value does not mean you get published. Just because you
have an insignificant p-value does not mean you don't. But the chances really are different on either side of the line.
And so what p-hacking is, is a natural reaction to this phenomenon, which is that if I have a p-value of 0.051, then what I'm going to do is I'm not going to report a p-value of 0.051. I'm going to change my data a little bit or change my model or change my method somehow.
And I'm going to keep doing it until I get a p-value that's less than 0.05.
And then that's the p-value that I'm going to report to you.
Because I know that you, being the editor of a journal who decides whether to publish my results or not, I know that that decision depends very heavily
on whether or not my p-value is significant at the 5% level or not, or significant at whatever level
the prevailing significance is for that field. And so p-hacking is just, I mean, the term is
very descriptive in that respect, in that I'm literally, I'm gearing my analysis towards
getting a significant result. I'm not doing the analysis and then seeing whether it's significant
and using that result as a guideline as to whether or not I should conclude one way or the other.
I'm already making the conclusion that I want it to be significant. And then I'm just reverse
engineering the analysis to make it happen.
Wow. So an experimenter could rerun an experiment, or keep running it, in order to get a statistically significant result, and just bury the evidence of the stuff that didn't work?
Yeah. So I should make it clear that, the way that I just described it, it sounds like fraud, right? And it is fraud at the level that I just described it. But there's a lot of subtleties in this, in that a lot of times you can be p-hacking without knowing it. And that gets to the inherent difficulty of running
these statistical analyses or how to interpret the outcomes. So, something that I guess wouldn't necessarily fall under the terminology p-hacking
specifically, but I think that I'm using the term p-hacking kind of more generically to mean kind of
any number of behaviors that goes against standard statistical practice, whether it's intended or
not. Because
really, at the end of the day, whether it's intentional or not, to me, doesn't really affect
the outcome. The outcome is that the results are still unreliable as they're reported.
Of course, you want people to be acting in good faith, but even if they are acting in good faith,
if they're doing a bad job of doing it, then the results aren't going to be very good. So, an example of how,
you know, you could end up doing some kind of p-hacking inadvertently is if we have a 5%
significance level as our standard, then if I run any given study, if there's no effect,
right, then I'm unlikely to reject that hypothesis. But what if I run
20 studies? What if I run 100 studies? Well, eventually I'm going to get lucky and I'm going
to get a significant result, even if none of these hypotheses have any true
kind of effect. And so, what happens is that over the course of, you know, in a scientist's career,
scientists run experiments all the time, right? And so, what do you do when you run an experiment and it's not significant?
Well, you might just throw it away. You ignore it. This is something that's called the file drawer
effect, I guess, as people have started to call it. Because what you do is I run an experiment,
I test a hypothesis, the hypothesis turns out not to be significant. And so, I don't
report that. It goes into my file drawer, right? Over time, this file drawer is getting very large, right? A lot of things in there.
They've never been reported. What gets reported is the one out of every so many studies that I
run that actually turns out to be significant. I report that.
And, I mean, in fairness, that's the stuff that's going to get published.
That's the stuff that's going to get published, right. So when you now go to consider my result, you would like to know how hard I had to work to get it, even though there are all these other studies which might have nothing to do, in principle, with the conclusion I'm drawing. But the effectiveness of the statistical method that's being applied to them does depend on how many times I've used it. It's kind of like tossing a coin: if you toss a coin long enough, you're eventually going to see a
sequence of 10 heads in a row,
even though any given sequence is very unlikely to turn up 10 heads in a row.
And so the same thing's happening here. If I run enough experiments, I'm eventually going to find something significant, even if there's no effect whatsoever. And so the people who are evaluating
these methods, they can look at the methods and they can say, the methods are completely sound for this particular study. What they don't know is
that you may have, this might, this is likely to be one of very many studies that you have run,
and those other studies didn't turn out to be significant at all. And by the way, I mean,
the question is, is this fraud or is this just kind of, you know, I don't think people really know exactly how to handle this at the moment.
Because if you're going to use statistics to evaluate the conclusions of your experiment, you can't, you know, how are you going to account for the thousands, you know, hundreds or thousands of studies you've run over the past 5, 10, 20 years, if you have a long career, right?
How is that all supposed to factor into the evaluation of one single outcome of one single experiment?
It's not clear at all.
So, while it's clear what the effect is and why this is bad, it's not clear how to remedy the situation.
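To make the multiple-testing point concrete, here is a minimal simulation sketch, assuming the 5% threshold from the conversation and treating every tested hypothesis as a true null; the study counts are made up for illustration.

```python
import random

ALPHA = 0.05  # the 5% significance threshold discussed above

def any_false_positive(n_studies: int, n_trials: int = 20_000) -> float:
    """Estimate the chance of at least one p-value below ALPHA when every
    one of n_studies hypotheses is actually null."""
    hits = 0
    for _ in range(n_trials):
        # Under a true null, a p-value is uniformly distributed on [0, 1].
        if any(random.random() < ALPHA for _ in range(n_studies)):
            hits += 1
    return hits / n_trials

for n in (1, 20, 100):
    print(f"{n:3d} null studies: simulated ~{any_false_positive(n):.2f}, "
          f"theory {1 - (1 - ALPHA) ** n:.2f}")
# Roughly 0.05, 0.64, and 0.99: run enough studies and "significance" is guaranteed.
```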
So, this p-hacking is what's driving the replication crisis?
Well, the replication crisis, what's driving it is things like p-hacking, things like what I just described, which is effectively what's called multiple testing: if I run enough tests, some of them are going to turn out to be significant. Another thing is what's called publication bias, which is what you just mentioned: there's a bias towards publishing things that are significant, so the only things that get reported are significant. That's what encourages p-hacking, but it also biases how the probabilities that appear in the published literature should be interpreted. And so, all of these things play into the replication crisis.
Now, is this undermining the reputation of the field of psychology?
Well, you know, to some people,
psychology doesn't have much of a reputation to begin with.
But I definitely do think that among a lot of statisticians, psychology doesn't have a very good reputation to begin with, in terms of how they run their experiments, how heavily they rely on these statistical measures, and how they interpret them.
Yeah. Do you have any proposals for how we could get out of the mess?
Well, I guess I have a somewhat radical proposal, which I think is necessary. I think in order to actually fix things, you have to have something a bit radical. I don't think anybody will actually take what I'm saying and implement it, but I think it is food for thought. Before I get to that, there was this proposal I think I
mentioned a little earlier about 0.005, right? So, the proposal there is trying to say that,
well, if we lower the type 1 error probability, then there's going
to be fewer false positives. And if you do a theoretical calculation under the circumstances I mentioned earlier, the theoretical false positive rate is about 36%, roughly a third. If you redo the calculation with the same numbers under the 0.005 proposal, the theoretical false positive rate comes out to around 6%, something under 10%. So the argument that these
people are making is that just by lowering the significance level, we can improve the false positive rate and therefore improve the replication crisis.
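As a rough sketch of where figures like these come from: the false positive rate among "significant" results depends on the significance threshold, the statistical power, and the fraction of tested hypotheses that are actually true. The 1-in-11 prior and 80% power below are illustrative assumptions on my part that happen to land near the quoted numbers, not values taken from the proposal itself.

```python
def false_positive_rate(alpha: float, power: float, prior_true: float) -> float:
    """Among results that cross the significance threshold, the fraction
    that come from true-null hypotheses (i.e. false positives)."""
    false_pos = alpha * (1 - prior_true)   # nulls that happen to cross the threshold
    true_pos = power * prior_true          # real effects correctly detected
    return false_pos / (false_pos + true_pos)

# Illustrative assumptions: 1 in 11 tested hypotheses is true, 80% power.
for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: false positive rate ~"
          f"{false_positive_rate(alpha, power=0.8, prior_true=1/11):.0%}")
# alpha = 0.05  -> ~38%
# alpha = 0.005 -> ~6%
```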
And so part of my retort to this, part of my argument against this proposal, isn't that the proposal itself is bad. In a vacuum, 0.005 might be better than 0.05, I don't know. My problem is more with the argument that they gave in support of their proposal, which I think is very misleading and actually ignores a major part of the problem, which is this idea of p-hacking. When you lower the cutoff value from 0.05 to 0.005, all you've done is change the target.
If you haven't done anything else to change people's behavior, then what they're going to do now is if they were p-hacking before, they're going to still be p-hacking.
They're just going to be p-hacking even harder.
Right.
Yeah. And there are people arguing that a lot of the people who aren't p-hacking right now might start to do it, because it's going to be harder for them to get results published under this more stringent cutoff value.
So there's actually arguments saying that this could make things worse.
And I think that what would happen is, you know, if there
is a benefit to this new proposal, it certainly isn't going to be as large as the authors claim in this paper.
And so my main gripe with the proposal wasn't with what the proposal is.
I think a lot of people don't, you know, a lot of people, including myself, don't think
that it'll have much of an effect, but that's kind of more of an opinion than anything else.
But my issue with it was that here you have 72 very well-known people, very well-respected
people.
Some of them have been involved, in some capacity or another, as leaders in addressing this replication crisis, trying to fix things for over a decade. And the way that they present this: first of all, after a decade, the solution they come up with is essentially to add another zero, which I think is not the radical change that's needed. But even more importantly than that, the argument they give in favor of it is a pretty weak statistical argument, because it ignores a key part of the problem, which is this idea of p-hacking.
And so I wrote an article on this trying to explain that if you want your theoretical argument to apply to reality, you have to take reality into account, and p-hacking is a part of reality. If you were to account for the fact that p-hacking exists, then the effect they're projecting, which is that replication rates would essentially double under their new proposal, is just not going to happen. So that was my argument against
that specific proposal. Now, how do you possibly correct the situation? Well, in order to get rid of the replication crisis, you have to somehow address this issue of p-hacking. And there are proposals on the table to try to do that; there's a movement towards what's called pre-registration, which is that before I run my study, I report it to the journal first and have them vet the methods that I say I'm going to use.
They accept it.
And then they say, we will publish your work regardless of whether or not you get a significant p-value or not, as long as you apply these methods.
I go, I apply the method.
And so I don't really have an incentive to p-hack now because I've been told that I'm
going to get published no matter what.
That's one proposal on the table.
There is a lot of support for that. I guess for me, I just don't think the answer is adding more bureaucratic oversight to this whole problem. Arguably, the problem is in place because of a bureaucratic measure in the first place, which is this 0.05 publication threshold. If you just change the criterion, away from 0.05 to making sure that somebody vets and approves your methods before you run the experiment, then am I now going to be hacking my methods instead? The only reason I'm going to use these methods isn't because I think they're the best; it's because I know that you are going to approve of them ahead of time. It's still not what we want, right?
So, what I've been thinking about is, well, why not tie this idea of replication to this first rule of probability: when you publish a conclusion, when you make a scientific claim, in order for it to be scientific, it needs to be falsifiable.
State the conditions under which it would be falsified.
Lay them out in detail.
Lay out exactly the experimental protocol, how I would go about falsifying your results if I was so inclined.
And tell me the probability that the result would be falsified.
Well, I want to phrase it in terms of replication. So if something isn't falsified, then let's say
it's replicated, right? Tell me all the circumstances under which it would be considered
to be replicated, and tell me what the probability of replication is. So let's say I do all that. I
say, this is my conclusion. Here is how you would go about
replicating it. And the probability that this replicates is 80%. And what that means is,
well, what we said earlier is that now somebody else can come along and say, well, I think that you're being a little overly ambitious with this 80% figure. So you're
offering me four to one odds to
try to replicate your experiment. I'm going to try to do it, or I'm going to have some third
party do it, ideally. And if it replicates, fine, you win. I'm going to put my money up. I'm going
to put grant money up. I'm going to put something up to kind of confirm that I actually don't
believe your probability. You're going to have to put up
something to support your claim that this is going to be replicated. And we run the replication. And
whatever happens, if it replicates, you win. If it doesn't, I win. At the very least, what does
that do? Well, that suggests that you do not have the incentive to p-hack anymore. Because if you do p-hack, that means you're going
to be offering me better odds than I should be getting to try to replicate your study.
Which means that, eventually, all that's going to do is make you go broke, right?
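A rough sketch of that "go broke" logic, with hypothetical stakes: a claimed 80% replication probability corresponds to the four-to-one odds mentioned above, and if the true replication rate of a p-hacked result is lower, the author loses on average every time a challenger takes the bet.

```python
def author_expected_gain(claimed_p: float, true_p: float,
                         challenger_stake: float = 1.0) -> float:
    """Author's expected profit per challenge when offering the odds implied
    by claimed_p. At a claimed replication probability p, the author stakes
    p / (1 - p) units against the challenger's 1 unit (4-to-1 at p = 0.8)."""
    author_stake = challenger_stake * claimed_p / (1 - claimed_p)
    win = true_p * challenger_stake      # study replicates: author collects
    lose = (1 - true_p) * author_stake   # study fails to replicate: author pays out
    return win - lose

# Honest 80% claim: the bet is fair on average.
print(round(author_expected_gain(claimed_p=0.8, true_p=0.8), 6))   # ~0
# P-hacked claim: stated 80%, but only 50% of such results actually replicate.
print(round(author_expected_gain(claimed_p=0.8, true_p=0.5), 6))   # -1.5 per challenge
```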
And so here there's a real kind of shift. You talked about skin in the game and how it's not just about incentives, but about sharing the downside. Well, if I'm a scientist and I'm making a claim that something is true, I should have to share the downside of that claim being wrong. And this is maybe the most direct way I can think of to actually do that.
It's quite radical.
Yeah. I guess I don't think anyone's actually going to do it.
I don't think that it's in kind of the DNA of scientists.
They're going to think of it in terms of gambling,
which I think is the wrong way to think about it.
I think you should think about it in terms of investing.
That we're investing in good scientists
because the good scientists are going to be the ones
who win these bets.
And when you win the bets, it's not like you go run away and take the money,
put it in your pocket. You know, I'm thinking about this as kind of funds for research. So,
if I put my funds up against your funds and you win, that just means that I have less money to work with. You have more money to work with, but you were the better scientist, so you should have
more money to work with.
Yeah. Wow, I like it. I like it.
Yeah, well, we'll see what happens.
Watch this space.
Let's talk about Kahneman and Tversky. So these names might be familiar to listeners: two very famous Israeli psychologists who pioneered the field of cognitive biases and heuristics and, in combination with Richard Thaler, founded the field of behavioral economics. Of course, sadly, Amos passed away in the nineties, I believe. But for the work they did together, Danny won the Nobel Prize in economics and then published a very famous book called Thinking, Fast and Slow, which collected a lot of the work they did together and that he did on his own. But one of the really famous experiments that he did was called the Linda problem. Do you want to just give us an outline of what that experiment was?
Sure, yeah. The Linda problem, so this is a very interesting experiment.
I'll actually read it directly from Kahneman's, I guess it's his most recent book, right? Thinking, Fast and Slow.
So here's the experiment.
So this description was read to subjects of a psychology experiment. They were usually or oftentimes, I think, undergraduates at whatever schools Kahneman and Tversky were working at at the time.
It's a description of a person called Linda.
And it goes like this.
So, Linda is 31 years old, single, outspoken, and very bright.
She majored in philosophy.
As a student, she was deeply concerned outspoken, and very bright. She majored in philosophy. As a student,
she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. And so then, based on that description, the subjects
are asked to answer this question, which of the following alternatives is more probable?
Linda is a bank teller, or Linda is a bank teller and is active in the feminist
movement. Okay. And so, what is famous about this result is that
between 85 and 90% of the subjects in this study said that the more probable outcome was the second
of the two, that Linda is more likely to be a bank teller and active in the feminist movement than she is to be
just a bank teller. And so this is what's called the Linda problem or the conjunction fallacy
by Kahneman and Tversky. And so why is this a fallacy? Well, according to the rules of probability,
if you think about what I've just described about
Linda, which is that there are two options. She's either a bank teller or she's a feminist bank
teller. And the second is a more specific version of the first. And so, it's a narrower description.
And so, necessarily, no matter what the probability is that she's a bank teller, the probability that she's a bank teller and a feminist can be no larger, just by the rules of probability.
Feminist bank tellers being a subset of bank tellers.
That's right, yeah. And so the fact that an overwhelming percentage of people in this study said the second was more likely is really kind of a gut-wrenching observation. But this is what's called the conjunction fallacy.
Us included, I guess.
Yeah. So I guess what I find interesting about this is that the first time I heard it, I immediately jumped to the second one: feminist bank teller is certainly a better description of Linda. And then I, of course, understand the rules of probability;
at least that's my job. And then it was explained to me that, well, that's a fallacy,
at least that's wrong because of this. But I still, something in my gut suggests to me that
this is still the right answer. And I guess that's what's
interesting about this is that Kahneman even brings this up.
He says that, remarkably, the sinners seem to have no shame.
When I asked my large undergraduate class, do you realize that you violated an elementary
logical rule, someone in the back row shouted, so what?
And a graduate student who made the same error explained herself by saying, I thought you
just asked for my opinion.
And I guess, you know, so, yeah, no, I think it's a really interesting observation.
I don't know what more to say than that.
I'm not so sure.
I don't even know how to kind of react to it.
Well, so he also describes it as the representativeness heuristic.
Right.
So a feminist bank teller seems more plausible, right?
Yeah. So I think a good explanation of what's more likely happening is that when people hear which of these is more probable, the way they're interpreting it is which of these is a more representative situation, which of these is a more representative description of this person named Linda. And the idea is that the description of a bank teller is not at all descriptive of the person you just described, but the idea of being a feminist is. And so the feminist bank teller fits better with the picture you have in your head of who this person might be. And so, this is one of a number of psychological or cognitive fallacies that they highlighted in their research.
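For reference, the conjunction rule itself can be checked with any made-up head count; the numbers below are invented purely for illustration.

```python
# Invented counts purely for illustration: in a hypothetical population of
# 1,000 people matching Linda's description, suppose 20 are bank tellers,
# and 15 of those 20 are also active feminists.
population = 1_000
bank_tellers = 20
feminist_bank_tellers = 15   # necessarily a subset of the bank tellers

p_teller = bank_tellers / population                        # 0.020
p_teller_and_feminist = feminist_bank_tellers / population  # 0.015

# However the counts are chosen, the subset can never outnumber the set,
# so the conjunction can never be the more probable option.
assert p_teller_and_feminist <= p_teller
print(p_teller, p_teller_and_feminist)
```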
And ultimately, this got a lot of attention. But what this observation got me thinking about, along with some of the things I was observing with the election probabilities and how people interpret those, is this: the fact that people are violating the rules of probability is one thing, but then when you point it out to them, they're indignant that they didn't do anything wrong, and I include myself in that category. To me, that suggests, well, the way that Kahneman and Tversky seem to be interpreting this is that there's a real kind of cognitive defect in people, that they're incapable of understanding or thinking along the lines of probability.
They're not living up to the paradigm of reason.
Yeah. But I guess the way that I interpreted this was: well, if people are violating this rule in such large numbers, then obviously this model of probability isn't a good model for how people think, because they're quite clearly violating that rule.
I still can't think of a model that explains this or a better way of explaining this, but
I do think that it kind of points to a question of which
of the two things should take precedence? Should the theory that you're assuming, should the model
take precedence over the actual people who you're using to test the theory? Or should,
if you're trying to model people's cognitive capabilities, should you use the way that
people actually think to kind of
model your theory after that? And I guess I would be more in the second camp in the sense that,
you know, I guess what we were just talking about with hypothesis testing and p-values and all that,
which was if you observe data and it seems to be in disagreement with your hypothesis or disagreement with the
model that you assumed, the tendency is to reject the model, reject the hypothesis. In this case,
it seems like the conclusion is quite the opposite, which is that you have a model,
you observe data on these people based on this Linda problem. The data is not in agreement with
your model. And so you're taking that to mean that people have this cognitive defect, instead of taking that to be evidence that you need to find a better theory.
So, that's just kind of an initial reaction to how this was originally presented by Kahneman, and I think still how he thinks about it.
Yeah. So I don't quite understand your criticism of it yet, because there's a distinction between descriptive and normative models. Under a normative model you'd be saying what people ought to do if they were rational, and under a descriptive model you'd just be pointing out what they actually do in reality, which is what they demonstrated with the Linda problem: that people don't make the rationally correct decision. But isn't that all they're doing there?
Yeah. So, from what I understand of Kahneman and Tversky, what motivated them initially to study these types of things was the fact that a lot of economic theory is based on this assumption of rationality. And they, as psychologists, thought that this assumption was more or less bogus, because they know that real people are not rational, I guess. And so they ran these experiments initially to test this out, at least in part. This was also part of their other experiments and what led to the development of prospect theory, which is eventually what Kahneman received the Nobel Prize for, mostly, I think.
But here they are running experiments which are showing that people are not behaving according to the rules of rationality, or at least according to the rules of probability. And so, that would seem to
be a refutation of this assumption that people are behaving this way. And sure, that's fine,
but I still think that it seems to me, based on reading the interpretation, at least, and hearing
Kahneman talk about it, my understanding is that he still seems to see rationality as an end goal in the
sense that he talks about people as being bad intuitive statisticians, in other words saying that, you know...
There's a value judgment in that.
Yeah, yeah. So there's something about how people are bad intuitive statisticians and so we need to kind of train them to behave correctly, right? And so, I think that there is a sense in which
he's saying that, yes, as a normative theory, this Bayesian model of rationality is not such a good one, perhaps.
But maybe we should still try to make it so that people behave closer to the way
that the theory prescribes, because otherwise you fall into all of these kind of, you end up
behaving in ways that could, I guess, cost you money in the long run or things like that,
according to certain economic theories. But I guess there was something that I found interesting, which was a
quote by Gigerenzer, or Gigerenzer and some co-authors. I don't know if you've covered this before, or whether you will in future episodes. So Gerd Gigerenzer was a psychologist
who was pretty much the arch nemesis of Kahneman-Tversky as they were running these experiments.
And he has some very strong criticisms of their work and of their conclusions, and I do think that it's good to get the other side of things, for sure. But here I'm just going to quote out of a book called The Empire of Chance, which is by Gigerenzer and some of his co-authors, and which discusses the 20th-century psychologists, among them Kahneman and Tversky; this is from a paragraph that's talking specifically about that work. The 20th-century psychologists had come so to revere the mathematical theory of probability and statistics that they instead insisted that common reason be reformed to fit the mathematics. So, instead of the other way around, I guess, being the other side of the coin. So, I mean, just to point out that there are people on the other side of this debate. I'm not a psychologist, and I certainly don't know the whole 30-year history that's happened between the original experiments and now, but my understanding in talking to psychologists is that there's been quite a bit of study of this topic, and there are opinions on all ends of the spectrum. So I don't think that Kahneman and Tversky's work on this, or the conclusions they drew from it, is necessarily the paradigmatic view anymore.
Are there any other of the famous conclusions that you don't necessarily agree with?
Well, I think that another example is the base rate fallacy; that's the one that comes to mind.
What was that again?
So the base rate fallacy can be described with a problem called the blue cab problem, and the description goes something like this. A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city, and you're given the following data: 85% of the cabs in the city are green and 15% are blue. A witness identified the cab as being blue
and the court tested the reliability of this witness under the circumstances that existed
on the night of the accident and concluded that the witness correctly identified each one of the
two colors 80% of the time and failed 20% of the time. What is the probability that the cab involved in the accident was blue rather than green?
Okay, so what is the probability?
So a lot of people responded to that saying
the probability is 80%
because that's what the last thing said,
which is that we tested the reliability.
The witness said that it was blue
and the court tested the reliability of the witness
and showed that the witness is correct
80% of the time. What Kahneman and Tversky argue, and this is what they're calling the base rate
fallacy, is that this 80% figure is ignoring the base rate, which is this percentage of cabs that
are green and blue in the city, that 85% of the cabs in the city are green and 15% are blue. And if you take that
into account and you apply Bayes' rule, which is the way that you would update these probabilities
in the Bayesian theory of probability, according to the Bayesian paradigm, the correct answer would
be 41%. And so they concluded that people are not taking account of base rates, which is true, they're not, or at least not in this particular problem, and not in the way that Bayes' rule would tell us to. The question, of course, is whether or not they're wrong not to do that.
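A minimal sketch of that Bayes' rule calculation, under the textbook reading of the problem in which every cab is assumed equally likely to have been involved:

```python
def p_blue_given_says_blue(prior_blue: float, witness_accuracy: float) -> float:
    """P(cab was blue | witness says blue) via Bayes' rule, assuming every
    cab in the city was equally likely to be involved in the accident."""
    says_blue_and_is_blue = witness_accuracy * prior_blue               # 0.80 * 0.15
    says_blue_and_is_green = (1 - witness_accuracy) * (1 - prior_blue)  # 0.20 * 0.85
    return says_blue_and_is_blue / (says_blue_and_is_blue + says_blue_and_is_green)

print(p_blue_given_says_blue(prior_blue=0.15, witness_accuracy=0.80))
# ~0.41 -- the "correct" answer under the Bayesian reading of the problem
```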
And this is a case where I think the criticism is much clearer. The Linda problem isn't something that I have a very good response to; I still think that's a really interesting experimental outcome that kind of makes your head explode in some ways when you think about it. But this one, I think,
is much less compelling just because the only reason you would say that 41% is the correct answer is if you buy into the Bayesian theory of probability. If I don't buy into it, then I don't
necessarily think one way or the other about that calculation. And so to suggest that it's a fallacy
not to come up with that number, that I'm wrong if I say 80% instead of 41%, is suggesting that I should be Bayesian.
And so I just don't think the conclusion really works there. I think this is maybe a clearer situation
in which I would say that, sure, 41% could be the answer if you adopt this specific
way of interpreting the question. But even in interpreting the question that way, there's a
lot of hidden assumptions that you're making in doing those calculations. One of which is,
so you're telling me that 85% of the cabs in the city are green and 15% are blue. If I'm going to use those as probabilities in this calculation,
then I'm implicitly making the assumption that every cab in the city is equally likely to have been involved in this particular incident as every other cab. And when I make that assumption, that means that the probability of the cab being green was 85%.
But that's an assumption.
Got it.
So maybe like green paint makes cabs slower.
Or it could just be that the people who drive for this company, this blue cab company, are bad drivers.
There's a lot of reasons for it that could be there that we don't know about.
But surely the way they constructed the experiment was to assume all things being equal, etc.
I think that, yeah, that's how they would have liked to present it.
But the question is, though, if I'm a subject in this experiment, if I haven't been exposed to this concept of all things being equal, even if I've never taken a statistics course, and even if I have, I guess I'm not inclined to necessarily make those assumptions on an intuitive basis. Of course, that could be one of the conclusions they're drawing, which is that people don't intuitively think in line with the statistical theory, but I still don't think that's an argument that they should in this particular case.
So in the book, Kahneman repeatedly says that people are bad intuitive statisticians. But what you're saying is that he's actually holding people to a very particular standard of statistics that might not necessarily be the correct standard to apply.
So I guess I would put it this way. On one way of interpreting it, I think Kahneman is right to say that people are bad intuitive
statisticians. But I think that there's still a question of whether or not that's a bad thing or
not. So I guess if you were to ask, are people bad at statistics? I think there's kind of a consensus that people are intuitively bad at statistics. A lot of people disliked statistics when they took it in college,
myself included. But, you know, so yeah, humans are in some way bad at
thinking intuitively about statistics, but there are a lot of things that humans are good at that statistics is really bad at. And so I don't think it's necessarily a good thing to try to engineer people to think more statistically; I think there are benefits to both. What I mean by that is: what is statistics good at? Statistics is really good at working out the average case. What statistics does is take a sample of data and filter out the noise, so it doesn't see the fine-grained details, but it gives you the average case.
And so as long as you're in the average case, you want to be working with statistics, right?
Human beings are not good statisticians, I think, and, this is just an opinion, I haven't tested it, but in part it's because human beings see details at a level that the statistical methods can't, and oftentimes get distracted by those details. But sometimes those details are actually relevant, right? I mean, how many times do you come across a situation that you've been in
hundreds, you know, hundreds of times before, but something just doesn't feel right. You know,
you feel like there's just something off. You can't say what it is. And then it turns out that,
you know, something actually happens that was kind of crazy, that you weren't expecting to happen. And that was just because you sensed something, something you were certainly picking up on, but that wouldn't necessarily show up in any kind of physical measurement of the situation. And so, in terms of trying to train people to think more statistically, I would say that it is important to be trained to; it is something you kind of need to force yourself to do, at least in my case.
I mean, I'm inclined to follow my own gut instincts over statistics.
But a lot of times it is better to go with the statistics. You have to really train yourself to trust the statistics, but not too much, right? You definitely don't want to be overly trusting
of the statistics because I think that's what leads us into situations like the replication
crisis or like these election predictions, where we're putting too much into these numbers
without thinking about what their limitations are and without kind of realizing that they are
limited in what they can tell us. And I think the only way to really be able to tell the difference is to stay human, right?
I mean, as humans, we're able to kind of determine when there's something that's maybe slightly off about a given situation and what isn't.
And so we have to know when to apply the theory and when not to apply the theory.
So there was a quote, something somebody said to me once, that I think is relevant here: if the only way you know how to handle uncertainty, the only way you know how to operate in an uncertain situation, is to run a statistical analysis and do whatever the statistics tell you to do, then you're in deep trouble, because you're more or less a robot at that point, right? And I think that you've probably, or if you haven't, you probably should, talk to somebody who is interested in this,
who is studying AI and how that affects our lives in kind of all different ways, right? But one of the scary parts about this is that we're becoming too dependent on these machines or too
dependent on these statistics. And if we do that, then we're very vulnerable
to kind of situations that the actual machinery
can't anticipate.
The black swans.
Exactly.
Yeah.
Yeah.
This has been so interesting.
Do you want to, I'm conscious of the time before we wrap,
do you want to talk about some of the things
you're working on at the moment?
So, ideas you're working on around thinking of probability as a shape, and a couple of the other projects, which we spoke
about before we started recording.
Yeah, sure. I mean, if we have a few minutes, I can just tell you about it. I won't go into too much detail about the idea of probabilities being shapes.
So this is something that I've been kind of very interested in over the past year or so,
which is trying to think about, well, if there are these situations that the traditional probability theory doesn't apply to, or doesn't explain very well, then are there other ways of possibly explaining them? Are there better theories? And so, what got me thinking about this was the election predictions and things like that. So much is riding on this number, right? This 75% probability, which gives this illusion of precision that's just really not
there. And most of the time when I make assessments of probability, I'm not thinking in such precise
terms. I'm making a rather general, ambiguous claim that something is probably true or whatever. And so, I've been
thinking recently about how to kind of formalize the idea of probability as something other than
a number, as something other than a numerical measure. And so, this has gotten me into an
interesting area of math called homotopy type theory. And without going into any details on that, essentially what homotopy types are, are abstract shapes in some sense, or structures, if you want to think of them that way.
And so what I want to think about is probabilities, not as being numbers, but as being structures
in some way of thinking about it.
And why do I want to think about probabilities as structures? Because when I make decisions, make probabilistic-type judgments, or do plausible reasoning instinctively, I'm not doing it by
calculating a number so much as kind of sizing up the situation that I'm in, taking into account the evidence or whatever it
is that I observed that I think is relevant, and seeing how that evidence fits into some
mental structure, mental picture that I have of kind of how everything fits together.
And so, I very much see probabilistic judgments as judgments about structure, about trying to fit things together in a way that makes the most sense. And if we think about probabilities in that way, then a probabilistic judgment is just one in which, instead of calculating numbers, we're working out the way these different shapes fit together, and however they fit together best is the probabilistic judgment that we make.
So that's a bit of an abstract description,
but that is kind of the way that I think about it
or the way that I'm trying to think about it.
And does that have the potential to resolve
some of these issues that we've been talking about?
How does it connect to some problems?
Yeah, so that's something I've thought about, and I'm not exactly sure. I mean, I think that
a lot of the situations that I can imagine this being a more relevant kind of model for
probability are in situations where we don't typically think of assigning numerical values
to the probability. So, I think I mentioned the example earlier where
a juror who makes an assessment in a courtroom, you know, what a juror is doing is they're taking
all of the evidence into account and they're assessing it on a holistic level and they're
kind of making a judgment, right? Or a mathematician, when a mathematician makes a
conjecture that a certain statement is true, they're basing that on their own intuition of how mathematical objects relate to one another and why, based on the evidence they have, this new theorem should be true, even though they haven't proven it yet. As far as whether it relates to some of these Kahneman-Tversky things, I guess I shouldn't speculate yet. I think it's possible, but I don't have the answer to that just yet.
Yeah. So you've got a new book that's just come out?
Yeah, it just came out about a week or so ago. And so this ties in, in some way, with all the other things that we talked about. This was something that I've been working on
for the past couple of years. So the name of the book is The Probabilistic Foundations of
Statistical Network Analysis. So it's a bit of a technical, it's a bit on the more technical side,
although I've tried to mitigate the technical part as much as possible, because there's a lot
of people these days in a lot of different fields working on network analysis: there are the social networks people, and there are people in graph theory and math, so it goes all across the spectrum. But the main idea behind the book is this. If you think about classical statistics, the theory of statistics that's been developed over the past 50 to 100 years has been mostly geared towards understanding sets of measurements that are basically unstructured.
I guess I might call these more or less unstructured data.
Even though there is structure in the data, it's very minimal.
And so we're basically treating these measurements as individual numerical values or collections of measurements taken for individual entities.
And then we fit a model to that data.
And so we might impose some structure on that data by putting some kind of dependence into the model and all this.
But for the most part, the structure itself is not built into the data.
The data is just data points, data sets.
But in a lot of more modern data sets and in these network data sets, we have much more complex data where the structure is actually built into the data.
So when you have a social network, if I'm analyzing a social network there, I'm actually analyzing the structure.
If I were to analyze Twitter, I actually have to take into account the structure and all of the interdependencies and all of the interactions that people have, right? And so the way that this book is structured
is that I want to talk more about data of this form, data that comes from complex systems, or even data that is itself a complex system, for which classical statistics is not suited. And so I talk a bit about the limitations of the current approaches to
network analysis. And I try to suggest a path forward: in order to improve upon the current limitations, we need, I think, to build up a new theory, or a separate theory, of statistics for these complex entities.
For me, over the next 20 to 30 years, I see this as the future of data science.
In order to analyze data, it is going to be necessary to be able to handle structure in a
kind of systematic and sophisticated way. And so I see it as a necessity several years down the line.
Awesome. We're going to link to a few of these different papers and your book as well in the show notes on our website. But is there another project that you wanted to give a mention?
Yeah, one quick thing I'll mention is something that's kind of a non-scientific, or meta-scientific, thing that I've gotten involved in, which is a non-profit initiative with a friend of mine, Ryan Martin, who's a statistician at North Carolina State. It's called researchers.one, and you can go there if you want more information.
Right now we have some information up, and there's going to be a lot more that's coming out in the coming weeks and months.
But essentially what we've been working on behind the scenes, and what I'm hoping will become operational very soon, is an alternative platform to
academic publishing and scholarly publishing, which is geared towards putting a lot of the
control back into the hands of the authors, of the people who are actually the stakeholders in
this game, instead of putting the control into the hands of editors and anonymous referees and other types of bureaucrats who, in a lot of ways, are responsible for things like this
replication crisis. So I guess this is kind of a reaction. This is a response to
a lot of conversations that I see people having on Twitter and people who come up to me
at conferences constantly complaining about, you know about how we need to improve the way that articles get peer reviewed and improve the publication process.
This is something people have been talking about for a very long time: doing it in a way that emphasizes the quality of the work instead of falling prey to all the politics and the
bureaucracy and the corruption that's actually happening in these journals. And so even with
all of that complaining, there hasn't really been anything that's happened; no concrete alternative to the current model is really out there. So this is our effort to try to provide such an alternative. And I would just say, if anyone is interested, go to the website.
So it's researchers.one, it's dot O-N-E.
And anyone interested can also contact us
for more information.
Absolutely.
Well, I think we were speaking before we started
that we wanted to set ourselves the challenge
of making statistics and probability
into a very interesting conversation.
And I think we've definitely achieved that. This has been so enjoyable.
Yeah, so enjoyable.
I guess one of the main messages, I suppose, for the audience is just to bring a healthy skepticism to statistics.
Yeah, I think that's it. I think that's a good message for anything: take nothing for granted and question everything, I guess.
And statistics is certainly something that deserves to be questioned.
Yeah.
Harry Crane.
Thank you very much.
Thanks a lot.
There you go.
Harry Crane,
ladies and gentlemen,
did you enjoy that?
I enjoyed that.
I enjoyed that a lot.
I am a bit of a nerd,
but I think Harry did an excellent job.
He's definitely not your average academic. He is a gentleman and a scholar. And for everything
relating to the projects of his that we discussed at the end just there, you can head to our website,
which is www.thejollyswagman.com
and we'll have everything up on the episode page.
So thanks again.
If you enjoyed, then please tune in again next week
where I speak with Stanford physicist Leonard Susskind.
Until then, ciao.