The Knowledge Project with Shane Parrish - #6 Philip Tetlock: How to See the Future
Episode Date: December 8, 2015. In this episode of the Knowledge Project, I chat with professor and New York Times best-selling author Philip Tetlock about how we can get better at the art and science of predicting the future.
Transcript
Welcome to the Knowledge Project.
I'm your host Shane Parrish, editor and chief curator of the Farnam Street blog,
a website with over 70,000 readers dedicated to mastering the best of what other people have already figured out.
The Knowledge Project allows me to interview amazing people from around the world to deconstruct why they're good at what they do.
It's more conversation than prescription.
On this episode, I'm happy to have Philip Tetlock, professor at the University of Pennsylvania.
He's the co-leader of the Good Judgment Project, which is a multi-year forecasting study.
And he's also the author of the recently released Superforecasting: The Art and Science of Prediction.
How we can get better at prediction is the subject of this interview.
We're going to dive into what makes some people better and what we can learn to improve our ability to guess the future.
I hope you enjoy the conversation as much as I did.
I want to talk about your new book, Superforecasting: The Art and Science of Prediction, that you wrote with Dan Gardner, who, like me, I think is still based in Ottawa.
In the book, you say that we're all forecasters. Can you elaborate on that a little?
Well, it's hard to make any decision in life, whether it's a consumer decision about whether to buy
a car or a house, or whether to marry a particular potential spouse, or a candidate
to vote for in an election, it's very hard to make any decision without forming at least
implicit expectations about what the consequences of that decision will be.
So whenever you're making a decision, there are implied probabilities built into that.
So the question becomes, are you better off with implicit probabilities that you don't
recognize as probabilities, or explicit ones? And I think one of the major takeaways from the
forecasting tournaments we've been running is that when people make explicit judgments and
they're fully self-conscious about what they're doing, they can learn to do it better. And you're
talking about the Good Judgment Project? Can you maybe introduce us to that a little? Sure. Well,
the Good Judgment Project is a research program that my wife, Barbara Mellers, and I started several years
ago. It was supported by a research and development branch of the US intelligence
community known as IARPA, the Intelligence Advanced Research Projects Activity, which models itself
after DARPA in the Defense Department. And their mandate is to support research that has
the potential to revolutionize intelligence analysis. So working from that mandate, they decided
in 2010 to support a series of forecasting tournaments in which major universities would
compete, researchers
at major universities would compete
to generate accurate
probability estimates of possible
futures of national security
relevance. And
we were one of the five teams
selected for the competition in
2010. The tournaments ran
from 2011 to 2015.
They ended in June
of this year.
And the Good Judgment Project,
I am proud to say, was the
winner of those forecasting tournaments.
and I can explain more about what winning a forecasting tournament means later if you want.
Congratulations, yeah, definitely.
Is there a difference between forecasting and predicting?
I don't see one.
I think if you go to a thesaurus, we're going to find they're virtual synonyms.
Some people may try to draw distinctions of one sort or another,
but I see them essentially as distinctions without a difference.
And so were you using a representative subset of the Good Judgment Project,
or were you using super forecasters from the project, or how were you competing in that?
Well, different universities and different teams of researchers
took different approaches to generating accurate probability estimates.
We recruited thousands of forecasters,
and we explored a number of different techniques
for eliciting the best possible probability estimates from those forecasters.
We are continually running experiments.
And one of the experiments we conducted was to identify top performers in each year, the top 2% of performers each year, cream them off into elite teams, super teams of super forecasters, and give them as much support as we could, intellectual support as we could, for their task, and see what would happen.
And they really went to town.
They did a phenomenally good job.
blew the ceiling off all of the performance expectations that IARPA had for what was possible.
And frankly, they certainly exceeded my expectations as well.
So some of us are good and some of us are bad and some of us seem like way off the chart
at making predictions. Why are some people so good?
That is indeed the $64,000 question. Why are some people so good?
So the skeptics argue that if you toss enough coins, enough times, some of them are bound to come up heads.
So the super forecasters are just super lucky.
So let's treat that as kind of the default skeptical hypothesis.
There's nothing special about super forecasters.
If we ran a tournament in which the task was, say, to predict whether a fair coin would
land heads or tails, some people would do better than others just by chance in a given year.
We could anoint those people as super coin toss predictors, and we could say, well, how are they going to do next year?
and what we would find is perfect regression toward the mean.
The best prediction is that the super coin toss predictors in year one
will be essentially around the average in year two.
And the worst predictors will regress upward toward the mean, of course.
So that's what a pure chance environment would look like.
Well, what we find in the IARPA tournament
is that there certainly is an element of chance
in predicting geopolitical and geo-economic outcomes,
but the skill luck ratio seems to be about 70-30.
So you're not observing a great deal of regression toward the mean among super forecasters,
but there inevitably is some regression toward the mean among the top performers.
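To make that pure-chance benchmark concrete, here is a minimal simulation sketch in Python (all numbers are illustrative, not the tournament's actual data): forecasters who look "super" in year one purely by luck land right back at the mean in year two.

```python
import random

# A pure-chance "tournament": everyone predicts 100 fair coin tosses per year.
# Skill plays no role, so year-1 "super coin-toss predictors" should regress
# to the mean (about 50% accuracy) in year 2.
random.seed(0)

def yearly_accuracy(n_tosses=100):
    # Each forecaster guesses at random against a fair coin; accuracy is the
    # fraction of correct calls.
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

n_forecasters = 2000
year1 = [yearly_accuracy() for _ in range(n_forecasters)]
year2 = [yearly_accuracy() for _ in range(n_forecasters)]

# Anoint the top 2% in year 1 as "super coin-toss predictors".
cutoff = sorted(year1, reverse=True)[int(0.02 * n_forecasters)]
supers = [i for i, acc in enumerate(year1) if acc >= cutoff]

avg_year1 = sum(year1[i] for i in supers) / len(supers)
avg_year2 = sum(year2[i] for i in supers) / len(supers)
print(f"Year-1 'supers' accuracy: {avg_year1:.2f}")  # well above 0.5, by luck
print(f"Same people in year 2:   {avg_year2:.2f}")  # back near 0.5
```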
And so what makes those people so good?
Well, now that we've eliminated or at least rendered implausible, the super lucky hypothesis,
the question becomes, what are the attributes these super forecasters have?
You might think of them as having stable psychological attributes:
do they score higher on measures of fluid intelligence or crystallized intelligence or active
open-mindedness, do they have certain attitudinal profiles, certain behavioral profiles? And the answer
is all of the above. They differ from ordinary mortals in a host of ways. They're not radically
different from ordinary mortals, but they are systematically different. They tend to score higher
on measures of fluid intelligence. They tend to score higher on measures of active open-mindedness. But if I had to
identify one factor that I think best distinguishes super forecasters from other
forecasters who are equally intelligent and equally open-minded, it is that super
forecasters believe that probability estimation of real-world events is a skill that can be
cultivated and is worth cultivating. And they're willing to make that commitment, that effort.
So when people ask me, how could the super-forecasters have outperformed, say, intelligence
analysts who do this full-time and have access to classified information,
I think the short answer is it's not because they're smarter and it's not because they're even more open-minded, although they are pretty open-minded.
It's because they are willing to make this commitment, this act of faith, that there is a skill that is worth cultivating.
So in the book, we quote Aaron Brown, who's the chief risk officer at AQR and also a great poker player,
that his view is you could distinguish great players from talented amateurs on the basis that great players are good at distinguishing 60/40 bets from 40/60 bets.
And then he paused and said, no, maybe more like 55/45 from 45/55.
The greatest players tend to be extremely granular in their assessments of uncertainty.
One of the big questions I think that IARPA wanted us to answer and that I think we have
answered in the affirmative is, does granularity in assessments of uncertainty pay off not just
in poker, but when you're making messy real-world judgments, like whether Greece is going to
leave the Eurozone or what kind of mischief Putin might be up to in the Ukraine next or
what's going to happen with Sino-Japanese relations in the East China Sea or whether there's going
to be another outbreak of bird flu in a given region. These are extremely idiosyncratic
one-shot historical events. It's not like poker where you're sampling from a well-defined
sampling universe, repeated play, quick feedback. So there are a lot of people, very smart people have
been skeptical for many decades, that it's even possible to make probability estimates of these
kinds of intelligence analytic problems. And I think what the IARPA tournament has proven,
beyond reasonable doubt in my opinion, is that there is room for improvement. It's possible to make
these probability estimates. It's possible to get better at it. It's possible to identify the
kinds of people who learn to do it better. It's possible to develop training modules to help people
do it better. And the gains in accuracy are appreciable. So what happened when you took
average people and you started giving them, I think I remember this, that you started giving
them a course in probability?
For average forecasters who are randomly assigned to an experimental condition
in which they get Kahneman-style debiasing exercises, the improvement is in the vicinity
of 10%.
And that's a big effect when you consider that you're talking about improvement across
the entire year of forecasting, and this training exercise takes about 50 minutes.
And what did it consist of, this 50-minute training exercise?
Some basic ideas about heuristics and biases and how to check biases.
For example, one of the classic Kahneman arguments is that people don't give enough weight
to statistical or base rate information in assessing the probabilities of events.
They're too quick to take the inside view.
So if you're attending a wedding and you see the happy couple
and you're impressed by how much in love they are
and the enthusiasm of the moment and someone asks you,
how likely are they to get divorced,
you're not likely to consult national divorce statistics
for that SES subgroup.
You're likely to say,
hmm, they look really happy and compatible.
I'm going to attach a very high probability
to them not getting divorced.
And the net result of making predictions in that way
is that you're going to be less accurate than you would have been if you had at least started your estimation
process by saying what are the base rates of divorce and now I'm going to adjust that based
on whatever idiosyncratic factors are present in this particular relationship.
So starting with the outside view and working your way inside?
Exactly. Start with the outside and work inside. That's one of our mantras.
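As a toy sketch of that mantra (the base rate and the adjustments below are made-up numbers, not real divorce statistics), the outside-then-inside estimation process looks like this in Python:

```python
# A minimal sketch of "start with the outside view, then adjust from the inside".
# All numbers are illustrative assumptions.

base_rate = 0.40          # outside view: assumed base rate of divorce for this subgroup

# Inside view: idiosyncratic impressions of this particular couple, expressed
# as modest nudges rather than a wholesale replacement of the base rate.
adjustments = {
    "seem unusually compatible": -0.05,
    "married very young": +0.05,
}

estimate = base_rate + sum(adjustments.values())
estimate = min(max(estimate, 0.0), 1.0)   # keep it a valid probability
print(f"Outside-view anchor: {base_rate:.0%}, adjusted estimate: {estimate:.0%}")
```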
So, but isn't Kahneman famous for saying that he's studied biases his whole life and he feels
like he's no better at avoiding them?
So how does this 50-minute training exercise come in and help people?
Well, Danny Kahneman was a colleague of ours at Berkeley.
My wife Barb and I, we know him well.
And we know that he is more pessimistic about the prospects for debiasing than we are.
He did give us advice on how to design the debiasing modules.
I think he probably is more of a pessimist still than we are,
but I think he is persuaded that these improvements are real.
They certainly seem to be.
So one of the keys to keeping track of forecasting and your ability to predict is kind of keeping score.
And do you think it takes a certain type of person to want to keep score?
I mean, most of us are happy to kind of weasel out of or use uncertain wording or jargon
when we're going about making decisions so that even if we're wrong, we can kind of say,
well, that's not what I meant.
Absolutely.
It does take a particular type of person, and there are many factors that come into play.
I think it certainly helps to be open-minded, but there are other things that come into play.
They're a little more, say, sociological.
I've been doing forecasting tournaments for over 30 years now, and I started when I was about 30 in 1984.
I'm 61 years old now.
So if I were an intelligence analyst, a 61-year-old intelligence analyst, I would be a very senior analyst.
And let's just say for the sake of argument that I am a senior analyst in the U.S. intelligence community. I'm on the National Intelligence Council, say, just for the sake of argument. And I'm the go-to guy on China. So when Xi Jinping comes into town, people say to me, you know, what's going on. I have inputs into the presidential daily briefing and help with national intelligence estimates. And I'm at the top of the status pecking order within the IC on China. And someone comes along like IARPA,
this upstart research and development branch for the Office of the Director of National Intelligence.
And they say, hey, you know what we're going to do?
We want to run forecasting tournaments now.
And everyone's going to compete on a level of playing field.
And 25-year-old China analysts are going to compete against 61-year-old analysts like Tetlock.
And we're going to see who does better.
Are the 61-year-old analysts going to welcome this development?
No.
To ask is to answer.
Even open-minded 61-year-olds are not going to be very enthusiastic about this.
They're going to argue that these tournaments just don't really capture what makes my judgment special.
And that is indeed a lot of the resistance we run into for forecasting tournaments.
I mean, in the book, you may remember we talk about the parable of two forecasters at the beginning,
Tom Friedman and Bill Flack.
Almost everybody who reads newspapers knows who Tom Friedman is, famous New York Times columnist,
Middle East expert, often in the White House or Davos,
God knows where.
And Bill Flack, nobody has the faintest idea who he is because he's an anonymous, retired irrigation
specialist in Nebraska who happens to be a super forecaster.
And we know a tremendous amount about Bill Flack's forecasting track record.
We know almost nothing about Tom Friedman's forecasting track record.
Right.
And that's in substantial part because Tom Friedman's forecasts, and he does make forecasts,
are embedded in vague verbiage.
He says that this could happen or this might happen.
And when you say something could or might happen, that could mean anything from 0.1 to 0.9
in probability terms.
And, you know, if it happens, I can say, well, I told you it could.
And if it doesn't happen, I can say, look, I merely said it could.
Right.
You can't get pinned down.
You're covered very nicely.
Yeah.
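Track records of the kind kept for Bill Flack depend on forecasts being explicit probabilities that can be scored; the tournaments described in the book were scored with Brier scores. A minimal sketch, using an invented track record:

```python
def brier_score(forecast_prob, outcome):
    """Squared error between the stated probability that the event happens
    and what actually happened (1 if it happened, 0 if it didn't).
    Lower is better: 0.0 is perfect, and a permanent 50/50 hedge earns 0.25
    under this simplified one-sided convention. (The book uses the original
    two-category formulation, which runs from 0 to 2.)"""
    return (forecast_prob - outcome) ** 2

# A vague "it could happen" can't be scored, but explicit probabilities can.
# Illustrative forecasts: (probability given, what actually happened).
forecasts = [(0.85, 1), (0.30, 0), (0.60, 0), (0.90, 1)]
scores = [brier_score(p, o) for p, o in forecasts]
print(f"Mean Brier score: {sum(scores) / len(scores):.3f}")
```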
Do you think that that's one of the problems with organizations?
I mean, it seems like we're not getting better as organizations at making decisions, in part,
because our ability to keep score is, you know,
hampered by these psychological kind of effects where, you know,
if I keep score, I might be wrong, so my incentive is not to.
And if I use precise wording, it might be wrong,
so my incentive is not to.
Yes.
Yeah, I think there's a whole mix.
There's a real mixture, powerful mixture,
of psychological and political forces that interact to create a lot of resistance
to forecasting tournaments.
So even though I think we have shown
that forecasting tournaments can appreciably improve probability estimates, there are a lot of
reasons why organizations don't adopt them. One is the people at the top of the status hierarchy
are not very enthusiastic. Bob, who's in the CEO suite, isn't all that enthusiastic about
it being discovered that Bob in the mailroom is just as good as he is at anticipating trends
relevant to the company's future. So you have the status hierarchy problem. People at the
top don't want to be second-guessed. They don't want their judgment process to be demystified.
A large part of status in contemporary organizations is the idea that there's something special about your judgment.
So even open-minded high-status people are going to be reluctant to do this because it's going to look like a career-damaging move.
So there's certainly that.
And there's a lot of other factors in play.
I mean, there's, again, this Kahneman argument that people don't pay attention to the outside view.
In the book we talk about a mistake by a famous New York Times journalist, David Leonhardt.
You may know him.
He runs The Upshot column at the New York Times.
He's a quant-savvy journalist.
And he made a mistake in 2012 that we talk about
that illustrates just how tenacious the misconceptions can be.
He was commenting on the Supreme Court decision to uphold Obamacare in 2012.
It was a narrow decision.
It was 5-4.
and he noted that the prediction markets had had futures contracts on this decision,
on the Supreme Court decision, and they were pricing it at about a 75% probability of the law being overturned.
Okay, so they were way off.
And he said, well, how far off is way off?
He said, well, they got it wrong.
He just said flat out got it wrong.
That doesn't account for the complexity, right?
That itself is wrong.
It certainly isn't good news
that the prediction market was on the wrong side, maybe by that margin,
but prediction markets have generated hundreds of forecasts over many years, and they've proven
to be pretty darn well calibrated, which is another way of saying, when they say 75% probability
of something happening, things happen about 75% of the time, and they don't happen about 25%
of the time. So even if you have a perfectly calibrated prediction market system, when it
says 75% it will be wrong 25% of the time, and smart observers, observers as smart as David Leonhardt,
are going to be tempted to conclude that you're wrong and to dismiss you. So this creates
a huge political incentive to stick with vague verbiage. If they simply said it could be overturned,
you know, they would be well positioned to explain it either way. But because the prediction
market was generating these precise probability estimates and because people don't take the
outside view and say, well, we can't just look at that particular forecast, we have to put
it in the context of all these other forecasts that the system is generating, take the
outside view toward the system, people have a very hard time doing that.
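The "outside view toward the system" amounts to a calibration check across many forecasts rather than a verdict on any single one; a minimal sketch with invented data:

```python
from collections import defaultdict

# Each entry: (stated probability, whether the event actually happened).
# Illustrative data, not real prediction-market contracts.
forecasts = [(0.75, True), (0.75, True), (0.75, False), (0.75, True),
             (0.25, False), (0.25, False), (0.25, True), (0.25, False)]

buckets = defaultdict(list)
for prob, happened in forecasts:
    buckets[prob].append(happened)

# Well calibrated: among all the times "75%" was said, about 75% happened.
for prob, outcomes in sorted(buckets.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"Said {prob:.0%}: happened {hit_rate:.0%} of the time ({len(outcomes)} forecasts)")
```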
David Leonhardt knows that this is true, and he's even written later in The Upshot about
situations involving this very fallacy.
So if someone as smart as that, who doesn't have a grudge against prediction markets can
make a mistake like that, you can see why politically savvy intelligence analysts might be
reluctant, in a blame-game culture like D.C., to
do it. Right. I think one of the most interesting parts of the book for me was when you started
talking about the Fermi-style thinking. Can you introduce us to that? Well, Enrico Fermi was
an Italian-American physicist who developed the first nuclear reactor at the University of Chicago. He was
involved in the development of the atomic bomb in World War II. And he was known for his rather
flamboyant thinking style. He was continually coming up with innovative
ways of estimating the seemingly
unestimatable.
One of the famous examples of
a Fermi problem was, you know, it sounds
really weird. It was to estimate the number
of piano tuners in Chicago.
Other examples might be estimating how much
the Empire State Building weighs
or estimating the likelihood
of extraterrestrial civilizations
elsewhere in the Milky Way.
Sounds a lot like the brain teasers that Google
used to ask to hire, right?
Exactly. Now, I don't know whether Google,
whether the legal department
still allows Google to continue using those for screening potential personnel.
But they are interesting tests of how people approach problems.
And what was so interesting about the way that Fermi approached it?
He really believed in flushing out your ignorance and decomposing the problem
into as many tractable components as possible.
So you would start with how many stars there are in the Milky Way, roughly about 100 billion.
you'd say, well, how many of these stars have planets orbiting around them?
You might look at the most recent data from Kepler, which has done some reconnaissance in our local area,
about 60 light years around, and you say, well, you know, it looks like a fair number,
a pretty high percentage of stars do seem to have planets going around them.
Let's say it could be as much as half or maybe slightly less.
I don't really know the answer to that question, but you make initial guesses.
You flush out your ignorance.
And then other people can come back and they can see that Tetlock said about half
and they say, oh, Tetlock doesn't understand what Kepler is doing.
It should have been 70%.
No, it should have been 30%.
But it's not that Tetlock is getting it right.
It's that we're flushing out Tetlock's zone of ignorance and we're making it clear
and it's all open and transparent.
And then, you know, the process of the inquiry would continue: how many planets are in the habitable zone?
And you derive some further guesstimate from Kepler.
A fairly small fraction of planets seem to qualify for that.
But that still might leave you with, say, as many as 500 million to a billion planets that are potentially in habitable zones.
And then you'd have to make some estimate about how likely is life to jumpstart if you have a planet in a habitable zone
and how likely is intelligent life to emerge once you have life.
And there are different evolutionary theorists
who have different models
that have at least somewhat different implications
as answers to those questions.
And what you would wind up with would be ranges of probabilities.
Now, for this particular problem,
the range of possible probabilities is going to be very large.
You know, we know it's not impossible
that there's another advanced extraterrestrial civilization in the Milky Way.
We also know it's not a sure thing.
In my best estimate, if I were to combine all the different steps that we just started to work through,
it would probably be more than one or two percent, but I don't think it would be as high as 90 percent.
It would probably be, maybe, somewhere between two and 50 percent.
Now, that's a guesstimate.
Now, there's nothing special about that number, but what Tetlock has done now,
what he's flushed out, Tetlock, me, I'm talking about myself in the third person here,
what the Fermi person, the Fermi-izer, the person using the Fermi method has done, is he or she has
flushed out all the different points of ignorance along the reasoning continuum.
And you, the observer, can say, oh, look, Tetlock made a really stupid estimate here, and
you have to adjust that, but it's a basis for proceeding.
What initially looked like a hopelessly intractable problem becomes at least
a little more tractable.
And that's what super forecasters are pretty good at doing:
breaking down seemingly intractable problems into semi-tractable components and then just pushing.
They're not afraid of looking stupid and making estimates that observers can see and look at and say,
oh, my God, why did you say something that stupid about the Kepler project?
That's an incredible point, where you're taking this big, intractable kind of problem
that's very hard to pin down, and you have some organized process for determining
the sub-components involved to get you there, and then you go through and estimate them.
So part of that would be highlighting your thinking, right?
Yes, sir.
And then part of that would be like, I really don't know anything about this question.
So can I break that down further into subcomponents, or am I extrapolating too much?
No, that's exactly the spirit of the enterprise.
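A minimal sketch of that decomposition style, applied to the Milky Way example above; every number below is a deliberately exposed, arguable guess, not a real estimate:

```python
# Fermi-style decomposition: break an intractable question into components,
# attach an explicit (arguable) guess to each, and multiply through.
# All values are illustrative placeholders, there to be challenged.

stars_in_milky_way        = 100e9   # "roughly about 100 billion"
frac_stars_with_planets   = 0.5     # "could be as much as half, maybe slightly less"
frac_planets_in_hab_zone  = 0.01    # "a fairly small fraction"
prob_life_starts          = 0.1     # guess: how likely life jump-starts there
prob_intelligence_emerges = 0.01    # guess: how likely intelligence follows

habitable_planets = stars_in_milky_way * frac_stars_with_planets * frac_planets_in_hab_zone
estimate = habitable_planets * prob_life_starts * prob_intelligence_emerges

print(f"Habitable-zone planets:      {habitable_planets:,.0f}")
print(f"Guesstimated civilizations:  {estimate:,.0f}")
```

Each line is a point where an observer can push back, which is exactly the "flushing out your ignorance" step being described.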
So why is that style of thinking?
Why does it lend itself, do you think, to better forecasting?
Is it just the nature of changing the framing of the problem itself, or do you think it's more the curiosity of
the people who are willing to break it down and go through, it sounds like a lot of work.
It sounds very demanding and mentally taxing to do that versus just throwing out an estimate
with your immediate response.
You're exactly right.
It is demanding, and I think it works best if it's done in a team environment in which
members of the team have mutual respect for each other, but they're also willing to push
each other hard.
So if you were an organization and you wanted to set up a team environment, like a
forecasting team within a large company, say IBM, how would you go about doing that with your
knowledge? That's a great question. And I'm a little bit wary about saying that
organizations should try to construct super teams the way the Good Judgment Project did. Because
team construction has a lot of implications for other parts of the organization. That can be tricky.
I mean, imagine if you just did what we did in the IARPA tournament to win it,
and you just identified the very best people, brought them together and nurtured them
and helped them and pushed them hard, that would be a very elitist and somewhat divisive thing
to do in many organizations.
And it could cause a lot of political friction.
Now, we didn't care a lot about that because we were in a forecasting tournament.
We didn't really have an organization in the traditional sense of the term.
We wanted a performance engine.
Right, we wanted to harness human ingenuity, individually and collectively, as rigorously as possible, to generate as accurate as possible probability estimates for things that the intelligence community cared about. It was a pure accuracy game, and we weren't that interested in the long-term viability of the organization. We were interested in just pure accuracy. So I would be a little cautious about saying, you know, it's really easy, all you do is you recruit these super forecasters and you put them into these teams and you give them some training
on how to do precision questioning, and you give them some training on how to do constructive
confrontation, and you've got these anti-group think norms enforced, and you give them some
training and guidance and probabilistic reasoning, and you encourage a certain self-critical structure
and culture inside the teams, and boom, amazingly accurate forecasts emerge.
It works pretty well in the forecasting tournament environment, but whether it would work well
in an actual organization, I think senior executives would want to think
carefully about each step along the way there.
What would you say to people inside an organization, how can they use your research to make
better decisions inside their company?
Well, I think it's something you want to consider seriously that when people make forecasts
inside most organizations today, accuracy is only one of the goals that they're pursuing.
They're also interested in making forecasts that are going to be difficult to falsify.
So that they can't be embarrassed.
So a lot of the forecasting inside organizations doesn't involve numbers.
It involves a lot of vague verbiage.
They're also interested in making forecasts that don't annoy other people in the organization.
They don't want to tip the political apple cart over.
So they're compromising accuracy in a whole host of ways that help promote their careers inside the organization,
help to maintain political stability in the organization,
but that aren't all that centrally focused on accuracy.
Forecasting tournaments are really weird
because they focus 100% on accuracy.
That's all that matters.
So I guess the thing you'd want to consider as an executive would be
do I want to reserve part of my organization's
analytical processing capacity for a pure accuracy game?
So I want to incentivize some small group of the people in my organization
to play pure accuracy games in forecasting tournaments.
And those probability estimates would then filter up to senior executives to guide decision-making.
I think it's really an interesting experiment to consider doing.
I think the intelligence community has been moving somewhat in that direction.
I think it's a good idea, and I think it would probably be a good idea for many other entities as well,
at least to consider.
The spirit of the whole IARPA enterprise is to run experiments.
And what I would propose would be that senior executives consider running experiments
in which they see what they
discover when they incentivize people to play pure accuracy games.
And what do you think transfers from your research into the decision-making process in a
corporation, not necessarily about forecasting, but about how we go about organizing,
unpacking, synthesizing multiple views, how does that transfer, do you think, into a learnable
skill that people can have inside of an organization? There are many ways that could happen.
We put a lot of emphasis in the Good Judgment Project on synthesizing diverse views into aggregate forecasts.
And I think one of our major performance engines was the statistical aggregation algorithms that our statisticians developed for doing that.
When IARPA started this whole exercise, they thought it would be really hard to do 20 or 30 or 40% better than the unweighted average of the control group forecasters.
And our super forecasters exceeded that performance benchmark
quite substantially each year of the tournament.
They did so well that IARPA essentially suspended the tournament after two years
and we were able to absorb the other teams into our teams in substantial ways
and compete against the intelligence community and against the prediction market baselines
instead of the other universities.
Now, how did all that come to pass?
I think the aggregation algorithms we developed. And if I had to credit two big things
as responsible for the victory of the Good Judgment Project,
one of them would be the super forecasters,
and the other would be, we would call them super algorithms,
the great algorithms that our statisticians developed.
Now, when I describe these algorithms,
at some level you're not going to be too surprised at first,
but there is one aspect of them that does surprise most people.
So the first thing to do,
I don't know if your listeners are familiar with James Surowiecki's
Wisdom of Crowds book,
But it's been well known in the forecasting world that the average of a group of
forecasters, the average forecast from those forecasters, is going to be more accurate than
most of the individuals from whom the average was derived.
And this is the famous Galton story about the ox.
You had hundreds of people trying to guess the weight of the ox.
And the average of all those guesses was only about one or two pounds off from the
true weight of the ox, which means it was more accurate than
all of the individuals from whom the average was derived. So averaging is a powerful way of synthesizing
information from diverse perspectives. It's really a remarkably crude approach to doing it,
but it works pretty darn well, and that's why IARPA used it as its benchmark. Now, we were able to
beat averaging by doing some simple things, like giving more weight to the better forecasters. As we got
more and more data on who the good forecasters were, who the more intelligent forecasters
were, who the more frequent belief updaters were, various attributes of forecasters, we were
able to give more weight to certain forecasters and we created weighted averages.
The weighted averages beat the average.
That's not too surprising, is it?
I mean, it makes sense.
It's not astonishing, though.
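A toy sketch of that weighting step (forecasts and weights invented; in the project the weights came from track record and related attributes):

```python
# Unweighted vs. weighted averaging of probability forecasts.
# Forecasts and weights are illustrative; in practice, weights would be
# derived from track record, measured intelligence, update frequency, etc.
forecasts = [0.60, 0.70, 0.80, 0.40]
weights   = [3.0,  2.0,  1.0,  0.5]   # better past performers get more say

unweighted = sum(forecasts) / len(forecasts)
weighted = sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)

print(f"Unweighted average: {unweighted:.2f}")
print(f"Weighted average:   {weighted:.2f}")
```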
Now, here's the interesting thing that the algorithms did.
They did something called extremizing.
And to illustrate extremizing, I want to just have a little digression into a story that we do talk about in the book, about the decision President Obama made to go after Osama bin Laden.
In the movie Zero Dark Thirty, they have a scene in which senior analysts are being polled on how likely they think it is that Osama bin Laden is in that compound.
And putting aside what Hollywood says about it.
Let's just do a little thought experiment.
And imagine that you're the president of the United States
and you have these senior advisors around the table
and you ask them, how likely is it that Osama is there?
And each of the analysts around the table says,
you know, Mr. President,
I think the answer is 0.7.
0.7, 0.7.
Everybody around the table says 0.7.
What should the president conclude
is the likelihood that Osama bin Laden is in that compound?
And the short answer to that is,
well, if the advisors are all clones of each other,
and they're drawing on exactly the same information
and processing it in exactly the same way,
the answer is 0.7, because there's no information added, right?
But imagine that the analysts say 0.7 all around the table,
but the analysts don't know each other,
and they haven't been sharing information,
and each analyst bases his or her 0.7 judgment
on information that only he or she has.
So you have extreme diversity of perspectives.
One person has satellite information, another has encryption breaking stuff,
and another one has human intelligence and so forth.
But they're siloed, and they're coming together for the first time,
and each one has independently arrived at this 0.7 estimate from very different sources of information.
You've got true diversity here.
And is the answer still 0.7?
Should the president shrug and say, well, I think the answer is 0.7,
or should the president say, gee, each of you has very different reasons for believing 0.7?
This leads me to suppose that the answer is probably more extreme than 0.7,
because if each of you knew the reasons the others had,
you would probably become more extreme.
And that's exactly what the best algorithm did.
It extremized as a function of diversity.
So 0.7 was turned into 0.85 or 0.9.
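A minimal sketch of an extremizing transform of the kind described here; the exponent is an illustrative tuning knob, not the project's actual parameter:

```python
def extremize(p, a=2.0):
    """Push an aggregate probability away from 0.5.
    An exponent a > 1 controls how aggressively to extremize; here it's an
    illustrative constant, though in practice it would be tuned to how
    independent the forecasters' information sources are."""
    return p**a / (p**a + (1 - p)**a)

consensus = 0.7   # everyone around the table independently says 0.7
print(f"Raw consensus: {consensus:.2f}")
print(f"Extremized:    {extremize(consensus, a=2.0):.2f}")   # about 0.84
print(f"Extremized:    {extremize(consensus, a=2.5):.2f}")   # about 0.89
```

With an exponent of 1 the aggregate is left alone; the more genuinely independent the forecasters' information, the more aggressively you can justify pushing it away from 0.5.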
That's fascinating.
I mean, how did it go about doing that in terms of aggregating the data
from the people or from the forecasters?
That's right, from the forecasters.
And what would happen if you had two forecasters who have great track records, and then they're divergent on, they're really divergent on an opinion or a forecast?
Does that happen often?
No, it doesn't happen very often, actually.
But if it did happen, it would be a real cautionary moment.
If you had two super forecasters, one of whom was at 0.9 and the other at 0.1, my inclination would be not to stray too far from 0.5, knowing nothing else
at the moment. Are there certain types of questions to avoid if your desire is to have an accurate
prediction? Yes. Well, there are many questions in the IARPA tournament. There are many questions
in life in which there's a massive amount of irreducible uncertainty. If you want to be a good
forecaster, you don't spend very much time working on roulette wheel type problems. I mean,
if you go to, if you visit casinos, you'll find lots of people who think they can detect patterns
in roulette wheel spins.
And they develop little algorithms even to help them.
But what they're doing is they're essentially modeling randomness.
So spending a lot of time modeling randomness is a good way not to become a super forecaster.
What other types of questions would you say don't lend themselves to, is it like a time
duration?
Is it?
Oh, what other kinds of questions are roulette wheel like?
Well, not roulette wheel, but what kind of,
what type of questions lend themselves to better predictions, right?
Is it short time, very few, I mean, I don't want to say very few variables,
but short time duration versus long time duration, because you have to constantly update
over a long period of time, right?
I mean, that was one of the things that super forecasters did, was they updated their forecasts.
Yes, that's true.
Well, yeah, all other things equal, it's usually easier to predict questions
with shorter time ranges than longer time ranges.
But that's not always true.
I mean, some short-range questions are extremely unpredictable.
It's very hard to say whether the stock market is going to go up or down tomorrow.
So that's a short-range question.
In some ways, it's easier to predict whether the stock market is going to be up or down 10 years from now
relative to now than it is tomorrow, right?
That's a good point.
So there are categories of problems in which you get a reversal of that.
But, yes, I think by and large it's true. The analogy to vision would be, right,
it's easier to see the Snellen eye chart if you're close to it
than if you're far from it.
Probabilistic foresight is better in shorter time ranges.
That's one of the things I talk about in the book,
one of the reasons why my later work is different in emphasis
from my earlier work,
in which experts had a hard time beating the dart-throwing chimpanzee,
because they were in the earlier work making much longer-term predictions
than they were in the IARPA work,
where the predictions were rarely much more than a year.
You mentioned open-mindedness at the beginning.
How do we go about fostering open-mindedness?
Are there ways that we can improve that in ourselves or other people?
Well, that's another thing we do try to emphasize in the training.
Exhorting people simply to be open-minded doesn't work, because most people don't think they're closed-minded.
Most people think they're quite reasonable.
And when you simply exhort people to be open-minded, people shrug and say, well,
yeah, I already am. I think you want to start in more specific ways. You want to start
with very specific problems in which you assess whether people change their minds in an appropriate
way. So there are some normative models like Bayes' theorem that tell you how much you should change
your mind in response to evidence that has certain diagnostic value. And you can create simulated
problems. It may be medical diagnosis problems. It might be economic problems. They might be
military problems, but you can create simulated problems with simulated data, and you can see
whether people learn through practice to update their beliefs the way they should.
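A minimal sketch of the kind of normative benchmark being described, Bayes' theorem applied to one piece of evidence with a stated diagnostic value (numbers invented):

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence.
    The two likelihoods encode the evidence's diagnostic value."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Illustrative simulated problem: prior belief of 30%, and evidence that is
# three times more likely if the hypothesis is true than if it is false.
prior = 0.30
posterior = bayes_update(prior, p_evidence_if_true=0.60, p_evidence_if_false=0.20)
print(f"Prior: {prior:.0%} -> Posterior: {posterior:.0%}")   # about 56%
```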
Now, there's always a question whether those lessons are going to stick, and we found that
they do stick a little bit because they can produce 10% improvement throughout the year.
But it's one of the great challenges.
I don't think we've solved the problem of how to make people more open-minded.
I think we can make people better belief updaters on problems where they don't have very strong ideological priors or preconceptions.
But when people have really strong emotions and ideological convictions about presidential candidates or economic policy or whatnot, belief updating becomes quite problematic.
Yeah, I mean, I can see why that would be a problem, right?
It contradicts probably something that you hold very dear and true.
Giving that up would take a lot of mental effort.
Yeah, we can make people a bit more open-minded, but making people perfect Bayesian belief
updaters is something that no one has achieved yet, and I think it will be very difficult to
achieve. I think we should keep working on it. I don't think we should give up.
Do you think that super forecasters were better at learning from the other super
forecasters than the, say, average forecaster? Like if somebody had a better approach, would they
copy it? Would they just drop their own internal approach? I think they listened to each other
quite carefully in the super forecaster teams.
Even when they disagree with each other,
they disagree diplomatically, but they can disagree quite forcefully
about what lessons they should draw from particular forecasting failures
or even forecasting successes.
I mean, it's fairly common for regular forecasters even to say,
well, what did we do wrong with the forecasting failure?
And supers do that too, but they also second-guess their successes.
They say, well, were we lucky? We really nailed this question, but were we lucky?
Could it have gone otherwise?
Were we almost wrong?
That's an unusual question for people to ask themselves.
People don't normally look a gift horse in the mouth.
When they're right, they want to take credit for it.
And super forecaster skepticism even extends to their forecasting successes.
I can't imagine a lot of the people who are average or below average in terms of forecasting ability
went through their successes and evaluated them from that angle.
What would you say is the role of intuition in forecasting,
or would you say that it's minimized, or would you say that it's...
This is one of the big debates in the field of judgment and decision-making.
Malcolm Gladwell wrote a book called Blink,
and some psychologists wrote a rejoinder book,
much less widely read, called Think.
There are different schools of thought about the value of intuition.
And even Gladwell, of course, has qualified it in his book.
He did point to some great successes of intuition,
but also noted the situations in which intuition could lead you seriously astray.
I think the dominant emphasis in our work, it leans toward think over blink.
I'm not ruling out the possibility that there are super forecasters who do
rely on intuition, but the problems that we're dealing with in the real world are
different from the sorts of problems where brilliant intuition has been
demonstrated pretty rigorously. So it's not like chess, where you're playing the
same game with well-defined rules, right? In that kind of pattern recognition, really smart people
can do extremely rapid forms of combinatorics and pattern recognition,
and it's quite astonishing what they can do.
Real world isn't quite like chess, is it?
And I think that it requires more subtlety
and more willingness to second guess yourself
because history, I think it was Mark Twain who said
history doesn't repeat itself, but it does rhyme.
And I think Super Forecasters sort of get that.
There are patterns in history, but they're quite subtle
and they're quite conditional.
And you can easily over-learn from history.
Hmm. That's a really good point. What book would you say has had the most impact on your life?
On my life. On my life? That would have to be a book I read very early on in my life.
Oh, possibly, yeah.
Yeah. I think, well, I don't know how far back we should go on this one. I mean, if I were to go back to graduate school, say, when I was making decisions about what I would do with my research career, there was a book by Robert Jervis, who I think is maybe an emeritus
professor now at Columbia, but he's a very senior political scientist. He wrote a wonderful
book in 1976, and I had just started in graduate school in
1976. It's called Perception and Misperception in International Politics. And it is
a wonderful synthesis of psychology and political science. And I think it is a synthesis of the
sort that I've aspired to; I've tried to be Jervisian in my work in many ways. Now, Jervis is not a
quantitative researcher. He's qualitative, whereas I'm more quantitative. So we differ in a
number of ways. But I have a deep respect for how he was trying to synthesize the psychological
and the political. And I suppose if there's any theme running through my work, it's synthesizing
the psychological and the political. So the last question is, who would you like to see interviewed
on the show and their thoughts articulated or explored with me? Well, I've always been a fan of Michael
Lewis's work. I think he would be a fun person to talk to, and I think he may be working on
a biography of Daniel Kahneman and Amos Tversky. I think that would be an interesting
conversation. Well, excellent. Thank you so much, Phil, for taking the time. I really
appreciate it. It's been a great conversation. Oh, it's a pleasure.
Hey, guys, this is Shane again. Just a few more things before we wrap up. You can find show notes
at farnamstreetblog.com slash podcast.
That's F-A-R-N-A-M-S-T-R-E-E-T-B-L-O-G.com slash podcast.
You can also find information there on how to get a transcript.
And if you'd like to receive a weekly email from me
filled with all sorts of brain food,
go to farnamstreetblog.com slash newsletter.
This is all the good stuff I've found on the web that week
that I've read and shared with close friends, books I'm reading,
and so much more.
Thank you for listening.