Freakonomics Radio - 233. How to Be Less Terrible at Predicting the Future
Episode Date: January 14, 2016
Experts and pundits are notoriously bad at forecasting, in part because they aren't punished for bad predictions. Also, they tend to be deeply unscientific. The psychologist Philip Tetlock is finally turning prediction into a science -- and now even you could become a superforecaster.
Transcript
There is a website called Fantasy Football Nerd.
It aggregates predictions from roughly 40 NFL pundits to produce what it calls the industry's
most accurate consensus rankings.
Now, how accurate is the consensus? Let me give you an
example. Earlier this season, the Carolina Panthers were playing the Seattle Seahawks. Only two of the
pundits picked Carolina to win. 36 picked Seattle. And you can see why. Seattle has been one of the
best teams in the league for the past several seasons. They won the Super Bowl two years ago,
nearly repeated last year. They'd be playing Carolina in Seattle,
where the home crowd is famously,
almost punishingly supportive.
So even though Seattle had won only two games this season against three losses,
and even though Carolina was an undefeated 4-0 at this point,
the experts liked Seattle.
They liked their pedigree.
But Carolina won the game, 27-23.
It's the hook and ladder. Lockett has it.
Lockett's being tackled, flips it, ball's loose.
Recovered by Seattle at the 40.
Carolina has won the football game.
What an unbelievable, validating, respect-taking road win for the Carolina Panthers.
Soon afterward, Carolina quarterback Cam Newton faced the media.
Cam, before the Seattle game, a lot of the national media was down on this team.
After you guys won that game, now a lot of the national media says,
this is one of the best teams we've seen this year.
Do you ever find it comical the way that a lot of these people think,
you know, that, hey, this team is all right, this team is not good?
Oh, it depends.
But I find all media comical at times.
Because I think in you guys' profession, you can easily take back what you say.
And you don't get, there's no danger, you know, when somebody says it.
You know, if it was a pay cut or if it was an incentive,
if picking teams each and every week,
you know, you make it a raise. I guarantee you people will be watching what they say then.
So first of all, let's give Cam Newton a medal because he just articulated in about 10 seconds
a big problem that experts in many fields, along with TV producers and opinion page editors and government
officials, either fail to understand or acknowledge, which is this. When you don't have skin in the
game and you aren't held accountable for your predictions, you can say pretty much whatever
you want. I completely agree with Cam on that.
That's Jonathan Bales.
A lot of the beat writers in the NFL or across sports,
they just can say whatever they want and there's no incentive for them to be correct.
And I do think that for the most part,
they are very bad at making predictions.
Bales can't afford to be bad
because he plays fantasy sports for a living.
People who have something to lose from their opinions or the predictions that they make
are incentivized to make sure that they're right.
Bales is 30 years old. He lives in Philadelphia. He's written a series of books called Fantasy
Football for Smart People.
In college, he was a philosophy major, but he also loved to analyze sports.
Yeah, I was really interested in in-game strategy.
So why are coaches doing all these things that, even anecdotally, just seem very wrong?
Many of the best fantasy sports players, he says, have a similar mindset.
We question things and we want to improve and we ask why a lot.
Like, why am I making lineups this way?
Is this truly the best way?
Just always questioning everything that we do, taking a very, very data-driven approach
to fantasy and adapting and evolving.
Adapting and evolving, using data to make better decisions,
challenging the conventional wisdom, that all doesn't sound so hard, does it?
Wouldn't you think that all experts everywhere would do the same?
Or at the very least, wouldn't you think that we would pay better attention to all the bad predictions out there, the political and economic and even sports predictions,
and then do something about it?
Why isn't that happening?
That is indeed the $64,000 question.
Why very smart people have been content to have so little accountability for accuracy in forecasting.
Today on Freakonomics Radio, let's fix that. And while we're at it, why don't we all learn to become not just good forecasters, but super forecasters.
Am I a super forecaster?
The short answer is not really.
Okay, we won't learn it from him. But maybe from him.
I didn't have any background and had to learn it all from the start.
Or from her.
I had totally missed the 2007-2008 financial crash.
I really had very little expertise in terms of international events.
On the other hand, I'm fairly skeptical of forecasting.
From WNYC studios, this is Freakonomics Radio, the podcast that explores the hidden side of
everything. Here's your host, Stephen Dubner.
If you're a longtime listener of this program, you've met Philip Tetlock before.
I'm a professor at the University of Pennsylvania,
cross-appointed in Wharton and in the School of Arts and Sciences.
We spoke with Tetlock years ago for an episode called The Folly of Prediction.
I think the most important takeaway would be that the experts think they know more than they do.
They were systematically overconfident.
Which is to say that a lot of the experts that we encounter in the media and elsewhere aren't very good at making forecasts.
Not much better, in fact, than a monkey with a dartboard.
Oh, the monkey with the dartboard comparison that comes back to haunt me all the time.
Back then, I asked Tetlock to name the distinguishing characteristic of a bad and overconfident forecaster.
Dogmatism.
It can be summed up that easily. I think so. I think an unwillingness
to change one's mind in a reasonably timely way in response to new evidence, a tendency when
asked to explain one's predictions to generate only reasons that favor your preferred prediction
and not to generate reasons opposed to it. Tetlock knows this because he conducted a remarkable long-term empirical study focused
on geopolitical predictions with nearly 300 participants.
They were very sophisticated political observers.
Virtually all of them had some postgraduate education.
Roughly two-thirds of them had PhDs.
They were largely political scientists, but there were some economists and a variety of other professionals as well. This study became the basis of a book
that Tetlock titled Expert Political Judgment. It was a sly title because the experts' predictions
often weren't very expert, which to Philip Tetlock is a big problem because forecasting is everywhere.
People often don't recognize how pervasive forecasting is in their lives,
that they're doing forecasting every time they make a decision about whether to take a job or whom to marry
or whether to take a mortgage or move to another city.
We make those decisions based on implicit or explicit expectations about how the future will unfold.
We spend a lot of money on these forecasts.
We base important decisions on these forecasts.
And we very rarely think about measuring the accuracy of the forecasts.
Some of us may have been satisfied to merely identify and describe this problem, as Tetlock did.
Some of us might have gone a bit further and raised our voices against the problem.
But Tetlock went even further than that.
He put together a team to participate
in one of the biggest forecasting tournaments ever conducted.
It was run by a government agency called IARPA.
IARPA is Intelligence Advanced Research Projects Activity,
and it is modeled somewhat on DARPA.
It aspires to fund cutting-edge research that will produce surprising results that have the potential to revolutionize intelligence analysis.
And Tetlock was at the center of this cutting-edge research.
He tells the story in a new book called Superforecasting, co-authored by the journalist Dan Gardner. The book is both a how-to, if at a rather high level,
and a cautionary tale about all the flaws
that lead so many people to make so many bad forecasts.
Dogmatism, as we mentioned earlier,
a lack of understanding of probability,
and a reliance on what Tetlock calls vague verbiage.
In the book, you mention a couple cases from history where
the intelligence community did not do so well, the Bay of Pigs situation with JFK, and then later,
the belief that Saddam Hussein had weapons of mass destruction. In both instances, you write
that it wasn't about bad intelligence, it was about how the intelligence was communicated to
government officials and to the public. So what happened in those cases?
Well, in the context of the Bay of Pigs, the Kennedy administration had just come into power,
and they were considering whether to support an effort by Cuban exiles and CIA operatives and others
to launch an invasion to depose Castro in April '61.
And the Kennedy administration asked the Joint Chiefs of Staff to do an independent
review of the plan and offer an assessment of how likely this plan was to succeed. And I believe the
vague verbiage phrase that the Joint Chiefs analysts used was they thought there was a
fair chance of success. And it was later discovered that by fair chance of success,
they meant about one in three.
But the Kennedy administration did not interpret fair chances being one in three.
They thought it was considerably higher.
So it's an interesting question of whether they would have been willing to support that invasion if they thought the probability were as low as one in three.
As a psychologist, though, you know a lot about how we are predisposed toward interpreting data in a way that confirms our bias or our priors or the decision we want to make, right?
So if I am inclined toward action and I see the words fair chance of success, even if attached to that is the probability of 33%, I might still interpret it as a move to go forward, yes?
Absolutely.
That's one of the ways in which vague verbiage forecasts can be so
mischievous. It's very easy to hear in them what we want to hear, whereas I think there's less room
for distortion if you say one in three or two in three chance. There's a big difference between a
one in three chance of success and a two in three chance of success. A difference of one, if I'm
doing my math properly. Right. Now, the Bay of Pigs didn't really change much in the intelligence community,
you write, surprisingly, perhaps, but the WMD issue with Saddam Hussein in Iraq
was an embarrassment to the point that the government wanted to do something about it.
Is that about right, that IARPA was founded in part out of a response to that? I'm not sure I
understand all of the internal decisions inside the intelligence community, but I think that the false positive judgment on weapons of mass destruction in Iraq did cause a lot of soul-searching inside the U.S. intelligence community
and made people more receptive to the creation of something like IARPA, yes.
IARPA was formed in 2006.
One of its major goals is, and I quote, anticipating surprise.
I think that's why they decided to fund these forecasting tournaments.
These forecasting tournaments would deal with real issues.
They all had to be relevant to national security, according to the intelligence community.
For instance?
So whether Greece would leave the
Eurozone was considered to be an event of national security relevance. Some other questions. Whether
the Muslim Brotherhood was going to win the elections in Egypt. Would the president of Austria
remain in office? These are a couple of the forecasters on Tetlock's team. Will Russia's
credit rating decline in the next eight weeks?
There was a notorious China Sea question about whether there'd be a violent confrontation
around the South China Sea. We were one of five university-based research programs that
were competing, and the goal was to generate the most accurate possible probability estimates. What was IARPA trying to accomplish?
Were they trying to really crowdsource intelligence?
Were they trying to figure out how government intelligence could improve itself, or what?
Well, I think crowdsourcing and improvement of probabilistic accuracy,
they saw as deeply complementary goals.
They set up the performance objectives in 2011
very much in the wisdom-of-the-crowd tradition,
the idea being that the average forecast
derived from a group of forecasters
is typically more accurate than the majority,
often the vast majority of forecasters
from whom the average was derived.
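To make that wisdom-of-the-crowd claim concrete, here is a minimal Python sketch. It is purely illustrative: the forecasters and their probabilities are made up, and the Brier score (the squared error of a probability against the 0/1 outcome, lower is better) stands in for whatever scoring rule the tournament actually used.

# Toy illustration of the wisdom-of-the-crowd effect described above:
# the unweighted average forecast tends to beat most of the individual
# forecasters it is averaged from, because their idiosyncratic errors
# partially cancel. All numbers below are invented.

forecasters = {                      # hypothetical probabilities on four questions
    "A": [0.8, 0.4, 0.5, 0.3],
    "B": [0.6, 0.1, 0.9, 0.6],
    "C": [0.4, 0.3, 0.7, 0.1],
    "D": [0.9, 0.6, 0.3, 0.2],
    "E": [0.5, 0.2, 0.6, 0.5],
}
outcomes = [1, 0, 1, 0]              # how the four questions actually resolved

def mean_brier(probs, outcomes):
    """Average squared error of probability forecasts; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

# Unweighted crowd average, question by question
crowd = [sum(col) / len(col) for col in zip(*forecasters.values())]

print("crowd average:", round(mean_brier(crowd, outcomes), 4))
for name, probs in forecasters.items():
    print(name, round(mean_brier(probs, outcomes), 4))
# In this made-up example the crowd average outscores every individual,
# which is the pattern the tournament's performance targets were built on.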
So they wanted to see whether or not we could do 20% better than
the average, 30%, 40%, 50% as the tournament went on. Okay, so what did you name your team?
The Good Judgment Project. It was an optimistic name, if nothing else. The team was put together
by Tetlock, his research and life partner, Barbara Mellers, who also teaches
at Wharton, and Don Moore from the Haas School of Business at Berkeley. But here's the thing. You
didn't have to be an academic or an expert of any kind to join the Good Judgment Project or any of
the other teams in the IARPA tournament. Anyone could sign up online, and tens of thousands of
people did, eager to make forecasts about global
events. Each of the research programs had its own distinctive philosophy and approach to generating
accurate probability judgments. I think we were probably the most eclectic and opportunistic
of the research programs, and I think that helped. Eclectic and opportunistic. How? What do you mean by that? Well, I think we were ready to roam across disciplines fairly freely.
We just didn't care that much about whether we offended particular academic constituencies
by exploring particular hypotheses.
So we got a lot of pushback on a lot of the things we considered.
There was a big debate, for example, about whether it'd be a good idea to have forecasters
work in teams.
And we didn't really know what the right answer was.
There were some good arguments for using teams.
There were some good arguments against using teams.
But what we did is we ran an experiment.
And it turned out that using teams in this sort of context helped quite a bit.
There was also a debate about whether it would be feasible to give people training to help reduce some common psychological biases in human cognition.
And again, we didn't know for sure what the answer would be, but we ran experiments.
And we found out that it was possible to get a surprising degree of improvement by training people, giving people tutorials that warned them against particular biases and offered them some reasoning strategies for improving their accuracy.
So we did a lot of things that some psychologists
or other people in the social sciences might have disagreed with,
and we went with the experimental results. Give me now some summary stats on the Good Judgment Project's performance overall.
First of all, how long did the tournament end up lasting, Phil?
The tournament lasted for four years.
Okay. How many questions did IARPA pose?
Roughly 500 questions were posed between 2011 and 2015, inclusive. And your team, the Good
Judgment Project, gathered approximately how many individual judgments about the future?
Let's see, thousands of forecasters, hundreds of questions, forecasters often making more than one
judgment per question because of the opportunities to update their beliefs. I believe it was in excess of 1 million. Okay. And how'd you do? Well, we managed to beat IARPA's performance
objectives in the first year. IARPA's fourth year objective was doing 50% better than the
unweighted average of the crowd. And our best forecasters and best algorithms are outperforming
that even after year one. And they continue to outperform in years two, three, and four. And the Good Judgment Project was the only project that consistently outperformed IARPA's
year one and two objectives. So IARPA decided to merge teams, essentially. So the Good Judgment
Project was able to absorb some really great talent from the other forecasting teams.
And each year, at the end of the year, we creamed off the top 2% of forecasters,
and we called them super forecasters. So the top 2% of roughly 3,000 forecasters would be about
60 people or so. And then next year, the next year, and on it would go.
So the way you're describing the success of the Good Judgment Project now in your kind of measured academic tone of voice sounds pretty measured and academic.
But let's be real.
You kicked butt, yes?
Yep.
That's fair enough.
And what did IARPA do or how did they respond to the success of your team?
In addition to, I assume, congratulations, did they want to, I don't know, hire a bunch of your superforecasters or you?
I have heard people in the intelligence community express an interest in potentially hiring some superforecasters.
I don't know whether they have or not.
Our superforecasters tend to be gainfully employed, but some of them might have been interested in that.
Coming up on Freakonomics Radio, why the people you think might be super forecasters often are not. There are plenty of reasons why very smart people don't ever become super forecasters,
and plenty of reasons why people who know a ton about politics never become super forecasters.
And Tetlock's superforecasters share their secrets.
I often don't read the newspaper at all, and when I do, it's generally the Omaha World Herald.
From the Good Judgment Project, and now its commercial spinoff, Good Judgment, Inc.,
Philip Tetlock has come to two main conclusions.
The first one, foresight is real.
That's how he puts it in his book, Superforecasting.
The other conclusion has to do
with what sets any one forecaster above the crowd. It's not really who they are, Tetlock writes,
it's what they do. Foresight isn't a mysterious gift bestowed at birth. It is the product of
particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be
learned and cultivated by any intelligent, thoughtful, determined person. Okay, so you
ran this amazing competition, a long series of experiments, in which you identified these people
who were better than the rest at predicting, in this case,
mostly geopolitical events. And what we really want to know is, again, as nice as that is,
congratulations, Dr. Tetlock, et cetera, et cetera, we want to know what are the characteristics of
the superforecasters, because we all want to become a little bit more of one. So,
would you mind walking us through some of these characteristics, Phil? Let's start with,
what about their philosophical outlook?
A super forecaster tends to be what, philosophically, would you say?
They're less likely than ordinary people, regular mortals, to believe in fate or destiny.
They're more likely to believe in chance.
You roll enough dice enough times, improbable coincidences will occur.
Our lives are nothing but quite improbable series of coincidences.
Many people find that a somewhat demoralizing philosophy of life.
They prefer to think that their lives have deeper meaning.
They don't like to think that the person to whom they're married, they could just as easily have wound up happy with 237,000 other people.
What about their level of, let's say,
confidence or even arrogance? Is a super forecaster arrogant?
I think they're often proud of what they've accomplished, but I think they're really very humble about their judgments. They know that they're just often very close to forecasting
disaster. They need to be very careful. I think it's very difficult to remain a super forecaster
for very long in an arrogant state of mind.
So would you say that humility is a characteristic that contributes to superforecasting then,
or do you think it just kind of travels along with it?
I think humility is an integral part of being a superforecaster.
But that doesn't mean superforecasters are chickens who hang around the maybe zone
and never say anything more than minor shades of maybe. You don't win a forecasting tournament by saying maybe all the time.
You win a forecasting tournament by taking well-considered bets.
Okay, so let's talk about now their, let's say, abilities and thinking styles. A super
forecaster will tend to think in what styles? They tend to be more actively open-minded.
They tend to treat their beliefs not as sacred possessions to be guarded, but rather as testable
hypotheses to be discarded when the evidence mounts against them. That's another way in which
they differ from many people. They try not to have too many ideological sacred cows. They're
willing to move fairly quickly in response to changing circumstances.
What about numeracy, background in math and or science and or engineering? Is that helpful,
important?
There are a few mathematicians and statisticians among the superforecasters, but I wouldn't say
that most superforecasters know a lot of deep math. I think they are pretty good with numbers.
They're pretty comfortable with numbers. And they're pretty comfortable with the idea that they can quantify states of uncertainty
along a scale from 0 to 1.0 or 0 to 100%. So they're comfortable with that. Superforecasters
tend to be more granular in their appraisals of uncertainty.
And what about the method of forecasting? Can you talk a little bit about methods that
seem to contribute to superforecasters' success?
One of the more distinctive differences between how superforecasters approach a problem and how regular forecasters approach it is that superforecasters are much more likely to use what Danny Kahneman calls the outside view rather than the inside view. So if I ask you a question about whether a particular sub-Saharan dictator is likely to
survive in power for another year, a regular forecaster might get to the job by looking up
facts about that particular dictator in that particular country, whereas a super forecaster
might be more likely to sit back and say, hmm, well, how likely are sub-Saharan dictators who
have been in power X years to survive another year?
And the answer for that particular question tends to be very high.
It's in the area of 85%, 95%, depending on the exact numbers at stake.
And that means their initial judgment will be based on the base rate of similar occurrences in the world. They'll start off with that and then they will gradually adjust
in response to idiosyncratic inside view circumstances.
So knowing nothing about the African dictator
or the country even,
let's say I've never heard of this dictator,
I've never heard of this country,
and I just look at the base rate and I say,
oh, it looks like about 87%.
That would be my initial hunch estimate.
Then the question is, what do I do? Well, then I
start to learn something about the country and the dictator. And if I learned that the dictator
in question is 91 years old and has advanced prostate cancer, I should adjust my probability.
And if I learned that there are riots in the capital city and there's hints of military coups
in the offing, I should again adjust my probability. But starting with the base rate probability is a good way to at least ensure that you're going to be in the plausibility ballpark initially.
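Tetlock's base-rate-first recipe can be sketched roughly in code. Everything here is an assumption made for illustration: the 87 percent base rate echoes the number used in the conversation, but the odds-multiplier adjustments are just one convenient way to "gradually adjust" an outside-view starting point, not a method the book prescribes.

# Rough sketch of the outside-view-first approach described above.
# All numbers are invented; updating in odds space is simply one tidy way
# to nudge a base-rate anchor as inside-view evidence arrives.

def to_odds(p):
    return p / (1 - p)

def to_prob(o):
    return o / (1 + o)

base_rate = 0.87   # outside view: share of comparable dictators who survived another year

adjustments = {    # inside view: question-specific evidence, expressed as odds multipliers
    "dictator is 91 with advanced prostate cancer": 0.4,   # cuts the odds of survival
    "riots in the capital, hints of a coup": 0.5,          # cuts them further
}

odds = to_odds(base_rate)
print(f"initial hunch from the base rate: {base_rate:.0%}")
for reason, multiplier in adjustments.items():
    odds *= multiplier
    print(f"after '{reason}': estimate now {to_prob(odds):.0%}")

Starting in the plausibility ballpark and nudging the odds piece by piece keeps any single item of news from swinging the estimate wildly.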
What about the work ethic of a super forecaster? How would you characterize that?
You don't win forecasting tournaments by being lazy or apathetic.
You have to be willing to do some legwork and learn something about that particular sub-Saharan country.
It's a good opportunity to learn something about a strange place and a strange political system.
It helps to be curious.
It helps to have a little bit of spare time to be able to do that.
So that, I guess, implies a certain level of socioeconomic status and flexibility.
And what about IQ?
I think it's fair to say that it helps a lot to be of somewhat above-average intelligence if you want to become a super forecaster.
It also helps a lot to know more about politics than most people do.
I would say they're almost necessary conditions for doing well, but they're not sufficient because there are plenty of people who are very smart and close-minded.
There are plenty of people who are very smart and think that it's impossible to attach probabilities to unique events. There are plenty of reasons why very smart people don't ever become super forecasters,
and plenty of reasons why people who know a ton about politics never become super forecasters.
It is very hard to become a super forecaster, Tetlock makes clear, unless you have a very good
grip on probability. We talk in the book with a great poker player, Aaron Brown, who's the chief risk officer of AQR.
AQR is an investment and asset management firm in Greenwich, Connecticut.
He defined the difference between a great poker player, a world-class poker player, and a talented amateur as this: the world-class player knows the difference between a 60-40 proposition and a 40-60 proposition.
And then he paused and said, no, more like 55-45, 45-55. And of course, you can get even more
granular than that in principle. Now, when you make that claim in the context of poker,
most people nod and say, sure, that sounds right. Because poker, you're sampling from a well-defined
universe. You have repeated play.
You have clear feedback.
It's a textbook case where the probability theory we learned in basic statistics seems to apply.
But if you ask people, well, what's the likelihood of a violent Sino-Japanese clash in the East China Sea in the next 12 months?
Or another outbreak of bird flu somewhere?
Or Putin was up to more mischief in the Ukraine, or Greece might begin to flirt with the idea of exiting
the Eurozone. If you ask those types of questions, most people say, how could you possibly assign
probabilities to what seem to be unique historical events? There just doesn't seem to be any way to do that.
The best we can really do is use vague verbiage,
make vague verbiage forecasts.
We can say things like, well, this might happen,
this could happen, this may happen.
And to say something could happen isn't to say a lot.
I mean, we could be struck by an asteroid
in the next 24 hours and vaporized,
0.00001, or the sun could rise tomorrow, 0.99999.
So could doesn't tell us a lot, and it's impossible to learn to make better probability judgments
if you conceal those probability judgments under the cloak of vague verbiage.
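A small simulation (again purely illustrative, with invented numbers) shows why that granularity is worth having: over many questions whose true frequency is 60 percent, a forecaster who says 0.60 earns a better average Brier score than one who rounds to 0.55 or 0.50, while a forecaster who only says "could happen" cannot be scored at all.

# Illustrative simulation of Aaron Brown's granularity point: telling a
# 60-40 proposition from a 55-45 one, and saying so, pays off in long-run
# Brier score. The true frequency below is assumed for the example.
import random

random.seed(0)
true_p = 0.60                        # assume these events really occur 60% of the time
outcomes = [random.random() < true_p for _ in range(100_000)]

def mean_brier(forecast):
    return sum((forecast - o) ** 2 for o in outcomes) / len(outcomes)

for label, forecast in [("says 0.60", 0.60), ("rounds to 0.55", 0.55), ("rounds to 0.50", 0.50)]:
    print(label, round(mean_brier(forecast), 4))
# Expected scores are about 0.24, 0.2425, and 0.25 -- small gaps per question,
# but they compound over hundreds of questions in a tournament, and a vague
# "could happen" never enters the scoreboard at all.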
Thank you all.
It is time to start the debate.
Are you all ready?
It is 9 p.m. on the East Coast,
and the moment of truth has arrived.
Live from St. Anselm College
in Manchester, New Hampshire,
the Democratic Debate.
Let me ask you this.
If you were asked to introduce one question into an upcoming
presidential debate, let's say, that you feel would give some insight, via the candidates'
answers, into their views overall on forecasting, its limits, and the need for it,
what kind of question would you try to ask? What a wonderful question that is. You've
taken me aback at such a good question. I'm going to have to think hard about that. I don't have an
answer right off the top of my head, but I would love to have the opportunity to draft such a
question. It would be something along the lines of, would it be a good thing for the advisors to the president to make an effort to express uncertainty in numerical
terms and to keep a record of how accurate or inaccurate they are over time? Would you like
to have presidential daily briefings in which instead of the document saying this could or
might or may happen, it says our best analysts, when we crowdsource our best analysts,
the probability seems to range
somewhere between 0.35 and 0.6.
You know, that still sounds,
it's still a pretty big zone of uncertainty,
but it sure is a lot better than could,
which could mean anything from 0.01 to 0.99.
Now, can you imagine anyone saying
they wouldn't want that though?
Do you think there are those
who'd want to show that they're so,
let's whatever, macho, that no, no, no, no, we don't want to traffic in that?
I think there's vast variation among politicians in how numerate they are and in how open they are
to thinking of their beliefs as gradations along an uncertainty continuum rather than expressions
of tribal loyalties. We have the story in the book about President Obama making the decision about going
after Osama bin Laden and the probability estimates he got about Osama's location and how he dealt
with those probabilities. The probabilities range from about, I don't know, from maybe from 0.4 to
about 0.95 with a center of gravity around 0.75. And the president's reaction was to shrug and say,
well, I don't know what to do with this.
It feels like a 50-50 thing, a coin toss.
Now, that's an understandable reaction from a president
who is about to make an important decision
and feels he's getting somewhat conflicting advice
and feels like he doesn't have closure on a problem.
It's a common way to use the language.
But it's not how the president would have used the language
if he'd been sitting in a TV room in the White House
with buddies watching March Madness and Duke University is playing and someone says, you know, what's the likelihood of Duke winning this game?
And his friends offer probabilities ranging from 0.5 to about 0.95 with a center of gravity of 0.75 once again.
He wouldn't say it sounds like 50-50; he'd say it sounds like 3-1. Now, how much better decisions would politicians make
if they achieved that improvement in granularity,
accuracy, calibration?
We don't know.
I think that if the intelligence community
had been more diffident about its ability
to assign probability estimates,
the term slam dunk probably wouldn't have materialized
in the discourse about weapons of mass destruction in Iraq. I think the actual documents themselves would have been written in a more circumspect
fashion. I think there were good reasons for thinking Saddam Hussein was doing something
suspicious. I'm not saying that the probability would have been less than 50%. The probability
might have been as high as 85% or 80%, but it wouldn't have been 100%. But I wonder how much of this is our fault, our meaning the public, because, you know,
when someone makes a decision that turns out poorly, not wrong necessarily, but poorly,
even if the odds were very much in his or her favor, we punish them for the way that
turned out.
I mean, forget about politics, go to something as silly as football.
If a head football coach goes for it on fourth down when all the probability
is encouraging him to do so and his team doesn't make it, we know what happens. All the sports
fans come out and say, this guy was an idiot. What the hell was he doing? He didn't properly
calculate the risk. Whereas in fact, he calculated the risk exactly right. And maybe there was an
80% probability of success and he happened to hit the 20%. So, we don't respond well to probabilistic
choices, and maybe that's why our leaders don't abide by them.
That's right. I mean, part of the obstacle is in us. We've met the enemy, and the enemy is us.
We don't understand how probability works very well. We have a very
hard time taking the outside view toward the forecast we make, the forecast other people make.
And if we did get in the habit of keeping score more, we might gradually become a little more
literate. So who are these people, these probability-understanding, humble, open-minded, inside-view people that have the power of super forecasting?
Until I got into grad school, I was used to being the smartest person in the room.
And grad school very quickly disabused me of that notion.
That's one of them. His name is Bill Flack.
He's a 56-year-old retiree in rural Nebraska, and he is a super forecaster with the Good Judgment Project, one of the top 2%.
Flack studied physics in college, got a master's in math, and even though he wanted to get his Ph.D.
I just came to realize that I didn't have either the mental power or the commitment to the subject to pursue a Ph.D.
As smart as he is, Flack admits he is not very worldly.
I often don't read the newspaper at all, and when I do, it's generally the Omaha World Herald,
which isn't remarkable for its foreign policy coverage.
Flack wound up working for the U.S. Department of Agriculture.
He was semi-retired when he first read about the Good Judgment Project.
Basically, I thought it sounded kind of interesting, like might be fun to try.
It's an area that's always been interesting to me, how people make decisions.
And that is Mary Simpson, another of Tetlock's superforecasters.
I grew up in San Antonio, Texas, and spent my first 18 years there.
Had a typical suburban family, older brother, younger sister, stay-at-home mom.
Dad was an engineer, you know, the typical breadwinner.
And I went to college in Dallas at Southern Methodist University,
and that was the time when a lot of women were discovering that they could do things besides get married and have children. So I sort of
broadened my horizons, found economics, and was really interested in it and decided I wanted to
do something besides get married and have kids. I finished a PhD from Claremont Graduate School
and went to work for the big local public utility, Southern California Edison,
as an assistant economist.
That's where Simpson was still working when she got involved with the Good Judgment Project.
It was just a few years after the financial crash, which Simpson had failed to foresee.
I had totally missed the 2007-2008 financial crash.
I had seen bits and pieces. I knew that there was certainly a housing
bubble, but I did not connect any of the dots to the underlying financing issues that had really
created the major disruption in the financial industry and the subsequent Great Recession.
Simpson didn't think her forecasts for the Good Judgment Project would be much better.
You know, it's one of those things where I'm a very analytical person,
always decent in math, and learned over the years how to kind of assess situations and make predictions.
On the other hand, I'm fairly skeptical of forecasting.
My company spent thousands of dollars every year
for the best in the class of economic forecasts.
That's what they were.
We had to forecast.
We had to understand where sales would go
and be able to make predictions
in order to be sure that there
was enough power and to assess revenue levels and costs of electricity and so forth. So we relied on
forecasts, but they were often wrong. So again, I was hopeful to do a decent job, but also very
skeptical of the ability of anyone to forecast in certain arenas, especially.
Simpson, like Bill Flack, got involved in the forecasting tournament mostly for fun.
I was only working part-time and felt like I needed to keep my brain engaged.
It was a volunteer position. They weren't being paid by the Good Judgment Project, though they did get an Amazon
gift certificate. What was it worth? A couple of hundred dollars. It was not a lot. If you took
the value of the Amazon gift certificate and divided it by the hours we put into it, we were
getting something like 20 cents an hour. So here were a couple of non-experts in the realm of
geopolitics being asked to make a series of geopolitical predictions.
They didn't have any background and had to learn it all from the start.
I really had very little expertise in terms of international events.
Pretty much every single question I had to dig for background information.
You need to understand the facts on the ground.
You need to understand the players, what their motives are.
Spent a lot of time with Google News, some time with Wikipedia,
which I mostly used as a source of sources, basically.
You know, I have an analytical bent.
I'm interested in doing the research.
And, you know, pretty much had to educate myself up on the subject.
A lot of it is the work.
You know, you have to do the work, you have to update,
you have to really stay engaged.
And if you simply answer the questions once
and let them go and don't look at them again,
you're not going to be a very good forecaster.
One of the unusual things about how questions are asked
in forecasting tournaments is that they're asked extremely explicitly.
That's Tetlock again.
It's not just, will Greece leave the Eurozone?
But there are very specific meanings to what leaving the Eurozone means,
and there's a very specific time frame within which this would need to happen.
It's not simply answering yes or no on a question.
The answer had to be, what is your expectation of this event happening?
In other words, is it 50% or is it 90%?
So, you know, there was a certain amount of effort to figure out, well, what's a good probability?
Each of us learned from previous questions, you know, whether we were being overconfident or underconfident on specific types of questions.
We were getting pretty much constant feedback.
Every time a question resolved, we knew whether we were right or wrong,
whether we'd been overconfident or underconfident.
And we tried to look back and see what we had on questions where we'd gone wrong,
how we'd gone wrong, on questions where we'd done well, what we had done right.
Were we lucky?
Had we followed a very good approach that we should apply to other questions?
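That scorekeeping loop can be sketched as a simple calibration check. The forecasts and outcomes below are invented; the point is only that once judgments are numeric and questions resolve, you can see whether your "80 percent" calls actually come true about 80 percent of the time.

# Illustrative calibration check of the sort the forecasters describe.
# The (stated probability, did it happen?) pairs are made up.
from collections import defaultdict

resolved = [(0.9, True), (0.8, False), (0.8, True), (0.7, True), (0.9, True),
            (0.6, True), (0.8, True), (0.7, False), (0.9, False), (0.6, True)]

buckets = defaultdict(list)
for prob, happened in resolved:
    buckets[prob].append(happened)

for prob in sorted(buckets):
    hits = buckets[prob]
    rate = sum(hits) / len(hits)
    flag = "looks overconfident" if rate < prob else "ok so far"
    print(f"said {prob:.0%}: happened {rate:.0%} of the time "
          f"({len(hits)} questions) -> {flag}")
# With real data you would want many more resolved questions per bucket
# before drawing conclusions, but this is the feedback loop in miniature.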
And so these typical Americans with no foreign policy experience whatsoever wound up making remarkably accurate forecasts about things like the Grexit or whether there would be conflict in the South China Sea.
One of the things I liked about Good Judgment was it gave me a pretext to learn about these various foreign policy issues. I think there's certain satisfaction in knowing that you're actually helping research
that will hopefully lead to better assessments and better forecasts on the part of government.
Certainly, I've gotten a good deal less patient with pundits who issue forecasts where,
well, this could happen, but don't attempt to assign a probability to it.
Don't suggest how it could go the other way.
You probably won't like this answer, but I've grown much less fond of radio news
because in trying to make forecasts, I've been really looking for details,
and it annoys me greatly when the radio starts a story about something that could be interesting
and then they go into anecdotes instead.
Public radio is as bad as the rest, I'm afraid.
Not today, friendo.
We are all about the details.
For instance, here are what Philip Tetlock calls the Ten Commandments for aspiring superforecasters.
Number one, triage.
Focus on questions where your hard work is likely to pay off.
Pretty sensible.
Number two, break seemingly intractable problems
into tractable subproblems.
OK, no problem.
Number three, strike the right balance between inside views
and outside views.
Number four, strike the right balance between under- and overreacting to the evidence.
Look for the clashing causal forces at work in each problem.
Strive to distinguish as many degrees of doubt as the problem permits, but no more.
Okay, that one just sounds hard.
Number seven.
Strike the right balance between under and overconfidence, between prudence and decisiveness.
Number eight.
Look for the errors behind your mistakes, but beware of rearview mirror hindsight biases.
Did you get that one? Here, let me read it again.
Look for the errors behind your mistakes,
but beware of rear-view mirror hindsight biases.
Number nine, bring out the best in others
and let others bring out the best in you.
Not a very Washington, D.C. concept, but what the heck.
And number 10, master the error balancing bicycle.
This one needs a little bit more explanation.
Just as you can't learn to ride a bicycle by reading a physics textbook, Tetlock writes,
you can't become a super forecaster by reading training manuals.
Learning requires doing with good feedback that leaves no ambiguity about whether you are succeeding or failing.
Now, if following these commandments sounds like a lot of work, well, that's the point.
What your book proves among a lot of things that are interesting, I think the most fascinating, the most uplifting really, is that this is a skill or maybe set of skills that can be acquired or improved upon,
right? The people who are better than others at forecasting are not necessarily born that way,
not born that way at all, correct? I think that's a deep truth, a deep lesson of the research that we conducted. Sometimes I'm asked,
how is it that a group of people, regular citizens who didn't have access to classified
information working part-time, were able to generate probability estimates that were
more accurate than those generated by intelligence analysts working full-time jobs and with access
to classified information? How is that possible?
And I don't think it's because the people we recruited are more intelligent than intelligence
analysts. I'm pretty sure that's not true. I don't think it's even because they're more
open-minded, and it's certainly not because they know more about politics. It's because
our forecasters, unlike many people in Washington, D.C., believe that probability estimation of messy real-world events is a skill that can be cultivated and is worth cultivating, and hence they dedicate real effort to it.
But if you shrug your shoulders and say, look, there's no way we can make predictions about unique historical events, you're never going to try. Philip Tetlock has been running forecasting tournaments for roughly 30 years now,
and the success of the Good Judgment Project has dictated his next move.
It led me to decide that the last part of my career,
I want to dedicate the last part of my career to improving the quality of public debate,
and that I see forecasting tournaments as a tool that can be used for that purpose.
I believe that if partisans
in debates felt that they were participating in forecasting tournaments in which their accuracy
could be compared against that of their competitors, we would quite quickly observe the depolarization
of many polarized political debates. People would become more circumspect, more thoughtful,
and I think that would, on balance, be a better thing for our society and for the world.
So I think there are some tangible ways in which forecasting,
the forecasting tournament technology can be used to improve the quality of public debate
if only we were willing to use it.
It happens all the time.
Some company or institution, maybe even a country, does something you don't like.
So you and maybe a few million friends decide to start a boycott.
This leads to a natural question. Do boycotts work?
So there are a variety of empirical papers that point out that the economic impact of
boycotts is limited. The evidence against and for boycotts. That's next time on Freakonomics Radio. Freakonomics Radio is produced by WNYC Studios and Dubner Productions. This
episode was produced by Arwa Gunja. Our staff also includes Jay Cowit, Merritt Jacob, Christopher
Werth, Greg Rosalsky, Kasia Mychajlowycz, Alison Hockenberry, and Caroline English. Thanks to the
Carolina Panthers Radio Network for providing audio for this episode. You can now hear Freakonomics Radio on public radio stations across the U.S. If you are one of our many international podcast listeners, you should probably just move here or at least listen to our recent episode on open borders. It was called Is Migration a Basic Human Right? But you can find that and all our previous episodes wherever you live at Freakonomics.com.
You can also subscribe to the podcast on iTunes or wherever you get your podcasts.
I'm Stephen Dubner.
Thanks for listening.