3 Takeaways - The Science and Skill of Superforecasting (#230)
Episode Date: December 31, 2024
Every decision you make involves predicting the future. Superforecasting can help you make better predictions. What do superforecasters actually do, and how can you become a better forecaster? Don't miss this talk with superforecaster Warren Hatch, who helped lead a team that won a forecasting tournament conducted by the U.S. intelligence community. We predict you'll benefit from listening.
Transcript
To quote from the beginning of Philip Tetlock's book, Superforecasting: "We are all forecasters.
When we think about changing jobs, getting married, buying a home, making an investment,
launching a product or retiring, we decide based on how we expect the future will unfold," unquote.
But how good are we actually at predicting the future?
And how can we get better?
Hi, everyone.
I'm Lynn Toman, and this is Three Takeaways.
On Three Takeaways, I talk with some of the world's best
thinkers, business leaders, writers, politicians,
newsmakers, and scientists.
Each episode ends with three key takeaways to help us understand the world and maybe
even ourselves a little better.
Today, I'm excited to be with superforecaster and CEO of the Good Judgment
Project, Warren Hatch. In 2005, the University of
Pennsylvania's Philip Tetlock published a study showing that experts performed
about as well at making predictions as what he called dart-tossing chimpanzees.
And those who were surest of their predictions did much worse than their humbler colleagues.
The study caught the eye of the United States intelligence community, which set up a geopolitical forecasting tournament.
The undisputed winner of the tournament was the Good Judgment Project,
which was led by University of Pennsylvania professors Philip
Tetlock and Barbara Mellers. Over four years, their forecasters answered 500 questions and made a
million forecasts that were more accurate than even intelligence analysts who had access to classified data.
My guest today is Warren Hatch.
He's the CEO of the Good Judgment Project, which
is the group of forecasters who won the US intelligence
community's forecasting competition.
And Warren is not only the CEO of the Good Judgment Project,
he's also one of their top superforecasters.
I'm excited to find out from Warren how superforecasters forecast and how we can all get
better at forecasting.
Welcome Warren and thanks so much for joining Three Takeaways today.
Thank you, Lynn. It's a pleasure to be here. Thanks for having me on.
It is my pleasure. Warren, let's start with why is forecasting important? Where do we use it?
In a sense, every decision that we make is a forecast because we're going to be taking action and doing things to improve the odds that we're going to get our desired outcome, whatever that
might be. So we're doing that all day, all week, all year long, for thousands of decisions that way. So we want to make
the best possible forecast, and superforecasting is a process to get to the best possible forecast
and therefore to get to the best possible decision.
And how do most people forecast, and what's wrong with their approach? Well, most people, and it's the dominant way,
will use language to express their views about the future.
Somebody will ask them, well, do you think this will happen? And they'll say,
well, maybe, maybe it will, or they'll use fancier words like, well,
there's a possibility or a distinct possibility. Now,
here's the problem with that, many problems.
One problem is we're all going to understand that in different ways. There's a famous example when
Kennedy came into office and inherited a plan to invade Cuba and topple the regime. He asked his
advisors, will this succeed? And they said, there's a fair chance it will succeed. Now it turns out Kennedy
had in mind north of 50%. The analysts had in mind something more like 25%. By using
language, there was noise in that decision process. I imagine if Kennedy had known they
had in mind 25%, history might've been a little different. So that is true for all kinds of
words like that, where we're going to
interpret it differently. Another bad thing about it is that it straddles the 50-50 line. So if
it happens, you say, aha, I said there'd be a distinct possibility. If it doesn't, well, I only
said it was a distinct possibility. One other bad thing about it is it's impossible to put different
views together. You can't crowdsource language like that because
we're all using different words and in different ways. How much better to use a number? We all know
what 72% is. We all know what that means.
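(A side note for readers: the gap between a vague phrase and a number is easy to see in a short sketch. The probabilities and the 50% go/no-go threshold below are illustrative assumptions loosely based on the Kennedy anecdote, not figures from the episode.)

```python
# Hypothetical sketch: why a number beats a vague phrase.
# The 50% "go" threshold and the probabilities are illustrative assumptions.

def decide(p_success: float, go_threshold: float = 0.50) -> str:
    """Return 'go' if the estimated probability of success clears the threshold."""
    return "go" if p_success >= go_threshold else "no-go"

# "A fair chance" heard as better-than-even odds:
print(decide(0.60))  # -> go
# The same phrase, pinned down to the advisors' roughly 25%:
print(decide(0.25))  # -> no-go
```

For big events like market crashes or looming wars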
or for policy decisions like tax cuts or sanctions or tariffs, we turn to the
experts, those people that are supposed to be the most knowledgeable. How do experts do as compared
to superforecasters? That's a great question, because sometimes there's an apparent tension between experts and good forecasters,
as if one must be better than the other. We really are of the view that you want both.
You want hybrid models when it comes to experts and things like that. And one core reason
for that is experts might be good forecasters, but we don't know that until we see their
track record. Just being an expert does not make you a good forecaster. We want to see if when you
say an 80% probability of something occurring, it occurs eight times out of
ten and doesn't two times out of ten. That's accuracy in a probabilistic
sense. Most experts do not go through that process. They're very good at
telling us what we need to understand, how we got where we are, some of the things we might wanna watch going forward.
But what we've seen time and again,
when you ask them for a probabilistic forecast,
experts generally tend to assign higher probabilities
to events in their area of expertise than actually occur.
And so you're better off going to a crowd of people
who are skilled at assigning
probabilities that occur with the stated frequency to give a forecast for that particular thing.
Now best of all is if you get an expert who applies themselves and becomes a good forecaster,
that's what we really want to see. But experts, not necessarily good forecasters just by being
an expert.
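(A side note for readers: here is a minimal sketch of the calibration check Warren describes, grouping past forecasts by the stated probability and asking how often those events actually happened. The track record below is invented for illustration.)

```python
# Minimal calibration check over a (stated probability, outcome) track record.
# All data here is invented for illustration.
from collections import defaultdict

track_record = [(0.8, 1), (0.8, 1), (0.8, 0), (0.8, 1), (0.8, 1),
                (0.3, 0), (0.3, 1), (0.3, 0), (0.3, 0), (0.3, 0)]

buckets = defaultdict(list)
for prob, outcome in track_record:
    buckets[prob].append(outcome)  # group forecasts by the probability that was stated

for prob in sorted(buckets):
    outcomes = buckets[prob]
    observed = sum(outcomes) / len(outcomes)  # how often those events actually happened
    print(f"said {prob:.0%} -> happened {observed:.0%} of the time ({len(outcomes)} forecasts)")
```

So the first part of a forecast is really the question.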
Can you give some examples
of what good forecastable questions should be or are?
Oh, I love that.
That's a great question.
And half the work is getting the question right.
You wanna make sure that everybody understands it in the same way.
You want to be sure that it's actionable, it's useful.
Why go to all this effort if it's just, you know, a parlor game?
And you also want to be able to say, well, when that date occurs and you look back,
we will agree it happened or it didn't.
And a lot of forecast questions that are out there don't
meet those criteria. There was a great example, because in the original part of the project,
the government wrote the questions themselves. And in that first year or so, they had questions
of the sort like, will construction begin on a canal through Nicaragua? And this was
back when there was a guy in
Hong Kong who was going to build a big giant canal for these supermax boats. And will construction
begin? was the question that they posed. Now, if you reflect a little bit, you can see that
there's a problem with that. What counts as construction? Is it built with boats going
through or is it big roads going through the jungle,
or is it a golden shovel in the ground, which was my view, and sure enough, there was a golden
shovel in the ground, but that was it. So did construction begin? We need to agree on what the
question's asking. Thanks to the hard work of the head of our question team, the fail rate we have
in our questions now is essentially zero this year.
And what he'll do, in addition to writing a careful question, is include what is almost
like a little contract, he's trained as an attorney, spelling out how the question will
resolve under these circumstances, and so on. And for things that matter, it's worth going
to that effort.
Not everything matters that much, but for things that do,
then you want to make sure it's crisp and tight, it's actionable, it's verifiable,
and we all can agree if it happened or it did not.
So if somebody were wanting to use superforecasting in their personal life,
can you give some examples of how to frame questions? For the higher order questions, things that
really impact our lives, it's often not just a single decision.
There might be a series of smaller decisions that go into it.
And so breaking that down into what are the pieces that are really going to be impactful
for how I think about this issue.
So where to go to
school, that's more than just a single decision. There's a series of them. Do they provide
the support that I would want as a student? Do they have the right kind of curriculum?
What sorts of career trajectory might I get on? Those are all smaller sub-decisions. And
those are all things that you can then analyze as a forecasting question too. Now, one thing that
people tend to do both when they're thinking about the question as well as the forecast
is focus on the particulars themselves. Like who's going to win the next presidential election?
Most people immediately start thinking about the candidates. And what we want to do is
instead of going into the particulars right away, we want to
zoom out.
How do things like this usually go?
What Daniel Kahneman calls the outside view to get a sense of what history can tell us,
what are the comparison classes out in the world where we can begin our forecasting,
where we can anchor ourselves.
Most people will get anchored.
That's what happens.
So you want to anchor in the best possible place and then incrementally update from there.
So if you're thinking about schools and what the career trajectories are, you'd want to
go and take a look at all the data for comparable schools and see what the history of graduation
rates and the like have been, and then go narrower and narrower into what Kahneman calls
the inside view to make the
decision itself.
What's the first thing that super forecasters do that's different?
That's actually step one. Another word for that is base rate.
How do things like this usually go?
And by going to a base rate or outside view, those are basically synonyms, you'll get an immediate boost just by doing that
compared to the rest of the crowd.
What are the other steps to get better forecasts?
Another good one for anyone to have on their checklist
is to make a comment.
When you've come up with an estimate about the future,
jot down your rationale.
One reason for that is
it crystallizes your thinking. It also allows you to, in the future, look back and say,
was I right for the right reasons or did I miss something? And it also allows you to
share information with others. And that's how you get a high quality forecast from a
crowd.
With most wisdom-of-the-crowd approaches, you ask everybody, then you take an average, and you're
done.
That's where we start.
And what we like to do is have everyone make a comment and then exchange those ideas and
see if somebody had a point that maybe I missed and then make an update. That's the next
very crucial thing: expect to change your view, make an update.
If you just do those few things,
you're going to end up with a better number.
Start with a base rate, make a comment,
exchange views, update.
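(A side note for readers: here is a minimal sketch of the process just summarized, anchor on a base rate, collect estimates with rationales, average them, and update after exchanging views. The names, numbers, and the simple-mean aggregation are assumptions for illustration, not Good Judgment's actual method.)

```python
# Illustrative sketch of "start with a base rate, make a comment, exchange views, update".
# Names, probabilities, and the simple-mean aggregation are assumptions for this example.

base_rate = 0.30  # outside view: how often do things like this usually happen?

# Round one: each forecaster anchors on the base rate, adjusts, and records a rationale.
first_round = {
    "ana":   (0.35, "base rate plus a small bump for recent news"),
    "ben":   (0.25, "base rate minus a bit; funding still unresolved"),
    "chloe": (0.40, "reports suggest real momentum"),
}
crowd = sum(p for p, _ in first_round.values()) / len(first_round)
print(f"first-round crowd estimate: {crowd:.0%}")

# Round two: after reading each other's rationales, forecasters update their numbers.
second_round = {"ana": 0.36, "ben": 0.32, "chloe": 0.37}
updated = sum(second_round.values()) / len(second_round)
print(f"updated crowd estimate: {updated:.0%}")
```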
Warren, what have you learned recently from forecasting
and events in the real world, if you will?
One area where we learned a lot was during the pandemic.
We were talking about experts earlier.
And one thing about experts,
as well as artificial intelligence is they have models.
That's the way they work.
And if you have a model,
then you're using backward looking data to build your model. And that works great when
we're in moments of relative equilibrium. But when things get upended and there is a lot of flux,
those models don't work so well. And almost by definition, the experts are going to be
slower to recognize that because their models will be filtering out the subtle changes that are eroding their
models. And an excellent example of that was when COVID began to go global, and many of
the experts with their models were downplaying the significance. Not all of them, but many
of them, I think even most of them, were saying that this thing would be contained. And instead, of course, it became something much more significant.
Our forecasters who were not bound by a particular model and were skilled at
identifying these subtle shifts did much better at identifying how quickly the
pandemic would spread.
They also did very well at identifying when a vaccine would become
available. They saw it becoming available far sooner than the experts did. And, of course,
getting those two pieces right has enormous implications for public policy, for investing, and for personal
life decisions. So for us, that was a really critical example where we could also contribute to the
public discourse on an important issue about how to think about applying this process
in real-world situations, especially when there's a lot of flux going on. That's where this kind of
a process can really shine is when things get upended, when there's more uncertainty than we
recognized. So when there are big unexpected events in the world, black swans, if you will,
is when your approach of superforecasting really shines?
That's a process that can be useful in everyday decision making, but I think where it really
stands out relative to other ways
of thinking about the world is when there is a lot of flux.
And maybe not so much dark black swans as very dark gray swans, because some of these
things may be small probability, but high impact.
And somehow getting those small probability items on your radar can help you be better prepared. And in the case of
the pandemic, there were some people who were identifying that as a possibility, not something
that they were attaching high probabilities to, but it was definitely on their radar.
And I think identifying those kinds of dark gray swans is part of a good successful forecasting process. But once it
hits, you're absolutely right. Now it's here, now what do we do? Well, one thing we should
do is discount how much reliance we have on the static models. We don't want to get rid
of them completely, but the importance and the reliance we've previously had on them
should be discounted. And we should instead factor
in more weight to probabilistic judgment from people who are recognizing the small subtle
factors that will eventually lead to new models.
And can you give some examples of what you
call dark gray swans?
One is, yeah, another pandemic could be a dark gray swan. Wildfires have become such
a thing where the frequency is much higher than it was, but where are they going to hit?
That's the challenge. I think a lot of the AI developments can also fall into that category,
with what some of those risks might look like, and also possible global conflicts. One that we're
looking at more is in Korea. We started looking at this a couple of months ago,
because there've been some changes. There've been some changes to North Korea's doctrine.
North Korea sent troops to fight on Russia's behalf around Kursk. These are, in the global
scheme of things, small, but significant. So we're paying
much more attention to how events on the Korean Peninsula may unfold as a possible dark gray
swan.
And tariffs, there's an incoming administration with a wish list, and that's high on the wish
list. So some sort of tariff seems in the bag. But the more significant question, certainly for people
who are asking us for our forecast, is where, on what categories of goods, which countries,
are some of them going to get exempted? Or will it be an across the board thing? Exactly what
does that look like? And what we're generally concluding for the moment is that just a blanket tariff,
maybe a 10% probability of that occurring at the moment, we'll know more as we see,
you know, who gets staffed in the different positions. But the higher risks are in specific
categories and specific countries. And we're just unfolding those questions now. Before I ask for the three takeaways you'd like to leave the audience with today, is
there anything else you would like to mention that you have not already talked about?
Well, I think one thing that I get asked a lot is the role of artificial intelligence
in forecasting. And there are certainly an awful lot of firms popping up.
But here's the thing too, just like with experts, there's an artificial tension between AI and
humans, which from our point of view is misplaced. We do know, because the studies have been
out there, that at least for now, the humans continue to do better than artificial intelligence.
Phil Tetlock did a big research project that came out a couple of months ago where they compared lots
of different models, and none of them did better than the superforecasters, and the best of
them still lagged by 20%. But what we've seen is that you want to have both. You can use
the LLMs, you can use the AI models to help humans get to a better number faster.
So I'm very optimistic about the future with AI
from that point of view.
Warren, what are the three takeaways
you would like to leave the audience with today?
First, slow down.
We need to make lots of decisions every day,
thousands, and most of those are gonna be from the gut. But for the ones that really matter, slow down and use what Daniel Kahneman calls
system two thinking. And you basically start having a dialogue with yourself. And that way,
you can challenge your thinking yourself. You can double check your assumptions, make sure you're
not overly confident about what you think you know.
The next thing is to change your mind. Expect to change your mind as new information becomes
available.
Number three is to keep score. At the end of it all, to get better, you need feedback.
You need to see if your forecasts occur with the frequency that you think they do.
You need to see if your reasoning aligns with how reality unfolds. So, keep score both for yourself
to get better and better calibrated as a forecaster and to make therefore better decisions
and extend that accountability to thought leaders too.
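(A side note for readers: one standard way to keep score is the Brier score, the accuracy metric used in the forecasting tournament mentioned earlier. Below is a minimal sketch with invented forecasts; lower is better, and 0 is perfect.)

```python
# Minimal Brier score sketch: mean squared error between forecasts and outcomes.
# Forecasts and outcomes are invented for illustration; 0 is a perfect score.

def brier_score(forecasts, outcomes):
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.8, 0.3, 0.6, 0.9]  # stated probabilities that each event would happen
outcomes = [1, 0, 1, 1]           # what actually happened (1 = yes, 0 = no)

print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
```

Warren, thank you so much. This has been fascinating.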
Thank you, Lynn. It's been a pleasure.
If you're enjoying the podcast, and I really
hope you are, please review us on Apple Podcasts or Spotify or wherever you get your podcasts.
It really helps get the word out. If you're interested, you can also sign up for the Three Takeaways newsletter
at ThreeTakeAways.com, where you can also listen to previous episodes. You can also follow us on
LinkedIn, X, Instagram, and Facebook. I'm Lynn Toman, and this is Three Takeaways. Thanks for listening.