3 Takeaways - The Science and Skill of Superforecasting (#230)

Episode Date: December 31, 2024

Every decision you make involves predicting the future. Superforecasting can help you make better predictions. What do superforecasters actually do, and how can you become a better forecaster? Don’t miss this talk with superforecaster Warren Hatch, who helped lead a team that won a forecasting tournament conducted by the U.S. intelligence community. We predict you’ll benefit from listening.

Transcript
Starting point is 00:00:00 To quote from the beginning of Philip Tetlock's book, Superforecasting, we are all forecasters. When we think about changing jobs, getting married, buying a home, making an investment, launching a product or retiring, we decide based on how we expect the future will unfold, unquote. But how good are we actually at predicting the future? And how can we get better? Hi, everyone. I'm Lynn Toman, and this is Three Takeaways.
Starting point is 00:00:38 On Three Takeaways, I talk with some of the world's best thinkers, business leaders, writers, politicians, newsmakers, and scientists. Each episode ends with three key takeaways to help us understand the world and maybe even ourselves a little better. Today, I'm excited to be with Super Forecaster and CEO of the Good Judgment Project, Warren Hatch. In 2005, the University of Pennsylvania's Philip Tetlock published a study showing that experts performed
Starting point is 00:01:13 about as well at making predictions as what he called dart-tossing chimpanzees. And those who were surest of their predictions did much worse than their humbler colleagues. The study caught the eye of the United States intelligence community, which set up a geopolitical forecasting tournament. The undisputed winner of the tournament was the Good Judgment Project, which was led by University of Pennsylvania professors Philip Tetlock and Barbara Mellors. Over four years, their forecasters answered 500 questions and made a million forecasts that were more accurate than even intelligence analysts who had access to classified data. My guest today is Warren Hatch.
Starting point is 00:02:08 He's the CEO of the Good Judgment Project, which is the group of forecasters who won the US intelligence community's forecasting competition. And Warren is not only the CEO of the Good Judgment Project, he's also one of their top super forecasters. I'm excited to find out from Warren how super forecasters forecast and how we can all get better at forecasting. Welcome, Warren, and thanks so much for joining Three Takeaways today.
Starting point is 00:02:42 Thank you, Lynn. It's a pleasure to be here. Thanks for having me on. It is my pleasure. Warren, let's start with why is forecasting important? Where do we use it? In a sense, every decision that we make is a forecast because we're going to be taking action and doing things to improve the odds that we're going to get our desired outcome, whatever that might be. So we're doing that all day, all week, all year long, for thousands of decisions that way. So we want to make the best possible forecast, and super forecasting is a process to get to the best possible forecast and therefore to get to the best possible decision. And how do most people forecast and what's wrong with their approach? Well, most people, and it's the dominant way,
Starting point is 00:03:28 is they'll use language to express their views about the future. Somebody will ask them, well, do you think this will happen? And they'll say, well, maybe, maybe it will, or they'll use fancier words like, well, there's a possibility or a distinct possibility. Now, here's the problem with that, many problems. One problem is we're all going to understand that in different ways. There's a famous example when Kennedy came into office and inherited a plan to invade Cuba and topple the regime. He asked his advisors, will this succeed? And they said, there's a fair chance it will succeed. Now it turns out Kennedy
Starting point is 00:04:05 had in mind north of 50%. The analysts had in mind something more like 25%. By using language, there was noise in that decision process. I imagine if Kennedy had known they had in mind 25%, history might've been a little different. So that is true for all kinds of words like that, where we're going to interpret it differently. Another bad thing about it is that it straddles the 50-50 line. So if it happens, you say, aha, I said there'd be a distinct possibility. If it doesn't, well, I only said it was a distinct possibility. One other bad thing about it is it's impossible to put different views together. You can't crowdsource language like that because
Starting point is 00:04:47 we're all using different words and in different ways. How much better to use a number? We all know what 72% is. We all know what that means. For big events like market crashes or looming wars or for policy decisions like tax cuts or sanctions or tariffs, we turn to the experts, those people that are supposed to be the most knowledgeable. How do experts do as compared to super forecasters? That's a great question because sometimes there's an apparent tension between experts and good forecasters where it's one is better than the other or we really are of the view that you want both. You want hybrid models when it comes to experts and things like that. And one core reason for that is experts might be good forecasters, but we don't know that until we see their
Starting point is 00:05:43 track record. Just being an expert does not make you a good forecaster. We want to see if when you say an 80% probability of something occurring, it occurs eight times out of ten and doesn't two times out of ten. That's accuracy in a probabilistic sense. Most experts do not go through that process. They're very good at telling us what we need to understand, how we got where we are, some of the things we might wanna watch going forward. But what we've seen time and again, when you ask them for a probabilistic forecast,
Starting point is 00:06:13 experts generally tend to assign higher probabilities to events in their area of expertise than actually occur. And so you're better off by going to a crowd who are very skilled at assigning probabilities that occur with that frequency to give a forecast for that particular thing. Now best of all is if you get an expert who applies themselves and becomes a good forecaster, that's what we really want to see. But experts, not necessarily good forecasters just by being an expert. So the first part of a forecast is really the question.
Starting point is 00:06:50 Can you give some examples of what good forecastable questions should be or are? Oh, I love that. That's a great question. And that's half the work is getting the question right. You wanna make sure that everybody understands it in the same way. You want to be sure that it's actionable, it's useful. Why go to all this effort if it's just, you know, a parlor game?
Starting point is 00:07:15 And you also want to be able to say, well, when that date occurs and you look back, we will agree it happened or it didn't. And a lot of forecast questions that are out there don't meet those criteria. There was a great example, because in the original part of the project, the government wrote the questions themselves. And in that first year or so, they had questions of the sort like, will construction begin on a canal through Nicaragua? And this was back when there was a guy in Hong Kong who was going to build a big giant canal for these supermax boats. And will construction
Starting point is 00:07:50 begin? Was the question that they posed. Now, if you reflect a little bit, you can see that there's a problem with that. What counts as construction? Is it built with boats going through or is it big roads going through the jungle, or is it a golden shovel in the ground, which was my view, and sure enough, there was a golden shovel in the ground, but that was it. So did construction begin? We need to agree on what the question's asking. So the head of our question team, the fail rate we have in our questions now is like 0% this year, thanks to his hard work. And what he'll do is in addition to writing a careful question, he'll include, it's almost
Starting point is 00:08:31 like a little contract, he's trained as an attorney, where this is how the question will resolve under these circumstances, da da da da, and for things that matter, it's worth going to that effort. Not everything matters that much, but for things that do, then you want to make sure it's crisp and tight, it's actionable, it's verifiable, and we all can agree if it happened or it did not. So if somebody were wanting to use super forecasting in their personal life, can you give some examples of how to frame questions? For the higher order questions, things that
Starting point is 00:09:07 really impact our lives, it's often not just a single decision. There might be a series of smaller decisions that go into it. And so breaking that down into what are the pieces that are really going to be impactful for how I think about this issue. So where to go to school, that's more than just a single decision. There's a series of them. Do they provide the support that I would want as a student? Do they have the right kind of curriculum? What sorts of career trajectory might I get on? Those are all smaller sub-decisions. And
Starting point is 00:09:39 those are all things that you can then analyze as a forecasting question too. Now, one thing that people tend to do both when they're thinking about the question as well as the forecast is focus on the particulars themselves. Like who's going to win the next presidential election? Most people immediately start thinking about the candidates. And what we want to do is instead of going into the particulars right away, we want to zoom out. How do things like this usually go? What Daniel Kahneman calls the outside view to get a sense of what history can tell us,
Starting point is 00:10:14 what are the comparison classes out in the world where we can begin our forecasting, where we can anchor ourselves. Most people will get anchored. That's what happens. So you want to anchor in the best possible place and then incrementally update from there. So if you're thinking about schools and what the career trajectories are, you'd want to go and take a look at all the data for comparable schools and see what the history of graduation rates and the like have been, and then go narrower and narrower into what Kahneman calls
Starting point is 00:10:43 the inside view to make the decision itself. What's the first thing that super forecasters do that's different? That's actually step one. Another word for that is base rate. How do things like this usually go? And by going to a base rate or outside view, which are basically synonyms, you'll get an immediate boost just by doing that compared to the rest of the crowd. What are the other steps to get better forecasts?
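Warren's step one can be put into a few lines. The sketch below is purely illustrative; the weighting rule and the numbers are assumptions, not Good Judgment's method. It anchors on an outside-view base rate and then moves only part of the way toward the case-specific, inside-view estimate.

```python
def updated_forecast(base_rate, inside_view, weight_on_inside=0.3):
    """Anchor on the outside-view base rate, then shift part of the way
    toward the case-specific (inside-view) estimate.

    weight_on_inside is an illustrative knob: small values keep the
    forecast anchored on how things like this usually go."""
    return base_rate + weight_on_inside * (inside_view - base_rate)

# Hypothetical example: comparable projects finish on time 25% of the
# time (base rate), but the specifics of this one look better (60%).
print(updated_forecast(0.25, 0.60))  # 0.355, i.e. roughly a 36% forecast
```

Keeping the weight on the inside view modest is one way to stay anchored on how things like this usually go while still updating incrementally as new information arrives.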
Starting point is 00:11:14 Another good one for anyone to have on their checklist is to make a comment. When you've come up with an estimate about the future, jot down your rationale. One reason for that is it crystallizes your thinking. It also allows you to, in the future, look back and say, was I right for the right reasons or did I miss something? And it also allows you to share information with others. And that's how you get a high quality forecast from a
Starting point is 00:11:42 crowd. Most wisdom of the crowd approaches, you ask everybody, then you take an average and you're done. That's where we start. And what we like to do is have everyone make a comment and then exchange those ideas and see if somebody had a point that maybe I missed and then make an update, which is the next very crucial thing is expect to change your view, make an update, and you just do those few things,
Starting point is 00:12:08 you're gonna end up with a better number. Start with a base rate, make a comment, exchange views, update. Warren, what have you learned recently from forecasting and events in the real world, if you will? One area where we learned a lot was during the pandemic. We were talking about experts earlier. And one thing about experts,
Starting point is 00:12:33 as well as artificial intelligence is they have models. That's the way they work. And if you have a model, then you're using backward looking data to build your model. And that works great when we're in moments of relative equilibrium. But when things get upended and there is a lot of flux, those models don't work so well. And almost by definition, the experts are going to be slower to recognize that because their models will be filtering out the subtle changes that are eroding their models. And an excellent example of that was when COVID began to go global, and many of
Starting point is 00:13:13 the experts with their models were downplaying the significant, not all of them, but many of them, I think even most of them were saying that this thing will be contained. And instead of course, it became something much more significant. Our forecasters who were not bound by a particular model and were skilled at identifying these subtle shifts did much better at identifying how quickly the pandemic would spread. They also did very well at identifying when a vaccine would become available. They saw it becoming available far sooner than the experts did. And that, of course, getting those two pieces has enormous implications for public policy, for investing, and for personal
Starting point is 00:13:59 life decisions. So for us, that was a really critical example where we could also contribute to the public discourse on an important issue about how to think about applying this process in real-world situations, especially when there's a lot of flux going on. That's where this kind of a process can really shine: when things get upended, when there's more uncertainty than we recognized. So when there are big unexpected events in the world, black swans, if you will, is that when your approach of superforecasting really shines? It's a process that can be useful in everyday decision making, but I think where it really stands out relative to other ways
Starting point is 00:14:45 of thinking about the world is when there is a lot of flux. And maybe not so much dark black swans as very dark gray swans, because some of these things may be small probability, but high impact. And somehow getting those small probability items on your radar can help you be better prepared. And in the case of the pandemic, there were some people who were identifying that as a possibility, not something that they were attaching high probabilities to, but it was definitely on their radar. And I think identifying those kinds of dark gray swans is part of a good successful forecasting process. But once it hits, you're absolutely right. Now it's here, now what do we do? Well, one thing we should
Starting point is 00:15:31 do is discount how much reliance we have on the static models. We don't want to get rid of them completely, but the importance and the reliance we've previously had on them should be discounted. And we should instead give more weight to probabilistic judgment from people who are recognizing the small subtle factors that will eventually lead to new models. And can you give some examples of what you call dark gray swans? One is, yeah, another pandemic could be a dark gray swan. Wildfires have become such
Starting point is 00:16:09 a thing where the frequency is much higher than it was, but where are they going to hit? That's the challenge. I think a lot of the AI developments can also fall into that category, given what some of those risks might look like, and also possible global conflicts. One that we're looking at more is in Korea. We started looking at this a couple of months ago, because there've been some changes. There've been some changes to North Korea's doctrine. North Korea sent troops to fight on Russia's behalf around Kursk. These are, in the global scheme of things, small, but significant. So we're paying much more attention to how events on the Korean Peninsula may unfold as a possible dark gray
Starting point is 00:16:53 swan. And tariffs: there's an incoming administration with a wish list, and that's high on the wish list. So some sort of tariffs seems in the bag. But the more significant question, certainly for people who are asking us for our forecasts, is where: on what categories of goods, for which countries? Are some of them going to get exempted, or will it be an across-the-board thing? Exactly what does that look like? And what we're generally concluding for the moment is that a blanket tariff has maybe a 10% probability of occurring at the moment. We'll know more as we see, you know, who gets staffed in the different positions. But the higher risks are in specific
Starting point is 00:17:39 categories and specific countries. And we're just unfolding those questions now. Before I ask for the three takeaways you'd like to leave the audience with today, is there anything else you would like to mention that you have not already talked about? Well, I think one thing that I get asked a lot is the role of artificial intelligence in forecasting. And there are certainly an awful lot of firms popping up. But here's the thing too, just like with experts, there's an artificial tension between AI and humans, which from our point of view is misplaced. We do know, because the studies have been out there, that at least for now, the humans continue to do better than artificial intelligence. Phil Tetlock did a big research project that came out a couple of months ago where they compared lots
Starting point is 00:18:29 of different models and none of them did better than the super forecasters and the best of them still lag by 20%. But what we've seen is that you want to have both. You can use the LLMs, you can use the AI models to help humans get to a better number faster. So I'm very optimistic about the future with AI from that point of view. Warren, what are the three takeaways you would like to leave the audience with today? First, slow down.
Starting point is 00:19:00 We need to make lots of decisions every day, thousands, and most of those are gonna be from the gut. But for the ones that really matter, slow down and use what Daniel Kahneman calls system two thinking. And you basically start having a dialogue with yourself. And that way, you can challenge your thinking yourself. You can double check your assumptions, make sure you're not overly confident about what you think you know. The next thing is to change your mind. Expect to change your mind as new information becomes available. Number three is to keep score. At the end of it all, to get better, you need feedback.
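One simple way to generate that feedback is a Brier score, the kind of accuracy measure used in forecasting tournaments. The sketch below uses the binary squared-error form with a made-up track record; scoring rules vary across tournaments, so treat it as illustrative.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes
    (binary form). forecasts: list of (probability, outcome) pairs,
    with outcome 1 if the event happened and 0 if it did not.
    0.0 is perfect; a constant 50% forecast always scores 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Hypothetical track record, for illustration only.
my_forecasts = [(0.9, 1), (0.7, 1), (0.3, 0), (0.6, 0), (0.8, 1)]
print(f"Brier score: {brier_score(my_forecasts):.3f}")  # 0.118
```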
Starting point is 00:19:41 You need to see if your forecasts occur with the frequency that you think they do. You need to see if your reasoning aligns with how reality unfolds. So, keep score, both for yourself, to get better and better calibrated as a forecaster and therefore make better decisions, and extend that accountability to thought leaders too. Warren, thank you so much. This has been fascinating. Thank you, Lynn. It's been a pleasure. If you're enjoying the podcast, and I really hope you are, please review us on Apple Podcasts or Spotify or wherever you get your podcasts. It really helps get the word out. If you're interested, you can also sign up for the Three Takeaways newsletter
Starting point is 00:20:25 at ThreeTakeAways.com, where you can also listen to previous episodes. You can also follow us on LinkedIn, X, Instagram, and Facebook. I'm Lynne Toman, and this is Three Takeaways. Thanks for listening.
