The Knowledge Project with Shane Parrish - #6 Philip Tetlock: How to See the Future

Episode Date: December 8, 2015

In this episode of the Knowledge Project, I chat with professor and New York Times best-selling author Philip Tetlock about how we can get better at the art and science of predicting the future.   ...Go Premium: Members get early access, ad-free episodes, hand-edited transcripts, searchable transcripts, member-only episodes, and more. Sign up at: https://fs.blog/membership/   Every Sunday our newsletter shares timeless insights and ideas that you can use at work and home. Add it to your inbox: https://fs.blog/newsletter/   Follow Shane on Twitter at: https://twitter.com/ShaneAParrish Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 Welcome to the Knowledge Project. I'm your host Shane Parrish, editor and chief curator of the Farnam Street blog, a website with over 70,000 readers dedicated to mastering the best of what other people have already figured out. The Knowledge Project allows me to interview amazing people from around the world to deconstruct why they're good at what they do. It's more conversation than prescription. On this episode, I'm happy to have Philip Tetlock, professor at the University of Pennsylvania. He's the co-leader of the Good Judgment Project, which is a multi-year forecasting study. And he's also the author of the recently released Superforecasting: The Art and Science of Prediction.
Starting point is 00:00:46 How we can get better at prediction is the subject of this interview. We're going to dive into what makes some people better and what we can learn to improve our ability to guess the future. I hope you enjoy the conversation as much as I did. I want to talk about your new book, Superforecasting: The Art and Science of Prediction, which you wrote with Dan Gardner, who, like me, I think is still based in Ottawa. In the book, you say that we're all forecasters. Can you elaborate on that a little? Well, it's hard to make any decision in life, whether it's a consumer decision about whether to buy a car or a house, or whether to marry a particular potential spouse, or which candidate to vote for in an election. It's very hard to make any decision without forming at least
Starting point is 00:01:42 implicit expectations about what the consequences of that decision will be. So whenever you're making a decision, there are implied probabilities built into that. So the question becomes, are you better off with implicit probabilities that you don't recognize as probabilities or explicit ones. And I think one of the major takeaways from the forecasting tournaments we've been running is that when people make explicit judgments and they're fully self-conscious about what they're doing, they can learn to do it better. And you're talking about the Good Judgment Project? Can you maybe introduce us to that a little? Sure. Well, the Good Judgment Project is a research program that my wife, Barbara Mellers and I, started several years
Starting point is 00:02:28 ago, it was supported by a branch, research and development branch of the US intelligence community known as IARPA, Intelligence Advanced Research Projects Activity, which models itself after DARPA in the Defense Department. And their mandate is to support research that has the potential to revolutionize intelligence analysis. So working from that mandate, they decided in 2010 to support a series of forecasting tournaments in which major universities would compete, researchers at major universities would compete to generate accurate
Starting point is 00:03:02 probability estimates of possible futures of national security relevance. And we were one of the five teams selected for the competition in 2010. The tournaments ran from 2011 to 2015. They ended in June
Starting point is 00:03:17 of this year. And the Good Judgment Project, I am proud to say, was the winner of those forecasting tournaments. and I can explain more about what winning a forecasting tournament means later if you want. Congratulations, yeah, definitely. Is there a difference between forecasting and predicting? I don't see one.
Starting point is 00:03:37 I think if you go to a phtosaurus, I think we're going to find their virtual synonyms. Some people may try to draw distinctions of one sort or another, but I see them essentially as distinctions without a difference. And so were you using a representative subset of the Good Judgment Project, or were you using super forecasters from the project, or how are you competing in that? Well, different universities and different teams of researchers took different approaches to generating accurate probability estimates. We recruited thousands of forecasters,
Starting point is 00:04:10 and we explored a number of different techniques for eliciting the best possible probability estimates from those forecasters. We are continually running experiments. And one of the experiments we conducted was to identify top performers in each year, the top 2% of performers each year, cream them off into teams, elite teams with super teams of super forecasters, and give them as much support as we could, intellectual support as we could, for their task, and see what would happen. And they really went to town. They did a phenomenally good job. blew the ceiling off all of the performance expectations that are up ahead for what was possible. And frankly, they certainly exceeded my expectations as well.
Starting point is 00:05:00 So some of us are good and some of us are bad and some of us seem like way off the chart at making predictions. Why are some people so good? That is indeed the $64,000 question. Why are some people so good? So the skeptics argue that if you toss enough coins, enough time, some of them are bound to come up heads. So the super forecasters are just super lucky. So let's treat that as kind of the default skeptical hypothesis. There's nothing special about super forecasters. If we ran a tournament in which the task was, say, to predict whether a fair coin would
Starting point is 00:05:34 land heads or tails, some people would do better than others just by chance in a given year. We could anoint those people as super coin toss predictors, and we could say, well, how are they going to do next year? and what we would find is perfect regression toward the mean. The best prediction is that the super coin toss predictories in year one will be essentially around the average in year two. And the worst predictories will progress upward toward the mean, of course. So that's what a pure chance environment would look like. Well, what we find in the Europa tournament
Starting point is 00:06:05 is that there certainly is an element of chance in predicting geopolitical and geo-economic outcomes, but the skill luck ratio seems to be about 70-30. So you're not observing a great deal of regression toward the mean among super forecasters, but there inevitably is some regression toward the mean among the top performers. And so what makes those people so good? Well, now that we've eliminated or at least rendered implausible, the super lucky hypothesis, the question becomes, what are the attributes these super forecasters have?
Starting point is 00:06:36 You might think of them as being stable, like the logical attributes, are they score higher on measures of fluid intelligence or crystallized intelligence or active open-mindedness, did they have certain attitudinal profiles, certain behavioral profiles? And the answer is all of the above. They differ from ordinary mortals in a host of ways. They're not radically different from ordinary mortals, but they are systematically different. They tend to score higher on measures of fluid intelligence. They tend to score higher measures of active open-mindedness. But if I had to identify one factor that I think best distinguishes super forecasters from, from other forecasters who are equally intelligent and equally open-minded, it is that super
Starting point is 00:07:18 forecasters believe that probability estimation of real-world events is a skill that can be cultivated and is worth cultivating. And they're willing to make that commitment, that effort. So when people ask me, how could the super-forecasters have outperformed, say, intelligence analysts who do this full-time and have access to classified information, I think the short answer is it's not because they're smarter and it's not because they're even more open-minded, although they are pretty open-minded. It's because they are willing to make this commitment, this act of faith, that there is a skill that is worth cultivating. So in the book, we quote Aaron Brown, who's the chief risk officer at AQR and also a great poker player, that his view is you could distinguish great players from talented amateurs on the basis that great players are good at distinguishing 60-40 bets from 40.
Starting point is 00:08:09 60 bets. And then he paused and says, no, maybe more like 55, 45, 45, 55. The greatest players tend to be extremely granular in their assessments of uncertainty. One of the big questions I think that IARPA wanted us to answer and that I think we have answered in the affirmative is, does granularity in assessments of uncertainty pay off not just in poker, but when you're making messy real-world judgments, like whether Greece is going to leave the Eurozone or what kind of mischief Putin might be up to in the Ukraine next or what's going to happen with Sino-Japanese relations in the East China Sea or there's going
Starting point is 00:08:50 to be another outbreak of bird flu in a given region. These are extremely idiosyncratic one-shot historical events. It's not like poker where you're sampling from a well-defined sampling universe, repeated play, quick feedback. So there are a lot of people, very smart people have been skeptical for many decades, that it's even possible to make probability estimates of these kinds of intelligence analytic problems. And I think what the IRPIT tournament has proven, beyond reasonable doubt in my opinion, is that there is room for improvement. It's possible to make these probability estimates. It's possible to get better at it. It's possible to identify the kinds of people who learn to do it better. It's possible to develop training modules to help people
Starting point is 00:09:32 do it better. And the gains and accuracy are appreciable. So what happened when you took average people and you started giving them, I think I remember this, that you started giving them a course and probability? We get about, for average forecasters who are randomly assigned to an experimental condition in which they get connem and style deb-biasing exercises, the improvement is in the vicinity of 10%. And that's a big effect when you consider that, you're talking about improvement across the entire year of forecasting, and this training exercise takes.
Starting point is 00:10:07 takes about 50 minutes. And what did that consist of this 50 minute training exercise? Some basic ideas about heuristics and biases and how to check biases. For example, one of the classic coneman arguments is that people don't give enough weight to statistical or base rate information in assessing the probabilities of events. They're too quick to take the inside view. So if you're attending a wedding and you see the happy couple and you're impressed by how much in love they are
Starting point is 00:10:43 and the enthusiasm of the moment and someone asks you, how likely are they to get divorced, you're not likely to consult national divorce statistics for that SES subgroup. You're likely to say, hmm, they look really happy and compatible. I'm going to touch a very high probability to they're not getting divorced.
Starting point is 00:11:00 And the net result of making predictions in that way is that you're going to be, what is less accurate than you would have been if you had at least started your estimation process by saying what are the base rates of divorce and now I'm going to adjust that based on whatever idiosyncratic factors are present in this particular relationship. So starting with the outside view and working your way inside? Exactly. Start with the outside and work inside. That's a, it's one of our mantras. So, but isn't Conman famous for saying that he's studied biases his whole life and he feels
Starting point is 00:11:32 like he's no better at avoiding them. So how does this 50-minute training exercise come in and help people? Well, Danny Kahneman was a colleague of ours at Berkeley. My wife and I bar, we know him well. And we know that he is more pessimistic about the prospects for debiasing than we are. He did give us advice on how to design the debiasing modules. I think he probably is more of a pessimist still than we are, but I think he is persuaded that these improvements are real.
Starting point is 00:12:02 They certainly seem to be. So one of the keys to keeping track of forecasting and your ability to predict is kind of keeping score. And do you think it takes a certain type of person to want to keep score? I mean, most of us are happy to kind of weasel out of or use uncertain wording or jargon when we're going about making decisions so that even if we're wrong, we can kind of say, well, that's not what I meant. Absolutely. It does take a particular type of person, and there are many factors that come into play.
Starting point is 00:12:36 I think it certainly helps to be open-minded, but there are other things that come into play. They're a little more, say, sociological. I've been doing forecasting tournaments for over 30 years now, and I started when I was about 30 in 1984. I'm 61 years old now. So if I were an intelligence analyst, a 61-year-old intelligence analyst, I would be a very senior analyst. Um, and let's just say for sake of argument that, that I, uh, I am a senior analyst in, in, in the U.S. intelligence community. I'm on the national intelligence council say, just for sake of argument. And I, I'm the go to guy on China. So when Xi Jinping comes into town, people say to me, you know, what's what's going on. I have inputs into the presidential daily briefing and help with national intelligence estimates. And I'm at the top of the status pecking order within the IC on China. And someone comes along like IARPA is, this upstart research and development branch for the Office of Director of National Intelligence. And they say, hey, you know what we're going to do?
Starting point is 00:13:36 We want to run forecasting tournaments now. And everyone's going to compete on a level of playing field. And 25-year-old China analysts are going to compete against 61-year-old analysts like Tetlock. And we're going to see who does better. Are the 61-year-old analysts going to welcome this development? No. To ask us to answer. Even open-minded 61-year-olds are not going to be very enthusiastic about this.
Starting point is 00:14:03 They're going to argue that these turn limits just don't really capture what makes my judgment special. And that is indeed a lot of the resistance we run into for forecasting tournaments. I mean, in the book, you may remember we talk about the parable of two forecasters at the beginning, Tom Friedman and Bill Flack. Almost everybody who reads newspapers knows who Tom Friedman is, famous New York Times columnist, Middle East expert, often in the White House or Davos, God knows where. And Bill Flack, nobody has a faintest idea who he is because he's an anonymous, retired irrigation
Starting point is 00:14:35 specialist in Nebraska who happens to be a super forecaster. And we know a tremendous amount about Bill Flack's forecasting track record. We know almost nothing about Tom Friedman's forecasting track record. Right. And that's in substantial part because Tom Friedman's forecasts, and he does make forecasts, are embedded in vague verbiage. He says that this could happen or this might happen. And when you say something could or might happen, that could mean anything from 0.1 to 0.9
Starting point is 00:15:02 in probability terms. And, you know, if it happens, I can say, well, I told you it could. And if it doesn't happen, I can say, look, I merely said it could. Right. You can't get paid down. You've covered very nicely. Yeah. Do you think that that's one of the problems with organizations?
Starting point is 00:15:18 I mean, it seems like we're not getting better as organizations at making decisions, in part, because our ability to keep score is, you know, hampered by these psychological kind of effects where, you know, if I keep score, I might be wrong, so my incentive is not to. And if I use precise wording, it might be wrong, so my incentive is not to. Yes. Yeah, I think there's a whole mix.
Starting point is 00:15:41 There's a real mixture, powerful mixture, of psychological and political forces that interact to create a lot of resistance to forecasting tournaments. So even though I think we have shown that forecasting tournaments can appreciably improve probability estimates, there are a lot of reasons why organizations don't adopt them. One is the people at the top of the status hierarchy are not very enthusiastic. Bob, who's in the CEO suite, isn't all that enthusiastic about being discovered that Bob in the mailroom is just as good as he is at anticipating trends
Starting point is 00:16:12 relevant to the company's future. So you have the status hierarchy problem. People at the top don't want to be a second guess. They don't want their judgment process to be demystified A large part of status in contemporary organizations is that there's something special about your judgment. So even open-minded high-status people are going to be reluctant to do this because it's going to look like a career-damaging move. So there's certainly that. And there's a lot of other factors in play. I mean, there's, again, this Kahneman argument that people don't pay attention to the outside view. In the book we talk about a mistake that a New York Times, famous New York Times journalist, David Leonhard,
Starting point is 00:16:53 You may know him. He runs the upshot in the column and the New York Times. He's a quant-savvy journalist. And he made a mistake in 2012 that we talk about that illustrates just how tenacious the misconceptions can be. He was commenting on the Supreme Court decision to uphold Obamacare in 2012. It was a narrow decision. It was 5'4.
Starting point is 00:17:19 and he noted that the prediction markets had had futures contracts on this decision, on the Supreme Court decision, and they were pricing it at about a 75% probability of the law being overturned. Okay, so they were way off. And he said, well, how far off is way off? He said, well, they got it wrong. He just said flat out got it wrong. That doesn't account for the complexity, right? That itself is wrong.
Starting point is 00:17:48 It certainly isn't good news. that the prediction market that it was on the wrong side, it may be by that margin, but prediction markets have generated hundreds of forecasts over many years, and they've proven to be pretty darn well calibrated, which is another way of saying, when they say 75% probability of something happening, things happen about 75% of the time, and they don't happen about 25% of the time. So even if you have a perfectly calibrated prediction market system doing, when it says 75% 25% of the time, smart observers, observers are smart as David Leonhardt are going to be tempted to conclude that you're wrong and to dismiss you. So this creates
Starting point is 00:18:25 a huge political incentive to stick with vague verbiage. If they simply said it could be overturned, you know, they would be well positioned to explain it either way. But because the prediction market was generating these precise probability estimates and because people don't take the outside view and say, well, we can't just look at that particular forecast, we have to put it in the context of all these other forecasts that the system is generating, take the outside view toward the system, people have a very hard time doing that. David Leonhardt knows that this is true, and he's even written later on the upshot about situations in which I read about this fallacy.
Starting point is 00:19:03 So if someone as smart as that, who doesn't have a grudge against prediction markets can make a mistake like that, you can see why politically savvy intelligence analysts might be reluctant in a blame game culture like D.C. do it. Right. I think one of the most interesting parts of the book for me was when you started talking about the Fermi-style thinking. Can you introduce us to that? Well, Enrico Fermi was Italian-American physicists who developed the first nuclear reactor at the University of Chicago. He was involved in the development of atomic bomb in World War II. And he was known for his rather flamboyant thinking style. He was continually coming up with innovative.
Starting point is 00:19:46 ways of estimating the seemingly unestimatable. One of the famous examples of a Fermi problem was, you know, it sounds really weird. It was to estimate the number of piano tuners in Chicago. Other examples might be estimating how much the Empire State Building ways
Starting point is 00:20:01 or estimating the likelihood of extraterrestrial civilizations elsewhere in the Milky Way. Sounds a lot like the brain teasers that Google used to ask to hire, right? Exactly. Now, I don't know whether Google, whether the legal department It still allows Google to continue using those for screening potential personnel.
Starting point is 00:20:20 But they are interesting tests of how people approach problems. And what was so interesting about the way that Fermi approached it? He really believed in flushing out your ignorance and decomposing the problem into as many tractable components as possible. So you would start by how many stars are there in the Milky Way, roughly about $100 billion. you'd say, well, how many of these stars have planets orbiting around them? You might look at the most recent data from Kepler, which has done some reconnaissance in our local area, about 60 light years around, and you say, well, you know, it looks like a fair number,
Starting point is 00:21:00 a pretty high percentage of stars do seem to have planets going around them. Let's say it could be as much of half or maybe slightly less. I don't really know the answer to that question, but you make initial guesses. You flush out your ignorance. And then other people can come back and they can see that Tetlock said about half and they say, oh, Tetlock doesn't understand what Keppler is doing. It should have been 70%. No, it should have been 30%.
Starting point is 00:21:27 But it's not that Petlock is getting it right. It's that we're flushing out Tetlock's zone of ignorance and we're making it clear and it's all open and transparent. And then we would, you know, in the net process of the inquiry would continue how many planets are in the habitable zone. And you direct some further guesstimate from Kepler. It's a fairly small fraction of planets seem to qualify for that. And, but that still might leave you with, say, as many as 500 million to a billion planets that are potentially inhabitable zones. And then you'd have to make some estimate about how likely is life to jumpstart if you have a planet in a habitable zone
Starting point is 00:22:06 and how likely is intelligent life to emerge once you have. And there are different evolutionary theorists who have different models that at least have some of different implications as answers to those questions. And what you would wind up with would be ranges of probabilities. Now, for this particular problem, the range of possible probability is going to be very large.
Starting point is 00:22:26 You know, we know it's not impossible. There's another advanced extraterrestrial civilization in the Milky Way. We also know it's not a sure thing. It's probably, you know, In my best estimate, if I were to combine all the different steps that we just started to work through, it would be probably more than one or two percent, but I don't think it would be as high as 90 percent. It would probably be maybe it could take between two and 50 percent. Now, that's a guesstimate.
Starting point is 00:22:54 Now, there's nothing special about that number, but what Tetlock has done now, if he's flushed out, Tettlock me, I'm talking about my south third person here, what the Fermi person, the Fermiizer, the person using the Fermi method is done, he or she has flushed out all the different points of ignorance along the reasoning continuum. And you, the observer, can say, oh, look, Tetelac made a really stupid estimate here, and you have to adjust that, but it's a basis for proceeding. But initially look like a hopelessly intractable problem, at least becomes at least a little more tractable.
Starting point is 00:23:29 And that's what Super Forecasters are pretty good at doing. at breaking down seemingly intractable problems into semi-attractable components and then just pushing, they're not afraid of looking stupid and making estimates that observers can see and look at it and say, oh, my God, why did you say something that's stupid about the capital project? That's an incredible point where you're taking this big intractable kind of problem that's very hard to pin down, and you determine you have some organized process for determining the sub-components involved to get you there, and then you go through and estimates, So part of that would be highlighting your thinking, right?
Starting point is 00:24:05 Yes, sir. And then part of that would be like, I really don't know anything about this question. So can I break that down further into subcomponents, or am I extrapolating too much? No, that's exactly the spirit of the enterprise. So why is that style of thinking? Why does it lend itself, do you think, to better forecasting? Is it just the nature of the changing the framing of the problem itself, or do you think it's more the curiosity of the people who are willing to break it down and go through, it sounds like a lot of work.
Starting point is 00:24:36 It sounds very demanding and mentally taxing to do that versus just throughout an estimate with your immediate response. You're exactly right. It is demanding, and I think it works best if it's done in a team environment in which members of the team have mutual respect for each other, but they're also willing to push each other hard. So if you were an organization and you wanted to set up a team environment, like a forecasting team within a large company, say IBM, how would you go about doing that with your
Starting point is 00:25:08 knowledge? That's a great question. And I'm a little bit wary about saying that organizations should try to construct super teams the way the Good Judgment Project did. Because team construction has a lot of implications for other parts of the organization. That can be tricky. I mean, imagine that if you just did what we did in the IARC tournament to win it, and you just identified the very best people, brought them together and nurtured them and helped them and pushed them hard, that would be a very elitist and somewhat divisive thing to do in many organizations. And it could cause a lot of political friction.
Starting point is 00:25:49 Now, we didn't care a lot about that because we were in a forecasting tournament. We didn't really have an organization in the traditional sense of the term. We wanted performance engine. right we wanted to harness human ingenuity individually and collectively as rigorously as possible to generate as accurate as possible probability estimates for things that you tell this community cared about that was it was a pure accuracy game and we weren't we weren't that interested in the long-term viability of the organization we were interested in just pure accuracy so I would I would be a little cautious about saying you know it's really easy all you do is you recruit these super forecasters and you put them into these teams and you give them some training on how to do precision questioning, and you give them some training on how to do constructive confrontation, and you've got these anti-group think norms enforced, and you give them some training and guidance and probabilistic reasoning, and you encourage a certain self-critical structure and culture inside the teams, and boom, an amazingly accurate forecasts emerge.
Starting point is 00:26:48 It works pretty well in the forecasting tournament environment, but whether it would work well in an actual organization, I think the senior executives who want to think carefully about each step along the way there. What would you say to people inside an organization, how can they use your research to make better decisions inside their company? Well, I think it's something you want to consider seriously that when people make forecasts inside most organizations today, accuracy is only one of the goals that they're pursuing. They're also interested in making forecasts that are going to be difficult to falsify.
Starting point is 00:27:26 I said they can't be embarrassed. So a lot of the forecasting inside organizations doesn't involve numbers. It involves a lot of vague verbiage. They're also interested in making forecasts that don't annoy other people in the organization. They don't want to tip the political apple card over. So they're compromising accuracy in a whole host of ways that help promote their careers inside the organization, help to maintain political stability in the organization. but that aren't all that centrally focused on accuracy.
Starting point is 00:27:57 Forecasting tournaments are really weird because they focus 100% on accuracy. That's all that matters. So I guess the thing you'd want to consider as an executive would be do I want to reserve part of my organization's analytical processing capacity for a pure accuracy game? So I want to incentivize some small group of the people in my organization to play pure accuracy games in forecasting tournaments.
Starting point is 00:28:23 and those probability estimates would then filter up to senior executives to guide decision-making. I think it's really an interesting experiment to consider doing. I think the intelligence community has been moving somewhat in that direction. I think it's a good idea, and I think it would probably be a good idea for many other entities as well, at least to consider. It's in the spirit of the whole IARP enterprise is to run experiments. And what I would propose would be that senior executives consider running experiments in which they see what do they, what do they?
Starting point is 00:28:53 discover when they incentivize people to play true accuracy games. And do you think what transfers from your research into the decision-making process in a corporation, not necessarily about forecasting, but about how we go about organizing, unpacking, synthesizing, multiple views, how does that transfer do you think into a learnable skill that people can have inside of an organization? There are many ways that could happen. We put a lot of emphasis in the Good Judgment Project on synthesizing diverse views into aggregate forecasts. And I think one of our major performance engines was the statistical or aggregation algorithms that our statisticians developed for doing that. When IARPA started this whole exercise, they thought it would be really hard to do better than 20 or 30 or 40% better than the unweighted average of the control group forecasters.
Starting point is 00:29:48 And our super forecasters exceeded that performance benchmark. quite substantially each year of the tournament. They did so well that IARPA essentially suspended the tournament after two years and we were able to absorb the other teams into our teams in substantial ways and compete against the intelligence community and against the prediction market baselines instead of the other universities. Now, how did all that come to pass? I think the aggregation algorithm developed, and if I had to credit two big things,
Starting point is 00:30:21 as responsible for the victory at the Good Judgment Project. One of them would be the super forecasters, and the other would be, they would call them super algorithms, the great algorithms that our statisticians develop. Now, when I describe these algorithms, at some level you're not going to be too surprised at first, but there is one aspect of them that does surprise most people. So the first thing to do,
Starting point is 00:30:44 I don't know if your listeners are familiar with the James Serwicky wisdom of the crowd book, But it's been well known in the forecasting world that the average of a group of forecasters, the average forecast from those forecasters, is going to be more accurate than most of the individuals from whom the average was derived. And this is the famous Galton story about the ox. You had hundreds of people trying to guess the weight of the ox. And the average of all those guesses was only about one or two pounds off from the
Starting point is 00:31:17 original from from the true weight of the ox and that would that means it was more accurate than all of the individuals from the average was derived so averaging is a powerful way of synthesizing information from diverse perspectives it's it's really it's a remarkably crude approach to doing it but it works pretty darn well and that's why iarpa used it as its benchmark now we were able to be averaging by doing some simple things like giving more weight the better forecasters as we get more and more data on who the good forecasters were, who the more intelligent forecasters were, who the more frequent belief updators were, various attributes of forecasters, we're able to give more weight to certain forecasters and we created weighted averages.
Starting point is 00:31:58 The weighted averages beat the average. That's not too surprising, is it? I mean, it makes sense. It's not astonishing, though. Now, here's the interesting thing that the algorithms did. They did something called extremizing. And to illustrate extremizing, I want to just to have a little digression of a story that we do talk about in the book about the decision President Obama made to go after Osama bin Laden. In the movie 030, they have a scene in which senior analysts are being polled on how likely they think it is that Osama bin Laden is in that compound.
Starting point is 00:32:39 And putting aside what Hollywood says about it. Let's just do a little thought experiment. And imagine that you're the president of the United States and you have these senior advisors around the table and you ask them, how likely is it that Osama is there? And each of the analysts around the table says, do you miss the president? And I think the answer is 0.7.
Starting point is 00:32:57 0.7, 0.7. Everybody around the table says 0.7. What should the president conclude is the likelihood that Osama bin Laden is in that compound? And the short answer to that is, well, if the advisors are all clones of each other, and they're drawing on exactly the same information and processing it in exactly the same way,
Starting point is 00:33:16 the answer is 0.7, because there's no information added, right? But imagine that the analysts say 0.7 all around the table, but the analysts don't know each other, and they haven't been sharing information, and each analyst bases his or her 0.7 judgment on information that only he or she has. So you have extreme diversity of perspectives. One person has satellite information, another has encryption breaking stuff,
Starting point is 00:33:42 and another one has human intelligence and so forth. But they're siloized, and they're coming together for the first time, and each one has independently arrived at this 0.7 estimate from very different sources of information. You've got true diversity here. And is the answer still 0.7? Should the president say, shrug and say, well, I think the answer is 0.7, or should the president say, gee, each of you has very different reasons for believing 0.7? this leaves me to suppose that the answer is probably more extreme than 0.7,
Starting point is 00:34:11 because if each of you knew the reasons the others had, you would probably become more extreme. And that's exactly what the best algorithm did. It extremized as a function of diversity. So 0.7 was turned into 0.85 or 0.9. That's fascinating. I mean, how did it go about doing that in terms of aggregating the data from the people or from the forecasters?
Starting point is 00:34:34 That's right, from the forecasters. And what would happen if you had two forecasters who have great track records, and then they're divergent on, they're really divergent on an opinion or a forecast? Is that happen often? No, it doesn't happen very often, actually. But if it did happen, it would be a real cautionary moment. If you had two super forecasters, one of whom was at 0.9, there was a 0.1, my inclination would be not to stray too far from 0.5, knowing nothing else. at the moment. Are there certain types of questions to avoid if your desire is to have an accurate prediction? Yes. Well, there are many questions in the IARPA tournament. There are many questions
Starting point is 00:35:20 in life in which there's a massive amount of irreducible uncertainty. If you want to be a good forecaster, you don't spend very much time working on roulette wheel type problems. I mean, if you go to, if you visit casinos, you'll find lots of people who think they can detect patterns and roulette wheel spins. And they develop little algorithms even to help them. But what they're doing is they're essentially modeling randomness. So spending a lot of time modeling randomness is a good way not to become a super forecaster. What other types of questions would you say don't lend themselves to, is it like a time
Starting point is 00:35:55 duration? Is it? Oh, what other kinds of questions are roulette wheel like? Well, not roulette wheel, but if you're, what kind of, what kind of, What type of questions lend themselves to better predictions, right? Is it short time, very few, I mean, I don't want to say very few variables, but short time duration versus long time duration, because you have to constantly update over a long period of time, right?
Starting point is 00:36:20 I mean, that was one of the things that super forecasters did was they updated there. Yes, that's true. Well, yeah, all of the things equal, it's usually easier to predict questions with shorter time ranges than longer time range. But that's not always true. I mean, some short-range questions are extremely unpredictable. It's very hard to say whether the stock market is going to go up or down tomorrow. So that's a short-range question.
Starting point is 00:36:45 In some ways, it's easier to predict where the stock market is going to be up or down 10 years from now relative to now than it is tomorrow, right? That's a good point. So there are categories of problems in which you get a reversal of that. But, yes, I think by and large it's true that the analogy to vision would be, right? It's easier to see the Snell and I chart if you're close to it, and you're far from it. Probabilistic foresight is better in shorter time ranges.
Starting point is 00:37:11 That's one of the things I talk about in the book, one of the reasons why my later work is different in emphasis from my earlier work, in which experts had a hard time beating the dark-thowing chimpanzee, because they were in the earlier work making much longer-term predictions than they were in the IARPA work, where the predictions were rarely much more than a year. You mentioned open-mindedness at the beginning.
Starting point is 00:37:34 How do we go about fostering open-mindedness? Are there ways that we can improve that in ourselves or other people? Well, that's another thing we do try emphasize in the training. Exerting people simply to be open-minded is most people don't think they're close-minded. Most people think they're quite reasonable. And simply exhorting people to be open-minded, people struggle and say, well, yeah, I already am. I think you want to start in a more specific ways. You want to start with very specific problems in which you assess whether people change their minds in an appropriate
Starting point is 00:38:12 way. So there are some normative models like base theorem that tell you how much you should change your mind in response to evidence that has certain diagnostic value. And you can create simulated problems. It may be medical diagnosis problems. It might be economic problems. They might be military problems, but you can create simulated problems with simulated data, and you can see whether people learn to practice to update their beliefs the way they should. Now, there's always a question whether those lessons are going to stick, and we found that they do stick a little bit because they can produce 10% improvement throughout the year. But it's one of the great challenges.
Starting point is 00:38:51 I don't think we've solved the problem of how to make people more open-minded. I think we can make people better belief updators on problems where they don't have very strong ideological priors or preconceptions. But when people have really strong emotions and ideological convictions about presidential candidates or economic policy or whatnot, belief updating becomes quite problematic. Yeah, I mean, I can see why that would be a problem, right? It contradicts probably something that you hold very dear and true. Giving that up would take a lot of mental issues. Yeah, we can make people a bit more open-minded, but making people perfect Bayesian belief updators is something that no one has achieved yet, and I think it will be very difficult to
Starting point is 00:39:37 achieve. I think we should keep working on it. I don't think we should give up. Do you think that super forecasters were better at learning from the other super forecasters than the, say, average forecaster? Like if somebody had a better approach, would they copy it? Would they just drop their own internal approach? I think they listened to each other quite carefully in the super forecaster teams. Even when they disagree with each other, they disagree diplomatically, but they can disagree quite forcefully about what lessons they should draw from particular forecasting failures
Starting point is 00:40:08 or even forecasting successes. I mean, it's fairly common for regular forecasters even to say, well, what did we do wrong with the forecasting failure? And supers do that too, but they also second-guess their successes. They say, well, were we lucky, we really mailed this question, but were we lucky? Could it have gone otherwise? Were we almost wrong? That's an unusual question for people to ask themselves.
Starting point is 00:40:40 People don't normally look a gift horse in the mouth. When they're right, they want to take credit for it. And super forecaster skepticism even extends to their forecasting successes. I can't imagine a lot of the average or below average in terms of forecasting ability people went through their successes and evaluated them from that angle. What would you say is the role of intuition in forecasting, or would you say that it's minimized, or would you say that it's... This is one of the big debates in the field of judgment and decision-making.
Starting point is 00:41:15 Malcolm Bladwell wrote a book called Blink, and some psychologists wrote a rejoinder book. much less widely read, called Think. There are different schools of thought about the value of intuition. And even Gladwell, of course, has devised it in his book. He did point to some great successes of intuition, but also noted the situations in which intuition could lead you seriously astray. I think the dominant emphasis in our work, it leans toward think over blink.
Starting point is 00:41:46 I'm not ruling out the possibility that there are super-forecasters who do rely on intuition but the problems that we're dealing with in real world are different from the sorts of problems where brilliant intuition has been demonstrated pretty rigorously so it's not like chess where you're playing the same game with well-defined rules right the pattern recognition really smart people can do extremely rapid forms of combinatorics and pattern recognition and and it's quite astonishing what they can do. Real world isn't quite like chess, is it?
Starting point is 00:42:27 And I think that it requires more subtlety and more willingness to second guess yourself because history, I think it was Mark Twain who said history doesn't repeat itself, but does rhyme. And I think Super Forecasters sort of get that. There are patterns in history, but they're quite subtle and they're quite conditional. And you can easily over-learn from history.
Starting point is 00:42:50 Hmm. That's a really good point. What book would you say has had the most impact on your life? On my life. On my life? That would have to be a book I read very early on in my life. Oh, possibly, yeah. Yeah. I think, well, I don't know how far back we should go on this one. I mean, if I were to go back to graduate school, say, when I was making decisions about what I would do with my research career, there was a book by Robert Jervis who still, I think he's an, he's maybe an emeritus, professor now at Columbia, but he's a very senior political scientist. He wrote a wonderful book in 1976 that I was in graduate school in 19, I just started in graduate school in 1976. And it's called perception and misperception in international politics. And it is a wonderful synthesis of psychology and political science. And I think it is a synthesis of the
Starting point is 00:43:42 sort that I've aspired to, I've tried to be Jervisian in my work in many ways. Now, Dervis is not quantitative researcher. He's qualitative, whereas I'm more quantitative. So we differ in a number of ways. But I have a deep respect for how he was trying to synthesize the psychological and the political. And I suppose if there's any themes running through my work at synthesizing the psychological and the political. So the last question is, who would you like to see interviewed on the show and their thoughts articulated or explored with me? Well, I've always been a fan of Michael Lewis's work. I think he would be a fun person to talk to, and I think he may be working on a biography of Daniel Kahneman, the name of Tversky. I think that would be an interesting
Starting point is 00:44:29 conversation. Well, excellent. Thank you so much, Phil, for taking the time. I really appreciate it. It's been a great conversation. Oh, it's a pleasure. Hey, guys, this is Shane again. Just a few more things before we wrap up. You can find show notes at Farnhamstreetblog.com slash podcast. That's F-A-R-N-A-M-S-T-R-E-E-T-B-L-O-G.com slash podcast. You can also find information there on how to get a transcript. And if you'd like to receive a weekly email from me filled with all sorts of brain food,
Starting point is 00:45:07 go to Farnhamstreetblog.com slash newsletter. This is all the good stuff I've found on the web that week that I've read and shared with close friends, books I'm reading, and so much more. Thank you for listening. You know, I'm going to be.
