What if the experts were right to put Trump’s and Brexit’s chances at 30%?
It’s the quiet week between Christmas and the new year, so I’m taking the opportunity to briefly set investment, pensions and nonprofit programs aside and sneak in a post that touches on some of my favorite subjects: probabilities, forecasting and uncertainty. Specifically: what are the right—and wrong—lessons about forecasting from Donald Trump’s victory and the UK’s Brexit vote?
What if markets were right to put Trump’s and Brexit’s chances at 30%?
June’s vote in the UK in favor of Brexit and Donald Trump’s election victory here in the U.S. in November were two of the biggest stories of 2016. Both were events that, in advance, markets judged as probably not going to happen, each being given a probability of perhaps 30% (see endnote). So, naturally, the general reaction post-event in both cases was that experts and forecasters and markets were wrong not to have seen the results coming.
But suppose for a moment that, based on all of the information actually available prior to either vote, 30% really was the correct probability to assign to each event. Admittedly, that is a big “if”. But let’s suppose it’s the case. Then even though each outcome was slightly unlikely, neither should really have been seen as a huge surprise: 30% is a long way from zero. Events such as these will always involve some degree of uncertainty no matter how good the forecasting process. So even the best-possible forecasts will often give less-than-50% probabilities to events that turn out to happen.
Brier scores
Obviously, if forecaster A assigns a 30% probability to an event and forecaster B assigns the same event a 70% probability and that event turns out to happen, then forecaster B made the better call. But if you make ten 70/30 calls, you’re not really expecting to be “right” on all of them (or else you should be rating them as 100% shots). The whole point of assigning probabilities is that there’s uncertainty involved.
As it happens, there is a way to quantitatively assess this type of forecast. It’s called the Brier score. The math is not especially complicated: if you assign a particular event a probability of x (expressed as a decimal), then you score 2(1 − x)^2 if that event does happen and 2x^2 if it doesn’t. So if you give a 30% probability to an outcome, you’ll score either 0.18 if it doesn’t happen or 0.98 if it does. Low scores are good in this system.
Thus, if you assign a 30% probability to ten independent events and three of these turn out to happen, your average score will be 0.42. That score can be compared to those of other forecasters who’d given their own probabilities to the same ten events (although, sadly, you cannot directly compare it to scores based on different events, since some things are easier to forecast than others). The outcomes of ten events may not be enough to distinguish the better forecasters from the lucky ones, but if you keep going then it does become clear over time whether one forecaster is consistently making better calls than another. Reaching a conclusion based on just two outcomes (even outcomes as high profile as Brexit and a U.S. presidential election) is just silly.
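The arithmetic above is easy to check for yourself. Here is a minimal sketch of the scoring rule as described (the binary case, with the factor of 2 used in this post; function names are my own):

```python
# Brier scoring as described above: forecasts are probabilities (decimals),
# outcomes are True if the event happened, False if it did not. Low is good.

def brier_score(probability, happened):
    """Score a single forecast: 2(1-p)^2 if the event happened, 2p^2 if not."""
    if happened:
        return 2 * (1 - probability) ** 2
    return 2 * probability ** 2

def average_brier(forecasts):
    """Average score over a list of (probability, happened) pairs."""
    return sum(brier_score(p, h) for p, h in forecasts) / len(forecasts)

# Ten independent 30% calls, three of which came true, as in the example:
calls = [(0.30, True)] * 3 + [(0.30, False)] * 7
print(round(average_brier(calls), 2))  # prints 0.42
```

A forecaster who had instead rated the three winners at 70% and the rest at 30% would average well under 0.42, which is the whole point: the score rewards well-calibrated probabilities over many events, not lucky single calls.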
As Philip Tetlock pointed out in his book Superforecasting (reviewed a year ago in this blog), forecasts are unlikely to get better unless results are measured: “We can all watch, see the results, and get a little wiser. All we have to do is get serious about keeping score.”^{1} That would mean getting over the knee-jerk mentality that feted Nate Silver as a genius for calling all 50 states in the 2012 Presidential election (even though he rated some as little more than 51/49 probabilities) and dismissed him as just another “so-called expert” four years later. Systematic measurement requires a bit more effort than that.
The bottom line: it may be a mistake to conclude from 2016’s events that experts know nothing or that forecasting is meaningless. Rather, it could just be that we need to remind ourselves that a 30% probability does not mean that something won’t happen.
Endnote: 30% and our understanding of “probably not”
Assessments of the probabilities fluctuated considerably in the run-up to each of the two votes and also varied depending on the source (e.g. bookmakers’ odds, prediction market odds, or the analysis of commentators such as fivethirtyeight.com). In the Brexit case, prediction market probabilities favored a “remain” vote despite close polls; in the U.S. Presidential election, in contrast, the polls were not particularly close at all. 30% is therefore simply an imprecise, albeit broadly correct, measure of the probability assigned by a range of sources to each of the eventual outcomes prior to voting.
As it happens, 30% has history as a generic representation of “probably not.” Tetlock tells^{2} of how Sherman Kent, who was highly influential in advancing the CIA’s approach to intelligence analysis, was taken aback to discover that when his team had agreed to describe a particular event in a report as “a serious possibility,” it later turned out that team members’ understanding of what they meant by that varied from a 20% to an 80% probability. His response was to recommend the adoption of seven bands as follows:
| Certainty | General area of possibility |
| --- | --- |
| 100% | Certain |
| 93% (give or take about 6%) | Almost certain |
| 75% (give or take about 12%) | Probable |
| 50% (give or take about 10%) | Chances about even |
| 30% (give or take about 10%) | Probably not |
| 7% (give or take about 5%) | Almost certainly not |
| 0% | Impossible |
Readers familiar with behavioral finance may find parallels between these bands and findings (several decades later) regarding the way in which the human brain seems to intuitively categorize probabilities. But let’s not get started on yet another topic, even if it is a quiet week.
Happy new year to you all.
^{1}Tetlock, P. and D. Gardner (2015). Superforecasting: The Art and Science of Prediction. Crown Publishing Group: New York. p. 269.
^{2}Ibid. p. 55.