Meet the Elite Team of Superforecasters Who Have Turned Future-Gazing Into a Science
You just might learn a thing or two about how to look into your own crystal ball.
Before January 1, 2022, will the United States Olympic Committee announce that it is boycotting the 2022 Olympics?
I am in a virtual workshop that will test my abilities to forecast the future, and I have 10 seconds to answer. I’m scanning my brain. (Simone Biles, not relevant. Moscow, 1980, yes. Uyghurs?) But time’s up. I guess 20 percent. Then it’s on to the next questions: What is the probability the U.S. will regulate cryptocurrencies on the stock market by January 2023? Will China attempt to take Taiwan over the next five years? How big is the surface area of the Mediterranean Sea in square kilometers?
“I bet you didn’t wake up thinking you had to answer that question today,” says Warren Hatch, who is co-leading this workshop.
There are about 12 of us taking this training, including a guy from the Department of Defense. Over the next two days, we scratch our heads, trade bits of insight, try to shed our cognitive biases (more on that later), and see if we have the chops for predicting things professionally. I am definitely out, but I suspect a couple in this group qualify. Those who do will be a step closer to gaining an elite, though geeky, kind of status: It’s called a “superforecaster.” And if you are one, you can join the global network of über predictors, the best of the best, who work with the company that arranged this workshop in the first place. It is called Good Judgment. Hatch is its CEO.
While I’m at my laptop sweating it out for Good Judgment in September, experts are making headlines in the real world answering similar questions. “Inflation is elevated and will likely remain so in coming months,” predicts Federal Reserve chairman Jerome Powell; “There is a chance that we will see big declines in coming years,” wagers a Yale economist on home prices. Anthony Fauci, meanwhile, says a Northeast surge of Delta is “possible.” It’s an interesting contrast. As a culture, we’ve come to accept “Likely,” “Possible,” or “There’s a chance in the coming years” as the best our top authorities can tell us about what lies ahead. But what does likely mean, in a concrete way? Is it 51 percent odds of happening, or 85 percent? Are 2022 and 2023 considered “coming years,” or are 2024 and 2025?
We may not demand this level of specificity from our experts, but we sure need it in business. And as Good Judgment proves, you actually can quantify vague hunches like these with scalpel-like accuracy — simply with the human brain, no AI or big data.
Few people do it with more Olympian skill than Good Judgment’s superforecasters. But as with most sports, we can all get better. We just need to train.
Unlike many companies that begin life as a scribble on a cocktail napkin in a dark bar, Good Judgment was born in the belly of the U.S. government. In a way, it goes back to 9/11. After analysts appeared to miss signals of the catastrophic terrorist attack, a group called IARPA (or Intelligence Advanced Research Projects Activity) was created in 2006, modeled after the defense agency DARPA. Its goal was to conduct daring, innovative research that improves American intelligence. By 2010, the intelligence community had started using an internal classified prediction market where top-secret-cleared employees could make trades on whether an event would happen. But IARPA wondered if there was an even better way to use the wisdom of the crowd to foresee what was coming.
That’s why, in 2011, it launched a huge forecasting tournament for the public. At the beginning, there were five teams, and over the next four years, thousands of ordinary Joes and Janes would answer about 500 questions, like: Will North Korea launch a new multistage missile before May 10, 2014? Will Robert Mugabe cease to be president of Zimbabwe by September 30, 2011? The teams had to reach certain benchmarks of accuracy; if they failed, they were eliminated. After the first two years, only one team remained. It was led by Philip Tetlock and Barbara Mellers at the University of Pennsylvania’s Wharton School, and called Good Judgment.
Tetlock was already deep into the science of prediction. Back in the 1980s, he’d become curious as to why so many foreign policy experts had failed to predict the Soviet Union’s fate, and it inspired him to analyze broad swaths of predictions. As it turns out, the average expert was roughly as accurate as a dart-throwing chimpanzee. (That’s not quite how he put it, but close enough that he doesn’t mind the joke.) So he developed a more systematic approach — not just to predictions but to identifying the kinds of people who are good at making predictions. To compete in IARPA’s tournament, he and Mellers recruited 3,200 volunteers, then winnowed them down to the top 2 percent, which they called superforecasters. Among that group was Hatch, a Wall Street guy who’d left Morgan Stanley to set up his own small investment firm, and who was trading on a forecasting platform on the side.
By the fourth year of the tournament, the Good Judgment team was 50 percent more accurate than IARPA’s control team recruited from the public; in some cases, it even outperformed intelligence analysts using IARPA’s internal prediction market with access to classified information. The researchers learned a lot, and they put together a guide that the intelligence community began using to train many of their analysts, according to IARPA program manager Steven Rieber. “It’s not what we expected to find,” he says of the tournament. “The fact that there are these people who have unusual skill across domains in making accurate forecasts came as a surprise to me as well as to many others. And that we ordinary people can become more accurate in our own predictions.”
But the government wasn’t the only one to see opportunity here. As the tournament was a year from concluding, in 2014, Tetlock, Mellers, and another colleague transformed Good Judgment into a forecasting company — with a plan to use its elite superforecasters to answer clients’ questions about the future. They asked Hatch to help run it with them. And, based on his own predictions, he decided it was a good idea.
Are you overconfident? Most people would say no. But most people are wrong. That’s what Good Judgment has found—and why, when evaluating whether someone has the skills to be a superforecaster, it tests for overconfidence.
To see what that looks like, another member of the Entrepreneur team submits to Hatch’s questioning: Jason Feifer, editor in chief.
“What year was Gandhi born?” Hatch asks. Specifically, he wants a range — the earliest and the latest year Feifer thinks Gandhi could have been born. Not only that, Feifer should pick years that he is 90 percent confident he’s correct about.
Feifer laughs, because he simply has no idea. “I’m going to say 1940 and 1955.”
“It turns out,” says Hatch, “Gandhi was born in 1869.”
“Oh, I don’t know anything about Gandhi!” Feifer exclaims, embarrassed by his ignorance.
“That doesn’t matter,” Hatch tells him. The real point of the exercise, he explains, is this: Despite not having a clue what the answer is, Feifer picked a narrow range of just 15 years. He could have instead said, “Gandhi was born between 1600 and 1980,” which would have been technically correct. But Feifer was overconfident; he wasn’t willing to consider (or reveal) the things he didn’t know, and as a result, he needlessly narrowed his options and therefore his chance of being accurate. That, Hatch says, is why overconfidence leads to bad predictions.
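To make the Gandhi exercise concrete, here is a minimal sketch in Python of how one might score ranges like these. It is an illustration, not Good Judgment's actual test: the function names are hypothetical, and the only data used are the two ranges and the birth year quoted above. The idea is that someone who is well calibrated at 90 percent confidence should see the true answer land inside their range about nine times out of ten.

```python
# Hypothetical helper names; only the years come from the exercise above.

def interval_hit(low, high, truth):
    """Return True when the true value falls inside the stated range."""
    return low <= truth <= high


def hit_rate(answers):
    """Share of (low, high, truth) answers whose range contains the truth."""
    hits = sum(interval_hit(low, high, truth) for low, high, truth in answers)
    return hits / len(answers)


GANDHI_BIRTH_YEAR = 1869

# Feifer's 15-year range versus the deliberately wide alternative.
narrow_answer = (1940, 1955, GANDHI_BIRTH_YEAR)
wide_answer = (1600, 1980, GANDHI_BIRTH_YEAR)

print(interval_hit(*narrow_answer))  # the narrow range misses
print(interval_hit(*wide_answer))    # the wide range contains 1869
```

Over a longer quiz, comparing `hit_rate(...)` against the 0.9 target would show whether the ranges were too narrow, which is the pattern the exercise is designed to expose.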
Outside academia, in a culture where people want definitive answers, terms like “90 percent confidence” and “67 percent probable” may seem useless or arcane. But the world isn’t binary, argues Hatch; it is filled with uncertainty. “So rather than dealing with that uncertainty by guesses, or going from your gut, instead hold yourself accountable by using numbers,” he says. Why? The process forces you to sharpen your thinking, cast for good information, and pay attention to nuance—all of which leads to making better decisions. This requires a mind shift. If you only feel 67 percent confident in your answer, you’re acknowledging some failure up front—and creating a window for yourself to learn more.
That’s why, when Good Judgment’s superforecasters are trying to answer a client’s question, they push outside their own bubble and take time to understand other people’s experiences and opinions (and also share their own). Scattered around the world, many of them are retired or doing this work on the side, and they often bring in unusual bits of data from wherever they are. Among the ranks is Paul Theron, an investment manager in South Africa, who once tracked down a spokesperson for the Muslim Brotherhood to get the inside scoop on a question about Egypt. Another superforecaster, JuliAnn Blam, is an American who has lived in China producing theme park attractions with her company; when answering questions about that country, she goes through her back channels. “Not everything is in the press,” she says. “Sometimes you just have to listen to locals, and even then, you have to read between the lines because in China they can’t really tell you.”
Often just flipping a question (from “Is it a good time to do a capital raise?” to “Is it a bad time to do a capital raise?”) can help you see the fuller picture. Another key practice is frequently tweaking your forecast as new information comes in. “The strongest predictor of rising into the ranks of superforecasters is perpetual beta, the degree to which one is committed to belief updating and self-improvement,” Tetlock writes in his book, Superforecasting. “It is roughly three times as powerful a predictor as its closest rival, intelligence.”
Back at the workshop I’m taking, Marc Koehler, a former U.S. diplomat who is Good Judgment’s senior VP, asks us to imagine being at Prince Harry and Meghan Markle’s royal wedding. He’s setting up another core tactic of good predictions: Start with the base rate.
With Koehler’s guidance, we imagine someone at the wedding asking us what the probability is that the happy bride and groom will stay married. We think 100 percent, right? The look in the couple’s eyes is unmistakable, and there’s Charlotte with the flowers; we can already see their kids. Koehler stops us there. Our minds love a good story, he says, but that’s another thing that can derail a forecast. Instead, we should go straight to the divorce rate, which in the U.S. has been reported as high as 50 percent. “It does matter who Prince Harry is and who Meghan Markle is. It does matter that they’ve left Buckingham Palace. All I’m saying is consider that second,” says Koehler. “We know that people who start with the outside view or the base rate, and then move to consider the particulars of the case, are going to be about 10 percent more accurate.”
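Koehler's order of operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the size of each adjustment is invented for the example, and treating adjustments as simple percentage-point shifts is a simplification rather than a method described in the workshop. The only figure taken from the text is the 50 percent divorce rate.

```python
# Hypothetical sketch: start from the base rate, then adjust for specifics.

def forecast_from_base_rate(base_rate_pct, adjustments_pct):
    """Begin with the outside view, then apply case-specific shifts.

    Values are whole percentage points; the result is kept within 0 to 100.
    """
    estimate = base_rate_pct
    for shift in adjustments_pct:
        estimate += shift
    return max(0, min(100, estimate))


DIVORCE_BASE_RATE_PCT = 50                        # figure cited in the text
stay_married_base = 100 - DIVORCE_BASE_RATE_PCT   # the outside view comes first

# Invented shifts standing in for "the particulars of the case".
hypothetical_adjustments = [-5, 3]

estimate = forecast_from_base_rate(stay_married_base, hypothetical_adjustments)
print(estimate)
```

The point of the structure is the sequence: the story about the couple only enters through the adjustments, after the base rate has anchored the estimate.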
After the workshop, I challenge this point with Tetlock, since he’s the one who has done the science. Sure, starting with the base rate makes logical sense, but doesn’t it discourage risk? Nobody would get married if they thought that way — and for that matter, few would start a business considering the statistics on how many startups fail. I suggest that if you’re an entrepreneur, you may need to ignore these things — and to be overconfident! — in order to start the ambitious projects most people predict will fail. “Great point,” Tetlock says. “Success requires inspiring people, and it is hard to inspire people with a lot of ‘howevers’ in your pep talks. Overconfidence is linked to charisma. It is also linked to disaster. So think like a well-calibrated superforecaster in private — and project confidence in public.”
“It’s been a busy morning,” says Hatch at his desk in Good Judgment’s New York office via Zoom this fall, as he waves around the day’s undone New York Times crossword puzzle. He isn’t doing it just for fun. Pattern recognition is an important skill for superforecasters, so Hatch does a daily crossword, sometimes two, to stay up to speed. “Detecting the patterns and seeing what the picture might be before everybody else,” he says, “is ultimately what forecasting is about.”
But building this company has tested all of Hatch’s superforecasting skills and more.
How do you monetize the ability to find, train, and coordinate brilliant minds at seeing the future? Teaching their prediction tactics seemed logical, so Good Judgment started workshops for both individuals and companies. It also created Good Judgment Open, a free site for anyone who wants to mingle with superforecasters and try their hand at predictions, which has served as a recruiting ground. The much bigger question has been how to leverage the actual predictions from its network of superforecasters, now about 170 active members, in ways clients would actually pay for. “And this is where we’ve had our fair share of bloopers,” says Hatch.
As it turns out, many potential clients in the financial, legal, and government worlds already believe they have the top experts making the best predictions. What’s to be gained by hiring a bunch of amateurs picking away on the internet? And the truth is, superforecasters are not infallible. The group, for example, put the probability that Clinton would win in 2016 at around 80 percent. But overall, the superforecasters continue to beat the competition in tournaments held by the government. And Good Judgment was correct and early in its predictions about COVID-19, an experience that has proven instructive.
The first hint of something COVID-like appeared in September 2019 at a workshop for a Canadian financial firm. Participants were practicing a “pre-mortem” — another critical forecasting practice intended to anticipate surprises. Say you think an event is going to go one way. Before making your prediction, step back and tell the story about why it went the other way. The Canadians were doing that, trying to imagine an unusual or freak event that would change their forecast on China’s economy, and someone came up with a SARS-like epidemic. “When COVID started showing up in the headlines,” says Hatch, “they were better equipped to deal with it.” And so was Good Judgment.
In January 2020, thanks to early chatter on Good Judgment’s platform about COVID-19, Blam (the superforecaster who has done a lot of work in China) turned down another lucrative three-year theme park job in the country. “We all knew it was going to be bad,” she says. “And I didn’t want to get stuck over there.” Hatch and his team also acted quickly, realizing that people were suddenly desperate for exactly the kind of insight Good Judgment could provide. The company created a public dashboard and put its elite team to work forecasting on everything from caseload levels to vaccine timing. Soon financial firms like Goldman Sachs and T. Rowe Price started referencing its forecasts in their work. “It put us on Broadway,” says Hatch, “even if we were in a small theater.”
Using that momentum, this spring Good Judgment launched FutureFirst, a subscription service for $20,000 a year that lets members vote on questions they want forecasts on every week, with customized options for a premium. By fall the product was already generating a third of the company’s total revenue, according to Hatch. Meanwhile, he has a lot of other ideas, including commercializing Delphineo — the collaboration platform it built for the workshops, which, naturally, was named by the crowd using the tool itself. For every major project, Hatch asks his team: “What would success look like? And what would failure look like?” Then he attaches probabilities to each, a process that primes him for signs of risk and opportunity ahead.
“That’s what a lot of this is about in my own head,” he says. “Let’s avoid surprises, good or bad.”
As Good Judgment grows, it must predict not only what will happen with its own business but also the future of the forecasting business at large. Because that will change, too.
“Machines already dominate prediction in all Big Data settings but struggle as the data get sparser and require more qualitative analysis,” says Tetlock, the man whose research initially launched Good Judgment, and who still enjoys engaging on the more challenging client cases while continuing his work at Wharton. “Human-machine hybrids will be the future for the types of problems we deal with in the next 20 years. For now, expect the stale status hierarchies to continue stonewalling efforts to introduce scorekeeping, especially in government, but in many businesses as well.”
One business, however, is bucking that trend. And it could signal good things for both Tetlock and Hatch.
David Barrosse is the founder and CEO of Capstone, a global policy analysis firm for corporate and investor clients. Back in 2015, when he picked up a copy of Tetlock’s Superforecasting, he was, to his surprise, impressed. “It has always stuck out to me that in the global securities research industry, which is a multibillion-dollar industry and covers every investment bank all over the globe, not one of them focuses on the accuracy of their predictions,” says Barrosse. “They don’t track it. They don’t talk about it. And 99 percent of them will not put a number on it.”
At first he passed the book around to his firm’s employees and sent five or six analysts to Good Judgment’s workshops to get the ideas in the bloodstream. But then he wondered what it would look like to radically change Capstone’s prediction systems, both inside the company and for its clients. To explore that, last year he hired Good Judgment to come in and design a training program for all the analysts. “There was a lot of resistance and trepidation at first,” says Cordell Eddings, Capstone’s supervisory analyst, who is heading up the project. “But the training helped give people the tools to do it right. And across the firm, people ended up buying in wholeheartedly.”
It’s been a little more delicate to convince clients that they should change their prediction systems, “because they just think it’s utter bullshit,” says Barrosse. “Like, ‘How can you possibly know that it’s 67 percent?’ But it gives us an opportunity to talk about, ‘Maybe we started out with a 40 percent prediction and updated it so many times that it got to 67.’ And we debated internally, ‘Is this going to make us look like we’re bending with the wind?’ But even that is an opportunity to have a conversation with the client where we can say, ‘We’re telling you how things are changing based on information that’s coming in real time. We’re doing the homework, giving you a realistic dynamic prediction.’ Even if we’re not always right, it’s better to tell them, ‘We will be with you and stick our necks out and give you a probability in a distinct timeframe.’ ”
Barrosse now sees this as the competitive advantage of his company. And he is much more than 67 percent sure of it.