I don’t know that we’ve ever lived through an event that will produce so many different models and forecasts for its eventual outcome. The only thing close is presidential elections, but elections don’t quite compare. For one thing, the range of possible outcomes is much smaller. With coronavirus, we could see anything from tens of thousands of deaths to a few million (although the measures we’re taking right now make the latter less likely by the moment). For another, much of the data we process during election campaigns involves simple polling, not forecasts. The RCP average, for instance, aggregates polling data to produce a snapshot of the race on a particular day, but that number doesn’t tell us how the electorate might feel in early November. Some outfits like FiveThirtyEight do produce forecasting models of the result this fall, but that’s not equivalent to what we’re living through with COVID-19. An election is a discrete event that happens on a designated day, leaving us blind until Election Day as to whether FiveThirtyEight’s model is accurate. The coronavirus epidemic is a continuously unfolding event with daily benchmarks for the virus’s spread. We don’t need to wait for some arbitrary date to see if the projections are correct. We can check now.
One professor at Wharton did check, using data collected from a survey of epidemiological experts conducted two weeks ago and published at (where else?) FiveThirtyEight. The question was asked on March 16-17: How many U.S. cases of COVID-19 will the CDC report on March 29?
Their estimates were … not great. We all want to believe that the experts are grossly overestimating the deadliness of coronavirus, which is why there’s so much interest in Oxford University’s model of the disease, but in this case they missed badly the other way. By a factor of six.
Overconfidence is a pernicious bias, even in experts. It's astounding how few experts' confidence intervals included the correct estimate of #COVID19 infections in the US by 3/29 when forecasting for just two weeks in the future. (of course, non-expert estimates are even worse) pic.twitter.com/pa6oMDp2wV
— Katherine Milkman (@katy_milkman) March 30, 2020
Maybe the error there has less to do with underestimating how many cases there are and more to do with underestimating how quickly U.S. testing capacity would get up to speed. On March 17, when this survey was taken, the U.S. was just beginning to ramp up from 10,000-15,000 tests per day to much bigger numbers. The experts may have assumed that testing would still be progressing slowly by March 29, which would necessarily mean a small number of known cases by then. In reality, by March 29, the U.S. was conducting 100,000 tests a day.
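For a rough sense of how big that testing ramp-up was, here is a back-of-the-envelope sketch using only the figures in the paragraph above. The function name `scale_up` is made up for illustration, and nothing here assumes that known cases rise in proportion to tests; it only shows how much daily capacity grew.

```python
def scale_up(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before

# Figures from the post: 10,000-15,000 tests/day around March 17,
# about 100,000 tests/day by March 29.
low = scale_up(15_000, 100_000)   # growth measured from the top of the early range
high = scale_up(10_000, 100_000)  # growth measured from the bottom of the early range

print(f"daily testing grew roughly {low:.1f}x to {high:.1f}x")
```

In other words, daily testing grew by somewhere between seven and ten times in under two weeks, which is at least in the same ballpark as the size of the experts' miss.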
When the experts were asked a week later, on March 23-24, to once again forecast the number of cases as of March 29, they came much closer to the mark.
The consensus estimate was 117,000. In reality there were 139,000 known cases. They were still lowballing the spread — or our ability to detect the extent of the spread — but by not nearly as much. All of which provides two useful lessons. One: At the very beginning of this outbreak, there are destined to be some bad misses on forecasts just because there’s so little data available yet. Even the pros can miss a projection just two weeks out by 80 percent due to the uncertainty about the extent of community spread within the U.S. and labs’ ability to scale up in testing as quickly as they did. Two: With each day that passes and more data that accrues, the forecasts will get better and will begin to coalesce. Death estimates right now range from low-ish five figures to seven figures because the world still has no concrete idea how many asymptomatic cases there are and how quickly the virus is spreading. Once antibody testing gives us that answer, the range will shrink drastically. And with each new therapeutic development, the higher estimate in the new, narrower range will dip lower.
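To put the two misses side by side, here is a quick sketch of the arithmetic. The helper `shortfall` is a hypothetical name, and the first-survey figure is an assumption: it reads "a factor of six" as the actual count being six times the consensus forecast, since the first survey's consensus number isn't quoted here.

```python
def shortfall(forecast: float, actual: float) -> float:
    """Fraction of the actual count that the forecast failed to capture."""
    return (actual - forecast) / actual

ACTUAL = 139_000  # known U.S. cases on March 29, per the post

# Second survey (March 23-24): consensus estimate of 117,000.
second_miss = shortfall(117_000, ACTUAL)

# First survey (March 16-17): ASSUMPTION that "a factor of six" means
# the forecast was one sixth of the actual count.
first_miss = shortfall(ACTUAL / 6, ACTUAL)

print(f"first survey shortfall:  {first_miss:.0%}")
print(f"second survey shortfall: {second_miss:.0%}")
```

Under that reading, the first survey came up short by a bit more than 80 percent and the second by roughly 16 percent, with only one more week of data in between.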
Which is to say, although the past month has felt like a decade, I think the key takeaway from the forecasting miss above is just how early it is in this process. Even the pros are still fumbling in the dark.
Speaking of the many great unknowns about COVID-19, here’s the latest estimate of fatality rates from a new study published in The Lancet, based on Chinese data:
Overall, the fatality rate is expected to be 0.66 percent, around six times worse than the average flu. As you can see, though, the likelihood of death varies meaningfully by age group. For those under 40, it’s no more deadly than flu is to the general population. For those over 40, the risk rises. Note too that 0.66 percent is higher than the fatality rates in various European countries estimated by Imperial College, which I wrote about yesterday. And if you’re of the opinion that China’s data is garbage, that they’re cooking the books to hide the fact that there are many more infections and deaths than Beijing wants the world to know, then the data in the table above goes right out the window. For a clue about the fatality rate, we’ll just have to wait for American or European scientists to provide reliable data.
In lieu of an exit question, read FiveThirtyEight’s timely explainer on why it’s so damned hard to produce a reliable model for COVID-19. The short version: We lack good data for even *one* of the many variables that would be needed to make forecasts. We don’t know the fatality rate, we don’t know the infection rate, we don’t even know how many people are dead of the virus so far. It’s tantamount to trying to solve the equation “x + y = z” where you don’t know what x, y, or z are.