Abstract
Dozens of coronavirus disease 2019 (COVID-19) forecasting models have been created; however, little information exists on their performance. Here we examined the performance of nine commonly used COVID-19 forecasting models, as well as equal- and performance-weighted ensembles, based on their predictive accuracy and precision and on their probabilistic ‘statistical accuracy’ (also known as calibration) and ‘information’ scores, measures commonly employed in the evaluation of expert judgment (Cooke, 1991). Data on observed COVID-19 mortality in eight states, selected to reflect differences in racial demographics and COVID-19 case rates, over eight weeks in the summer of 2020 and eight weeks in the winter of 2021, provided the basis for evaluating model forecasts and exploring the stability and robustness of the results. Two models exhibited superior performance on both predictive and probabilistic measures during both pandemic phases. Models that performed poorly were ‘overconfident’, with overly tight forecast distributions. Models also systematically under-predicted mortality when cases were rising and over-predicted it when cases were falling. Performance-weighted ensembles consistently outperformed the equal-weighted ensemble, with the Cooke’s Classical Model-weighted ensemble outperforming the predictive-performance-weighted ensemble. Model performance depended on the time frame of interest and on racial composition, with better predictive forecasts in the near term and for states with relatively high proportions of non-Hispanic Blacks. Performance also depended on case rate, with better predictive forecasts for states with relatively low case rates but better probabilistic forecasts for states with relatively high case rates. Both predictive and probabilistic performance are important, and both deserve consideration by model developers and by those interested in using these models to inform policy.
Significance Statement
Coronavirus disease 2019 (COVID-19) forecasting models can provide critical information for decision-making; however, there has been little published information on their performance. We examined the COVID-19 mortality forecasting performance of nine commonly used and oft-cited models, as well as density-averaged equal- and performance-weighted ensembles of these models, during two phases of the pandemic. Only two of the nine models demonstrated superior predictive and probabilistic performance during both phases of the pandemic. Most of the models exhibited overconfidence, with overly narrow forecast intervals. Performance-weighted ensembles demonstrated some advantages over the equal-weighted ensemble, primarily in probabilistic performance. As might be anticipated, predictions of near-term mortality (next week) were better than predictions of mortality four weeks later.
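The ‘statistical accuracy’ (calibration) score named in the abstract can be illustrated with a minimal sketch in the spirit of Cooke's Classical Model. It is not the authors' code. It assumes each forecast is summarized by its 5%, 50% and 95% quantiles (this section does not state which quantiles were used), counts how often the observations fall in each of the four inter-quantile bins, and converts the mismatch with the expected bin probabilities into a p-value from a chi-square distribution with three degrees of freedom. The names `BIN_PROBS`, `bin_index` and `calibration_score` are hypothetical.

```python
import math

# Assumption: forecasts are given as (5%, 50%, 95%) quantiles, which
# imply four inter-quantile bins with these expected probabilities.
BIN_PROBS = (0.05, 0.45, 0.45, 0.05)


def bin_index(quantiles, observed):
    """Return the inter-quantile bin (0..3) that contains the observation."""
    return sum(observed > q for q in quantiles)


def calibration_score(forecasts, observations):
    """Calibration score for three-quantile forecasts (illustrative sketch).

    forecasts: list of (q05, q50, q95) tuples, one per forecast target.
    observations: list of realized values, in the same order.
    """
    n = len(observations)
    counts = [0, 0, 0, 0]
    for quantiles, observed in zip(forecasts, observations):
        counts[bin_index(quantiles, observed)] += 1

    # Relative information of the empirical bin frequencies with respect
    # to the expected bin probabilities (empty bins contribute zero).
    rel_info = sum(
        (c / n) * math.log((c / n) / p)
        for c, p in zip(counts, BIN_PROBS)
        if c > 0
    )
    stat = max(0.0, 2 * n * rel_info)  # guard against rounding below zero

    # Survival function of a chi-square variable with 3 degrees of
    # freedom, written in closed form to avoid external dependencies.
    return math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)
```

Under these assumptions, a model whose observations land in the bins at the expected rates scores close to 1, while an overconfident model, whose observations fall mostly outside its 5% to 95% interval, scores close to 0.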
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Kyle J. Colonna's involvement was funded by the Harvard Population Health Sciences PhD scholarship. Roger M. Cooke's involvement was pro bono. John S. Evans' involvement was funded by the Department of Environmental Health and the Harvard Cyprus Initiative at the T.H. Chan School of Public Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
No IRB/oversight body approval or exemption was necessary as the data are publicly available.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Acquisition, analysis, or interpretation of data: Colonna, Cooke
Drafting of the manuscript: Colonna, Evans
Critical revision of the manuscript for important intellectual content: All authors
Supervision: Evans
Competing Interest Statement: The authors declare they have no competing interests that might be perceived to influence the results and/or discussion reported in this manuscript. There has also been no prior discussion with an editor.
Author Roger M. Cooke added; additional results; Figure 1 revised; additional tables for Supplemental Information; manuscript and supplemental information text rewritten.
Data Availability
Observed state COVID-19 mortality and case data were gathered from the Centers for Disease Control and Prevention (CDC) (30). State population and racial composition data were collected from one-year estimates from the Census Bureau's 2018 and 2019 American Community Survey (ACS) (31). Tables S3 and S4 in the SI appendix provide the racial composition statistics and case rate data. Model forecasting data were gathered from the COVID-19 Forecast Hub's publicly available structured data storage repository on GitHub (8). Tables S5 and S6 in the SI appendix provide the model and ensemble predictions, their uncertainty distributions, and the subsequent observations of COVID-19 mortality.