Abstract
We analyze an ensemble of n-sub-epidemic modeling for forecasting the trajectory of epidemics and pandemics. These ensemble modeling approaches, and models that integrate sub-epidemics to capture complex temporal dynamics, have demonstrated powerful forecasting capability. This modeling framework can characterize complex epidemic patterns, including plateaus, epidemic resurgences, and epidemic waves characterized by multiple peaks of different sizes. We systematically assess their calibration and short-term forecasting performance in short-term forecasts for the COVID-19 pandemic in the USA from late April 2020 to late February 2022. We compare their performance with two commonly used statistical ARIMA models. The best fit sub-epidemic model and three ensemble models constructed using the top-ranking sub-epidemic models consistently outperformed the ARIMA models in terms of the weighted interval score (WIS) and the coverage of the 95% prediction interval across the 10-, 20-, and 30-day short-term forecasts. In the 30-day forecasts, the average WIS ranged from 377.6 to 421.3 for the sub-epidemic models, whereas it ranged from 439.29 to 767.05 for the ARIMA models. Across 98 short-term forecasts, the ensemble model incorporating the top four ranking sub-epidemic models (Ensemble(4)) outperformed the (log) ARIMA model 66.3% of the time, and the ARIMA model 69.4% of the time in 30-day ahead forecasts in terms of the WIS. Ensemble(4) consistently yielded the best performance in terms of the metrics that account for the uncertainty of the predictions. This framework could be readily applied to investigate the spread of epidemics and pandemics beyond COVID-19, as well as other dynamic growth processes found in nature and society that would benefit from short-term predictions.
Summary The COVID-19 pandemic has highlighted the urgent need to develop reliable tools to forecast the trajectory of epidemics and pandemics in near real-time. We describe and apply an ensemble n-sub-epidemic modeling framework for forecasting the trajectory of epidemics and pandemics. We systematically assess its calibration and short-term forecasting performance in weekly 10-30 days ahead forecasts for the COVID-19 pandemic in the USA from late April 2020 to late February 2022 and compare its performance with two different statistical ARIMA models. This framework demonstrated reliable forecasting performance and substantially outcompeted the ARIMA models. The forecasting performance was consistently best for the ensemble sub-epidemic models incorporating a higher number of top-ranking sub-epidemic models. The ensemble model incorporating the top four ranking sub-epidemic models consistently yielded the best performance, particularly in terms of the coverage rate of the 95% prediction interval and the weighted interval score. This framework can be applied to forecast other growth processes found in nature and society including the spread of information through social media.
Introduction
The coronavirus disease 2019 (COVID-19) pandemic has amplified the critical need for reliable tools to forecast the trajectory of epidemics and pandemics in near real-time. During the early stages of the COVID-19 pandemic, multiple modeling teams embarked on the challenging task of producing short-term forecasts of the course of the COVID-19 pandemic in terms of the trajectory for the number of new cases, hospitalizations, or deaths (e.g., [1-10]). Soon after the epidemic started, our research team published short-term forecasts of the pandemic during the early outbreaks of the novel coronavirus in China [4] and subsequently focused on producing weekly forecasts for the USA [11]. In a related effort, the US COVID-19 Forecasting Hub brought together multiple research teams to synthesize weekly short-term forecasts of the COVID-19 pandemic in the USA [12]. It is time to systematically and rigorously evaluate the forecasting performance of these different pandemic forecasting efforts and document the lessons learned to continue advancing our understanding of epidemic forecasting.
Ensemble modeling approaches and models that integrate sub-epidemics to capture complex temporal dynamics have demonstrated powerful forecasting capability (e.g., [13] [14-17]). In prior work, we developed a sub-epidemic modeling framework to characterize and improve forecasting accuracy during complex epidemic waves [13]. This mathematical framework characterizes epidemic curves by aggregating multiple asynchronous sub-epidemics and outperforms simpler growth models at providing short-term forecasts of various infectious disease outbreaks [13, 18]. It is possible to model sub-epidemics associated with transmission chains that are asynchronously triggered and progress somewhat independently from the other sub-epidemics. This framework supports a family of sub-epidemic models that yield similar fits to the calibration data, but their corresponding forecasts could produce diverging trajectories.
Ensemble modeling aims to boost forecasting performance by systematically integrating the predictive accuracy tied to individual models [16, 19-21]. Past work indicates that multimodel ensemble approaches are powerful forecasting tools that frequently outperform individual models in epidemic forecasts [14, 15, 22-27]. We extend prior sub-epidemic modeling work and propose an ensemble sub-epidemic modeling framework for forecasting the trajectory of epidemics and pandemics. In this model, the sub-epidemics can start at different time points and may follow different growth rates, scaling of growth, and final sizes. Hence, this ensemble modeling framework can characterize more diverse epidemic patterns which were impossible to capture by earlier sub-epidemic models, including plateaus, epidemic resurgences, and epidemic waves characterized by multiple peaks of different sizes.
Here, we systematically assess the calibration and short-term forecasting performance in weekly 10-30 day forecasts in the context of the COVID-19 pandemic in the USA from late April 2020 to late February 2022, including the Omicron-dominated wave. We then compare the performance of the ensemble modeling framework with a set of Autoregressive Integrated Moving Average (ARIMA) models, following the EPIFORGE 2020 guidelines to report epidemic forecasts [28]. Our extended ensemble modeling framework substantially outperforms individual top-ranking sub-epidemic models and the ARIMA models based on standard performance metrics that account for the uncertainty of the predictions.
Data
We used daily COVID-19 deaths reported in the USA from the publicly available data tracking system of the Johns Hopkins Center for Systems Science and Engineering (CSSE) from 27 February 2020 to 30 March 2022 [29]. The data is updated on the CSSE webpage once every day at 23:59 (UTC) and is read from the daily case report. The data is also publicly available in the GitHub repository [30].
n-sub-epidemic model
We model epidemic trajectories comprised of one or more overlapping and asynchronous sub-epidemics. That is, the sub-epidemics are used as building blocks to characterize more complex epidemic trajectories. The mathematical equation for the sub-epidemic building block is the 3-parameter generalized-logistic growth model (GLM), which has performed well in short-term forecasts of single outbreak trajectories for different infectious diseases, including COVID-19 [31-33]. This model is given by the following differential equation: where describes the curve of daily deaths over time t. The cumulative curve at time t is given by C(t), While r is a positive parameter denoting the growth rate per unit of time, K0 is the final outbreak size, and p ∈ [1,0] is the “scaling of growth” parameter which allows the model to capture early sub-exponential and exponential growth patterns. If p = 0, this equation describes a constant number of new deaths over time, while p = 1 indicates that the early growth phase is exponential. Intermediate values of p (0 < p < 1) describe early sub-exponential (e.g., polynomial) growth dynamics.
An n-sub-epidemic trajectory comprises n overlapping sub-epidemics and is given by the following system of coupled differential equations: Where Ci(t) tracks the cumulative number of deaths for sub-epidemic i, and the parameters that characterize the shape of the ith sub-epidemic are given by , for i = 1, …, n. Thus, the 1-sub-epidemic model is equivalent to the generalized growth model described above. When n > 1, we model the onset timing of the (i + 1) th sub-epidemic, where (i + 1) ≤ n, by employing an indicator variable given by Ai(t)so that the (i + 1) th sub-epidemic is triggered when the cumulative curve of the ith sub-epidemic exceeds cthr.
The (i + 1) th sub-epidemic is only triggered when . Hence, we have: where Ai(t)=1 for the first sub-epdemic. Hence, the total number of parameters that are needed to model an n-sub-epidemic trajectory is given by 3n +1. The initial number of deaths is given by C1(0) = I0, where I0 is the initial number of deaths in the observed data. The cumulative curve of the n-sub-epidemic trajectory is given by: The n-sub-epidemic wave model can characterize diverse epidemic patterns, including epidemic plateaus where the epidemic stabilizes at a high level for an extended period, epidemic resurgences where the number of cases increases again after a low incidence period, and epidemic waves characterized by multiple peaks.
Parameter estimation
To reduce the noise in the original data due to artificial reasons such as the weekend effects, we use the 7-day moving average of daily death series to fit the n-sub-epidemic model. Let denote the smoothed daily COVID-19 death series of the epidemic trajectory based on the moving average. Here, tj are the time points for the time series data, nd is the number of observations, and each , is the average of the death counts at the neighboring seven days (tj−3, tj−1, tj−1, tj+1, tj+2, tj+3). We will use this smoothed data to estimate a total of 3n +1 model parameters, namely . Let f(t,Θ)denote the expected curve of new COVID-19 deaths of the epidemic’s trajectory. We can estimate model parameters by fitting the model solution to the observed data via nonlinear least squares [34] or via maximum likelihood estimation assuming a specific error structure [35]. For nonlinear least squares, this is achieved by searching for the set of parameters that minimizes the sum of squared differences between the observed data and the model mean corresponds to f(t,Θ). That is, is estimated by .
This parameter estimation method weights each of the data points equally and does not require a specific distributional assumption for yt, except for the first moment E[yt] = f(ti; Θ). That is, the mean of the observed data at time t is equivalent to the expected count (e.g., number of deaths) denoted by f(t,Θ) at time t [36]. This method yields asymptotically unbiased point estimates regardless of any misspecification of the variance-covariance error structure. Hence, the estimated model mean yields the best fit to observed data in terms of squared L2 norm. In Matlab, we can use the fmincon function to set the optimization problem.
To quantify parameter uncertainty, we follow a parametric bootstrapping approach which allows the computation of standard errors and related statistics in the absence of closed-form formulas [37]. We generate B bootstrap samples from the best-fit model , with an assumed error structure, to quantify the uncertainty of the parameter estimates and construct confidence intervals. Typically, the error structure in the data is modelled using a probability model such as the Poisson or negative binomial distribution. Because the time-series data we are fitting to involve large counts, the Poisson or negative binomial distribution can be well approximated by a normal distribution for large numbers. So, using the best-fit model , we generate B-times replicated simulated datasets of size nd, where the observation at time tj is sampled from a normal distribution with mean and variance . Next, we refit the model to each of the B simulated datasets to re-estimate parameters for each. The new parameter estimates for each realization are denoted by where b = 1,2,…,B. Using the sets of re-estimated parameters , it is possible to characterize the empirical distribution of each estimate, calculate the variance, and construct confidence intervals for each parameter. The resulting uncertainty around the model fit can similarly be obtained from .
Model-based forecasts with quantified uncertainty
Forecasting the model , h days ahead provides an estimate for . The uncertainty of the forecasted value can be obtained using the previously described parametric bootstrap method. Let denote the forecasted value of the current state of the system propagated by a horizon of h time units, where denotes the estimation of parameter set Θ from the bth bootstrap sample. We can use these values to calculate the bootstrap variance as the measure of the uncertainty of the forecasts and use the 2.5% and 97.5% percentiles to construct the 95% prediction intervals (PI).
Model selection
We considered a set of n-sub-epidemic models where 1≤n ≤2 and ranked them from best to worst according to the AICc which is given by [38, 39]: where is the number of model parameters, and nd is the number of data points. The AICc for the parameter estimation from the nonlinear least-squares fit, which implicitly assumes normal distribution for error.
We selected the four top ranking sub-epidemic models for further analyses. We used them to construct three ensemble sub-epidemic models, which we refer to as: Ensemble(2), Ensemble(3), and Ensemble(4). The next section describes the process of constructing these ensemble models from the top-ranking sub-epidemic models.
Constructing Ensemble Models from top-ranking models
Ensemble models that combine the strength of multiple models may exhibit significantly enhanced predictive performance (e.g., [14-17]). Here we generate ensemble models from the weighted combination of the highest-ranking sub-epidemic models as deemed by the for the i-th model where and i = 1, …, I. An ensemble derived from the top-ranking I models is denoted by Ensemble(I) and illustrated in Figure 1. Thus, Ensemble(2) and Ensemble(3) refer to the ensemble models generated from the weighted combination of the top-ranking 2 and 3 models, respectively. We compute the weight wi for the i-th model, i = 1, …, I, where ∑wi = 1 as follows: and hence wI ≤ … ≤ w1.
The estimated mean curve of daily COVID-19 deaths for the Ensemble(I) model is: where given the training data, denotes the set of estimated parameters, and denotes the estimated mean curve of daily COVID-19 deaths, for the i-th model. Accordingly, we compute the weighted average and sample the bootstrap realizations of the forecasts for each model to construct the 95% CI or PI using the 2.5% and 97.5% quantiles [16]. Our MATLAB (The Mathworks, Inc) code for model fitting and forecasting is publicly available in the GitHub repository [30].
As a sensitivity analysis, we also investigated how the ensemble sub-epidemic models performed when the ensemble weights were proportional to the relative likelihood (l) rather than the reciprocal of the AICc. Let AICmin denote the minimum AIC from the set of models. The relative likelihood of model i is given by [40]. We compute the weight wi for the i-th model where ∑wi = 1 as follows: and hence wI ≤ … ≤ w1.
Auto-regressive integrated moving average models (ARIMA)
We also generated short-term predictions of the pandemic trajectory using ARIMA models to compare their performance with that of the sub-epidemic models. ARIMA models have been frequently employed to forecast trends in finance [41-43] and weather [44-46]. The ARIMA (p, d, q) process is given by or equivalently as , where p is the order of the AR model, d is the degree of differencing, q is the order of the MA model, {ϵt} is a white noise process with mean 0 and variance σ2, and B denotes the backshift operator. The p-order polynomial ϕ(z) = 1 − ϕ1z − … − ϕpZp and the q-order polynomial d θ(z) = 1 − θ1z − … − θ1zq are assumed to have no roots inside the unit circle to ensure causality and invertibility. The constant C = µ(1 − ϕ1 − … − ϕp), and µ is the mean of (1 − B) dyt. When d=0, µ is the mean of yt.
The auto.arima function in the R package “forecast” is used to select orders and build the model [47]. First, the degree of differencing 0 ≤ d ≤ 2 is selected based on successive KPSS unit-root tests [48], which test the data for a unit root; if the test result is significant, the differenced data is tested for a unit root; and this procedure is repeated until the first insignificant result is obtained. Then given d, the orders p and q are selected based on the AICc for the d-times differenced data. For d=0 or d=1, a constant will be included if it improves the AICc value; for d>1, the constant µ is fixed at 0 to avoid the model having a quadratic or higher order trend, which is dangerous when forecasting. The final model is fitted using the maximum likelihood estimation.
To guarantee the forecasted values and prediction intervals are above zero, we take the following two strategies. In the first one, we conduct the ARIMA order selection and model fitting using the log-transformed data. Then we take the exponential of the forecasted values and the PI bounds to predict the incident death counts and get the PIs. We refer to this approach as the (log) ARIMA throughout the manuscript. In the second case, the negative values are set as zero. Then, it is possible that the actual coverage probability of such PIs can be smaller than the nominal value (95%). We refer to this approach as ARIMA throughout the manuscript.
Forecasting strategy and performance metrics
We conducted short-term forecasts using the top-ranking n-sub-epidemic model (1 ≤ n ≤ 2) and three ensemble models constructed with the top-ranking sub-epidemic models namely Ensemble(2), Ensemble(3), and Ensemble(4). For comparison, we also generated short-term forecasts using the previously described ARIMA models. Overall, we conducted 588 forecasts across models.
Using a 90-day calibration period for each model, we conducted a total of 98 weekly sequential 10-day, 20-day and 30-day forecasts from 20 April 2020 to 28 February 2022, spanning five pandemic waves. This range of forecasting horizons is comparable to that investigated in prior COVID-19 forecasting studies [49]. This period covers the latter part of the early spring wave, a summer wave in 2020, a fall-winter 2020/2021 wave, the summer-fall wave in 2021, and the winter 2022 wave.
To assess the forecasting performance, we used four performance metrics: the mean absolute error (MAE), the mean squared error (MSE), the coverage of the 95% prediction intervals, and the mean interval score (MIS) [50]. The mean absolute error (MAE) is given by: Here is the time series of the original death counts (unsmoothed) of the h-time units ahead forecasts, where th are the time points of the time series data [51]. Similarly, the mean squared error (MSE) is given by: We also employed two metrics that account for prediction uncertainty: the coverage rate of the 95% PI e.g., the proportion of the observations that fall within the 95% PI as well as the weighted interval score (WIS) [50, 52] which is a proper score. The WIS and the coverage rate of the 95% PIs take into account the uncertainty of the predictions, whereas the MAE and MSE only assess the closeness of the mean trajectory of the epidemic to the observations [53].
Recent epidemic forecasting studies have embraced the Interval Score (IS) for quantifying model forecasting performance [18, 24, 49, 54]. The WIS provides quantiles of predictive forecast distribution by combining a set of ISs for probabilistic forecasts. An IS is a simple proper score that requires only a central (1−α)×100% PI [50] and is described as In this equation 1 refers to the indicator function, meaning that 1(y <, l) =1 if y < l and 0 otherwise. The terms l and u represent the and quantiles of the forecast F. The IS consists of three distinct quantities:
The sharpness of F, given by the width u − l of the central (1− α) x 100% PI.
A penalty term for the observations that fall below the lower end point l of the (1− α) x 100% PI. This penalty term is directly proportional to the distance between y and the lower end l of the PI. The strength of the penalty depends on the level α.
An analogous penalty term for the observations falling above the upper limit u of the PI.
To provide more detailed and accurate information on the entire predictive distribution, we report several central PIs at different levels (1− α 1)< (1− α2) < …< (1− αK)along with the predictive median, m, which can be seen as a central prediction interval at level 1− α 0 → 0.
This is referred to as the WIS, and it can be evaluated as follows: where, for K = 1,2,….K and . Hence, WIS can be interpreted as a measure of how close the entire distribution is to the observation in units on the scale of the observed data [10, 55].
Results
Quality of the sub-epidemic model fits
The best fit sub-epidemic model and three ensemble models constructed using the top-ranking sub-epidemic models (Ensemble(2), Ensemble(3), Ensemble(4)) yielded similar quality fits to 98 sequential weekly calibration periods from 20-April-2020 to 28-February-2022 (Figure 2, Table 1). For instance, the average WIS was ∼247 with little variation across models (Table 1). The coverage rate of the 95% PIs averaged 97% and ranged from 91% to 100% during the study period. Moreover, all performance metrics displayed similar temporal trends (Figure 2).
Representative fits of the top-ranking sub-epidemic models to the daily curve of COVID-19 deaths in the USA from 27-Feb-2020 to 20-April-2020 are shown in Figure 3. Although these sub-epidemic models fit the data well, each of them results from the aggregation of two sub-epidemics characterized by different growth rates, scaling of growth, and outbreak sizes as shown in Figure 4.
Short-term forecasting performance
The best fit sub-epidemic model and three ensemble models constructed using the top-ranking sub-epidemic models (Ensemble(2), Ensemble(3), Ensemble(4)) consistently outperformed the ARIMA models in terms of the weighted interval score (WIS) and the coverage of the 95% prediction interval across the 10, 20 and 30 day short-term forecasts (Table 2). For instance, for 30-day forecasts, the average WIS ranged from 377.6 to 421.3 for the sub-epidemic models, whereas, it ranged from 439.29 to 767.05 for the ARIMA models. Across 98 short-term forecasts, the Ensemble(4) outperformed the (log) ARIMA model 66.3% of the time and the ARIMA model 69.4% of the time in 30-day ahead forecasts in terms of the WIS (Figure 5 & Figure 6). Similarly, the coverage of the 95% PI ranged from 82.2% to 88.2% for the sub-epidemic models, whereas it ranged from 58% to 60.3% for the ARIMA models in 30-day forecasts. In terms of the coverage of the 95% PI, the Ensemble(4) outperformed the (log) ARIMA model 89.8% of the time and the ARIMA model 91.8% of the time (Figure 5 & Figure Forecasting performance generally improved as the number of top-ranking sub-epidemic models included in the ensemble increased (Table 1). The Ensemble(4) model consistently yielded the best performance in terms of the metrics that account for the uncertainty of the predictions.
In terms of the metrics based on point estimate information, the ARIMA models showed lower overall MSE or MAE compared to the sub-epidemic models in 10 and 20-day forecasts, but the Ensemble(4) achieved the best forecasting performance in 30-day forecasts (Table 2). Overall, the forecasting performance deteriorated at longer forecasting horizons across all models considered in our study.
Representative 30-day forecasts of the top-ranking sub-epidemic models to the daily curve of COVID-19 deaths in the USA from 20-April-2020 to 20-May-2022 are shown in Figure 7. The corresponding sub-epidemic profiles of the forecasts are shown in Figure 8. These models support forecasts with diverging trajectories even though they yield similar fits to the calibration period. For instance, the top-ranked sub-epidemic model predicts a decline in the mortality curve, whereas the second-ranked model predicts a stable pattern during the next 30 days (Figure The corresponding forecasts generated from three ensemble models (Ensemble(2), Ensemble(3), Ensemble(4)) built from the top-ranking sub-epidemic models are shown in Figure 9. The individual 30-day ahead predictions across 98 forecasting periods generated by the Ensemble(4) and the ARIMA models are available in the GitHub repository [30].
In sensitivity analyses, defining ensemble weights as proportional to the relative likelihood did not achieve better performance relative to the ensemble models generated using weights proportional to the reciprocal of the AICc. Moreover, the rank of the ensemble models was not affected by the type of weights (Table 3).
Discussion
Our ensemble sub-epidemic modeling approach outperformed individual top-ranking sub-epidemic models and a set of ARIMA models in weekly short-term forecasts covering the national trajectory of the COVID-19 pandemic in the USA from the early growth phase up until the Omicron-dominated wave. This framework has demonstrated reliable forecasting performance across different pandemic phases from the early growth phase characterized by exponential or sub-exponential growth dynamics to plateaus and new disease surges driven by the relaxation of social distancing policies or the emergence of new variants. Importantly, we found that forecasting performance consistently improved for the ensemble sub-epidemic models that incorporated a higher number of top-ranking sub-epidemic models. The ensemble model incorporating the top four ranking sub-epidemic models consistently yielded the best performance, particularly in terms of the coverage rate of the 95% prediction interval and the weighted interval score.
Our findings support the power of ensemble modeling approaches (e.g.,[14-17]). Our ensemble modeling framework derived from a family of sub-epidemic models demonstrated improved performance as the number of top-ranking sub-epidemic models included in the ensemble increased. Prior studies have documented the potential of ensemble models to enhance forecasting performance during multi-epidemic periods [14]. For instance, in the context of influenza, one study utilized “weighted density ensembles” for predicting timing and severity metrics and found that the performance of the ensemble model was comparable to that of the top individual model, albeit the ensemble’s forecasts were more stable across influenza seasons [17]. In the context of dengue in Puerto Rico, another study found that forecasts derived from Bayesian averaging ensembles outperformed a set of individual models [25]. Results from the US COVID-19 Forecasting Hub CDC were consistent with our findings in that a multimodel ensemble frequently outperformed the set of individual models.
We also evaluated short-term forecasting performance by a set of ARIMA models, as prior studies have underscored the value of ARIMA models in epidemic forecasting [56], by providing a relatively simple and transparent approach to forecasting. For instance, in the context of influenza-like-illness in the USA, a set of ARIMA models provided reasonably accurate short-term forecasts during the 2016/17 influenza season [57]. In another forecasting study during multiple seasons of influenza in the USA, an ARIMA model yielded similar short-term forecasting performance compared to other models based on the mechanistic SIR modeling framework [58]. ARIMA models have also been used for spatial prediction of the COVID-19 epidemic [59, 60]. Another study [61] showed that the ARIMA model is more effective than the Prophet time series model for forecasting COVID-19 prevalence. Finally, it is worth noting that the US COVID-19 Forecast Hub did not include an ARIMA model in its set of evaluated models [49]. Therefore, it is interesting to assess how ARIMA models perform in the context of the COVID-19 pandemic in the US.
Prior work has underscored the need to assess alternative ways of constructing ensembles from a set of individual models [14, 16]. We explored two ways of constructing the ensembles by relying on the AICc or the relative likelihood associated with the individual models. We found that the short-term forecasting performance achieved by the ensemble models was not significantly affected by the type of ensemble weights used to construct them although performance using ensemble weights based on the reciprocal of the AICc was slightly better. Further research could explore how different weighting strategies influence the forecasting performance of ensemble modeling approaches.
Short-term forecasting is an essential attribute of the models. As prior studies have underscored, longer-term forecasts are of value, but their dependability varies inversely with the time horizon. Our 20 and 30-day forecasts are most valuable for monitoring, managing, and informing the relaxing of social distancing requirements. The early detection of potential disease resurgence can signal the need for strict distancing controls, and the reports of cases can identify the geographic location of incubating sub-epidemics.
Our study is not exempt of limitations. Our analysis relied on daily time series data of COVID-19 deaths in the USA, which is inherently noisy due to heterogeneous data reporting at fine spatial scales (i.e., county-level) [62]. Noisy data complicate the ability of any mathematical model to identify meaningful signals about the impact of transmission dynamics and control interventions. To deal with the high noise levels in the data, we fitted the models to smoothed time series rather than the actual daily series, as described in the parameter estimation section. Other forecasting studies, including the US COVID-19 Forecasting Hub, have relied on weekly death counts to address this issue [49]. Beyond the COVID-19 pandemic, there is a need to establish benchmarks to systematically assess forecasting performance across a diverse catalog of mathematical models and epidemic datasets involving multiple infectious diseases, social contexts, and spatial scales.
While our analysis demonstrated the accuracy of our ensemble sub-epidemic modeling framework in forecasting the COVID-19 pandemic, the same framework could be readily used to forecast other epidemics irrespective of the type of disease and spatial scale involved. Beyond infectious diseases, this framework could also be used to forecast other biological and social growth processes, such as the epidemics of lung injury associated with e-cigarette use or vaping and the viral spread of information through social media platforms.
In summary, our ensemble sub-epidemic models provided reliable short-term forecasts of the trajectory of the COVID-19 pandemic in the USA involving multiple waves and outcompeted a set of ARIMA models. The forecasting performance of the ensemble models improved with the number of top-ranking sub-epidemic models included in the ensemble. This framework could be readily applied to investigate the spread of epidemics and pandemics beyond COVID-19 and in a range of problems in nature and society that would benefit from short-term predictions.
Data Availability
All data referred to in this manuscript are available in our GitHub repository at https://github.com/atariq2891/An-ensemble-n-sub-epidemic-modeling-framework-for-short-term-forecasting-epidemic-trajectories
Author contributions
Conceptualization, G.C.; methodology, G.C., J.H, R.L, validation, G.C., R.L; formal analysis, G.C., R.L; investigation, G.C., R.L; resources, G.C., ; data curation, G.C., S.D..; writing—original draft preparation, G.C., R.L; writing, review, and editing, A.T., G.C., S.D., K.R., R.L., J.M.H., ; visualization, G.C, R.L; supervision, G.C., R.L; project administration, G.C.; funding acquisition, G.C. All authors have read and agreed to the published version of the manuscript.
Funding
G.C. is partially supported from NSF grants 1610429 and 1633381 and R01 GM 130900. A.T. and S.D. are supported by a 2CI fellowship from Georgia State University.
Conflicts of Interest
The authors declare no conflict of interest.