ABSTRACT
Background The COVID-19 pandemic emphasised the importance of access to reliable real-time forecasts for key epidemiological indicators. Given the strong heterogeneity between regions, providing forecasts at the local level is essential for health professionals.
Methods We developed a SARS-CoV-2 transmission model in France, COVIDici, that performs parameter estimation using up-to-date vaccination coverage and hospital data to provide forecasts up to a four-week horizon based on the current epidemic trend. We present the model, its associated online tool and perform a retrospective evaluation of the forecasts provided from January to December 2021 by comparing to three standard statistical forecasting methods (auto-regression, exponential smoothing, and ARIMA) at the national and regional levels.
Results COVIDici allowed simultaneous real-time visualisation of several indicators of the COVID-19 epidemic at the sub-national level. For anticipating risk of critical care unit overload, it performed worse compared to the baseline methods for forecasts under the three-week horizon, but had better point forecasts at the longest horizons (e.g. four weeks) for 8 of the 13 regions considered depending on the metric.
Conclusions Effective communication between modelers and clinicians is essential for utilising forecasts for health care planning. Online visualisation tools and consideration of how metrics can be affected by distortion from non-pharmaceutical government interventions facilitate this dialogue.
What is already known on this topic
The US and European Covid-19 Forecast Hubs publish real-time COVID-19 forecasts on the national level for new deaths, new cases, and hospital admissions, but not more direct measurements of hospital strain like critical care bed occupancy.
In France, statistical modelling approaches have been proposed to anticipate hospital stain at the sub-national level but are limited by a two-week forecast horizon.
What this study adds
We present a sub-national French modelling framework and online application for anticipating hospital strain at the four-week horizon that can account for abrupt changes in key epidemiological parameters.
It was the only publicly available real-time non-Markovian mechanistic model for the French epidemic when implemented in January 2021 and, to our knowledge, it still was at the time it stopped in early 2022.
How this study might affect research, practice or policy
Further adaptations of this surveillance system can serve as an anticipation tool for hospital strain across sub-national localities to aid in the prevention of short-noticed ward closures and patient transfers.
INTRODUCTION
The COVID-19 pandemic emphasised that policymakers need access to accurate forecasts of key epidemiological indicators to mitigate strain on hospital services and reduce preventable deaths [1,2]. Furthermore, policies implemented at the local level outperformed uniform national policies [3]. This led to international projects tasked with creating centralised repositories for COVID-19 forecasts pertaining to the United States [4], Germany/Poland [5], and Europe [6]. Unfortunately, in contrast to their counterparts, the European COVID-19 Forecast Hub only considers the national level and countries like France had to rely on other sources to optimise their local to national healthcare system management.
Of particular interest is intensive care unit (ICU) occupancy, one of the most direct indicators of hospital strain, which is predicted using either statistical or mechanistic models. Statistical models use correlations in previously observed data to explain model structure that can be separated from noise and extrapolated to the future. In time series analysis, they can benefit from the addition of adequate predictor variables. For example, [7] proposed an ensemble model approach that combined several statistical models (including machine learning) to utilise predictors identified from available epidemiological, mobility, and meteorological data to make 14-day forecasts for ICU admissions and ICU occupancy by French region. Their ensemble was effective in the short-term but had more difficulty predicting beyond the lag (typically at most two weeks) between their predictors (e.g. positive antigen testing) and the subsequent hospitalisation events.
Mechanistic models are explicitly based on a simplified version of the underlying epidemiological process [8]. The most popular is the compartmental model, which typically involves separating a population into distinct sub-populations (e.g. susceptible, infected, and ‘removed’ individuals in the SIR model) and inferring the transition dynamics between these compartments. This can be done based on assumptions regarding the biology of the pathogen or via optimisation approaches using observed data, e.g. hospital admissions, to make inferences regarding partially or unobserved factors such as daily infections [9]. The biological assumptions simplify the causal relationships between a pathogen’s infectivity, pathogenicity, and lethality so that long-term forecasts can be produced. These models provide us with a mechanistic understanding of the epidemic process and can help to anticipate planned changes such as lockdowns or increases in vaccination coverage. A limitation of compartmental models is that they typically require large, idealised populations for best results [10] and can have lower predictive performance [11] depending on the time scale. However, as shown in [12], a non-Markovian discrete-time compartmental model may have the potential to capture the dynamics up to 5 weeks on average (although this also reflects the epidemiological relevance of the underlying assumptions made).
To facilitate COVID-19 monitoring in France, we developed COVIDici: a mechanistic transmission model that accurately captures the hospital and mortality dynamics from the French epidemic [13]. Model results were publicly communicated via a web dashboard (https://cloudapps.france-bioinformatique.fr/covidici/), which provided real-time visualisations of the French epidemic at the national, regional, and departmental levels. COVIDici was updated daily using databases published by the national public health agency, Santé Publique France [14] until 2022 when it was halted after the emergence of the Omicron variant [15] led to decreased interest in epidemic forecasting by French authorities, partly driven by the belief that Omicron BA.1 would represent our way out of the pandemic and the last wave [16].
Here, we briefly summarise the structure of the underlying compartmental model, the statistical procedure for the parameter inference and describe communication via the web dashboard. We present an evaluation of COVIDici’s forecasts for ICU occupancy up to the four-week horizon at the regional and national levels using standard metrics for continuous variables as well as a binarized version representing ICU overload to focus on model performance in anticipating wave peaks. Standard statistical forecasting methods (auto-regression, exponential smoothing, and ARIMA) are included as baseline models. Finally, we discuss perspectives for COVID-19 epidemic modelling in the context of decreased surveillance.
METHODS
This section presents a summary of COVIDici and a retrospective evaluation. A more detailed discussion with mathematical formulas can be found in the Supplementary Materials. The scripts and data used to perform the analysis and generate this manuscript are available on GitLab (https://gitlab.in2p3.fr/ete/covidici_public) and archived in Zenodo [17].
The model
COVIDici is based on the COVIDSIM framework, a discrete-time deterministic age-structured transmission model tailored to capture the dynamics of the hospital indicators of the epidemic in France [13], which was modified to include vaccination. The structure of the model is shown in Figure 1, which describes the age-stratified flows between the susceptible population, mild infections, severe infections eventually requiring hospitalisation and a removed compartment consisting of recovered and deceased individuals.
Modelling vaccination
The French national vaccination campaign started in late December 2020. We explicitly modelled that the vaccines partially prevent infection and critical COVID-19 by reducing the force of infection experienced by all vaccinated susceptible hosts and the probability of being critically ill upon infection, denoted νI and νC respectively. We set vaccine coverage in each age class in the model using the official VAC-SI database [18].
Future vaccination rates were predicted using a linear regression for each age group trained on the previous 3 weeks of vaccination data. We assumed that vaccination begins with the older age classes and that all age classes have an arbitrary vaccine coverage threshold of 90%. If the coverage for an age class was ever over this threshold, the doses planned for this age class were redistributed to the next oldest age class.
To avoid the inflation of the number of parameters, we assumed that full vaccination only required a single dose. Another simplifying assumption is that the vaccine is instantaneously efficient, with an assumed permanent reduction in severe infection forms of 90% (νC = 0.1) and an 80% contagiousness reduction (νT = 0.2), which is an optimistic estimate [19]. Additional details are found in [13].
Parameter inference
Unknown parameters were inferred via MCMC simulation. While some parameters, e.g. the infection-to-hospitalisation delay, were inferred at the nationwide level only, most were independently fit for each sub-national unit. We expected some of these parameters to change over time due to virus evolution, e.g. the increased transmissibility of the Alpha [20] and then Delta [21] variants, but also to public health interventions (e.g. lockdowns, curfews, limitations on businesses, etc.), social factors (e.g. school holidays), improvement in COVID-19 patient care, and variation in patient profiles. To account for this, we allowed for some parameters to be time-dependent by partitioning the time since the beginning of the epidemic and allowing each period to be associated with its own parameter set.
Parameter estimations were based on the daily COVID-19-related critical care admissions and hospital deaths from the COVID-19 hospital activity database (SI-VIC) [14]. In France, critically ill patients can be hospitalised either in intensive care units, continuous care units, or acute care units, the three forming the critical care capacities [22]. For simplicity here, ICU refers to the wider category of critical care beds, as provided by SI-VIC. Furthermore, we assume the age distribution between localities to be fixed and based on the official demographics data [23].
Communication
An automated cluster computing workflow refit the COVIDici model using daily updates of hospital, vaccination and testing data downloaded from the SI-VIC database, allowing a Shiny web application (see Figure 2 for screenshot, link in Introduction or [17] for source code) to communicate real-time results to the public. The original 2021 production version permitted users to visualise the combined past and future model fit by national, regional or departmental administrative unit for multiple epidemiological parameters, including ICU admissions, ICU occupancy, mortality (cumulative and daily), temporal reproductive number (R(t)), infections (cumulative, daily and current), vaccination coverage and incidence for positive tests.
In 2022, a post-mortem version of the interface was deployed to allow for retrospective inspection of past forecasts with respect to a historical reference date. This version allows visualisation of all forecasts occurring prior to the reference date and includes basic evaluation metrics based on ICU overload (i.e. binarized ICU occupancy) and a colour-coded heatmap of hospital strain for varying forecast lengths and arbitrary saturation thresholds.
Benchmarking
Our assessment is based on original forecasts made by COVIDici between January 30, 2021 and the first detected Omicron case in France (i.e. December 2, 2021), taken on a weekly basis to match with evaluation frameworks of the European and US Covid-19 Forecast Hubs. We focus here on ICU occupancy and only consider the regional and national levels while emphasising that many of the results are equally as valid on the departmental level, especially when they contain major urban areas.
The following baseline models were evaluated using a rolling forecasting origin [24] starting on August 2, 2020:
ETS+ARIMA is an ensemble of an ARIMA and an exponential smoothing (ETS) model fit using automated defaults in the fable package in R. It uses a log transformation of the rolling 7-day average of the ICU occupancy using only the data available for COVIDici to make its original forecast for that reference date.
AR-Lasso is an auto-regressive (AR) machine learning type model implemented using the caretForecast package. Lagged values from the previous 21 days are selected using the Least Absolute Shrinkage and Selection Operator (LASSO) [25] tuned using time-series cross-validation as implemented in [26] to prevent data leakage and reduce overfitting.
Naive is a special case of an AR-1 implemented using the fable package. The point forecast is simply the last observed value and is optimal if the time series is a random walk.
Metrics
As recommended by the US and European COVID-19 Forecast Hubs, we evaluate the point forecasts with the absolute error (AE), individual prediction levels with the empirical coverage rate (ECR) and the forecast distributions with the weighted interval score (WIS). The WIS is a proper scoring rule that generalises the absolute error and gives penalties for interval spread as well as for over- and underprediction [27]. All three metrics (AE, ECR, WIS) were calculated using the scoringutils package [28]. We used the default summary function implemented in scoringutils (i.e. the mean) when aggregating over geographic units. However, we use the median when aggregating over time because the mean tended to give distorted results due to outlier errors that are common during the peaks of epidemiological waves (see Figure 3A).
Summarising AE and WIS with the median have the drawback that it is more likely to reflect forecaster performance between waves rather than its ability to anticipate peaks, which is arguably the more important objective. Furthermore, AE and WIS tend to harshly punish outlier errors during wave peaks which can at least be partially explained by a survivor bias that occurs every time a public health policy is implemented (see Figure 3B for non-exhaustive list), as well as spontaneous behavioural change. As illustrated by Figure 3C, this bias (i.e. the shaded area between the curves) corresponds to the difference between what would happen in absence of intervention (dashed curve) and what eventually is observed (the solid curve). While the magnitude of this bias at the peaks is counterfactual and subject to debate, several studies have indicated that even mild non-pharmaceutical interventions can have similar effects on curbing the spread of the virus compared to more severe ones [29,30].
To fairly evaluate performance during wave peaks, we consider ICU overload (binarized ICU occupancy), which we expect to be more robust against over-predictions. This requires introducing arbitrary capacity thresholds which we define as the percentage of the ICU occupancy observed in the geographical unit in the first wave in 2020. For point forecasts, we consider the proportion of incorrect forecasts of an outcome given that outcome was observed. Following the convention that lower scores are better, we define: For forecast distributions of ICU overload, we use the Brier score, which is the mean squared error of the binary overload outcome (i.e. 0 or 1) and the mass of the prediction interval above the arbitrary threshold. We present binary metrics separately for periods of observed overload and underload because anticipation of overload is widely considered more important in a hospitalisation surveillance system.
RESULTS
Qualitatively, all four forecasters performed reasonably well at the four-week horizon (see Figure 3A) although COVIDici and ETS+ARIMA tended to over-predict more at the top of the waves. All forecasters experienced a delay in anticipating upcoming waves, which was more pronounced for COVIDici and the naive model. Figure 3B shows qualitative evidence that COVIDici performs better on the trailing side of each wave than the leading side.
Standard metrics for ICU occupancy are contained in Figure 4. Figure 4A shows the mean WIS (scaled by the naive model) across all geographic units over time. Large misses near wave peaks are evident for COVIDici and ETS+ARIMA but not AR-Lasso, which appears to be more consistent. This narrative is reversed in Figure 4B which depicts the median AE and WIS by region. COVIDici clearly shows improvement relative to the other baselines at longer horizons and had the lowest AE at the four-week horizon for 8 of the 13 regions. These improvements were not present when evaluating the forecast distribution using WIS. This is likely explained by Figure 4C, where we see strong evidence that the prediction intervals for COVIDici were far too narrow. AR-Lasso was somewhat better calibrated and ETS+ARIMA had near optimal calibration of predictions intervals nationally as well as in several regions.
Binarized metrics for ICU overload are shown in Figure 5. All binary metrics show that COVIDici consistently improves relative to the baselines at longer horizons. Figures 5A and 5B show all evaluations of point and distribution metrics for ICU overload at varying capacity thresholds. AR-Lasso is clearly not suitable for anticipating ICU overload at the four-week horizon where it only performed comparably to the naive baseline while ETS+ARIMA had slightly fewer incorrect forecasts of overload compared to COVIDici but slightly more when predicting underload. ETS+ARIMA and COVIDici were also the top two choices at the four-week horizon in Figure 5B where the former had a consistent edge likely due to the overly narrow predictions intervals of COVIDici. Figure 5C breaks these metrics down by region which supports similar conclusions.
DISCUSSION
Modelling and forecasting
Mathematical modelling has made key contributions to the understanding and controlling of the COVID-19 epidemic [2]. In France, it led to the anticipation of hospital dynamics in the first epidemic wave [31] and the emergence of the Delta variant [21]. However, most of these studies were performed at a national level and/or at a single time point, which made their impact limited from a public health point of view. To our knowledge, there were only two continuously running forecasting models in France in 2021: COVIDici and an ensemble of statistical models implemented by the Pasteur Institute [7]. These two models were built on two types of approaches each with different limitations and strengths.
Our benchmarking analysis revealed two major limitations for COVIDici. First, the prediction intervals were far too narrow which obfuscated any performative advantages that it had when considering distributional evaluation metrics (e.g. WIS and Brier score). Second, COVIDici exhibited relatively weak performance up to the two-week horizon to predict ICU occupancy compared to statistical modelling approaches. However, this was expected as COVIDici only uses hospital data to update its inference and there is a nearly two-week delay between infection and hospital admission [32].
The main strength of COVIDici was improved accuracy at the four-week horizon for point forecasts of ICU occupancy compared to statistical methods, especially during the trailing edge of waves. For anticipating wave peaks, COVIDici had one of the best overall performances with respect to the trade-off between the correct prediction rate of observed overload and observed underload on the four-week horizon. AR-Lasso on the other hand failed to reasonably anticipate overload, despite relatively optimistic performance in terms of more standard metrics (e.g. WIS), which should serve as a cautionary example.
Forecasts popularisation
In addition to real-time subnational forecasting, a key feature of COVIDici was to offer visualisation of numerous unobservable indicators that can only be inferred through an underlying model (e.g. the estimate of all active infections). This is an asset from a popularisation point of view, but it raises issues because estimates can be strongly biased in areas with low population density.
Furthermore, it is important to distinguish between two types of forecasts. Some, like COVIDici, attempt to capture what will happen if transmission remains identical, i.e. in absence of intervention or behavioural change. Others, especially the ones built on machine learning, try to factor in these changes to forecast what will eventually happen. In terms of guiding public health decisions such as the allocation of resources, the former seems more appropriate than the latter. However, this requires familiarising the audience with the well-known dilemma that we expect forecasts made under the assumption that nothing changes to be proven wrong by the observed data when they are too pessimistic. Otherwise, it means they were not considered when shaping the public health response.
Perspectives
Many countries are decreasing their investment in epidemic surveillance, and some rely on statistical model forecasting with the inclusion of new predictors, such as that from wastewater data. However, there is still room for compartmental model forecasts like COVIDici that can rely on variables with a high level of sampling such as hospital admissions data.
Regarding SARS-CoV-2, future extensions of COVIDici would require updating the model to account more precisely for the diversity in immune protection among individuals, given the number of natural infections since the evolution of the Omicron variants [15]. However, existing non-Markovian models suggest that this is feasible [32].
Furthermore, in several ways COVIDici’s forecasting potential was under-exploited. It already incorporated variations in vaccine coverage but, thanks to its mechanistic nature, it could also readily include planned events such as school holidays or early predictors of variations in reproduction numbers such as temperature.
Data Availability
The model in this manuscript is based on publicly available data provided by Santé Publique France. The scripts and data used to perform the analysis and generate this manuscript are available on GitLab (https://gitlab.in2p3.fr/ete/covidici_public) and archived in Zenodo (doi:10.5281/zenodo.7641133). The companion web dashboard is hosted by France Bioinformatique (https://cloudapps.france-bioinformatique.fr/covidici/).
https://www.data.gouv.fr/fr/datasets/donnees-relatives-aux-personnes-vaccinees-contre-la-covid-19-1/
https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/
ACKOWLEDGEMENTS
We acknowledge support from the MODCOV19 platform set up in March 2020 thanks to the National Institute of Mathematical Sciences and their Interactions (INSMI) and the CNRS (PaSSES grant) and the MODVAR project under the EMERGEN consortium. We also want to thank Santé publique France, the University of Montpellier, the South Green computational platform/IRD and the French Institute of Bioinformatics.