Abstract
Background Recurring COVID-19 waves highlight the need for tools able to quantify transmission risk, and identify geographical areas at risk of outbreaks. Local outbreak risk depends on complex immunity patterns resulting from previous infections, vaccination, waning and immune escape, alongside other factors (population density, social contact patterns). Immunity patterns are spatially and demographically heterogeneous, and are challenging to capture in country-level forecast models.
Methods We used a spatiotemporal regression model to forecast subnational case and death counts and applied it to three EU countries as test cases: France, Czechia, and Italy. Cases in local regions arise from importations or local transmission. Our model produces age-stratified forecasts given age-stratified data, and links reported case counts to routinely collected covariates (test number, vaccine coverage..). We assessed the predictive performance of our model up to four weeks ahead using proper scoring rules and compared it to the European COVID-19 Forecast Hub ensemble model. Using simulations, we evaluated the impact of variations in transmission on the forecasts. We developed an open-source RShiny App to visualise the forecasts and scenarios.
Results At a national level, the median relative difference between our median weekly case forecasts and the data up to four weeks ahead was 25% (IQR: 12-50%) over the prediction period. The accuracy decreased as the forecast horizon increased (on average 24% increase in the median ranked probability score per added week), while the accuracy of death forecasts remained stable. Beyond two weeks, the model generated a narrow range of likely transmission dynamics. The median national case forecasts showed similar accuracy to forecasts from the European COVID-19 Forecast Hub ensemble model, but the prediction interval was narrower in our model. Generating forecasts under alternative transmission scenarios was therefore key to capturing the range of possible short-term transmission dynamics.
Discussion Our model captures changes in local COVID-19 outbreak dynamics, and enables quantification of short-term transmission risk at a subnational level. The outputs of the model improve our ability to identify areas where outbreaks are most likely, and are available to a wide range of public health professionals through the Shiny App we developed.
Introduction
Local dynamics of COVID-19 depend on a combination of the level of immunity in a population, which in turn depends on previous incidence, vaccine uptake, and the immune escape properties of the currently circulating SARS-CoV-2 variants, and other factors such as population density, social contact patterns, migration patterns and adherence to public health measures. Most countries in Europe have been affected by repeated COVID-19 waves since March 2020, and have implemented extensive vaccination campaigns in order to reduce the health impact of these waves. This led to high national levels of both natural and vaccine-induced immunity (1). Before the emergence of the Delta and Omicron variants of concern (VOCs), this immunity provided considerable and durable protection against severe outcomes (hospitalisation and death) and some transient protection against infection (2). The emergence of new VOCs, and the extent of their immune escape, along with the waning of efficacy observed for a two-dose vaccine course, precipitated the expansion and acceleration of “booster” vaccination campaigns, further shifting the immune landscape of the population. Overall immunity to the Omicron BA.1 variant rose quickly after its emergence due to the unprecedentedly large wave of BA.1 cases that occurred throughout Europe in late 2021-early 2022, and vaccine booster campaigns. Since then, repeated emergence of Omicron subvariants with high levels of immune escape has led to multiple waves of infections, with lower case burden than the first BA.1 wave.
Currently, the overall level of immunity against infection is high compared to the early phase of the COVID-19 pandemic. The immune landscape of the population is spatially heterogeneous due to considerable variation in infection histories (e.g. multiple reinfections with different variants) and differences in vaccine uptake and timing across the population. If overall immunity rises, this immune landscape should result in the risk of outbreaks becoming more spatially and demographically heterogeneous, with the potential for distinct identifiable outbreaks. Therefore, there is interest from public health agencies in forecasting short-term incidence at a subnational level, and in different age groups, to anticipate large localised spikes in case numbers and local pressure on health care systems. Predicting such spikes in health care demand will become increasingly relevant as time moves forward if COVID-19 becomes more seasonal and influenza-like in its dynamics. This project was developed in collaboration with the European Centre for Disease Prevention and Control (ECDC), with the objective of developing a statistical framework for forecasting local case and death incidence in a range of European countries, and visualisation tools to communicate the predictions. The forecasts and scenario analysis can then be used to optimise planning and allocation of resources.
There have been numerous attempts to model subnational incidence of COVID-19 over the course of the pandemic. These have tended to fall into two broad categories: mechanistic “spatial” susceptible-exposed-infectious-recovered transmission models and statistical time series/spatiotemporal models. Most time series and transmission models have treated subnational regions as independent (fitted the model and made predictions separately for each region) without accounting for spatiotemporal correlations in incidence between regions (3–7), despite these being strong (8–10). A limited number of transmission models have instead treated subnational regions as connected sub-populations via a metapopulation approach, and used geographical distance or mobility/commuting data to parametrise connectivity between regions (11–16). These models become increasingly complex and hence slow to generate predictions as the number of affected regions increases, and have therefore received limited use for real-time forecasting during the pandemic. Time series models and spatiotemporal statistical models, on the other hand, have been used extensively and successfully to forecast future incidence at national and subnational levels (6,7,17). In the latter category, we focus on Endemic-Epidemic models, a flexible class of spatiotemporal statistical models that can be used to link changes in incidence to recent case numbers and the effect of various different covariates, on which the framework used in this paper is based. Endemic-Epidemic models have already been employed during the pandemic to understand and forecast spatiotemporal spread of COVID-19 at a subnational level (18–21) and at a national level in Africa (22), and to assess the impact of non-pharmaceutical interventions (NPIs), including lockdowns and border closures (23–26) (see (27) for a review). Their superiority to time series models that assume independence between regions has been demonstrated on data from northern Italy (28). However, they have not, as yet, been applied to forecast subnational incidence across multiple countries.
In this paper, we present a flexible modelling framework to capture subnational, age-stratified case dynamics of COVID-19, alongside a publicly available RShiny App to visualise the results and forecasts generated. The framework is used to predict subnational incidence of COVID-19 cases and deaths from routinely collected, publicly available surveillance data, and forecast the impact of changes in transmission on short-term dynamics (e.g. due to changes in behaviour or transmissibility). The framework can be run with an age-stratified model if local, age-stratified data is available. Public health professionals can use the RShiny App to visualise the forecasts of reported cases and deaths, and the predicted incidence under different scenarios, to get a full picture of the anticipated short-term burden of COVID-19 in local areas. We apply the framework to forecast cases and deaths at a NUTS-3 (Nomenclature of Territorial Units for Statistics 3) level in France, Czechia and Italy as test cases, and evaluate its ability to predict subnational incidence up to four weeks ahead.
Methods
Spatiotemporal modelling framework for reported cases
We model subnational COVID-19 case counts using the Endemic-Epidemic spatiotemporal modelling framework (29), as implemented in the surveillance R package (30). Endemic-Epidemic models provide a flexible framework for relating current incidence of cases to recent case incidence and to imported cases, accounting for the influence of other factors (i.e. covariates) on these relationships. They decompose local incidence in region i at time t into three components:
- An autoregressive component that quantifies the number of new infections expected from cases reported in region i at previous time steps.
- A neighbourhood component that quantifies the number of new infections expected from cases reported in regions around region i in previous time steps, which depends on the connectivity (or the human mobility) between regions.
- An endemic component that quantifies the background number of new infections occurring in region i at time t, independent of the current level of transmission, representing importations from regions outside those included in the study (or cases that could not be linked to the mechanistic components).
Endemic-Epidemic models are able to integrate multiple data sources from disease surveillance activity to link case incidence and various covariates. Each component is independently impacted by each covariate. Since we introduce various covariates and controls in our model, we merge the autoregressive and neighbourhood component into an epidemic component to reduce the number of parameters estimated, and avoid identifiability and convergence issues.
Early versions of the Endemic-Epidemic models only included dependence of current cases on cases in the previous time step, but they have since been extended to account for distributed lags via the hhh4addon R package (31–33). This allows for a more faithful representation of the impact of recent incidence on current case number, for instance by aligning the lag distribution to the serial interval of the pathogen.
The link between indicators of immunity and risk of SARS-CoV-2 infection is complex and unstable, due to waning of vaccine and infection-induced immunity, and emergence of variants able to escape immunity. The flexibility of the Endemic-Epidemic framework therefore makes it well suited to capture local dynamics of COVID-19 cases, while fully mechanistic frameworks may be too complex to parametrise. Depending on data availability, the model can be stratified by age.
Non-age-stratified model
The total number of cases (over all age groups) in region i at time t, Yit, conditional on the number of cases in the same region and neighbouring regions in the previous p time steps, is modelled as negative binomial: with conditional mean μit and dispersion parameter ψi such that
The mean is given by where is the transmission potential from recent cases in region j, with [ud] the normalised lag weight for cases d days ago. We set p= 20 days and use a custom composite serial interval distribution for ud that accounts for missing infection generations (34), with a mean and standard deviation for the first infection generation of 5 days and 1.5 days respectively, based on the estimated serial interval for SARS-CoV-2 for pre-Omicron variants (35–37), and 80% of the composite serial interval assumed to reflect direct transmission (without missing infections) (Supplementary Figure 1). The predictors for the epidemic (combined autoregressive and neighbourhood) and endemic components, ϕit and vit, determine the number of cases stemming from each component and are assumed to depend on log-linear component-specific predictors:
The predictors are independently impacted by different covariates, , i.e. a covariate may be associated with fewer imported cases (endemic component vit), but have little impact on the spread of the virus in and between the regions (epidemic component ϕit). The association between the covariates and the local number of cases expected is quantified by regression coefficients, β(ϕ), β(v), estimated through a maximum-likelihood approach. The weights wji in the neighbourhood component quantify the degree of connectivity between region i and surrounding regions j. We use a power-law model for the neighbourhood weights: where oji is the adjacency order between regions i and j and ρ is a decay parameter to be estimated. The adjacency order is defined as the minimum number of borders that must be crossed to get from i to j: the adjacency degree is equal to 1 between neighbours, 2 between neighbours of neighbours, and so on. We normalise the weights, such that , where, I is the maximum adjacency order considered, which we take to be 5.
Age-stratified model
For countries in which age-stratified subnational case count data is available, we implement an age-stratified version of the model above using modified code from the hhh4contacts R package (38,39). For this model, the number of cases in age group in region i at time t, Yait, conditional on the numbers of cases in the previous p time steps in the same and other age groups in region i and the surrounding regions is modelled as negative binomial: with conditional mean and overdispersion parameter ψi >0 such that , where with na age groups and ni regions, and cba is the mean number of daily contacts in age group a of an individual in age group b. We stratify the population into na =9 age groups (0-9, 10-19,…, 70-79, 80+ years). We use age-structured contact matrices from country-specific pre-pandemic contact surveys for cba where available, e.g. for France (40), and synthetic contact matrices estimated from contact survey and demographic data for countries in which no nationally representative contact surveys have been conducted, e.g. for Czechia (41). We used age-stratified intercepts in the epidemic component.
Covariates
We incorporate various covariates in and in the log-linear predictors for ϕit and vit. Thefull model equations are given in the Supplementary Material (see Model equation for each country and Supplementary Table 1). The covariates were picked based on potentially having had an effect on transmission and importation risk, or having an effect in future. We compare the forecasts obtained with the full model with covariates against a baseline model without covariates in the Supplement.
We included the same set of covariates in the endemic component in the age-stratified and non-age-stratified models:
- population: the total population of region i
- urban/rural status: binary covariates for whether the NUTS-3 region i is classified as urban, intermediate urban, intermediate rural or rural.
- seasonality: sinusoidal terms with annual periodicity to account for seasonal effects on importations. The amplitude and offset of the seasonality function are estimated by the model.
- number of cases in the WHO European region over the last month.
These covariates cover the impact of demographic characteristics and transmission in neighbouring countries on the background number of cases in each region (and age group in the age-stratified version of the model). The covariate specifications of the epidemic component depend on the availability of age-stratified data as follows:
- population:
- age-stratified: two covariates:
- total population of region i
- proportion of the population of age group a in region i.
- non age-stratified: one covariate: the total population of region i.
- testing:
- age-stratified: two covariates:
- proportion of the population in the country tested in the last 2 weeks.
- if local age stratified testing data is available: proportion of population of age group a in region i (if local testing data is available, national otherwise) tested in the last two weeks. Otherwise, proportion of population of age group a in the country tested in the last two weeks.
- non age-stratified: one covariate: the proportion of the population in the country tested in the last 2 weeks.
- vaccination coverage: the proportion of the population in region i who received their second dose in the last 120 days, or who have received three or more doses (for each age group in the age-stratified model)
- cumulative incidence: the cumulative incidence of cases between the start of the fitting period and a month ago in region i (for each age group in the age-stratified model)
- recent incidence: the cumulative incidence of cases in the past month in region i (for each age group in the age-stratified model)
- variant: two binary indicator variables for whether the proportion of sequenced cases that were Delta or an Omicron variant was higher than 30% (in all three countries, the Delta covariate is equal to one in late 2021, and the Omicron covariate is equal to one from January 2022 onwards).
- day-of-the-week: indicator variables to account for day-of-the-week reporting effects in the numbers of cases.
- urban/rural status: binary covariates for whether the NUTS-3 region i is classified as urban, intermediate urban, intermediate rural or rural.
- seasonality: sinusoidal terms with annual periodicity to account for their effects on local transmission. The amplitude and offset of the seasonality function are estimated by the model.
The covariates above cover the impact of local immunity (due to vaccine or previous infection), testing patterns, variants and seasonality on the risks of reported cases. We Included covariates that quantify the association between number of tests and incidence so that the model may capture changes in surveillance and reporting patterns, whereby drops in testing can be associated with changes in the number of new cases. For instance, in age groups with a lower proportion of hospitalised cases (e.g. younger age groups) the reporting rate may vary greatly if only severe cases are tested and reported. The covariates quantifying the association between infection or vaccine-acquired immunity and transmission risk depend on various thresholds (e.g. recent incidence corresponds to cases reported in the last month; the second vaccine dose is only taken into account during 120 days). These thresholds can easily be changed in the code to generate sensitivity analyses, or to adapt to the characteristics of new variants or vaccines.
Model fitting
We fit the model to case data reported between September 2020 and the latest reported date for each country (currently end of April 2023) to estimate the regression coefficients β (ϕ) and β(v). The model is fit separately for each country. We do this via maximum-likelihood estimation, as implemented in the surveillance and hhh4addon R packages.
Case forecasts
We use the parameter estimates from the model fitting to generate four-week-ahead forecasts of daily case numbers in each region and age group by simulating the model forward 28 days with projected values of the covariates. We used the latest value of the covariates describing vaccine coverage and testing, and the mean value over the past 30 days for the number of cases in the rest of Europe in the past month. We run 500 simulations for each country to account for stochasticity and parameter uncertainty, and output the median, 2.5th, 25th, 75th, and 97.5th percentiles of the predicted distribution for each date and region (and age group for the age-stratified model) for visualisation in the RShiny app.
Death forecasts
We use simple linear regression models to generate four-week-ahead forecasts of weekly numbers of reported deaths. To do so, we estimate the Case Fatality Rate (CFR), and combine it with the recent number of reported cases to forecast the number of deaths, We first aggregate the number of reported cases and deaths by week, and compute a proxy for the age-stratified CFR assuming a three-week delay between weekly reported cases and reported deaths (42– 44). Specifically, we compute the CFR for age group ai in region i at week w as: where Naiw and Daiw are the numbers of cases and deaths in age group a, in NUTS-2 region i, in week w. The death forecast model was implemented at a NUTS-2 geographical scale since death data was not always available at a NUTS-3 level.
We then calculate ΔCFRaiwx, the change in CFR, and ΔNaiwx, the change in the number of cases, between the prediction date and the forecast horizon: where x is the forecast horizon (1, 2, or 3 weeks). We implement a linear regression model for each forecast horizon and age group, with ΔCFR as outcome, and ΔN as explanatory variable, again using a three-week delay: We then use the estimates of αax and βax to predict ΔCFR between the last week of reported case data (wpred) and the forecast dates (wpred + 1, wpred + 2, wpred + 3), and calculate the predicted CFR as: (using a sample of 10 values of ΔCFRaiwx, computed from the mean estimate and the prediction interval of the linear regression). Finally, we draw the number of new weekly deaths each week using a binomial distribution, from the estimated CFR, and the number of cases reported three weeks before: We draw 500 forecasts for in total (50 per value of ). Given the three-week delay between cases and deaths, one-, two-, and three-week ahead death forecasts are generated using already reported case data. In contrast, four-week-ahead death forecasts require one-week-ahead case forecasts. We estimate the change in CFR using the regression parameters from the three-week-ahead forecasts and the four-week change in number of cases in the one-week-ahead case forecasts
Scenario simulations
To explore the impact of variations in transmission intensity or implementation of non-pharmaceutical interventions (NPIs), we generate 28-day forecasts under combinations of the following scenarios, by changing the values of the epidemic predictor ϕait, or the endemic predictor vait :
- Moderate (20%) or large (40%) increase in transmission intensity (ϕait), due to inherent properties of the pathogen (i.e. emergence of a new, more transmissible variant), or to changes in behaviour.
- Moderate (20%) or large (40%) decrease in transmission intensity (ϕait), due to changes in human behaviour or NPIs. Furthermore, for countries where an age-stratified model was implemented, this decrease can be targeted at a certain age group (children and teenagers below 20 years old, adults between 20 and 60 years old, or older inhabitants), and implemented one or two weeks after the current date.
- Removal of all importations, for instance due to border closure (i.e. vait = 0).
All scenarios affect every region in the same way (i.e. we do not consider local NPIs). We generate 100 simulations under each scenario.
Data
Several publicly available data sources are used to implement the model and are summarised in Table 1. Since compilation of COVID-19 surveillance reports in centralised databases was interrupted in early 2022 in many countries, the majority of the data, including the case and death data, is imported from country-specific sources.
Case and death data
Local case data at a NUTS-3 level was used in all countries (age-stratified in France and Czechia). In Italy and France, death data was only available at NUTS-1/NUTS-2 level.
Covariate data
Where available, daily age-stratified (for the age-stratified model) vaccination data at NUTS-3 regional level is used, though most sources provide weekly data, and a two-week delay for protection from each dose to develop is assumed. Publicly available testing data varies considerably in spatial resolution (for some countries only national data is available) and age stratification (for some countries only total data is available), so age-stratified national testing data is used as a default and subnational data used when available. Age-stratified regional population data is drawn from a different source for each country (Table 1). Data on the proportion of different variants among sequenced cases is drawn from the ECDC variant database (45). The urban-rural status of each NUTS-3 region is taken from the Eurostat database (database labelled “urban-rural remoteness”) (46). Daily numbers of cases in the rest of Europe are taken from the World Health Organisation database of daily numbers of cases and deaths in different countries (47).
Calibration analysis
We evaluate the ability of the model to generate accurate and reliable case and death forecasts in each country by fitting the model up to a set of dates in a calibration period, generating 28-day-ahead forecasts from these dates using the model fits, and comparing these forecasts with the data. The forecasts are generated by re-fitting the model for each calibration date (i.e. we do not use the data posterior to the calibration date to generate the forecasts). We used weekly dates from the last six months of data as calibration period (29th October 2022 to 22nd April 2023, (i.e. 25 calibration dates). A model is deemed well-calibrated if it can identify its own uncertainty in making predictions, i.e. if the data points are evenly distributed across the prediction intervals generated by the model. Since the reliability of our forecasts is likely to depend on the prediction horizon, we compare the performance of the model for one-, two-, three-, and four-week-ahead forecasts, to determine at which point the quality of the forecasts declines.
We use various metrics and figures to evaluate the calibration. Firstly, we visually compare our weekly national-level case and death forecasts, obtained by summing up the age-stratified and local forecasts, with the ensemble forecasts produced by the European COVID-19 Forecast Hub, which serves as a benchmark of short-term forecasts of COVID-19 incidence. We also compare our national-level forecasts with the ensemble forecasts quantitatively via the weighted interval score (WIS) (31), a proper scoring rule (i.e. one that measures both calibration – how accurate the forecasts are – and sharpness – how precise the forecasts are) for quantile forecasts, and the squared error of the median. This comparison assesses whether the overall performance of our model is in line with other COVID-19 prediction models.
Secondly, we evaluate our local forecasts against those of a baseline model (chosen as the Endemic-Epidemic model without transmission between regions, covariates or seasonality, i.e. the model in Equations (1)-(4) but with wji =1 when j=i and wji =0 otherwise and log(ϕit)= α(ϕ) and log(vit)= α(v), using proper scoring rules, namely the ranked probability score (RPS), Dawid-Sebastiani score (DSS) and squared error score (SES) (see Supplementary Material for definitions). We also generate the predictive probability distribution of the local, age-stratified (if available) forecasts at each calibration date. We then use the Probability Integral Transform (PIT) histogram to assess the calibration of the model: in models with good calibration, the data should follow the predictive probability distribution, and the PIT histogram should be uniform. We computed a non-randomised yet uniform version of the PIT histogram, to correct for the use of discrete values, as described in Czado et al (56).
Comparison of age-stratified and non-age-stratified model
As age-stratified case and death data is only available for certain countries, we explore the impact of fitting the model to subnational total case counts for France and Czechia on the ability of the model to predict subnational total numbers of cases and deaths. We do this by fitting the non-age-stratified equivalent of the age-stratified model in Equations (3)-(4) and (A1)-(A2) with the age-stratified covariates removed, i.e. Equations (1)-(2) and (A4) (see Supplementary Material), and re-running the calibration analysis of the predictive performance of the model described above.
RShiny Application
All forecasts and predictors generated by the model are available in an R Shiny Application (https://github.com/EU-ECDC/RShinyCovidApp). The users can use the application to see the latest case and death forecasts of the model in each country, compare past forecasts to recent data points, generate different transmission and NPI scenarios, and observe the regions most at risk of transmission according to the model. The forecasts contained in the application are updated weekly.
Results
Calibration analysis: assessing the performance of the model
Case forecasts calibration
Firstly, we compute the weekly case forecasts generated by the model across all regions and age groups, by summing up all local age stratified forecasts, across the calibration period (last six months of data). We then compare the one, two, three, and four-week ahead forecasts with the observed data, and the weekly forecasts from the European COVID-19 Forecast Hub.
The comparison of the forecasts is shown in Figure 1 and Table 2. It demonstrates that our model was able to reliably capture the dynamics observed in the data up to two weeks ahead. The magnitude of the peak of the different outbreaks is accurately estimated in all three countries, and the trend forecasted by our model corresponds to the data. Beyond a two-week forecast horizon, the difference between the forecasts and the data becomes starker, in particular during outbreaks (for instance around December 2022 in France), with changes in case numbers being captured with a bigger lag in the forecasts. Based on the squared error of the median, the median forecasts generated by the model are on par with the ensemble forecasts from the European COVID-19 Forecast Hub across all forecast horizons (Table 2). There are periods where our model outperforms the ensemble model (e.g. two-week ahead forecasts between September and December 2022 in Italy), and others where the ensemble model outperforms ours (January-March 2023 in Czechia across all forecast horizons).
However, the dispersion of our forecasts is considerably narrower than that of the ensemble forecasts, particularly for France. The data points are therefore more often outside the 95% prediction interval of our forecasts, despite the median forecasts being close to the data points, and the median and 95% interval of the weighted interval score is higher for France for our model than the ensemble model (Table 2). The narrow prediction interval is a consequence of aggregating all local and age-stratified forecasts to get the national-level estimate rather than directly estimating the national number of new cases: since the dispersion of a sum of negative binomial random variables is much smaller than that of each individual negative binomial random variable. This is also why the prediction interval is narrowest in France, where the number of groups is highest (846 groups = 94 regions times 9 age groups).
We now assess whether the dispersion of local estimates is in line with the data (i.e. whether the prediction intervals from the model include the observed data points). To do so, we compute the case PIT histogram for each country and forecast horizon (Figure 2). We also generate PIT histograms stratified by broad age groups for France and Czechia (Supplementary Figures 2 and 3). In one-week-ahead forecasts, we observe an inverse U-shaped histogram in Italy, indicating that the forecasts are slightly overdispersed. The performance of the age-stratified model varies in the different age groups. The model tends to overestimate the number of cases in younger age groups (Supplementary Figures 2 and 3), while the histograms in age groups between 20 and 80 years old were flat, indicating good calibration. The two-week ahead PIT histogram for Italy also shows very good performance, with little sign of bias, although the model tends to slightly overestimate the number of cases (few observations fall into the highest categories). Over longer time horizons, the PIT histogram becomes U-shaped, i.e. skewed towards extreme values, showing that the forecasts generated by the model are too confident.
The histograms for Czechia and France are U-shaped, indicating that the model is overly confident (i.e. the prediction interval is too narrow). The overestimation of cases in younger age groups may be due to case incidence being determined by the age-structured contact rates used in the model (derived from contact surveys), but cases in this age group may also be under-reported as they typically have milder symptoms, and so are harder to spot without active case finding campaigns. Forecasts past two weeks ahead are more strongly biased, and underdispersed, but better performance is still observed for Italy, where the model is not age-stratified. Similar underdispersion was observed in Czechia and France when a non-age-stratified model was implemented (Supplementary Figures 6 and 7).
We also generated local forecasts using a baseline model (with no transmission between regions, covariates, seasonality, or age-specific intercepts), and compared them to the forecasts from the full model (Supplementary Table 2). The addition of transmission between regions, covariates and seasonality to the model substantially improved the predictive performance up to 2 weeks ahead for all countries, with an average 38% improvement across the three countries in the median RPS at a 2-week forecast horizon. This was especially true in Czechia and France, where the age-specific dynamics were hard to capture without covariates.
Death forecasts calibration
The death forecast comparison is shown in Figure 3 and Table 3. The performance of the model is not homogeneous across countries. The model performs better in Italy and Czechia, where the median estimate is closer to the observed data, and the 95% prediction interval includes the data more often than in France. In France, the model was not able to accurately capture the drop in number of deaths following the outbreak in December 2022. For most dates, the median estimate of the number of deaths is similar in our model and the European COVID-19 Forecast Hub ensemble model (Figure 3), and the median of the squared errors of the median predictions across all time points in the prediction period is similar (Table 3). The predictive accuracy and coverage of our model is closer to that of the ensemble model for the death forecasts than the case forecasts (the WIS is closer, Table 3), and the impact of forecast horizon on the performance is not as severe as for the case forecasts, for instance for Italy four-week-ahead forecasts are still in line with the number of reported deaths (Figure 3). This is due to the larger prediction intervals generated with the death model relative to the number of deaths.
For Czechia and France, the PIT histograms generated for the local age-stratified weekly death forecasts follow much more uniform distributions than those for the case forecasts, for all forecast horizons (Figure 4). Since the model was age stratified for both countries, the majority of weekly death counts in the data were 0, which was easier for the model to capture and forecast. In Italy, the upper bound of the prediction interval tends to be too low, since few observed weekly death counts fall into the lower categories of the PIT histograms. Therefore, our model tends to underestimate the number of weekly deaths per region in Italy. The age-stratified PIT histograms in Czechia and France (Supplementary Figures 4 and 5) are uniform for younger age groups (where almost all observations are 0), and follow an inverse U-shape in older age groups, indicating that the forecasts are slightly overdispersed.
Visualising the predictions in the R Shiny Application
The RShiny app is designed to make three features accessible to the user:
Forecasts: current and previous four-week ahead forecasts of the number of cases and deaths in each region (and age group when available)
Predictors: The risks of secondary transmission and importations estimated by the model
Scenarios: The impact of changes in transmission, be it due to more infectious variants, changes in behaviour, or NPIs on case and death forecasts. These changes in transmission represent the potential impact of control measures on transmission, and should be interpreted with caution in a constantly changing epidemiological situation (impacted for example by behaviour, adherence and other factors). Similarly, this model only considers the epidemiological impact of NPIs, the social or economic costs of different control measures is not considered in this analysis.
Previous and current forecasts
Case and death forecasts can be viewed via a map of the predicted median case/death incidence over the next two weeks, or via time series plots by region (and age group when age-stratified data is available) of the median numbers of daily cases/weekly deaths over the next four weeks (with 50% and 95% prediction intervals) (Figure 5). Forecasts can be visualised at different geographical scales, either at NUTS-3 level or NUTS-1/NUTS-2 level. The accuracy of past forecasts at different geographical scales and over different time horizons can be visually assessed by varying the date from which to predict (to a date in the past) and comparing past predictions to the observed data.
Local predictors of transmission and importations
Spatial heterogeneity in case incidence, transmission and importation risk at the latest date of the forecasts is displayed in the app via three different maps showing the spatial distribution of cases at a NUTS-3 or NUTS-1/NUTS-2 level, the local risk of transmission (the population-weighted average value of the epidemic component for each region across all age groups), and the local risk of importation (the epidemic component for each region summed across age groups) (Figure 6). The local risk of transmission and importation are shown as a percentage of the highest value of the predictor in the country, and give insight into the spatial heterogeneity in risk in the country. In all three countries, the risk of transmission is more homogeneous across the regions than the risk of importation. The risk of importation, quantified by the endemic predictor, is heavily influenced by the number of inhabitants in a region, so the regions gathering most of the importation risks are regions containing the major cities in all three countries.
Short-term transmission scenarios
Finally, simulations showing the impact of various changes in transmission (either increases due to changes in behaviour or new variants, or decreases caused by targeted or global NPIs) on predicted numbers of cases and deaths are shown in the app via similar figures to those used to display the case and death forecasts (Figure 7). Options for exploring the impact of changes in transmission include combinations of:
- Increasing transmission by 0, 20, or 40% to represent different properties of an emerging variant, or changes in behaviour that lead to increased risks of spread.
- Dropping transmission by 0, 20, or 40% to represent the impact of increased NPIs.
- Targeting NPIs at the whole population or a specific age group.
- Removing importations from outside the selected country to represent stringent border closures.
- Changing the delay in the time it takes for NPIs to become effective (1 week or 2 weeks).
Throughout all three countries, removing importations has very little impact on the transmission dynamics in the case forecasts. Local transmission is sufficient to maintain transmission without new cases being added through the endemic component. On the other hand, the moderate and large changes in transmission through NPIs or changes in behaviour have a large impact on the forecasted number of cases and deaths. Delaying the implementation of NPIs by a week strongly decreases the impact of the control measures.
Discussion
We have developed a framework to forecast subnational COVID-19 case and death incidence up to 4 weeks ahead, and explore the potential impact of changes in transmission on reported incidence. The framework was applied to France, Czechia and Italy. The model outputs are based on routinely collected, publicly available surveillance data. We have also developed a RShiny app, where users can visualise the 4-week-ahead forecasts of both reported cases and deaths, and the predicted impact of changes in transmission. The code we developed to implement the model and the RShiny App is publicly available in two Github repositories: https://github.com/EU-ECDC/RShinyCovidApp and https://github.com/EU-ECDC/BackendCovidApp. The model fits and scenario simulations are updated automatically every week.
Case and death forecasts aggregated at a country level perform comparably with the European COVID-19 Forecast Hub ensemble model, a benchmark in epidemic forecasting as an ensemble of forecasts from many independent models. In addition, our framework provides NUTS-3-region-level (i.e. much more highly spatially resolved) and, if age-stratified data is available, age-stratified forecasts. Therefore, it can be used for more targeted policy making and planning at a local level. Given the significant spatiotemporal heterogeneity in COVID-19 incidence, and changes in age patterns, the detailed local-level visualisation provided by our model is particularly useful to evaluate targeted control measures. This study highlights the need for surveillance systems that gather accurate, timely, age-stratified data, and the value of making such data publicly available to improve understanding and prediction of local transmission and outbreak response planning.
As demonstrated by our calibration analysis, the subnational and age-stratified case and death forecasts are accurate up to 2 weeks ahead: the case PIT histograms are flat for Italy and for most age groups for Czechia and France, while the median country-level forecast is closer to the data than the European COVID-19 Forecast Hub ensemble forecast. However, the forecasts become less reliable beyond a 2-week horizon. This may be due to fundamental predictability limits, the difficulty of forecasting changes in behaviour and/or sudden changes in transmission, e.g. due to the emergence of a new highly transmissible variant, more than a couple of weeks into the future or other factors that cause the model to be misspecified, and reflects similar findings from other forecasting efforts (7,17,57). If COVID-19 transmission dynamics become similar to those of seasonal influenza, sudden unexpected changes in transmission may become rarer, which would improve the performance of the model. The age-stratified PIT histograms in both Czechia and France show that calibration is especially difficult in younger age groups, where changes in case-finding strategy had a big impact that could not be fully captured by the testing covariates (58,59).
The model was built to provide accurate predictions of case and death incidence at a regional level. Our results show that generating reliable case forecasts several weeks ahead is challenging, even when using age-stratified local data. However, the accuracy of death forecasts was consistent throughout all the forecast horizons that were tested. The quality of the case and death forecasts was robust to the large reporting changes observed throughout the fitting period, and improved at the latest calibration dates where incidence was low in all three countries. This may be because local outbreaks in groups at risk in low-incidence settings are easier to forecast, rather than outbreaks where all regions are equally vulnerable. Forecasts past two weeks generated by the model appear to be overconfident, and underestimate the uncertainty in the potential level of transmissions. The alternative transmission scenarios in the Shiny App therefore help to illustrate the full range of variability in short-term regional transmission dynamics that is possible. The local risk of outbreaks and importations is also represented in the Shiny App, using the local predictors of the Endemic-Epidemic model. The forecasts, scenarios, and risk map constitute a reliable, thorough representation of the local risk of transmission.
The model fits and simulations highlight several features of COVID-19 dynamics common to all three countries: the risk of background importation of cases (i.e. new cases that were not linked to recent local infections) is always very strongly associated with the number of inhabitants in the region. Hence, more populated and urban areas are at higher risk of further importations of SARS-CoV-2 infections.. On the other hand, the risk of transmission is relatively homogeneous across regions, with no regions where the number of secondary cases expected is less than half the region most at risk (Figure 6B). This could change in future COVID-19 dynamics, where the local level of immunity may be sufficient to completely avoid transmission in certain regions, as is currently observed for pathogens such as measles (34). The vast majority of cases stem from the epidemic component of the Endemic-Epidemic model, showing that the local transmissibility is sufficient to maintain transmission, even without input from background importations.
Therefore, the removal of importations (e.g., by border closures) consistently resulted in a minimal impact on the expected number of case numbers up to four weeks ahead. In contrast, altering the transmission risk within the regions (e.g. via interventions targeted at specific age groups or encompassing the entire population) can substantially change the forecasted case dynamics. Even with strategies targeting a particular age group, we observe indirect effects on incidence in all age groups. We did not observe a strong effect of changes in transmission risk on the four-week-ahead death forecasts. However, we emphasise that we considered a three-week delay between cases and deaths and, hence, the impact of changes in transmission on deaths would be visible for longer-term death forecasts (beyond four weeks ahead).
Including other countries
The framework is currently implemented for France, Czechia and Italy, but other countries can be straightforwardly incorporated, provided the data required for making the subnational daily case and weekly death forecasts (namely NUTS-3-level daily case counts, subnational daily death counts, NUTS-3-/NUTS-2-level daily/weekly vaccinations administered by dose, and NUTS-3-/NUTS-2-/national-level daily/weekly numbers tested) for those countries is available and up-to-date. Full instructions for incorporating a new country are available in the public GitHub repository for the model code: https://github.com/EU-ECDC/BackendCovidApp.
Age stratification
Our comparison of forecasted and observed numbers of cases and deaths over all age groups for France and Czechia indicates that the non-age-stratified model has similar predictive performance in terms of overall cases/deaths as the age-stratified model (Supplementary Figures 6-9), albeit with slightly worse prediction of deaths (Supplementary Figure 9). We believe the loss in performance observed in death forecasts to be due to changes in the age structure of reported cases from changes in case detection. Indeed, active case finding may lead to milder cases being reported, while interruption of such a strategy would mean that a larger proportion of the cases are severe. Such changes are likely to be reflected in the age structure of the cases and such information is lost in the non-age-stratified version of the model.
The overall good performance of the non-age-stratified model is encouraging for the application of the model to other countries, very few of which report age-stratified case and death data. While we did not observe a significant improvement by age-stratifying the model, we emphasise that age-stratified data and models are crucial for evaluating the impact of targeted (age-specific) non-pharmaceutical and pharmaceutical interventions, as well as for health-economic analyses (e.g., the computation of DALYs).
Limitations
Our forecasting framework does have some limitations. First, as for all forecasting studies, the forecasts are only as reliable as the input data, and we cannot guarantee the accuracy of the data imported from public data sources, which may have reporting errors or biases that we are not able to account for. The framework will only work as long as subnational (and age-stratified, for France and Czechia) case and death data continues to be reported online in the same location and format as it is currently, but several countries have ceased reporting subnational case and death data or changed the format or location of their reporting since we started developing the framework. Changes in data availability could potentially be addressed via changing spatial resolution of the model, but this would involve substantial modification of the model structure. We only forecast reported cases, which reflect a combination of underlying incidence of infections and reporting, rather than the “true” number of cases, but these still provide a useful indication of changes in transmission. The model for the death forecasts, which uses an estimate of the recent CFR (with uncertainty) to predict deaths, is relatively simple and assumes a constant relationship between changes in case numbers and changes in the CFR over time. However, the calibration analysis shows it provides a relatively straightforward and reliable means of translating case forecasts into death forecasts.
The contact matrices used to fit the model for France and Czechia were taken from pre-pandemic studies, and may not reflect the contact patterns between age groups in 2020 and 2021. Ideally, time-varying matrices would have been used to estimate age-stratified contact, but such matrices were not available or straightforward to implement in the Epidemic-Endemic framework (60). The addition of an age-specific intercept in the epidemic component allowed the model to modify the risk of transmission in each group where the number of contacts was not in line with the case counts. However, since these coefficients were not time-dependent, the model considers this age-stratified risk of new cases to be constant (if all other covariates do not vary).
We do not include information on NPIs or population mobility in the covariates in the model despite the improvements in predictions these may yield, since centralised databases of these covariates (such as Google Mobility data (61), the Oxford Stringency Index (62), and the ECDC-JRC Response Measures Database (2)) are no longer being updated, and binary covariates made the model less consistent and comparable between countries, while potentially not being specific enough (i.e. they incorporated the impact of other parameters associated with transmission). Including more NPI covariates also led to issues with identifying the effects of different interventions as many were implemented at the same or overlapping times.
Generalising beyond COVID-19
The flexible framework developed in this paper could be used or readily adapted to model incidence of both novel and seasonal pathogens of public health importance, such as influenza, in order to predict local health burden and inform outbreak response. The Endemic-Epidemic framework underlying our model has already been applied to a variety of other pathogens including measles, cholera, leishmaniasis and pertussis (34,63–65). However, the number of model parameters that can be estimated directly depends on the amount of data available.
Therefore, in its current specification (with a large number of parameters), the model may not be suitable for the early stages of an outbreak (except if the pathogen is seasonal, with available data on previous outbreaks). However, it can be used to estimate the impact of various covariates (from different data sources) on transmission risk.
Data availability
All source data used in the study were openly available. The code used to develop the analysis and links to the data sources are available on Github (back-end code: https://github.com/EU-ECDC/BackendCovidApp, code for the RShiny App: https://github.com/EU-ECDC/RShinyCovidApp).
Funding
The work received funding from the European Centre for Disease Prevention and Control (ECDC) via the specific contract “Analysis of COVID-19 outbreak risk at subnational level in the vaccine era: development of an RShiny app” (REOP/2021/SMS/13060) under the framework contract “Mathematical Modelling & Economic Evaluation” (OJ-2019-SRS-10962). AR was supported by the National Institute for Health Research (NIHR200908), LACC was supported by Wellcome Trust (grant: 206250/Z/17/Z) and NIHR (NIHR200908). SF was supported by a Wellcome Trust Senior Research Fellowship in Basic Biomedical Science (210758/Z/18/Z), AJK was supported by a Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (206250/Z/17/Z).
Author’s contribution
AR, LACC, AJK and SF led on conceptualisation and methodology. AR and LACC were in charge of software, validation, formal analysis, data curation, visualisation, writing - original draft preparation, and writing - review and editing. BP, RN, FS, and RG were in charge of project administration, funding acquisition, and writing - review and editing. AJK and SF were responsible for supervision and writing - review and editing.
Acknowledgments
We acknowledge Dr Naomi Waterlow (London School of Hygiene and Tropical Medicine) and Dr Sophie Meakin (Epicentre, Paris) for their feedback on early versions of the RShiny App, which helped us clarify the presentation of the results. We also acknowledge Dr Johannes Bracher (Karlsruhe Institute of Technology) for helpful discussions regarding the hhh4addon and hhh4contacts R packages.