Abstract
We present an approach to extend the Endemic-Epidemic (EE) modelling framework for the analysis of infectious disease data. In its spatiotemporal application, spatial dependencies have originally been captured by a power law applied to static neighbourhood matrices. We propose to adjust these weight matrices over time to reflect changes in spatial connectivity between geographical units. We illustrate this extension by modelling the spread of coronavirus disease 2019 (COVID-19) between Swiss and bordering Italian regions in the first wave of the COVID-19 pandemic. We adjust the spatial weights with data describing the daily changes in population mobility patterns, and indicators of border closures describing the state of travel restrictions since the beginning of the pandemic. We use these time-dependent weights to fit an EE model to the region-stratified time series of new COVID-19 cases. We then adjust the weight matrices to reflect two counterfactual scenarios of border closures and draw counterfactual predictions based on these, to retrospectively assess the usefulness of border closures. We observed that predictions based on a scenario where no closure of the Swiss-Italian border occurred increased the number of cumulative cases in Switzerland by a factor of 2.5 over the study period. Conversely, a closure of the Swiss-Italian border two weeks earlier than implemented would have resulted in only a 12% decrease in the number of cases and merely delayed the epidemic spread by a couple weeks. Despite limitations in the current study, we believe it provides useful insight into modelling the effect of epidemic countermeasures on the spatiotemporal spread of COVID-19.
1. Introduction
The many types of uncertainty surrounding an ongoing emerging infectious disease outbreak with future endemic potential, such as coronavirus disease 2019 (COVID-19), motivates the use of statistical modelling to investigate current and future outbreaks of the disease. Not understanding the uncertainty of the true scale of the disease, e.g. through unreported cases, leads to difficulties in assessing the impact of required interventions [1, 2]. Multiple epidemic data sources provide valuable information on different aspects of such an outbreak, but require appropriate statistical techniques to incorporate the associated uncertainties.
One such statistical tool is the endemic-epidemic (EE) modelling framework, created for the analysis of infectious disease surveillance data [3]. It is a multivariate time series model that additively decomposes incidence into an endemic and an epidemic component. The epidemic component captures incidence driven by previous case counts, or force of infection, and the endemic component captures exogenous contributions to incidence (such as seasonal, socio-economic or demographic factors) [1, 3].
The best demonstration of the framework’s flexibility must be the wide range of infectious diseases it has been applied to. It can capture the incidence of infectious diseases of different types, with varying natural histories, vaccine preventability, and vector dynamics. In addition, the framework has already been widely used in analysis of COVID-19 outbreaks. Bekker-Nielsen Dunbar and Held [1] give an overview of the model’s various applications. This modelling approach can handle the heterogeneities and interdependencies in high-dimensional count time series and is useful for analysing infectious disease surveillance data which is stratified by geographic regions.
The EE framework was originally designed to fit weekly surveillance data. However, in light of the COVID-19 pandemic and the availability of daily case counts, the model has been extended to include higher-order lags. This means that the autoregressive process can be driven by a larger time-span of past case counts. In particular, this allows for the inclusion of infectiousness from the whole serial interval of COVID-19. Various epidemiological publications have used this feature in their analysis of COVID-19 incidence in Italy [4, 5], Germany [6], England [7], and the African continent [8] among others.
A key feature of the EE framework that will be discussed in the present paper is the possibility to include spatial dependencies, in addition to temporality. This allows for the modelling of transmission between regions over time. The spatial dependencies are included as weights in the autoregressive process of the epidemic component. The strength of spatial connectivity between regions, and the resulting transmissibility of the disease between them, can be formalised by the power law formulation introduced by Meyer and Held [9]. It reflects the decay of transmissibility between regions as the distance between them increases. Other groups have considered this approach in their endemic-epidemic modelling approaches [7, 8]. It has for instance been used by Ssentongo et al. [8] to model the effect of various socio-economic and meteorological factors, as well as containment strategies on the within- and between-country transmission in the African continent.
In this paper, we show that this approach can be extended with time-dependent weight matrices driving the epidemic component of the EE framework. We showcase how the static weights derived from the power law formulation can be adjusted over time to reflect the changing spatial relationship between geographical units as the mobility of the population shifts as a result of travel bans and border closures. This approach is of particular interest in the context of COVID-19, as in the first half of 2020 mobility-related social distancing interventions such as travel restrictions, border closures, and visit bans were imparted in many parts of the world, including within the Schengen area, in which the regions we consider here are contained. This was in spite of the World Health Organisation advising against the use of such measures at the time [10]. Mobility restrictions are only considered to work if protective sequestration is feasible, i.e. autochthonous transmission has not yet occurred in the territory and so only imported cases are of concern. In our work we adjust the weight matrices according to both within-region mobility (measured via smartphones) and between-region mobility (using information from the International Office for Migration). This provides a level of sophistication beyond simple mobility restriction/no mobility restriction indicators, as seen by e.g. Woo et al. [11] who used a similar model to investigate the spatiotemporal spread of COVID-19.
The possibility to adjust spatial weights over time gives us the opportunity to simulate different counter-factual scenarios, representing different timelines of pandemic countermeasures targeting population mobility. This in turn allows us to retrospectively evaluate the effect of border closures as a pandemic countermeasure. We can thereby provide evidence to improve public health response and aid in decisions on optimal infectious disease control strategies, particularly when to initiate travel restrictions.
2. Methods
2.1. Methodological framework
We examined the role of a border closure between Italy and Switzerland on the spatiotemporal spread of COVID-19 through the following steps:
Fit an EE model to the region-stratified time series of number of confirmed new COVID-19 cases from 24th February 2020 until 4th August 2020;
Retrospectively predict from the fitted model, given data until 2nd March 2020, under scenarios:
Closure of the Swiss-Italian border on 17th March 2020, i.e. as really implemented (baseline scenario)
No closure of the Swiss-Italian border (counterfactual scenario A)
Closure of the Swiss-Italian border on 3rd March 2020, two weeks earlier than in the baseline scenario (counterfactual scenario B);
Determine the total and region-specific number of cases in Switzerland based on the baseline and counterfactual scenarios from step 2; and
Examine the difference in total and region-specific incidence in Switzerland, to determine the usefulness of a border closure.
To fit our model, we included data from 24th February 2020, as cases have been recorded in Switzerland and Italy after that date, until 4th August 2020, as we only wanted to capture the dynamics of the first wave of the COVID-19 epidemic in Switzerland. The cutoff date of 2nd March 2020 for the (counterfactual) predictions was chosen as the date after which the three scenarios might diverge.
Costantino et al. [12] consider similar scenarios when evaluating the effect of travel bans on COVID-19 spread in Australia.
2.2. Data
We were interested in analysing the COVID-19 spread in Switzerland and in the neighbouring regions in Italy. We considered data from the regions on the second level of the Nomenclature of Territorial Units for Statistics (NUTS-2). We included the seven Swiss regions, and the four bordering Italian regions, as listed in Table 1. A map of these eleven regions with their respective NUTS-2 code can be found in the supplementary materials.
For each of these regions, we considered daily case data, population data, daily or weekly testing data, daily temperature data, data on public holidays, and data on border closures and population movement. Information on data sources and manipulations is available in the supplementary materials.
2.3. Model
We considered the EE modelling framework. Given daily disease surveillance case counts Yrt indexed by NUTS-2 region (r) and day (t), the region-stratified EE model was given by [13, 14] where ψr is a region-specific overdispersion parameter, er are population fractions, ⌊.⌋ denotes normalised weights, ⌊ul⌋ is the serial interval distribution, wrr′,t − l is the transmission between regions r and r′ at day t − l (see Section 2.4), p is the maximum lag, and νrt and ϕrt are log-linear predictors. The ul are shifted Poisson weights as recommended by Bracher and Held [14] for daily disease counts: where κ is an unknown weighting parameter estimated from the data. We considered a maximum lag of p = 7
To determine optimal model specification, different model formulations were compared using model selection criteria. Here we used the Bayesian information criterion (BIC). In the final model, we included an indicator for weekends and country-specific public holidays, as medical facility clinic and laboratory operating hours may be reduced and population health-seeking activities may differ on weekends or public holidays compared with weekdays. We also included a country indicator, as we expected to see differences in endemicity between regions located in Switzerland and regions located in Italy. Additionally, we included indicators for each Swiss region bordering Italy (CH01, CH05, and CH07), as potential differences in underreporting between Italy and Switzerland would make it harder to capture the whole spatiotemporal spread by the power law alone. We also included country-specific (log-transformed) testing rates, as higher testing rates were expected to increase the number of cases. In the epidemic component, we further included a gravity model to reflect how more populated regions attract more imported cases [9, 15]. As a proxy for urbanisation, we included the log-population of the largest city in each region [16, 17]. We included log-temperature and yearly seasonality in our model, as we expected some form of seasonality in transmission related to seasonal changes in human behaviour. Finally, we included a simple linear time trend. All continuous covariates were centred.
The endemic predictor in our model was given by: and the epidemic predictor was given by: where α are overall intercepts, Tj(r)t is the country-specific testing rate at time t, Crt is the region-specific temperature at time t, Dr is the population of the largest city in region r, Pr is the population of r, and j(r) describes the country in which region r is located and takes values “Switzerland” or “Italy”. The predictor describes the gravity model [15].
2.4. Time-dependent matrix of neighbourhood order
To represent the spatial relationship between regions, driving the transmission between regions, as it changes over time a time-dependent matrix Wt of neighbourhood order was included in our model. We started from a static matrix of neighbourhood order O (see supplementary materials) and adjusted it to reflect the movement patterns and border closures between regions over time. We proceeded as follows:
To reflect the decreasing propensity of COVID-19 to spread between regions, as the distance between these regions increases, we implemented a power law model, as described in Meyer and Held [9]. Concretely, each entry of the matrix equals the inverse of the neighbourhood order, scaled by a decay parameter d. That is, the matrix entries are given by: We set the parameter d = 1.6 [9]. This parameter is scale-free; it is independent of the units of distance [18]. It can thus be applied to neighbourhood order as a metric of distance. All further manipulations were not symmetric anymore, as movement between regions is not expected to be either. From here on, matrix rows represent “travel from” and columns represent “travel to”.
The Facebook2 Mobility data [19] allowed us to perform a daily adjustment of the matrix to reflect the change in population mobility. The time-dependent entry is given by: where fr,t is the value of the Facebook movement change variable for region r at time t (see supplementary materials and [20]). The Facebook metric reports change in the quantity of movement from inhabitants of a region r and not change in movement direction. Therefore we could not directly infer the impact this change in movement has for the travel between r and each r′ individually. We assumed that it is uniform across all r′ and therefore adjusted whole matrix rows with the Facebook metric.
International Organisation of Migration (IOM) data allowed us to further adjust the time-dependent matrix to reflect border closures. The time-dependent entry is given by: where brr′,t is the state of restrictions to travel from region r to region r′, at time t. The IOM metric is qualitative and reports different border states as “No restrictions”, “Partial restrictions”, “Total restrictions”, and “No official restrictions reported”. We quantified these different border states as follows:
The time-dependent matrices are visualised in Figure 1 for the 1st day of each month, after each step of adjustment.
It should be noted that the ordering of regions in the matrices is arbitrary.
2.5. Counterfactual prediction of case counts
To examine the effect of border closures in place, we adjusted the neighbourhood matrices according to the three scenarios described under Section 2.1. For counterfactual scenario A, we implemented no border closures, which means that we only adjusted the matrix with the time-dependent Facebook metric fr,t. For counterfactual scenario B, we adjusted the matrix with fr,t and, additionally to the border closures implemented in the baseline scenario, we implemented travel restrictions from Italy to Switzerland (brr′,t = 0.1) between 3rd March 2020 and 16th March 2020.
We obtained parameter estimates by fitting the EE model described in Section 2.3 to the region-stratified time series of confirmed COVID-19 cases, based on the time-varying matrix of neighbourhood order from the baseline scenario. Then, given data from 24th February 2020 to 2nd March 2020, we predicted from the model, based on the weight matrices adjusted for counterfactual scenarios A and B. We assessed both predictions to determine the usefulness of border closures, reflected in both the total number of cases and the cases by region.
To obtain predictions based on our EE model, we used the predictive moments approach [13, 21]. To compare the predicted scenarios, we considered the relative increase between each counterfactual scenario (A and B) and the baseline scenario, and thus calculated where C is the cumulative (total or region-specific) number of cases under the counterfactual scenario at hand, and c is the cumulative (total or region-specific) number of cases under the baseline scenario on 4th August 2020 (end of the study period).
2.6. Implementation
This work used the EE modelling framework and associated hhh4 package ecosystem, in particular the R (version 4.0.2) packages surveillance (version 1.19.1), and hhh4addon (version 0.0.0.0.9013).
3. Results
We fitted an EE model to the region-stratified time series of confirmed COVID-19 cases between 24th February 2020 and 4th August 2020 and the according covariates as described in Section 2.
It includes a weekend and (national) public holiday indicator, a country indicator, an indicator for the Swiss border regions, testing rates, temperature, population, population of the largest city, a time trend and seasonality. The coefficients of the model and the corresponding standard errors are shown in Table 2, and the fitted values are depicted in Figure 2. As is evident from Figure 2, the model fit depends on time. In particular in the first months, we see a slight underestimation of cases in most regions. It is interesting to note the different proportions of endemic to epidemic incidence in the different regions. We see the highest share of endemic incidence in Italy and Ticino (CH07). Given that the first COVID-19 cases in Switzerland were recorded in Ticino, and imported from Italy [22], this makes sense. The incidence in the French and German speaking regions of Switzerland can primarily be attributed to epidemic spread.
The estimated lag distribution can be found in Table 3, and the corresponding plot in the supplementary materials. Our findings are consistent with the estimates obtained by Ssentongo et al. [8].
Given this model, data from the 8 days between 24th February 2020 and 2nd March 2020, and the counterfactual weight matrices described in Section 2.5, we predicted total and region-specific cumulative cases until 4th August 2020 under each scenario. The results are summarised in Table 4. To illustrate prediction accuracy, we plot a comparison of the daily incidence predicted under the baseline scenario, and observed incidence over the study period in the supplementary materials. Plots of the daily difference and ratio of new and cumulative cases between the counterfactual scenarios and the baseline scenario are available in the supplementary materials for the complete study period.
Under counterfactual scenario A, i.e. no border closure, we see a 1.6-fold increase in cases compared to baseline, with most excess cases happening in the first months. The peak number of excess cases compared to baseline would have been attained in early April with around 2000 additional daily cases (see supplementary materials for details). Overall, this scenario would have caused 100067 extra cases between 3rd March 2020 and 4th August 2020, with 54708 cases in Switzerland and 45359 in the bordering Italian regions.
Under counterfactual scenario B, i.e. earlier border closure, we see only a very small relative decrease overall. This would primarily have affected the early months, with a peak reduction of daily cases (c.a. − 450) around mid-March. From April onwards, no difference in daily cases would have been observed between scenario B and baseline (see supplementary materials for details). As we only implemented an internal border closure in Switzerland, i.e. only restricting entry in Switzerland from Italy, this decrease in cases naturally only affects the spread in Switzerland. Overall, this scenario would have prevented 4299 cases between 3rd March 2020 and 4th August 2020, of which the vast majority (4122) in Switzerland.
Figure 3 shows a timeline of predicted cases under each scenario for each of the regions. It is apparent that scenario A (blue) would have cause a marked increase in daily cases throughout the study period in all Swiss regions. The marginal effect of an earlier closure of borders (scenario B, yellow) can be clearly seen from the plot, as it remains barely distinguishable from the baseline scenario (pink) throughout the first wave.
The results seem to support the baseline border closure scenario. Without the measures implemented, we would expect to have seen a tremendous rise in cases, whereas closing the border closures earlier would have merely delayed the spread of infection in Switzerland, and would overall have prevented only around 4300 cases.
4. Discussion
We highlighted the use of time-dependent weight matrices reflecting changes in spatial connectivity, to extend the traditional spatiotemporal EE framework. It is a particularly useful approach in the context of COVID-19, where pandemic countermeasures have targeted the spatial relations between geographical units with the implementation of border closures and travel restrictions, and where many data sources on patterns of population mobility have become available.
Some limitations in the present paper need to be addressed to draw unbiased conclusions from our results. The foundation of the spatial relationship between geographical units in this study is the power law application, upon which all further adjustments rely. It is therefore crucial that it reflects the spatial connectivity correctly. In this context, it would have been important that the corresponding decay parameter d be estimated from the data [13]. However, this parameter estimation is not (yet) compatible with the R-implementation of the higher-order lags ul, which are also crucial in the study of COVID-19. In this study, we chose to use a fixed parameter motivated by existing studies [9, 18], but we recognise that this could have weakened our time-dependent weights. Further work could examine how to facilitate harmonising the two approaches. Similarly, the implementation of the IOM metric in this study might need appropriate modification. As mentioned in the methods, the IOM metric is qualitative, and the way we chose to quantify the indicators could be disputed. Indeed, we have chosen to weigh “total restrictions” as 0.1, meaning that the strength of the spatial relationship between regions separated by a border would be weakened by a factor of 10 upon closure of the latter. Whether this accurately reflects reality remains to be determined.
It is also crucial that we address model fit, which naturally influences the subsequent predictions. In initial fittings of the EE model, we observed a particular underestimation of cases in the first wave for the regions CH01, CH05 and CH07, the three regions bordering Italy. Ideally, region-specific intercepts would have been implemented to address this, but this led to model divergence and thus unreliable parameter estimations. In this paper, we chose to address this by fitting a model with region-specific overdispersion, which significantly improved overall fit, and a specific indicator for the border regions, which improved fit in those regions. However, in the present formulation of the model, we still see an underestimation of cases in most regions in the first months. This leads us to conclude that there is more spatiotemporal spread happening than what can be captured by the power law alone. We suspect that this is due to differing levels of underreporting, particularly in the first wave, between Italy and Switzerland but also between individual regions. In light of these limitations, we recommend that further studies take underreporting and underascertainment of case counts into consideration. Simple multiplication factors can be used [23] as a first-order approach.
Moreover, a full picture of the border situation of Switzerland would be beneficial to gain precision, as the spatiotemporal spread of COVID-19 in Switzerland arguably can not only be explained by importation of cases from Italy and endemic factors. We intend to extend our work to additionally take the neighbouring regions in France, Germany, and Austria into consideration. Finally, the EE framework can be adjusted to incorporate pharmaceutical countermeasures such as vaccines [24]; an obvious direction to pursue were a similar analysis to be conducted for 2021.
All in all, we do believe that the present paper, extending the EE modelling approach with time-dependent spatial weights, provides useful insight into the analysis of the spatiotemporal spread of COVID-19 and can be an important stepping stone for further research informing decision-making around COVID-19 counter-measures.
5. Conclusion
We used time-dependent matrices of neighbourhood order between regions, adjusted through power laws, mobility data, and border status indicators, allowing us to explore spatial interventions’ impact on early cases of COVID-19 in Switzerland and Italy. This approach has allowed us to draw counterfactual predictions based on weight matrices adjusted for counterfactual scenarios of border closures. We have observed that predictions based on counterfactual scenario A, where no closure of the Swiss-Italian border occurred over the study period, would have almost doubled the cumulative number of cases over the study period. Additionally, based on counterfactual scenario B, where closure of the Swiss-Italian border occurred two weeks earlier than implemented in reality, predictions showed only marginally reduced numbers of cases and an epidemic spread merely delayed by a couple weeks, which is in line with the WHO’s statement [10].
We are unable to determine what an acceptable increase in cases would be as we are not policy makers, and as a result have not considered cut-off criteria or hypothesis tests. However, the total number of cases in the first wave of the pandemic almost doubled under the scenario where no border closures occurred.
Data Availability
The code and data used for this article are available on the GitLab repository linked below. Further description of the data availability is available in the supplementary materials.
Credit taxonomy
Mathilde Grimée: methodology, software, formal analysis, visualisation, and writing
Maria Becker-Nielsen Dunbar: conceptualisation, methodology, software, funding acquisition, writing, project administration, and supervision
Felix Hofmann: methodology, software, data curation, and visualisation
Leonhard Held: conceptualisation, methodology, funding acquisition, writing, project administration, and supervision
Funding
This work was supported by the Swiss National Science Foundation [grant number 196247, https://data.snf.ch/covid-19/snsf/196247].
Acknowledgements
This work would not be possible without the financial support of the Swiss National Science Foundation (https://data.snf.ch/covid-19/snsf/196247). They did not play any role in the formulation of this manuscript and the views expressed in this manuscript are those of the authors. The additional members of the SUSPend modelling consortium (in anti-alphabetical order by surname) are: Jon Wakefield, Sebastian Meyer, Niel Hens, Deborah Chiavi, and Johannes Bracher (https://suspend.pages.switch.ch/project). Some of the authors are residents of the regions considered in this analysis. The authors declare no further conflicts of interest.
Footnotes
Mistake in the results reporting in the abstract is corrected.
↵2 Our use of data provided by Facebook is not to be seen as an endorsement of Facebook as a company.