Abstract
Early warning indicators based on critical slowing down have been suggested as a model-independent and low-cost tool to anticipate the (re)emergence of infectious diseases. We studied whether such indicators could reliably have anticipated the second COVID-19 wave in European countries. Contrary to theoretical predictions, we found that characteristic early warning indicators generally decreased rather than increased prior to the second wave. A model explains this unexpected finding as a result of transient dynamics and the multiple time scales of relaxation during a non-stationary epidemic. Particularly, if an epidemic that seems initially contained after a first wave does not fully settle to its new quasi-equilibrium prior to changing circumstances or conditions that force a second wave, then indicators will show a decreasing rather than an increasing trend as a result of the persistent transient trajectory of the first wave. Our simulations show that this lack of time scale separation was to be expected during the second European epidemic wave of COVID-19. Overall, our results emphasize that the theory of critical slowing down applies only when the external forcing of the system across a critical point is slow relative to the internal system dynamics.
1 Introduction
Forecasting the (re)emergence of infectious diseases is of great importance to public health (George et al., 2019; Heesterbeek et al., 2015; Morens & Fauci, 2013; Morens et al., 2004; Reich et al., 2019; Viboud et al., 2018). In recent years, early warning indicators based on the phenomenon of critical slowing down have been suggested as a way to anticipate transitions in a wide range of dynamical systems (for overviews, see e.g., Dablander et al., 2020; Drake et al., 2019; Drake et al., 2020; Lenton, 2011; Scheffer et al., 2009; Scheffer et al., 2015). Critical slowing down describes the phenomenon that many systems, as they approach their critical point, return more slowly to their equilibrium after small external perturbations, resulting in an increase in statistics such as the local autocorrelation coefficient and variance (Drake & Griffen, 2010; Wissel, 1984). In standard models of infectious disease transmission, major outbreaks are possible when the effective reproductive number, Rt, is greater than one. The threshold Rt = 1 corresponds to a (dynamic) transcritical bifurcation, which is a type of bifurcation that is preceded by critical slowing down (Kéfi et al., 2014; Kuehn, 2011). Early warning indicators based on critical slowing down have been studied extensively and led to a promising research line that aims to utilize them as a tool to forecast the (re)emergence as well as the elimination of infectious diseases (e.g., Brett et al., 2020; Brett et al., 2017; Brett et al., 2018; Brett & Rohani, 2020; Dessavre et al., 2019; Dibble et al., 2016; Drake et al., 2019; Harris et al., 2020; Miller et al., 2017; O’Dea et al., 2018; O’Regan & Burton, 2018; O’Regan & Drake, 2013; O’Regan et al., 2020; Southall et al., 2020).
In light of this prior research, it seems natural to ask whether early warning indicators based on critical slowing down could have allowed us to anticipate the second COVID-19 wave (e.g., O’Brien & Clements, 2021; Proverbio et al., 2021) and if not, how this can be understood. Here, we question the applicability of early warning indicators in the COVID-19 context, because the COVID-19 situation violates a key assumption of the theory of critical slowing down: a separation of time scales such that the dynamics of the epidemic settle down to a quasi-equilibrium from which there is a slow drift toward the critical point. To our knowledge, there is presently no theory that would indicate whether early warning signals, under such commensurate time scales, can be expected to be reliable. In this paper, we report on a combination of empirical analysis and simulation studies to investigate this issue. Focusing on Europe, we find that a suite of early warning indicators did not reliably rise prior to the second wave in any country as the classical theory of critical slowing down would predict. Using a simulation study that mimics the COVID-19 situation — a first outbreak closely followed by a second one — we show that this contradictory result can be fully explained by the fact that, in the case of COVID-19, in almost all countries Rt already began to creep up again before the number of case reports stabilized at a low value after the first wave. These results indicate that caution is warranted in applying early warning indicators to highly non-stationary settings, such as multi-wave epidemics.
2 Early warning signals for COVID-19
In this section, we quantify the extent to which early warning indicators increased prior to the second wave in a number of European countries.1 We outline our methodology aided by Figure 1 in Section 2.1, and report our results in Section 2.2.
2.1 Methods
Estimation of Rt
To identify the time at which the COVID-19 epidemic became supercritical for the second time in each country, we followed Gostic et al. (2020) to estimate the instantaneous Rt using the method of Abbott et al. (2020), which improves upon Cori et al. (2013). The method simultaneously estimates the incidence of infections and Rt using Bayesian latent variable modeling. The method proceeds in two steps. First, the incidence at each time step is estimated by convolving the previous number of infections with a probability distribution for the generation interval. This incidence is then convolved over an uncertain incubation period and reporting delay distribution to yield the reported cases (for details, see Abbott et al., 2020). We applied this method to a broad range of European countries using data from March to October 2020.
Selecting the time period between waves
Next, we selected a time period in which to search for evidence of critical slowing down. Early warning indicators are sensitive to changes in the effective reproductive number, and should rise prior to the critical point Rt = 1 (Drake et al., 2019; O’Regan & Drake, 2013). Using our country-specific estimate for Rt, we defined the start and end date of the time-series on which we computed the early warning indicators as follows. We chose as start date the date at which Rt is at its lowest point before reaching Rt = 1 prior to the second wave. Similarly, we chose as end date the date at which Rt is at its maximum (before going down again) after it crosses Rt > 1. Panel (a) in Figure 1 illustrates this selection procedure on a simulated example, with the black line showing Rt and the vertical blue lines indicating its respective minimum and maximum after the first wave receded. We chose this criterion for two reasons. First, after Rt drops below 1, it continues to decrease in all European countries, and we would thus expect early warning indicators to fall, rather than rise. Panel (a) in Figure 1 shows a characteristic bifurcation delay (see also Section 2.3) that describes that cases lag behind the equilibrium value consistent with Rt < 1. Choosing for the starting date the time of the minimum value of Rt before Rt rises again allows the system to come closer to its new equilibrium value. Similarly, choosing to end the interval with the maximum of Rt after it crosses the threshold should predispose the analysis toward detecting early warning indicators because it yields the greatest possible length of the time-series and because of the bifurcation delay (see Brett et al., 2017; Dibble et al., 2016, and Section 2.3). Overall, our selection criterion is biased in favor of detecting critical slowing down.
Figure 2 shows the reported (gray) and estimated true number of cases (black) across European countries, with vertical blue lines indicating the segment of the time-series for which we calculated early warning indicators. Figures 6-10 in Appendix A provide a more detailed picture, showing European countries together with their estimated effective reproduction numbers.
Detrending and estimation of early warning indicators
As illustrated in Panel (b) and (c) in Figure 1, we detrended the time segment of interest and then estimated early warning indicators using backwards rolling windows with a uniform kernel (i.e., equally weighted past observations) and window sizes δ1 and δ2, respectively. A backward rolling window only uses data from the past to estimate the current value of a particular statistic. For example, to estimate the mean at time point t, we calculate: where yj is the number of reported cases at a particular time point j (see black line in Panel (b) in Figure 1, for an example). Other early warning indicators we studied were variance, coefficient of variation, index of dispersion, autocovariance, autocorrelation, decay time, skewness, kurtosis, and first differenced variance (for mathematical definitions, see Brett et al., 2018, Table 3). All of these indicators require an estimate of the mean, and so we first estimated the mean and then estimated the particular early warning indicator using a rolling window size of δ2. For example, the variance at time point t, which is shown in Panel (c) in Figure 1, is calculated as: We conducted sensitivity analyses with rolling windows of size δ1 ∈ [2, 4, …, 18, 20] for detrending and rolling windows of size δ2 ∈ [5, 10, …, 45, 50] for indicator estimation using the R package spaero (O’Dea, 2016). A window size of 10, for example, means that the previous ten data points are being used to compute the statistic at the current time point. To create a sampling distribution under the null hypothesis of no increase in the early warning indicators that respects the temporal ordering of the data, we fitted an ARMA(p, q) model to the country-specific data. We selected the best fitting model using AIC and subsequently generated 500 surrogate time-series from it, computed the early warning indicators as outlined above, and estimated their rank correlation with time (Kendall’s τ). This resulted in the sampling distribution under the null assumption of stationarity, which allowed us to test the actually observed Kendall’s τ against a significance level α (Dakos et al., 2012).
2.2 Results
Figure 3 reports results for European countries for δ1 = 4 and δ2 = 25. It shows the value of Kendall’s τ across all early warning indicators, coloring in red the countries for which τ was either significantly smaller or significantly larger than values generated from the best-fitting country-specific ARMA(p, q) at α = 0.05. Notably, many countries displayed a significant decrease in the mean, with some showing a decreases in the variance, autocovariance, autocorrelation, and decay time. Several countries exhibited a significant increase in the coefficient of variation, which is given by the standard deviation divided by the mean, and in the dispersion index, which is given by the variance divided by the mean. Hence, early warning indicators that were found to display notable signal across a number of countries are the mean, variance, or combinations thereof. Figures 11-20 in Appendix B show sensitivity analyses for the ten early warning indicators across different rolling window sizes for detrending and estimation, indicating that the pattern shown in Figure 3 is robust to different choices of these hyperparameters.
Table 1 shows the number of significantly rising or falling early warning indicators, the length of the selected time-series, the start of the second wave, and the respective posterior mean for Rt. From theory we expect all early warning indicators to rise except the coefficient of variation (Brett et al., 2018), yet we find that half of the indicators show a tendency to fall instead.
We conducted simulations to investigate possible reasons that could underlie the poor performance of early warning indicators to anticipate the second COVID-19 wave. In what follows, we first illustrate how early warning indicators perform under ideal conditions, and then relax the separation of time scales to quantify the erosion in performance.
2.3 Model
We illustrate early warning indicators in the context of a first outbreak that is closely followed by a second one by simulating from a stochastic SEIR model calibrated to COVID-19 using the pomp R package (King et al., 2016). In particular, let S(t), E(t), I(t), R(t) denote the number of individuals in the susceptible, exposed, infectious, and recovered compartment at time point t, respectively, and let ΔNS→E, ΔNE→I, and ΔNI→R denote the number of individuals that have transitioned from one compartment to another during the time interval [t, t + Δt]. The model is updated according to where we assume an average incubation and infectious period of 1/σ = 5.2 days (Li et al., 2020) and 1/γ = 10 days (CDC, 2021). The force of infection is given by where η(t) is the sparking rate, which we assume to be 0 until day 50, from which point onward cases are imported with a rate of η = 1/50,000. Our goal here is not to produce a simulation model that accurately tracks the COVID-19 outbreak, but instead to investigate critical slowing down in a standardized system that we understand well. To do so, we wish to force Rt to create a multi-wave epidemic. We achieve this by changing β(t) accordingly, compensating for the depletion of the susceptible population by multiplying with S0/S(t) at time point t. Lastly, we assume that each infected person is reported without delay.
To illustrate the phenomenon of critical slowing down under ideal conditions, we start with 10, 000 infected persons out of N = 1, 000, 000 and R0 = 3. This results in a first outbreak, which is rapidly brought down through control measures that we model as bringing Rt down to 0.50 within 25 days. We then force Rt to remain at this low value for 200 days, and then allow it to rise linearly to Rt = 1, forcing a second wave. This simulation mimics the situation at the start of the pandemic where the first outbreak caught countries by surprise and lockdown was the key mitigation measure that substantially reduced the effective reproductive number. In the illustration, mitigation measures are maintained for a long period of time. However, in reality mitigation measures were slowly relaxed towards the summer, and with no vaccination in place together with imported infections and increased mixing, the system could not reach a disease-free equilibrium and the reproductive number increased again. This led to a second outbreak in the fall of 2020 in virtually all European countries. Our simple model adequately describes this general pattern as shown in Figure 4a. In particular, the left column in Figure 4a shows the two waves of transmission and their associated early warning indicators, while the right panels in Figure 4a show a similar situation except that no second outbreak occurs. In contrast to the situation with a second wave, variance and autocorrelation do not rise in this case. This illustration demonstrates that under these conditions a second epidemic wave can be anticipated using nonparametric early warning indicators.
It is known that epidemiological systems can experience a bifurcation delay, which describes the transient trajectory of an epidemic as its attracting equilibrium changes. One consequence of bifurcation delays is that the time for a large outbreak to settle to its equilibrium even after crossing Rt > 1 can be considerable. Dibble et al. (2016) studied bifurcation delays for disease emergence, and Figure 4 indeed shows that it takes a while for the system to show a significant rise in cases even after Rt > 1 (see Hungary in Figure 2, for a possible example with regards to COVID-19). As can be seen in Figure 4, a bifurcation delay also occurs for disease elimination. In particular, for Rt < 1 the disease is not endemic and the stable equilibrium consists of a number of new cases that depends on the rate of at which cases are imported. There is, however, a substantial delay between the point at which Rt < 1 for the first time and a low number of newly reported cases. This means that early warning indicators computed immediately from the period after Rt first declines to less than 1, are tracking a transient far from equilibrium and thus do not provide information about the return rate to equilibrium from small perturbations, i.e. the phenomenon of critical slowing down.
To understand the extent to which this bifurcation delay may influence the performance of early warning indicators, we decreased the time interval for which Rt = 0.50 from 200 days (Figure 4a) to 50 days (Figure 4b). We find that both the variance and autocorrelation first increase and then decrease in the case of both a second outbreak (left panels) and in the case of no second outbreak (right panels). The variance then rises slightly prior to the second wave, a pattern that does not occur for the autocorrelation, nor for the indicators in case of no second wave. This pattern hints at the fact that the bifurcation delay at elimination will interfere with the detection of critical slowing down if the system is not allowed to settle to its new equilibrium because the magnitude of the transient is commensurate with (or larger than) the magnitude of the fluctuations.
2.4 Simulation setup
We conducted additional simulations to systematically assess the extent to which these patterns impact the performance of early warning indicators. The forcing of Rt in the previous illustrations depends on five parameters: the value of R0; the value of the lowest point Rt reaches; the time it takes Rt to reach it; the time for which Rt stays at the lowest point; and the time it takes Rt to reach criticality again. We again assume that R0 = 3 and that it takes the system 25 days to reach its lowest point of Rt = 0.50, but we vary the number of days for which Rt is held constant to be t1 ∈ [25, 50, 100, 200] and the time it takes the system to reach Rt = 1 to be t2 ∈[25, 30,…, 95, 200]. For comparison, we also simulate from a system that stays at Rt = 0.50 and does not exhibit a second outbreak. We match the length of the time-series on which we compute early warning indicators (t2) in case of no outbreak to when an outbreak does occur. As before, backwards rolling windows with a uniform kernel were used for detrending and nonparametric estimation of the mean, variance, coefficient of variation, index of dispersion, skewness, kurtosis, autocovariance, autocorrelation, decay time, and first differences in variance. We used rolling windows a tenth the size of the duration for which Rt stays constant; that is, for t1 ∈ [25, 50, 100, 200] we used rolling windows of sizes 3, 5, 10, and 20, respectively. For indicator estimation, we used rolling window sizes of 50, using the R package spaero (O’Dea, 2016). We simulated 500 trajectories for each setting and calculated the area under the curve (AUC), a measure of classification performance, for all indicators. For each indicator, we calculated its rank correlation with time (Kendall’s τ), which indicates whether the early warning indicators rise or fall prior to reaching the critical point. The AUC can then be estimated as the probability that τtest is larger than τnull (Brett et al., 2018; Flach, 2016). A value of | AUC −½ | = 0 indicates chance performance, with AUC < ½ and AUC > ½ indicating a fall or rise in indicators prior to criticality, respectively. Theory predicts a pre-critical increase of all early warning indicators except the coefficient of variation (Brett et al., 2017; Brett et al., 2018). In addition to AUC, which requires comparing the indicator trend in the case of a second outbreak to the case of no second outbreak, we also use the method proposed by Dakos et al. (2012) and outlined in Section 2.1 to ascertain whether an indicator rises significantly. This more closely mimics the real-word situation where we do not have access to the counterfactual situation in which no outbreak occurred. We report the true positive rate (TPR), that is, the proportion of times we find p < α for each indicator and condition, using α = 0.05.
2.5 Simulation results
Figure 5a shows that the performance of early warning indicators improves with the time it takes the epidemic to reach a second critical wave. For the case for which the system stays for 200 days at Rt = 0.50 (top panel of Figure 5a), we find that all indicators except the kurtosis and the index of dispersion performed well, with the mean and the variance performing best. The coefficient of variation, given by the ratio of the standard deviation to the mean, decreases prior to criticality, indicating that the mean rises more quickly than the standard deviation. Most early warning indicators perform worse when Rt = 0.50 for 100 days, yet the mean and variance still perform well overall. Interestingly, the slight decrease in performance in the variance implies a stronger decrease of the coefficient of variation and the index of dispersion especially when the system is forced more quickly (i.e., t2 < 125).
For a period during which Rt = 0.50 of 50 days, the performance of the variance decreases, leading to an increasingly strong decrease in the coefficient of variation and the index of dispersion. When forcing is rapid (i.e., t2 < 100), the autocovariance, autocorrelation, and decay time also begin to show a downward trend (AUC < ½) prior to reaching the critical point. These trends are exacerbated when the system stays at Rt = 0.50 for only 25 days. One may think that the simulation shows the reverse pattern than the empirical analysis, summarized in Figure 3, because the mean and variance show a positive AUC (hence they increase compared to the null simulation) while the mean and variance show a decrease in the empirical analysis. There is no contradiction, however, because the mean and variance do in fact decrease in case of a second wave, it is just that they decrease less compared to when there is no second wave, as can be seen in Figure 4b.
In the data, the median time for countries to go from their minimum Rt value after the first crossing to their maximum Rt value after the crossing was 42 days. Figures 6-10 further show that Rt basically never stays at a low constant value for a sustained period of time, but is forced immediately towards the critical point. Under the most realistic scenario in our simulation study (t1 = 25 and t2 < 50), many indicators perform poorly, yet we still find excellent performance of a rising mean and excellent performance of a falling coefficient of variation and index of dispersion. This does not imply, however, that they will lead to reliable warnings in practice. While we can quantify discriminatory power using AUC in simulations, in practice early warning indicators have to be calibrated. Figure 5b shows that testing for an indicator increase at α = 0.05 based on a stationary null distribution created by using the best-fitting ARMA(p, q) model to the time-series under consideration is poorly calibrated, leading to an extremely poor true positive rate which mirrors the empirical results in Section 2.2. This is because the distribution of Kendall’s τ under the stationary model is centered around zero, while the actually observed Kendall’s τ is negative. As a result, hypothesis tests for an increase in indicator values are expected to suffer from extremely low statistical power in realistic situations. This problem may be exacerbated by a potentially poor fit of the model used to create the null distribution.
3 Discussion
Early warning signals based on the phenomenon of critical slowing have been suggested as a way to anticipate transitions in a wide range of dynamical systems, including the (re)emergence of infectious diseases. We analyzed whether a suite of indicators could have given early warning of the second COVID-19 wave in European countries. We found that the majority of indicators did not rise reliably, instead showing a pronounced decrease, a finding inconsistent with previous applications of the theory of critical slowing down. To understand this pattern, we conducted a simulation study in which we varied the time that is available for the system to settle at its new equilibrium after a first outbreak, as well as the speed with which a second wave is forced. We analyzed the performance of early warning indicators using the area under the curve to quantify classification performance and the true positive rate, using the same methodology with which we analyzed the empirical data. We found that classification performance suffered when the system had too little time to settle to its new (quasi-)equilibrium and the second wave is forced quickly (due to changing conditions in the population, such as reduced adherence to control measures), as we saw in the empirical data. Yet we also found that some indicators, such as the mean, continued to perform well (in terms of AUC) in contrast to what we observed in the empirical analysis. Using the same methodology as in the empirical analysis, however, we found a true positive rate of close to zero when testing for an increase in indicators, except for the coefficient of variation and index of dispersion, which is in line with our empirical results.
Our analyses suggest the following conclusions. First, violating a key assumption of early warning indicators based on critical slowing down — namely that the driver (Rt) changes slowly compared to the time it takes the system to return to its equilibrium after small external perturbations — dramatically reduces their performance. While this may be expected from theory, our analyses underscore this point and show that early warning indicators cannot be used to anticipate future outbreaks that are quickly forced after an initial wave. Second, as a consequence of the fact that the system is not allowed enough time to settle at its new stable equilibrium after an initial outbreak, the first part of the data used for early warning indicator estimation constitutes a transient. Hence there is a bifurcation delay not only after Rt crosses one from below, as previously observed and studied (e.g., Dibble et al., 2016), but also after Rt crosses one from above. If this transient is incorporated in the indicator estimation, then indicators will show a pronounced decrease rather than an increase. This does not imply, however, that we can use a decrease in indicators as a signal for a future outbreak that quickly follows an initial one, because such a decrease also occurs in case of no outbreak. The poor performance of early warning indicators in our empirical analysis is likely due to a combination of this transient phenomenon and the quick forcing of Rt. The only two indicators that showed a relatively consistent increase across countries are the coefficient of variation and the index of dispersion. This is likely due to the fact that the mean decreased more quickly than the standard deviation and the variance during the transient phase, leading to an increase in the indicators. In other words, the coefficient of variation and the index of dispersion likely increased for reasons other than critical slowing down. Third, our simulation study demonstrated that while early warning indicators can yield high discrimination (i.e., a high AUC), in practice they need to be calibrated. We found that the widely used methodology proposed by Dakos et al. (2012) with decision criterion p < 0.05 is poorly calibrated. This leads to poor performance consistent with our empirical results. The key issue is that the sampling distribution created under this methodology is not centered around a negative Kendall’s τ (implying a decreasing trend) but a Kendall’s τ of around zero (implying no trend). Thus the statistical power to reject the null hypothesis of no increase when actually observing a strong decrease in indicators is too low for these tests to be of practical value in realistic situations. Previous research also suggested that indicators can fail in the COVID-19 context (Proverbio et al., 2021).
Some limitations of this study should be kept in mind. Our empirical analysis takes the reported number of cases across European countries at face value. While we accounted for reporting delays, we disregarded any issues related to changes in reporting or testing that may affect the estimation of Rt. While the flexible method proposed by Abbott et al. (2020) renders any bias induced by a change of testing transient, any bias may have indeed changed the true value at which Rt crosses one. A more extensive analysis would look at all countries that experienced a second wave. However, we chose to limit ourselves to European countries because of the comparatively good reporting standards and the fact that there is sufficiently large heterogeneity in epidemic trajectories across European countries for the purposes of this study. On a similar note, because the time period between the end of the first and the beginning of the second wave was shorter than the time period it takes the system to settle at its new stable equilibrium after the first wave recedes in virtually all countries, we expect our findings to generalize well to non-European countries. We used an admittedly conservative criterion for date stamping the end of the first wave and the start of the second one to reduce the extent of the transient period we incorporate for indicator estimation. In particular, we chose the day at which Rt reaches its lowest value as starting point for the computation of early warning indicators. If anything, based on our finding that incorporating the transient decreases performance, our choice may be too charitable. We chose the end date for the indicator computation as the day at which Rt reaches its maximum after crossing one so as to increase the number of time points and reduce the extent of any bifurcation delay. If anything, this may again have been too generous. At the same time, while the epidemic unfolded quite distinctly in different European countries, Rt never stabilized at a low value and rose quickly after the first outbreak. These are far from the conditions under which to expect a reliable signal in early warning indicators, and our results should not be interpreted as a rejection of their potential in other applications, including other epidemics.
We used backwards rolling windows to avoid the use of data from the “future”, and our results can thus translate to a situation in which indicators are computed in real-time. A critical issue when using nonparametric estimation concerns the choice of the size of the rolling windows (Dakos et al., 2012; Dessavre et al., 2019; Lenton et al., 2012). There is a trade-off between a window size that is too small, where estimation accuracy suffers, and a window size that is too large, where stationarity is (more severely) violated (Brett et al., 2017). If a model is available, Dessavre et al. (2019) find that detrending based on model simulation works well, but this route is unavailable as an epidemic unfolds for which accurate models do not yet exist. Similarly, while Miller et al. (2017) found that indicator performance was robust to seasonal forcing, the time scale of such seasonal forcing is much longer compared to the movements of Rt that were observed in some European countries, and which hence may have further reduced performance. We have addressed the issue of window size selection by reporting extensive sensitivity analyses. Our finding that indicators poorly anticipate the second COVID-19 wave is robust to different choices.
Critical slowing down is a phenomenon that has primarily been studied in low-dimensional systems. It is prominent in the study of ferromagnetism and the Lenz-Ising model (Brush, 1967), and has been known to proponents of catastrophe theory since at least the 1970s (Zeeman, 1976). Wissel (1984) suggested critical slowing down as a way to forecast the extinction of a population of rotifers (see also Dai et al., 2012; Drake & Griffen, 2010). Scheffer et al. (2009) brought significant attention to the idea of using critical slowing down as an early warning signal which led to a surge of interest across many fields. Yet there is the obvious question of whether we should expect a phenomenon that pertains primarily to low dimensional systems to occur in the high dimensional real-world. Infectious diseases do not spread in homogeneously mixed populations with people being distinct only in terms of whether they are susceptible, exposed, infected, or recovered, as our simulation model assumes. Instead, infectious diseases spread between unique individuals on a network that is itself continuously changing. Studying the effect of test sensitivity and frequency on COVID-19 transmission, Larremore et al. (2021) find essentially no difference between a homogeneous compartment model and an agent-based model that is calibrated to New York City micro-census data. More relevant to our investigation, Brett et al. (2020) found that early warning indicators based on critical slowing down do indeed rise prior to an outbreak in high-dimensional network and agent-based models.
A related issue with early warning indicators based on critical slowing down concerns the decision criterion. When do we decide that a rise in indicators is “significant” and constitutes an early warning? In our empirical analysis, we chose a rise in trend to be significant at the α = 0.05 level, but this may well require adaption to the specific case at hand. There is a difference between making a statistical inference (e.g., estimating Kendall’s τ) and making a decision (e.g., restricting mitigation measures; Boettiger & Hastings, 2012). The latter requires calibration, which is understudied in the context of early warning indicators based on critical slowing down but essential to use in applications. Importantly, some indicators, such as the mean and variance, continue to rise even after Rt crosses one, as predicted by theory (O’Dea & Drake, 2019; Southall et al., 2020). Others are expected to peak at the point at which Rt = 1, although the exact maximum may not be clear (O’Dea et al., 2018). This means that it is hard to assess whether, say, a rise in the autocorrelation from 0.50 to 0.70 is already problematic, or whether one should wait until it reaches, say, 0.90 (if it ever will). The extent to which indicators such as autocorrelation rise also depends on a number of reporting details such as the frequency of reporting. It is therefore impossible to provide general guidelines for use in applications. Simulation studies that incorporate reporting issues and focus on specific diseases may provide further insight (Brett et al., 2018; Tredennick et al., under review).
Early warning indicators based on critical slowing down promise to be a quite general and low-cost tool to monitor the emergence and elimination of infectious diseases (e.g., Drake & Hay, 2017; Harris et al., 2020; Tredennick et al., under review). It is understudied how well these indicators perform compared to other tools that may be used as early warning signals. In the context of COVID-19, it seems plausible that by making stronger assumptions about the dynamics of the system or using system-external information such as mobility would lead to much better early warning systems. Simply estimating Rt and forecasting whether and when Rt > 1 may be a similarly low-cost but potentially more reliable approach. Conceptually, however, it is not so clear that one would like to have an early warning indicator signalling that Rt is about to cross one. This is due to two related reasons. First, because of the bifurcation delay, it may take weeks or months for the actual outbreak to occur. A method that is able to incorporate this bifurcation delay and produce an early warning of an actual exponential increase in cases may therefore be preferable. Ideally, such a method produces a probabilistic assessment of an outbreak, which can then feed into further decision making. Second, the simple fact that Rt crosses one does not imply that a second wave is incumbent. Instead, it may stay there for a while or fall again, as it did in several European countries during the current pandemic. One cannot impose strong mitigation measures to curb virus spread whenever Rt > 1. All this points to a more continuous approach in which multiple, system-external factors are taken into account to assess the risk of future outbreaks. Early warning indicators may be a part of this risk assessment toolbox for (re)emerging diseases when an outbreak is slowly forced — but not, as we have shown, when one outbreak follows closely after another.
Data Availability
All COVID-19 data used in this work is publicly available from a number of sources, for example the WHO (https://covid19.who.int/info/).
Author Contributions
H.H. and D.B. suggested to investigate early warning signals in the context of COVID-19. F.D. analyzed the data, conducted the simulation study, and wrote the first draft of the manuscript. J.D. provided valuable advice on the data analysis and simulation study. F.D., H.H., D.B., and J.D. wrote the manuscript. All authors read and approved the submitted version of the paper. They also declare that there were no conflicts of interest.
Funding
F.D. was supported by ZonMw project 10430022010001.
A Estimation of Rt across European Countries
Figures 6-9 show countries and their estimated effective reproductive number, with vertical lines indicating the time period on which we computed early warning indicators.
B Sensitivity Analyses
Figures 11-20 show sensitivity analyses for the ten early warning indicators across different rolling window sizes for detrending and estimation.
Footnotes
↵1 We analyzed countries in the EU, excluding Spain because of a strong weekend reporting effect that presented difficulties for model convergence, as well as the United Kingdom.