Abstract
Pandemic preparedness requires institutions, including public health authorities and governments, to detect, survey and control outbreaks. Maintaining an accurate, quantitative and up-to-date picture of an epidemic crisis is key. For SARS-CoV-2, this was mostly achieved by ascertaining incidence numbers and the effective reproductive number (Reff), which counts how many people an infected person is likely to infect on average. These numbers give strong hints on past infection dynamics in a population but fail to clearly characterize current and future dynamics as well as potential effects of pharmaceutical and non-pharmaceutical interventions. We show that, by using and combining infection surveillance and population-scale contact statistics, we can obtain a better understanding of the drivers of epidemic waves and the effectiveness of interventions. This approach can provide a real-time picture, thus not only saving many lives by allowing quick adaptation of health policies but also alleviating economic and other burdens if an intervention proves ineffective. We factorize Reff into contacts and relative transmissibility: Both signals can be used, individually and combined, to identify driving forces of an epidemic, to monitor and assess interventions, and to project an epidemic’s future trajectory. Using data for SARS-CoV-2 and Influenza from 2019 onward in Germany, we provide evidence for the usefulness of our approach. In particular, we find that the effects from physical distancing and lockdowns as well as vaccination campaigns are dominant.
1. Introduction
Infectious diseases represent serious threats to an ever more connected humankind, on par with e.g. natural disasters and infrastructure failures. Epidemic preparedness (the ability to predict and mitigate future epidemic outbreaks) has thus risen to one of the most pressing challenges in modern societies and has recently attracted a wealth of research efforts building on a variety of data [1] in response to awareness elicited by the SARS-CoV-2 pandemic [2].
Epidemic dynamics are shaped at the crossroads of human and viral driving forces: a pathogen’s reproductive cycle, which defines its relative transmission rate upon physical proximity between individuals with full or partial susceptibility, and human behaviour, via the frequency of transmission-prone contacts between individuals [3]. Critical events such as the emergence of fitter mutants or collective shifts in human activity patterns set the pace for new epidemic waves. Real-time monitoring of these forces during an epidemic, whether it is fueled mostly by increased contact levels or by changes in relative transmissibility, is of paramount value for epidemic forecasting, for setting up informed, targeted mitigation strategies, and for estimating the effects of (non-)pharmaceutical health policies [4].
Using SARS-CoV-2 and Influenza as key examples of airborne transmissible contagions, we showcase monitoring and forecast tools for epidemic crises centered around a crowd-sourced, real-time method to assess levels of physical proximity in a population using GPS location information, the Contact Index CX [5]. We show that diverging trends between contact levels and independently recorded infection surveillance are indicators of altered relative viral transmissibility. Using 2020-specific data as a baseline for purely contact-driven SARS-CoV-2 epidemics, all observed transition points are explained by the onset of key immune escape variants (alpha, delta, omicron). The resulting dual evolution, Contact Index CX and relative transmissibility T, provides a highly transparent and timely picture of ongoing epidemics, including the possibility to identify likely driving forces in future epidemic waves.
2. Materials and Methods
2.1 Contact metrics relevant for epidemics
Contact networks are a representation of human interactions [6] with immediate implications for the spread of contagions in a population [7, 8]: Nodes represent individuals and edges are drawn between pairs of nodes in the event of contact between them (Figure 1(a,b)). A contagion can propagate through a population along paths following the links of the network.
Intuitively, transmission levels scale with the average number of links per node $\langle k \rangle = \sum_{k \geq 0} k P(k) = 2L/N$ [3], where $P(k)$ is the distribution of these numbers across a network and $N$ ($L$) is the number of nodes (links). Beyond this local property, more global topological network features (how contacts are collectively configured across the network) also affect the course of epidemics [3] by fueling and constraining the number of available paths. Groundbreaking epidemiological and network-theoretical work established that the effective reproduction number $R_{\mathrm{eff}}$, quantifying epidemic spreading, scales with the Contact Index $CX = \langle k^2 \rangle / \langle k \rangle$, i.e. the presence of very social nodes (superspreaders) with outstanding $k$ mediates enhanced propagation. Typical social networks are very inhomogeneous in terms of social activity, with outstanding community structure and few individuals responsible for most contacts [9]. The pivotal role of the second moment $\langle k^2 \rangle = \sum_{k \geq 0} k^2 P(k)$ is intuited by the friendship paradox [13]: An individual’s friends are on average more social than the individual; in other words, the number of next-nearest neighbors ($k_2$) in the network exceeds the expectation $\langle k \rangle^2$ from the number of nearest neighbors, a mere consequence of non-zero variance in $P(k)$: $\langle k^2 \rangle - \langle k \rangle^2 > 0$ (Supp Mat S2).
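As an illustration of why the second moment matters, the following minimal sketch computes ⟨k⟩, ⟨k²⟩ and their ratio ⟨k²⟩/⟨k⟩ (assumed here to be the contact metric of interest, in line with the reading of CX as next-nearest contacts per nearest contact) from an undirected edge list. The function name and the two toy networks are hypothetical and only serve the example.

```python
from collections import Counter


def degree_moments(edges):
    """Return (<k>, <k^2>, <k^2>/<k>) for an undirected edge list.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    k1 = sum(degree.values()) / n                   # first moment <k> = 2L/N
    k2 = sum(k * k for k in degree.values()) / n    # second moment <k^2>
    return k1, k2, k2 / k1


# Toy network 1: one hub connected to four otherwise isolated nodes.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# Toy network 2: five nodes on a ring, every node has exactly two contacts.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

if __name__ == "__main__":
    print("star:", degree_moments(star))
    print("ring:", degree_moments(ring))
```

The star has fewer links per node than the ring (1.6 versus 2) but a larger ratio (about 2.5 versus 2), which reflects the outsized role of the single highly connected node.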
2.2 Assessing contact levels in real-world networks
The contact network relevant to transmission of airborne viruses such as Influenza and SARS-CoV-2 arises from physical proximity between individuals (Figure 1(a)). Compared to (virtual) social networks, such real-world networks are expected to have distinct properties, as they are constrained by geography and physical distance, but are also tremendously more difficult to track at the population scale. Coarse contact and mixing patterns in real-world networks have been inferred using limited data gathered from surveys [14, 15] or viral phylogeny [16]. Locally confined real-world networks, such as on cruise ships [17], school campuses [18] or within towns [19] have been measured using Bluetooth communication between nearby mobile devices.
We use a previously developed approach to probe population-scale real-world contact networks based on crowd-sourced datasets of GPS locations [20, 5] to measure the Contact Index as a statistical measure of contact levels relevant for epidemics [5]. The crowd-sourcing data is collected in near real-time via opt-in from an anonymized panel of 1 million mobile app users (roughly 1 % of Germany’s population) and consists of ≈ 100 daily samples per device, each tagged with time and GPS location information. It allows us to reconstruct samples of the actual contact network realized in the population: Contacts (links) are drawn between devices (nodes) co-located in space and time (Figure 1(a) and Supp Mat S1). Examples of reconstructed contact networks are shown in Figure 1(e).
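The co-location step can be sketched as follows. This is not the procedure of Supp Mat S1, which is not reproduced here; it is a simplified stand-in that bins samples into hypothetical grid cells (`cell_deg`) and time bins (`bin_s`, set to 2 minutes by assumption) and links devices that share a bin. All names and thresholds are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def contacts_from_samples(samples, cell_deg=0.0005, bin_s=120):
    """Link devices whose samples fall into the same space-time bin.

    samples: iterable of (device_id, unix_time_s, lat, lon).
    Returns {(device_a, device_b): number_of_shared_bins}, i.e. a
    weighted contact list; its keys form the unweighted network.
    """
    buckets = defaultdict(set)
    for dev, t, lat, lon in samples:
        key = (int(t // bin_s), int(lat // cell_deg), int(lon // cell_deg))
        buckets[key].add(dev)

    weights = defaultdict(int)
    for devices in buckets.values():
        for a, b in combinations(sorted(devices), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical samples: A and B share a cell in two consecutive time bins,
# C is sampled far away and is never in contact.
demo = [
    ("A", 0, 52.52001, 13.40501),
    ("B", 30, 52.52002, 13.40502),
    ("A", 130, 52.52001, 13.40501),
    ("B", 150, 52.52002, 13.40502),
    ("C", 30, 48.13701, 11.57501),
]

if __name__ == "__main__":
    print(contacts_from_samples(demo))
```

Grid binning is a deliberate simplification: two devices on opposite sides of a cell border are missed, so a distance-based criterion would be needed for anything beyond illustration.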
2.3 Network sampling correction
The incomplete nature of such crowd-sourced data represents a major challenge: Contacts from uninvolved or inactive devices are not captured, giving rise to missing nodes and links in the network. This aspect of our data can be cast into a network sampling framework [21, 22] in which nodes and edges are randomly removed with probabilities p and q, respectively (Figure 1(b,c) and Supp Mat S3). Here, p denotes the population share represented in the panel of app users, while q is interpreted as the rate fij of simultaneous samples from pairs of app users (Figure 1(c)), a necessary condition to detect a contact between users with individual sample rates fi and fj, respectively. These sampling parameters are subject to change over time beyond daytime-related periodicity (see below), mostly in response to software updates and app usage (Figure 1(d)), and are heterogeneous in space (Supp Mat S4 and Figure S3(a,b,c)).
For simplicity, we here use daily averages of sample rates. The rate fij of simultaneous samples tends to exceed the expectation fifj from individual frequencies under the hypothesis of independence of distinct mobile devices, i.e. fij > fifj, especially prior to February 2020 (Figure 1(d)); a major app update in February 2020 significantly altered the daytime distribution and overall number of samples (Figure 1(d)). This apparent correlation between devices stems from the non-uniformity of the sampling activity over the day: Devices are more active during daytime than at night, an effect particularly prominent prior to February 2020 (Figure 1(d)). However, aside from a common daytime pattern, devices show predominantly independent activity patterns (Figure 1(d)): At any given timepoint (2 min interval), squared single-device distributions capture the distribution of simultaneous samples ρ2(t) across the day well. Solely as a consequence of the daytime-related correlation, we are likely to slightly underestimate the true value of q by using daily averages.
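The effect of a shared daytime pattern can be made concrete with a toy calculation. The sketch below assumes two devices that sample independently within each time interval but follow the same (hypothetical) per-interval sampling probability; it compares the daily average of simultaneous sampling with the product of the daily single-device averages.

```python
def simultaneous_vs_independent(profile):
    """Compare simultaneous sampling with the independence expectation.

    profile: per-interval sampling probability shared by two devices
    that are otherwise independent of each other.
    Returns (f_pair, f_single ** 2), both averaged over the day.
    """
    n = len(profile)
    f_single = sum(profile) / n                 # daily average f_i = f_j
    f_pair = sum(p * p for p in profile) / n    # daily average f_ij
    return f_pair, f_single * f_single


# Hypothetical profiles with the same daily average of 0.1 per interval.
day_night = [0.2] * 12 + [0.0] * 12   # active by day, silent at night
flat = [0.1] * 24                     # uniform activity

if __name__ == "__main__":
    print("day/night:", simultaneous_vs_independent(day_night))
    print("flat:     ", simultaneous_vs_independent(flat))
```

With the day/night profile the simultaneous rate is about twice the product of the daily averages, while both agree for the flat profile. Estimating q from the product of daily averages therefore yields a value that is too small whenever activity is concentrated in part of the day.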
Our improved mathematical modeling based on Horvitz-Thompson theory disentangles actual changes in contact levels from signals unrelated to the users’ contact behaviour, including participation and activity levels in the user panel, but excluding correlation between devices (see above). We thus achieve persistent and comparable results across the full time span since the beginning of measurement in 2019 (Supp Mat S3 and Supp Mat S4). In summary, we show that the Contact Index CX of an unobserved complete network G can be retrieved from a network sample G∗ obtained under the described sampling scheme (Supp Mat S3): The correction combines the same quantity measured within the network sample with an effective node sampling probability qeff for networks of unique contacts (see below).
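To convey how such a correction works in principle, the sketch below treats the simplest textbook case and is not the formula of Supp Mat S3: an unweighted network in which each neighbor relation survives independently with probability p · q. Under this assumption a node of degree k keeps a binomially thinned degree, which gives ⟨k²⟩∗/⟨k⟩∗ = 1 + p · q · (⟨k²⟩/⟨k⟩ − 1), a relation that can be inverted. The function names are hypothetical, and the effective probability qeff of the text is deliberately not modeled.

```python
def ratio(degrees):
    """Second over first moment, <k^2>/<k>, of a degree sequence."""
    return sum(k * k for k in degrees) / sum(degrees)


def expected_sampled_ratio(degrees, p, q):
    """Expected <k^2>/<k> after independent thinning (assumption).

    Each neighbor relation is kept with probability r = p * q, so a
    node of degree k has E[k*] = r k and E[k*^2] = r^2 k (k - 1) + r k.
    """
    r = p * q
    first = sum(r * k for k in degrees)
    second = sum(r * r * k * (k - 1) + r * k for k in degrees)
    return second / first


def correct_ratio(sample_ratio, p, q):
    """Invert the thinning relation to estimate the complete-network ratio."""
    return 1.0 + (sample_ratio - 1.0) / (p * q)


if __name__ == "__main__":
    degrees = [1, 1, 1, 1, 4]      # hypothetical complete network (a star)
    p, q = 0.01, 0.5               # hypothetical sampling probabilities
    sampled = expected_sampled_ratio(degrees, p, q)
    print(ratio(degrees), sampled, correct_ratio(sampled, p, q))
```

The example works with expectations rather than a random sample, so it checks the algebra only; on real samples the estimate fluctuates, and the more the smaller p · q is.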
Importantly, abstractions of contact networks exist in two distinct flavours: weighted versus unweighted [23]. Links may be endowed with weights wij ∈ {0, 1, 2, … } representing the duration or multiplicity of contact between individuals i and j [24] or simply indicate the presence or absence of contact aij = sgn(wij) ∈ {0, 1} (Figure 1(f)). In the epidemiological context, we assume that network topology, represented by aij, is more important than the recurrence of contacts between the same individuals: For instance, the (statistical) contribution to viral spread from a cluster of short contacts at a crowded event would outpace a lengthy contact between an isolated couple while in lockdown. We thus focus on unweighted networks and exclude contact duration from our analyses, except that short contacts are unlikely to be recorded during the random sampling inherent to the crowd-sourcing method.
However, network sampling destroys topological information about underlying complete networks (Figure 1(f)); the success of Horvitz-Thompson theory [21] in establishing a connection between original and sample networks relies on the use of weighted links (Supp Mat S3). To establish the same connection for unweighted networks, we devised a Bayesian approach which identifies missing topological information as the weight distribution for existing links in the complete network, $P(w|w>0)$, and defines the edge sampling probability in terms of the probability generating function $G_{w|w>0}(\xi) = \sum_{w>0} P(w|w>0)\,\xi^w$ of $P(w|w>0)$ (Supp Mat S3). We find that available complete real-world networks in various contexts [17, 18, 19] show strikingly similar weight distributions (Figure 1(g)), which suggests a universal shape of $P(w|w>0)$ also applicable to our problem. Here, “complete” means that these networks represent a fraction of the population ($p < 1$), but all contacts within that sub-population are detected ($q = 1$): node sampling, but no edge sampling. These distributions are consistent with power laws $P(w|w>0) = w^{-(1+\alpha)}/\zeta(1+\alpha)$ with small exponents [25, 26] (Figure 1(g)), a repeatedly demonstrated feature of complex networks [27] and beyond [28]. Yet, we do not imply that power laws are the true mechanism behind network weights, as a variety of other distribution classes are easily confounded with power laws [28, 29, 30], but merely use them as a prior for $P(w|w>0)$.
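One plausible way to connect a generating function of link weights to an edge sampling probability is sketched below. It rests on an assumption that is not taken from the text: if each of the w contact events on a link is detected independently with probability q, the link is seen with probability 1 − (1 − q)^w, and averaging over P(w|w > 0) gives 1 − G(1 − q). The power-law prior is truncated at a hypothetical `w_max` instead of being normalized by the zeta function.

```python
def power_law_pmf(alpha, w_max=10_000):
    """Truncated prior P(w | w > 0) proportional to w^-(1 + alpha), w = 1..w_max."""
    raw = [w ** -(1.0 + alpha) for w in range(1, w_max + 1)]
    norm = sum(raw)
    return [x / norm for x in raw]


def pgf(pmf, xi):
    """Probability generating function G(xi) = sum_w P(w) xi^w, with w starting at 1."""
    return sum(pw * xi ** w for w, pw in enumerate(pmf, start=1))


def effective_edge_probability(pmf, q):
    """Probability that at least one of the w contact events is sampled.

    Assumes independent detection of each event with probability q.
    """
    return 1.0 - pgf(pmf, 1.0 - q)


if __name__ == "__main__":
    prior = power_law_pmf(alpha=0.5)     # hypothetical exponent
    for q in (0.01, 0.1, 0.5):
        print(q, effective_edge_probability(prior, q))
```

Under this assumption the effective probability is never smaller than q, because repeated contacts give a link several chances to be observed; for links of weight one it reduces to q itself.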
3. Results
3.1 Evolution of CX since 2019
By means of our refined correction method for network sampling effects, we achieve a consistent measurement of contact levels since the beginning of crowd-sourcing in 2019, despite the time-dependent sampling. That is, we cover the prelude and entire course of the SARS-CoV-2 epidemic in Germany (Figure 2(a)). The gap in February 2020 is explained by missing data due to the rollout of a major crowd-sourcing software update.
Holiday season comes along with reduced CX under normal conditions, as shown by the Fall and Christmas breaks in 2019, thus showing a reduction of transmission-prone contacts. The onset of the first SARS-CoV-2 wave in March 2020 induced an unequivocally more pronounced drop in CX, probably explained by a more systematic cessation of super-spreading activities. The dramatically altered contact network structure during a lockdown is depicted in Figure 1(e).
Since the onset of the SARS-CoV-2 pandemic, changes in contact behaviour as reflected by CX underwent several periods of spiking (partial or complete deregulation of mass events in fall 2020, fall 2021 and spring 2022) and damping (winter wave 2020, emergence of the omicron variant in late 2021). Overall, a similar evolution is observed between CX and the rigor of SARS-CoV-2-related policy as measured by the Government-Response Index [31] (Figure S1(a)), thus indicating broad awareness of the situation at the population and governance levels, although no causal link is implied.
Interestingly, recent CX values remain below pre-pandemic levels by a factor of 2 to 3, despite the lifting of all contact-related restrictions in 2022. This suggests the existence of a hysteresis effect in addition to the fast response of CX discussed above: The collective behaviour has not returned to its unperturbed state in response to relaxed conditions, possibly as a result of continued broad perception of disease risk [32, 33].
From a dimensional viewpoint, CX represents an average number of (next-nearest) contacts per (nearest) contact: Comparing values of CX across areas with vastly different population densities within Germany supports our expectation that CX scales (non-linearly) with the absolute propensity of physical proximity between individuals (Figure S3(d) and Supp Mat S4).
3.2 Deciphering epidemic forces: contacts vs. relative transmissibility
In 2020, SARS-CoV-2 epidemic trends were primarily driven by trends in contact levels, as neither immune escape variants nor vaccines were yet relevant, and relative SARS-CoV-2 transmissibility (its intrinsic transmission probability per contact) was thus constant (Figure 2(b)): Official daily now-cast reproduction numbers Reff, independently recorded from national infection surveillance [34], correlate well with daily CX, but CX shows a time lead of approximately 2 − 3 weeks over Reff (Figure S1(a, right inset)) [5], explained by incubation time as well as testing and reporting delays. This underlines the predictive character of real-time contact metrics for wild-type dominated epidemics [20]. Since then, the correlation between Reff and CX has repeatedly changed, with the resulting signal quantifying shifts in relative transmissibility attributable to key epidemic changes other than contacts.
The effective reproduction number Reff is defined by Reff = ⟨k⟩ · U · τ, where ⟨k⟩ denotes the contact number per day, U the probability of transmission per contact, and τ the mean duration of infectivity in days. Both U and τ are determined by physiological processes involved in transmission and, together, define the intrinsic transmission efficiency (per contact) T = U · τ.
Furthermore, as we assume that a linear function of CX replaces ⟨k⟩, we replace the definition by Reff = (a + b · CX) · T. A linear relationship of this form between CX and Reff is motivated by our findings in 2020. We use values for a and b obtained from a linear regression between CX and wild-type Reff data at the optimal time delay of ∆t = 16 days (Figure S1(a, left inset) and Supp Mat S5). Upon interpreting RWT(CX) ≡ a + b · CX as the wild-type specific reproduction number, we have that

$$R_{\mathrm{eff}} = R_{\mathrm{WT}}(CX) \cdot T, \qquad (3)$$

where T represents relative transmissibility with respect to wild-type in a fully susceptible population (TWT = 1). Note that, in contrast to now-cast data, Eq. (3) assigns reproduction numbers to the day of contact/infection.
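The calibration of a and b can be sketched as an ordinary least-squares fit of Reff, shifted by the time delay, against CX. The code below is a minimal illustration on synthetic numbers and is not the regression of Supp Mat S5; the function name and the data are hypothetical, and the lag is passed in rather than optimized.

```python
def fit_lagged_line(cx, r_eff, lag):
    """Least-squares a, b in r_eff[t + lag] ≈ a + b * cx[t].

    cx and r_eff are daily series on the same calendar; lag is the
    delay in days between contact and its appearance in r_eff.
    """
    x = cx[: len(cx) - lag]
    y = r_eff[lag:]
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    # Synthetic example: r_eff follows 0.5 + 0.02 * CX with a 2-day delay.
    cx = [10.0, 30.0, 20.0, 50.0, 40.0, 60.0, 25.0, 35.0]
    r_eff = [1.0, 1.0] + [0.5 + 0.02 * c for c in cx[:-2]]
    print(fit_lagged_line(cx, r_eff, lag=2))
```

In practice the lag itself would be chosen by scanning candidate delays and comparing the quality of the fit, which is how an optimal delay such as the 16 days quoted above could be identified.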
From independently recorded values for Reff and CX, we can determine the relative transmissibility of the contagion by factoring out contact-related contributions from overall infection dynamics as

$$T = \frac{R_{\mathrm{eff}}}{R_{\mathrm{WT}}(CX)} = \frac{R_{\mathrm{eff}}}{a + b \cdot CX}$$

for any given day. We expect network-wide propagation of transmissibility-related information to be slow compared to network dynamics itself and, thus, T to undergo evolution on longer timescales. We interpret fast signal in T as random fluctuations from the measurement of Reff and capture actual trends by ⟨T⟩, centered averages over sliding time windows of 2 months (Supp Mat S5).
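Dividing Reff by the wild-type expectation a + b · CX and smoothing the result can be sketched as follows. The alignment by a fixed lag and the shrinking window at the ends of the series are assumptions of this example, not details taken from Supp Mat S5; function names are hypothetical.

```python
def relative_transmissibility(cx, r_eff, a, b, lag):
    """T[t] = r_eff[t + lag] / (a + b * cx[t]) for every day with data."""
    return [r_eff[t + lag] / (a + b * cx[t]) for t in range(len(cx) - lag)]


def centered_mean(values, window):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


if __name__ == "__main__":
    a, b, lag = 0.5, 0.02, 1                      # hypothetical calibration
    cx = [10.0, 20.0, 30.0, 40.0]
    # Synthetic surveillance: 1.5 times the wild-type expectation, one day late.
    r_eff = [0.0] + [1.5 * (a + b * c) for c in cx[:-1]]
    t_daily = relative_transmissibility(cx, r_eff, a, b, lag)
    print(t_daily, centered_mean(t_daily, window=3))
```

For daily data, a window of roughly 61 samples would correspond to the two-month averaging described above.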
3.3 Epidemic evolution of relative SARS-CoV-2 transmissibility
The evolution of relative SARS-CoV-2 transmissibility ⟨T⟩ is shown in Figure 2(b). This time series reenacts the various phases of the SARS-CoV-2 pandemic:
Relative SARS-CoV-2 transmissibility ⟨T⟩ is approximately equal to unity throughout 2020, an initial period purely driven by unperturbed wild-type epidemics that we used to “calibrate” CX and Reff, which evolve on shorter timescales. It subsequently follows a tug-of-war pattern shaped by alternating epidemic forces beyond contacts: immune escape variants and development of population immunity through infection and vaccination. Three waves of increased relative transmissibility are explained by the takeover of fitter virus lineages (Figure 2(b)), specifically alpha (spring 2021), delta (summer 2021) and omicron BA.1/BA.2 (winter 2021/22). We hypothesize that subsequent relaxation of ⟨T⟩ after each wave may be attributed to natural immunity, while the superposed long-term downward trend may be explained by the additional immunity acquisition through (initial and booster) vaccination campaigns. Interestingly, the effect of omicron BA.4/BA.5 takeover in summer 2022 on ⟨T⟩ is nowhere close to those of previous variants.
Comparing correlations with different parameters rules out the possibility that the measured ⟨T⟩ is shaped by factors confounding the reproduction numbers or CX values (Figure S1(b,c) and Supp Mat S5). These possible confounders include viral prevalence, CX itself through higher-order effects from network sampling not captured by our modeling and other topological network features (such as clustering, small-world properties) as well as Reff itself through changes in testing strategies and systematic under-reporting of infections [35]. For instance, testing individuals indiscriminately versus focusing test capacities on suspected infection cases may lead to incomparable snapshots of ongoing infection dynamics. Overall, strong positive correlation is exclusively observed between ⟨T⟩ and variant dynamics (Figure S1(b,c)) [36]. In this analysis, we use test positivity [37] and results from local prevalence studies [38] as proxies for overall prevalence. Also, we neglect possible effects from network sampling on different topological measures [39, 40], but we expect trends to be conserved as long as the sampling process remains unchanged.
We note the absence of both seasonal oscillations in ⟨T⟩ and clear signatures of mask mandates (in effect across many social contexts between April 2020 and April 2022). A seasonal oscillation in ⟨T⟩, with larger values in winter and smaller values in summer, might be expected from the shift of human activity between in- and outdoor settings. Also, previous research established the effectiveness of mask usage at reducing transmission of respiratory diseases (reviewed in [41]). Overall, our results suggest that, at least in the epidemic stage of SARS-CoV-2, infection rates were predominantly driven by the strong variability in contacts as well as the repeated emergence of more transmissible variants, in line with previous findings [42, 43, 44].
3.4 Forecast of infection level and trend changes
The challenge of epidemic forecast consists in the accurate prediction of current and future reproduction numbers Reff. Using the rationale that trends in infection levels carry the combined signature of trends in contact and relative transmissibility levels, we propose to construct predictions according to

$$R_{\mathrm{true}} = R_{\mathrm{WT}}(CX) \cdot \langle T \rangle, \qquad (4)$$

where Rtrue is assigned to the projected day of contact/infection. The key difference to Eq. (3) is the use of ⟨T⟩, which eliminates noise from reproduction numbers. Importantly, we therefore expect that our prediction Rtrue represents actual epidemic trends (ground truth) more accurately than epidemic surveillance (Reff).
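Assuming the prediction takes the same form as the factorization Reff = (a + b · CX) · T with T replaced by its smoothed version ⟨T⟩, a prediction can be sketched in a few lines. Because ⟨T⟩ is typically known only up to an earlier day than CX, this example simply carries its last value forward; that persistence rule is a stand-in chosen for brevity and is not the continuation method used in the text.

```python
def predict_r(cx, t_smooth, a, b):
    """R_true[t] = (a + b * cx[t]) * <T>[t].

    If t_smooth is shorter than cx, its last available value is
    carried forward (a simplifying assumption of this sketch).
    """
    predictions = []
    for t, contact_index in enumerate(cx):
        t_value = t_smooth[min(t, len(t_smooth) - 1)]
        predictions.append((a + b * contact_index) * t_value)
    return predictions


if __name__ == "__main__":
    a, b = 0.5, 0.01                   # hypothetical calibration
    cx = [50.0, 100.0, 150.0]          # contact index up to the current day
    t_smooth = [1.0, 2.0]              # <T> ends one day earlier
    print(predict_r(cx, t_smooth, a, b))
```

Since CX is measured in near real time while surveillance lags behind, each predicted value refers to the day of contact and is available before the corresponding Reff is reported.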
Figure 3(a) shows Rtrue together with data from infection surveillance, both plotted with respect to their date of recording (assuming real-time CX measurement). This shows how our prediction overall anticipates current epidemic trends that are observed via infection surveillance only about ∆t = 2 − 3 weeks later. Thus, we propose to use our method as a tool for real-time infection surveillance.
To extend forecasts beyond this horizon and predict future reproduction numbers, CX and ⟨T⟩ themselves need to be projected beyond latest data.
For several choices of the current day t0, Figure 3(b) showcases forecasts (Rpred) where CX and ⟨T⟩ are continued beyond the last days of available data (t0 and t0 − ∆t, respectively) using autoregressive integrated moving average (ARIMA) models prior to applying Eq. (4) (Supp Mat S6). These forecasts outperform a null forecast based on a mere ARIMA-type continuation of infection surveillance data (Reff), as shown by narrower distributions of residuals (Rpred−Rtrue) across all choices of t0 (Figure 3(b)). Furthermore, we highlight the broad applicability of our method to airborne infectious diseases by performing an identical forecast analysis for Influenza (Figure S2(a)), using coarser infection surveillance data [45] and presuming a similar relationship between Reff and CX as for SARS-CoV-2 (Supp Mat S6).
Most importantly, trend changes in epidemic driving forces such as ⟨T⟩ and CX are indicators of new phases in an epidemic. Timely detection of new trends in these time series, e.g. using anomaly detection methods, can provide valuable information to estimate the risk of upcoming epidemic waves and to predict their nature, i.e. whether dynamics are fueled by contacts or by increased transmission efficiency. Such trend detection is potentially easier to achieve than, yet as informative as, accurate prediction of infection surveillance. The onset of rising trends could shape decision-making with regard to the effectiveness of health policies, e.g. pharmaceutical and non-pharmaceutical interventions for rising ⟨T⟩ and CX, respectively. Figures 3(c) and S2(b) highlight rising and falling trends in both CX and T for SARS-CoV-2 and Influenza, respectively, akin to trends in stock prices. For SARS-CoV-2, trend changes are timely indicators of all major escape variant- and contact-driven epidemic turning points (Figure 3(c)). Unlike for SARS-CoV-2 in its epidemic stage, major upheavals in relative transmissibility for Influenza are limited to seasonality, with the notable exception of 2020, presumably reflecting its endemic dynamics (Figure S2(b)).
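A very simple form of trend-change detection is sketched below: the slope of a least-squares line is computed in a trailing window, and a change is flagged whenever its sign flips. This is one hypothetical choice among many and is not the detection method behind Figures 3(c) and S2(b); window length and function names are assumptions.

```python
def window_slope(values):
    """Least-squares slope of values against their index (needs >= 2 points)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def trend_changes(series, window):
    """Return (index, 'rising' | 'falling') where the trailing slope flips sign.

    The index is the last day of the window in which the flip is seen,
    so detection lags the actual turning point by part of the window.
    """
    changes = []
    previous_sign = 0
    for end in range(window, len(series) + 1):
        slope = window_slope(series[end - window:end])
        sign = (slope > 0) - (slope < 0)
        if sign != 0 and previous_sign != 0 and sign != previous_sign:
            changes.append((end - 1, "rising" if sign > 0 else "falling"))
        if sign != 0:
            previous_sign = sign
    return changes


if __name__ == "__main__":
    series = [5, 4, 3, 2, 1, 2, 3, 4, 5]   # hypothetical CX or <T> values
    print(trend_changes(series, window=3))
```

Short windows react quickly but are sensitive to noise, while long windows are more robust and slower; applied to noisy daily data, some smoothing before the slope test would likely be needed.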
4. Discussion
We presented a simple, yet insightful quantitative method for a data-driven decomposition of overall epidemic dynamics into contact-related and transmission efficiency-related contributions. It relies on the availability of both infection surveillance data and crowd-sourced GPS location data to detect and quantify physical proximity between susceptible individuals. Its appeal resides in the merely bivariate yet highly informative projection of epidemics, paving the way towards timely identification of driving forces in an ongoing epidemic (human versus viral factors) and of possibly effective mitigation strategies (pharmaceutical versus non-pharmaceutical).
The approach can be used for epidemic forecast in multiple ways. Recent and projected future values of CX and ⟨T⟩ can be used for short-term (2 − 3 weeks) and long-term prediction of infection or reproduction numbers, thus taking our previously described short-term forecast further [5]. Yet, a timely detection of trend changes could reliably forecast upcoming waves and their nature without the necessity to accurately predict infection surveillance data. These tools can lead towards a more strategic approach to epidemic mitigation and potentially save lives by reducing the spread of deadly diseases.
Results from the presumably most systematically tracked epidemic to date, SARS-CoV-2, draw the picture of co-evolution within the virus-host relation: Increasing immunity levels in the host population alternate with step-wise adaptation of the virus through immune-escape variants. Other frequently discussed factors, including mask policies and seasonality, are presumably still below the current statistical resolution of our method, defined by the sampling noise in the CX and Reff time series. Moreover, a larger impact of seasonal variation is expected in the endemic phase of SARS-CoV-2 [46].
Our method is broadly applicable to airborne contagions beyond SARS-CoV-2, but depends on the availability of infection surveillance and crowdsourcing strategies that remain persistent over extended amounts of time. Changes in testing strategy can lead to signal and biases unrelated to underlying epidemic driving forces [35]. More crucially, systematic infection surveillance is not implemented beyond the case of SARS-CoV-2. We illustrated a framework to correct for the effect of varying sampling depth in the contact network. Yet, higher-order effects in the signal can occur as a result of sampling aspects not captured by our mathematical modeling. In order to ensure valid prognoses through our method, we advocate for systematic and persistent crowd-sourcing and infection surveillance strategies across a variety of diseases with epidemic potential.
Geographical resolution of our forecast method is currently limited by the sampling depth, as the estimation especially of higher moments of degree distributions P(k) becomes increasingly difficult as smaller portions of the network are available. A higher spatial resolution of contact and relative transmissibility levels, with potential to locate the origin of new variants of concern and define locally targeted mitigation strategies, can be achieved by e.g. increasing the panel of app users.
Our analysis assumes static networks, but actual contact networks are dynamic in nature [47, 48]: While some contacts are frequently repeated (e.g. between household members), other contacts are randomly redrawn on each occasion (e.g. in public transportation), with implications for epidemic spread [49, 50]. Our method can be improved by analyzing contact data in light of existing models of dynamic networks [51, 48].
Data Availability
All aggregated data produced in the present study (i.e., anonymized daily contact networks, daily participant and sample numbers) are available upon reasonable request to the authors. Official infection surveillance, vaccine coverage, and virus variant data for Germany used in this work are publicly available from the Robert Koch Institute (see references in the manuscript).
Acknowledgment
This work was supported by grants from the Federal Government of Germany through the Federal Ministry for Economic Affairs and Climate Action (BMWK) for the project DAKI-FWS (01MK21009A) and the Federal Ministry of Education and Research (BMBF) for the project Optim-Agent (031L0299).
Footnotes
In SI, we added a mathematical proof of the importance of the degree distribution's second moment <k^2> for epidemiology in contact networks.
References
- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]
- [15]
- [16]
- [17]
- [18]
- [19]
- [20]
- [21]
- [22]
- [23]
- [24]
- [25]
- [26]
- [27]
- [28]
- [29]
- [30]
- [31]
- [32]
- [33]
- [34]
- [35]
- [36]
- [37]
- [38]
- [39]
- [40]
- [41]
- [42]
- [43]
- [44]
- [45]
- [46]
- [47]
- [48]
- [49]
- [50]
- [51]
- [52]
- [53]