Abstract
Propagation of an epidemic across a spatial network of communities is described by a variant of the SIR model accompanied by an intercommunity infectivity matrix. This matrix is estimated from fluxes between communities, obtained from cell-phone tracking data recorded in the USA between March 2020 and February 2021. We apply this model to the SARS-CoV-2 pandemic by fitting just one global parameter representing the frequency of interaction between individuals. We find that the predicted infections agree reasonably well with the reported cases. We clearly see the effect of “shelter-in-place” policies introduced at the onset of the pandemic. Interestingly, a model with uniform transmission rates produces similar results, suggesting that the epidemic transmission was deeply influenced by air travel. We then study the effect of alternative mitigation policies, in particular restricting long-range travel. We find that this policy is successful in decreasing the epidemic size and slowing down the spread, but less effective than the shelter-in-place policy. This policy can result in a pulled wave of infections. We express its velocity and characterize the shape of the traveling front as a function of the epidemiological parameters. Finally, we discuss a policy of selectively constraining travel based on an edge-betweenness criterion.
WHen plague hit Florence in August 1630, the Florentine authorities made a number of high-stakes decisions which proved highly effective [1]. One of the reasons the Florence Sanità could organize this response was the ample time they had, forewarned as they were by the Milanese authorities in November 1629. Today’s public-health authorities work under much more compressed timescales, as evidenced by the SARS-CoV-2 pandemic. Long-distance travel radically changes the dynamics of spreading, which raises a number of questions about the spatial dynamics of transmission in modern times. Epidemic outbreaks in the last two decades have provided the scientific community with a wealth of material to study these questions, going beyond the classic Susceptible-Infected-Recovered (SIR) theory with perfect mixing [2–6]. Several studies have shown how the total epidemic size can be affected by factors such as inhomogeneity in transmission rates [7–14] or in the mode of transmission [15, 16]. Classically, motion of individuals was taken into account by introducing diffusion terms in the standard SIR equations, allowing the emergence of spatio-temporal patterns [17–20]. Recently, in the context of the SARS-CoV-2 pandemic, such approaches have been especially valuable in order to study the effect of containment policies such as lockdown and quarantine [21, 22]. These models are, however, limited in that they do not, in principle, take long-distance air travel into account. Several works have, therefore, considered disease spread in a network, typically constructed from air-traffic data [19, 23, 24], where edges can connect locations separated by large geographical distances. This approach can lead to very accurate predictions at the country scale [25] but predictions at finer scales remain challenging. Another study considering human mobility emphasized how spatial variation in public-health infrastructure reflected on epidemiological parameters can affect the dynamics of spread to different countries [26]. Data-based studies of epidemic spread and the impact of social distancing through the analysis of social-network structure have also been very informative [16, 27–29].
A recent paper by Chang et al. [30] obtained a model for the spatio-temporal spread of a disease at a high spatial resolution by using extensive mobile tracking information to identify physical interactions between individuals. Chang et al. showed that the actual spread of the SARS-CoV-2 epidemic can be well explained from the mobility data of individuals. The model relied on the simulation of interactions among individuals on a bipartite graph where nodes, representing locations at a very fine spatial resolution, are divided in two sets: Census Block Groups (residential areas) and Points Of Interest (non-residential), each of them having its own transmission rate. The model was fitted to reproduce known reported cases of COVID-19 in 10 metropolitan areas, and could then be used to make short-term predictions about the spreading or study the effect of different mitigation strategies.
Here we take an approach similar to that of Chang et al., using mobility data to calibrate a model for disease spread. However, we investigate this propagation at the scale of a large country, the USA, rather than metropolitan areas. Specifically, we introduce a spatial SIR epidemiological model in which effective transmission rates between N = 210 communities are computed from mobility data of individuals belonging to these communities. We show that this model captures well the spread of the SARS-CoV-2 epidemic. Remarkably, we find that a simple model consisting of an interaction frequency dropping under the effect of lockdown, and of a single flux matrix encoding the travel of individuals, faithfully reproduces the reported cases of COVID-19 both globally and locally in each community. Strikingly, the SARS-CoV-2 epidemic spreads in a delocalized fashion, infecting distant communities very quickly. Moreover, an even simpler model with uniform transmission rates between the communities gives results very close to the model based on mobility data, emphasizing the prevalent role air travel played in the spread of the SARS-CoV-2 epidemic. We then study how interventions that change travel patterns can localize epidemics. In particular, we investigate the hypothetical effect of a policy preventing long-distance travel. In addition to “flattening” the curve, spreading through nearest-neighbor interactions creates traveling waves, which we characterize both analytically and numerically. These results allow us to discuss which interventions are more effective, limiting short-range contacts (a lockdown), or limiting long-range trips (a quarantine). We then propose an alternative mitigation strategy based on an edge-betweenness criterion.
Model
We consider a metapopulation model of N communities numbered 1, 2,…, N. Denoting by Sa, Ia and Ra the numbers of susceptible, infected and recovered individuals in community a, the standard SIR equations read: where βab is the transmission rate from infected individuals in community b to susceptible individuals in community a and y is the recovery rate, assumed to be the same for all communities. Diagonal elements of the infectivity matrix [βab] describe intracommunity infections, while off-diagonal elements describe inter-community infections (Figure 1a-b). We also introduce the local epidemic sizes Ta =Ia − Ra. The total population in each community Ma is constant through time,
The model in equation (1) has been extensively studied [31–35]. We show in the Supplementary Information (SI) that the dynamics can be reduced to an ODE of just one N-vector variable, and the endemic equilibrium can be obtained by solving a transcendental equation involving the infectivity matrix [βab].
In order to estimate the infectivity matrix, we used mobility data compiled by SafeGraph [36], tracking the location of about 20 million USA cell phones between March 2020 and February 2021. The locations consist of more than 200 000 Census Block Groups (CBGs). Each cell phone is assigned for physical residence the location where it spent the most time, and daily visits to other locations are recorded. For computational purposes, we coarse-grained the physical locations into N = 210 communities, which are shown in Figure 1c. Let fab be the number of individuals from community a visiting community b per unit of time. We will assume that The variation in susceptible individuals in community a due to new infections during the time interval Δt has the form: where β is the disease-specific transmission rate when a susceptible individual has contact with an infected individual. There are three kinds of infection to consider: (i) the infected person and the infector belong to the same community, (ii) an infected person visits a neighboring community and infects a resident of this community, and (iii) a susceptible person visits a neighboring community and gets infected by one of its residents. We will neglect the rarer “tourist to tourist” infections, when an infected person visits a neighboring community and infects there a visitor from yet another community. The term Pr meeting an infected individual can therefore be evaluated as a function of the pseudo-flux matrix [fab] for each of the three aforementioned cases (SI). After summation of the three contributions, we obtain: where p is the frequency with which an individual is having contact with an other individual of its community, and is assumed to be the same for all communities. Equation (5) determines the infectivity matrix [βab] from equation (1) up to a proportionality constant, namely βp.
Results
The model reproduces the spatial dynamics
In order to assess the validity of the infectivity matrix based on mobility data, equation (5), we confronted the model’s predictions against COVID-19 case numbers reported in the USA by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [37]. Specifically, we fitted the daily βp(t) values so as to minimize the sum of squared errors between values predicted by the model and values reported by the CSSE (Material and Methods). The fitted βp values show a steep decay during the month of March 2020, followed by a plateau lasting until February 2021 (Figure 2b). Although we fitted the model to the real COVID-19 confirmed cases using the global quantity Ω, local epidemic sizes a predicted by the model agree very well to the empirical values.
A two-phase simplified model
The time variations of βp(t) shown in Figure 2b imply a drastic decrease in the interaction frequency among individuals across all communities. This result reflects the effect of the shelter-in-place policies that were implemented at the beginning of the SARS-CoV-2 pandemic in many USA states. We can use this observation to estimate the effect of these policies quantitatively: the interaction frequency among individuals, namely p, is about 5 times smaller in the plateau following shelter-in-place policies than it was at the onset of the SARS-CoV-2 pandemic. Following this observation, we defined a simplified model, in which the fit with reported cases was carried out while enforcing a softplus shape for βp(t) (Figure 2b and Material and Methods). This simplified model reproduced the average progression of the epidemic (Figure 2a), but it didn’t capture the three oscillations visible in the number of new cases. We conclude that these oscillations in the number of new cases were mostly driven by a similar oscillatory pattern in the interaction frequency as seen in Figure 2b. In Figure 2c, we show that the number of new COVID-19 cases in each community predicted by the simplified model follows closely the empirical values. This result suggests that the infectivity matrix constructed from mobility data (equation (5)) is a good approximation of the “true” infectivity matrix.
Turning down long-range interactions
The previous results suggest that the decrease in new COVID-19 cases was mostly driven by a country-wide reduction in the interaction frequency among individuals. Here we investigate the hypothetical effect of an alternative policy, namely a travel restriction while keeping unchanged the interaction frequency among individuals. Specifically, we modified the infectivity matrix so that communities separated by a physical distance larger than a prescribed cutoff do not interact: βij = 0 if dij > dC (Figure 3a). We seeded the infection in a community belonging to the state of Washington and simulated the spread of the epidemics using a fixed interaction frequency (βp(0) from the simplified model). As expected, we observed a reduction in the number of daily cases dT when the cutoff distance dC decreased, illustrating the “curve-flattening” effect that was targeted by travel restriction policies (Figure 3b). In this idealized scenario with a single seed for the infection, the epidemic propagates as a traveling wave from the west coast to the east coast (Figure 3c). However, as can be seen by comparing Figure 2a to Figure 3b, travel restriction policies are not as efficient as lockdown policies to decrease the spread of an epidemic.
Agreement with 1D analysis of a wave
The results from the previous sections suggest the apparition of a wavefront when transmissions are short range. To investigate this phenomenon, we consider a simplified model of communities lying on a two-dimensional square lattice. Each community index is replaced by coordinates (i, j) ∈ ⟦1, 2n⟧ × ⟦1, 2m⟧. In particular, we consider that only individuals from neighboring communities interact together. We rewrite equation (1): where α (respectively β) is the intra-community (resp. inter-community) transmission rate.
After rescaling the time and space variables, and defining the rescaled recovery rate , we look for wave solutions in the continuum limit by introducing the shape functions and , where represents the velocity of the wave in rescaled time and space (see SI). The shape functions satisfy the ODE: We find that the velocity of the traveling wave is bounded from below (SI): which is in agreement with previous reports [18, 38] and with results from marginal-stability analysis [21, 39, 40]. Interestingly, by an independant argument, we also established (see SI) that . Therefore the traveling wave must move at velocity . This indicates that the SIR dynamics in equation (6) falls into the Fisher-Kolmogorov-Petrovsky-Piscunov universality class, resulting in pulled waves [41–44]. Although we have taken the continuum limit of a nearest-neighbor model, this analysis is also valid for any finite-range infection matrix with the appropriate rescaling of variables.
We performed simulations of the dynamics given by equation (6) on a square lattice with a varying aspect ratio (Figure 4a). As expected, there is a front of new infections, moving from west to east as time progresses. A timelapse of a traveling front of infected individuals with = β = y = 0.1 is shown in Figure 4b. The wave position increases asymptotically linearly with time, but the velocity of the wave varies with (Figure 4c). The profiles obtained are in agreement with the shape functions obtained by solving equation (7), as shown in Figure 4d.
Properties of infectivity matrices
The daily infectivity matrices constructed from the SafeGraph mobility data can be viewed as elements of a random-matrix ensemble. Remarkably, matrix elements seem to be distributed according to the law βij = ⟨βij⟩eξ, where ⟨βij⟩ is the mean infectivity matrix and ξ ≡ N(0, 1) is a centered reduced Gaussian variable (Figure S3). The probability density of the eigenvalues is also shown in Figure S3. A connection can be made with random matrix theory (RMT) [45–47], initially introduced by E. Wigner to model the spectra of the nuclei of heavy atoms, where the interactions between many nucleons are assumed to be drawn from a random ensemble. In RMT, random matrices are classified according to their symmetry, or according to their corresponding level statistics (namely, the probability density of the spacing between consecutive eigenvalues) that exhibit different degrees of level repulsion [45]. In particular, the Wigner-Dyson (WD) statistics is typical of the Gaussian Orthogonal Ensemble (in which eigenvalues “interact”), while the Poisson statistics is typical of a matrix with independent eigenvalues. The level-spacing statistics of the infectivity matrices is shown in Figure 5a. We found that they were time-independent. Yet interestingly, the level statistics interpolates between the WD statistics and the Poisson statistics [48]. As shown earlier, removing links between communities according to their geographical distance results in a slower spread and a smaller epidemic size. Concomitantly, the level spacing distribution converges toward a Poisson statistics (Figure 5c). The crossover from WD (entropy S = 0.7169) to Poisson (S = 1) distribution as links between communities are successively removed suggests an isolation policy that can lead to an effective reduction of the epidemic size (SI). As alternative mitigation strategies, we choose to induce the transition toward a Poisson distribution by decimating links between communities according to their “nominal” distance or the “edge-betweenness” centrality [49–51] (Figure 5b). Figure 5d shows how the level spacing distribution converges toward a Poisson statistics when the nominal distance threshold is lowered. We find the edge-betweenness centrality to be more efficient in decreasing the epidemic size. This is because decimation according to edge-betweenness first targets links with the largest transmission rates. One could also consider a moderate policy: instead of eliminating links completely, one could impose constraints on the flux of individuals that are allowed to commute via central pre-determined links.
Discussion
Biological systems are inherently complex and it can be a challenge to characterize them by a small number of parameters. In the case of epidemics, the huge number of parameters (for example, intercommunity spread involves N2 transmission rates for N communities) can obscure the salient mechanisms and make it difficult for policy makers to find efficient interventions. Reducing the dimension of a model, when possible, is therefore of great value. In this article, we have proposed a spatial SIR model with an infectivity matrix based on the local travel patterns. Despite its simplicity (there is only one global parameter to be adjusted), we find that our model is able to capture the spatial spread of the SARS-CoV-2 epidemic, with a delocalized multicenter spreading caused by long-distance travelers bringing infection into far regions, then becoming secondary centers of infection. The time evolution of the interaction frequency reflects the shelter-in-place policies that were implemented by various USA states in the early stages of the SARS-CoV-2 pandemic. In fact, the rich diversity of human responses could be summarized by a simplified model for the interaction frequency, whose asymptotes represent the values “before lockdown” and “after lockdown”, with few assumptions about the infection process. The individual variances between people and communities turn out to self-average, giving a clear picture of the spread.
If a model with relatively few parameters describes the observations, one can assume that it also describes the situation when these parameters are changed by an intervention. Therefore, we suggest such a model could be used as the basis for efficient policies—or at least reliably to estimate the consequences of adopting policies. As an example, we have shown the hypothetical effect of an alternative to the shelter-in-place policy in the case of the SARS-CoV-2 pandemic. Specifically, we have investigated a travel restriction policy in which individuals can only move within an area of fixed radius centered around their residence. By contrast to the lockdown policy, we find that infections spread through a well-defined wave front, traveling with a certain velocity. This scenario might be preferable since it gives time to communities and public health infrastructures to prepare for the onset of the epidemics, while still “flattening the curve” of new infections (Figure 3b). We have also provided both an analytical and numerical analysis elucidating the mechanism of formation of a wavefront. In particular, we give the velocity and the shape of the wavefront for an epidemic spreading through nearest neighbors interactions.
Although SafeGraph mobility data [36] provide a realistic picture of people movement between communities, one might ask what is the sensitivity of the results to the infectivity matrix derived from the mobility fluxes. We thus considered a model with uniform transmission rates among communities, namely βab = β, which suppresses spatial effects. In particular, this model leads to the natural variables va (see SI) to be uniform: va(t) = vUN(t). Surprisingly, carrying out the same fit to the reported cases (Figure 1a-b) was only marginally inferior to the fit carried out with the SafeGraph mobility data (see Figure S1b). By contrast, carrying out the same fit with the infectivity matrix derived from SafeGraph mobility data but with long-range interactions turned down resulted in a significantly different dynamics (see Figure S1c). There are several implications of this result. (1) Although changes in the structure of the infectivity matrix can lead to drastically different dynamics (Figures 3 and 5 and Figure S1c), it appears that the infectivity matrix derived from SafeGraph fluxes falls in the same “universality class” as the uniform model. This suggests that long-range movements (e.g. air traffic) played a prevalent role in the spread of SARS-CoV-2. (2) Our choice to reduce the complexity of the model to only one fitting parameter might not be adequate to discriminate between models falling into the same “universality class”. Instead of fitting only one scalar p(t) at each time point, we also investigated the possibility of fitting the N2 transmission rates βab(t) minimizing the errors with reported cases (see Material and Methods and Figure S4). Although this latter approach is clearly prone to overfitting, it shows that there exists a parametrization of the model which reproduces very closely reported cases. We anticipate that the proper number of fitting parameters lies in between those two extreme scenarios.
During the course of our research, an epidemiological study bearing similarities with our approach was published [55]. In that study, the authors developed a county-resolved metapopulation model describing the spread of SARS-CoV-2 in the USA, informed with mobility data from SafeGraph. Instead of calibrating the model by fitting the country-wide number of reported cases as in the present study, the authors fitted their model to reproduce reported cases of COVID-19 on a per-county basis. Furthermore, the model required fitting many time-dependent parameters on a per-county basis, including per-county transmission rates, making the fitting procedure very high dimensional. Finally, the dynamics of the disease spread introduced differs from equation (1) since S, I and R compartments were introduced for each commute channel i → j rather than for each community. Altogether, the complexity of that model makes it less amenable to analytical study.
In conclusion, we used a simple model of intercommunity spread of an infectious disease to show the transition between different regimes of epidemic progression. Because of the complexity of the infection process (e.g., variations in individuals’ responses, mutations, etc.) and of human behavior, we are still far from a global forecasting system able to predict the spread of different infectious agents throughout the world. As with weather forecasting, observables must be measured in real time in order to inform complex models to yield short-term forecasts. One salient feature in our approach is showing how such widespread measurements (namely, the mobility data) can be integrated in a model describing the spread of an infectious agent, and showing what types of predictions can be obtained. It is a step toward more predictive epidemiology models, grounded in measurable quantities.
Material and Methods
Data availability
Data and scripts used in this study are available at the GitHub repository github.com/czbiohub/epidemiology_flux_model. Except the raw mobility data, which belongs to SafeGraph.
Mobility data
The mobility data was obtained from SafeGraph, a company that aggregates anonymized location data from numerous applications in order to provide insights about physical places, via the SafeGraph Community. To enhance privacy, SafeGraph excludes census block group (CBG) information if fewer than two devices visited an establishment in a month from a given census block group. In this manuscript, we use data extracted from the “Social Distancing Metrics” dataset [36], with dates between February 2020 and February 2021. SafeGraph has stopped sharing mobility data under this format. However the matrices of fluxes between our coarse-grained communities can be found in the ‘data’ folder in our github repository. We obtained the coarse-grained communities by running a K-means clustering algorithm to group the 220, 333 CBGs into 210 communities. We used the implementation from Scikit Learn. We then ran a hierarchical clustering algorithm to re-index communities so that communities close in space had close indices, as shown in Figure 1. We used the linkage function from SciPy. We then computed for every day the “flux matrix” where each entry fab represents the number of cell phone whose residence CBG belongs to community a which visited a CBG belonging to community b. The average flux matrix was constructed by averaging all the daily flux matrices. We used population counts for CBGs in agreement with the United States Census Bureau’s and available in the SafeGraph Open Census Data (file “cbg_b01.csv”, column “B01001e1”). We checked that the population counts from the United States Census Bureau were approximately proportional to the residential mobile-phone counts, therefore validating mobile tracking as a proxy for actual population.
Reports on SARS-CoV-2 infections
In order to fit our model, we used USA cases of COVID-19 reported by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [37] which can be accessed at the GitHub repository CSSEGISandData/COVID-19. New cases of COVID-19 reported by the CSSE were mapped to the closest mobility-data-derived community based on their latitude and longitude. We therefore obtained the time evolution of SARS-CoV-2 infections through the mobility-data derived communities. Integer absolute new cases were converted into relative population fractions using the community population counts obtained from the United States Census Bureau.
Model fit
Daily scales
Here we consider the SIR dynamics (see Supplementary Information): We also define the chi-square at each time t: where Ma is the population at location a and (1 − ŝa)Ma is the number of reported cases at location a. Assuming that p(t − 1), p(t − 2), …, p(0) have been previously evaluated, the dynamics (sa, ja) is determined up to time t, and we have: To obtain the scale p(t), we solve for:
Simplified model
We look for a simplified model in which the scales have a functional form close to a ramp function: We first set θ by fitting this function to the daily scales: Then we adjust the slope of the ramp in order to minimize the error with reported cases. In particular, we set , and we solve: We thus define, and the model for simplified scales is:
Daily infectivity matrices
We also considered another approach, consisting of fitting the N2 transmission rates βab(t) at every time t. In particular, we solve: subject to the constraints βab ≥ 0. We show the results of this fit in Figure S4, however his approach is prone to overfitting since there is N2 fitting parameters and only N data point at each time t.
Simulations with nearest-neighbors-only interactions
The curves shown in Figure 4b-c and the symbols shown in Figure 4d were obtained by integrating equation (6).We used the function solve_ivp from SciPy with the “DOP853” integration method. At t = 0, we considered S = 1 everywhere except at the sites of coordinates (0, 2m−1 − 1) and (0, 2m−1) (see Figure 4a), where we set S = 0 and I = 1. The Laplacian was computed using the 9-point stencil . We considered periodic boundary conditions along the y direction and Dirichlet boundary conditions along the x direction.
Data Availability
All data produced are available online at https://github.com/czbiohub/epidemiology_flux_model and upon reasonable request to the authors.
Supplementary Information
1 Supplementary figures
2 Formal solution to the spatial SIR
2.1 Matrix SIR
We consider N communities affected by a growing epidemic. The spreading of the epidemic is described by the multi-compartment SIR model. Every community a is characterized by the number of susceptible individuals Sa, the infected individuals Ia and the number of recovered individuals Ra. The dynamics for Sa, Ia and Ra with a ∈ ⟦1, N⟧ is described by the equations We assume that the population in each community is fixed, i.e. The boundary conditions for equation (18) are: where the parameter ϵa describes the initial infection (the initial number of infections per per person in the given community).
We introduce the rescaled variables: sa = Sa/Ma, ja = Ia/Ma and ra = Ra/Ma. Then equations (18) and (20) become with
2.2 Solution of matrix SIR equations: the canonical basis
In order to solve equations (21) and (22), we define: Substituting equation (23) in equation (21), we solve for sa and obtain: We also note that ra (t) can be written as: where denotes the inverse matrix such that . We therefore obtain the parametrization: as a function of va (t). Note that when the infectivity matrix is diagonal, namely βab = βδab, we recover the parametrization from reference [4]: with u(t) = e−v(t).
The dynamics is determined by the functions va(t). Starting from equation (21), we multiply the equations by βab and sum, obtaining Equation (28) can be solved numerically. In the t → ∞ limit, we obtain: which is the same as setting ja = 0 in equation (26). Equation (29) can be solved iteratively. The size of the epidemic is given by:
2.3 Solution of matrix SIR equations: SVD basis
A solution can also be written using the singular value decomposition of the infectivity matrix β: For a given vector x, we use the following notation: . From equation (28) we thus obtain:
3 Initial stage of an epidemic
We again start from equation (21), which we linearize around the initial state , , and . To first order j and r satisfy the ODE: [βab] is a real square positive matrix. According to the Perron-Frobenius theorem, there exists the maximal eigenvalue λω > 0, such that any other eigenvalue λα < λω, and the associated eigenvector vω is positive. At large times, we have: where (A · B) denotes a scalar product. The epidemic grows only if the basic reproduction number . Furthermore, if the infectivity matrix can be factorized as βab = fagb, then β is of rank 1 and λω is the only non-zero eigenvalue. We then have and vω = f.
4 Construction of the infectivity matrix from mobility data
The mobility data obtained from SafeGraph allows us to construct a pseudo-flux matrix in which each entry fab represents the number of individuals from community a visiting community b per day. Let us now consider one community a, having Sa susceptible individuals and Ia infected individuals. The variation in susceptible individuals due to new infections during the time interval Δt has the form: where β Δt represents the probability to get infected when meeting an infected individual. We now list the different contributions.
4.1 Intra-community contributions
The contributions coming from infected individuals in the same community are: where we have introduced f*a = ∑b fab, and the parameter pa representing the frequency with which an individual is interacting with other individuals in community a. Here Ia/(f*a + Ma) is the probability to meet an infected individual when interacting with an individual in the community.
4.2 Inter-community contributions: incoming visitors
The contributions to the infections in community a from infected individuals visiting from another community b are: where we have assumed that fbaIb/Mb is the number of infected visitors from b.
4.3 Inter-community contributions: returning natives
The contributions coming from individuals from community a infected while visiting another community b are: where we have assumed that fabSa/Ma is the number of susceptible visitors from a visiting community b.
Neglecting the f∗a/Ma terms, we obtain after adding all contributions: with: Note that in general, the matrix [βab] is not symmetric. In the manuscript, we make the assumption that the interaction frequency is the same in all communities, namely pa = p.
5 Spreading with a wave of infection
5.1 ODE for the wave profile
We consider the SIR model on a 2d lattice, with infections limited to nearest-neighbors, and with uniform population Ma = M. For the sake of simplicity, we will consider the rescaled S, I, R variables with S +I + R = M = 1. We write: where i and j denote the indices along the first and second dimensions. We introduce the discrete laplacian: where l is the lattice spacing. Equation (41) becomes: where a = 4 + α/β. In this study, we will consider α = β, so a = 5. In the continuum we have: Limiting ourselves to solutions which only depend on one space variable, namely S(t, x, y) = S(t, x), and after rescaling the time variable, t ← t(aβ)−1, the space variable, x ← xla−1/2, and the recovery rate, γ ← (αβ)γ, we obtain: We are interested in the traveling front solutions, namely S(x, t) = g(x −vt) and I(x, t), = h(x − vt), where v is the velocity of the traveling waves. Here, g(z) and h(z) are the shape functions for the propagating front. Making those substituations in equation (44), we obtain the ODE:
5.2 Velocity selection
The solution to equation (45) specifies the shape functions g and h for a traveling front solution with velocity v. We now characterize what are the admissible velocities. We start by computing the jacobian of the function on the right-handside of equation (45): and its characteristic polynomial: The fixed points of the ODE in equation (45) are of the form (0, 0, g), with g ∈ [0, 1]. At those points, there is one zero eigenvalue, and two eigenvalues satisfying the equation: The fraction of susceptible individuals S(x,t), is a decreasing function of t. Thus it forbids any oscillatory behavior in the shape function g(z). Therefore the two eigenvalues must be real, which results in the condition: The function in the right-handside of the inequality is an increasing function of g. Furthermore, the fixed point (0, 0, 1) must belong to the front solution since it corresponds to the unstable initial condition with only susceptible individuals. Therefore, enforcing that all admissible fixed points have real eigenvalues yields the condition:
5.3 Upper bound on the velocity
Since S ≤ 1, by retracing the steps from equation (41) to equation (44) we have that: Let us define Ĩ, which is a solution to Starting from the same initial condition, I(0, x) = Ĩ(0, x), we must have at all times: The function ϕ(x) = (1 − γ)x satisfies the condition [41] ϕ′ (x) ≤ ϕ′(0) for all x ∈ [0, 1], therefore equation (52) falls into the Fisher-Kolmogorov-Petrovsky-Piscunov (FKPP) universality class [41–44]. In partic ular, for a step initial condition, Ĩ(t, x) evolves into a moving front with velocity .
Because of the inequality in equation (53), any moving front I must move at a velocity smaller than the velocity of the moving front Ĩ. Otherwise, the I front would eventually pass the Ĩ front, which violates equation (53). Therefore, the velocity of the front I must satisfy:
5.4 Pulled wave
Combining equations (50) and (54), we obtain that any moving front resulting from equation (44) moves at velocity . Therefore it falls in the FKPP universality class and it is a pulled wave.
We have solved numerically equation (44) for several values of γ, starting from the initial condition: S(0, x) = 1, I(0, x) = 0, ∀x > 0 and S(0, 0) = 0, I(0, 0) = 1. In Figure S5a, we show the position of the moving front as a function of time. The position of the moving front, m(t), was defined such that S(t, m(t)) = (1 + S∞)/2. Clearly, after a transient regime, the solution evolves into a moving front. We plotted the velocity v as a function of γ in Figure S5b, which agrees with the theoretical prediction .
5.5 Shape of the traveling front
Since a traveling front solution must travel at velocity vc, it follows that the shape is uniquely determined by the recovery rate γ. The shape functions g and h satisfy the ODE: The shape of the traveling front must be such that: We determine the residual fraction of susceptible individuals, S∞, using the parametrization in equation (27) with u = e−v, and solving: Since the last equation is solved numerically, it is useful to derive two bounds. First, the condition S∞ > 0 yields the condition v∞ < v∗∗ = 1/y. Second, the function e−x + γx − 1 is strictly negative in the interval]0, v∞[, with a minimum reached at v∗ = − ln γ. Thus we have v∗ < v∞ < v∗∗.
Following the stability analysis introduced hereabove, (0, 0, g) is an unstable fixed point of the trajectory (the jacobian has one positive eigenvalue) as long as g < γ Therefore, is a stable fixed point of the trajectory (the jacobian has only negative eigenvalues), whereas is an unstable fixed point. We can therefore solve the ODE in equation (55) with the initial condition X = Xi and let the trajectory converge toward X f. Wave shapes for several values of γ are shown in Figure S6.
6 Level statistics and connectedness
6.1 The level spacing distribution of the flux matrix
We consider the level statistics of the symmetric matrix Fab = (fab + fba)/(MaMb), where fab is the number of people residing in community a visiting community b during one day and a is the total population of site a, see equation (19). The symmetric flux is related to the infectivity matrix through βab = pβFabMb, where the factor pβ is estimated by a fit to the SafeGraph mobility data over K = 353 days from March 1st 2020 to February 16th 2021 and is given by pβ = 0.05 ± 0.02, as is shown in Figure 2b of the main text. Following common practices in the treatment of random matrices [45], the diagonal elements of the K matrices of size N×N were drawn from a generic gamma distribution with mean and variance . To obtain the unfolded spectra [45], the eigenvalues Eak are re-scaled as where G(E) is the empirical level staircase function G(E) ≡ (NK)−1 ∑ak Θ(E − Eak) (i.e. the cumulative eigenvalue probability function) and is its smoothed interpolation [52–54]. The nearest level spacings sak ≡ ϵa+1,k − ϵa,k are then normalized so that the average level spacing across the entire spectrum is equal to one for all values of k: Figure S7a shows the level spacing distribution of the unfolded spectrum, , where Ns is the number of independent spacings (for K matrices, Ns = K(N − 1)). The level statistics of the full matrix, Figure S7a, is significantly different than the level statistics obtained after all the links of distance larger than 170 km have been omitted (Figure S7b). Clearly, the empirical distribution p(s) interpolates between the Wigner-Dyson distribution of the GOE ensemble (more precisely, the Wigner surmise [45]) that exhibits linear level repulsion typical of extended correlated states: and the Poisson distribution describing independent localized states: In both equations (60) and (61), the average level spacing is normalized to unity: .
6.2 KL divergence and the level spacing entropy
To quantify the difference between Figure S7a and Figure S7b, we have calculated the Kullback-Leibler divergence between the empirical spacing distribution, p(s), and the Poisson distribution where D is the KL divergence and S is the entropy of p(s). The KL divergence of the full matrix is then D ≃ 0.11. The KL divergence of the truncated matrix, that shows significantly less level repulsion, and is closer to Poisson statistics, is smaller than that of the full matrix by a factor of three: D ≃ 0.04. The KL divergence between the GOE and the Poisson distributions is where γE = 0.57721 … is the Euler constant. Therefore, in terms relative to D0 we obtain: D/D0 = {0.39, 0.12} for the full and the truncated matrices respectively. Thus, D(SafeGraph||Poisson) is smaller than D0 and approaches zero as links are being removed. Equivalently, since (i) the entropy of the Poisson distribution is equal to its average ⟨s⟩ = 1 and (ii) the entropy of Wigner-Dyson distribution is 1 − D0 ≃ 0.72, the level spacing entropy varies in the range 1 − D0 < S(SafeGraph)≤1 and it approaches unity as links are removed and the states get localized.
The entropy as a function of the cutoff distance dc is shown in Figure S7d. The distribution p(s) is found by (i) removing all links of distance larger than dc (ii) unfolding the spectra of the ensemble of K truncated matrices and (iii) repeating the procedure for each value of dc. The corresponding size of the epidemic Ω(∞) and its error ΔΩ are estimated by solving equations (29) and (30) for each one of the truncated matrices and then calculating the mean and standard deviation over K samples. In the numerical computation we assumed a recovery rate γ = 0.125 d−1 and initial fraction of infections ϵ = 10−4 uniform for all the communities. The epidemic size Ω/Ω0 as a function of dc, relative to the value of the unrestricted epidemic size Ω0 (i.e. for the case of a full untruncated matrix), is shown in Figure S7c. Thus, a cutoff distance of dc ≃ 500 km leads to a 10% reduction in the epidemic size. Similarly, dc ≃ 200 km reduces the epidemic size by 25%. These results are in accordance with the time evolution of the epidemic as is presented in Figure 3b of the main text.
6.3 Connectedness and epidemic size
The crossover from Wigner-Dyson to Poisson distribution as links between communities are being successively removed, suggests a policy of isolation which can lead to an effective reduction of the epidemic size. Edges connecting vertices a and b can be removed in several ways. For example, (i) according to the geographical distance d(a, b) as has already been done in section 6.2, (ii) according to the “nominal” distance |a − b|, or (iii) by using the edge-betweenness centrality index, as proposed by Girvan and Newman [49]. Note, that in each one of these cases, one can as well consider a moderated mitigation policy: instead of eliminating links completely, one can impose constraints on the flux of people that are allowed to commute via central pre-determined links.
To study mitigation according to the nominal distance |a − b|, we consider the ensemble of banded matrices Fab with half-bandwidth , such that the number of non-zero diagonals is 2B + 1 irrespective of the underlying physical distance. The increase of Ω/Ω0 as a function of B, along with a corresponding decrease of the entropy S(SafeGraph), is shown in Figure 5d of the main text. Comparing to Figure 5c of the main text, a reduction of 10% in Ω/Ω0 amounts to entropy values of S = {0.94, 0.98} for the geographical and nominal cutoffs, respectively. One may argue that mitigation based on “nominal” distance is meaningless, because such a distance changes under the permutations of sites. Indeed, a certain link may either be kept as is, or be omitted, according to an arbitrary re-ordering of communities. However, performing such permutation only reshuffles the matrices of the random ensemble among themselves. Consequently, for K, N ≫ 1, the level statistics and the resulting epidemic size are hardly affected.
Centrality indices have been used for detecting the modular structure of social and biological networks [49, 50]. The edge-betweenness centrality of edge (a, b) is defined as [51]: , where (t, s) stands for “target” and “source” vertices, respectively. Here, σts are shortest paths going t ← s and σts (a, b) are such paths which, in addition, go along edge a, b. The entries σs are weighted by the flux that they can carry (otherwise, unweighted paths take only binary values σts = 0, 1). The summation is carried over all (t, s) pairs that are different from (a, b). As observed in [49], edges running between loosely-connected communities should have high values of C.
The scheme proposed in [49] for identifying communities in a network is as follows: (i) calculate C for all edges of the network (ii) remove the edge with highest C (iii) re-calculate C for all edges affected by the removal. (iv) Repeat from step (ii) until no edges remain. Here, we are using the same procedure for mitigating the epidemic. There are few practical modifications though:
Edge-betweenness is calculated for the mean matrix . The removal of links is then applied to each matrix separately.
For a faster computation of step (ii), edges are removed in chunks rather than one-by-one. This is done with the help of the efficient shareware package MatlabBGL that is able to rank all the edges of H in a single instance. The initial size of a chunk is 10000 (about 1 % of the total number of links in H) and it then decreases adaptively.
The computation terminates when the resulting KL divergence is sufficiently close to zero. Our modified procedure is clearly sub-optimal. We found, however, that compared to the exhaustive search (i)-(iii), the results almost don’t change because the variations of the KL divergence at the initial stages are relatively small.
The entropy S(SafeGraph) and the size of epidemic Ω/Ω0 as a function of the number of removed links, NC (i.e. number of cuts), are shown in Figure S8. The size of epidemic drops abruptly when the relative number of cuts C ≡ NC/N/(N − 1) exceeds the value C = 0.6 (from a relative size of 95 % at C = 0.6 to 75 % at C = 0.7). This drop is accompanied by the increase of entropy around the same value of C. Thus, mitigation based on edge-betweenness performs better than either the distance or the nominal cutoffs, none of which exhibits a similar abrupt crossover. The betweenness index outperforms the other two methods mainly because it is sensitive to the flux that is flowing through links. The “transition” at c = 0.6 is expected to become much sharper as N ≫ 1 approaches the thermodynamic limit. The evolution of p(s) as links are gradually being removed is demonstrated in movies S5-S7.
Acknowledgements
The authors acknowledge the generous support of the Chan Zuckerberg Initiative and Chan Zuckerberg Biohub. The authors also thank Mark Rychnovsky for useful conversations on random matrix theory in the early stages of this project, and Naomi Rankin for her help with a preliminary version of the lattice simulations. D.Y. acknowledges support by MINECO (Spain) through Grant No. PGC2018-094684-B-C21, partially funded by the European Regional Development Fund (FEDER). We also are grateful to the anonymous reviewers of Scientific reports who asked deep questions, that helped us to better understand our results.
Footnotes
Revisions include an evaluation of the significance of the mobility-derived infectivity matrix and a more thorough understanding of the traveling waves arising when a pathogen spreads on a spatial network through nearest-neighbors interactions.