Uncovering heterogeneous inter-community disease transmission from neutral allele frequency time series ======================================================================================================= * Takashi Okada * Giulio Isacchini * QinQin Yu * Oskar Hallatschek ## Abstract The COVID-19 pandemic has underscored the critical need for accurate epidemic forecasting to predict pathogen spread and evolution, anticipate healthcare challenges, and evaluate intervention strategies. The reliability of these forecasts hinges on detailed knowledge of disease transmission across different population segments, which may be inferred from within-community transmission rates via proxy data, such as contact surveys and mobility data. However, these approaches are indirect, making it difficult to accurately estimate rare transmissions between socially or geographically distant communities. We show that the steep ramp up of genome sequencing surveillance during the pandemic can be leveraged to *directly* identify transmission patterns between communities. Specifically, our approach uses a hidden Markov model to infer the fraction of infections a community imports from other communities based on how rapidly the allele frequencies in the focal community converge to those in the donor communities. Applying this method to SARS-CoV-2 sequencing data from England and the U.S., we uncover networks of inter-community disease transmission that, while broadly reflecting geographical relationships, also expose epidemiologically significant long-range interactions. We provide evidence that transmission between regions can substantially change between waves of variants of concern, both in magnitude and direction, and analyze how the inferred plasticity and heterogeneity in inter-community transmission impact evolutionary forecasts. Overall, our study high-lights population genomic time series data as a crucial record of epidemiological interactions, which can be deciphered using tree-free inference methods. ## Introduction Despite extensive efforts to model epidemiological dynamics, particularly during the COVID-19 pandemic, accurately predicting epidemic trajectories has proven difficult for populations with heterogeneous transmissibility patterns (1–5). Heterogeneities arise from many factors, including varying population densities, mobility patterns, immunity levels, behaviors, and non-pharmaceutical interventions. While the ensuing transmission rate variations are difficult to predict, ignoring them reduces the applicability of model-based forecasts and may result in misguided interventions (6–9). In principle, metapopulation models can account for known heterogeneities in the host population by dividing it into suitably many subpopulations, which are distinguished by their epidemiological characteristics. A crucial input to these models is a matrix of parameters that represents the rates at which infections are transmitted between subpopulations. Referred to as infectivity matrix (10–12), it encodes how individuals in different population groups acquire infections from other groups. Metapopulation models can be used to predict the impact of heterogeneities on the disease spreading and evolution. By perturbing the transmission parameters, they also allow for the exploration of group-specific non-pharmaceutical interventions, immunity or behavioral changes. As the number *n* of subpopulations grows, estimating the *n*2 parameters of the infectivity matrix becomes increasingly challenging. Valuable clues about the transmission rates have been gleaned from measuring how often members of different groups come in contact. For example, the real-time tracking of cell phones enables estimating the mobility flux between different regions (13–15) and surveys can be used to infer the mixing between different age groups (16, 17). However, converting these fluxes into transmission rates requires additional assumptions about the infection dynamics during contact. For example, how precisely variations in mask-wearing or immunity levels lower transmission rates is difficult to measure directly (18, 19), thus, requiring estimation methods. Furthermore, differences in local interventions and individual behaviors can weaken the relationship between mobility metrics and transmission rates, thereby reducing their predictive power (20). To transcend these limitations, a direct data-driven approach to infer heterogeneous disease transmission rates is needed. Ground truths would be valuable even retrospectively, as they could be used to falsify transmission rates obtained from indirect methods and, more broadly, to develop improved evidence-based forecasting of epidemic spread and selective sweeps of new variants. Lastly, knowledge of how infections cross population boundaries can also inform phylogenetic approaches to embed genealogies of past outbreaks into geographical space, which are usually based on the assumption that lineages follow unbiased random walks (21, 22). We argue that the steep ramp up of the surveillance of virus sequence variants during the COVID-19 pandemic offers unprecedented opportunities to use population genetic tools to obtain a direct view of the underlying metapopulation transmission network. Numerous studies have already demonstrated that the related effort of molecular source attribution (23–25) substantially gains in precision by the abundance of data. For example, embedding the phylogenetic tree of the sampled viruses within the geographical landscape of England has allowed for the reconstruction of detailed spatio-temporal infection processes for different variants of con-cern (26, 27). Nonetheless, inferring actual infection processes necessitates making significant assumptions about the dispersal of lineages (21, 22). Furthermore, phylogenetic methods require constructing genealogical trees, which is computationally challenging for large datasets like the complete set of sequenced SARS-CoV-2 samples. Our objective is to create a computationally efficient, tree-free approach to infer infection matrices directly from neutral allele frequency time series. We show that, with adequate data, it is possible to map entire networks of disease transmission between communities. By analyzing genomic SARS-CoV-2 data from England, obtained from the COVID-19 Genomics UK Consortium (COG-UK) (28), and data from the USA obtained from GISAID (29), we highlight differences across variants of concern, examine the statistical characteristics of the resulting transmission networks, explore the significance of long- and short-range connections, and assess their impact on the spread of new variants. ## Results ### The Basic Idea We can illustrate the core of our method by examining how the importation of cases affects allele frequency differences between communities. Imagine two populations, A and B, that are initially epidemiologically isolated due to a complete traffic lockdown, which is later lifted. The population genetic simulations in Fig. 1A show that, during the isolation period, allele frequencies fluctuate independently in each population since there is no interaction between both populations. However, once the lockdown ends, both communities begin exchanging infections, causing their allele frequencies to gradually converge. The higher the transmission rate between the two populations, the faster this frequency alignment occurs. Our method leverages this phenomenon, computationally linking the convergence of allele frequencies to inter-community transmission. ![Fig. 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F1.medium.gif) [Fig. 1.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F1) Fig. 1. Inter-community transmission promotes allele frequency convergence. (**A**) Simulated dynamics of a neutral variant to illustrate the effect of inter-community disease transmission between two communities, A and B. Ten simulated trajectories are shown with their average as a dashed line. When the two communities are isolated by a traffic lockdown, their allele frequencies fluctuate independently due to genetic drift acting independently in both populations. After the lockdown is lifted (dotted vertical line), the allele frequencies tend to converge due to the exchange of infections between the communities. The rate of convergence reflects the rate of case importations. (**B**) Time series of allele frequencies for two prevalent SARS-CoV-2 alleles (top and bottom graphs), exemplifying allele frequency convergence during the Delta wave in England. Each color corresponds to one of the nine regions of England shown in (**C**). (**D**) Averaged across approximately independent alleles in our data set, the rate of convergence substantially differs for different region pairs. The markers show the autocorrelation function *R**ij* (*τ*) of the allele frequency differences between regions *i* and *j*. We focus on the three region pairs indicated by arrows in Fig. **C**: London and North East (blue); London and Yorkshire and the Humber (green); and London and South East (red) (see SI Sec. S.1 for the procedure for selecting the alleles over which we have averaged and for the mathematical definition of *R**ij* (*τ*)). The solid lines represent exponential fits. (**E**) The relative abundance of the top 20 Delta variant lineages over time in North East, Yorkshire and The Humber, London, and South East. Decay rates similar to those in Fig. D are obtained from lineage-frequency data (Fig. S5). Before delving into our method, it is useful to first verify that the expected alignment of frequency trajectories is indeed observable in SARS-CoV-2 data. Focusing for clarity on just two alleles of the Delta variant, Fig. 1B demonstrates that allele frequencies in different regions in England tend to converge over time. Moreover, Fig. 1D demonstrates that, on average, allele frequency mismatches between nearby regions, such as London and the South East, diminish faster than those between distant regions, like London and the North East, in line with an isolation-by-distance expectation (32) (see SI Fig. S5 for additional comparisons between region pairs). The Muller plots in Fig. 1E illustrate how the sample frequencies of the top 20 Delta variant lineages in the COG-UK phylogenetic tree fluctuated across regions in England during the 2021 Delta wave (SI Sec. S.1). ### Overview of the inference approach To mathematize the above idea, we resort to the principles by which lineage frequencies evolve under neutrality. Consider a population composed of *n* sub-populations, distinguished by location (different cities or districts), age, ethnicity, or any other feature that might influence the epidemic characteristics of its members. Under neutrality, we can assume that, up to random fluctuations, the frequency *X**i*(*t*) of a particular lineage in population *i* at time *t* depends linearly on the lineage frequencies {*X**j*(*τ*)}*j*=1…*n* at some earlier time *τ < t*, ![Formula][1] where **A** is a right-stochastic *n* × *n* matrix: its eleI:ments are non-negative and, within each row, sum up to one, ![Graphic][2]. For now, we do not need to know anything about the noise term, except that its expectation vanishes (See Sec. 1.2 for a derivation of Eq. 1). The coefficient **A***ij* represents the proportion of infec-tions that population *i* imports from population *j*, thus capturing cross-infection rates between regions. Therefore, we refer to **A** as the *importation-rate matrix*. Yet, an equally important dual interpretation of the matrix **A** arises when one tries to model *backward* processes, where one starts from a given sampled genome and follows its lineage of ancestors backward in time. For this process, **A***ij*(*t*; *τ*) describes the probability that the lineage jumps from population *i* to popu-lation *j* as time is run backward from *t* to *τ*. This backward-in-time interpretation is needed, for example, when one tries to embed phylogenetic trees within a metapopulation, because **A***ij*(*t*; *τ*) provides the probabilistic weight for assigning the branch of a phylogenetic tree to a transition from population *i* to *j*. We will frequently return to the interpretation of the rows of **A** as backward transition probabilities, as it helps develop intuition and hypotheses for the structure of **A**. For example, the constraint Σ*j***A***ij* = 1 is obviously required for the interpretation of the rows of **A** as probability distributions. The most straightforward way to estimate importation rates, **A***ij*, using the model described above, is to minimize the least squares difference between predicted and observed lineage frequencies over all right stochastic matrices. This method is most effective when the true lineage frequencies are known; however, in practice, these frequencies are derived from a random sample of infected individuals. As a result, the observed frequencies are affected by sampling noise, introducing biases into the least squares estimation, as demonstrated in SI Sec. S.2.4. To avoid sampling biases, we employ a hidden Markov model (HMM, sketched in Fig. 2A). The HMM analyzes the entire trajectory of the time series, treating true frequencies as hidden states and incorporating genetic drift and sampling error as Gaussian noise processes. Posterior distributions of the importation rates and the noise strengths are inferred using a Markov Chain Monte Carlo (MCMC) method. To speed up training, we have also implemented an Expectation-Maximization (EM) algorithm. For tracking neutral lineages, mutations are used as a tree-free alternative, with clustering techniques employed to ensure neutrality and reduce statistical errors (33). Further details about the inference method and the data processing are reported in Sec. 1.1–1.3 and SI Sec. S.2. ![Fig. 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F2.medium.gif) [Fig. 2.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F2) Fig. 2. Inference method overview. We use a hidden Markov model (HMM) with continuous hidden and observed states (a Kalman filter (30)) to infer transmission networks from allele frequency time series. (**A**) A schematic of the HMM for the frequency dynamics. (**B**) To demonstrate the utility of our approach, we here use the depicted 3-deme model to generate allele frequency time series as input for the inference of the importation rates (arrows). In these simulations, the frequencies are evolved according to Eq. 5, with their initial frequencies being the same as those observed in EE, LDN, and SE of England during the week of June 20-26, 2021. We used effective population sizes comparable to those measured for England regions during the Delta wave period (31). The measurement noise overdispersion parameter (defined in the Methods section) is set higher than the value actually inferred, for illustrative purposes. (**C**) The input for the inference comprises weekly sampled sequences (left) and observed frequencies of lineages (right) within a focal variant. (**D**) The output is posterior distributions of the 6 importation rates of the 3 *×* 3 network (left), of the local inferred population size (middle), and of the deviation from uniform sampling (right). Each plot has a vertical dashed line, indicating the true value of the considered observable. In Figs. 2B-D, we evaluate the effectiveness of our method to infer a known importation-rate matrix from synthetic data simulated under a metapopulation model of three demes interacting through a 3 × 3 matrix. The simulation duration (7 weeks), number of lineages, and effective population size were chosen to reflect the conditions during the Delta plateau period (Aug 2021 – Dec 2021) in England (31). ### Application to SARS-CoV-2 in England To apply our method to real-world data, we first focus on England due to its large number of sequenced SARS-CoV-2 cases since the early stages of the pandemic (SI Fig. S25). The fall of 2021 constitutes a particularly suitable test case because of its consistently high and relatively stable incident numbers over more than four months (SI Fig. S26). We initially concentrate on this plateau phase of the Delta wave because it offers long allele frequency time series data with relatively low statistical error. In later sections, we also apply our method to the Alpha and Omicron waves in England, as well as the Delta wave in the USA. Following our observations of regional allele frequency fluctuations in England (Figs. 1C-E), we further subdivided each region to create a total of 50 subunits, which we call demes, to enhance the spatial resolution of our interaction networks. Our subdivision algorithm ensures that each deme contributes a roughly similar number of sequences (see SI Sec. S.9.1). In SI Sec. S.4, we provide evidence that a finer spatial resolution leads to less reliable results, given the data we have. Since sampling and incidence reporting typically follow a weekly cycle, we run our inference with a time step of one week (Δ*t* = 1). This time step aligns well with the generation time of SARS-CoV-2, which has an average infectious period of approximately 4–6 days depending on the variant (34, 35). ### Transmission networks mirror geography After inferring the importation-rate matrix **A**, we performed hierarchical clustering to order the 50 demes by the similarity of their transmission networks, as quantified using the Jensen-Shannon divergence between their respective rows (SI Sec. S.7). Remarkably, the resulting matrix exhibits a pronounced block structure (Fig. 3A) with different blocks corresponding to geographically well-connected regions, such as the demes within London or near Liverpool (NE-1) and Manchester (NE-2). When we plot the major infection paths (the largest 5% of the off-diagonal matrix elements) on a map of England (Fig. 3B), a network of predominantly local connections becomes apparent. This aligns with a general decline of importation rates with distance and larger importation rates within than between regions (Fig. 3C). ![Fig. 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F3.medium.gif) [Fig. 3.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F3) Fig. 3. Network inference reveals heterogeneity, distance dependence and stochasticity of disease transmission (England, Delta wave period, from Jun 20 to Sep 25, 2021). Heat map of the matrix **A**. The matrix coefficient **A***ij* represents the proportion of infections that population *i* imports from population *j*, thus encoding cross-infection rates between regions. Hierarchical clustering was used to order the 50 demes according to the similarity between rows of the matrix, as quantified by the Jensen-Shannon divergence (see SI Sec. S.6). (**B**) Illustration of the main infection pathways (the largest 5% of the off-diagonal matrix elements). (**C**) Inferred probability distribution of jump distances: A jump from deme *j* to deme *i* is assumed to occur with probability *∝* **A***ij* *N**i**/N**j*, where *N**i* represents the population size of deme *i* (see SI Sec. S.6 for the underlying rationale). Inset: Histogram of matrix elements within and across regions. (**D**) Multidimensional scaling reveals an approximately geographic arrangement of populations on a two-dimensional plane. MDS is a procedure that projects high-dimensional data on a plane so as to maintain pair-wise distances as closely as possible. Here, pairwise distances are taken to be the square roots of Jensen-Shannon divergence between the rows of the matrix presented in Fig. **A**. In the right-side plot, the square root of Jensen-Shannon distance is compared with the physical distance (the Spearman correlation = 0.74). To further explore how well epidemiological interactions mirror geographic relationships, we sought a two-dimensional representation of the entire network of importation rates. To this end, we performed a multidimensional scaling (MDS) analysis (36) to embed the locations of all the demes in a plane such that the in-plane distance between any two locations measures how different their vectors of importation rates are. The result, properly rotated, crudely resembles the map of England (Fig. 3D). Thus, frequency fluctuations encode geographic relationships even beyond the pairwise level (see SI Sec. S.7 for details of the MDS and further analyses). ### Transmission networks are asymmetric and evolve The planar MDS representation only provides a time-averaged picture and also ignores any asymmetry in the epidemiological interactions among populations. To investigate the dynamics and asymmetry of infection, we focus on London (LDN) and its neighboring regions East of England (EE) and South East (SE). Applying our method to the Alpha, Delta, and Omicron waves results in the inter-community transmission rate matrices illustrated in Figs. 4A-C. Asymmetry is evident in all cases, with notable changes between waves. We find that the relative influence of different regions shifted across waves. During the Alpha and Delta waves, London generally had a stronger impact on South East and East of England than vice versa. Interestingly, within waves, we detect only relatively modest shifts (Fig. 4D), such as the growing influence of London on South East and East of England from July to October during the Delta wave. ![Fig. 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F4.medium.gif) [Fig. 4.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F4) Fig. 4. Time-dependence of transmission networks. (**A**-**C**) The 3×3 importation-rate matrix between London and its two neighboring regions, East of England and South of England, is inferred using the HMM-MCMC method. Arrows represent the mean importation rates for the Alpha, Delta, and Omicron variants. (**D**) Time series showing the ratio of importation rates between each pair of regions, with shaded areas indicating the upper and lower quartiles. At each timepoint, the importation-rate matrix is inferred using a time window of 7 weeks centered around that timepoint. ### Inference indicates pronounced genetic drift and uniform sampling noise Our HMM framework not only estimates importation rates but also quantifies the strength of genetic drift via the effective population size, *N*e,*i*, and assesses the impact of sampling noise through the overdispersion parameter, *c**i*. We compared the inferred effective population size with the number of infected individuals estimated by the COVID-19 Infection Survey (37). While there is a strong correlation between the two, the inferred effective population size was consistently lower by factors ranging from 18 to 29 across regions (Fig. 5). This discrepancy between effective and infected population sizes, which aligns with earlier studies aggregating over all England (31), may be explained by factors such as superspreading events and community structure (31, 38–43), as well as potential mutational fitness effects, although significantly non-neutral alleles were removed prior to the inference. In contrast, the overdispersion parameter *c**i* was close to one (Fig. S18), indicating that sampling error is uniform and of expected magnitude in the COVID-19 Infection Survey. ![Fig. 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F5.medium.gif) [Fig. 5.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F5) Fig. 5. Stochasticity of disease transmission, quantified by the effective population size. For the Delta wave in England, the inferred effective population size at the region level (x-axis) is compared to the estimated number of infected individuals (y-axis). The Pearson’s correlation is 0.73 (p-value = 0.026). The error bars represent the 95% confidence interval (CI), and the solid line indicates the best linear fit. ### Comparison with the USA and mobility proxies For comparison, we also applied our method to the USA, focusing on the Delta wave (Jul 18 – Oct 30, 2021). We divided the USA into 30 subunits using the same subdivision algorithm employed in the analysis of England (SI Sec. S.9.2). Fig. 6A illustrates major importation pathways. Similar to England, we found that inter-community importation rates in the US mirror geographic relationships (Fig. S24). ![Fig. 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F6.medium.gif) [Fig. 6.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F6) Fig. 6. Transmission network in the USA during the Delta wave (30 demes, Jul 18, 2021 – Oct 30, 2021) and comparison to mobility data. (**A**) Illustration of the main infection pathways. Arrows indicate the largest 5% of the off-diagonal matrix elements **A***ij*. See Fig. S23 for the inferred matrix. (**B**) Comparison between the jump kernel inferred from the 30 *×* 30 importation-rate matrix (blue) and indirectly estimated using the SafeGraph data (orange). The US data also allow us to compare the jump kernel with what would be expected based solely on mobility data collected by SafeGraph during the period of January to May 2020 (44). SafeGraph gathers data from a large panel of anonymous mobile devices in the US, recording an average “home” location over six weeks and all locations where the device pauses for at least one minute. To convert these data to the lineage jump probability, we assume that a lineage jumps from deme *i* to deme *j* with a probability proportional to ![Graphic][3], where *F**i*←*j* is the rate of trips from deme *j* to *i*, and *N**j* is the population size of source deme *j* (see SI Sec. S.1.4 for details). Our analysis shows that the jump kernel directly inferred from the sequencing data decays substantially slower with distance than predicted by the mobility proxy data (Fig. 6B). This suggests that deducing epidemiological interactions from mobility data may require a different model or different information than the aggregated cell phone movements provided by SafeGraph. ### Eigenvalue decomposition Given the potentially large number of inferred importation rates—2,500 in the case of England—it is important to determine which aspects of the transmission network are truly relevant for predicting epidemiological trajectories. Intuitively, what is relevant should depend on the timescale of interest. For instance, if we are concerned with the dynamics from one viral generation to the next, we need a detailed transmission network that captures the strongest importation rates, such as those within a city. Conversely, if we are interested in the dynamics over several weeks, the fine structure is less critical. After a brief relaxation period, the different parts of a city with strong intra-city connections will fluctuate coherently, allowing cities to be effectively represented as single compartments with weaker inter-city links. Our dynamical model of neutral frequencies enables us to formalize the intuitive concept of timescale-dependent collective variations. First, note from Eq. 1 that our best prediction for the allele frequencies at future time *t* is given by the *t*th power of the matrix **A** applied to the current allele frequencies, ![Formula][4] To compute the matrix power on the right-hand side, it is natural to decompose the matrix **A** in terms of its eigenvalues, upon which we get ![Formula][5] where ***v****µ* is the right eigenvector of **A** corresponding to the eigenvalue *λ**µ*. The mode amplitudes *a**µ* are determined by the inner product ⟨***u****µ* |***X***(0)⟩ between the *µ*th left eigenvector ***u****µ* and the initial frequency vector ***X***(0). We assume that the eigenvalues are arranged in descending order of magnitude and that the eigenvectors are normalized such that ![Graphic][6]. We expect the dynamics to mix all demes over long times, unless there are isolated pockets, leading to an eventual equalization of the expected allele frequencies across demes. This outcome is guaranteed (i) if **A** has an eigenvalue *λ* = 1 corresponding to a right eigenvector of constant frequency *X**i* = 1 and (ii) if the magnitudes of all other eigenvalues are less than 1, |*λ**µ>*| *<* 1. These conditions are mathematically ensured by the right-stochasticity of **A**, provided that it is also irreducible. ### Relaxation dynamics Before discussing the long-term stationary state, we first address the relaxation toward station-arity. The amplitude of mode *µ* decays e-fold after a time *τ**µ* ≡ −ln−1(|*λ**µ*|), which diverges as |*λ**µ*| approaches unity. Generally, many eigenmodes influence the relaxation pro-cess, as shown in Fig. 7A for the Delta wave in England. However, the total time to reach the steady state is controlled by the longest relaxation time *τ*1. ![Fig. 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F7.medium.gif) [Fig. 7.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F7) Fig. 7. Relaxation dynamics and reproductive values. (**A**) The absolute values of the left eigenvectors ***v****n* of the 50 *×* 50 matrix during the Delta wave in England, presented in Fig. 3A. Each eigenvector ***v****n* is normalized such that Σ*i* | (***v****n*)*i*|= 1. (**B**) The longest epidemiological relaxation times [week] is calculated for each major variant in England from the regional importation-rate matrix. For comparison, we also show the relaxation times obtained from a modified matrix where transmission between non-neighboring regions is excluded. (**C**) The per-capita reproductive value for the Alpha and Delta variant in England. Here, the reproductive value*π**i* is evaluated by inferring the importation-rate matrix **A** at the region level, where NE, which has fewer sequences, is combined with YH to obtain reliable results. The values of *π**i* moderately depend on the choice of time intervals used for the inference (see Fig. S19). In Fig. 7B, we display *τ*1 for the Alpha, Delta, and Omi-cron waves in England. The relaxation times differ significantly between waves; for instance, the relaxation time during the Delta wave is approximately 10 weeks, which is less than half of the relaxation time during the Alpha wave. What features of the transmission network determine the overall relaxation time? The strongest interactions, typically short-range between neighboring demes, are certainly significant. However, we also identified relatively weak long-distance connections, whose impact is less clear. To assess their effect, we artificially removed all inferred long-distance connections (those spanning non-neighboring regions) and reevaluated the relaxation spectrum. We found that eliminating these long-distance interactions significantly increased *τ*1, by more than fourfold during the Delta wave, for example. This finding suggests that even very infrequent longdistance connections must be considered to accurately predict the speed of a pandemic. ### Equilibration and reproductive value To illuminate the long-term stationary state, let us consider a neutral allele initially fixed in deme *i* and absent in all other demes. Due to case importations, allele frequencies are expected to gradually even out across demes. Consequently, the long-term expected frequency, *π**i*, is the same in all demes and depends only on the identity *i* of the initial deme. The vector ***π*** = (*π*1, …, *π**n*)⊤, composed of these primary-deme-dependent long-term frequencies, is proportional to the leading left eigenvector previously denoted by ***u******* and normalized to sum to 1, Σ*i* *π**i* = 1. *π**i* represents the expected contribution of deme *i* to the future gene pool, which is called the class reproductive value (45, 46). When we normalize the reproductive value of a deme *i* by its number *I**i* of infected individuals, as measured by the COVID-19 Infection Survey (37), we obtain the per-capita reproductive value *π**i**/I**i*, which is the probability that a viral genome picked at random from an infected individual in the distant future traces its ancestry back to this individual in the present generation (45, 46). If the infection dynamics were the same everywhere in England, one would expect individual reproductive values to be the same across all regions. Interestingly, we find that per-capita reproductive values vary by about a factor of 2 across England in both the Alpha and the Delta wave, see Fig. 7C. In the same data set, we find that ratios of reproductive values approximately predict ratios of case importation rates (Sec. 1.4 and SI Sec. S.3), ![Formula][7] This relation with an equal sign is called detailed balance. Under most epidemiological models, this condition is expected because otherwise the lineage jump process exhibits unwarranted cyclic dynamics in equilibrium. Thus, finding detailed balance is reassuring of basic assumptions in epidemiological modeling. Additionally, detailed balance, Eq. 2, helps relate heterogeneities in reproductive value to heterogeneities in transmission coefficients. For example, a lower per-capita reproductive value for East of England (EE) is observed both during the Alpha and Delta waves in England (Fig. 7C right), suggesting that the per-capita importation rate from EE to any other region is systematically lower than the other way around. In SI Sec. S.3, we further investigate these importation-rate patterns, demonstrating that the heterogeneities suggested by the reproductive values can be confirmed directly from the importation-rate matrix. We also argue that possible variations in reporting timing could affect heterogeneous patterns of *π**i*, highlighting the importance of accurate reporting dates for precise inference (Fig. S11). ### Selected variants We have previously emphasized that inferring **A***ij* sheds light on case importation patterns, their variations across time and space, their potential for control, and their role in embedding phylogenetic trees. But knowing the **A***ij* of a variant is useful also because it enables predicting its behavior if it is under selection. In the SI Sec. S.5, we show how selective forces modify the allele frequency dynamics in an SIR or SEIR model. In particular, when a variant under positive selection is rare, its dynamics follows the neutral dynamics in Eq. 5, with the modification **A***ii* → **A***ii* + *σ*, where *σ* is the selected advantage of the considered variant. Thus, as long as the variant is rare, its expected frequency is computed as: ![Formula][8] The kymograph in Fig. 8A illustrates the spreading pattern of the Delta variant across England. We conducted simulations of the Delta variant’s frequency dynamics (detailed in SI Sec. S.8), starting from frequencies observed in the week of March 7-13, 2021, when Delta variant sequences were reported solely from the deme labeled “London-0.” Fig. 8B displays the simulated abundance of the Delta variant using the inferred **A***ij*, presented in Fig. 3A, along with fit parameters associated with the relative infectivity of the Delta variant (SI Sec. S.8). To assess the impact of long-range transmission, we artificially removed the inferred connections between non-neighboring regions and then conducted the same simulation (Fig. 8C). ![Fig. 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F8.medium.gif) [Fig. 8.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F8) Fig. 8. Replaying the dynamics of a selective sweep. (**A**) Observed frequencies of the Delta variant in England, ranging between 10−3and 1 − 10−3, across *N**D* = 50 locations in England (movies in SI). The heatmap displays the frequencies on a logit scale. On the vertical axis, epidemiological weeks (epiweeks) in 2021 are indicated below the dates by [*·*]. (**B**) Simulations following the non-neutral transmission model described in SI Sec. S.8, with inter-community importation rates as inferred for the Delta wave (shown in Fig. 3) (movie in SI). The observed frequencies at epiweek 10, when the Delta variant was reported only in deme “London-0”, are used as the initial condition (see SI Sec. S.8 for the fitting process). (**C**) Simulated frequencies, where transmission between non-neighboring regions (such as London and North West) is switched off (movie in SI). (**D**) Comparison of the midpoint epiweek ![Graphic][9]—the week when the frequency in a deme reaches 0.5—between observed data and simulated data for each deme (see SI Sec. S.8 for the determination of *t* ![Graphic][10]). Orange circles represent ![Graphic][11] from the simulation with the inferred **A***ij*, while blue stars represent ![Graphic][12] from the simulation without transmission between non-neighboring regions. The horizontal bars indicate the standard deviations obtained from the bootstrapping method. Figure 8D shows the comparison of the midpoint time, *t*1*/*2, at which the abundance of the Delta variant reached 50% in a deme, between the observed and simulated data. The simulation using the inferred matrix roughly reproduces observed *t*1*/*2, clustering around epiweeks 18-21. In contrast, without transmission between non-neighboring regions, the simulation show delays in *t*1*/*2. In particular, for the NE, NW, and YH demes, which are distant from the initial seeding at “London-0,” the *t*1*/*2 values are predicted to be delayed by 2-6 weeks without the non-neighboring transmission. This highlights the significant impact of rate, long-range transmission on the spread of the new variant. ## Discussion The infection rates between individuals from different population segments are among the many known unknowns of a pandemic. These rates, crucial for forecasting pathogenic spread, depend on complex behavioral patterns, immunity levels, and non-pharmaceutical interventions, rendering them exceptionally challenging to measure directly. We here proposed a novel approach to infer these elusive rates retrospectively by examining the dynamics of viral ge-netic variation among different population groups. Utilizing this method, we constructed transmission networks during the COVID-19 pandemic in England and the US. Our findings shed light on the heterogeneity and plasticity of transmission networks, which have implications for epidemiological forecasting and intervention strategies. ### Methodological Advances Our approach hinges on the premise that the observed frequencies of neutral variants are influenced solely by inter-community transmission, random genetic drift, and observational noise. By averaging out these noises, we can isolate the impact of cross-transmission, which tends to reduce regional differences in allele frequencies. This convergence towards a common value is evident in the allele frequency data of SARS-CoV-2 (Fig. 1) and demonstrates a plausible distance dependence. To optimally capture this signal, we have implemented a Kalman filter, which enables us to infer importation rates, local effective population sizes, and the strength of observational noise. A key advantage of this method is that it only relies on allele frequency time series, bypassing the need to construct a phylogenetic tree, which can be computationally demanding, especially when dealing with polytomies. We validated our method using population genetics simulations of a metapopulation with parameters consistent with those of SARS-CoV-2. ### Epidemiological Insights Our key finding is that, applied to highly sequenced SARS-CoV-2 populations, this method allows us to map an entire interaction network between different populations. Unlike methods based on proxies, this approach quantifies direct epidemiological interactions that can be integrated into models of the disease spread. Our findings reveal substantial heterogeneity and plasticity in disease transmission networks. By applying our method to the SARS-CoV-2 transmission data in England and the U.S., we uncovered several key patterns: ### Geographical Mirroring and Long-Range Interactions The inferred transmission networks largely reflect geographical proximities. Cross-importation rates are strongest between neighboring regions and gradually weaken with increasing physical distance. Interestingly, when interaction strength is converted into a distance and a map is drawn based on this metric, it roughly represents the geographic layout of England. This suggests that interactions reflect geographic relationships. The observed gradual weakening of interactions with distance likely reflects the limited traffic flow between distant regions. However, we did not find quantitative agreement between the distance decay of US traffic flows as estimated by SafeGraph and cross-importation rates. We see two potential reasons for the discrepancy: (i) SafeGraph data treats all visited locations equally, regardless of the duration of the visit, which could lead to an underestimation of the impact of long-distance visits. (ii) Long-distance trips may be associated with riskier behaviors, thereby increasing the likelihood of disease transmission. Therefore, the reduction in mobility flux with distance may be partially offset by an increase in infections caused by long-distance travelers. To separate these contributions, differently curated mobility data will be required. ### Dynamic Changes in Transmission Patterns The crosscommunity importation rates and their directionalities exhibit considerable variation across different waves of variants of concern. These dynamics were evident in the varying importation rates between London, East of England, and South East during different waves, with noticeable shifts in transmission dominance between these regions. Such dynamic changes emphasize the need for adaptive modeling approaches that can accommodate evolving transmission patterns over time. ### Asymmetry in Cross-Importation Rates Our analysis showed that cross-importation rates are often heterogeneous, indicating that certain regions exert a stronger influence on others. In general, epidemiological “rock-paper-scissor” interactions could exist between different regions. However, we found that such non-transitive interactions are not observed in our inferred importation rates, which approximately satisfy a certain symmetry property (detailed balance) that ensures that the lineage dynamics backward is time reversible (no cyclic fluxes). This also allows using per-capita reproductive values (Fig. 7C) to compare the infectivity across regions. ### Implications for Epidemic Forecasting and Future Directions The ability to directly infer detailed transmission networks has the potential to improve epidemic forecasting. We found that weak long-range interactions are crucial for explaining the spreading of beneficial mutations. This highlights the importance of accounting for such interactions in epidemiological models, especially for highly transmissible pathogens like SARS-CoV-2. However, since importation-rate matrices were also found to change considerably between waves, continuous genomic surveillance will be needed to update cross-importation rates. With accumulating time series data for different waves and different pathogens, future studies might also learn to predict the evolution of cross-importation rates through time. Such progress might come either by identifying interpretable patterns with the help of mechanistic epidemiological models or through machine learning. The patterns we have identified show that detailed balance is a reasonable assumption for epidemic models, but that different regions have different infectivities. Given detailed balance, one important open challenge is to understand mechanistically what sets the different infectivities of the different regions. Plausible candidate reasons are historical contingencies due to differential exposure to prior waves, inducing behavioral and immunological differences, or policy differences. Understanding the detailed structure of transmission networks may also allow for the design of targeted interventions. For example, identifying regions with high cross-importation rates can help prioritize areas for vaccination, testing, and other control measures. ### Limitations and Assumptions of the Method While our allele-frequency-based method offers substantial advancements, several limitations warrant discussion. Our method assumes that the alleles we track are neutral. Including alleles that are spatially sweeping and beneficial can lead to an overestimation of cross-importation rates from the origin to target areas. Therefore, it is crucial to exclude alleles whose selective changes are stronger than genetic drift, as we have done following our precursor work in ref. (31). The absence of direct measurements for crossimportation rates means we lack a definitive benchmark for comparing our results. As a result, our inferred matrices should be interpreted with some caution. Nonetheless, the observed distance decay in our inferred inter-community transmission networks is a convincingly realistic feature, aligning with expected cross-importation rates. It is reassuring that our method infers a plausible tradeoff between transmission rate and geographic distance based solely on frequency correlations, without requiring geographical information. Conversely, we have no way of verifying the weak longrange connections we inferred. Simulations with and without these interactions indicate their importance in facilitating rapid epidemic spread. So they need to be taken into account for prediction purposes. But our bootstrapping analysis also suggests that we cannot pinpoint the precise identity of these long-range connections: strong short-range interactions are consistently identified across bootstrap samples, but weak long-range connections vary (see SI Fig. S22). The uncertainty in the identity of these crucial weak links makes targeted intervention difficult, but hardly affect the predicted spreading time scales. Thus, while long-range interactions are significant for prediction purposes, their precise identities are not. Future work may develop a better representation for weak but important connections, for example, through systematic coarse-graining or other forms of regularization. Our method does not require constructing a genealogical tree; it only relies on standing genetic variation monitored over time. The accuracy of our inferences improves with the amount of time series data available. Therefore, our method is most effective when applied to periods where a particular variant of concern is already prevalent. However, it is less effective in measuring cross-importation rates early on when a variant is just beginning to invade and few samples are available. Phylogeographical inference methods are better suited for source attribution, as they enable tracing the emergence of a new variant (26, 27). Nonetheless, our analysis indicated that cross-importation rates did not change significantly within waves, suggesting that our inferences may serve as a baseline for cross-importation rates in the early stages of a new wave. ## Methods ### 1.1 Inference methods The simplest way to infer the cross-importation rates **A***ij*(*t* + Δ*t*; *t*) in Eq. 1 for a fixed time difference Δ*t* from time series data is to minimize the least square difference between the predicted and actual lineage frequencies, ![Formula][13] over all right stochastic *n* × *n* matrices, satisfying **A***ij* *>* 0, Σ*j* **A***ij* = 1. The summation over time points *t**′*on the right-hand side serves as a regularization step, which ensures that **A** does not vary on time spans smaller than 2*T*. The more sequencing data is available, the smaller *T* can be chosen, which leads to a better resolution of the temporal variations in **A**. Standard errors of the matrix coefficients are obtained by bootstrapping over the available lineages *µ*. The linear regression approach in Eq. 4 is computationally efficient but requires as input the true lineage frequencies, which are never known exactly. Instead, one can only measure the frequencies within the sequenced sample, which represent the true frequencies distorted by sampling noise. Accordingly, we have adopted a HMM, as depicted in Fig. 2, which treats the frequencies as hidden states. By modeling genetic drift and sampling noise as Gaussian distributions, the HMM effectively transforms into a computationally efficient Kalman filter (30) with a likelihood function that can be calculated analytically. The per-generation variance due to genetic drift is inversely proportional to the effective population size *N*e, which is usually smaller than the actual number of infected individuals. Specifically, for SARS-CoV-2 in England, the ratio between actual and effective population size was found to range from tens to hundreds (31). Likewise, sampling noise can be larger than expected based on random sampling, if sampling is not random but entails correlations. We therefore set the sampling variance to be inversely proportional to ![Graphic][14], where *S**i,t* is the number of sequences sampled from population *i* at time *t*, and *c**i* measures the deviation from random sampling. To infer the strength of genetic drift (*N**e*) and sampling noise (*c**i*) from our HMM, we have implemented an MCMC algorithm, which yields posterior distributions for these parameters (see Sec. S.2 in SI). Our simulation results in SI Fig. S7 show that, while both the least squares method and the HMM method with the MCMC algorithm retrieve importation-rate matrices close to the ground truth, the least squares estimation tends to overestimate small interactions. Such a bias appears from the fact that the solution of Eq. 4 tends toward the uniform matrix when noise levels are high. We thus focused on the HMM method to minimize these biases. To reduce computational cost of the MCMC, we have also implemented an EM algorithm, which provides the maximum likelihood estimate of all relevant parameters. The inference error of the EM algorithm was assessed by using a bootstrapping approach, where the parameters were inferred multiple times from randomly created sets of alleles, each set maintaining the same size as the original set. ### 1.2 Neutral evolution in a metapopulation Here, we provide additional mathematical rationale for why Eq. 1 describes the frequency dynamics of a neutral allele in a metapopulation. First, the deterministic terms in Eq. 1 are linear in the frequencies because, under neutrality, the frequency of any union of lineages must obey the same stochastic evolution equation. Additionally, neutrality implies that frequencies do not change in expectation if the frequency is the same in all populations, necessitating that each row of **A** sums up to 1. (If *X**i*(*τ*) = *x* for all *i*, Eq. 1 implies 𝔼 [*X**i*(*t*) | **X**(*τ*)] = *x* Σ*j* **A***ij*, which equals *x* only if Σ*j* **A***ij* = 1.) Finally, negative matrix elements are excluded because they can generate negative frequencies. Note that an alternative way of writing Eq. 1 is ![Formula][15] which explicitly shows (i) that the frequency in *j* only influences the frequency in *i* if *X**i* *≠ X**j*, and (ii) that a larger value of **A***ij* *>* 0 leads to a faster convergence of *X**i* to *X**j*. Each coefficient **A***ij*(*t*; *τ*) for *j ≠ i* denotes the proportion of infections that population *i* receives from population *j* during the interval from *τ* to *t*. Meanwhile, **A***ii*(*t*; *τ*) = 1 − Σ*j≠ i* **A***ij*(*t*; *τ*) represents the proportion of infections that are not imported, i.e., “homegrown” infections. From this perspective, it is straightforward to see how Eq. 5 arises. During the period from *τ* to *t*, population *i* imports a total of *I**i*(*t*)**A***ij*(*t*; *τ*) infections from population *j*, with a fraction *X**j*(*τ*) of these infections carrying the focal allele. Thus, at time *t*, the expected total number of infections carrying the focal allele in population *i* is *I**i*(*t*) *j* **A***ij*(*t*; *τ*)*X**j*(*τ*). Consequently, the updated allele frequency *X*(*t*) is given by Σ*j* **A***ij*(*t*; *τ*)*X**j*(*τ*). By substi-tuting **A***ii*(*t*; *τ*) = 1 − Σ *j≠i* **A***ij*(*t*; *τ*), we obtain Eq. 5. ### 1.3 From lineages to alleles Our inference technique is fueled by data collected through counting the prevalence of *independent* lineages across locations and times. Identifying these independent lineages is straightforward when a complete and accurate phylogenetic tree is available, as it simply involves segmenting the tree into monophyletic groups. But constructing a large phylogenetic tree in the first place—in the case of SARS-CoV-2 for millions of viral genomes—is not only computationally demanding but often results in unresolved polytomies. A tree-free alternative for creating time series of neutral lineages is provided by tracking the frequency of all preexisting neutral *mutations* in the different sub-populations. This works in principle because the frequency of a neutral allele has to obey Eq. 1, even in the presence of recombination. The downside of this approach is that, due to linkage, different alleles may not be independent. Treating them as independent underestimates statistical errors. Moreover, it is important to ensure that the included alleles are neutral. Therefore, we first cluster alleles based on their pairwise genetic distances and then select representative alleles for each cluster (33). We then excluded a small fraction of outlier alleles, whose time series are not consistent with neutrality, as measured by a maximum likelihood method described in refs. (31, 47) (also see SI Sec. S.1.3). In SI Figs. S3 and S20, we demonstrate that, regardless of whether we use tree-based lineages or alleles (including only synonymous mutations or all mutations), the qualitative and coarse-grained outcomes (the largest eigenvalues and reproductive values) of both tree-based and allele-based approaches are consistent and do not significantly differ for the Delta wave in England. Additionally, in SI Fig. S21, we demonstrate that similar importation rates are inferred from a downsampled fraction of the sequences (e.g., 10%). As an additional consistency check, we provide a theoretical explanation in SI Sec. S.4 on how the importation-rate matrix should change with spatial coarse-graining. We verified that the behavior of the inferred matrix aligns with these theoretical predictions, further supporting the robustness of our inference method. ### 1.4 Detailed balance condition The inferred infection matrices are typically asymmetrical, which can be checked for example for the Delta wave matrix shown in Fig. 3a. However, one might wonder if a more general symmetry, known as detailed balance, is maintained, which is often assumed in modeling and inference studies. Detailed balance is satisfied when the flux of lineages between two regions is balanced when traced backward in time. This implies that the backward-time lineage jump process is time-reversible. Mathematically, this detailed balance condition can be expressed as ![Formula][16] meaning that, in equilibrium, the lineage flux from region *j* to *i* is equal to the reverse flux. In Bayesian phylodynamic inference, detailed balance is often assumed to simplify the learning algorithms (48). Moreover, standard epidemiological models imply detailed balance, because otherwise the lineage jump process exhibits unwarranted cyclic dynamics in equilibrium. We show in SI Sec. S.5 how detailed balance emerges from a standard epidemiological SIR and SEIR models. It is therefore important to check whether the data justifies the detailed balance premise. We focus on the nine regions in England and plot in Fig. 9 the ratio of left and right-hand sides in Eq. 6 for all possible pairwise interactions. While most long-distance interactions are too weak to test for detailed balance, the strong neighbor-neighbor interactions are largely consistent with detailed balance. ![Fig. 9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F9.medium.gif) [Fig. 9.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F9) Fig. 9. Testing the detailed balance assumption. Under detailed balance, the ratio ![Graphic][17] should equal one for all deme pairs (*i, j*), where *π**i* denotes the class reproductivevalue (satisfying ![Graphic][18] and normalized to 1). The box plots show these ratios for all pairs of regions in England, where YH and NE are combined to a single region, using the 8 *×* 8 importation-rate matrix **A***ij* inferred from the region-level data of mutations during the Delta wave period (June 20, 2021 - October 31, 2021). Results for neighboring region pairs, which generally exhibit stronger couplings, are shown separately from those for non-neighboring regions, where couplings are typically smaller and exhibit greater inference error in ![Graphic][19]. ## Data Availability All data produced are available online at [https://github.com/Hallatscheklab/NetworkInfer](https://github.com/Hallatscheklab/NetworkInfer) [https://github.com/Hallatscheklab/NetworkInfer](https://github.com/Hallatscheklab/NetworkInfer) ## Data Availability All of the SARS-CoV-2 genomes analyzed in this article are publicly accessible through the GISAID platform and the COG-UK consortium. The GISAID accession identifiers analyzed in this study are provided as part of Supplementary Materials. ## Code Availability The Python script for the HMM-EM method and the C++ code for the HMM-MCMC method, along with the Python scripts to reproduce the figures in this manuscript, are available at [https://github.com/Hallatscheklab/NetworkInfer](https://github.com/Hallatscheklab/NetworkInfer). ## Supplementary Information ### S1 Data sources and processing For the analysis of England, we downloaded the sequence metadata from the COVID-19 Genomics UK Consortium (COG-UK) (28) on March 25, 2022. The metadada include the time and location of sample collection. The number of sequences over time is presented in Fig. S25. For the analysis of the United States, the sequence metadata was obtained from the GISAID database ([https://www.gisaid.org/](https://www.gisaid.org/)). For the Delta variant analysis, we excluded the AY.4.2 sequences, whose proportion modestly increased in England during the Delta wave (49). For the Omicron variant analysis, we focused on the B.1.1.529 and BA.1 lineages, except for the BA.1 sequences that had any of the mutations S:K417N, S:N440K, or S:G446S (50). #### S.1.1 Lineage frequency data While the metadata include the lineage designation using the Pango nomenclature (51, 52), it classifies variants into a limited number of lineages. Therefore, we created our own lineages based on phylogenetic distance using the publicly available COG-UK phylogenetic trees (on March 25, 2022) (31); specifically, we cut the tree at a particular depth to create many subtrees, defining each subtree as a lineage (see Fig. S1A). If any subtrees occupy more than 2.5% of the total sequences, we introduce an additional cut at another position (farther from the root) and divide these subtrees further to create more subtrees (lineages). We continue this process until no subtree occupies more than 2.5% of the total sequences. For the analysis of the Delta variant, the tree was cut at the depths at 50.5*l*unit, 56.5*l*unit, 59.5*l*unit where *l*unit = 3.34 × 10−5. Fig. S1B) shows how the sequences are distributed along the depth of the tree over epiweeks. Note that in this figure and throughout the SI, we adopt epiweeks starting from December 29, 2019, extending their application across multiple years to enable continuous analysis despite the usual yearly reset. The correspondence between the epiweeks and calendar dates is summarized in Sec. S.10. ![Fig. S1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F10.medium.gif) [Fig. S1.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F10) Fig. S1. (**A**) Construction of lineages within a variant using the phylogenetic tree: Leaf nodes are represented by circles for sequences of the focal variant and by squares for sequences of other variants. Lineages of the focal variant are defined by cutting the tree at a certain depth, represented by a vertical dashed line. In this illustration, three lineages are obtained through this procedure. (**B**) Collection date versus tree depth for the metadata sequences of the Alpha, Delta, and Omicron variants: For each variant, only 300 sequences are displayed for clearer visualization. The dashed horizontal lines represent the cuts used to define the lineages of the Delta variant in England. The numbers [*·*] on the dates denote epiweeks. #### S.1.2 Mutation frequency data Mutation frequency data for England and the USA were generated from mutations listed in the COG-UK and GISAID metadata, respectively. The COG-UK metadata includes both synonymous and non-synonymous mutations, whereas the GISAID metadata includes only non-synonymous mutations. The frequency of each mutation was calculated by counting the number of sequences carrying that mutation and normalizing this by the total number of sequences at each sampled location, with a unit time of one week. For very rare mutations, the effect of sampling noise is significant, making the inference unreliable. Conversely, including very abundant mutations would limit the total number of independent mutations (obtained from a method described below (33)). Therefore, as a compromise, we decided to select mutations whose country-wide frequency, averaged over a focal time window, is moderately low, ranging between 0.003 and 0.05. The presence of genetic linkage between mutations can create statistical dependencies that can distort our inference. To avoid bias due to linkage-induced correlations, we pruned the set of mutations following ref. (33). The pruning procedure first defines a distance between two mutations *m*1 and *m*2 as ![Formula][20] where *S**i* is the set of sequences carrying the mutation *m**i* in a country. Next, we constructed a graph where nodes are mutations, and an edge exists between two mutations if *d*(*m*1, *m*2) *< d*th, with *d*th∈ [0, 1] as a threshold parameter. We then identified the connected components of the graph, treating each connected component as a cluster of mutations. Finally, from each cluster, we selected the mutation *m**i* with the largest |*S**i*| as its representative, producing a set of approximately independent mutations. The higher the threshold value *d*th, the more mutation trajectories are grouped into the same cluster, as illustrated for the the Delta variant in England in Figure S2. We confirmed that as *d*th increases, the importation-rate matrix for the Delta variant inferred from the mutation data becomes more similar to the matrix inferred from the lineage data (Figure S3). Based on these results, we decided to use a high value of *d*th = 0.9 for all the inferences presented in this paper (except for Figure S3). ##### S.1.3 Outlier detection In our inference, we focused on alleles within a particular variant, as our method relies on the neutrality assumption, and mixing alleles from different variants could introduce fitness differences that violate this assumption. However, there is still a possibility that significant differences in fitness exist between alleles even within a variant, which could potentially bias the inference of the importation-rate matrix. To prevent this, we applied the statistical test for neutrality (31, 47), which computes the maximum likelihood estimate of the relative fitness *s* and the *p* value for each allele, from the time-series data of allele counts. ![Fig. S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F11.medium.gif) [Fig. S2.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F11) Fig. S2. (**A**) The country-wide frequency trajectories of a subset of mutations of the Delta variant in England. For each threshold value *d*th, the frequency trajectories of mutations that belong to the same cluster are plotted in the same color. Nine mutations are classified into three, two, and one clusters for *d*th = 0.1, 0.5, and 0.7, respectively. Similarly, Figures (**B**) and (**C**) show the frequency trajectories and the equivalence classes for other subsets of mutations of the Delta variant in England. ![Fig. S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F12.medium.gif) [Fig. S3.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F12) Fig. S3. Comparison between the result for the Delta-sublineage data in England and the result for the mutation data, and the dependence on the threshold value *d*th used for clustering mutations. *N**D* = 50. (**A**) Importation-rate matrix, ![Graphic][21], inferred from the lineage data. (**B**-**D**) Importation-rate matrix, ![Graphic][22], inferred from the mutation data, where *d*th = 0.0, 0.4, 0.8, respectively. (**E**) *d*th vs the Pearson correlation coefficient between ![Graphic][23] and ![Graphic][24]. F-H) Element-wise comparison of ![Graphic][25] and ![Graphic][26] for *d*th = 0.0, 0.4, 0.8, respectively. For the mutations selected by the clustering described in Sec. S.1.2, we applied this statistical test using their country-wide count data. We then excluded significantly non-neutral mutations (*p <* 0.05) and used only putatively neutral mutations (*p*≥ 0.05) for our inference. Figure S4A displays the results of the statistical test on the Delta variant in England. From left to right, it shows the allele-frequency trajectories for putatively neutral alleles (left), significantly non-neutral alleles (middle), and the *p*-*s* distribution (right). Among the 85 representative mutations identified by the clustering method described in Sec. S.1.2, 68 mutations are identified as putatively neutral. As a control, we also applied the statistical test to the simulated data of neutral allele frequencies generated by the Wright-Fisher model (Fig. S4B). For both the actual and simulated data, the central region of the *p*-*s* distribution exhibits a triangular shape, indicating the validity of the neutrality assumption for the majority of mutations of the Delta variant sitting in this central region of the *p*-*s* distribution. Similarly, we applied the same filtering to the lineage data and used only putatively neutral lineages for the inference. ![Fig. S4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F13.medium.gif) [Fig. S4.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F13) Fig. S4. (**A**) Trajectories of mutation frequencies observed in the Delta wave in England (Left: trajectories with *p >* 0.05, Middle: trajectories with *p <* 0.05). Right: Inferred selective coefficient and p-value, where each dot represents these values for a particular mutation. (**B**) Trajectories generated by the neutral Wright-Fisher simulation (Left: trajectories with *p >* 0.05, Middle: trajectories with *p <* 0.05). The effective population size is set to 10,000, the sampling rate per week is set to 10,000, and the number of trajectories is 100, comparable to actual country-wide data in England. Right: Inferred selective coefficient and p-value. ##### S.1.4 Mobility data in the United States In the analysis of the jump kernel presented in Fig. 6C, we used SafeGraph data (44), which were derived from cell phone tracking. Specifically, we used the county-level dataset processed by the authors of ref. (53) and then aggregated spatial locations to obtain the mobility flux between the 30 demes. Note that while we compared the jump kernel inferred from the sequencing data during the Delta wave with that calculated from the SafeGraph data, the SafeGraph dataset spans March 2020 to February 2021, which predates the surge of the Delta variant. ##### S.1.5 Autocorrelation functions, *R**ij*, in Fig. 1D In Fig. 1D, we demonstrate the convergence of allele frequencies across regions in England by computing the autocorrelation functions *R**ij*(*τ*), defined as ![Formula][27] where *ν* and *t* label alleles and timepoints, respectively; *N**ν* and *N**t* denote the total numbers of alleles and timepoints, respectively; and ![Graphic][28] denotes the frequency of allele *µ* in region *i* at week *t*. We also calculated the autocorrelation functions using the lineage frequencies of the Delta variant and confirmed that decay rates similar to those presented in Fig. 1D are obtained from the lineage data (Fig. S5). ![Fig. S5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F14.medium.gif) [Fig. S5.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F14) Fig. S5. (**A**) The time-series trajectories of lineage frequencies are illustrated for three lineages of the Delta variant. (**B**) The dots show the normalized autocorrelation functions *R**ij* between London and North East (blue), London and Yorkshire and the Humber (green), London and South East (red), computed from the lineage frequency data. The solid lines show exponential curves fitted to the data. The auto-orrelation functions are computed by averaging over the 40 lineages with the largest sample sizes. #### S.2 Hidden Markov Model for the neutral frequency dynamics Our inference method is based on a dynamic linear model (a Kalman filter). We follow the notation and formulation in ref. (54). In the following, we denote the multivariate normal distribution with mean ***µ*** and covariance **Σ** as ![Formula][29] where *k* is the dimensionality of ***x***. We denote the function measuring “heterozygosity”, *x*(1 − *x*), as ![Formula][30] ##### S.21 Hidden Markov Model We consider the neural dynamics of a lineage (or allele) frequency described by the spatial Wright-Fisher model with *N**D* demes. The dynamics of the true and observed frequencies can be approximately described by the following Markov process (Fig. S6). * *Transition probability distribution*: The true frequency vector ![Graphic][31] (hidden state) obeys ![Formula][32] with the importation-rate matrix ![Graphic][33] satisfying **A***ij* *>* 0 (*i*≠ *j*) and Σ*j* **A***ij* = 1, and ![Formula][34] Here (**A*Z***)*i* = Σ*j* **A***ij**z**j*, and *N*e,*i* is the effective population size of deme *i*, which controls the strength of the genetic drift. * *Emission probability distribution*: The observed frequency vector ![Graphic][35] approximately obeys: ![Formula][36] whereΣ*t* characterizes the strength of the measurement error and is assumed to be given by ![Formula][37] Here *S**i,t* is the number of sequences at week *t* in deme *i*, and *c**i* ≥ 1 is a parameter quantifying the deviation from ideal random sampling. The quantity *S**i,t**/c**i* can be interpreted as the *effective sampling size*. * *Initial condition on z*: ![Formula][38] We assume that ***µ**** and ***V**** are given by ![Formula][39] ![Fig. S6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F15.medium.gif) [Fig. S6.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F15) Fig. S6. The HMM for the frequency dynamics. *Z**t* and *X**t* are true frequencies and observed frequencies, respectively. The transition probability densities (horizontal arrows) and emission probability densities (vertical arrows) are modeled by Gaussian distributions, Eqns. (S.5) and (S.7). ##### S.2.2 Inference Given the observed frequencies, *x**i,t*, and the number of sampled sequence, *S**i,t*, our goal is to estimate the importation-rate matrix **A**, the effective population sizes *N*e,*i*, and the overdispersion of measurement noise *c**i*. To achieve this, we developed two algorithms: * Inference of the posterior distributions using a Markov Chain Monte Carlo (MCMC) method. * Maximum likelihood estimation using an EM algorithm. We employed the MCMC method for the region-level inference in England. Conversely, for deme-resolved matrices (Figs. 3A and 6A), we chose the EM method because of intensive computational requirements of the MCMC approach. ###### S.2.2.1 Computing the likelihood function and the filtered trajectories using the forward algorithm Here, we describe the forward algorithm that computes the likelihood recursively. We have the identity: ![Formula][40] Because the second term *p*(***z****t*+1|***x***0:*t*) can be rewritten as ![Formula][41] we have ![Formula][42] In the standard Kalman filter, a crucial step in deriving the recursion equations is performing Gaussian integration over the hidden state variables ***z***. In our model, the covariance matrices **Γ***t* and **Σ***t* are dependent on ***z***, which makes the process non-Gaussian. To utilize the technique of the Kalman filter, we approximate the covariances using matrices that are independent of ***z***; ![Formula][43] where the hidden frequency *z**i,t* is replaced by the time-averaged observed frequency ![Graphic][44]. Under this approximation, all quantities appearing in Eq. (S.13) become Gaussian. We note that, unlike the standard Kalman filter, the covariance matrixΣ*t* in our model is time-dependent due to the time-varying sampling rates *S**i,t*. However, we can still perform Gaussian integrations even with the time-dependent covariance, as long as it remains independent of the hidden variables ***z****t*. After some calculations (as detailed in ref. (54)), it can be shown that the triplet {***µ****t*, ***V****t*, *l**t*}, defined by ![Formula][45] ![Formula][46] satisfies the following recursive equations: Forward algorithm![Figure16](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F16.medium.gif) [Figure16](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F16) For each lineage (or allele) labeled by *ν*, we compute the triplet ![Graphic][47] by solving Eqns. (S.17, S.18) recursively, with *µ** and *V** in Eq. (S.10) andΣ*t* and Γ*t* in Eq. (S.14). The total log likelihood of the time-series data for all lineages (or alleles) is given by ![Formula][48] where Θ is the set of the parameters to be inferred, Θ = {**A***ij*, *N*e,*i*, *c**i*}. We note that in Eq. (S.20), the log likelihood at the initial timepoint, ![Graphic][49], is excluded from the summation (even though *l* appears in the likelihood in Eq. (S.19)). This exclusion is appropriate due to our assumption about the true initial frequency, ***µ**** = ***x***. Under this assumption, Eq. (S.17) leads to ![Graphic][50] constant, which increases as the overdispersion parameter of measurement noise, *c**i*, decreases (seeΣ in Eq. (S.14)), regardless of the values of measured frequencies. Therefore, including the term ![Graphic][51] in Eq. (S.20) would introduce a bias that underestimates *c**i* when maximizing the likelihood. ###### S.2.2.2 Computing the posterior distributions using an MCMC We consider the flat prior distributions *p*prior(Θ) on Θ. In this case, the posterior distribution is proportional to the likelihood, ![Formula][52] where 𝒞 is a factor independent of Θ. To obtain the posterior distribution *p*(Θ| Data) ∝*e*−*E*(Θ), we perform an MCMC method, namely, simulate a random walk in the Θ space whose stationary distribution is given by *p*(Θ|Data)∝ *e*−*E*(Θ). Specifically, we implement the Metropolis algorithm: MCMC (Metropolis algorithm)![Figure17](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F17.medium.gif) [Figure17](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F17) For sufficiently large Monte Calro steps *L* (e,g., *L* ∼ 106), the sequence Θ, Θ1, …, Θ*L* approximates the posterior distribution *p*(Θ |Data). For the proposal distribution *p*proposal, we employ the followings: * For *N*eff,i, sample *h* ∼ 𝒩 (0, *ϵ**N*), where *ϵ**N* is a positive constant. If *N*eff,i + *h* ≥ 0, set ![Graphic][53]. Else set ![Graphic][54]. * For *c**i* ∈ Θ, sample *h* ∼ 𝒩 (0, *ϵ**c*), where *ϵ**c* is a positive constant. If *c**i* + *h* ≥ 1, set ![Graphic][55]. Else set ![Graphic][56]. * For the importation-rate matrix **A***ij*, randomly choose a pair (*i, j*) and sample *ϵ* ∈ [−**A***ii*, **A***ij*] from the uniform distribution. Set ![Graphic][57] and ![Graphic][58]. *Remark:* While one may use other forms of *p*proposal(Θ*′*|Θ), it must satisfy two critical conditions: 1. The proposed Θ*′* must be within the meaningful parameter region. For example, ![Graphic][59] must be positive, and ![Graphic][60] must lie on the simplex. 2. *p*proposal(Θ*′*| Θ) must be symmetric, i.e., *p*proposal(Θ*′*| Θ) = *p*proposal(Θ|Θ*′*). Otherwise, the Metropolis-Hastings algorithm should be used instead of the Metropolis algorithm. Note that in our proposal distribution for *N*e,*i* and *c**i*, any proposed value that exceeds the boundary of the parameter space is reflected along the boundary. This prescription ensures the symmetric condition near the boundary. For **A***ij*, our proposal distribution also satisfies the above two conditions (see ref. (55)). ##### S.2.3 Computing the maximum likelihood estimation through an EM algorithm We use an EM algorithm to search for the maximum likelihood estimate of the parameters Θ. Suppose that the model parameter is Θold = (**A, *N***e)old at a stage of the algorithm. By using Θold, for each lineage, we first compute the filtered trajectory ***µ****t* and the covariance ***V****t* using the forward algorithm. Then, we compute the smoothed trajectory ![Graphic][61] and the variance ![Graphic][62] by solving the backward equation: ![Formula][63] where ***P****t* = **A*V****t***A**⊤ + **Γ***t* and ***J****t* = ***V****t***A**⊤(***P****t*)−1. Solving the backward algorithm provides the following quantities, which will be required below1; ![Formula][64] In the EM algorithm, we need to minimize the expectation *Q* of the complete-data likelihood *p*(***z, x***, Θ) (Here, ***z*** and ***x*** collectively denote the frequencies ***z****ν* and ***x****ν* of all *N*lin lineages). It can be shown that the terms of *Q* that are dependent on **A***ij* and *N*e,*i* are given by (see ref. (54)) ![Formula][65] where (…)*ii* in the last line represents the *ii* component of the matrix inside the parentheses. The notation |*M*| denotes the determinant of a matrix *M*. The new matrix **A** is obtained by minimizing the quantities in the second line of Eq. (S.24). Since **A***ij* and ![Graphic][66] are decoupled if *i*≠ *i**′*, each row of **A**new can be determined separately. Specifically, the *i*-th row ![Graphic][67] is obtained by solving ![Formula][68] under the constraints, Σ*j* *ζ**j* = 1 and *ζ**j* *>* 0. The matrix ***g*** and the vector ***q****i* are given by ![Formula][69] which can be evaluated by using Eq. (S.23). We determine each row of **A**new by solving the constrained quadratic programming Eq. S.25 using the Python package CVXOPT. After determining **A**new, the new effective population size ![Graphic][70] is determined by differentiating Eq. (S.24) with respect to *N*e,*i*. The result is ![Formula][71] *Q* also has the terms that are dependent on the parameter describing the deviation from uniform sampling, *c**i*; ![Formula][72] The new value of the measurement noise overdispersion *c*new is determined by minimizing the above expression with respect to *c**i* (subjected to ≥ 1): ![Formula][73] In the EM algorithm, we initialized the parameters, Θ = Θ, and iteratively updated them to Θnew, by solving Eq. S.25 and evaluating Eqns. S.27 and S.29, until the likelihood stabilizes/converges. ###### Regularization To stabilize the inference for deme-resolved analysis (higher than at region level), we added a Ridge-like regularization term ![Graphic][74], to *Q*. Here, Λ is a regularization parameter. The value of Λ is determined via cross-validation by dividing the set of allele-frequency (or lineage-frequency) trajectories into training and test data of equal size. ###### Bootstrapping To evaluate the uncertainty in the MLE, we performed bootstrapping by randomly sampling lineages with replacement to generate multiple new sets, each containing the same number of lineages as the original dataset. We then applied the EM algorithm to each of these generated sets. ##### S.2.4 Computational tests * 3-deme system We simulated frequency time-series data using the 3 × 3 matrix shown in Fig. 2. The effective population size and sampling rate were set to values similar to those inferred for London and its neighboring regions during the Delta wave. The results for the simulated data, inferred using the HMM-MCMC, HMM-EM methods, and the least squares (LS) method (see Eq. 4), are compared in Figs. S7A-D. The LS method, which ignores fluctuations due to genetic drift and sampling noise, tends to overestimate the interaction strengths (Fig. S7C). Note that the results from the LS method are not displayed in Figs. S7C and D because the LS method cannot infer *N*e,*i* and *c**i*. * 50-deme system We simulated frequency time-series data using the 50 × 50 matrix shown in Fig. S8A, which was constructed by assuming five strongly interacting blocks, each consisting of 10 demes. The effective population size *N*e,*i* and *c**i* were set to *N*e,*i* = 1200 and *c**i* = 1.0, respectively. The sampling rate was assumed to be *S**i* = 500 for all demes. These values were chosen to mimic the actual situation during the Delta wave in England. From the simulated data, we inferred the parameters using the LS and HMM-EM methods. Fig. S8B compares the matrix elements of the true **A***ij* with those of the inferred **A***ij*. The LS method tends to overestimate interactions, especially for small couplings. This bias is further illustrated in Fig. S8C, which shows the histogram of the inferred matrix elements corresponding to the zero elements in the true matrix. The bias in the LS method can be intuitively understood by considering that in a large noise limit, the LS solution converges to a homogeneous matrix ![Graphic][75]. Thus, using the LS method, small couplings (roughly,![Graphic][76]) tend to be overestimated, while large couplings![Graphic][77] tend to be underestimated. ![Fig. S7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F18.medium.gif) [Fig. S7.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F18) Fig. S7. Supplementary figures to Fig. 2. The Wright-Fisher model on three demes is simulated, where the importation-rate matrix **A** and the effective population sizes are those shown in Fig. 2A. The measurement noise overdispersion parameter is set to *c**i* = 1.5. (**A**) The distribution of **A** inferred by the HMM-MCMC, HMM-EM, and LS methods, from 20 alleles (left) and 60 alleles (right). For the HMM-MCMC method, it represents the posterior distribution, while for the HMM-EM and LS methods, it represents the bootstrap distribution. (**B**) The error, ![Graphic][78], in the inference of interaction strengths for the HMM-MCMC, HMM-EM, and LS methods, where all pairs *i, j* are aggregated. (**C**) Boxplots showing the effective population sizes, inferred using the HMM-MCMC method from 60 alleles. The true values are indicated by thick horizontal lines. (**D**) Boxplots showing the measurement noise overdispersion, *c**i*, inferred using the HMM-MCMC method from 60 alleles. The true values (*c**i* = 1.5) are indicated by thick horizontal lines. ![Fig. S8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F19.medium.gif) [Fig. S8.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F19) Fig. S8. (**A**) Heatmap showing the true matrix. All matrix elements within the five diagonal blocks (each consisting of 10 demes) are assumed to be nonzero, while the elements of the off-diagonal blocks are nonzero with probability of 0.1. The nonzero elements are randomly sampled from the exponential distribution with a mean of 0.02. **B**. Element-wise comparison between the true matrix and the inferred matrix obtained using the least squares and HMM-EM methods. (**C**) Histogram of the inferred matrix elements corresponding to zeros in the true matrix. #### S.3 Reproductive values in England In the main text, we have shown that the per-capita reproductive values *π**i**/I**i* vary across regions (Fig. 7C). Additionally, the unnormalized “class” reproductive values *π**i* are spatially heterogeneous, see Fig. S9. ##### S.3.1 Ranking regions according to ratios of bidirectional importation rates As discussed in the main text, reproductive values predict ratios of bidirectional importation rates when the principle of detailed balance, Eq. 2, holds. To verify this, we simultaneously reordered both rows and columns of the matrix **A***ij* to maximize the asymmetry, as measured by Σ*i > j***A***ij**/*(**A***ij* + **A***ji*). This approach indeed arranges regions roughly in line with the values of *π**i* for both the Alpha and Delta waves, see Fig. S10. In particular, the two regions, EE and SW, which have lower *π**i* values, are ordered last in the reordered matrices. ![Fig. S9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F20.medium.gif) [Fig. S9.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F20) Fig. S9. The inferred class reproductive values for each region during the Alpha wave (epiweeks 57-68) and Delta wave (epiweeks 82-95) in England. ![Fig. S10.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F21.medium.gif) [Fig. S10.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F21) Fig. S10. Using the inferred importation-rate matrix **A** for the Alpha and Delta variants in England and the number of infected *NI**i*, as measured by the COVID-19 Infection Survey (37), the values of **A***ij* */*(**A***ij* + **A***ji*) are represented by heat maps. The order of regions is arranged to maximize the asymmetry, Σ*i>j***A***ij* */*(**A***ij* + **A***ji*), is maximized. ##### S.3.2 Testing the effect of the delay in sequence reporting One of the possible reasons for the observed heterogeneity in the reproductive value *π**i* is the difference in the population size. In fact, *π**i* correlates with the average number of infected individuals in each region during the wave; the Spearman correlation coefficients are *ρ* = 0.81 (p-value 0.015) for the Alpha wave and *ρ* = 0.88 (p-value 0.004) for the Delta wave, respectively. However, as presented in Fig. 7, the reproductive values are heterogeneous even after normalizing by the number of infected individuals. Another possible explanation could be that the inferred heterogeneity in *π**i* results from artifacts caused by variations in the timing of disease reporting. For instance, assume two regions *i* and *j* have perfectly symmetrical cross-importation rates. If region *i* reports cases substantially later than region *j*, its frequency trajectories will tend to follow region *j*, mimicking a causal influence of *j* on *i*. Consequently, the importation rate from *j* to *i*, **A***ij*, is likely to be overestimated, which would then expected to decrease *π**i*. A more quantitative argument based on perturbative analysis is provided in Sec. S.3.3. While no systematic reporting time heterogeneity has been documented to our knowledge, to test the impact of a hypothetical reporting delay, we artificially advanced the allele counts data from EE by *τ*shift(= 1, 2) weeks compared to the other regions, mimicking the scenario where the other regions report sequences later by *τ*shift weeks. We then re-inferred the importation-rate matrix (Fig. S11). As expected, the reproductive value of EE increases as *τ*shift is increased, highlighting the importance of accurate reporting dates for precise inference. ![Fig. S11.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F22.medium.gif) [Fig. S11.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F22) Fig. S11. (**A**) The reproductive values inferred for the data during the Delta wave in England (epiweek 82+*τ*shift to 93+*τ*shift for EE and epiweek 84 to 94 for the other regions), where sequence reporting dates from EE are artificially advanced by *τ*shift weeks. (**B**) The inferred per-capita reproductive values. ##### S.3.3 Perturbative analysis of the effect of delay in reporting Consider a population consisting of two groups, each containing ![Graphic][79] and ![Graphic][80] demes. Assume a consistent delay of one week in sequence reporting from the demes in the second group. Ignoring the genetic drift and measurement noise, the dynamics of the allele frequencies, ![Graphic][81] in the ![Graphic][82] demes and ![Graphic][83] in the ![Graphic][84] demes, are described by ![Formula][85] where **A***IJ* (*I, J* = 1, 2) represents the matrix of importation rates from demes in *J* to demes in *I*. The observed frequencies, ![Graphic][86] and ![Graphic][87], are given by ![Formula][88] The dynamics of the observed frequencies are written as ![Formula][89] where ![Graphic][90] and ![Graphic][91] are the block matrices of **A**−1, defined by ![Graphic][92]. We assume that the interactions are weak, i.e. **A** = 𝒪 (*ϵ*) (ϵ ≪ 1) for *i* ≠ *j*, which implies ***z***1(*t* − 1) = ***z***1(*t*) +𝒪 (*ϵ*) and ***z***2(*t*) = ***z***2(*t* − 1) +O(*ϵ*). Under this approximation, Eq. (S.32) can be written as ![Formula][93] Hence, under the delay from the second group, the apparent dynamics are still linear but described by the modified matrix **A**delay. Compared to the true matrix ![Graphic][94], the delay fictitiously modifies the upper-right block (representing the influence of the second group on the first group) by introducing the factor **A**22 and the lower-left block (representing the influence of the first group on the second group) by introducing the factor (**A**−1)11. To illustrate the delay effect, let us consider the simplest case where ![Graphic][95], where **A**11, **A**12, **A**21, **A**22 are not matrices but positive scalars. The constraints **A**11 + **A**12 = 1 and **A**21 + **A**22 = 1, along with the positivity condition, imply that the delay-induced factors satisfy **A**22 *<* 1 and (**A**−1)11 *>* 1. Therefore, when the second group’s report is delayed, the influence of the second group on the first group becomes underestimated, whereas the influence of the first group on the second group becomes overestimated. This effect is demonstrated for cases where ![Graphic][96] and ![Graphic][97] in Fig. S12. ![Fig. S12.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F23.medium.gif) [Fig. S12.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F23) Fig. S12. The Wright-Fisher simulation was performed with the 4*×* 4 importation-rate matrix shown in the leftmost heatmap. The effective population sizes and sampling rates were set to *N*e,*i* = 104 and *S**i* = 104, respectively, for all demes. We examined a scenario in which there was a 1-week delay in sequence counts from deme 2 and deme 3. The matrices inferred from the HMM method and the perturbative formula in Eq. S.33 are presented in the middle and rightmost heatmaps, respectively. Consistent with the formula, the influences from demes 2 and 3 (corresponding to the third and fourth columns) are underestimated due to the delay, while those from demes 1 and 2 (corresponding to the first and second columns) are overestimated. #### S.4 Spatial Resolution and Inference Reliability In the deme-resolved results presented in Fig. 3, we divided England into 50 demes. In this section, we present two arguments supporting the idea that increasing the spatial resolution much beyond 50 demes is not feasible. The first argument considers how **A***ij* should behave under changes in spatial resolution. Let *i* and *j* represent demes, and let *I* and *J* represent coarse-grained areas, referred to as regions. Suppose we have a deme-level matrix **A***ij*. If all demes within the same region interact strongly, the region-level interaction **A***IJ* can be approximated by: ![Formula][98] where *π**i* is the reproductive value(the normalized left eigenvector with eigenvalue 1 of the matrix **A***ij*). This formula can be understood from a time-backward perspective: assuming demes within the same region equilibrate instantaneously, a lineage in region *I* is distributed across demes *i* ∈ region *I* with probability ![Graphic][99]. The lineage in deme *i* ∈ region *I* then moves backward in time to deme *j* with probability **A***ij*. Summing over demes *i* ∈ region *I* and *j*∈ region *J* gives the probability that a lineage in region *I* moves backward in time to region *J*, which corresponds to **A***IJ* in Eq. (S.34). In Fig. S13A and B, we show the deme-resolved matrix with *N**D* = 50 for the Delta wave in England and its coarse-grained counterpart. As shown in Fig. S13C, the coarse-grained matrix closely matches the one directly inferred from the region-level allele frequency data. Fig. S13D shows the correlation between the two matrices as a function of the number of demes *N**D*. The decline in correlation above *N**D* ≈ 70 indicates a loss of inference reliability at higher spatial resolutions. ![Fig. S13.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F24.medium.gif) [Fig. S13.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F24) Fig. S13. (**A**) Heatmap showing the importation-rate matrix **A**, with *N**D* = 50, during the Delta wave in England.(**B**) The coarse-grained matrix obtained by applying Eq. Eq. (S.34) to the 50 *×* 50 matrix in Fig. **A**. (**C**) Element-wise comparison between the coarse-grained matrix and the matrix directly inferred from the region-level mutation data. The Pearson correlation coefficient is 0.89. (**D**) Plot showing the Pearson correlation between the 9 *×* 9 coarse-grained matrix, obtained from *N**D* *×N**D* deme-resolved matrix, and the 9 *×* 9 matrix directly inferred from the region-level mutation data, as a function of the number of demes *N**D*. For each spatial resolution *N**D*, the matrix elements between the two matrices are compared as in Fig. **C**. Another indirect argument comes from the relaxation times of the eigenmodes. Fig. S14A displays the inferred importationrate matrix for the Delta wave at the level of upper tier local authorities (UTLA), which is the finest spatial resolution available from our data. Fig. S14B shows the relaxation time, ln −|*λ**i*|, for each eigenmode (| *λ*1| ≤ |*λ*2 | ≤ …). While the relaxation time around *i* = 50 is approximately 2 weeks, it decreases to less than 1 week for *i >* 60. Given that the unit time of our analysis is 1 week, modes with relaxation times shorter than 1 week cannot be reliably inferred. This observation suggests that significantly increasing the spatial resolution beyond 50 demes is not practical. #### S.5 Epidemiological interpretation of A To aid in interpreting our importation-rate matrix **A**, we derive its expressions using specific epidemiological models. ![Fig. S14.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F25.medium.gif) [Fig. S14.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F25) Fig. S14. (**A**) The importation-rate matrix among 120 UTLAs in England, inferred for the Delta wave (Jun 20, 2021 - Sep 25, 2021) with the HMM-EM method (without Ridge regularization). The top 120 UTLAs with the most sequences are used in this inference. The 120 UTLAs are ordered by performing hierarchical clustering with the Jensen-Shannon distance between rows. (**B**) The relaxation time for each mode of the importation-rate matrix. The shaded region shows the 95% confidence interval, computed using the bootstrapping method. We begin by extending the multi-patch SIR model (11, 53, 56, 57) to include distinct multiple viral lineages. The extended model is represented by the following set of equations: ![Formula][100] ![Formula][101] ![Formula][102] Here, ![Graphic][103] denotes the number of individuals infected by lineage *µ* in deme *i, N**i* is the population size, *N**i* = *S**i*(*t*)+*I**i*(*t*)+*R**i*(*t*), and ![Graphic][104] is the transmission rate of lineage *µ* from an infected individual in deme *j* to susceptible individuals in deme *i. γ**µ* represents a lineage-specific recovery rate. In terms of the total number of infected individuals ![Graphic][105] in deme *i* and the frequency ![Graphic][106] of lineage *µ*, Eq. (S.36) is expressed as ![Formula][107] where ![Graphic][108] and ![Graphic][109] are lineage-averaged rates defined as ![Formula][110] ![Formula][111] In terms of the fractions ![Graphic][112] and ![Graphic][113], we have ![Formula][114] ##### S.5.1 Neutral case Suppose that all lineages have the same transmission rate and recovery rate, ![Graphic][115] and *γ**µ* = *γ*. Then, Eq. (S.40) reduces to ![Formula][116] Where ![Formula][117] If we assume that *s**i* ≈1, which is a reasonable approximation in realistic situations, and that the infected fraction is spatially homogeneous (*i**i* ≈ *i**j*), as was the case during the plateau period of the Delta wave in England (Fig. S26), then **A***ij* can be approximated as **A***ij* ≈ *β**ij*. By discretizing time using one week as the unit of time, ![Formula][118] where **A***ii* ≡ 1 − Σ*j i* **A***ij* (or equivalently, Σ*j* **A***ij* = 1). We note that the diagonal elements **A***ii* in the time-discrete dynamics are determined from the normalization, and the transmission rate within a deme, *β**ii*, does not enter the neutral dynamics. In ref. (53), *β**ij* is modeled in terms of human mobility as ![Formula][119] where *β* is the transmission rate when a susceptible individual has contact with an infected, *p**i* is the contact probability in deme *i*, and *f**i*→*j* is the mobility flux from *i* to *j*. The first term in the numerator corresponds to the case that a susceptible person in deme *i* visits deme *j* and gets infected by a residence of deme *j*, while the second term corresponds to the case that an infected person in deme *j* visits deme *i* and infects a resident of deme *i*. Note that, in ref. (53), *p**i* is assumed to be the same for all demes. Using Eq. (S.44), **A***ij* in Eq. (S.42) can be expressed in terms of the mobility flux as ![Formula][120] The expression in Eq. (S.45) implies: 1. The combination ![Graphic][121] is symmetric in *i* and *j*. 2. The asymmetry in the couplings between *i* and is given by ![Graphic][122], where the densities *s**i* and *i**i* are assumed to be approximately the same for all demes. This expression of the asymmetry means that a deme with a larger population size is expected to have a greater impact on the other. 3. **A***ij* in Eq. (S.45) satisfies the detailed balance condition. Specifically, the reproductive value(the left null vector of **A***ij*) is given by![Graphic][123]. ##### S.5.2 Non-neutral variant Consider the case of two non-neutral lineages, specifically the Alpha and Delta variants (*µ* = *α, δ*). We express ![Graphic][124] and *γ**µ* for these variants as follows: ![Graphic][125] and denote the Delta’s frequency as![Graphic][126]. Using these expressions, Eq. (S.40) for *µ* = *δ* becomes: ![Formula][127] where ![Graphic][128] denotes the importation-rate matrix for the Delta variant. When the Delta variant is rare, *f**i* ≪ 1, we can drop the terms quadratic in frequencies: ![Formula][129] where *σ**i* ≡ *s**i**ϵ**β**β**ii* + *ϵ**γ**γ* + *ϵ**β* Σ*i*≠*j* = **A***ij*. In the main text, the subscript of *σ**i* is dropped, assuming that *s**i**ϵ**β**β**ii* and Σ*j≠i* **A***ij*(= 1 − **A***ii*) are not significantly different across populations. ##### S.5.3 Other epidemiological scenarios While we verified the linear dynamics *f**i*(*t* + 1) = *j* **A***ij**f**j*(*t*) in Eq. S.43 assuming the standard transmission function *βSI/N*, it is worth noting that this result holds independently of specific transmission functions employed (see ref. (58) for a variety of transmission functions). For instance, if the transmission function takes the form ![Graphic][130] with some exponents *p, q >* 0, Eq. S.36 would be replaced by: ![Formula][131] Note that the transmission term is linear in![Graphic][132], which guarantees that the sum over lineages yields the total transmission rate. More generally, Eq. S.36 can be generalized to: ![Formula][133] where *F**ij* is any lineage-independent function that may depend on the total numbers of susceptible and infected individuals, as well as any other lineage-independent quantities such as population densities and geographic distances. It can then be shown that the neutral dynamics of a lineage frequency is given by Eq. S.41 with: ![Formula][134] Furthermore, the linear dynamics *f**i*(*t* + 1) = Σ*j* **A***ij**f**j*(*t*) can also be justified in another important class of epidemiological models, the SEIR model, which has been widely applied to SARS-CoV-2 (refs. (11, 59)). A multi-patch extension of the SEIR model is described by the following equations: ![Formula][135] ![Formula][136] ![Formula][137] ![Formula][138] Here, ![Graphic][139] represents the number of asymptomatic individuals in deme *i* who have been infected by lineage *µ. γ**E* is the rate at which an exposed individual becomes infectious (the inverse of the average latent time), and *γ**I* is the rate at which an infectious individual recovers. The total population size, *N**i* = *S**i* + *I**i* + *E**i* + *R**i*, is constant in each deme, where ![Graphic][140] and ![Graphic][141], respectively, represent the total infected and exposed populations in deme *i* across all lineages. We can show that the lineage frequencies, ![Graphic][142] and ![Graphic][143], among exposed and infected individuals, obey the following equations: ![Formula][144] ![Formula][145] with ![Formula][146] where ![Graphic][147] are the fractions of these epidemiological classes, and ![Graphic][148] is the matrix in Eq. S.42. The matrix ![Graphic][149] differs from ![Graphic][150] by the factor ![Graphic][151]. This factor is largely determined by the ratio ![Graphic][152] (see Eq. S.53) and takes similar values across demes. By subtracting Eqs. S.55 and S.56, we obtain ![Formula][153] where we dropped the off-diagonal couplings, which are usually smaller than the within-deme transmission, ![Graphic][154]. The coefficients ![Graphic][155] and ![Graphic][156] in Eq. S.58 can be roughly approximated as ![Graphic][157] and ![Graphic][158]. Evaluating these with realistic parameter values (for instance, *β**ii* = 0.4 day−1, *γ**E* = (3.0 day)−1, *γ**I* = (5.5 day)−1 (ref. (59)), we observe that the difference ![Graphic][159] relaxes to 0 within 1 week. Thus, we may practically identify ![Graphic][160] and ![Graphic][161]. Consequently, from Eq. S.56, with this identification, we find that in the SEIR model, the dynamics of the lineage frequency ![Graphic][162] is described by ![Formula][163] where, as argued above, ![Graphic][164]. #### S.6 The jump-size distribution calculated from the importation-rate matrix A In Fig. 3C, we computed the probability distribution of per-individual jump distances, assuming that a jump from deme *i* to deme *j* occurs with a probability proportional to ![Graphic][165]. The rationale for this population-size-dependent rescaling is that the impact of a source location *j* on a target location *i* will be greater if the source population size *N**j* is larger and the target population size *N**i* is smaller. To offset these effects, we rescaled **A***ij* by the factor ![Graphic][166], thereby obtaining the per-individual jump rate. We can also justify the rescaling factor based on the SIR model. As described in SI Sec. S.5, when deriving Eq. S.44, two possibilities for an infection event are considered: (i) a susceptible person in deme *i* visits deme *j* and gets infected by a resident of deme *j*, and (ii) an infected person in deme *j* visits deme *i* and infects a resident of deme *i*. Assuming that the second contribution is dominant, we would obtain ![Graphic][167] instead of Eq. S.45. Here, we assume *s**i* ≈ 1, *i**i* ≈ *i**j*, and *p**i* = *p*. Consequently, the per-individual jump rate from location *j* to *i*, ![Graphic][168], is proportional to ![Graphic][169], consistent with the above intuitive argument. #### S.7 Distance between demes used for the hierarchical clustering of A*ij* and multidimensional scaling analysis For the deme-resolved coupling matrices **A** shown in Figs. 3 and 6, we ordered demes according to how similar their rows are. To this end, we performed hierarchical clustering using Ward’s method with the following metric *d**ij* between populations; ![Formula][170] where *D**JS*(***p, q***) is the square root of the Jensen-Shannon divergence between two probability distributions ***p*** and ![Graphic][171] (resp. ![Graphic][172]) is the vector of the probability distribution obtained by removing the *i*-th and *j*-th elements from the *i*-th (resp. *j*-th) row of the importation-rate matrix. Explicitly, their *k*-th elements are given by ![Formula][173] In the backward-in-time interpretation, the metric *d**ij* compares the probability distribution of the spatial locations of a lineage starting at population *i* and the one starting at population *j*, conditional on that they go to populations outside of *i* and *j*. Figures S15A and B show the results of hierarchical clustering using Ward’s method, and the MDS analysis with the above metric for the matrix powers **A***n* (*n* = 1, 5, 10, 20), where **A** is the importation-rate matrix for the Delta wave in England (Fig. 3). Fig. S15C compares the Jensen-Shannon divergence to the physical distances between demes. We can see that the clustering of the 50 demes is relatively robust to the timescale *n* of interest. However, as *n* becomes large (compared to the timescale of relaxation, approximately *n* = 10 weeks), all rows of (**A***n*)*ij* approach the reproductive value***π*** (i.e., the left eigenvector of **A** corresponding to the eigenvalue 1), and the Jensen-Shannon divergence becomes less informative of physical distances. For example, for (**A**20)*ij*, demes in LDN, SE, and SW become highly clustered in the MDS plot (the bottom panel of Fig. S15B), indicating that the Jensen-Shannon divergence cannot effectively distinguish between physically close demes (the bottom panel of Fig. S15C). ![Fig. S15.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F26.medium.gif) [Fig. S15.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F26) Fig. S15. Clustering and distances based on the Jensen-Shannon divergence for the (*A**n*) for *n* = 1, 5, 10, 20, where **A** is the 50 *×* 50 matrix for the Delta wave (weeks 78-91) in England. (**A**) Heatmaps of **A***n*(*n* = 1, 5, 10, 20), where hierarchical clustering is performed using the Ward’s method with the Jensen-Shannon divergence between the rows of **A***n*. (**B**) Multidimensional scaling plots. (**C**) Comparison between the Jensen-Shannon divergence and physical distance, with each dot representing these two distances for a specific pair of demes *i* and *j*. #### S.8 Prediction of the spreading dynamics of the Delta variant shown in Fig. 8 *Non-neutral transmission dynamics:* We modeled the frequency dynamics of the Delta variant in England using the ordinary differential equation in the form of Eq. S.46: ![Formula][174] where **A***ij* represents the 50 × 50 importation-rate matrix presented in Fig. 3, and ![Graphic][175] and *ϵ**β* are fit parameters. *Determination of the parameter values:* To fix ![Graphic][176] and *ϵ*, we simulated the above ODE and identified the optimal parameter values that minimize the discrepancies between our model’s predictions and the observed frequencies in a logit scale: ![Formula][177] where ![Graphic][178] and ![Graphic][179] denote the simulated and observed frequencies, respectively, of the Delta variant. The data points ![Graphic][180] and (the number of sequences reported from location *i* in week *t*) *>* 10 are used for the fitting. We examined the discrepancy between theoretical predictions and observed data, Eq. S.63, across a range of parameter values (Fig. S16). The discrepancy has a weak dependence on *ϵ**β*, indicating that it cannot be reliably estimated solely from our fitting process. Therefore, we referred to the relative infectivity reported in ref. (35), which found that the Delta variant is 43–68% more transmissible than the Alpha variant, corresponding to *ϵ**β* values between 0.30 and 0.40 (see Eq. S.46 and the definition of *ϵ**β*). Based on these, we constrained our optimization search to this realistic range, resulting in the optimal values, ![Graphic][181]. These parameter values are used in the simulations presented in the main text. ![Fig. S16.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F27.medium.gif) [Fig. S16.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F27) Fig. S16. Heatmap showing the discrepancy between theoretical predictions and observed data, defined in Eq. S.63, as a function of ![Graphic][182]. The parameter values, indicated by the red cross, are used for the results presented in the main text. *Determination of the mid timepoint t*1*/*2: To compare the observed and simulated frequency trajectories, we calculated the mid timepoint *t*1*/*2 for each trajectory *f**i*(*t*) by fitting it to a logistic curve, ![Graphic][183], where *l* and *t*1*/*2 are fit parameters (see Fig. S17 for the actual trajectories and the fitting results). ![Fig. S17.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F28.medium.gif) [Fig. S17.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F28) Fig. S17. Procedure of determining *t*1*/*2 for the actual frequencies of the Delta variant in England. In each panel, the dashed lines represent the observed frequencies ![Graphic][184] in demes within a region, while the solid lines represent fitted logistic curves ![Graphic][185], where *k* and *t*1*/*2 are fit parameters. For each deme, the time window with more than 10 sequences is used for the fitting. The inflection points *t*1*/*2 of these fitted logistic curves are used in Fig. 8 of the main text. ![Fig. S18.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F29.medium.gif) [Fig. S18.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F29) Fig. S18. The posterior distributions of the measurement noise parameters *c**i* for the Alpha, Delta, and Omicron variants in England, which are inferred by applying the HMM-MCMC method to the region-level mutation data. ![Fig. S19.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F30.medium.gif) [Fig. S19.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F30) Fig. S19. Supplementary figure to Fig. 7C. The reproductive value in England is inferred at the regional level, with NE and YH being treated as a single region. For the Alpha wave, the time windows used for the four box plots are epiweek 54-67, 55-68, 56-69, and 57-70, respectively, from left to right. For the Delta wave, the time windows used for the six box plots are epiweek 78-91, 79-92, 80-93, 81-94, 82-95, and 83-96, respectively, from left to right. For each variant, we aggregated the results for all the time windows, and the aggregated results are shown in Fig. 7C. ![Fig. S20.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F31.medium.gif) [Fig. S20.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F31) Fig. S20. The heatmaps (**A-C**) display region-level inferred matrices for the Delta wave (epiweeks 78-91) in England, where YH and NE regions are combined, for different data types: (**A**) mutation data, incorporating both synonymous and nonsynonymous mutations; (**B**) synonymous mutation data; (**C**) lineage data. The mean values of parameters, computed using MCMC, are presented in these heatmaps. (**D-F**) provide element-wise comparisons of these matrices, with error bars indicating standard deviations. (**G**) Comparison of the reproductive values *π**i*. (**H**) Comparison of the eigenvalues *λ**i* of the importation-rate matrix. ![Fig. S21.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F32.medium.gif) [Fig. S21.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F32) Fig. S21. (**A**) The importation-rate matrix for the Delta wave in England, inferred from the full dataset. The matrix is inferred using the EM algorithm, and the error indicates the lower and upper quartiles obtained from the bootstrapping method. (**B**) The importation-rate matrix inferred from 10% of the data. (**C**) The Pearson correlation coefficient between the matrix elements of the full dataset and those of the downsampled matrix is computed for downsampling fractions of 5%, 10%, 20%, 40%, 60%, 80%, and 100% (i.e., the full dataset). The error bar indicates the standard deviation in the Pearson correlation coefficient obtained from the bootstrapping method. The matrix elements inferred from the downsampled data begin to deviate from those inferred from the full dataset when the fraction of downsampling is less than 10%. ![Fig. S22.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F33.medium.gif) [Fig. S22.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F33) Fig. S22. (**A**) Six examples of transmission-rate matrices inferred for the Delta wave in England, obtained through the bootstrapping method. For each matrix, the relaxation time computed from the matrix is written on the heatmap. (**B**) Each of the six matrices from Fig. **A** is binarized with a threshold of **A***ij* = 0.005. ![Fig. S23.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F34.medium.gif) [Fig. S23.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F34) Fig. S23. Heat map of the importation-rate matrix **A***ij* for 30 demes of the USA during the Delta wave (epiweek 82-96). The main infection pathways are illustrated in Fig. 6 of the main text. ![Fig. S24.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F35.medium.gif) [Fig. S24.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F35) Fig. S24. Comparison between the actual geography and the multidimensional scaling plot for the state-level importation-rate matrix of the Delta wave in the US. ![Fig. S25.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F36.medium.gif) [Fig. S25.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F36) Fig. S25. (**A**) The number of sequences per week for each variant in England. The numbers [*·*] shown above dates in the horizontal axis indicate epiweeks since Dec/29/2021. (**B**) The weekly number of sequences in each region of England for Alpha, Delta, and Omicron. The numbers [*·*] shown above dates in the horizontal axis indicate epiweeks since Dec/29/2021. ![Fig. S26.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F37.medium.gif) [Fig. S26.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F37) Fig. S26. The number of weekly infected individuals (estimated by surveillance testing (37)) divided by the population size across over time in the regions of England. #### S.9 Construction of demes ##### S.9.1 Construction of demes in England For the deme-resolved analysis of England, presented in Fig. 3, we constructed 50 demes from the upper tier local authorities (UTLAs). This involved clustering within each of the 9 regions of England in the following manner: 1. First, we select the 150 largest UTLAs based on the number *s**u* of metadata sequences reported from each UTLA *u*. 2. In each of the 9 regions, we create a connected graph using Delaunay Triangulation, using the geographic locations of these UTLAs. Edges that cross over the ocean are removed (Fig. S27A). To create 50 demes with similar sizes, we determine the number *n**i* of demes to be assigned to each of the 9 regions using a greedy algorithm: 1. As the initial state, we set *n**i* = 2 for all regions *i*. 2. Assuming that, at an intermediate step of the algorithm, the 9 regions respectively have *n**i* demes. We define the weight of each region as ![Graphic][186], where ![Graphic][187] is the total number of sequences in region *i*. We then increase *n**i* by 1 for the region with the smallest weight. 3. This process is repeated until the sum of *n**i* across all regions reaches the target number of demes, 50. The resulting allocation of demes was *n**i* = 3, 7, 5, 4, 5, 5, 8, 8, 5 for *i* = NE, NW, YH, EM, WM, EE, LDN, SE, SW, respectively. Finally, for each region *i*, we organize *n* demes (where the subscript *i* of *n**i* is omitted for notational simplicity) by grouping the UTLAs in the region into non-overlapping sets *U*1, …, *U**n*: 1. Suppose that region *i* includes *M* UTLAs in the graph, labeled by *u**α* (*α* = 1,…, *M*). By arranging the UTLAs, we may assume that *s*1 ≥ *s*2 ≥ … ≥ *s**M*, where *s**α* is the number of sequences reported from UTLA *u**α*. 2. We initialize the sets *U**k* with the first *n* UTLAs, namely, the largest *n* UTLAs, as *U**k* = {*u**k*} (*k* = 1,…, *n*). 3. For the remaining UTLAs *u**n*+1, …, *u**M*, we add each UTLA one by one to its nearest set on the graph constructed from the Delaunay Triangulation. Here, the distance between UTLA *u* and set *U**k* is defined as ![Graphic][188], where ![Graphic][189] is the graph distance; for example, the distance is 1 for adjacent UTLAs. 4. If the nearest set for UTLA *u* is not unique, we add it to the set with the fewest sequences, to mitigate the imbalance in deme sizes. The resulting demes are shown in Fig. S27B and summarized in Table 1. ![Fig. S27.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/12/05/2024.12.02.24318370/F38.medium.gif) [Fig. S27.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/F38) Fig. S27. (**A**) The nodes represent the top 150 UTLAs with the highest number of sequences in the metadata. Within each region, a graph is constructed using Delaunay triangulation. (**B**) The UTLAs in each region are grouped into *n**i* demes, using the information of the number of sequences. The 50 demes constructed in this manner are shown as connected components. View this table: [Table 1.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/T1) Table 1. List of 50 demes in England. ##### S.9.2 Construction of demes in the US In the US analysis, shown in Fig. 6, we constructed 30 demes using an approach similar to that used in the England analysis: Since the USA has no direct counterpart to the regions of England, we defined 8 regions in the USA for the purpose of the analysis: NE, MA, SA, SC, ENC, WNC, Mt, PAC, as colored in Fig. 6B. We then performed the same clustering as in the England analysis, making the following replacements: 150 UTLAs → 49 states (excluding Hawaii and Alaska) and 50 demes → 30 demes. In addition, instead of using Delaunay Triangulation, we constructed the graph of the US states based on whether they are geographically neighboring. The resulting demes are summarized in Table 2. View this table: [Table 2.](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/T2) Table 2. List of 30 demes in the US. #### S.10 Calendar Dates and Weeks Since December 29, 2019 View this table: [Table3](http://medrxiv.org/content/early/2024/12/05/2024.12.02.24318370/T3) ## ACKNOWLEDGMENTS We are grateful the Hallatschek lab for helpful discussions and feedback. We are grateful to Moritz Kraemer, and Richard Neher for helpful discussions and feedback. TO acknowledges support from JSPS KAKENHI (Grant Numbers JP22K03453 and JP22K06347) and the RIKEN iTHEMS Program. OH acknowledges support by a Humboldt Professorship of the Alexander von Humboldt Foundation. QY acknowledges support from the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1106400. GI acknowledges support of a Humboldt Research Fellowship by the Humboldt Foundation. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231 using NERSC BER-ERCAP0019907. We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based. The authors acknowledge use of data generated through the COVID-19 Genomics Programme funded by the UK Department of Health and Social Care ([https://www.gov.uk/government/organisations/department-of-health-and-social-care](https://www.gov.uk/government/organisations/department-of-health-and-social-care)). We thank the COG-UK consortium and all partners and contributors who are listed at [https://webarchive.nationalarchives.gov.uk/ukgwa/20230507113711/](https://webarchive.nationalarchives.gov.uk/ukgwa/20230507113711/)[https://www.cogconsortium.uk/about/about-us/about-us/](https://www.cogconsortium.uk/about/about-us/about-us/). ## Footnotes * 1 There is an erratum in the corresponding equation in ref. (54). * Received December 2, 2024. * Revision received December 2, 2024. * Accepted December 5, 2024. * © 2024, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1. Dirk Brockmann and Dirk Helbing. The hidden geometry of complex, network-driven contagion phenomena. science, 342(6164):1337–1342, 2013. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNDIvNjE2NC8xMzM3IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTIvMDUvMjAyNC4xMi4wMi4yNDMxODM3MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 2. 2. Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, 87(3):925, 2015. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1103/RevModPhys.87.925&link_type=DOI) 3. 3. Claus O Wilke and Carl T Bergstrom. Predicting an epidemic trajectory is difficult. Proceedings of the National Academy of Sciences, 117(46):28549–28551, 2020. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTE3LzQ2LzI4NTQ5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTIvMDUvMjAyNC4xMi4wMi4yNDMxODM3MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 4. 4. Johannes Bracher, Daniel Wolffram, Jannik Deuschel, Konstantin Görgen, Jakob L Ketterer, Alexander Ullrich, Sam Abbott, Maria V Barbarossa, Dimitris Bertsimas, Sangeeta Bhatia, et al. National and subnational short-term forecasting of covid-19 in germany and poland during early 2021. Communications Medicine, 2(1):1–17, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35603280&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 5. 5. Margaret C Steiner and John Novembre. Population genetic models for the spatial spread of adaptive variants: A review in light of sars-cov-2 evolution. PLoS Genetics, 18(9):e1010391, 2022. 6. 6. Samuel V Scarpino and Giovanni Petri. On the predictability of infectious disease outbreaks. Nature communications, 10(1):898, 2019. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30796206&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 7. 7. Serina Chang, Emma Pierson, Pang Wei Koh, Jaline Gerardin, Beth Redbird, David Grusky, and Jure Leskovec. Mobility network models of covid-19 explain inequities and inform reopening. Nature, 589(7840):82–87, 2021. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2923-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33171481&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 8. 8. Zachary Susswein, Eugenio Valdano, Tobias Brett, Pejman Rohani, Vittoria Colizza, and Shweta Bansal. Ignoring spatial heterogeneity in drivers of sars-cov-2 transmission in the us will impede sustained elimination. medRxiv, 2021. 9. 9. Piero Birello, Michele Re Fiorentin, Boxuan Wang, Vittoria Colizza, and Eugenio Valdano. Estimates of the reproduction ratio from epidemic surveillance may be biased in spatially structured populations. Nature Physics, pages 1–7, 2024. 10. 10. Roy M Anderson and Robert M May. Infectious diseases of humans: dynamics and control. Oxford university press, 1992. 11. 11. Alun L Lloyd and Robert M May. Spatial heterogeneity in epidemic models. Journal of theoretical biology, 179(1):1–11, 1996. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1006/jtbi.1996.0042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8733427&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1996UF55200001&link_type=ISI) 12. 12. Matt J Keeling and Ken TD Eames. Networks and epidemic models. Journal of the royal society interface, 2(4):295–307, 2005. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16849187&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 13. 13. Frank Schlosser, Benjamin F Maier, Olivia Jack, David Hinrichs, Adrian Zachariae, and Dirk Brockmann. Covid-19 lockdown induces disease-mitigating structural changes in mobility networks. Proceedings of the National Academy of Sciences, 117(52):32883–32890, 2020. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTE3LzUyLzMyODgzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTIvMDUvMjAyNC4xMi4wMi4yNDMxODM3MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 14. 14. Alberto Aleta, David Martin-Corral, Ana Pastore y Piontti, Marco Ajelli, Maria Litvinova, Matteo Chinazzi, Natalie E Dean, M Elizabeth Halloran, Ira M Longini Jr, Stefano Merler, et al. Modelling the impact of testing, contact tracing and household quarantine on second waves of covid-19. Nature Human Behaviour, 4(9):964–971, 2020. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32759985&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 15. 15. Emma B Hodcroft, Moira Zuber, Sarah Nadeau, Timothy G Vaughan, Katharine HD Crawford, Christian L Althaus, Martina L Reichmuth, John E Bowen, Alexandra C Walls, Davide Corti, et al. Spread of a sars-cov-2 variant through europe in the summer of 2020. Nature, 595(7869):707–712, 2021. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1101/2020.10.25.20219063&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34098568&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 16. 16. Joël Mossong, Niel Hens, Mark Jit, Philippe Beutels, Kari Auranen, Rafael Mikolajczyk, Marco Massari, Stefania Salmaso, Gianpaolo Scalia Tomba, Jacco Wallinga, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS medicine, 5(3):e74, 2008. 17. 17. Kiesha Prem, Kevin van Zandvoort, Petra Klepac, Rosalind M Eggo, Nicholas G Davies, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group, Alex R Cook, and Mark Jit. Projecting contact matrices in 177 geographical regions: an update and comparison with empirical data for the covid-19 era. PLoS computational biology, 17(7):e1009098, 2021. 18. 18. Stella Talic, Shivangi Shah, Holly Wild, Danijela Gasevic, Ashika Maharaj, Zanfina Ademi, Xue Li, Wei Xu, Ines Mesa-Eguiagaray, Jasmin Rostron, et al. Effectiveness of public health measures in reducing the incidence of covid-19, sars-cov-2 transmission, and covid-19 mortality: systematic review and meta-analysis. bmj, 375, 2021. 19. 19. Claire Johnston, Harriet Hughes, Sion Lingard, Stephen Hailey, and Brendan Healy. Immunity and infectivity in covid-19. bmj, 378, 2022. 20. 20. Nishant Kishore, Aimee R Taylor, Pierre E Jacob, Navin Vembar, Ted Cohen, Caroline O Buckee, and Nicolas A Menzies. Evaluating the reliability of mobility metrics from aggregated mobile phone data as proxies for sars-cov-2 transmission in the usa: a population-based study. The Lancet Digital Health, 4(1):e27–e36, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34740555&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 21. 21. Philippe Lemey, Andrew Rambaut, Alexei J Drummond, and Marc A Suchard. Bayesian phylogeography finds its roots. PLoS computational biology, 5(9):e1000520, 2009. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19779555&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 22. 22. Philippe Lemey, Andrew Rambaut, John J Welch, and Marc A Suchard. Phylogeography takes a relaxed random walk in continuous space and time. Molecular biology and evolution, 27(8):1877–1885, 2010. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/molbev/msq067&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20203288&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000280296700015&link_type=ISI) 23. 23. Colin A Russell, Terry C Jones, Ian G Barr, Nancy J Cox, Rebecca J Garten, Vicky Gregory, Ian D Gust, Alan W Hampson, Alan J Hay, Aeron C Hurt, et al. The global circulation of seasonal influenza a (h3n2) viruses. Science, 320(5874):340–346, 2008. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzMjAvNTg3NC8zNDAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMi8wNS8yMDI0LjEyLjAyLjI0MzE4MzcwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 24. 24. Noémie Lefrancq, Valérie Bouchez, Nadia Fernandes, Alex-Mikael Barkoff, Thijs Bosch, Tine Dalby, Thomas Åkerlund, Jessica Darenberg, Katerina Fabianova, Didrik F Vestrheim, et al. Global spatial dynamics and vaccine-induced fitness changes of bordetella pertussis. Science Translational Medicine, 14(642):eabn3253, 2022. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1126/scitranslmed.abn3253&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35476597&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 25. 25. Elisa Chao, Connor Chato, Reid Vender, Abayomi S Olabode, Roux-Cil Ferreira, and Art FY Poon. Molecular source attribution. PLOS Computational Biology, 18(11):e1010649, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36395093&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 26. 26. Moritz UG Kraemer, Verity Hill, Christopher Ruis, Simon Dellicour, Sumali Bajaj, John T McCrone, Guy Baele, Kris V Parag, Anya Lindström Battle, Bernardo Gutierrez, et al. Spatiotemporal invasion dynamics of sars-cov-2 lineage b.1.1.7 emergence. Science, 373 (6557):889–895, 2021. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNzMvNjU1Ny84ODkiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMi8wNS8yMDI0LjEyLjAyLjI0MzE4MzcwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 27. 27. John T McCrone, Verity Hill, Sumali Bajaj, Rosario Evans Pena, Ben C Lambert, Rhys Inward, Samir Bhatt, Erik Volz, Christopher Ruis, Simon Dellicour, et al. Context-specific emergence and growth of the sars-cov-2 delta variant. Nature, 610(7930):154–160, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35952712&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 28. 28.The COVID-19 Genomics UK (COG-UK) consortium. An integrated national scale SARS-CoV-2 genomic surveillance network. The Lancet Microbe, 1(3):e99, 2020. doi:10.1016/S2666-5247(20)30054-9. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S2666-5247(20)30054-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32835336&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 29. 29. Yuelong Shu and John McCauley. Gisaid: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance, 22(13):30494, 2017. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28382917&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 30. 30. Rudolf E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960. doi: 10.1115/1.3662552. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1115/1.3662552&link_type=DOI) 31. 31. QinQin Yu, Joao A Ascensao, Takashi Okada, COVID-19 Genomics UK (COG-UK) Consortium, Olivia Boyd, Erik Volz, and Oskar Hallatschek. Lineage frequency time series reveal elevated levels of genetic drift in sars-cov-2 transmission in england. Plos Pathogens, 20 (4):e1012090, 2024. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=38620033&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 32. 32. Sewall Wright. Isolation by distance. Genetics, 28(2):114, 1943. [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo4OiJnZW5ldGljcyI7czo1OiJyZXNpZCI7czo4OiIyOC8yLzExNCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI0LzEyLzA1LzIwMjQuMTIuMDIuMjQzMTgzNzAuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 33. 33. Pierre Barrat-Charlaix, John Huddleston, Trevor Bedford, and Richard A Neher. Limited predictability of amino acid substitutions in seasonal influenza viruses. Molecular Biology and Evolution, 38(7):2767–2777, 2021. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33749787&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 34. 34. William S Hart, Sam Abbott, Akira Endo, Joel Hellewell, Elizabeth Miller, Nick Andrews, Philip K Maini, Sebastian Funk, and Robin N Thompson. Inference of the sars-cov-2 generation time using uk household data. ELife, 11:e70767, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35138250&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 35. 35. William S Hart, Elizabeth Miller, Nick J Andrews, Pauline Waight, Philip K Maini, Sebastian Funk, and Robin N Thompson. Generation time of the alpha and delta sars-cov-2 variants: an epidemiological analysis. The Lancet Infectious Diseases, 22(5):603–610, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35176230&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 36. 36. Al Mead. Review of the development of multidimensional scaling methods. Journal of the Royal Statistical Society: Series D (The Statistician), 41(1):27–39, 1992. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2348634&link_type=DOI) 37. 37.UK Office for National Statistics. Coronavirus (COVID-19) Infection Survey: England. [https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/coronaviruscovid19infectionsurveydata](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/coronaviruscovid19infectionsurveydata). Accessed: 2021-12-10. 38. 38. Seema S Lakdawala and Vineet D Menachery. Catch me if you can: superspreading of COVID-19. Trends in Microbiology, 29(10):919–929, 2021. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34059436&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 39. 39. Benjamin M Althouse, Edward A Wenger, Joel C Miller, Samuel V Scarpino, Antoine Allard, Laurent Hébert-Dufresne, and Hao Hu. Superspreading events in the transmission dynamics of SARS-CoV-2: Opportunities for interventions and control. PLoS Biology, 18 (11):e3000897, 2020. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pbio.3000897&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33180773&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 40. 40. Ashish Goyal, Daniel B Reeves, E Fabian Cardozo-Ojeda, Joshua T Schiffer, and Bryan T Mayer. Viral load and contact heterogeneity predict SARS-CoV-2 transmission and superspreading events. eLife, 10:e63537, 2021. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33620317&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 41. 41. Erik Volz, Swapnil Mishra, Meera Chand, Jeffrey C Barrett, Robert Johnson, Lily Geidelberg, Wes R Hinsley, Daniel J Laydon, Gavin Dabrera, Áine O’Toole, et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature, 593(7858):266–269, 2021. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-021-03470-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33767447&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 42. 42. Manon Ragonnet-Cronin, Olivia Boyd, Lily Geidelberg, David Jorgensen, Fabricia F Nascimento, Igor Siveroni, Robert A Johnson, Marc Baguelin, Zulma M Cucunubá, Elita Jauneikaite, et al. Genetic evidence for the association between COVID-19 epidemic severity and timing of non-pharmaceutical interventions. Nature Communications, 12(1):1–7, 2021. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-020-20314-w&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=33397941&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 43. 43. Louis Du Plessis, John T McCrone, Alexander E Zarebski, Verity Hill, Christopher Ruis, Bernardo Gutierrez, Jayna Raghwani, Jordan Ashworth, Rachel Colquhoun, Thomas R Connor, et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science, 371(6530):708–712, 2021. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNzEvNjUzMC83MDgiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8xMi8wNS8yMDI0LjEyLjAyLjI0MzE4MzcwLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 44. 44.Safegraph social distancing metrics. [https://docs.safegraph.com/docs/social-distancing-metrics](https://docs.safegraph.com/docs/social-distancing-metrics). 2021. 45. 45. Nicholas H Barton and Alison M Etheridge. The relation between reproductive value and genetic contribution. Genetics, 188(4):953–973, 2011. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZ2VuZXRpY3MiO3M6NToicmVzaWQiO3M6OToiMTg4LzQvOTUzIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjQvMTIvMDUvMjAyNC4xMi4wMi4yNDMxODM3MC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 46. 46. António MM Rodrigues and Andy Gardner. Reproductive value and the evolution of altruism. Trends in ecology & evolution, 37(4):346–358, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34949484&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 47. 47. Joao A Ascensao, Kelly M Wetmore, Benjamin H Good, Adam P Arkin, and Oskar Hallatschek. Quantifying the local adaptive landscape of a nascent bacterial community. Nature Communications, 14(1):248, 2023. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36646697&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 48. 48. Pavel Sagulenko, Vadim Puller, and Richard A Neher. Treetime: Maximum-likelihood phylodynamic analysis. Virus evolution, 4(1):vex042, 2018. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ve/vex042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29340210&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 49. 49.UK Health Security Agency. Sars-cov-2 variants of concern and variants under investigation in england: Technical briefing 26, 2021. 50. 50. Mohammad Alkhatib, Romina Salpini, Luca Carioti, Francesca Alessandra Ambrosio, Stefano D’Anna, Leonardo Duca, Giosuè Costa Maria Concetta Bellocchi, Lorenzo Piermatteo, Anna Artese, et al. Update on sars-cov-2 omicron variant of concern and its peculiar mutational profile. Microbiology Spectrum, 10(2):e02732–21, 2022. 51. 51. Áine O’Toole, Emily Scher, Anthony Underwood, Ben Jackson, Verity Hill, John T McCrone, Rachel Colquhoun, Chris Ruis, Khalil Abu-Dahab, Ben Taylor, et al. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evolution, 7(2): veab064, 2021. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ve/veab064&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34527285&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 52. 52. Andrew Rambaut, Edward C Holmes, Áine O’Toole, Verity Hill, John T McCrone, Christopher Ruis, Louis du Plessis, and Oliver G Pybus. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature microbiology, 5(11):1403– 1407, 2020. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41564-020-0770-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32669681&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 53. 53. Guillaume Le Treut, Greg Huber, Mason Kamb, Kyle Kawagoe, Aaron McGeever, Jonathan Miller, Reuven Pnini, Boris Veytsman, and David Yllanes. A high-resolution flux-matrix model describes the spread of diseases in a spatial network and the effect of mitigation strategies. Scientific Reports, 12(1):15946, 2022. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36153391&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 54. 54. Christopher M Bishop. Pattern recognition and machine learning, volume 4. Springer, 2006. 55. 55.Frank Noé. Probability distributions of molecular observables computed from markov models. The Journal of chemical physics, 128(24), 2008. 56. 56. Herbert W Hethcote. An immunization model for a heterogeneous population. Theoretical population biology, 14(3):338–349, 1978. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0040-5809(78)90011-4&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=751264&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1978GL07800002&link_type=ISI) 57. 57. WM Post, DL DeAngelis, and CC Travis. Endemic disease in environments with spatially heterogeneous host populations. Mathematical Biosciences, 63(2):289–302, 1983. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0025-5564(82)90044-X&link_type=DOI) 58. 58. Hamish McCallum, Nigel Barlow, and Jim Hone. How should pathogen transmission be modelled? Trends in ecology & evolution, 16(6):295–300, 2001. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0169-5347(01)02144-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11369107&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) 59. 59. Danielle Miller, Michael A Martin, Noam Harel, Omer Tirosh, Talia Kustin, Moran Meir, Nadav Sorek, Shiraz Gefen-Halevi, Sharon Amit, Olesya Vorontsov, et al. Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within Israel. Nature Communications, 11(1):1–10, 2020. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-019-13993-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31911652&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F12%2F05%2F2024.12.02.24318370.atom) [1]: /embed/graphic-2.gif [2]: /embed/inline-graphic-1.gif [3]: /embed/inline-graphic-2.gif [4]: /embed/graphic-8.gif [5]: /embed/graphic-9.gif [6]: /embed/inline-graphic-3.gif [7]: /embed/graphic-11.gif [8]: /embed/graphic-12.gif [9]: F8/embed/inline-graphic-4.gif [10]: F8/embed/inline-graphic-5.gif [11]: F8/embed/inline-graphic-6.gif [12]: F8/embed/inline-graphic-7.gif [13]: /embed/graphic-14.gif [14]: /embed/inline-graphic-8.gif [15]: /embed/graphic-15.gif [16]: /embed/graphic-16.gif [17]: F9/embed/inline-graphic-9.gif [18]: F9/embed/inline-graphic-10.gif [19]: F9/embed/inline-graphic-11.gif [20]: /embed/graphic-19.gif [21]: F12/embed/inline-graphic-12.gif [22]: F12/embed/inline-graphic-13.gif [23]: F12/embed/inline-graphic-14.gif [24]: F12/embed/inline-graphic-15.gif [25]: F12/embed/inline-graphic-16.gif [26]: F12/embed/inline-graphic-17.gif [27]: /embed/graphic-23.gif [28]: /embed/inline-graphic-18.gif [29]: /embed/graphic-25.gif [30]: /embed/graphic-26.gif [31]: /embed/inline-graphic-19.gif [32]: /embed/graphic-27.gif [33]: /embed/inline-graphic-20.gif [34]: /embed/graphic-28.gif [35]: /embed/inline-graphic-21.gif [36]: /embed/graphic-29.gif [37]: /embed/graphic-30.gif [38]: /embed/graphic-31.gif [39]: /embed/graphic-32.gif [40]: /embed/graphic-34.gif [41]: /embed/graphic-35.gif [42]: /embed/graphic-36.gif [43]: /embed/graphic-37.gif [44]: /embed/inline-graphic-22.gif [45]: /embed/graphic-38.gif [46]: /embed/graphic-39.gif [47]: /embed/inline-graphic-23.gif [48]: /embed/graphic-41.gif [49]: /embed/inline-graphic-24.gif [50]: /embed/inline-graphic-25.gif [51]: /embed/inline-graphic-26.gif [52]: /embed/graphic-42.gif [53]: /embed/inline-graphic-27.gif [54]: /embed/inline-graphic-28.gif [55]: /embed/inline-graphic-29.gif [56]: /embed/inline-graphic-30.gif [57]: /embed/inline-graphic-31.gif [58]: /embed/inline-graphic-32.gif [59]: /embed/inline-graphic-33.gif [60]: /embed/inline-graphic-34.gif [61]: /embed/inline-graphic-35.gif [62]: /embed/inline-graphic-36.gif [63]: /embed/graphic-44.gif [64]: /embed/graphic-45.gif [65]: /embed/graphic-46.gif [66]: /embed/inline-graphic-37.gif [67]: /embed/inline-graphic-38.gif [68]: /embed/graphic-47.gif [69]: /embed/graphic-48.gif [70]: /embed/inline-graphic-39.gif [71]: /embed/graphic-49.gif [72]: /embed/graphic-50.gif [73]: /embed/graphic-51.gif [74]: /embed/inline-graphic-40.gif [75]: /embed/inline-graphic-41.gif [76]: /embed/inline-graphic-42.gif [77]: /embed/inline-graphic-43.gif [78]: F18/embed/inline-graphic-44.gif [79]: /embed/inline-graphic-45.gif [80]: /embed/inline-graphic-46.gif [81]: /embed/inline-graphic-47.gif [82]: /embed/inline-graphic-48.gif [83]: /embed/inline-graphic-49.gif [84]: /embed/inline-graphic-50.gif [85]: /embed/graphic-57.gif [86]: /embed/inline-graphic-51.gif [87]: /embed/inline-graphic-52.gif [88]: /embed/graphic-58.gif [89]: /embed/graphic-59.gif [90]: /embed/inline-graphic-53.gif [91]: /embed/inline-graphic-54.gif [92]: /embed/inline-graphic-55.gif [93]: /embed/graphic-60.gif [94]: /embed/inline-graphic-56.gif [95]: /embed/inline-graphic-57.gif [96]: /embed/inline-graphic-58.gif [97]: /embed/inline-graphic-59.gif [98]: /embed/graphic-62.gif [99]: /embed/inline-graphic-60.gif [100]: /embed/graphic-65.gif [101]: /embed/graphic-66.gif [102]: /embed/graphic-67.gif [103]: /embed/inline-graphic-61.gif [104]: /embed/inline-graphic-62.gif [105]: /embed/inline-graphic-63.gif [106]: /embed/inline-graphic-64.gif [107]: /embed/graphic-68.gif [108]: /embed/inline-graphic-65.gif [109]: /embed/inline-graphic-66.gif [110]: /embed/graphic-69.gif [111]: /embed/graphic-70.gif [112]: /embed/inline-graphic-67.gif [113]: /embed/inline-graphic-68.gif [114]: /embed/graphic-71.gif [115]: /embed/inline-graphic-69.gif [116]: /embed/graphic-72.gif [117]: /embed/graphic-73.gif [118]: /embed/graphic-74.gif [119]: /embed/graphic-75.gif [120]: /embed/graphic-76.gif [121]: /embed/inline-graphic-70.gif [122]: /embed/inline-graphic-71.gif [123]: /embed/inline-graphic-72.gif [124]: /embed/inline-graphic-73.gif [125]: /embed/inline-graphic-74.gif [126]: /embed/inline-graphic-75.gif [127]: /embed/graphic-77.gif [128]: /embed/inline-graphic-76.gif [129]: /embed/graphic-78.gif [130]: /embed/inline-graphic-77.gif [131]: /embed/graphic-79.gif [132]: /embed/inline-graphic-78.gif [133]: /embed/graphic-80.gif [134]: /embed/graphic-81.gif [135]: /embed/graphic-82.gif [136]: /embed/graphic-83.gif [137]: /embed/graphic-84.gif [138]: /embed/graphic-85.gif [139]: /embed/inline-graphic-79.gif [140]: /embed/inline-graphic-80.gif [141]: /embed/inline-graphic-81.gif [142]: /embed/inline-graphic-82.gif [143]: /embed/inline-graphic-83.gif [144]: /embed/graphic-86.gif [145]: /embed/graphic-87.gif [146]: /embed/graphic-88.gif [147]: /embed/inline-graphic-84.gif [148]: /embed/inline-graphic-85.gif [149]: /embed/inline-graphic-86.gif [150]: /embed/inline-graphic-87.gif [151]: /embed/inline-graphic-88.gif [152]: /embed/inline-graphic-89.gif [153]: /embed/graphic-89.gif [154]: /embed/inline-graphic-90.gif [155]: /embed/inline-graphic-91.gif [156]: /embed/inline-graphic-92.gif [157]: /embed/inline-graphic-93.gif [158]: /embed/inline-graphic-94.gif [159]: /embed/inline-graphic-95.gif [160]: /embed/inline-graphic-96.gif [161]: /embed/inline-graphic-97.gif [162]: /embed/inline-graphic-98.gif [163]: /embed/graphic-90.gif [164]: /embed/inline-graphic-99.gif [165]: /embed/inline-graphic-100.gif [166]: /embed/inline-graphic-101.gif [167]: /embed/inline-graphic-102.gif [168]: /embed/inline-graphic-103.gif [169]: /embed/inline-graphic-104.gif [170]: /embed/graphic-91.gif [171]: /embed/inline-graphic-105.gif [172]: /embed/inline-graphic-106.gif [173]: /embed/graphic-92.gif [174]: /embed/graphic-94.gif [175]: /embed/inline-graphic-107.gif [176]: /embed/inline-graphic-108.gif [177]: /embed/graphic-95.gif [178]: /embed/inline-graphic-109.gif [179]: /embed/inline-graphic-110.gif [180]: /embed/inline-graphic-111.gif [181]: /embed/inline-graphic-112.gif [182]: F27/embed/inline-graphic-113.gif [183]: /embed/inline-graphic-114.gif [184]: F28/embed/inline-graphic-115.gif [185]: F28/embed/inline-graphic-116.gif [186]: /embed/inline-graphic-117.gif [187]: /embed/inline-graphic-118.gif [188]: /embed/inline-graphic-119.gif [189]: /embed/inline-graphic-120.gif