Abstract
The COVID-19 pandemic has underscored the critical need for accurate epidemic forecasting to predict pathogen spread and evolution, anticipate healthcare challenges, and evaluate intervention strategies. The reliability of these forecasts hinges on detailed knowledge of disease transmission across different population segments, which may be inferred from within-community transmission rates via proxy data, such as contact surveys and mobility data. However, these approaches are indirect, making it difficult to accurately estimate rare transmissions between socially or geographically distant communities. We show that the steep ramp-up of genome sequencing surveillance during the pandemic can be leveraged to directly identify transmission patterns between communities. Specifically, our approach uses a hidden Markov model to infer the fraction of infections a community imports from other communities based on how rapidly the allele frequencies in the focal community converge to those in the donor communities. Applying this method to SARS-CoV-2 sequencing data from England and the U.S., we uncover networks of inter-community disease transmission that, while broadly reflecting geographical relationships, also expose epidemiologically significant long-range interactions. We provide evidence that transmission between regions can substantially change between waves of variants of concern, both in magnitude and direction, and analyze how the inferred plasticity and heterogeneity in inter-community transmission impact evolutionary forecasts. Overall, our study highlights population genomic time series data as a crucial record of epidemiological interactions, which can be deciphered using tree-free inference methods.
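The link between import fraction and allele-frequency convergence can be illustrated with a toy calculation. The sketch below is not the hidden Markov model used in the study: it assumes a single donor community, neutral alleles, and noise-free frequencies, so that the focal frequency obeys x_f(t+1) = (1 - m) x_f(t) + m x_d(t), where m is the imported fraction per generation. The function names `simulate_focal` and `estimate_import_fraction` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_focal(x_donor, x0, m):
    """Focal allele frequency when a fraction m of each generation is imported.

    Toy assumption: x_f(t+1) = (1 - m) * x_f(t) + m * x_d(t), with no sampling noise.
    """
    x = np.empty(len(x_donor))
    x[0] = x0
    for t in range(len(x_donor) - 1):
        x[t + 1] = (1.0 - m) * x[t] + m * x_donor[t]
    return x

def estimate_import_fraction(x_focal, x_donor):
    """Least-squares estimate of m from x_f(t+1) - x_f(t) = m * (x_d(t) - x_f(t))."""
    gap = x_donor[:-1] - x_focal[:-1]   # distance to the donor frequency
    step = np.diff(x_focal)             # observed change in the focal community
    return float(np.dot(gap, step) / np.dot(gap, gap))

# Hypothetical donor trajectory and a focal community that starts far from it.
x_donor = np.linspace(0.8, 0.6, 30)
x_focal = simulate_focal(x_donor, x0=0.1, m=0.15)

m_hat = estimate_import_fraction(x_focal, x_donor)
print(m_hat)
```

In this idealized setting the regression recovers the simulated import fraction up to floating-point error. With real sequencing data, finite sample sizes and genetic drift add noise to the observed frequencies, which is the kind of uncertainty a hidden Markov model is designed to handle.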
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Humboldt Professorship of the Alexander von Humboldt Foundation (to OH). JSPS KAKENHI (Grant Numbers JP22K03453 and JP22K06347) and the RIKEN iTHEMS Program (to TO). QY acknowledges support from the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1106400. GI acknowledges support of a Humboldt Research Fellowship by the Humboldt Foundation. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231 using NERSC BER-ERCAP0019907.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used (or will use) ONLY openly available human data that are accessible through the GISAID platform and the COG-UK consortium.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced are available online at https://github.com/Hallatscheklab/NetworkInfer