Abstract
Genomic surveillance data are used to detect communicable disease clusters, typically by applying rule-based signaling criteria, which can be arbitrary. We applied the prospective tree-temporal scan statistic (TreeScan) to genomic data with a hierarchical nomenclature to search for recent case increases at any granularity, from large phylogenetic branches to small groups of indistinguishable isolates. Using COVID-19 and salmonellosis cases diagnosed among New York City (NYC) residents and reported to the NYC Health Department, we conducted weekly analyses to detect emerging SARS-CoV-2 variants based on Pango lineages and clusters of Salmonella isolates based on allele codes. The SARS-CoV-2 Omicron subvariant EG.5.1 first signaled as locally emerging on June 22, 2023, seven weeks before the World Health Organization designated it as a variant of interest. During one year of salmonellosis analyses, TreeScan detected fifteen credible clusters worth investigating for common exposures and two data quality issues for correction. A challenge was maintaining timely and specific lineage assignments, and a limitation was that genetic distances between tree nodes were not considered. By automatically sifting through genomic data and generating ranked shortlists of nodes with statistically unusual recent case increases, TreeScan assisted in detecting emerging communicable disease clusters and in prioritizing them for investigation.
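The abstract describes scanning a hierarchical nomenclature for recent case increases at every level of the tree. The sketch below shows one way that idea could work. It is not TreeScan's implementation, and it makes several assumptions. Codes are treated as hypothetical dot-delimited names such as "A.1.2", so each case also counts toward every ancestor node. Real Pango aliases would need expanding first. Days run from 1 to T, and every window ends on the last study day, which makes the scan prospective. Each node is scored with a log-likelihood ratio of the form c·ln(c/E) + (n−c)·ln((n−c)/(n−E)), where E = n·d/T under a null in which cases are uniform in time. This form is assumed here for a scan conditioned on each node's total. The function names are illustrative. Monte Carlo significance testing, which TreeScan uses to judge whether a top score is unusual, is omitted.

```python
import math
from collections import defaultdict


def ancestors(code):
    """Return every node on the path from the root to `code`.

    Assumes dot-delimited hierarchical codes, e.g. "A.1.2" -> ["A", "A.1", "A.1.2"].
    """
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def node_counts(cases, study_days, window_days):
    """Count cases per node, rolling each case up to all of its ancestors.

    cases: iterable of (code, day) pairs, with day in 1..study_days.
    Returns (total cases per node, cases per node in the last `window_days` days).
    """
    total = defaultdict(int)
    recent = defaultdict(int)
    for code, day in cases:
        for node in ancestors(code):
            total[node] += 1
            if day > study_days - window_days:
                recent[node] += 1
    return total, recent


def log_likelihood_ratio(c, n, expected):
    """LLR for observing c of a node's n cases in the window when `expected` were expected.

    Only excesses are scored; a node with c <= expected gets 0.
    """
    if c <= expected:
        return 0.0
    value = c * math.log(c / expected)
    if n > c:
        value += (n - c) * math.log((n - c) / (n - expected))
    return value


def scan(cases, study_days, max_window):
    """Prospective scan: every candidate window ends on the last study day.

    Returns (node, (best LLR, best window length)) pairs ranked by LLR.
    """
    best = {}
    for window in range(1, max_window + 1):
        total, recent = node_counts(cases, study_days, window)
        for node, n in total.items():
            # Uniform null: each of the node's n cases is equally likely on any day.
            expected = n * window / study_days
            score = log_likelihood_ratio(recent[node], n, expected)
            if node not in best or score > best[node][0]:
                best[node] = (score, window)
    return sorted(best.items(), key=lambda item: item[1][0], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: lineage A.1 is steady; B.1 jumps in the last four days.
    cases = [("A.1", d) for d in range(1, 29)]
    cases += [("B.1", 1), ("B.1", 2)]
    cases += [("B.1", d) for d in (25, 25, 26, 26, 27, 27, 28, 28)]
    for node, (score, window) in scan(cases, study_days=28, max_window=7)[:3]:
        print(node, round(score, 2), window)
```

In this toy example, B and B.1 receive the same score because B.1 is B's only descendant. The steady lineage A.1 and its parent A score zero. The ranked list mirrors the "shortlist of nodes" idea in the abstract. Deciding which scores are statistically unusual would still require the omitted simulation step.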
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by the U.S. Centers for Disease Control and Prevention (NU90TP922035-05, NU50CK000517-01-09, NU50CK000517-05-00).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board of the NYC Health Department gave ethical approval for this work. The IRB determined this activity meets the definition of public health surveillance as set forth under 45 CFR 46.102(l)(2).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data, software, and code are available at the links below: (1) SARS-CoV-2 variant data for New York City residents; (2) allele codes for Salmonella isolates (available to CDC partners via SEDRIC); (3) SAS code for generating TreeScan input files; (4) TreeScan software, available as a free download; and (5) TreeScan source code.
https://github.com/nychealth/coronavirus-data/tree/master/variants
https://www.cdc.gov/foodborne-outbreaks/php/foodsafety/tools/
https://github.com/CityOfNewYork/communicable-disease-surveillance-nycdohmh