Abstract
New York City’s Health Department developed a SARS-CoV-2 percent test positivity cluster detection system using census tract resolution and the SaTScan prospective space-time scan statistic. One cluster led to identifying a gathering with inadequate social distancing where viral transmission likely occurred, and another cluster prompted targeted community testing and outreach.
Spatiotemporal analysis of high resolution COVID-19 data can support local health officials to monitor disease spread and target interventions (1,2). Publicly available data have been used to detect COVID-19 space-time clusters at county and daily resolution across the US (3,4) and purely spatial clusters at ZIP code resolution in New York City (NYC) (5).
For routine public health surveillance, the NYC Department of Health and Mental Hygiene (DOHMH) uses the case-only space-time permutation scan statistic (6) in SaTScan1 to detect new outbreaks of reportable diseases (7) (e.g., Legionnaires’ disease (8) and salmonellosis (9)). Given wide variability in SARS-CoV-2 testing across space and time, case-only analyses could be poorly suited for COVID-19 monitoring, as true differences in disease rates would be indistinguishable from changes in testing rates across space and time. Moreover, we sought to detect newly emerging or re-emerging hotspots during an ongoing epidemic, which is more challenging than detecting a newly emerging outbreak in the context of minimal or stable disease incidence.
A new approach was needed to detect areas where COVID-19 diagnoses were increasing or not decreasing as quickly relative to other parts of the city. We launched the system on June 11, 2020 to detect community-based clusters of increased SARS-CoV-2 test positivity in near-real time at census tract resolution in NYC, accounting for testing variability.
The Study
Clinical and commercial laboratories are required to report all results (including positive, negative, and indeterminate results) for SARS-CoV-2 PCR tests for New York State residents to the New York State Electronic Clinical Laboratory Reporting System (ECLRS) (10). For NYC residents, ECLRS transmits reports to DOHMH. Laboratory reports include specimen collection date and patient demographics, including residential address. Patient symptoms and illness onset date, if any, are obtained through patient interviews, although not all patients are interviewed.
To detect emerging clusters, the space-time scan statistic uses a cylinder where the circular base covers a geographical area and the height corresponds to time (11). This cylinder is moved, or “scanned,” over both space and time to cover different areas and time periods. At each position, the number of cases inside the cylinder is compared with the expected count under the null hypothesis of no clusters using a likelihood function, and the position with the maximum likelihood is the primary candidate for a cluster. The statistical significance of this cluster is then evaluated, adjusting for the multiple testing inherent in the many cylinder positions evaluated.
To quickly detect emerging hotspots, prospective analyses are conducted daily (12). To adjust for the multiple testing stemming from daily analyses, recurrence intervals are used instead of p-values (13). A recurrence interval of D days means that under the null hypothesis, if we conduct the analysis repeatedly over D days, then the expected number of clusters of the same or larger magnitude is one.
The space-time scan statistic can be utilized with different probability models. We used the Poisson model (11), where the number of cases is distributed according to the Poisson probability model, with an expected count proportional to the number of persons tested. Analyses were adjusted non-parametrically for purely geographical variations that were consistent over time, as the goal was to detect newly emerging hotspots. Fitting a log-linear function, we also adjusted for citywide temporal trends in percent positivity, as the goal was to detect local hotspots rather than general citywide trends. To prioritize quickly emerging clusters to identify epidemiologic linkages, we used a short maximum temporal window of 7 days. As we wished to also detect sustained clusters to inform place-based resource allocation, we additionally ran analyses using a maximum temporal window of 21 days.
We developed SAS code (SAS Institute, Inc., Cary, NC, USA) that generated input and parameter files (Table 1, Technical Appendix Table 1), invoked SaTScan in batch mode, read analysis results back into SAS for further processing, and output files to secured folders, including a patient linelist, visualizations, and investigator notification email. Similar SAS code referencing markedly different input parameters is freely available.2 Parameters were adjusted during the study period to improve signal prioritization, including increasing the minimum number of cases in clusters from 2 to 5 cases.
Given changing cluster definitions and investigation protocols during the study period, instead of summarizing all clusters detected, we present three illustrative examples (Table 2). On June 22, in the context of waning case counts citywide, the system detected a not statistically significant cluster of 6 patients residing in a 0.6-kilometer radius, all with specimens collected on June 17 (Figure 1A). DOHMH staff interviewed patients for common exposures, such as attending the same event or visiting the same location. On June 23, a DOHMH surveillance investigator (D.B.) determined that two patients in the cluster had attended the same gathering, where recommended social distancing practices had not been observed. In response, DOHMH launched an effort to limit further transmission, including testing, contact tracing, community engagement, and health education emphasizing the importance of isolation and quarantine. Investigation of patients in the cluster with the highest recurrence interval (323 days) during the study period did not reveal any epidemiologic linkages. Detection of a sustained cluster (lasting >1 week) with high percent positivity (8.9%) (Figure 1B) supported the selection of one ZIP code for targeted testing and outreach, as part of NYC’s hyper-local plan to prevent COVID-19 transmission (14).
Conclusions
Automated spatiotemporal cluster detection analyses detected emerging, highly focused areas to target COVID-19 containment efforts in NYC. For jurisdictions where case investigation capacity is limited, interviewing patients in clusters, even if not statistically significant, can help identify common exposures for interrupting further transmission. Regardless of whether epidemiologic linkages are identified — or whether overall transmission is increasing, decreasing, or steady — clusters can be used to prioritize resources and focus interventions such as promoting increased testing, public messaging, and community engagement activities.
Our system is subject to several limitations. First, analyses were based on specimen collection date, but given delays in testing availability and care seeking, these dates did not necessarily represent recent infections. Timeliness was further limited by delays from specimen collection to laboratory testing and reporting. Clusters dominated by asymptomatic patients or patients with illness onset >14 days prior to diagnosis may not require intervention, as a positive PCR result indicates the presence of viral RNA but not necessarily viable virus (15). Second, geocoding is required for precision, and of unique NYC residents with a specimen collected during June–July 2020 for a PCR test for SARS-CoV-2 RNA, approximately 3% had a non-geocodable residential address and were excluded from analyses. Finally, automation coding was complex (Technical Appendix). Planned SaTScan software enhancements that will facilitate wider adoption by other health departments include: adding a software interface for prospective surveillance, enabling temporal and spatial adjustments for the Bernoulli probability model, and enabling the log-linear temporal trend adjustment with automatically calculated trend at a sub-annual scale.
Our COVID-19 early detection system has highlighted areas in NYC warranting a rapid response. Such local targeted, place-based approaches are necessary to minimize further transmission and to better protect people at high risk for severe illness, including older adults and people with underlying health conditions.
Data Availability
Data at high spatiotemporal resolution are not publicly available in accordance with patient confidentiality and privacy laws. Publicly available data are linked below.
First author biographical sketch
Dr. Greene is the director of the Data Analysis Unit at the Bureau of Communicable Disease of the New York City Department of Health and Mental Hygiene, Long Island City, New York. Her research interests include infectious disease epidemiology and applied surveillance methods for outbreak detection.
Acknowledgments
We thank all staff members of the DOHMH Incident Command System Surveillance and Epidemiology Section for processing, cleaning, and managing input data; for conducting patient interviews and cluster investigations; and for logistical support. We also thank the NYC Test and Trace Corps for their assistance in managing the cases and contacts included in and identified by cluster investigations.S.K.G. and E.R.P were supported by the Public Health Emergency Preparedness Cooperative Agreement (grant NU90TP922035-01), funded by the Centers for Disease Control and Prevention. This article’s contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention or the Department of Health and Human Services.
Technical Appendix
Geocoding
Patient addresses were geocoded daily using version 20A of the NYC Department of City Planning’s Geosupport geocoding software, implemented in R through C++ using the Rcpp package.3 Addresses that failed to geocode were then cleaned using a string searching algorithm performed against the Department of City Planning’s Street Name Dictionary and Property Address Directory. Addresses that failed to geocode after cleaning were then verified using the IBM Infosphere USPS service.
Study period and time precision
SaTScan v.9.6 can estimate a temporal trend (see below), but only at an annual time scale, as this feature was originally developed to accommodate long-term secular trends across multiple years, as for cancer incidence. As a workaround to accommodate a rapidly changing trend, as for SARS-CoV-2 test positivity, reassign one day as if it were one year in the SaTScan case and population input files and conduct analyses at annual resolution. For example, for a 21-day study period ending June 19, 2020, reassign May 30, 2020 as the year “2000” and June 19, 2020 as the year “2020,” and indicate a time precision and a time aggregation of “year,” (i.e., PrecisionCaseTimes=1 and TimeAggregationUnits=1 in the SaTScan parameter file). The minimum and maximum temporal cluster sizes would be input as years instead of days. Similarly, with input data expressed in years, nonparametric adjustment for space by day-of-week interaction was not possible. Note that in calculating the recurrence interval, SaTScan assumes that analyses are repeated on a regular basis with a periodicity equal to the specified time interval length. Because these daily analyses are specified at annual time intervals, interpret recurrence intervals in days, not years; e.g., a recurrence interval of 1.0 years in this context should be re-interpreted as 1.0 days, i.e., consistent with chance alone.
Temporal trend adjustment
As a workaround for a bug in SaTScan v.9.6 in calculating a temporal trend adjustment in the prospective setting, first use the case and population files to run a retrospective purely temporal Poisson analysis, with the temporal adjustment “Log linear with automatically calculated trend” (TimeTrendAdjustmentType=3 in the SaTScan parameter file). Read in this automatically calculated temporal trend from the SaTScan text output. Retain the magnitude of trend (“X”) and sign of X determined by “increase” or “decrease.” Example SaTScan text output excerpt:
The time trend is the same for retrospective and prospective analyses. Then, run the prospective spatio-temporal Poisson analysis, inserting the calculated time trend in the parameter file as user-specified (TimeTrendAdjustmentType=2, TimeTrendPercentage=-6.42984 in the SaTScan parameter file). Example user interface screenshot:
If unable to enter a negative value for the temporal trend adjustment in the user interface, then edit the “TimeTrendPercentage” value in the SaTScan parameter file in a text editor (e.g., Notepad), then reopen in SaTScan.
Footnotes
(1) Corrected interpretation of statistical significance of reported clusters. As a result of the workaround described in the Technical Appendix to accommodate a rapidly changing trend, in version 1, we inadvertently misinterpreted a cluster with a recurrence interval of 1.0 days (not statistically significant) as having a recurrence interval of 1.0 years (statistically significant); (2) Updated text to further demonstrate application of results to support applied public health practice.
↵1 Kulldorff M, Information Management Services, Inc. SaTScan v9.6: software for the spatial and space-time scan statistics (www.satscan.org). 2018.
↵2 https://github.com/CityOfNewYork/communicable-disease-surveillance-nycdohmh
↵3 Eddelbuettel D, Francois R. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18.