Abstract
Background Rapid spread of SARS-CoV-2 in Wuhan prompted heightened surveillance in Shenzhen and elsewhere in China. The resulting data provide a rare opportunity to measure key metrics of disease course, transmission, and the impact of control.
Methods The Shenzhen CDC identified 391 SARS-CoV-2 cases from January 14 to February 12, 2020 and 1286 close contacts. We compare cases identified through symptomatic surveillance and contact tracing, and estimate the time from symptom onset to confirmation, isolation, and hospitalization. We estimate metrics of disease transmission and analyze factors influencing transmission risk.
Findings Cases were older than the general population (mean age 45) and balanced between males (187) and females (204). Ninety-one percent had mild or moderate clinical severity at initial assessment. Three have died, 225 have recovered (median time to recovery is 21 days). Cases were isolated on average 4.6 days after developing symptoms; contact tracing reduced this by 1.9 days. Household contacts and those travelling with a case where at higher risk of infection (ORs 6 and 7) than other close contacts. The household secondary attack rate was 15%, and children were as likely to be infected as adults. The observed reproductive number was 0.4, with a mean serial interval of 6.3 days.
Interpretation Our data on cases as well as their infected and uninfected close contacts provide key insights into SARS-CoV-2 epidemiology. This work shows that heightened surveillance and isolation, particularly contact tracing, reduces the time cases are infectious in the community, thereby reducing R. Its overall impact, however, is uncertain and highly dependent on the number of asymptomatic cases. We further show that children are at similar risk of infection as the general population, though less likely to have severe symptoms; hence should be considered in analyses of transmission and control.
Introduction
Since emerging in Wuhan, China in December of 20191, the epidemic of the novel coronavirus SARS-CoV-2 has progressed rapidly. The disease caused by this virus, dubbed COVID-19 (coronavirus disease 2019) by the World Health Organization (WHO) is characterized by fever, cough, fatigue, shortness of breath, pneumonia, and other respiratory tract symptoms 2–4, and in many cases progresses to death. As of February 24, 2020, there have been 79,331 confirmed cases and 2,618 deaths reported worldwide5. The vast majority of these remain confined to Hubei province, but there has been significant spread elsewhere in China and the world. A rapid and robust response by the global scientific community has described many key aspects of SARS-CoV-2 1,2,6–8 transmission and natural history, but key questions remain.
If well tracked, early introductions of an emerging pathogen provide a unique opportunity to characterize its transmission, natural history, and the effectiveness of screening. The careful monitoring of cases and low probability of infection from the general community enables inferences, critical to modeling the course of the outbreak, that are difficult to make during a widely disseminated epidemic. In particular, we are able to make assumptions about when and where cases were likely infected that are impossible when the pathogen is widespread. Furthermore, during these early phases, uninfected and asymptomatic contacts are often closely tracked, providing critical information on transmission and natural history. Combined, this data on early introductions can be used to give insights into disease natural history 9, transmission characteristics 10, and the unseen burden of infection 11.
Here, we use data collected by the Shenzhen Center for Disease Control and Prevention (Shenzhen CDC) on 391 cases of COVID-19 and 1,286 of their close contacts to characterize key aspects of its epidemiology outside of Hubei province. We characterize differences in demographics and severity between cases identified through symptom-based surveillance and the monitoring of close case contacts, and estimate the time to key events, such as confirmation, isolation, and recovery. Using data from contact tracing, we characterize SARS-CoV-2 transmission by estimating key values, such as the household secondary attack rate, serial interval and observed reproductive number.
Methods
Case Identification
On January 8, 2020, Shenzhen CDC identified the first case of pneumonia with unknown cause and began monitoring travelers from Hubei province for symptoms of COVID-19. Over the next two weeks this surveillance program expanded to include travelers from Hubei regardless of symptoms, patients at local hospitals, and individuals detected by fever screening in neighborhoods and at local clinics. Suspected cases and close contacts were tested for SARS-CoV-2 by PCR of nasal swabs at 28 qualified local hospitals, 10 district level CDCs, and 2 third party testing organizations, with final confirmation performed at the Guangdong CDC or Shenzhen CDC (Text S1). Close contacts were defined as those who lived in the same apartment, shared a meal, traveled, or socially interacted with an index case during the period starting two days before symptom onset. Casual contacts (e.g., other clinic patients) and some close contacts (e.g., nurses) who wore a mask during exposure were not included in this group.
Symptomatic cases were isolated and treated at designated hospitals regardless of test results. Asymptomatic positives were isolated at centralized facilities. Close contacts and travelers from Hubei testing negative were isolated at home or a central facility, and monitored for 14 days. PCR testing was required for all close contacts at the beginning of isolation, and release was conditional on a negative PCR result. Basic demographics, signs and symptoms, clinical severity, and exposure history were recorded for all confirmed cases.
Here, we analyze confirmed cases identified by the Shenzhen CDC between Jan 14, 2020 and Feb 12, 2020, and the close contacts of cases confirmed before February 9th.
Epidemiologic and Clinical Characteristics of Cases
We define symptom-based surveillance to include symptomatic screening at airport and train stations, community fever monitoring, home observation of recent travellers to Hubei, and testing of hospital patients. Contact-based surveillance is the identification of cases through monitoring and testing of close contacts of confirmed cases. By protocol, those in the contact-based group were tested for SARS-CoV-2 infection regardless of symptoms, while those in the other categories were tested only if they showed signs or symptoms of disease.
At first clinical assessment, data were recorded on 21 signs and symptoms (see supplement), and disease severity was assessed. Cases with fever, respiratory symptoms, and radiographic evidence of pneumonia were classified as having moderate symptoms. Cases were classified as having severe symptoms if they had any of: breathing rate ≥30/min; oxygen saturation level ≤93% at rest; oxygen concentration level PaO2/FiO2 ≤ 300mmHg (1mmHg=0.133kPa); lung infiltrates >50% within 24-48 hours; respiratory failure requiring mechanical ventilation; septic shock; or multiple organ dysfunction/failure. All other symptomatic cases were classified as mild.
Relationships between demographics, mode of detection and symptom severity were assessed and characterized using χ2-tests, and logistic regression.
Timing of Key Events
Distributions were fit to the timing of key events in each confirmed case’s course of infection and treatment. The time from infection to symptom onset (incubation periods) were assumed to be log-normally distributed and estimated as previously described 12–14. Cases who recently travelled to Hubei were assumed to have been exposed while there. Cases without a recent travel history but with exposure to a confirmed case, were assumed to be exposed from the time of earliest to latest possible contact with that case. Only cases for whom we could bound the earliest and latest period of exposure and had a date of symptom onset were included in the analysis.
Time between symptom onset and recovery was estimated using parametric survival methods. Patients who had not recovered were considered to be censored on February 22, 2020 or at the time of death. All other delay distributions were estimated by directly fitting parametric distributions to time between symptom onset or arrival in Shenzhen, and confirmation, isolation or hospitalization. Confidence intervals were calculated using bootstrapping or standard parametric estimators 15.
Transmission Characteristics
Transmission was characterized by examining the relationship between confirmed cases and their infected and uninfected close contacts. The household secondary attack rate (SAR) was calculated as the percentage of household contacts (those sharing a room, apartment or other sleeping arrangement) who were later confirmed to be infected with SARS-CoV-2. The distribution of serial intervals (the time between symptom onset in infector and infectee) was calculated by fitting parametric distributions to the time of symptom onset in clear infector/infectee pairs. The mean observed reproductive number, R, and distribution of personal reproductive numbers (i.e., the number of secondary infections caused by each case) were calculated from the number of secondary infections observed among close contacts of each index case, with ambiguities resolved through multiple imputation. The relative odds of transmission among contacts of various types were estimated using conditional logistic regression and random effects models, to account for differing numbers of possible infectors in each risk group. Confidence intervals were estimated using bootstrapping or standard parametric approaches.
Results
Epidemiologic and Clinical Characteristics
Between Jan 14, 2020 and Feb 12, 2020 the Shenzhen CDC confirmed 391 cases of SARS-CoV-2 infection (Table 1). Of 379 with a known mode of detection, 77% were detected through symptom-based surveillance. Overall, there were approximately equal numbers of male and female cases (187 vs. 204), and 79% were adults between the ages of 30 and 69. At the time of first clinical assessment, most cases were mild (26%) or moderate (65%); and only 35 (9%) were severe. Eighty-four percent of cases had fever at the time of initial assessment, while 6% had no signs or symptoms. As of February 22, 2020, final clinical outcomes were known for 228 of the 391 cases in our data; with three having died and 225 recovered.
Cases detected through symptom-based surveillance were more often male (55% vs 28%) and between the ages of 20-69 (91% vs 75%) than those detected through contact-based surveillance (Tables 1 and 2). At the time of first clinical assessment, 29% of the contact-based surveillance group did not have fever, and 20% had no symptoms. In contrast, 88% of the symptom-based surveillance group had fever, and only 8 reported no symptoms.
In multiple logistic regression, severe symptoms were associated with being male (OR 2.5, 95% CI 1.1,6.1). There was a general trend of increasing probability of severe symptoms with age, though only 60-69 year olds showed a significant difference from the reference category (OR 3.4 versus 50-59 year olds; 95% 1.4,9.5) and being male (OR=2.5, 95%CI 1.1, 6.1) (Table 2).
Timing of Key Events
Based on 183 cases with a well-defined period of exposure and symptom onset (Figure S1), we estimate the median incubation period for COVID-19 to be 4.8 days (95% CI 4.2,5.4) [Figure 2, Table S2], and that 95% of those who develop symptoms will do so within 14.0 days (95% CI 12.2,15.9) of infection.
Based on 228 cases with known outcomes, we estimate that the median time to recovery is 22 days (95% CI 21,24) in 50-59 year olds, and is estimated to be significantly shorter in younger adults (e.g., 19 days in 20-29 year olds), and longer in older groups (e.g., 23 days in those aged 60-69) despite not being statistically significant (Table S5, Figure S3). In multiple regression models including sex, age, baseline severity and method of detection, in addition to age, baseline severity was associated with time to recovery. Compared to those with mild symptoms, those with severe symptoms were associated with a 41% (95% CI, 24%,60%) increase in time to recovery. Thus far, only three have died. These occurred 35-44 days from symptom onset and 27-33 days from confirmation.
Cases detected through symptom-based surveillance were confirmed on average 5.5 days (95% CI 5.0, 5.9) after symptom onset (Figure 3, Table S2); compared to 3.2 days (95% CI 2.6,3.7) in those detected by contact-based surveillance. 17 cases were isolated before developing symptoms. Among those isolated after, the symptom-based surveillance group were, on average, isolated 4.6 days (95% CI 4.1,5.0) after symptom onset, versus 2.7 days (95% CI 2.1,3.3) in the contact-based surveillance group. Hence, contact-based surveillance was associated with a 2.3 days (95% CI 1.5,3.0) decrease in time to confirmation and a 1.9 days (95% CI 1.1,2.7) decrease in time to isolation. Timings between symptom onset and hospitalization were similar to isolation results (Figure 3, Table S2).
Sixty-four percent (191/298) of travelers developed symptoms after arriving in Shenzhen, with a mean time from arrival to symptom onset of 4.9 days (95% CI 4.2, 5.5) (Table S2). Those developing symptoms prior to arrival or on the day of arrival were confirmed as cases on average 4.5 days (95% CI 3.8,5.1) after arrival, and isolated on average 3.1 days (95% CI, 2.5,3.7) after.
Transmission Characteristics
Overall, 1286 close contacts were identified for index cases testing positive for SARS-CoV-2 between January 14 and February 9, 2020, with 83% (244/292) of cases having at least one close contact. Ninety-five percent of close contacts were followed 12 days or longer. Ninety-eight tested PCR positive for SARS-CoV-2 infection, and one had presumptive infection.
Excluding those with a missing test result, we found that the secondary attack rate was 14.9% (95% CI 12.1,18.2) among household contacts and 9.6% (95%CI 7.9,11.8) overall (these drop to 11.2% and 6.6% if those with missing results are considered to be negative). In multivariable analysis of contact types household contact (OR 6.3, 95% CI 1.5, 26.3) and travelling together (OR 7.1, 95% CI 1.4, 34.9) were significantly associated with infection. Reporting contact occurred “often” was also associated with increased risk of infection (OR 8.8 versus moderate frequency contacts, 95% CI 2.6,30.1).
Attack rates were similar across infectee age categories (Table 3), though there is some indication of elevated attack rates in older age groups (Figure 1). Notably, the rate of infection in children under 10 (7.4%) was similar to the population average (7.9%). There was no significant association between probability of infection and age of the index case. Surprisingly, in univariate analysis a longer time in the community prior to isolation was associated with a reduced risk of causing infections. However, this association was no longer significant after adjusting for contact frequency and type.
Based on 48 pairs of cases with a clear infector-infectee relationship and time of symptom onset, we estimate that the serial interval is gamma distributed with mean 6.3 days (95% CI 5.2,7.6) and a standard deviation of 4.2 days (95% CI, 3.1,5.3) (Figure 2B, Table S2). Hence, 95% of cases are expected to develop symptoms within 14.3 (95% CI, 11.1,17.6) days of their infector. It should be noted this estimate includes the effect of isolation on truncating the serial interval. Stratified results show that if the infector was isolated less than 3 days after infection the average serial interval was 3.6 days, increasing to 8.1 days if the infector was isolated on the third day after symptom onset or later (Table S4).
The mean number of secondary cases caused by each index case (i.e., the observed reproductive number, R), was 0.4 (95% CI 0.3,0.5). The distribution of personal reproductive numbers was highly overdispersed, with 80% of infections being caused by 8.9% (95% CI 3.5,10.8) of cases (negative binomial dispersion parameter 0.58, 95% CI 0.35, 1.18).
Potential Impact of Surveillance and Isolation on Transmission
To calculate the potential impact of surveillance and isolation on transmission, we considered a range of possible infectious periods where infectiousness varied over time. We define the mean infectious day (i.e., the average number of days after symptom onset an infector is expected to infect a secondary case) as the weighted mean of the infectious period, where each day is weighted by relative infectiousness. We consider periods where the mean infectious day is less than 15 days after symptom onset (roughly the period of SARS and early SARS-CoV-2 reports 16,17), and assume that R=2.6 and that isolation effectively ends the infectious period. Under these assumptions we find if the mean infectious day is greater than 5 days, then it may be possible to bring R below one in those detected by symptom-based surveillance; and the same can be accomplished by contact-based surveillance if the mean infectious day is greater than 3 days. For the impact of passive surveillance alone to achieve our observed R of 0.4, we project the mean infectious day must be at least 5.5 days (and likely more) after symptom onset.
Even if transmission is completely eliminated in the group captured by surveillance (e.g., if we could get perfect surveillance on the day of symptom onset), assuming R=2.6, the cases captured by surveillance must, if unisolated, be expected to cause 61% of onward transmission to achieve local elimination by surveillance and isolation alone (see Text S2).
Discussion
This analysis of early SARS-CoV-2 cases and their close contacts in Shenzhen China, provides insights into the natural history, transmission and control of this disease. The values estimated provide the evidentiary foundation for predicting the impact of this virus, evaluating control measures, and guiding the global response. Analysis of how cases are detected, and use of data on individuals exposed but not infected, allow us to show that infection rates in young children are no lower than the population average (even if rates of clinical disease are). We are able to directly estimate critical transmission parameters, and show that, at least among observed contacts, transmission rates are low. Estimates of the distribution of time between symptom onset and case isolation by surveillance type reveal that heightened surveillance combined with case isolation could plausibly account for these low rates of transmission. These results paint a positive picture of the impact of heightened surveillance and isolation in Shenzhen. However, uncertainty in the number of asymptomatic cases missed by surveillance and their ability to transmit must temper any hopes of stopping the COVID-19 epidemic by this means.
This work further supports the picture of COVID-19 as a disease with a fairly short incubation period (mean 4-6 days) but a long clinical course 2,7,19, with patients taking many weeks to die or recover. It should be noted, however, that we estimate a higher proportion of cases taking 14 days or more to develop symptoms (9%) than previous studies 6,7.
Focusing on cases detected through contact-based surveillance adds nuance to previous characterizations of COVID-19. Since PCR testing of contacts is near universal, we can assume these cases are more reflective of the average SARS-CoV-2 infection than cases detected through symptomatic surveillance. In the contact-based surveillance group, any tendency for cases to be male or older (beyond the underlying population distribution, see Table S3) disappears. Further, in this group, 20% were asymptomatic at the time of first clinical assessment and nearly 30% did not have fever. This is consistent with a reasonably high rate of asymptomatic carriage, but less than suggested by some modeling studies18, though PCR has imperfect sensitivity20.
In Shenzhen, SARS-CoV-2 transmission is most likely between very close contacts, such as individuals sharing a household. However, even in this group less than 1 in 6 contacts were infected; and, overall, we observed far less than one (0.4) onward transmission per primary case. As noted above, low transmission levels may in part be due to the impact of isolation and surveillance; but it is equally likely unobserved transmission is playing some rule. We also estimate reasonably high rates of overdispersion in the number of cases each individual causes, leaving open the possibility that large COVID-19 clusters occur even if surveillance and isolation are forcing R below one; events that could potentially overwhelm the surveillance system.
This work has numerous limitations. As in any active outbreak response, the data were collected by multiple teams under protocols that, by necessity, changed as the situation developed. Hence, there may be noise and inconsistency in definitions. Of note, the definition of a confirmed case changed to require symptoms near the end of our analysis period (Feb. 7); but sensitivity analyses show that truncating the data at this point does not qualitatively impact results. It is, likewise, impossible to identify every potential contact an individual has, so contact tracing focuses on those close contacts most likely to be infected; hence our observed R is assuredly less than the true R in the population. Asymptomatic travellers will be missed by symptom-based surveillance; and, even if tested, some asymptomatic contacts may be missed due to the imperfect sensitivity of the PCR test 20.
As SARS-CoV-19 continues to spread, it is important that we continue to expand our knowledge about its transmission and natural history. Data from the early phase of local outbreaks, when detailed contact tracing is possible and sources of infection can still be reliably inferred, are particularly powerful for estimating critical values. This is especially true when information on uninfected contacts and mode of detection are used, as we have done here. The resulting estimates provide critical inputs for interpreting surveillance data, evaluating interventions, and setting public health policy.
Data Availability
Linelist data contains PHI and cannot be made available. The authors are working to create deidentified summary data sets that will be made available when completed.
FUNDING SOURCE
TM, CY, TZ, BS, YS and JZ were funded by the Emergency Response Program of Harbin Institute of Technology (HITERP010) and Emergency Response Program of Peng Cheng Laboratory (PCLERP001). JL, ST and QB were funded by a grant from the US Centers for Disease Control and Prevention (NU2GGH002000).
AUTHOR CONTRIBUTION
YW, SM, XZ, ZZ, XL, WG, LW, CC, XT, XW, YW, and SH collected the data JL, QB, SAT, and CY performed statistical analyses, and drafted the manuscript and figures TZ, BS, YS, JZ, TM, and CY cleaned the data QB, MT, JL, and TF conceived the study and supervised the collection of data
CONFLICT OF INTEREST STATEMENT
The authors report no conflict of interest.
SUPPLEMENTARY MATERIAL
Text S1: Data Extraction and Confirmation Details
By categorizing COVID-19 as a notifiable disease Class B, Chinese Law on the Prevention and Treatment of Infectious Diseases required all cases to be immediately reported to China’s Infectious Disease Information System. Each case was recorded into the system by local epidemiologists and public health professionals who did the field investigation and collected possible exposure related information. All data on COVID-19 case reported in Shenzhen were extracted from the Infectious Disease Information System by the end of February 12, 2020. Then personal information including demographics, symptoms, clinical outcome and severity and so on, were stripped to construct an anonymous dataset. All cases were included without sampling and no eligibility criteria were needed.
All laboratory confirmation of SARS-CoV-2 were done by Guangdong Center for Disease Prevention and Control (CDC) before Jan 30, 2020, and then only need to be done by Shenzhen CDC, when it obtained the qualification of laboratory-confirmation of 2019-nCoV from the authority. The RT-PCR assay was conducted in the BSL-2 laboratory of Shenzhen CDC, using the protocol established by the World Health Organization and China CDC.
Text S2: Supplemental Calculation
Let R0 be the basic reproductive number, ρ be the percent of transmission due to cases potentially reachable by the surveillance system, and γ be the relative effective infectious period of those captured by surveillance. Then:
When R is below one, sustained outbreaks are impossible. Hence, for a known R0 and γ such that γ R0 < 1, we can calculate the proportion of transmission that must be from people who can be captured by surveillance as:
Assuming an R0 of 2.6 and that surveillance reduces R0 by a factor of 0.18, we find control is possible if 75% of people can be captured by surveillance.
ACKNOWLEDGEMENT
We thank Andrew Azman, Derek Cummings, Steve Lauer, Jacco Wallinga, and Michael Mina for advice and input on the manuscript and analyses. We thank all patients, close contacts, and their families involved in the study; as well as the front line medical staff and public health workers who collected this critical data.