ABSTRACT
Vigorous non-pharmaceutical interventions have largely suppressed the COVID-19 outbreak in Wuhan, China. We developed a susceptible-exposed-infectious-recovered model to study the transmission dynamics and evaluate the impact of interventions using 32,583 laboratory-confirmed cases from December 8, 2019 till March 8, 2020, accounting for time-varying ascertainment rates, transmission rates, and population movements. The effective reproductive number R0 dropped from 3.89 (95% credible interval: 3.79-4.00) before intervention to 0.14 (0.11-0.28) after full-scale multi-8 pronged interventions. By projection, the interventions reduced the total infections in Wuhan by 96.5% till March 8. Furthermore, we estimated that 79% (lower bound: 60%) of the total infections were unascertained, potentially including asymptomatic and mild-symptomatic cases. The probability of resurgence was 0.22 and 0.10 based on models with 79% and 60% infections unascertained, respectively, assuming interventions were lifted after a 14-day period of no new ascertained infections. These results provide important implications for continuing surveillance and interventions to eventually contain the outbreak.
The novel coronavirus disease (COVID-19) caused by SARS-CoV-2 was detected in Wuhan, China, in December 2019.1 Many early cases were connected to the Huanan Seafood Market, which was shut down on January 1, 2020 to stop potential zoonotic infection. Nevertheless, the high population density of Wuhan together with the increased social activities before the Chinese New Year catalyzed the outbreak in January, 2020. The massive human movement during the holiday travel season Chunyun, which started on January 10, further expedited spreading of the outbreak. Shortly after the confirmation of human-to-human transmission, the Chinese authorities implemented the unprecedented city lockdown of Wuhan on January 23 to contain the geographic spread, followed by a series of non-pharmaceutical interventions to reduce virus transmission, including suspension of all intra- and inter-city transportation, cancelation of public events, social distancing, and home quarantine of mild-symptomatic patients. From February 2, strict stay-at-home policy for all residents and centralized quarantine of all patients and close contacts were implemented to stop family and community transmission. Furthermore, a city-wide door-to-door universal symptom survey was carried out during February 17-19 by designated community workers to identify previously undetected symptomatic cases. Details of the interventions were described in Pan et al.2 These drastic interventions, together with the improved medical resources and healthcare manpower from all over the country, have effectively bent the epidemic curve and reduced the attack rate in Wuhan, shedding light on the global efforts to control the COVID-19 outbreak.2
Several recent studies have reported a nonnegligible proportion of asymptomatic cases with transmissibility,3–9 which were not considered by previous epidemiological studies.2,10 Furthermore, the number of officially reported cases was much smaller than those estimated by earlier modeling-based studies using international cases exported from Wuhan prior to the city lockdown,11–13 implying a substantial number of unascertained cases. Using reported confirmed cases from 375 cities in China, a recent modeling study concluded that substantial unascertained cases had facilitated the rapid spreading of COVID-19.14 In addition, accounting for the unascertained cases has refined the estimation of case fatality risk of COVID-19, leading to a better understanding of the clinical severity of the disease.15 Modeling both ascertained and unascertained cases can facilitate interpretation of transmission dynamics and epidemic trajectories.
Based on comprehensive epidemiological data from Wuhan,2 we developed an extended susceptible-exposed-infectious-recovered (SEIR) model to delineate the full spectrum of COVID-19 outbreak in the epicenter, accounting for the unascertained cases, population movement, and different intervention strengths across time periods (Fig. 1). Specifically, we compartmentalized the population into S susceptible, E latent, I ascertained infectious, A unascertained infectious, H hospitalized, and R removed individuals. Compared with the classic SEIR model, we explicitly modeled population movement11 and introduced two additional compartments A and H to model that only ascertained cases would seek for medical care and thus be quarantined by hospitalization. We chose to model from January 1, 2020 to avoid complications from potential zoonotic infection from the Huanan Seafood Market. We divided the outbreak into five time periods based on key events and interventions: January 1 to 9 (before Chunyun), January 10 to 22 (Chunyun), January 23 to February 1 (city lockdown), February 2 to 16 (centralized quarantine), and February 17 to March 8 (community screening). We assumed a constant population size of 10 million with equal numbers of daily inbound and outbound travelers (500,000 before Chunyun, 800,000 during Chunyun, and 0 after city lockdown).11 Furthermore, we assumed transmission rate and ascertainment rate did not change in the first two periods, because few interventions were implemented before January 23, while they were allowed to vary in later periods to reflect different intervention strengths. We used Markov Chain Monte Carlo (MCMC) to estimate these parameters by assuming the daily incidence following a Poisson distribution, while the other parameters were set based on previous epidemiological investigations16,17 or from our data (Methods). We assumed the same transmissibility in unascertained and ascertained cases, because unascertained cases were likely asymptomatic or mild-symptomatic, who were reported to have similar viral load in upper respiratory specimens as those from symptomatic patients.18,19
Intuitively, due to the force of infection contributed by the unascertained cases, in the absence of interventions, the epidemic curve would grow faster than the expected curve under a model with no unascertained cases. Thus, growth rate of the epidemic curve provides information to estimate the ascertainment rate. We first simulated epidemic curves with two periods to test the performance of our method (Methods). As shown in Figs. S1-S2, our method could accurately estimate the ascertainment and transmission rates when the model was correctly specified and was relatively robust to misspecification of parameters and initial states. It is worth noting that while estimation of transmission rate was a bit sensitive to misspecification of the length of latent or infectious period, estimation of the effective reproductive number R0 appeared to be more robust. For example, misspecification of the infectious period by 25% lower would lead to ~20% overestimation in the transmission rate but only ~10% 16 underestimation in R0, both for the first period (Fig. S2).
We then applied our model to fit the daily incidences in Wuhan from January 1 to February 29, assuming the ascertainment rate at the early outbreak was about 0.5, and used the fitted model to predict the trend from March 1 to 8. As shown in Fig. 2A, our model fit the observed data well, except for the outlier on February 1, which might be due to approximate-date records of many patients admitted to the mobile cabin hospitals set up after February 1. The slight overprediction for the last three days (March 6 to 8) was likely due to the delay in laboratory confirmation. The transmission rate decreased from 1.77 (95% credible intervals [CrI]: 1.73-1.81) in the first two periods to 0.57 (0.55-26 0.59), 0.21 (0.20-0.22), and 0.08 (0.06-0.10) in the later three periods, respectively (Table S3), which could be translated into R0 of 3.93 (3.83-4.04), 3.89 (3.79-4.00), 1.23 (1.18-1.28), 0.44 (0.42-0.46) and 0.14 (0.11-0.18) for the five periods, respectively (Fig. 2B, Table S4). We estimated the number of total infections, which included both ascertained and unascertained cases, till March 8 to be 163,328 (136,308-193,140) if the trend of the fourth period was assumed (Fig. 2C), or 514,457 (411,833-631,018) if the trend of the third period was assumed (Fig. 2D), or 4,514,872 (4,434,347-4,572,588) if the trend of the second period was assumed (Fig. 2E), in comparison to the estimated total infections of 157,894 (131,673-186,720) by fitting data from all five periods (Fig. 2A). These numbers were translated to 3.3%, 69.3%, and 96.5% reduction of infections due to the interventions in different periods.
Strikingly, we estimated low ascertainment rates across periods, which were 0.18 (0.16-0.20) for the first two periods, and 0.22 (0.19-0.25), 0.17 (0.15-0.19), and 0.29 (0.25-11 0.35) for the other three periods, respectively (Table S3). Even with the universal community symptom screening implemented on February 17 to 19, the ascertainment rate was only increased to 0.29. Based on the fitted model using data from January 1 to February 29, we projected the cumulative number of ascertained cases to be 32,588 (28,920-36,473) by March 8, close to the actual reported number of 32,583. This was equivalent to an overall ascertainment rate of 0.21 (0.18-0.24) given the estimated total infections of 157,894 (131,673-186,720). The model also projected that the number of daily active infections in Wuhan, including both ascertained and unascertained, peaked at 14,393 (11,735-17,284) on February 3 and dropped afterwards to 161 (109-222) on March 8 (Fig. 2F). If the trend remained unchanged, the number of ascertained infections would first become zero on March 24 (95% CrI: March 17 to March 30), while the clearance of all infections would occur on April 18 (April 6 to May 8), 2020 23 (Table S5).
The large fraction of unascertained cases has important implications for continuing surveillance and interventions.20 Based on stochastic simulations, we estimated the probability of resurgence after lifting all controls, assuming the transmission rate, ascertainment rate, and daily population movement were resumed to values of the first period (Methods). Because of the latent period and unascertained cases, the source of infection would not be completely cleared shortly after the first day of zero ascertained cases. We found that if control measures were lifted 14 days after the first day of zero ascertained cases, despite sparse new cases might be ascertained during the observation period, there would still be as high as 0.90 probability of resurgence, and the surge is predicted to occur on day 27 (95% CrI: 17-44) after lifting controls (Fig. 3). If we imposed a more stringent criterion of lifting controls after observing no ascertained cases in a consecutive period of 14 days, the probability of resurgence would drop to 0.22, with possible resurgence delayed to day 33 (95% CrI: 22-54) after lifting controls (Fig. 3). These results highlighted the risk of ignoring unascertained cases in switching intervention strategies, despite using an over-simplified model without considering other factors such as imported cases, changes in temperature and humidity, and a stepwise lifting strategy that is currently adopted by Wuhan and other cities in China.
We performed a series of sensitivity analyses to test the robustness of our results by considering the outlier data point on February 1, varying lengths of latent and infectious periods, lower transmissibility for unascertained cases, and different initial states (Tables S3-S5, Figs. S4-S11). Our major findings of remarkable decrease in R0 after interventions and the existence of substantial unascertained cases was robust in all sensitivity analyses. Consistent with simulation results, the estimated ascertainment rates increased with decreasing initial numbers of latent and unascertained cases (Figs. S9-S11). If we assumed an extreme scenario with no unascertained cases in the early outbreak (model S8; Fig. S11), the estimated ascertainment rate would be 0.40 (0.35-0.45) overall and 0.55 (0.47-0.64) for the last period, which would represent an upper bound of the ascertainment rate. In this model, because of the higher ascertainment rate compared to the main analysis, we estimated a lower probability of resurgence of 0.10 when lifting controls after 14 days of no ascertained cases, and a longer time to resurgence, occurring on day 37 (95% CrI: 24-58) after lifting controls (Fig. 3). We also tested a simplified model assuming complete ascertainment anytime, but this simplified model performed significantly worse than the full model, especially in fitting the rapid growth before interventions as expected (Fig. S12).
To evaluate our model assumptions, we compared with a published modeling study, which was based on independent data of early cases exported from Wuhan to other countries.11 Wu et al.11 estimated that 75,815 (37,304-130,330) individuals had been infected in Greater Wuhan as of January 25, 2020, while we estimated a total number of 40,883 (34,468-47,799) infections by the same day. The discrepancy was mainly due to different assumptions of population size, which was 19 million for the Greater Wuhan Area including surrounding cities by Wu et al.11 versus 10 million for the Wuhan city in our analysis. After accounting for the difference in population size used, the estimates of infection prevalence were indeed highly consistent (0.41% in Wu et al. versus 0.37% in our analysis). The remarkable consistency supported the validity of key assumptions in our main analysis, including the initial states. For example, if we assumed no unascertained cases initially (model S8; Fig. S11), while the model still fit the epidemic curve well, the estimated total infections would be 21,393 (17,383-25,911) by January 25. This was equivalent to an infection prevalence of 0.21% (0.17-0.26%), much lower than the estimate based on Wu et al.11
Our finding of a large fraction of unascertained cases, despite the strong surveillance in Wuhan, indicated the existence of many asymptomatic or mild-symptomatic but infectious cases, highlighting a key challenge to the COVID-19 epidemic control.21 There is accumulating evidence on the existence of many asymptomatic cases. For example, asymptomatic cases were estimated to account for 18% of the infections onboard the Diamond Princess Cruise ship7 and 31% of the infected Japanese evacuated from Wuhan.8 In addition, it was recently reported that 29 of the 33 (88%) infected pregnant women were asymptomatic by universal screening of 210 women admitted for delivery between March 22 and April 4 in New York City.9 Several reports also highlighted the difficulty in detecting COVID-19 cases: about two thirds of the cases exported from mainland China remained undetected worldwide,22 and the detection capacity varied from 11% in low surveillance countries to 40% in high surveillance countries.23,24 By modeling the epidemics in other cities, it was also estimated that the ascertainment rate of infected individuals was about 24.4% in China (excluding Hubei province)13 and 14% in Wuhan prior to travel ban.14 Consistent with these studies, our extensive analyses of the most comprehensive epidemic data from Wuhan also indicated an ascertainment rate between 13% and 40% (Table S3). The large fraction of unascertained cases would lead to about 25 days delay between the first occurrence of no ascertained cases and the clearance of all infections (Table S5), imposing a high risk of resurgence after lifting controls (Fig. 3). Therefore, understanding the proportion of unascertained cases and the asymptomatic transmissibility will be critical for prioritization of the surveillance and control measures.20,25 Currently, Wuhan is implementing a strategy to normalize and restore societal activities gradually while maintaining strong disease surveillance. The experience and outcome of Wuhan will be valuable to other countries who will eventually face the same issue.
We noted that our R0 estimate of 3.93 (3.83-4.04) before interventions was at the higher end of the range of R0 (1.40-6.49 with a median of 2.79) reported by previous studies using different data sources, time periods, and statistical methods.13,14,16,26–28 Several plausible reasons might explain the discrepancy, including potential impacts of unascertained cases, more complete case records in our analysis, and different starting date of the model. If we assumed no unascertained cases (Fig. S12), the estimated R0 would be 2.89 (2.88-2.90) without interventions, aligned well with the early epidemiological investigations.16,26 We further considered a model starting from the first COVID-19 case reported in Wuhan (Fig. S13), from which we estimated an R0 of 3.58 (3.49-3.69) before January 23, 2020, similar to the value of 3.15 reported by a recent study.27 Nevertheless, this reproductive number was still much higher than the earlier estimates and those for SARS and MERS,29,30 featuring another challenge to control the spread of COVID-19.
Taken together, our modeling study delineated the full-spectrum dynamics of the COVID-19 outbreak in Wuhan, and highlighted two key features of the outbreak: a high proportion of asymptomatic or mild-symptomatic cases, and high transmissibility. These two features synergistically propel the global pandemic of COVID-19, imposing grand challenges to control the outbreak. Nevertheless, lessons from Wuhan have demonstrated that vigorous and multifaceted containment efforts can considerably control the size of the outbreak, as evidenced by the remarkable decrease of R0 from 3.89 to 0.14 and an estimated 96.5% reduction of infections till March 8. These are important information for other countries combatting the outbreak.
Some limitations of our study should be noted. First, we need field investigations and serologic studies to confirm our estimate of the ascertainment rate, and the generalizability to other places is unknown. This may depend on the detection capacity in different locations.24 Second, due to the delay in laboratory tests, we might have missed some cases and therefore underestimated the ascertainment rate, especially for the last period. Third, our model assumed a homogeneous population without considering heterogeneity by sex, age, geographic regions and socioeconomic status. Finally, we could not evaluate the impact of individual interventions by the epidemic curve from a single city, because many interventions were applied simultaneously. Future work by modeling transmission between different groups and joint analysis with data from other cities will lead to deeper insights on the effectiveness of different control strategies.27,31
Data Availability
Data will be released once granted by the authorities.
Codes availability
R codes are available at http://chaolongwang.github.io/codes_covid19.zip.
Author contributions
TW, XL and CW designed the study. XH, SC, XL and CW developed statistical methods. XH, SC, and DW performed data analysis. CW wrote the first draft of the manuscript. All authors reviewed and edited the manuscript.
Competing interests
The authors declare no competing interests.
Methods
Data of COVID-19 cases in Wuhan
We used the same dataset as detailed in Pan et al.2 Briefly, we extracted information of COVID-19 cases from December 8, 2019 till March 8, 2020 from the municipal Notifiable Disease Report System on March 9, 2020. We collected information of date of illness onset (the self-reported date of symptoms such as fever, cough, or other respiratory symptoms) and date of confirmed diagnosis (the laboratory confirmation date of SARS-CoV-2 in the bio-samples). For the consistency of case definition throughout the periods, we only included 32,583 laboratory-confirmed cases who were tested positive for SARS-8 CoV-2 by the real-time reverse-transcription-polymerase-chain-reaction (RT-PCR) assay or high-throughput sequencing of nasal and pharyngeal swab specimens.
The extended SEIR model
We extended the classic susceptible-exposed-infectious-recovered (SEIR) model to account for population movement, unascertained cases, and quarantine by hospitalization (Fig. 1). We chose to analyze from January 1, 2020, when the Huanan Seafood market was closed, and thus did not model the zoonotic force of infection. We assumed a constant population size N = 10,000,000 with equal daily inbound and outbound travelers n, where n = 500,000 for January 1–9, 800,000 for January 10–22 due to Chunyun, and 0 after city quarantine from January 23.11 We divided the population into S susceptible, E latent, I ascertained infectious, A unascertained infectious, H hospitalized, and R removed individuals. Dynamics of these six compartments across time t were described by the following set of ordinary differential equations: where b was the transmission rate, defined as the number of individuals that an ascertained case can infect per day; α was the ratio of the transmission rate of unascertained over ascertained cases; r was ascertainment rate; De and Di were the latent and infectious periods; Dq was the duration from illness onset to hospitalization; and Dh was the hospitalization period. The effective reproductive number R0 could be computed as where and represented the rate of ascertained cases to recovery and hospitalization, respectively. Thus, was the expected infectious period of ascertained cases after taking hospitalization into account.
Initial states and parameter settings
Initial states of the model and parameter settings of the main analysis were summarized in Tables S1-S2. 11 We set α = 1, assuming same transmissibility between unascertained and ascertained cases.18,19 We set the latent period De = 5.2 days and the infectious period Di = 2.3 days,16 assuming the latent period equal to the incubation period and the infectious period equal to the serial interval minus the incubation period.11 We set a long hospitalization period of Dh = 30 days. The duration from onset to hospitalization were estimated to be Dq = 10.5, 7.5, 5, 3, and 1 days as half of the median difference between the onset and confirmed dates for each period, respectively. Our Dq = 10.5 days for the first period of January 1-9 matched well to the reported mean Dq of 9.1 days for 189 cases with onset during January 1-11.16
We assumed the ascertainment rate was 1/2 at the early outbreak, such that the number of initial unascertained cases A(0) was the same as the number of initial ascertained cases I(0), and the initial number of latent cases as twice of those ascertained cases with onset from January 1 to 5, 2020.
Estimation of parameters in the SEIR model
Considering the time-varying strength of control measures, we assumed b = b12 and r = r12 for the first two periods, b = b3 and r = r3 for period 3, b = b4 and r = r4 for period 4, and b = b5 and r = r5 for period 5. To estimate unknown parameters b12, b3, b4, b5 r12, r3, r4, and r5, we assumed the number of ascertained cases with illness onset on day d, denoted as xd, followed a Poisson distribution with rate , where Ed-1 was the expected number of latent individuals on day (d-1). We fit the observed data from January 1 to February 29 (d = 1,2,…,D, and D = 60) and used the fitted model to predict the trend from March 1 to 8. Thus, the likelihood function was
We estimated b12, b3, b4, b5 r12, r3, r4, and r5 by Markov Chain Monte Carlo (MCMC) with Metropolis-Hastings algorithm and non-informative flat priors. We set a burn-in period of 200,000 iterations and continued to run 1,000,000 iterations with a sampling step size of 100 iterations. We repeated MCMC with three different sets of initial values and assessed the convergence by the trace plot and the multivariate Gelman-Rubin diagnostic (Fig. S3).32 Estimates of parameters were presented as 8 posterior means and 95% credible intervals (CrIs) from 10,000 MCMC samples. All the analyses were performed in R (version 3.6.2) and the Gelman-Rubin diagnostic was calculated using the gelman.diag function in the R package coda.
Stochastic simulations
We used stochastic simulations to obtain 95% CrI of fitted/predicted epidemic curve. Given a set of parameter values from MCMC, we performed the following multinomial random sampling: where pS→E = b(It–1 + αAt-1)N−1, , , , , , and pout = n(N – It–1 – Ht–1)−1. The SEIR model described by Eqs. 1–6 is equivalent to the following stochastic dynamics:
We repeated the stochastic simulations for all 10,000 sets of parameter values sampled by MCMC to construct the 95% CrI of the epidemic curve by the 2.5 and 97.5 percentiles at each time point.
Prediction of epidemic ending date and the risk of resurgence
Using the stochastic simulations described above, we predicted the first day of no new ascertained cases and the date of clearance of all active infections in Wuhan, assuming continuation of the same control measures as the last period (i.e., same parameter values).
We also evaluated the risk of outbreak resurgence after lifting control measures. We considered lifting all controls (1) at t days after the first day of zero ascertained cases, or (2) after a consecutive period of t days with no ascertained infectious cases. After lifting controls, we set the transmission rate b, ascertainment rate r, and population movement n to be the same as the first period, and continued the stochastic simulation to stationary state. Time to resurgence was defined as the number of days from lifting controls to when the number of active infectious cases reached 100. We performed 10,000 simulations with 10,000 sets of parameter values sampled from MCMC (as described above). We calculated the probability of resurgence as the proportion of simulations in which a resurgence occurred, as well as the time to resurgence conditional on the occurrence of resurgence.
Simulation study for method validation
To validate the performance of our method in estimating parameters b and r, we performed two-period stochastic simulations (Eqs. 9–14) with transmission rate b = 1.6, ascertainment rate r = 0.2, daily population movement n = 500,000, and duration from illness onset to hospitalization Dq = 10 for the first period, and b = 0.5, r = 0.2, n = 0, and Dq = 1 for the second period. Lengths of both periods were set to 15 days, while the initial states and other parameters were set as the those in our main analysis (Table S1). We repeated stochastic simulations 25 times to generate 25 datasets. For each dataset, we applied our MCMC method to estimate b1, b2, r1 and r2, while setting all other parameters and initial values the same as the true values. We also tested the robustness to misspecification of the latent period De, the infectious period Di, the initial number of latent cases E(0), and the initial number of unascertained cases A(0). In each test, we changed the specified value of a parameter (or initial state) to be 25% lower or higher than its true value, while keeping all other parameters and initial states unchanged. We did not test the robustness to misspecification of other parameters or initial states because they either had little impacts (e.g., hospitalization period Dh) or were known with little uncertainty in the real data setting.
For the simulated datasets, we ran the MCMC method with 100,000 burn-in iterations and sampled parameter values from additional 100,000 iterations with a step size of 100 iterations. We took the mean across 1,000 MCMC samples as the final estimates and displayed results for 25 repeated simulations using boxplots.
Sensitivity analyses for the real data
We designed nine sensitivity analyses to test the robustness of our real data results. For each of the sensitivity analyses, we fixed parameters and initial states to be the same as the main analysis except for those mentioned below.
(S1) Adjust the reported incidences from January 29 to February 1 to their average. We suspected the spike of incidences on February 1 might be caused by approximate-date records among some patients admitted to the centralized quarantine after February 2. The actual illness onset dates for these patients were likely to be between January 29 and February 1.
(S2) Decrease the latent period De to be 4.0 days,17 and accordingly adjust the initial number of latent cases E(0) to be 276 as twice the number of reported incidences from January 1 to 4.
(S3) Increase the latent period De to 6.4 days33, and accordingly adjust the initial number of latent cases 20 E(0) to be 450 as twice the number of reported incidences from January 1 to 6.
(S4) Double the infectious period Di to 4.6 days.
(S5) Assume the unascertained cases have half transmissibility of the ascertained cases (α = 0.5).
(S6) Assume the early ascertainment rate is 1/3 by setting A(0) = 176 and E(0) = 555.
(S7) Assume the early ascertainment rate is 2/3 by setting A(0) = 44 and E(0) = 278.
(S8) Assume the early ascertainment rate is 1 by setting A(0) = 0 and E(0) = 185.
(S9) Assume complete ascertainment across time by fixing r12 = r3 = r4 = r5=1. We tested if the full model was significantly better than this simplified model using likelihood ratio test.
Supplementary Data
Acknowledgements
We thank all staff at the national, provincial and municipal Center for Disease Control and Prevention for providing the data. This study was partly supported by the Fundamental Research Funds for the Central Universities (2019kfyXMBZ015), the 111 Project (X.H., S.C., D.W., C.W., T.W.). X.L. is supported by Harvard University.
Footnotes
↵# Co-first authors.
Disclaimer: An early version of this manuscript was part of a preprint entitled “Evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease 2019 in Wuhan, China”, which was posted on medRxiv on March 6, 2020. (https://www.medrxiv.org/content/10.1101/2020.03.03.20030593v1)