Estimation of the Time-Varying Reproduction Number of 2019-nCoV Outbreak in China
=================================================================================

* Chong You
* Yuhao Deng
* Wenjie Hu
* Jiarui Sun
* Qiushi Lin
* Feng Zhou
* Cheng Heng Pang
* Yuan Zhang
* Zhengchao Chen
* Xiao-Hua Zhou

## Abstract

**Background** The 2019-nCoV outbreak in Wuhan, China has attracted worldwide attention. As of February 5, 2020, a total of 24433 cases of novel coronavirus-infected pneumonia associated with 2019-nCoV had been confirmed by the National Health Commission of China.

**Methods** Three approaches, namely the Poisson likelihood-based method (ML), the exponential growth rate-based method (EGR) and the stochastic Susceptible-Infected-Removed dynamic model-based method (SIR), were implemented to estimate the basic and controlled reproduction numbers.

**Results** A total of 71 chains of transmission with dates of symptom onset and 67 dates of infection were identified among 5405 confirmed cases outside Hubei reported by February 2, 2020. Based on this information, we find that the serial interval has a mean of 4.41 days with a standard deviation of 3.17 days, and that the infectious period has a mean of 10.91 days with a standard deviation of 3.95 days. Although the estimated controlled reproduction numbers *R*<sub>c</sub> produced by all three methods in all regions are significantly smaller than the basic reproduction numbers *R*, they are still greater than one.

**Conclusions** Although the controlled reproduction number is declining, it is still larger than one. Additional efforts are needed to reduce *R*<sub>c</sub> below one in order to end the current epidemic.

## 1. Introduction

On December 29, 2019, Wuhan, the capital city of Hubei Province in Central China, reported four cases of pneumonia of unknown etiology. Since then, the outbreak has rapidly worsened over a short span of time and has received considerable global attention.
On January 7, 2020, the pathogen of the current outbreak was identified as a novel coronavirus (2019-nCoV), and its gene sequence was quickly submitted to WHO.<sup>1,2</sup> On January 30, WHO declared this novel coronavirus-infected pneumonia (NCP) a “public health emergency of international concern”. As of February 5, 2020, the National Health Commission (NHC) of China had confirmed a total of 24433 cases of NCP associated with 2019-nCoV in Mainland China, including 493 fatalities and 968 recoveries.

Since January 19, 2020, strict containment measures, including travel restrictions, contact tracing, entry or exit screening, non-hospital isolation, quarantine, awareness campaigns and others, have been implemented by the Wuhan Government and quickly adopted by other cities within China with the aim of minimizing human-to-human transmission of the virus. Similar measures, including mandatory quarantine of anyone who had close contact with confirmed patients, were employed in China in 2009 to tackle the H1N1 outbreak.

This article investigates the change in the basic reproduction number *R* and the controlled reproduction number *R*<sub>c</sub> since the outbreak of 2019-nCoV. We have found that the estimated controlled reproduction numbers *R*<sub>c</sub> in all regions are significantly smaller than the basic reproduction numbers *R*, but still greater than one.

## 2. Data

Data were collected from provincial/municipal health commissions in China as well as through ministries of health in other countries and regions, with details of each confirmed case including case ID, region, age, gender, date of symptom onset, date of diagnosis, history of traveling to or residing in Wuhan, and, if any, related remarks such as contact identification, cases and case-related information.
In addition, the collected dataset also contains dates of infection and chains of transmission, which are inferred, where available, from travel or residency history in Wuhan and other relevant information as follows:

1. If the individual has not been to Hubei Province recently but was exposed within a four-day period (i.e., the individual has had contact with a confirmed case of NCP on a certain day), then the date of infection is inferred as the middle of the exposure period;
2. If the individual has travelled to Hubei Province but returned within four days, then the date of infection is inferred as the middle of the travelling period;
3. If the individual has not been to Hubei Province recently but was in close contact with an imported case from Hubei, then the individual is identified as infected by this imported case;
4. If the individual has not been to Hubei Province recently but was in close contact with a local case who was clearly infected before that individual, then the individual is identified as infected by the corresponding local case.

Note: if the individual has been to Hubei Province, the transmission history is not recorded even when contact tracing information exists.

## 3. Inference about the Serial Interval and Infectious Period

In this study, the serial interval is defined as the time difference between the dates of infection of successive cases in a chain of transmission (different textbooks may use different definitions). The infectious period is the duration over which an infected individual can transmit pathogens to a susceptible host. In this study, the infectious period is defined as the time difference between the date of infection and the date of diagnosis, because there is strong evidence that a diseased individual is contagious even during the incubation period, and because an individual is isolated immediately upon positive diagnosis and hence can no longer transmit the disease.
Both are key quantities that characterize an epidemic and are essential for estimating the basic/controlled reproduction number, *R* / *R*<sub>c</sub>.

Among 139 chains of transmission identified from 5405 confirmed cases outside Hubei recorded by February 2, 2020, none has the dates of infection available, but 71 have the dates of symptom onset available. Hence, the corresponding serial intervals were approximated by the differences in dates of symptom onset rather than in the actual dates of infection; see Figure 1.

[Figure 1:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/F1) Histogram of the serial interval, with an average of 4.27 days before correction.

Some of the serial intervals are negative, which is impossible by definition. However, since the serial intervals were approximated from the dates of symptom onset, the negative values could be caused by differing lengths of the incubation period between individuals. Here a simple correction is implemented by resetting the negative values to zero. After correction, the average of the serial intervals shifts to 4.41 days, with a standard deviation of 3.17 days; see Table 1. Note that the serial interval of SARS-CoV in Hong Kong was 8.4 days on average.<sup>3</sup>

In addition, the dates of infection could be identified for a total of 67 cases in the collected data. Figure 2 plots the histogram of the infectious period, while Table 2 shows the numerical summary.

[Table 1:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/T1) Numerical summary of serial intervals.

[Table 2:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/T2) Numerical summary of infectious period.

[Figure 2:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/F2) Histogram of the infectious period, with an average of 10.91 days.
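The serial-interval correction described above (approximating each interval by the difference in symptom-onset dates and resetting negative values to zero) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `corrected_serial_intervals` and the onset days are hypothetical and are not the study data.

```python
from statistics import mean, stdev


def corrected_serial_intervals(onset_pairs):
    """Approximate serial intervals from symptom-onset days.

    onset_pairs: iterable of (infector_onset_day, infectee_onset_day).
    Returns (raw, corrected), where negative raw intervals are reset to zero.
    """
    raw = [infectee - infector for infector, infectee in onset_pairs]
    corrected = [max(0, interval) for interval in raw]
    return raw, corrected


# Hypothetical onset days (days since an arbitrary origin), NOT the study data.
pairs = [(0, 4), (2, 1), (3, 10), (5, 8), (6, 6), (7, 5)]
raw, corrected = corrected_serial_intervals(pairs)

print(f"raw mean       = {mean(raw):.2f} days")
print(f"corrected mean = {mean(corrected):.2f} days")
print(f"corrected sd   = {stdev(corrected):.2f} days")
```

Because resetting negative values to zero can only move values upward, the corrected mean is never below the raw mean, which is the direction of the shift from 4.27 to 4.41 days reported above.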
We found no significant demographic differences between the subsets of cases used to estimate the serial interval and infectious period and the cases in the full dataset. Therefore, inference on the serial interval and infectious period based on these subsets should be representative of the full dataset.

## 4. Estimation of Basic/Controlled Reproduction Number

### Definition

The basic reproduction number *R* is defined as the (average) number of new infections generated by one infected individual during the entire infectious period in a fully susceptible population.<sup>4</sup> It can also be understood as the average number of infections caused by a typical individual during the early stage of an outbreak, when nearly all individuals in the population are susceptible. The basic reproduction number reflects the ability of an infection to spread in the absence of control. When the size of the susceptible population is limited, the effective reproduction number *R*<sub>e</sub> is used instead of *R*. Similarly, the controlled reproduction number *R*<sub>c</sub> should be used to describe the ability of the disease to spread when interventions (such as quarantine, isolation, or traffic control) are in place. Hence a good measure of any intervention is the extent to which it reduces *R*<sub>c</sub>. Note that the disease will decline and eventually die out if *R*<sub>c</sub> ≤ 1.

### Methods

The basic reproduction number can be estimated through a variety of models.<sup>5</sup> In this section, we compare three of the most popular estimators of *R*/*R*<sub>c</sub>, as described below.

#### (1) Poisson Likelihood-based (ML) method

Let *N*<sub>t</sub> be the number of reported new confirmed cases on day *t*.
Suppose that the serial interval has a maximum of *k* days and that the number of new cases generated by an infected individual follows a Poisson distribution with parameter *R*.<sup>6</sup> The probability that the serial interval of an individual is *j* days is *w*<sub>j</sub>, which can be estimated from the empirical distribution of the serial interval or by setting up a discretized Gamma prior on it. Thus, the likelihood function reduces to a thinned Poisson likelihood

![Formula][1]

where

![Formula][2]

The reproduction number *R* can be estimated by maximizing the likelihood function. Note that if the empirical distribution of the serial interval is used or the *w*<sub>j</sub> are given, then

![Formula][3]

#### (2) Exponential growth rate-based (EGR) method

In the early period of an epidemic, the number of infected cases rises exponentially. Suppose the exponential epidemic growth rate (Malthusian coefficient) is *r*, which can be estimated by fitting a least-squares line to the daily number of reported new confirmed cases on a log scale, namely log(*N*<sub>t</sub>). Let *f*<sub>G</sub>(*t*) denote the probability density function of the serial interval. The reproduction number can then be calculated according to the Euler-Lotka equation in moment generating form<sup>7</sup>

![Formula][4]

#### (3) Stochastic dynamic model-based (SIR) method

Here we consider a stochastic Susceptible-Infected-Removed (SIR) model rather than a standard deterministic one. The major advantage of using a stochastic dynamic model is that it better accounts for real variability and offers more opportunity for quantifying uncertainty.<sup>8</sup> Let *S*(*t*), *I*(*t*) and *R*(*t*) denote the numbers of susceptible, infectious and recovered individuals at time *t*, respectively, and note that *N* = *S*(*t*) + *I*(*t*) + *R*(*t*).
Suppose that the infectious period of an individual is a random variable *T* ∼ Exp(*γ*); then the reproduction number is *R* = *β*E(*T*) = *β*/*γ*, where *γ* and *β* are the recovery rate and transmission rate, respectively, in the system of ordinary differential equations (ODEs) below,

![Formula][5]

The maximum likelihood method is used to estimate the model parameters, where the likelihood is obtained by the sequential Monte Carlo method and the parameters are estimated using the Iterated Filtering algorithm (IF2)<sup>9</sup> implemented as mif in the R package pomp,<sup>10</sup> with *S*(0) equal to the population of the region, *R*(0) = 0, *I*(0) equal to 10 times the number of confirmed cases on Day 0, and 1/*γ* = 10.91 days, the mean infectious period obtained from the collected data described above.

### Results

In this section we estimate the basic reproduction number *R* and the controlled reproduction number *R*<sub>c</sub>. Since January 19, 2020, various containment measures have been strictly implemented, especially after the State Council agreed on January 20 to bring NCP under the management of the Infectious Diseases Law and the Health and Quarantine Law. Based on the average 10.91-day infectious period estimated from our collected data, we expect a flatter rate of increase starting on January 29. Figure 3 plots the number of daily new cases on a log scale against date and, as anticipated, the trend supports this expectation. Therefore, *R* and *R*<sub>c</sub> are estimated from the collected data in two separate periods: from January 21 (the starting date of daily nationwide updates of confirmed cases) to January 28, and from January 29 to February 5 (the end date of this study), respectively.

[Figure 3:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/F3) Daily numbers of new confirmed cases by date; an obvious change of rate occurred on January 29, as expected.
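The EGR calculation over the two periods can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a gamma-distributed serial interval, for which the moment generating function is M(s) = (1 − θs)<sup>−κ</sup> with shape κ and scale θ, so that the Euler-Lotka relation *R* = 1/M(−*r*) becomes *R* = (1 + *r*θ)<sup>κ</sup>. The function names and the daily counts are hypothetical; only the serial-interval mean (4.41 days) and standard deviation (3.17 days) are taken from the text.

```python
import math


def growth_rate(log_counts):
    """Least-squares slope of log(N_t) against the day index t = 0, 1, ..."""
    n = len(log_counts)
    t_mean = (n - 1) / 2
    y_mean = sum(log_counts) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(log_counts))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx


def r_from_growth_rate(r, si_mean, si_sd):
    """R = 1 / M(-r), ASSUMING a gamma serial interval with the given mean and sd.

    Valid only while 1 + r * scale > 0.
    """
    shape = (si_mean / si_sd) ** 2
    scale = si_sd ** 2 / si_mean
    return (1 + r * scale) ** shape


# Hypothetical daily new-case counts for two 8-day periods, NOT the study data.
# Counts must be positive for the logarithm to be defined.
period_1 = [round(100 * math.exp(0.30 * t)) for t in range(8)]
period_2 = [round(900 * math.exp(0.08 * t)) for t in range(8)]

for label, counts in (("R", period_1), ("R_c", period_2)):
    r = growth_rate([math.log(n) for n in counts])
    estimate = r_from_growth_rate(r, si_mean=4.41, si_sd=3.17)
    print(f"{label}: growth rate = {r:.3f} per day, estimate = {estimate:.2f}")
```

A flatter slope in the second period gives a smaller estimate, and a growth rate of zero gives exactly one, which is the threshold discussed in the Definition subsection.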
The estimates of *R* and *R*<sub>c</sub> by Poisson likelihood (ML) and exponential growth rate (EGR) in selected regions of China are listed in Table 3 and Table 4. Despite the disagreement between estimation methods, all three methods indicate notable reductions from *R* to *R*<sub>c</sub>, which suggests an improvement in the current situation. This is possibly due to the effective interventions and prompt actions taken by the local and central governments to minimize further spread. We also notice that EGR yields smaller estimates of *R*<sub>c</sub> than the other methods. This might be because the number of infected patients no longer grows exponentially under such strict containment measures.

[Table 3:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/T3) Estimates and 95% confidence intervals of the basic reproduction number in selected provinces (or cities) of China, from Jan 21 to Jan 28, 2020.

[Table 4:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/T4) Estimates and 95% confidence intervals of the controlled reproduction number in selected provinces (or cities) of China, from Jan 29 to Feb 5, 2020.

Furthermore, the time-varying controlled reproduction number *R*<sub>c</sub>(*t*) can be estimated through the Poisson likelihood (ML) method, where *t* runs from Jan 29 to Feb 5, 2020. For each Day *t*, the numbers of daily reported new cases from Jan 20 to Day *t* are used to estimate ![Graphic][6] (more historical data are used than in Table 4). Figure 4 plots the estimated controlled reproduction number ![Graphic][7] along with its 95% CI for selected regions of China. Note that the estimated ![Graphic][8] reflects the average spreading ability of the epidemic over a short period prior to Day *t*. As a result, the real-time *R*<sub>c</sub>(*t*) might be overestimated if the general trend of *R*<sub>c</sub>(*t*) is declining.
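The expanding-window ML estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the thinned-Poisson form of White and Pagano with known serial-interval probabilities *w*<sub>j</sub> for *j* = 1, ..., *k*, in which the expected count on day *t* is *R* Σ<sub>j</sub> *w*<sub>j</sub> *N*<sub>t−j</sub> and the maximizer has a closed form. The 95% interval is a Wald interval from the observed Fisher information, which is an assumption here because the text does not state how its intervals were computed. The function name, the weights `w` and the daily counts are hypothetical.

```python
import math


def ml_reproduction_number(counts, w):
    """Closed-form Poisson ML estimate of the reproduction number.

    counts: daily new cases N_0, ..., N_T.
    w: w[j - 1] = P(serial interval = j days) for j = 1, ..., k.
    Returns (estimate, lower, upper) with a Wald 95% interval (an assumption).
    Requires at least one case among the days that feed the exposure sum.
    """
    k = len(w)
    total_cases = sum(counts[1:])
    # Infection pressure: sum over t of sum_j w_j * N_{t-j}
    exposure = sum(
        w[j - 1] * counts[t - j]
        for t in range(1, len(counts))
        for j in range(1, min(k, t) + 1)
    )
    r_hat = total_cases / exposure
    # Observed information at the maximum is exposure / r_hat
    se = math.sqrt(r_hat / exposure)
    return r_hat, r_hat - 1.96 * se, r_hat + 1.96 * se


# Hypothetical serial-interval probabilities (sum to 1) and 17 daily counts,
# NOT the study data. Index 0 plays the role of the first day of the series.
w = [0.15, 0.25, 0.25, 0.15, 0.10, 0.05, 0.05]
counts = [80, 100, 130, 170, 220, 280, 350, 430, 520,
          600, 670, 730, 780, 820, 850, 870, 880]

# Expanding window: every estimate uses all counts from day 0 up to day `end`.
for end in range(9, len(counts)):
    r_hat, lower, upper = ml_reproduction_number(counts[: end + 1], w)
    print(f"day {end}: estimate = {r_hat:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Because each window keeps the early, faster-growing days, the estimate for a given day averages over the whole history, which is why a declining real-time value would tend to be overestimated. The interval also narrows as the window grows, since the exposure sum in the standard error keeps increasing.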
[Figure 4:](http://medrxiv.org/content/early/2020/02/11/2020.02.08.20021253/F4) The estimated controlled reproduction number in (a) China, (b) Hubei, (c) other provinces except Hubei, (d) Beijing, (e) Shanghai, (f) Guangdong, (g) Zhejiang, (h) Hunan, and (i) Henan. The dashed lines show the 95% confidence interval. The confidence interval narrows over time because more historical data are used to fit the ML model.

## 5. Conclusion

Despite the continuous daily increase in new confirmed cases, the estimated controlled reproduction numbers *R*<sub>c</sub> produced by all three methods in all regions are significantly smaller than the basic reproduction numbers *R*. As discussed in Section 4, the real-time controlled reproduction number may be even lower than the estimated values in Figure 4. Nonetheless, additional effort is needed to reduce *R*<sub>c</sub> below one.

## 6. Discussion

The dataset used in this study is based on the confirmed cases reported by NHC China. However, during the period of data collection, the official guidelines for diagnosis and treatment were updated four times. The criteria for confirmation have evolved from the original “whole genome sequencing of the respiratory excretion” to “positive viral nucleic acid results by the RT-PCR of the respiratory excretion or viral gene sequence”, and, most recently, to include positive nucleic acid results from blood samples. Meanwhile, the confirmation process has been simplified by removing the requirement that confirmed cases be accredited by the national expert committee. The fourth edition granted the accrediting authority to the municipalities.<sup>11</sup> In addition, the medical resources in Hubei Province, especially in Wuhan, have been enhanced remarkably. All of these changes might result in a temporary surge of confirmed cases and lead to an overestimation of *R*<sub>c</sub>, especially in Hubei Province.
Furthermore, the current containment measures mainly aim to cut human-to-human transmission via respiratory droplets. However, other transmission pathways, including fecal-oral transmission and aerosol transmission, cannot yet be excluded based on current evidence. If other transmission mechanisms do exist, the *R*<sub>c</sub> values will remain high unless further measures interrupt these transmission pathways.

## Data Availability

All data are collected from the website of China CDC. [http://2019ncov.chinacdc.cn/2019-nCoV/](http://2019ncov.chinacdc.cn/2019-nCoV/)

## Acknowledgments

We thank Taoyun Hu, Xueqin Liu and Yuyin Li from the School of Public Health, Peking University for assistance with data collection.

## Footnotes

* \* Joint first authors
* Received February 8, 2020.
* Revision received February 8, 2020.
* Accepted February 11, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., … & Cheng, Z. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet.
2. Wang, C., Horby, P. W., Hayden, F. G., & Gao, G. F. (2020). A novel coronavirus outbreak of global health concern. The Lancet.
3. Lipsitch, M., Cohen, T., Cooper, B., et al. (2003). Transmission dynamics and control of severe acute respiratory syndrome. Science, 300, 1966–70.
4. Anderson, R. M., Anderson, B., & May, R. M. (1992). Infectious diseases of humans: dynamics and control. Oxford University Press.
5. Nikbakht, R., Baneshi, M. R., Bahrampour, A., & Hosseinnataj, A. (2019). Comparison of methods to Estimate Basic Reproduction Number (R0) of influenza, Using Canada 2009 and 2017-18 A (H1N1) Data. Journal of research in medical sciences: the official journal of Isfahan University of Medical Sciences, 24.
6. Forsberg White, L., & Pagano, M. (2008). A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Statistics in medicine, 27(16), 2999–3016.
7. Wallinga, J., & Lipsitch, M. (2007). How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences, 274(1609), 599–604.
8. King, A. A., Domenech de Cellès, M., Magpantay, F. M. G., & Rohani, P. (2015). Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola. Proc. R. Soc. B, 282.
9. Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S., & King, A. A. (2015). Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences of the U.S.A., 112, 719–724.
10. King, A. A., Ionides, E. L., Bretó, C. M., Ellner, S., Kendall, B., Wearing, H., Ferrari, M. J., Lavine, M., & Reuman, D. C. (2010). pomp: Statistical inference for partially observed Markov processes (R package).
11. National Health Commission of the People’s Republic of China, [http://www.nhc.gov.cn/xcs/zhengcwj/list\_gzbd.shtml](http://www.nhc.gov.cn/xcs/zhengcwj/list_gzbd.shtml)

[1]: /embed/graphic-5.gif
[2]: /embed/graphic-6.gif
[3]: /embed/graphic-7.gif
[4]: /embed/graphic-8.gif
[5]: /embed/graphic-9.gif
[6]: /embed/inline-graphic-1.gif
[7]: /embed/inline-graphic-2.gif
[8]: /embed/inline-graphic-3.gif