Abstract
Background How could we anticipate the progression of the ongoing epidemic of the coronavirus disease 2019 (COVID-19) in China? As a measure of transmissibility, we aimed to estimate concurrently the time-varying reproduction number over time during the COVID-19 epidemic in China.
Methods We extracted the epidemic data from the “Tracking the Epidemic” website of the Chinese Center for Disease Control and Prevention for the duration of January 19, 2020 and March 14, 2020. Then, we specified two plausible distributions of serial interval to apply the novel estimation method implemented in the incidence and EpiEstim packages to the data of daily new confirmed cases for robustly estimating the time-varying reproduction number in the R software.
Results The epidemic curve of daily new confirmed cases in China peaked around February 4–6, 2020, and then declined gradually, except the very high peak on February 12, 2020 owing to the added clinically diagnosed cases of the Hubei Province. Under two specified plausible scenarios for the distribution of serial interval, both curves of the estimated time-varying reproduction numbers fell below 1.0 around February 17–18, 2020. Finally, the COVID-19 epidemic in China abated around March 7–8, 2020, indicating that the prompt and aggressive control measures of China were effective.
Conclusion Seeing the estimated time-varying reproduction number going downhill speedily was more informative than looking for the drops in the daily number of new confirmed cases during an ongoing epidemic of infectious disease. We urged public health authorities and scientists to estimate time-varying reproduction numbers routinely during an epidemic of infectious diseases and to report them daily to the public until the end of the epidemic.
Introduction
The coronavirus disease 2019 (COVID-19) is now known to be caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The COVID-19 was first reported to the World Health Organization (WHO) from Wuhan City, Hubei Province, China on December 31, 2019. China locked down Wuhan on January 23, 2020. Then, the WHO officially declared the COVID-19 epidemic as a Public Health Emergency of International Concern on January 30, 2020. After drastic control measures had been implemented with a high cost to combat the COVID-19 epidemic, President Xi Jinping of China made his first visit to Wuhan on March 10, 2020 to show that China turned the corner on coronavirus. China announced no new locally transmitted cases on March 19, 2020, and then ended its lockdown of Wuhan on April 8, 2020. Nevertheless, the WHO described coronavirus as a pandemic on March 11, 2020 due to its rapid and wide spread over the world outside China. Traveling oversea has dramatically shortened the social distances between countries worldwide, and thus the coronavirus spreads out more quickly and widely than ever before. At the same time, public panic spreads out even more drastically through social and news media. The epidemic of such a novel infectious disease has caused big damages to human health, medical facilities, economy, and society in several countries. Once the epidemic is likely to occur, appropriate control measures according to the source-transmission-host theory, including the nonpharmaceutical interventions (NPI) and pharmaceutical interventions (PI), should be implemented timely to lower such damages.1 In China, prompt and aggressive actions had been taken since the very early phase of the epidemic. In the other counties, areas, or territories such as Singapore, Thailand, Philippines, Taiwan, Japan, South Korea, Iran, Italy, Spain, France, Germany, United Kingdom, United States of America, Australia, and Egypt, the sooner the proper actions were taken against the spread of the coronavirus, the smaller the size of the COVID-19 epidemic and its damage would likely be. Since the coronavirus is highly transmittable by droplets, more stringent control measures such as fast and mass testing, rigorous contact tracing, large-scale isolations, mandatory quarantines, travel restrictions or bans, border closing, social distancing, school closings, stay-at-home orders, and long-term lockdowns may be needed to contain the epidemic ultimately. Almost the whole world is now in the battle against the coronavirus as the global numbers of confirmed COVID-19 cases and deaths continue to rise speedily.
In such a disaster, various sectors of society should collaborate immediately to tackle the problem under the guidance of the government. Scientists and technology experts may try all the available tools to help the government fight the epidemic. The basic reproduction number, R0, for an infectious disease is the expected number of new cases infected directly from the index case in a susceptible population. As a measure of transmissibility, it tells us how quickly an infectious disease spreads out in various stages of the epidemic.1 Most importantly, R0 can be used to assess the effectiveness of the implemented control measures and to explore the possibility of pathogen mutations in an epidemic. When R0 < 1.0 persistently, the epidemic would be damped down soon.1 In essence, R0 is the driving force behind the epidemic curve. Its value usually varies over time during an epidemic. Hence, as biostatisticians, we started this investigation on January 27, 2020 with the aim to estimate concurrently the time-varying reproduction number, R0(t), over time during the ongoing epidemic of COVID-19 in China. We also initiated a similar research project for estimating concurrently the time-varying reproduction number, R0(t), over time during the ongoing epidemic of COVID-19 in 12 heavily-attacked countries outside China, which will be reported later.
Methods
Epidemic data
To collect all individual patient data, including personal contact history, during an epidemic is a tough and crucial task. As revealed in the report of the Novel Coronavirus Pneumonia Emergency Response Epidemiology Team, great efforts have previously been made in China to build up China’s Infectious Disease Information System, which is very helpful in the management of nationwide patient data during this epidemic of a large size.2 Such highly confidential and miscellaneous data are not made available for unauthorized researchers during the COVID-19 epidemic. Nevertheless, since January 19, 2020, the numbers of new confirmed cases, total confirmed cases, new suspected cases, total suspected cases, new deaths, total deaths, new recoveries, and total recoveries have been announced daily by the Chinese Center for Disease Control and Prevention (China CDC) on its website of “Tracking the Epidemic” (http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm). Thus, as listed in Supplementary Appendix 1, we extracted such publicly available data of this ongoing epidemic from the above website for the duration of January 19, 2020 and March 14, 2020 in this study. Yet, we had to make some minor corrections, marked with red asterisks, to reconcile the daily new counts with the daily total counts numerically.
Epidemic analysis
Statistical analysis was performed using the R 3.6.3 software (R Foundation for Statistical Computing, Vienna, Austria). The distributional properties of counts were presented by the mean, standard deviation (SD), minimum, the first quartile (Q1), median, the third quartile (Q3), and maximum. Instead of developing any advanced methods specific to this epidemic, we tried to find an available easy-to-use tool to monitor the progress of the ongoing COVID-19 epidemic as soon as possible. As listed on the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/), several R packages might be used to compute basic reproduction numbers of an epidemic in R, including argo, epibasix, EpiCurve, EpiEstim, EpiILM, EpilLMCT, epimdr,3 epinet, epiR, EpiReport, epitools, epitrix, incidence, mem, memapp, R0, and surveillance. We chose the incidence (version 1.7.0) and EpiEstim (version 2.2-1) packages to estimate R0(t) in R during the ongoing COVID-19 epidemic in China due to their methodological soundness and computational simplicity for rapid analysis.4,5 Our R code was listed in Supplementary Appendix 2 for check and re-uses.
First, we laid out the conceptual framework below. In an epidemic of infectious disease, any susceptible subject who becomes a patient usually goes through the following three stages: infection, development of symptoms, and diagnosis of the disease. Theoretically, to estimate R0 or R0(t), we need the information about the distribution of generation time (GT), which is the time interval between the infection of the index case and the infection of the next case infected directly from the index case.5 Yet, the time of infection is most likely unavailable or inaccurate, and thus investigators collect the data about the distribution of serial interval (SI) instead, which is the time interval between the symptom onset of the index case and the symptom onset of the next case infected directly from the index case.5 Nevertheless, the data of symptom onset are not publically available and almost always have the problem of delayed reporting in any ongoing epidemic of infectious disease because they are usually recorded at diagnosis.2 Hence, we took a common approach in statistics to tackle this problem by specifying the best plausible distributions of SI according to the results obtained from previous studies of similar epidemics, and then applied the novel estimation method implemented in the EpiEstim package to the data of daily new confirmed cases in practice.4,5
Next, we considered two plausible scenarios for studying the ongoing COVID-19 epidemic in China. The estimate_R function of the EpiEstim package assumes a Gamma distribution for SI by default to approximate the infectivity profile.4 Technically, the transmission of an infectious disease is modeled with a Poisson process.5 When we choose a Gamma prior distribution for SI, the Bayesian statistical inference leads to a simple analytical expression for the Gamma posterior distribution of R0(t).5 In the first scenario, we specified the mean (SD) of the Gamma distribution for SI to be 8.4 (3.8) days to mimic the 2003 epidemic of the severe acute respiratory syndrome (SARS) in Hong Kong.5 Then, in the second scenario, we specified the mean (SD) of the Gamma distribution for SI to be 2.6 (1.5) days to mimic the 1918 pandemic of influenza in Baltimore, Maryland.5 Given an observed series of daily new confirmed cases, the shorter SI, the smaller R0(t). According to the current understanding, the transmissibility of COVID-19 was higher than SARS but lower than influenza.1 Hence, although we did not know the true distribution(s) of SI for the ongoing epidemic of COVID-19 in China,6 these two scenarios helped us catch the behavior pattern of this epidemic along with the time evolution.
Results
Epidemic data
We subtracted one day from the reporting dates in the COVID-19 epidemic data of China CDC to obtain the “dates” on which the daily new confirmed cases actually occurred. In Table 1, we listed the sample statistics of the daily new confirmed cases, total confirmed cases, new suspected cases, total suspected cases, new deaths, total deaths, new recoveries, and total recoveries during January 19-March 14 from the epidemic data of COVID-19 in China. Those statistics revealed the magnitude and impact of the ongoing epidemic over those 56 days. Yet, the clinically diagnosed cases (Hubei Province only) were added to the daily counts of new confirmed cases during February 12–16, 2020.2 Thus, the maximum value of new confirmed cases went up dramatically to 15,151 on February 12, 2020. All the counts of total confirmed cases, total deaths, and total recoveries were monotonically increasing, except the count of total suspected cases.
Epidemic analysis
In Figure 1, the epidemic curve of daily new confirmed cases peaked around February 4–5, 2020. Then, it declined gradually, except for the very high peak on February 12, 2020 (in the dark red color) due to the added clinically diagnosed cases of the Hubei Province.
Next, under the above specified two scenarios, we plotted the daily estimates of the time-varying reproduction numbers, R0(t), over sliding weekly windows,4,5 from January 19 to March 14, 2020 in the middle panels of Figures 2A and 2B for the ongoing COVID-19 epidemic in China. Namely, at any given day of the epidemic curve, R0(t) was estimated for the weekly window ending on that day.4,5 The estimated R0(t) was not shown from the very beginning of the epidemic because precise estimation was not possible in that period.5 Specifically, the blue lines showed the posterior means of R0(t), the grey zones represented the 95% credible intervals (CrI), and the black horizontal dashed lines indicated the threshold value of R0 = 1.0.4,5 In addition, the first and third panels of Figures 2A and 2B were the miniaturized epidemic curve of daily new confirmed cases (see Figure 1) and the distributions of SI used for the estimation of R0(t).4,5 Intriguingly, both curves of the estimated R0(t) went down to be less than 1.0 around February 17–18, 2020.
Finally, in Tables 2A and 2B, we displayed the estimated parameters of the Gamma posterior distributions of R0(t), over sliding weekly windows, from January 19 to March 14, 2020 under those two specified scenarios for scrutinization.4,5
Discussion
Almost everyone was susceptible to the novel COVID-19 and this was one of the reasons why the COVID-19 epidemic occurred in many places and caused public panics in China and worldwide. In terms of population size, the depletion due to death or recovery might be negligible in China. And, there were no imported cases of COVID-19 in China. These features made the task of modeling this epidemic in China relatively easier. However, since the COVID-19 was new to human society, its diagnostic criteria, control measures, and medical cares were inevitably changing during the epidemic as the knowledge and experience about it were accumulated continuously.2 We decided not to exclude the added clinically diagnosed cases of the Hubei Province during February 12-16, 2020 for the robustness of our epidemic analysis. Besides, many recoveries were probably not susceptible any more in the late phase of the epidemic. Thus, the result obtained in this study was merely a rough estimate of R0(t). Nevertheless, our findings (esp., Figure 2B) were consistent with the earlier estimates of R0 for the COVID-19 epidemic in China such as 2.24 (95% confidence interval [CI]: 1.96–2.55)–3.58 (95% CI: 2.89–4.39),6 2.2 (95% CI: 1.4–3.9),7 2.6 (uncertainty range: 1.5–3.5),8 2.68 (95% CrI: 2.472.86),9 1.4–2.5 (WHO),10 2.0–3.3,10 2.2 (90% CI: 1.4–3.8),10 6.47 (95% CI: 5.71–7.23),10 3.11 (90% CI: 2.39–4.13),10 and 2.0.10
This study had several limitations because we relied on some assumptions to make a rapid analysis of this ongoing epidemic feasible. First, we assumed that all new cases of COVID-19 in China were detected and put into the counts of daily new confirmed cases correctly. However, asymptomatic or mild cases of COVID-19 were likely undetected, and thus under-reported, especially in the early phase of this epidemic.6,11 In some countries, a lack of diagnostic test kits for the SARS-CoV-2 and a shortage of qualified manpower for fast testing could also cause under-reporting or delay in reporting. Second, we admitted the time delay in our estimates of R0(t) for the COVID-19 epidemic in China due to the following two time lags: (1) the duration between the time of infection and the time of symptom onset (i.e., the incubation period of infection) if infectiousness began around the time of symptom onset5 and (2) the duration between the time of symptom onset and the time of diagnosis.2 Nevertheless, if asymptomatic carriers could transmit the COVID-19,12 the first time lag would be shorter and it became the duration between the time of infection and the time of becoming infectious (i.e., the latent period of infection). Moreover, the time interval from symptom onset to diagnosis was shorter and shorter due to the full alert of society and the faster diagnostic tests.2 Third, we assumed that the distribution of SI did not change considerably over time as the epidemic progressed. However, as for the SARS epidemic in Singapore, the SI tended to be shorter after control measures were implemented.13 The SI also depended on the amount of infecting dose, the level of host immunity, and the frequency of person-to-person contacts. To summarize, most of these limitations led our estimation of R0(t) into a more conservative context.
Looking back the epidemic curve in Figure 1, we could see that the lockdown of Wuhan on January 23, 2020 was a very smart, brave, and quick move even though the numbers of new confirmed cases and total confirmed cases were only 131 and 571 on January 22, 2020 in the very early phase of the COVID-19 epidemic in China. Then, adding clinically diagnosed cases of the Hubei Province to the daily counts of new confirmed cases during February 12–16, 2020 indicated that the capacity of medical care was strong enough at that time to help the suspected cases obtain proper medical care sooner by reducing the waiting time. As a result, the new confirmed cases had dropped sharply after February 18, 2020.
Next, as shown in Figures 2A and 2B, the estimated R0(t) began to drop just 2–4 days after the lockdown of Wuhan on January 23, 2020. Yet, the number of new confirmed cases was still climbing up day by day until it reached the peak around February 4–5, 2020. During this very painful time period, everyone was keen to know: When would the epidemic curve reach the hilltop?14 Although it was difficult to estimate the exact date when it would happen, the peak of 3,886 new confirmed cases on February 4, 2020 appeared right after the trend of the computed R0(t) was declining monotonically for about 10 days, and then the daily number of new confirmed cases began dropping, indicating that the COVID-19 epidemic had abated. Then, two weeks later, the estimated R0(t) surprisingly reduced to the level below 1.0 around February 17–18, 2020 in both scenarios. Our first epidemic analysis was done on February 24, 2020, which had led us to believe that the COVID-19 epidemic would end soon if the effective control measures were maintained and nothing else happened incidentally such as heavy case imports, super-spreaders, and virulent virus mutations. Finally, we observed that the COVID-19 epidemic in China closed up around March 7–8, 2020, indicating that the prompt and aggressive control measures of China15-18 were effective. In the end, China would win the battle against the coronavirus as long as its resurgence, if any, is well managed. In our opinion, seeing the estimated R0(t) going downhill is more informative than looking for the drops in the daily number of new confirmed cases during an ongoing epidemic of infectious disease. And, the consistent decline of the estimated R0(t) in trend over time is more important than the actual values of the estimated R0(t) themselves. The steeper the slope, the sooner the epidemic ends.
Although this study could not provide the most accurate results in a rigorous way, it sufficed for the pragmatic purpose from the public health viewpoint. We believed that it was an approximate answer to the right question. The estimate_R function of the EpiEstim package provided the option for including the data of imported cases in the estimation of R0(t).4 Refinements in the estimation of R0(t) can be made with the individual patient data, including personal contact history, whenever they are available for analysis.
Finally, we urged public health authorities and scientists worldwide to estimate time-varying reproduction numbers routinely during epidemics of infectious diseases and to report them daily on their websites such as the “Tracking the Epidemic” of China CDC for guiding the control strategies and reducing the unnecessary panic of the public until the end of the epidemic.18 Fitting complex transmission models of infectious disease dynamics to epidemic data with limited information about required parameters is a challenge.3 The results of such analyses may be difficult to generalize due to the context-specific assumptions made and it can be too slow to meet a pressing need during an epidemic.5,19-22 Thus, an easy-to-use tool for monitoring the COVID-19 epidemic is so important in practice.
Since the coronavirus has spread out globally, we should take the fresh lessons from China15-22, South Korea23, Italy24, and the United States of America25,26 and learn the experiences from the previous epidemics1 to mitigate its harm as much as possible. We may also use China as an example to anticipate the potential progression of the COVID-19 epidemic in a particular country. Control tactics and measures should be applied in line with local circumstances, but the same easy-to-use monitoring tool, R0(t), could be applied to many places. Let’s help each other to combat the COVID-19 pandemic together. After all, we are all in the same shaking boat now.
Data Availability
The epidemic data were listed in Supplementary Appendix 1.
Funding Source
The authors did not receive any funding for this study. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Declaration of Interests
We declared no conflicts of interest in this study.