Abstract
Background How can we anticipate the progression of the ongoing pandemic of the coronavirus disease 2019 (COVID-19)? As a measure of transmissibility, we aimed to estimate concurrently the time-varying reproduction number, R0(t), over time during the COVID-19 pandemic for each of the following 12 heavily-attacked countries: Singapore, South Korea, Japan, Iran, Italy, Spain, Germany, France, Belgium, United Kingdom, the United States of America, and South Africa.
Methods We downloaded the publicly available COVID-19 pandemic data from the WHO COVID-19 Dashboard website (https://covid19.who.int/) for the duration of January 11, 2020 and May 1, 2020. Then, we specified two plausible distributions of serial interval to apply the novel estimation method implemented in the incidence and EpiEstim packages to the data of daily new confirmed cases for robustly estimating R0(t) in the R software.
Results We plotted the epidemic curves of daily new confirmed cases for the 12 selected countries. A clear peak of the epidemic curve appeared in 10 of the 12 selected countries at various time points, and then the epidemic curve declined gradually. However, the United States of America and South Africa happened to have two or more peaks and their epidemic curves either reached a plateau or still climbed up. Almost all curves of the estimated R0(t) monotonically went down to be less than or close to 1.0 up to April 30, 2020 except Singapore, South Korea, Japan, Iran, and South Africa, of which the curves surprisingly went up and down at various time periods during the COVID-19 pandemic. Finally, the United States of America and South Africa were the two countries with the approximate R0(t) ≥ 1.0 at the end of April, and thus they were now facing the harshest battles against the coronavirus among the 12 selected countries. By contrast, Spain, Germany, and France with smaller values of the estimated R0(t) were relatively better than the other 9 countries.
Conclusion Seeing the estimated R0(t) going downhill speedily is more informative than looking for the drops in the daily number of new confirmed cases during an ongoing epidemic of infectious disease. We urge public health authorities and scientists to estimate R0(t) routinely during an epidemic of infectious disease and to report R0(t) daily to the public until the end of the epidemic.
Introduction
The coronavirus disease 2019 (COVID-19) is now known to be caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The COVID-19 was first reported to the World Health Organization (WHO) from Wuhan City, Hubei Province, China on December 31, 2019. China locked down Wuhan on January 23, 2020 to prevent the coronavirus from spreading out to the other provinces of China and the other countries outside China. The WHO officially declared the COVID-19 epidemic as a Public Health Emergency of International Concern on January 30, 2020. After drastic control measures had been implemented with a high cost to combat the COVID-19 epidemic, President Xi Jinping of China made his first visit to Wuhan on March 10, 2020 to show that China turned the corner on coronavirus. China announced no new locally transmitted cases on March 19, 2020, and then ended its lockdown of Wuhan on April 8, 2020. Nevertheless, the WHO described coronavirus as a pandemic on March 11, 2020 due to its rapid and wide spread over the world outside China. Traveling oversea has dramatically shortened the social distances between countries worldwide, and thus the coronavirus spreads out more quickly and widely than ever before. At the same time, public panic spreads out even more drastically through social and news media. The epidemic of such a novel infectious disease has caused big damages to human health, medical facilities, economy, and society in many countries. Once the epidemic is likely to occur, appropriate control tactics and measures according to the source-transmission-host theory, including the nonpharmaceutical interventions (NPI) and pharmaceutical interventions (PI), should be implemented timely to lower such damages (see Appendix).1 In China, prompt and aggressive actions had been taken since the very early phase of the epidemic.2 In the other counties, areas, or territories such as Taiwan, Singapore, Thailand, Philippines, Japan, South Korea, Iran, Italy, Spain, France, Germany, United Kingdom, the United States of America, Australia, and South Africa, the sooner the proper actions were taken against the local spread of the coronavirus, the smaller the size of the COVID-19 epidemic and its damage would likely be. Since the coronavirus is highly transmittable by droplets, more stringent control measures such as fast and mass testing, rigorous contact tracing, large-scale isolations, mandatory quarantines, traveling restrictions or bans, border closing, social distancing, school closings, stay-at-home orders, curfews, and long-term lockdowns may be needed to contain the epidemic ultimately. Almost the whole world is now in the battle against the coronavirus as the global numbers of confirmed COVID-19 cases and deaths are soaring speedily.3,4
In such a disaster, various sectors of society should collaborate immediately to tackle the problem under the guidance of the government. Scientists and technology experts may try all the available tools to help the government fight the epidemic. Yet, how can we anticipate the progression of the ongoing COVID-19 pandemic in a city, province, or country? The basic reproduction number, R0, for an infectious disease is the expected number of new cases infected directly from the index case in a susceptible population. As a measure of transmissibility, it can tell us how quickly an infectious disease spreads out in various stages of the epidemic.1 Most importantly, R0 can be used to assess the effectiveness of the implemented control measures and to explore the possibility of pathogen mutations in an epidemic. When R0 < 1.0 persistently, the epidemic would be damped down soon.1 In essence, R0 is the driving force behind the epidemic curve. Its value usually varies over time during an epidemic. Hence, as biostatisticians, we had analyzed the epidemic data of China to estimate the time-varying reproduction number, R0(t), and reported the analysis result.2 Then, we started this investigation on April 4, 2020 with the aim to estimate concurrently the time-varying reproduction number, R0(t), over time during the ongoing COVID-19 pandemic for each of the following 12 heavily-attacked countries in different continents: (1) Singapore, (2) Republic of Korea (i.e., South Korea), (3) Japan, (4) Islamic Republic of Iran (i.e., Iran), (5) Italy, (6) Spain, (7) Germany, (8) France, (9) Belgium, (10) United Kingdom, (11) the United States of America, and (12) South Africa.
Methods
Epidemic data
To collect all individual patient data, including personal contact history, during an epidemic is a tough and crucial task. As revealed in the report of the Novel Coronavirus Pneumonia Emergency Response Epidemiology Team, great efforts have previously been made in China to build up China’s Infectious Disease Information System, which is very helpful in the management of nationwide patient data during this epidemic of a large size.5 Such highly confidential and miscellaneous data are not made available in any country to unauthorized researchers during the COVID-19 pandemic. Nevertheless, since January 11, 2020, the numbers of new confirmed cases, cumulative confirmed cases, new deaths, and cumulative deaths of the counties, areas, and territories in the world have been announced daily by the WHO on its official websites.3,4 Besides, the newly created and daily updated COVID-19 pandemic data R package, data2019nCoV, collects the COVID-19 pandemic data by transcribing or compiling the available data from official sources, including the WHO Situation Reports,4 on its website (https://github.com/eebrown/data2019nCoV). Thus, we downloaded the publicly available COVID-19 pandemic data (“WHO-COVID-19-global-data.csv”) from the WHO COVID-19 Dashboard website (https://covid19.who.int/),3 for the duration of January 11, 2020 and May 1, 2020 (Last Updated: 2020/5/1, 3:00 PM CEST) in this study.
Epidemic analysis
Statistical analysis was performed using the R 3.6.3 software (R Foundation for Statistical Computing, Vienna, Austria). The distributional properties of daily counts of new confirmed cases were presented by the number of days (n), total number, mean, standard deviation (SD), minimum, the first quartile (Q1), median, the third quartile (Q3), and maximum for each of the 12 selected countries outside China. Instead of developing any advanced methods specific for this pandemic, we tried to find an available easy-to-use tool to monitor the progress of the ongoing COVID-19 pandemic as soon as possible. As listed on the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/), several R packages might be used to compute basic reproduction numbers of an epidemic in R, including argo, epibasix, EpiCurve, EpiEstim, EpiILM, EpiILMCT, epimdr,6 epinet, epiR, EpiReport, epitools, epitrix, incidence, mem, memapp, R0, and surveillance. We chose the incidence (version 1.7.0) and EpiEstim (version 2.2-1) packages to estimate R0(t) in R during the ongoing COVID-19 pandemic for the 12 selected countries outside China due to their methodological soundness and computational simplicity for rapid analysis.7,8 The R code was listed in the Supplementary Appendix of our previous study of China2 for check and re-uses.
First, we laid out the conceptual framework below. In an epidemic of infectious disease, any susceptible subject who becomes a patient usually goes through the following three dynamic stages: infection, development of symptoms, and diagnosis of the disease. Theoretically, to estimate R0 or R0(t), we need the information about the distribution of generation time (GT), which is the time interval between the infection of the index case and the infection of the next case infected directly from the index case.8 Yet, the time of infection is most likely unavailable or inaccurate, and thus investigators collect the data about the distribution of serial interval (SI) instead, which is the time interval between the symptom onset of the index case and the symptom onset of the next case infected directly from the index case.8 Nevertheless, the data of symptom onset are not publically available and almost always have the problem of delayed reporting in any ongoing epidemic of infectious disease because they are usually recorded at diagnosis.5 Hence, we took a common approach in statistics to tackle this problem by specifying the best plausible distributions of SI according to the results obtained from previous studies of similar epidemics, and then applied the novel estimation method implemented in the EpiEstim package to the data of daily new confirmed cases in practice.7,8
Next, we considered two plausible scenarios for studying the ongoing COVID-19 pandemic in the 12 selected countries outside China. The estimate_R function of the EpiEstim package assumes a Gamma distribution for SI by default to approximate the infectivity profile.7 Technically, the transmission of an infectious disease is modeled with a Poisson process in the EpiEstim package.7,8 When we choose a Gamma prior distribution for SI, the Bayesian statistical inference leads to a simple analytical expression for the Gamma posterior distribution of R0(t).8 In the first scenario, we specified the mean (SD) of the Gamma distribution for SI to be 8.4 (3.8) days to mimic the 2003 epidemic of the severe acute respiratory syndrome (SARS) in Hong Kong.8 Then, in the second scenario, we specified the mean (SD) of the Gamma distribution for SI to be 2.6 (1.5) days to mimic the 1918 pandemic of influenza in Baltimore, Maryland.8 According to the current understanding, the transmissibility of COVID-19 was higher than SARS but lower than influenza.1 Hence, even though we did not know the true distribution(s) of SI for the ongoing pandemic of COVID-19 in the 12 selected countries outside China, these two plausible scenarios helped us catch the behavior pattern of this pandemic along with the time evolution.
Results
Epidemic data
We subtracted one day from the reporting dates in the COVID-19 pandemic data of WHO to obtain the “dates” on which the daily new confirmed cases actually occurred. Then, we computed the sample statistics of the daily new confirmed cases for the 12 selected countries over the time period from January 10, 2020 to April 30, 2020 in Table 1. The starting date of the COVID-19 epidemic differed among the 12 countries so that the sample size (i.e., number of days) varied from 58 (South Africa) to 109 (Japan). All the daily new confirmed cases in the 12 selected countries were laboratory confirmed, but the number of daily new confirmed cases included both domestic and repatriated cases according to the description of WHO.4
As expected, the larger the mean value of daily new confirmed cases, the bigger the size of the corresponding SD. Spain, France, Belgium, and the United States of America had zero-valued Q1’s, indicating that they had zero daily new confirmed case in the early phase of the pandemic for more than 25 days but the number of daily new confirmed cases were soaring quickly after then. These sample statistics revealed the magnitudes of the ongoing COVID-19 pandemic in each of the 12 selected countries over the study period. Surprisingly, the United States of America had a total of 1,035,353 new confirmed cases accumulated over 103 days up to April 30, 2020 and it became the severest epidemic area in the world.
Epidemic analysis
First, we plotted the epidemic curves of daily new confirmed cases for each of the 12 selected countries in Figures 1-1, 2-1, …, 12-1 respectively. A clear peak of the epidemic curve appeared in 10 of the 12 selected countries at various time points, and then the epidemic curve declined gradually. However, the United States of America and South Africa happened to have two or more peaks and their epidemic curves either reached a plateau or still climbed up.
Next, under the specified two plausible scenarios (A and B), we plotted the daily estimates of the time-varying reproduction numbers, R0(t), over sliding weekly windows,7,8 for each of the 12 selected countries in Figures 1-2A and 1-2B, 2-2A and 2-2B,…, and 12-2A and 12-2B respectively. On any given day of the epidemic curve, R0(t) was estimated for the weekly window ending on that day.7,8 The estimated R0(t) was not shown from the very beginning of the epidemic curve because precise estimation was not possible in that period.8 Specifically, the blue lines showed the posterior means of R0(t), the grey zones represented the 95% credible intervals (CrI), and the horizontal black dashed lines indicated the threshold value of R0 = 1.0.7,8 Since the repatriated cases were not separated from the domestic cases in the COVID pandemic data of WHO,3,4 they caused the problem of over-estimating R0(t) especially in the early phase of the COVID-19 pandemic, but they would have less and less effect on the estimation of R0(t) as time elapsed with traveling restrictions and bans worldwide. Almost all curves of the estimated R0(t) monotonically went down to be less than or close to 1.0 up to April 30, 2020, except Singapore, South Korea, Japan, Iran, and South Africa, of which the curves surprisingly went up and down at various time periods during the COVID-19 pandemic. Nevertheless, Singapore took stringent control measures lately, including a strengthened restriction of the population mobility, to contain the second and third waves of the COVID-19 epidemic. In particular, the effectiveness of these stringent control measures had led to a sharp drop in the estimated R0(t) to bring it down to below 1.0 in the last two weeks of April. This was a rarely seen victory in the places outside China, and thus it would be very encouraging to the countries with gently sloped curves for the last several weeks before April 30, 2020 such as South Korea, Japan, Iran, Italy, Spain, Germany, France, Belgium, United Kingdom, and the United States of America.
Finally, we listed each country’s estimated distributional parameters of R0(t) for the weekly window ending on April 30, 2020 under the two plausible scenarios (A and B) in Table 2 as a numerical complement to Figures 1-2A and 1-2B, 2-2A and 2-2B, …, and 12-2A and 12-2B. The values of the posterior mean and median were close to each other during the study period. The bigger the size of the SD, the more the number of daily new confirmed cases in the last week of April. Intriguingly, after comparing the posterior mean values between the two plausible scenarios (A and B), we found that (1) A > B: Singapore, the United States of America, and South Africa, and (2) B > A: all the other 9 countries. It seemed reasonable to take an average of these two as an approximate R0(t) for comparison. By doing so, the United States of America (A: 1.0197, B: 0.9974) and South Africa (A: 1.4655, B: 1.1080) were the two countries with the approximate R0(t) ≥ 1.0 at the end of April, and thus they were now facing the harshest battles against the coronavirus among the 12 selected countries. By contrast, Spain (A: 0.4574, B: 0.6892), Germany (A: 0.5055, B: 0.7023), and France (A: 0.5092, B: 0.7340) with smaller values of the estimated R0(t) were relatively better than the other 9 countries.
Discussion
Almost everyone is susceptible to the novel COVID-19 and this is one of the reasons why the COVID-19 epidemic occurs in many places and has caused public panics worldwide. In terms of population size, the depletion due to death or recovery might be negligible in big countries such as China, but not in small ones such as Singapore. Many recoveries are probably not susceptible any more in the late phase of the epidemic. Although there was officially no imported case of COVID-19 in China at the beginning of the COVID-19 epidemic, the imported cases of COVID-19 are the major sources of initial infection in many other countries starting from February of 2020. These features made the task of modeling this epidemic in any country outside China more difficult. Moreover, since the COVID-19 is new to human society, its diagnostic criteria, control measures, and medical care are inevitably changing during the pandemic as the knowledge and experience about it are accumulated continuously.4,5 Thus, the result obtained in this study is merely a rough estimate of R0(t) for each of the 12 selected countries. Nevertheless, our findings for the 12 selected countries passing the early phase of the pandemic are consistent with the early estimates of R0 for the COVID-19 epidemic such as 2.24 (95% confidence interval [CI]: 1.96–2.55)–3.58 (95% CI: 2.89–4.39),9 2.2 (95% CI: 1.4–3.9),10 2.6 (uncertainty range: 1.5–3.5),11 2.68 (95% CrI: 2.47–2.86),12 1.4–2.5 (WHO),13 2.0–3.3,13 2.2 (90% CI: 1.4–3.8),13 6.47 (95% CI: 5.71–7.23),13 3.11 (90% CI: 2.39–4.13),13 and 2.0.13
This study had several limitations because we relied on some assumptions to make a rapid analysis of this ongoing epidemic feasible. First, we assumed that all new cases of COVID-19 in each of the 12 selected countries are detected and reported to the WHO correctly. However, asymptomatic or mild cases of COVID-19 are likely undetected, and thus under-reported, especially in the early phase of the epidemic.5,14 In some countries, a lack of diagnostic test kits for the SARS-CoV-2 and a shortage of qualified manpower for fast testing can also cause under-reporting or delay in reporting. Nevertheless, we only analyze the official source of epidemic data for examining the trend of R0(t) during an epidemic because it is feasible, transparent, and reproducible. Unintentional under-reporting is inevitable in any country during an ongoing epidemic due to various reasons. Second, since the reported number of daily new confirmed cases includes both domestic and repatriated cases according to the description of WHO,4 the imported cases lead to an over-estimation of R0(t) in our epidemic analysis. Yet, they most likely appear in the early phase of the COVID-19 pandemic, and then have less effect on the estimated R0(t) in the later phase of the pandemic. The estimate_R function of the EpiEstim package provides the option for including the separated data of imported cases in the estimation of R0(t).7 Thus, our epidemic analysis can be easily refined once the domestic and repatriated cases are reported separately in the future. Third, we admitted the time delay in our estimates of R0(t) for the COVID-19 pandemic in the 12 selected countries due to the following two time lags: (1) the duration between the time of infection and the time of symptom onset (i.e., the incubation period of infection) if infectiousness began around the time of symptom onset8 and (2) the duration between the time of symptom onset and the time of diagnosis.5 Nevertheless, since asymptomatic carriers can transmit the COVID-19,15-17 the first time lag is actually shorter than expected and it becomes the duration between the time of infection and the time of becoming infectious (i.e., the latent period of infection). Moreover, the time interval from symptom onset to diagnosis is shorter and shorter due to the full alert of society and the faster diagnostic tests.5 Fourth, we assumed that the distribution of SI does not change considerably over time as the pandemic progresses. However, as for the SARS epidemic in Singapore, the SI tended to be shorter after control measures were implemented.18 The SI also depended on the amount of infecting dose, the level of host immunity, and the intensity of person-to-person contacts.16 To summarize, most of these limitations led our estimation of R0(t) into a more conservative context.
Looking at the epidemic curves of the 12 selected countries in Figures 1-1, 2-1,…, 12-1 respectively, we could see that the size of the epidemic varies across these 12 countries. It primarily depends on how early and how fast the effective control measures are taken to stop the spread of the coronavirus from the very beginning of the COVID-19 pandemic. Any mistakes in this matter would have a dramatic impact on the sequel. The commonly used lag-1 relative change (%) in the number of daily new confirmed cases is not useful because its denominator varies and the time lag is not long enough to reflect what is really going on behind those numbers.
Next, as shown in Figures 1-2A and 1-2B, 2-2A and 2-2B,…, and 12-2A and 12-2B respectively, the estimated R0(t) over sliding weekly windows for each of the 12 selected countries begins to drop gradually after a certain time period. Yet, when the number of new confirmed cases is climbing up day by day before it reaches the peak(s), everyone is keen to know: When will the epidemic curve reach the hilltop?19 Although it is difficult to estimate the exact date when it will happen, the peak can only appear after the trend of the computed R0(t) is declining sharply and monotonically for some days (e.g., about 10 days in China), and then the daily number of new confirmed cases will begin dropping, indicating that the COVID-19 epidemic has abated. However, it is very frustrating to see the curve of the estimated R0(t) staggering slowly down to the threshold line of R0(t) = 1.0 in most of the 12 selected countries. Notice that the same value of R0(t) may have quite different magnitudes of impacts on the number of daily new confirmed cases, depending on how many infectious cases in the community at the time. Even though R0(t) = 1.0, the number of daily new confirmed cases can still go up soon if more and more infectious subjects are accumulated in society. Thus, a steeply sloped line of the estimated R0(t) is crucial — that is, how early and how fast does it go downhill? Finally, the COVID-19 epidemic in China ended around March 7-8, 2020, indicating that the prompt and aggressive control measures of China were very effective,20-23 but when can it happen in the 12 selected countries? In particular, the resurgence has occurred in Singapore, South Korea, Japan, Iran, and South Africa respectively. Thus, when the COVID-19 pandemic will end in any of the 12 selected countries and the whole world worries us. Given a symmetric one-peak epidemic curve, the total time and size of the epidemic is just the double of the time and size from the beginning of the epidemic to its peak. As compared to China’s experience of success, each of the 12 selected countries should quickly learn from the fresh lessons of the past two to three months, and then find much more effective ways to fighting the COVID-19 pandemic for winning the battle against the coronavirus (see Appendix). In our opinion, seeing the estimated R0(t) going downhill speedily is more informative than looking for the drops in the daily number of new confirmed cases during an ongoing epidemic of infectious disease. And, the consistent decline of the estimated R0(t) in trend over time is more important than the actual values of the estimated R0(t) themselves. The steeper the slope, the sooner the epidemic ends.
Finally, we urged public health authorities and scientists worldwide to estimate time-varying reproduction numbers routinely during epidemics of infectious diseases and to report them daily on their websites such as the “Tracking the Epidemic” of China CDC for guiding the control strategies and reducing the unnecessary panic of the public until the end of the epidemic.23 Fitting complex transmission models of infectious disease dynamics to epidemic data with limited information about required parameters is a challenge.6,8,24-26 The results of such analyses may be difficult to generalize due to the context-specific assumptions made and it can be too slow to meet a pressing need during an epidemic.8,18,27-34 Thus, an easy-to-use pragmatic tool such as the estimated R0(t) for monitoring the COVID-19 pandemic is so important in practice. Although this study could not provide the most accurate results rigorously, it sufficed for the pragmatic purpose from the public health viewpoint. We believed that it was an approximate answer to the right question timely. Refinements in the estimation of R0(t) can be made with the individual patient data, including personal contact history, whenever they are available for analysis.
Since the coronavirus has spread out globally, we should take the important lessons from China,2,20-23,31-34 South Korea,35 Italy,36 the United States of America,37,38 and Singapore39 and learn the experiences from the previous epidemics1,25 to mitigate its harm as much as possible. We may also use China as an example to anticipate the potential progression of the COVID-19 pandemic in a particular country.2 As listed in Appendix, control tactics and measures should be applied in line with local circumstances,25 but the same easy-to-use monitoring tool, R0(t), can be applied to many places. Such timely available information for monitoring the ongoing VOVID-19 pandemic deserves societal attention, especially when the government is going to make a move in the harsh battle against the coronavirus. As the coronavirus outbreak continues to spread, let’s help each other to combat the COVID-19 pandemic together. After all, we are all in the same shaking boat now.
Data Availability
The publicly available COVID-19 pandemic data can be downloaded from the "WHO COVID-19 Dashboard" website (https://covid19.who.int/).
Funding Source
The authors did not receive any funding for this study. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Declaration of Interests
We declared no conflicts of interest in this study.
Appendix: The source-transmission-host approach to controlling the COVID-19 pandemic
1. Goal: The “Separation” Process
As depicted in the following diagram, what we should do to control the COVID-19 epidemic within a black rectangle (e.g., a city, a state, or a country) is to separate the red balls (i.e., patients) from the grey circles (i.e., medical personnel) and the green balls (healthy subjects). When the number of new cases is soaring in Stage 2, the more aggressive separation process is needed by reducing population mobility, implementing mass testings, and conducting community surveillance. We ask or force the red balls to enter the big grey circle(s) as in Stage 3.