Abstract
Transmission of infectious diseases is characterized by the basic reproduction number R0, a metric used to assess the threat posed by an outbreak and inform proportionate preventive decision-making. Based on individual case reports from the initial stage of the coronavirus disease 2019 epidemic, R0 is often estimated to range between 2 and 4. In this report, we show that a SEIR model that properly accounts for the distribution of the incubation period suggests that R0 lie in the range 4.4–11.7. This estimate is based on the doubling time observed in the near-exponential phases of the epidemic spread in China, United States, and six European countries. To support our empirical estimation, we analyze stochastic trajectories of the SEIR model showing that in the presence of super-spreaders the calculations based on individual cases reported during the initial phase of the outbreak systematically overestimate the doubling time and thus underestimate the actual value of R0.
Introduction
The basic reproduction number R0 is a critical parameter characterizing the dynamics of the outbreak of an infectious disease. R0 quantifies the expected number of secondary cases generated by an infectious individual in an entirely susceptible population. Although it is pertinent to the situation before imposition of protective measures, R0 is influenced by natural conditions (such as seasonality) as well as socioeconomic factors (such as population density or ingrained societal norms and practices) (Delamater et al., 2019). Importantly, R0 suggests the extent of control measures that have to be implemented to stop the spread of the epidemic.
A preliminary estimate published by the World Health Organization (WHO) suggested that R0 of coronavirus disease 2019 (COVID-19) lies in between 1.4 and 2.5 (WHO, 2020a). Later this estimate has been revised to 2–2.5 (WHO, 2020b), which is in agreement with numerous other studies that, based on official data from China, implied the range of 2–3 [see, e.g., Liu et al. (2020) or Boldog et al. (2020) for a summary]. This range suggests an outbreak of a contagious disease that should be containable by imposition of relatively mild restrictions on social interactions, as initially implemented in European countries such as Italy or Spain. Of note, there were early reports of R0 higher than 4 by Cao et al. (2020) and Shen et al. (2020), in which, however, relatively long infectious periods were assumed.
Here we estimated R0 of COVID-19 based on the doubling time observed in the (nearly) exponential phase of the increase of confirmed cases in China, United States, and six European countries. By considering the early phase in which the number of registered cases begins to exceed 100, we aimed at accounting for super-spreading events that are implausible when the number of infected individuals is very low. To provide additional motivation for this approach, we show that in the presence of super-spreaders the median trajectory starting from a single infection grows much slower than the average trajectory, which very likely leads to overestimation of the doubling time. We used a susceptible–exposed–infected–removed (SEIR) model with six subsequent exposed states to reproduce the shape of the reported distribution of the incubation period. This allowed us to determine a plausible range of R0 at 4.4–11.7, considerably higher than most initial estimates.
Results
The SEIR model
We used a SEIR model with six exposed states, which is equivalent to the assumption that the incubation period is Erlang-distributed with the shape parameter 6. We assumed that the average incubation period 1/σ ranges between 5.28 days [mean value determined by Lauer et al. (2020), used by us as a lower bound] and 6.47 days [mean value determined by Backer et al. (2020), used by us as an upper bound]. The infectious period is either exponentially or Erlang-distributed with the shape parameter 2 [as in (Kucharski et al., 2020)], resulting in model variants with, respectively, a single or two subsequent infectious states. The average infectious period 1/γ is assumed to be 3 days [as in (Liu et al., 2020) and (Kucharski et al., 2020)]. See Methods for more details on the model.
Estimation of the basic reproduction number in the exponential growth phase
We estimated Td within two- and three-week periods starting on the day in which the number of confirmed (in the SEIR model naming convention, removed) cases exceeded 100 in populations of China, United States, and six European countries (Fig. 1A). In nearly all countries (except the United States), values of Td estimated based on three-week periods are higher than based on the two-week periods, which is likely a consequence of protective measures, such as social distancing, and isolation of positively tested individuals. Three-week Td values range from 2.34 (United States) to 3.25 (China), while two-week Td values lie in between 2.16 (Spain) and 2.88 (United Kingdom).
We estimated the range of R0 as a function of the doubling time Td using a formula provided in Methods. The lower bound has been obtained using the model variant with 1/σ = 5.28 days and two infectious states, whereas the upper bound results from the model with 1/σ = 6.47 days and one infectious state, Fig. 1B. After plugging in the overall longest Td = 3.25 and the shortest Td = 2.16 into, respectively, a variant of our model with the lowest R0(Td) curve (parameters: m = 6, n = 2, 1/σ = 5.28) and a variant with the highest R0(Td) curve (parameters: m = 6, n = 1, 1/σ = 6.47), we arrived at the estimated R0 range of 4.4–11.7. The two-week doubling time for China, 2.39, is remarkably consistent with the value of 2.4 reported by Sanche et al. (2020), who estimated that R0 for China lies in the range 4.7 to 6.6, which is contained in our estimated range for China: 4.4–9.6. The models having one or two exposed states, often used to estimate the value of R0, substantially underestimated R0, whereas a SIR model with a fixed time delay equal to 1/σ overestimated R0, cf. Fig. 1B and the articles by Wearing et al. (2005), Wallinga & Lipsitch (2007), and Kochańczyk et al. (2020, Fig. 4).
Our calculations suggest the range of R0 that is higher than initially estimated (Boldog et al., 2020; Liu et al., 2020). This is because (1) in contrast to most model-based calculations, we appropriately accounted for the distribution of the incubation period and (2) we estimated Td based on the (nearly) exponential growth phases of the outbreak, which gives Td shorter than estimates based on initial individual propagation events. The discrepancy in Td estimation may be potentially attributed to the fact that not all removed individuals are registered. In the case when the ratio of registered to removed individuals is increasing over time, the true increase of the removed cases may be overestimated. We do not rule out this possibility, although we consider it implausible as the expansion of testing capacity in considered countries has been slower than the progression of the outbreak. We rather attribute the discrepancy to the fact that in the very early phase, in which the doubling time (growth rate) is estimated based on individual case reports, the effect of super-spreading events (such as football matches, carnival fests, demonstrations, masses, or hospital-acquired infections) is negligible due to a low probability of such events when the number of infected individuals is low. Sanche et al. (2020) inferred that the pre-exponential period in Wuhan has been dominated by simple transmission chains. In turn, super-spreading events were very likely the main drivers of the epidemic spread in, e.g., Italy and Germany, where, in the early exponential phase, spatial heterogeneity of registered cases has been evident (Cereda et al., 2020; Mercker et al., 2020).
Stochastic simulations of the SEIR model dynamics with super-spreading
We analyzed the impact of super-spreading on estimation of Td based on stochastic simulations of SEIR model dynamics. Simulations were performed in the perfectly mixed regime according to the Gillespie algorithm. We assumed that a predefined fixed proportion of individuals has higher infectiousness and as such is responsible for on average either half of infections (‘super-spreaders’) or two-third of infections (‘hyper-spreaders’). To reproduce these fractions in systems with different assigned proportions of super- or hyper-spreaders, their infectiousness is assumed to be inversely proportional to their ratio in the simulated population. We estimated Td in two ways: based on one month of growth of the number of new cases since the first registered case (‘30 days since the 1st case’) and based on growth of new cases in the two-week period after the number of registered cases exceeds 100 (‘14 days since 100 cases’). As we are interested in the initial phase characterized by exponential growth, we assumed that the susceptible population remains constant.
In Fig. 2 we show histograms of Td calculated using either the ‘14 days since 100 cases’ method or the ‘30 days since the 1st case’ method. One may observe that the histograms calculated using the ‘30 days since the 1st case’ method are broader than those calculated using the ‘14 days since 100 cases’ method, and the width of all histograms increases with increasing infectiousness (which is set inversely proportional to ρ). When Td is calculated using the ‘14 days since 100 cases’ method, its median value is slightly larger than Td in the deterministic model (equal 2 days); however, when Td is calculated using the ‘30 days since the 1st case’ method, then for high infectiousness of super- and hyper-spreaders (correspondingly, for low ρ) its median value becomes much larger than the deterministic Td. In the case of the lowest considered ρ = 1%, when super-spreaders (hyper-spreaders) have their infectiousness about 100 times (200 times) higher than the infectiousness of normal individuals, one obtains median Td that is overestimated by 28% (67%). Both methods statistically overestimate Td but when using the ‘30 days since the 1st case’ method, overestimation is very significant. The effect is caused by low probability of appearance of super- or ultra-spreaders in the first weeks of the outbreak. We notice that Td estimation for a given country based on available data is equivalent to the analysis of a single stochastic trajectory.
Conclusions
Based on epidemic data from China, United States, and six European countries, we have estimated that the basic reproduction number R0 lies in the range 4.6–11.6 (4.4–9.6 for China), which is higher than most previous estimates (Boldog et al., 2020; Liu et al., 2020). There are two sources of the discrepancy in R0 estimation. First, in agreement with data on the incubation period distribution, we used a model with six exposed states, which substantially increases R0(Td) with respect to the models with one or two exposed states. Second, we estimated Td based on the two- and three-week period exponential growth phase beginning on the day in which the number of registered cases exceeds 100. This approach, in contrast to estimation of R0 based on individual case reports, allows to implicitly take into account super-spreading events that substantially shorten Td. Spatial heterogeneity of the epidemic spread observed in many European countries, including Italy, Spain, and Germany, can be associated with larger or smaller super-spreading events that initiated outbreaks is particular regions of these countries.
Our estimates are consistent with current epidemic data in Italy, Spain, and France. As of April 25, 2020, these countries managed to terminate the exponential growth phase by means of country-wide quarantine. Current COVID-19 Community Mobility Reports (Google, 2020) show about 80% reduction of mobility in retail and recreation, transit stations, and workplaces in these countries. Together with increased social distancing, this reduction possibly lowered the infection rate β at least fivefold; additionally, massive testing reduced the infectious period, 1/γ. Consequently, we expect that the reproduction number R = β/γ was reduced more than fivefold, which brought it to the values somewhat smaller than 1. This suggests that R0 in these countries could have been larger than 5.
Methods
The SEIR model and its parametrization
The dynamics of our SEIR model is governed by the following system of ordinary differential equations: where N = S(t) + E1(t) + … + Em(t) + I1(t) + … + In(t) + R(t) is the constant population size, and I(t) = I1(t) + … + In(t) is the size of infectious subpopulation. As m is the number of exposed states and n is the number of infectious states, there are m + n + 2 equations in the system.
To obtain R0(Td) for Dirac δ-distributed latency (incubation) period, we use a SIR model described by the following system of delay-differential equations corresponding to the case of m ➔ ∞ and n = 1: with M = S(t) + I(t) + R(t) and 1/σ being the delay equal to the mean latency period.
We assume that the latency period is the same as the incubation period, estimates of which are available in the literature. In model variants with shorter incubation period, we assume that it is Erlang-distributed with the shape parameter 6 and the scale parameter equal 0.88, giving 1/σ = 5.28, following exactly the estimates made by Lauer et al. (2020). In model variants with longer incubation period, we replaced the Γ distribution of Backer et al. (2020), having the shape parameter 6.1 and the scale parameter 1.06, resulting in 1/σ = 6.47, with the Erlang distribution with the shape parameter 6 and the scale parameter 1.078, having nearly identical shape and the same 1/σ of 6.47.
Dependence of the basic reproduction number on the doubling time, R0(Td)
We determined R0(Td) as: in accordance with Wallinga & Lipsitch (2007) and Wearing et al. (2005).
Data Availability
All data are provided in manuscript and references therein