Abstract
We present evidence for existence of a universal lower bound for the initial growth rate of the epidemic curve of the SARS-CoV-2 coronavirus. This can be used to infer that, on average, an asymptomatic infected individual is infectious during 5.6 ± 0.3 days. We further present evidence of an average time scale of 12 days for halving the number of new cases, or new deaths, during the extinction period of the first phase of the epidemic.
1 Introduction
The development of strategies to fight future phases of the Covid-19 pandemic, or similar future epidemics, requires a good understanding of the global features of the evolution of the epidemic. In particular it is important to obtain quantitative information about the epidemic which is independent of the local conditions such as social habits of the local population or the details of the confinement measures. In this work we show existence of a universal lower bound on the rate of the initial growth of the epidemic, with a characteristic time scale of about 6 days. We argue that this minimal rate of growth can be explained by an infectiousness time of infected individuals of 6 days. Furthermore, we show that the epidemiological data exhibit an average extinction rate with a characteristic time scale of 12 days. The information should be useful for monitoring the development of the current epidemic for countries which are still in the first phase thereof, as well as that of an upcoming second wave, or future coronavirus epidemics.
Our results are based on fitting to the epidemiological time-series the curves introduced in [1] (see also Appendix C), which describe surprisingly well the global features of the confirmed-cases time-series and death time-series for the first phase of the Covid-19 epidemic, whenever a fit to the data is available. Here one should keep in mind that in many countries the epidemic stabilised to a new rampant phase, or entered a new growing phase, in which case the fits of the function I to the data are becoming worse with time. This allows us to determine, for each time-series, the end of the first phase of the epidemic defined as the day for which the fit of the curve is optimal.
A shortcoming of the curve (1.1) is that I can only provide a good fit for data sets for which the extinction rate is smaller than half of the initial growing rate; this is explained in Appendix A. While this is a fundamental bias of our analysis, it turns out that many Covid-19 time-series have this property.1
A key feature of the function I is that if both e(α−β)τ and eβσ are sufficiently large,2then near t =0 the function I is well approximated by where c = ke−(α−β)τ−βσ. This shows that the parameter α carries information about the initial rate of growth of the epidemic.
Next, for large positive t the function I behaves as when ever C = eβσ or t (or both) are sufficiently large. This shows that β describes the extinction rate of the epidemic.
Fitting the curve to the time series of the epidemic for a given city, county, province, state, or country, provides quantitative information about the dynamics of the epidemic.
We have attempted to determine the parameters (α, β, k, τ, σ) for all time-series of the epidemic available at the John Hopkins University (JHU) server on July 1, 2020. As already pointed-out in [1], optically satisfactory fits to the time series of the curve (1.1) can be found for essentially all time-series that we have examined; several examples can be found below, in[1], and in the supplementary materials to this paper. However, it appears that satisfactory fits, in the sense of small residuals and decent standard deviations for the fit parameters, can typically only be found after the (first) peak of the number of new cases of epidemic. Moreover, wobbly time-series, or time series with jumps, often do not lead to fits which are satisfactory in this sense either. While our observations below are based on the time series with satisfactory fits, we checked that they are consistent with all fits when the large uncertainties of the parameters of the remaining fits are taken into account.
The values of all fitting parameters depend upon the country considered for many obvious reasons, such as awareness of the epidemic, quarantine measures taken, social habits of the local population, etc. The main observation of this work is the existence of absolute lower bounds for the constants α and β, independently of local factors. As reported in more detail in Section 2, we find the following lower bounds for all time-series examined: with the minimum for α attained at the time series of the county Cumberland, New Jersey in the US, and the minimum for β attained for the State of Victoria in Australia. This is illustrated in Figure 1_1, with the fits for Cumberland and Victoria shown in Figures 1.2 and 1.3. (Unless explicitly indicated otherwise, error indicators are standard deviations.)
For the death time-series the lower bounds on the corresponding parameters are rather similar: attained on the time series of DuPage, Illinois for α and Snohomish, Washington for β. This is illustrated in Figure 1_4, with the fits for DuPage and Snohomish in Figure 1_5.
The parameter α is directly related to the initial doubling time of the epidemic, which will be denoted by T2. For data sets for which the approximation (1.2) holds we obtain a common bound on the doubling time:
Now, for some of our fits the approximation (1.2) breaks down, but the value of the doubling time T2 can be found numerically using the explicit form of I when the fitting parameters are used. This shifts the threshold (1.6) to the bound attained for the time series of Cumberland, New Jersey, already seen in Figure 1_2.
Next, for all fits (to be found in supplementary materials) we find with the lower bound attained on Mason, Washington, and the upper bound on Ireland (both seen in Figure 1_6). Note that the upper bound stays well clear of the 0.5 limit imposed by our fitting function.
The average value of β for all satisfactory fits is
The corresponding average value for the death time-series is the same within standard deviation:
This translates to an average time scale of twelve days for the halving of the number of new cases, or deaths, during the extinction period of the first phase of the epidemic.
The question arises about the origin of the lower bound on α. Assuming that the number of confirmed cases during the first week or so of the epidemic is proportional to the number of infectious individuals and that the approximation (1.2) holds, we show in Appendix A that a precise lower bound α ≥ αmin determines a lower bound on the number of days, say k0, during which an infected individual is infectious.
Supposing we had access to all possible curves for the epidemic, the threshold number αmin would then be determined exactly from these curves. Now, we only have a set of 469 curves. Given that this is quite a large number, it appears reasonable to assume that the minimal value of α which we see in our fits is approximately equal to the threshold value αmin. This, together with (1.4), leads to consistently with data obtained by completely different methods in [3, 4].
It should be pointed out that (1.11) applies to asymptomatic infected individuals, since the number of these is much larger than the number of symptomatic ones.
As further explained in the Appendix, for time series which are well approximated at the beginning by an exponential, the initial propagation number is determined by the equation
(see Equation (A.3)). This, together with (1.11), allows us to determine R0 for each time-series with a satisfactory fit.
In what follows we only consider the 145 fits for which the values of R0 have a standard deviation less than 10%. The histogram of the resulting distribution of R0 can be found in Figure 1_7. We find with an average
The weighted average is calculated by weighing with the standard deviation of R0 of the individual time-series, while the direct average is the obvious one.
The minimum value R0 =1 is attained for the county Cumberland in New Jersey, and the maximum value R0 = 13.5 ± 1.2 in (1.13) is attained for the county of Stearns in Minnesota, with fits shown in Figure 1_8 The values of R0 so determined are provided in Table 1.1 for a sample of countries. A table presenting the values of R0 for all satisfactory fits with relative error of R0 less than 10% can be found in Appendix B.
2 The analysis
We analysed the time series for confirmed cases, confirmed US cases, deaths and US deaths as available on the John Hopkins University server on July 1, 2020. After ignoring all time-series which have less than two cases on June 30, as well as the US time-series which appear as “Unassigned” and “Out of …”, we were left with 3149 confirmed-cases time-series and 1647 deaths time-series.
These time series include whole countries, but also time series for e.g. French Domaines d’Outre Mer-Territoires d’Outre Mer, or for each Australian State, or for each Chinese Province, or for US counties.
For each such time-series we attempted to determine the day, say Tf, and the parameters (α, β, k, τ, σ) so that the curve (1.1) fits optimally the data up to day Tf, using the NonlinearModelFit procedure in MATHEMATICA. Finding a fit sometimes requires indicating parameters which are used as starting values for the fitting procedure, with fits depending upon the starting parameters. To take this into account we generated, for each time-series, a hundred random values of starting parameters, and from the collection of resulting fits we chose the “best fit”. This “best fit” was determined by adding the values of the largest relative residual (defined as the ratio of the largest residual to the number of cases at the end of the time series) to the relative standard deviations of all five parameters of the fit, and choosing the fit for which the resulting number was smallest. We carried-out this analysis for time series truncated to April 16, then to April 17, etc., until June 30, and the day of the end of the first phase, say Tf, was determined by choosing the time series with the best fit on day Tf. All the satisfactory fits found had a date Tf later than April 27 (with April 28 attained by the time series for Sichuan, China, seen in Figure 2_1.)
While the above results in optically satisfactory fits in most cases, some of the resulting fits have parameters with large standard deviations. To eliminate those we only kept the fits where the absolute value of the largest relative residual divided by the last value of the time series was smaller than 10%, and where the relative standard deviations of each parameter were smaller than10%. All this led to 469 satisfactory fits for the confirmed-cases time-series, and 140 satisfactory fits for the deaths time-series.
In Figure1.1 we show the resulting values of the parameters α and β together with their standard errors, ordered by increasing length of confidence interval, for the confirmed-cases time-series. The figure shows clearly the lower bound of α of (1.4). Figure 2_2 shows the histograms of the values of α and β.
It follows from (1.2) that the larger the value of α, the faster the initial growth of the epidemic. So the two time-series with smallest value of α, namely Cumberland, New Jersey (see Figure 1_8) and San Juan, New Mexico (Figure 2_3) are the ones with the slowest initial growth of the epidemic. Likewise the two time-series with largest values of α, namely South Korea and Stearns, Minnesota, are the ones with fastest initial growth of the epidemic. The fits for these time series are shown in Figures 1.8 and 2.4.
The last data point on all plots is June 30, 2020.
We note that we only show in this paper the fits for the outliers of our analysis, which are not necessarily very good fits, and we hope that the reader will not draw the conclusion that no good fits can be obtained to the curve (1.1). In the supplementary material the reader will find the complete set of our fits, which hopefully should dispel such an impression. It should also be clear, by inspection of the fits presented here or in the supplementary materials, that the fits can be fine-tuned for individual time-series by removing outliers and other data manipulation, but we have not attempted to do this.
We have repeated the above analysis for the deaths time-series from the JHU server. Figure 1_4 illustrates the lower bound α ≥ 0.125 for these time series.
The relevant histograms can be found in Figure 2_6. The lower bound for α is attained on the time series of DuPage, Illinois (α =0.125 ±0.006) with the next smallest value for Riverside, California (α =0.135 ± 0.006). The two deaths time-series recording the fastest initial growth rate are Westchester, New York (α =0.57±0.03), and Nassau, New York (α =0.72±0.05). The four associated fits can be seen in Figures 1.5, 2.7 and 2.8.
In Figure 2.5 we show the values of α and β for all fits for which the parameter errors are less than 100%; the values of the parameters for such fits, when not in our previous list filtered to 10%, should be interpreted with care. The plot shows that our lower bounds (1.4) are satisfied by an overwhelming majority of fits, including the poor ones.
3 Conclusions
We used the double-logistic curve of [1] to obtain quantitative information on the dynamics of the Covid-19 epidemics up to June 30, 2020. This curve approximates surprisingly well most of the time series until a date which we interpret as the end of the first phase of the epidemic, even when the fits are unsatisfactory from a mathematical point of view. In many cases the fact that the errors of the fitting parameters are large can be explained by a short time-series, or changes in the reporting procedures during the epidemic, or appearances of new clusters related to changes in confinement measures.
We have shown existence of a lower rate of the initial growth of the epidemics. We suggest that this lower bound can be used to determine the time during which an infected asymptomatic individual is infectious. This then allows one to determine the initial propagation number R0 for time series with satisfactory fits to our curve. We have observed that the average extinction rate of the epidemics is about 12 days.
It should be admitted that our analysis gives unambiguous results only for these time series where a mathematically satisfactory fit to the time series has been found, and might be biased by the requirement that the extinction rate is smaller than half of the initial growth rate. It is therefore justified to question the universal validity of our findings, even more so because of the ad-hoc character of the curve (1.1).
A The growth rate
Consider an interval of data describing the number of infectious individuals in k +1 consecutive days. (We expect that the numbers xi are proportional to the number of confirmed cases at the beginning of the epidemic. This is relevant for our interpretation in the main body of this paper, but it is irrelevant for the considerations in this section where xi denotes, by definition, the number of infectious individuals.) Suppose that the data are well approximated by the exponential curve
Let k0 denote the number of days during which an individual is infectious. We suppose that the probability of infecting someone is constant during this period. At day i1 + k with k ≤ k0, thus after k days, each individual which is infections at day i1 will lead to eαk infected individuals, hence will have infected individuals.
After k0 days the individual is removed from the set of infectious individuals, hence the remaining number of infectious individuals equals
It follows that the law (A.1) persists when is much larger than one. On the other hand, (A.1) will break down when is of order one.
Within this model, an epidemic dies out if the number of infectious individuals after k0 days is smaller than 1:
Thus, an epidemic time-series for which never follows the exponential law (A.1) for more than k0 days.
Equivalently, no epidemic with the exponential law (A.1) will be observed with a parameter α such that
We conclude that the time k0 during which an individual is infectious can be determined by measuring the threshold number αmin.
Note that the equality case in (A.4) corresponds to the doubling time of the epidemic. Hence, in this simple model, the time during which an individual is infectious is smaller than or equal the smallest observed doubling time of the epidemic.
B R0 for selected time-series
C Remarks on the “double-logistic” function
Consider the function with
Before trying to fit I to some time series, one needs to resolve the degeneracy related to the symmetry (b1,c1) ↔ (b2,c2), This can be removed by requiring that
The case b1 = b2 would have required special attention, but in all the fits considered this case has never occurred, so we will from now on assume
One can write c1 as and c2 as , which brings (C.1) to the form
The initial growth rate of I is α:= b1 + b2, while the final extinction rate is
Renaming thus b1 to β we can write which leads to the following form of (C.4):
The first inequality in (C.6) shows that α>β: equivalently, a necessary condition for I to provide a good description of the epidemic the initial growth rate must be larger than the final extinction rate. We will see shortly that in fact we must have the inequality (C.8).
We emphasise that the above renaming of the exponents b1 and b2 requires that the exponent which has been renamed to β should be smaller than the exponent that has been renamed to α − β. Hence, if a fit of the data to the function (C.7) has been made, and if the parameters so determined are such that the parameter which was thought to be α − β is smaller than the parameter which was thought to be β, then the parameter which has been renamed to α − β should actually be β, and the parameter that has been renamed to β should be α − β.
This somewhat confusing discussion leads to a more stringent restriction for I to provide a good fit to the data: Since β ≤ α − β we must have
This shows that no good fit to the data can be found if (C.8) fails.
The above is closely related to the question of uniqueness of the parameters in (C.7). Here the question is, for what sets of parameters (α, β, σ, τ) and (α′,β′,σ′,τ′) it holds for all t
Equivalently, which is the same as
By comparing the fastest growing terms on both sides for t → −∞ one finds
From the subleading exponential terms one obtains
The first case leads to the obvious solution (α, β, σ, τ)=(α′,β′,σ′,τ′), while the second leads to
We have thus shown that the sets of parameters give the same function I, and that this is the only nontrivial possibility
In order to remove this degeneracy, within our fitting procedure we enforce the inequality
If a fit is found saturating the equality we repeat the procedure with the reverse inequality in place.
Data Availability
All data available on the John Hopkins University server, https://coronavirus.jhu.edu
Footnotes
↵† Email piotr.chrusciel{at}univie.ac.at, URL homepage.univie.ac.at/piotr.chrusciel
Formatting of the supplementary material corrected
↵1 This should be compared with the logistic function, where both the initial growth rate and the extinction rate are the same; the Gauss-error function, where both the growth and extinction rates vanish; and the Richards curve [2], where the initial and extinction rates are arbitrary but the growth amplitude is correlated with the extinction amplitude.
↵2 For example, if both e(α−β)τ ≥ 9 and eβσ ≥ 9, then the relative error in (1.2) less than 10%: 1/x − 1/(1 + x)=1/(x(1 + x)) ≤ 0.1 × 1/x if x ≥ 9. The accuracy of the approximation grows with growing x.