Abstract
We demonstrate that universal scaling behavior is observed in the current coronavirus (COVID-19) spread in various countries. We analyze the numbers of infected people in selected eleven countries (Japan, USA, Russia, Brazil, China, Italy, Indonesia, Spain,South Korea, UK, and Sweden). By using the double exponential function called the Gompertz function, fG(x) = exp(−e−x), the number of infected people is well described as N(t) = N0fG(γ(t − t0)), where N0, γ and t0 are the final total number of infected people, the damping rate of the infection probability and the peak time of dN(t)/dt, respectively. The scaled data of infected people in most of the analyzed countries are found to collapse onto a common scaling function fG(x) with x = γ(t − t0) in the range of fG(x) ± 0.05. The recently proposed indicator so-called the K value, the increasing rate of infected people in one week, is also found to show universal behavior. The mechanism for the Gompertz function to appear is discussed from the time dependence of the produced pion numbers in nucleus-nucleus collisions, which is also found to be described by the Gompertz function.
1. Introduction
The COVID-19 pandemic is the worst disease spread in this century. As of May 20, 2020, over 4 million people have tested positive in the world, and the number of infected people N(t) at the time t is still increasing rapidly. In order to control the spread of infection, it is desired to understand the diffusion mechanism of COVID-19.
Recently, a double exponential function called the Gompertz function is found to catch the features of N(t) [1–20]. The Gompertz function appears when the infection probability per infected people exponentially decreases as a function of time. With the Gompertz function, the daily increase of infected people (≃ dN(t)/dt) shows asymmetric time-profile rather than the symmetric one found in the prediction of the Susceptible-Infected (SI) model [21], one of the standard models of the spread of infection. The Gompertz function was proposed by B. Gompertz in 1825 to discuss the life contingencies [22]. It is interesting to find that the Gompertz function also appears as the number of tumors [23] and the number of detected bugs in a software [24], as well as particle multiplicities at high energies [25].
The exponential decrease of the infection probability is also important to deduce when the restrictions can be relaxed. For example, Nakano and Ikeda [26] found that the total number of infected people of COVID-19 is well characterized by the newly proposed indicator K, which represents the increasing rate of infected people in one week. The indicator K takes a value between zero and unity, is not affected by the weekly schedule of the test, and is found to decrease almost linearly as a function of time in the region 0.25 < K < 0.9 provided that there is only a single outbreak affecting the infection. In order to understand the linearly decreasing behavior of K, Nakano and Ikeda assumed that the infection probability decreases exponentially as a function of time. Since the indicator K is expected to be useful to predict the date when the restrictions can be relaxed as K(t) ≃ 0.05, it would be valuable to analyze its solution, the Gompertz function, in more detail.
In this article, we analyze the number of infected people by using the Gompertz function. We examine that N(t) in one outbreak is well described by the Gompertz function in several countries. Then with the time shift and the scale transformation of time and N(t), the data are found to show universal behavior; they are on one-curve described by the basic Gompertz function, exp(−e−x). The newly proposed indicator K [26] also shows universality. We further discuss that the number of produced pions in nucleus-nucleus collisions is described by the Gompertz function. This similarity may be helpful to understand the mechanism of COVID-19 spread.
This article is organized as follows. In Sec. 2, we give a brief review of the Gompertz function and its relevance to the disease spread. In Sec. 3, we show the comparison of the number of infected people and the Gompertz function fitting results. We demonstrate that the numbers of infected people in many countries show universal scaling behavior. In Sec. 4, we show that the number of pions in nuclear collisions is well described by the Gompertz function, and we deduce the mechanism to produce the time-dependence described by the Gompertz function. In Sec. 5, we summarize our work.
2. Gompertz function, indicator K and scaling variables
When the infection probability k(t) is given as a function of time, the evolution equation for the number of infected people N(t) and its solution are given as
Adopting an exponentially decreasing function as k(t), the solution is found to be
By choosing the reference time t0 so that k(t0) = γ, the solution is given as where N0 = N(t0) exp(k(t0)/γ) = eN(t0) is the asymptotic value of N(t), N0 = limt→∞ N(t). Here, N0 can be interpreted as “terminal velocity”, since the evolution equation (1) has an analogy to the equation of motion of particle feeling a viscous resistance, where its coefficient is not constant but exponentially decreasing.
The double exponential function appearing in Eq. (3) is called the Gompertz function [22],
By using the Gompertz function, N(t) is given as where x is the scaling variable for N(t).
One of the characteristic features of the Gompertz function is the asymmetry of its derivative in the early (x < 0) and late (x > 0) stages. In Fig. 1, we show the Gompertz function fG(x) and its derivative,
as functions of x. The derivative takes the maximum at x = 0, and the asymmetry in the negative and positive x region is clearly seen.
This asymmetry should be compared with the solution of the SI model [21], in which the susceptible people (S(t)) are infected by the infectious people (I(t)) at the rate proportional to the product S(t)I(t), where k is a constant. The solution is found to be where N0 = S + I is a constant, γ = N0k, and t0 is the time when half of the people are infected, I(t0) = N0/2. The characteristic function fS(x) is called the sigmoid function, which is also referred to as the logistic function,
Then the daily number of infected people is a symmetric function of t − t0,
Where . In Fig. 1, we also show fS(x) and dfS(x)/dx by blue curves. As already noticed [1–20] and is discussed later, the asymmetry is clearly found in dN(t)/dt of COVID-19, so N(t) is better understood by the Gompertz function than by the sigmoid function.
Once the solution is given, one can obtain the K value [26], increasing rate of the infected people in a week, as where xK is the scaling variable for K(t), with w = 7 days.
3. Comparison with data
3.1. Adopted dataset
Let us now examine the universal behavior given by the Gompertz function in real data. We use the data given in Ref. [27] as of May 21, which contain the N(t) data from Dec. 31, 2019 (t = 0) till May 20 (t = 141). Throughout this article, we measure the time t in the unit of day. In order to avoid the discontinuity coming from the definition change, we have removed the spikes in the daily increase (dN(t)/dt) data in Japan (April 12, t = 103) and China (February 13, t = 44), and instead the daily numbers in previous days are increased by multiplying a common factor, which is determined to keep the total number of infected people in the days after the spike. The number of infected people (N(t)) is obtained as the integral of thus-smoothen dN(t)/dt. In addition, seven-day averages (±3 days) are considered in the analysis of dN(t)/dt in order to remove the fluctuations in a week. Thus-smoothen dN(t)/dt data are available in 3 ≤ t ≤ 138.
We have chosen the countries with a large number of infected people (the USA, Russia, Brazil, and the UK as of May 20, 2020), the first three Asian countries where the COVID-19 spread explosively (China, South Korea, and Japan), the first two countries of spread in Europe (Italy and Spain), a country with a unique policy (Sweden), and a country with somewhat different dN(t)/dt profile (Indonesia).
3.2. Number of infected people and its daily increase
In the left panel of Fig. 2, we show the daily increase of the infected people dN(t)/dt given in Ref. [27] with the smoothing mentioned above. The legends stand for the abbreviation of the country name (internet country domain code, see Table 1). The dN(t)/dt data show there is one big peak in each country, and the shape of the peak is asymmetric; fast rise and slow decay. In many of the countries, there are several other peaks, which are smaller than the dominant one but visible at least in the log-scale plot. The fitting results using the Gompertz function are shown by dotted lines, and are found to explain the dominant peak region of data well.
The fitting to dN(t)/dt data is carried out by using the derivative of the Gompertz function,
In order to concentrate on the dominant peak, we first limit the time region of the fit to tmin ≤ t ≤ tmax, which covers it. Next, the fitting time region is extended to the whole range, 0 ≤ t ≤ 140. In the fitting procedure, we have assumed a Poisson distribution for the daily number of newly infected people, then the uncertainty in dN(t)/dt is assumed to be , where ε = 0.1 is introduced to avoid zero uncertainty in the case of zero daily number. We summarize the obtained parameters (N0, γ, t0) in Table 1, and parameters in the line with (*) are adopted to draw the figures. When the χ2 value is smaller in the whole range analysis and the obtained parameters in the two cases are similar, the single outbreak assumption is supported and we show only the results in the whole range analysis. In other cases, we in principle adopt the results giving the smaller reduced χ2. The exception is the USA, where the reduced χ2 is larger but we adopt the whole range analysis results. This is closely related to the multiple outbreaks, and will be discussed in Appendix A.
In the right panel of Fig. 2, we show the normalized daily numbers, (dN(t)/dt)/(N0γ), as functions of the scaling variable, x = γ(t − t0). Most of the data points are around the derivative of the Gompertz function and inside the gray band, which shows the region with 5% uncertainty in γ and 20% uncertainty in N0.
We also show the sigmoid function by the orange curves in the right panel of Fig. 2. It is clear that a single sigmoid function cannot describe the behavior of the dN(t)/dt data. If we try to fit dN(t)/dt data by the sigmoid function, we need to adopt larger γ in the negative x region as shown by the orange dashed curve in Fig. 2, , while approximately agrees with in shape in the positive x region.
The left panel of Fig. 3 shows the number of the infected people N(t), obtained as the integral of dN(t)/dt data after removing the spike. We also show the Gompertz function results with the parameters (N0, γ, t0) determined from the dN(t)/dt data, and an integration constant ΔN,
where ΔN is obtained by fitting to the N(t) data. In most of the countries, the Gompertz function with ΔN explains the data in the large N(t) region and deviations are found only in the region with small N(t). In Japan, the fitted time range is limited to be t ≥ 85 and earlier time data are not fitted to. Thus deviations at t < 80 are visible in the log scale, while the value is less than 10% of the total number of infected people at t = 140.
We show the normalized N(t) as functions of the scaling variables in the right panel of Fig. 3. We subtract ΔN from N(t). It is interesting to find that most of the world data are on the Gompertz function fG(x). In Japan, China and South Korea, the fitted time range is limited and deviation from the Gompertz function results are found in the earlier times (Japan) and in later times (China and South Korea). It should be noted that the agreement at x < 0 owes largely to the large denominator compared with the number of infected people in the early stage. Compared with the exponentially grown number of infected people, the number of infected people in the early stage is much smaller and the ratio is seen to be very small. Nevertheless, it is impressive to find the agreement of the observed number of infected people after scaling and the Gompertz function.
3.3. K value
We now proceed to discuss the K value. We choose the offset day when explosive spread started, as shown by open squares in Fig. 3. The offset day and offset number of infected people, (toffset, Noffset), are summarized in Table 2. With these offset parameters, the K value is obtained as
Since Noffset is generally much smaller than the number of infected people after explosive spread, the K value is not sensitive to the choice of the offset parameters as long as we discuss the long-time behavior.
In the left panel of Fig. 4, we show the K factor as a function of time. Data of K(t) are explained by the prediction from the scaling function, 1 − fG(xK), while the fluctuations around the predictions are large compared with the number of infected people, N(t). In the right panel of Fig. 4, we show K values as functions of the scaling variable xK = γ(t − t0 − Δt). Except for several countries, the scaling behavior in K is observed.
In Ref. [26], K(t) is found to show the linear dependence on time in the large K(t) region, 0.25 < K(t) < 0.9. Actually, the Gompertz function fG(x) shows the linear dependence on time in the small |xK| region. When the scaling variable is small, the first-order Taylor expansion would work,
The precision of this first order approximation is around 1% (10%) for |xK| ≤ 0.5 (|xK| ≤ 1), where 1 − fG(xK) amounts to be 0.45 − 0.81 (0.28 − 0.93). When K is smaller (K < 0.25), should be also small and the Taylor expansion with respect to may work. Then K damps exponentially,
With the sigmoid function, while scaling behavior is observed in N(t) and dN(t)/dt, K does not scale as a function of a single scaling variable. In Fig. 5, we compare the functions in the K value derived from the Gompertz and sigmoid functions for N(t) as functions of x = γ(t − t0). With the Gompertz function, the shift of the scaling variable is enough for the K value to be described by the basic Gompertz function. In contrast, the K value from the sigmoid function reads
In addition to the shift in the scaling variable, the amplitude also depends on γw = 7γ.
3.4. Days of expected relaxing COVID-19 restrictions from the K value
When the K(t) value goes down to be around 0.05 [26], it would be possible to relax COVID-19 restrictions such as lifting the lockdown of the city or relaxing the state of emergency. Let us call those days as trelax. The days of relaxing the restrictions in several countries roughly correspond to the time at K(t) = 0.05. With this condition, the number of infected people in the current outbreak will increase by around 5% and the infection in the current outbreak will converge. In terms of the scaling variable, this corresponds to the solution of 1 − fG(xK) = 0.05 and is found to be xK = xR ≃ 2.97. Thus the expected day of relaxing the restrictions can be evaluated to be around, provided that there will be no further outbreaks. In Table 3, we summarize the expected day of relaxing COVID-19 restrictions trelax from the Gompertz function analyses in comparison with the relaxed day trelaxed in several countries.
In Fig. 6, we show the expected days of relaxing the restrictions in comparison with some of the relaxed days. The lockdown in the Wuhan city was lifted on May 10, 2020 (t = 131) in China, the lockdown was relaxed on May 2, 2020 (t = 123) in Spain, May 4, 2020 (t = 125) in Italy, and May 11, 2020 (t = 132) in the UK. Restrictions were relaxed in part on May 6, 2020 (t = 127) in South Korea, May 12, 2020 (t = 133) in Russia, and May 20, 2020 (t = 141) in the USA. In Japan, the state of emergency was declared on April 7, 2020 (t = 98) and canceled on May 25 (t = 146). The relaxed days in Italy, Spain, Japan, the UK, and the USA are close to those expected from the Nakano-Ikeda model analyses. In China and Korea, the relaxed days were significantly later than the expectations from the model. One can guess that the governments tried to be on the safe side in these first two countries of COVID-19 spread.
The expected day of relaxing COVID-19 restrictions strongly depends on the value of the damping rate of the infection probability, γ, which people and governments should try to enhance. In South Korea and China, the damping rate is larger than 0.1. Then the infection probability decreases by a factor of 1/e within 10 days. In these countries, the test-containment processes have been performed strongly. In many of the countries under consideration (Japan, USA, UK, Italy, and Spain), the damping rates take the value between 4 − 10%/day. In these countries, many of the restaurants and shops are closed and people are requested to stay home for one month or more. Sweden may be an interesting example. The Swedish government does not require restaurants and shops to be closed and does not ask people to stay home. The government asks people to be responsible for their behavior and social distancing is encouraged. The damping rate in Sweden, γ = 3.7%/day, may be regarded as a value representing the intrinsic nature of COVID-19.
It would be valuable to comment on the use of the effective reproduction number, Re, which is defined as the ratio of the newly infected people in one week to that in the previous week and has the merit that it is not necessary to define the offset. For a constant infection probability per infected people, k(t) = k0 = const. in Eq. (1), Re is found to be Re = exp(k0w) and we can guess k0 from Re. In contrast, when the number of infected people is given by the Gompertz function, N(t) = N0fG(x), the effective reproduction number Re(t) is a function of two variables, xK and γw, while K(t) is a function of a single scaling variable xK. Thus the universality observed in K(t) is lost in Re(t), and it would be less easy to give a prediction of trelax from Re(t) than from K(t). In Fig. 7, we show Re(t) from the Gompertz function at γ = 4, 8 and 16%/day. The value of Re(t) at xK = xR depends on γ, the value of xK at Re(t) = 1 is different from xR, xK = 0.49 (1.42) for γ = 16% (4%)/day, and apparent large values of Re(t) appear in the early stage, xK < 0 or t < t0 + Δt(γ).
4. Mechanism of appearance of the Gompertz function
Fast and slow rises in the early and late stages are found in many physical processes such as the particle production in nuclear collisions. In Fig. 8, we show the number of Δ particles (Δ++, Δ+, Δ0 and Δ−) and π particles (π+, π0 and π−) produced in central Au+Au collisions at the incident energy of 1 GeV/nucleon [28]. Histograms show the calculated results by using the hadronic transport model JAM [29]. The main production mechanism of π particles at this incident energy is the Δ production and its decay. In the early stage (t < 15 fm/c), nucleons are excited to resonances such as the Δ particles in the nucleon-nucleon collisions, NN → N Δ. Produced Δs collide with other nucleons and Δs, some of them produce additional Δ particles, ΔN → ΔΔ, and some of them are deexcited to nucleons in the Δ absorption processes such as ΔN → NN. The Δ particles decay and produce π particles, Δ → Nπ. Produced π particles may collide with other nucleons, Δs and πs, and occasionally produce additional Δ, πN → πΔ. In the later stage, the system expands, particle density decreases, and interaction rate goes down. When the density becomes low enough, all Δ particles decay to Nπ with the lifetime τΔ = ħ/ΓΔ ≃ 2 fm/c and π particles go out from the reaction region and are detected.
The number of π and Δ particles in the above nucleus-nucleus collision is found to be well fitted by the Gompertz function fG(x) and its derivative . Curves in Fig. 8 show the results of fit by the Gompertz function and its derivative, where the parameters are obtained as (Nπ, γπ, tπ) = (62.0, 0.113 (fm/c)−1, 14.6 fm/c) and (NΔ, γΔ, tΔ) = (760, 0.160 (fm/c)−1, 14.3 fm/c). Compared with the lifetime of Δ, the number of Δ during nucleus-nucleus collisions has a longer tail. This may be because of the relativistic effects, resonance mass dependence of the width, and sequential decay and production of Δ such as Δ → πN followed by πN → Δ.
It should be noted that one Δ particle mostly decays into Nπ and additionally produced number of π is less than unity in the present pion production in nucleus-nucleus collisions. A rough estimate of the upper bound of the basic reproduction number R0 from Δ to π may be obtained as follows. The lower bound of the number of produced Δ particles is the peak number of Δ, which is NΔ = 45.3 at t = 14 fm/c in the calculated data and is expected to be NΔ(tΔ) = NΔγΔ/e ≃ 44.7 at t = tΔ from the Gompertz function. Then the upper bound of the additionally produced π number is given as Nπ(t = ∞) − NΔ(tΔ) ≃ 17. Consequently, the upper bound of R0 is given as R0 = [Nπ(t = ∞) − NΔ(tΔ)]/NΔ(tΔ) ≃ 0.38, which is less than unity. As a result, the number of pions in the final state is already determined in the early stage. The green dashed curve in Fig. 8 shows the “π-like” particle number, NπΔ = Nπ + NΔ, which shows a peak at t ≃ 16 fm/c, gradually decreases by the Δ absorption processes, and converges to Nπ(t = ∞).
Based on the success in describing Nπ(t) by using the Gompertz function, we may make a conjecture of the correspondence of the COVID-19 spread and the π production in nuclear collisions. Let us assume that π particles correspond to the infected people who tested positive and Δ particles correspond to the exposed (infected) people who have not tested positive yet. The people are exposed in the initial dense stage (Δ production), occasionally infect other susceptible people (additional Δ production) or are recovered (Δ absorption), develop symptoms and test positive (Δ → Nπ). Number of infected people including those have not tested positive, Nπ + NΔ, grows rapidly in the early dense stage but does not change much in the later stage. Hence, except for the early dense stage, the basic reproduction number would be less than unity.
5. Summary
We have analyzed the number of infected people of COVID-19, one of the coronaviruses, as a function of time, N(t), by using the double exponential function referred to as the Gompertz function, fG(x) = exp(−e−x). The Gompertz function appears when the infection probability is an exponentially decreasing function in time. One of the characteristic features of the Gompertz function is the asymmetry of its derivative, , fast rise and slow decay.
This feature is found in the daily cases of infected people, dN(t)/dt. We have assumed that the number of infected people from one outbreak is given as , where N0, γ and t0 are the total number of infected people, damping rate of the infection probability, and the time where N(t) becomes N0/e. These parameters are obtained by the χ2 fitting to the dN(t)/dt data. Then we have found that N(t) and dN(t)/dt show universal scaling, N(t)/N0 = fG(x) and , where x = γ(t − t0) is the scaling variable. The K value, the increasing rate of infected people in one week, is also found to show the scaling behavior, K(t) = 1 − fG(xK), where xK = γ(t − t0 − Δt(γ)) is the scaling variable for K(t) with Δt(γ) being a given function of γ.
We have also found that the time dependence of the produced pion number in nucleus-nucleus collisions is described by the Gompertz function. Since both of the COVID-19 spread and the pion production are transport phenomena, the mechanism of the former may be similar to the latter. If this is the case, there is a possibility that the basic reproduction number is high only in the initial stage of the outbreak.
Throughout this article, we have imposed the single-outbreak assumption. Since this assumption may be too restrictive, we show the results of multiple-outbreak analyses in Appendix A. The multiple-outbreak analyses also show that the COVID-19 spread in one outbreak is well described by the Gompertz function.
Data Availability
We use data opened to public and available on the web. https://ourworldindata.org/coronavirus-source-data https://covidtracking.com/
A. Multiple-outbreak model analysis
In the analyses in the main text, we have assumed that there is only one dominant outbreak. Let us consider here the multiple-outbreak cases, where we assume the number of infected people is described by the sum of several Gompertz functions,
As in the single-outbreak model discussed in the main text, we analyze the daily number of newly infected people, dN(t)/dt,
Where .
In Fig. A1, we show dN/dt in Eq. (A2) in comparison with the daily number of newly infected people. In the multiple-outbreak analysis, we use data in the whole range, 0 ≤ t ≤ 140. Two or three outbreaks are considered, and we try to describe the region with large dN/dt by adding outbreaks. Obtained parameters are summarized in Table A1.
In many countries under consideration, dN/dt is decreasing on May 21, 2020, and the parameters are well determined. Then we adopt the three-outbreak model (n = 3). In Japan, China, and South Korea, multiple-outbreak structure of dN/dt is clearly seen in the logarithmic plot and can be fitted by using Eq. (A2). In Russia, Italy, Spain, the UK and Sweden, additional outbreaks improve the reduced χ2 by filling the peaks which are not covered by the single outbreak. In Brazil and Indonesia, where the numbers of infected people are still rapidly increasing, we need at least one outbreak term with ti > tnow. In those cases, parameters generally have large uncertainties, so the third outbreak, if included, has extremely large uncertainties larger than 100 %. Thus we use the two-outbreak model (n = 2) in these countries.
In the USA, there are many centers of outbreaks. In Fig. A2, we show the dN(t)/dt data in New York, Massachusetts and California states [30]. It is possible to fit the data in each state by using the Gompertz function, but the results show significantly different values in (γ, t0). This would be the reason of the slow decrease of dN(t)/dt in the USA. As a result, a single-outbreak treatment is not appropriate. By comparison, the dN(t)/dt data are reasonably explained by two outbreaks (n = 2). We have used the data in Ref. [27] updated on June 7, 2020. The daily number of newly infected people does not decrease and it seems that it takes more time for the settle down.
After obtaining (Ni, γi, ti) by fitting dN/dt data, the constant part (ΔN) is obtained by fitting N(t). Thus obtained multiple-outbreak functions in Eq. (A1) are compared with the data in Fig. A3. In the region with N(t) > 100, the multiple-outbreak functions are found to explain the data well. This supports the idea that the number of infected people in one outbreak would be described by the Gompertz function.
Acknowledgments
The authors thank T. Kunihiro and T. T. Takahashi for constructive comments and suggestions. They also thank N. Ikeno, A. Ono and Y. Nara for useful discussions. This work is supported in part by the Grants-in-Aid for Scientific Research from JSPS (Nos. 19H01898 and 19H05151).
Footnotes
↵* E-mail: ohnishi{at}yukawa.kyoto-u.ac.jp
† Report number: YITP-20-82