Abstract
In this paper, we analyze the real-time infection data of the COVID-19 epidemic for 21 nations up to May 18, 2020. For most of these nations, the total number of infected individuals exhibits a succession of exponential growth and power-law growth before the flattening of the curve. In particular, we find a universal √t growth before they reach saturation. India, Singapore, and Sri Lanka have reached up to linear growth (I(t) ~ t), and they are yet to flatten their curves. Russia and Brazil are still in the power-law (t^2) growth regime. Thus, the polynomials of the I(t) curves provide valuable information on the stage of the epidemic evolution. Besides these detailed analyses, we compare the predictions of an extended SEIR model and a delay differential equation-based model with the reported infection data and observe good agreement among them, including the √t behaviour.
1 Introduction
As of May 23, 2020, the COVID-19 pandemic has infected more than 5.3 million people and caused 0.34 million deaths. The world economy is in tatters. Therefore, understanding the progression of the pandemic is crucial. In the present paper, we analyze the publicly available national COVID-19 infection data [37] up to May 18, 2020. We observe that the COVID-19 infection curves for many nations exhibit power-law growth after exponential growth. We compare the reported data with model predictions and observe a good agreement among them.
To understand and forecast epidemics, epidemiologists have made many models [15,4,10]. One of the first models is called the SIR model, where the variables S and I describe the numbers of susceptible and infected individuals, respectively. The third variable R represents the removed individuals who have either recovered or died. An advanced model, called SEIR model, includes exposed individuals, E, who are infected but not yet infectious [4,10].
SARS-CoV-2 is one of the seven human coronaviruses which have been identified so far. It is the most dangerous among all of these because of its highly infectious nature and its lethality. Asymptomatic carriers, individuals who do not exhibit any symptoms, have carried the virus to far off places where it has spread rapidly [19]. Even symptomatic patients manifest symptoms two to three days after turning transmissible. To stop the spread of the deadly virus, various nations have employed lockdowns, mandatory social distancing, quarantines for the affected, etc.
Despite the difficulties stated above, many models are able to describe the COVID-19 pandemic data quite well. Peng et al. [27] constructed a generalized SEIR model with seven variables (including quarantined and death variables) for the epidemic spread in China. Their predictions are in good agreement with the present data. López and Rodo [20] formulated an extended version of this model to analyse the spread of the pandemic in Spain and Italy. Earlier, Cheynet [7,8] had developed a code to simulate this model. Hellewell et al. [13] studied the effects of isolation on controlling the COVID-19 epidemic. Chinazzi et al. [9] analyzed the effects of travel restrictions on the spread of COVID-19 in China and in the world using the global metapopulation disease transmission model. Mandal et al. [23] constructed a model for devising intervention strategies in India. Shayak et al. [34] have constructed a delay differential equation (DDE) model for the spread of COVID-19; this model takes into account the pre-symptomatic period and predicts a path to the end of the epidemic.
Due to the above complex issues in the epidemic models of COVID-19, many researchers have consciously focussed on the data and attempted to extract useful information from them. It has been observed that the analysis of the pandemic data provides important clues that may be useful for its forecast. In particular, Ziff and Ziff [39], Komarova and Wodarz [16], Manchein et al. [22], Blasius [5], Marsland and Mehta [25], Li et al. [18], Singer [35], Beare and Toda [2], and Cherednik and Hill [6] analyzed the reported count of total infections (I(t)) in various nations and observed power-law growth after the exponential regime. Verma et al. [36] analyzed the data of 9 nations up to April 7, 2020 and showed that I(t) goes through the power laws t^3, t^2, t, and √t, in temporal sequence, before flattening out. By April 7, 2020, China and South Korea had flattened their I(t) curves, but other nations were either in the exponential regime or in the power-law regime. However, by May 18, many nations are close to the flattening of their epidemic curves. Besides the above results, Prakash et al. [31] reported a linear growth of I(t) after early exponential growth.
Schüttler et al. [33] analyzed the daily death counts for various nations and observed that their probability distributions appear to follow a Gaussian profile. Marsland and Mehta [25] observed that the error function provides best fit to the total count, I(t); this observation follows from Schüttler et al. [33]’s analysis.
There are epidemic growth models based on population growth [10,38]. COVID-19 spread via asymptomatic carriers leads to a network formation. Hence, network-based epidemic growth models may be useful for modelling COVID-19 pandemic. Marathe and Vullikanti [24] review computational epidemiology with a focus on epidemic spread over a network.
In this paper, we analyzed the COVID-19 infection data up to May 18, 2020 for 21 nations and observed that all the nations are following transition from exponential to power-law growth in infection counts. Many of the 21 nations are close to flattening their curves with several exceptions (for example, Russia). We also showed that three epidemics—Ebola, COVID-19, MERS—have similar evolution: exponential growth, power-law growth, and then flattening of the curve. In addition, we compared the predictions of an extended SEIR model [20] and a delay-differential equation model [34] with the real-time data and observed good agreement among them.
The structure of the paper is as follows: in Sec. 2 we analyse the COVID-19 data for 21 leading nations and observe power law growth for them after the exponential growth. The evolutions of Ebola, MERS, and COVID-19 are compared in Sec. 3. The predictions of two models of COVID-19 pandemic are compared with the observed data in Sections 4 and 5. We conclude in Sec. 6.
2 Data analysis of COVID-19 epidemic
In this section, we present our results based on a comprehensive data analysis of COVID-19 cases for 21 countries (see Table 1) up to May 18, 2020. The majority of the countries in our analysis include those with a large number of COVID-19 cases, including USA, Italy, Germany, China, and India. For a complete study, we also include countries with a relatively smaller number of cases such as Sri Lanka and Hong Kong. We used the real-time data available at worldOmeter [37] and chose the starting date (see Table 1) as the one from which the number of infected cases increased rapidly. Corona Resource Center [14] too is an important repository for COVID-19 data.
We analyze the evolution of the cumulative number of infected cases, which is denoted by I(t), with time in days, denoted by t. For all I(t) curves, we compute the derivatives dI/dt using Python's gradient function. These derivatives indicate the daily count of the infected cases. Note that the dI/dt curves exhibit lower fluctuations than the measured daily counts due to smoothing. In Fig. 1, we exhibit the plots of I(t) (red curves) and dI/dt (blue curves) in semi-logy format for all the 21 countries.
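The derivative step can be sketched as follows. This is a minimal illustration, assuming that the gradient function mentioned above is NumPy's `numpy.gradient`; the cumulative counts are made-up numbers, not data from [37].

```python
import numpy as np

# Hypothetical cumulative case counts I(t), one value per day (illustrative only).
cumulative = np.array([100, 130, 170, 220, 290, 380, 490, 640], dtype=float)
t = np.arange(len(cumulative))  # time in days

# numpy.gradient uses central differences in the interior and one-sided
# differences at the two end points, which smooths day-to-day fluctuations.
daily_smooth = np.gradient(cumulative, t)

# Raw daily counts for comparison: I(t) - I(t - 1).
daily_raw = np.diff(cumulative)

print(daily_smooth)
print(daily_raw)
```

The central difference averages two consecutive daily increments, which is why the derivative curve fluctuates less than the raw daily counts.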
We find that a single function does not describe the I(t) curves; hence, we compute best-fit curves for different parts of I(t) by employing exponential and polynomial functions. We used Python’s polyfit function to compute the best-fit curves. These curves are listed in Table 1 along with the relative errors between the original data and the fitted data. However, we exhibit only the leading power laws of the polynomials in the plots of the figures.
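A segment-wise fit of this kind might look like the sketch below. It assumes NumPy's `numpy.polyfit`, uses synthetic data with a known exponential and a known quadratic segment, and defines the relative error as a mean absolute percentage deviation; that error measure is our assumption and need not be the one used for Table 1.

```python
import numpy as np

def relative_error(data, fit):
    """Mean relative deviation of the fit from the data, in percent (assumed definition)."""
    return 100.0 * np.mean(np.abs(data - fit) / data)

# Synthetic I(t): exponential growth for 10 days, then quadratic growth (illustrative only).
t_exp = np.arange(0, 10)
I_exp = 50.0 * np.exp(0.3 * t_exp)
t_pow = np.arange(10, 30)
I_pow = I_exp[-1] + 40.0 * (t_pow - 9) ** 2

# Exponential segment: fit log I = log A + beta * t with a degree-1 polynomial.
beta, log_A = np.polyfit(t_exp, np.log(I_exp), 1)
fit_exp = np.exp(log_A + beta * t_exp)

# Power-law segment: fit a degree-2 polynomial; its leading term gives the t^2 law.
coeffs = np.polyfit(t_pow, I_pow, 2)
fit_pow = np.polyval(coeffs, t_pow)

print(f"beta = {beta:.3f}, error = {relative_error(I_exp, fit_exp):.2e} %")
print(f"leading coefficient = {coeffs[0]:.1f}, error = {relative_error(I_pow, fit_pow):.2e} %")
```

With real data, the end points of each segment have to be chosen by hand, and the fitted coefficients can change noticeably when those end points move.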
Initially, all the countries exhibit exponential growth (I(t) = A exp(βt)), which is expected. It is worth mentioning that the I(t) plots for USA, UK, France, Spain, Germany, Russia, Belgium, and Brazil have two exponential functions for the fit. For example, the I(t) curve of UK is described by two exponential functions, ~ exp(0.26t) and ~ exp(0.23t). The quantity β is proportional to the growth rate. The value of β varies for different countries as it depends on factors such as population density, immunity level of the population, climate, local policy decisions (social distancing, lock-downs, testing capacity), etc.
In the exponential regime, the daily infection count is directly proportional to the cumulative count of cases, that is, dI/dt = βI. The cumulative case count doubles in time T = (log 2)/β in this regime. For Italy, β = 0.33, resulting in T ≈ 2 days, which means Italy's I(t) doubled every two days in the early phase (February 22 to March 01).
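For completeness, the doubling time follows from the exponential form in a few steps; the worked numbers use the value of β quoted above for Italy.

```latex
\frac{dI}{dt} = \beta I
\;\Longrightarrow\;
I(t) = A\,e^{\beta t},
\qquad
I(t+T) = 2\,I(t)
\;\Longrightarrow\;
e^{\beta T} = 2
\;\Longrightarrow\;
T = \frac{\ln 2}{\beta}.

% Italy, early phase: \beta = 0.33 per day
T = \frac{0.693}{0.33} \approx 2.1\ \text{days}.
```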
Next, the curves transition to the regimes that are best described by polynomials and can be approximated as power laws. In Fig. 1, we report the leading terms of the best-fit polynomials as power laws (also see Table 1). The I(t) curves for South Korea, China, Spain, Germany, Israel, Netherlands, and Switzerland exhibit three power-law regimes (t^2, t, and √t) before flattening. Similarly, I(t) for Australia, Belgium and Hong Kong saturate after t and √t regimes. As predicted in our earlier work [36], countries such as USA, France, Italy, Spain, and Germany transition to a linear and then to a √t regime after going through regimes of t^4, t^3, or t^2. These nations are close to flattening their I(t) curves. UK, Turkey, Israel and Netherlands exhibit similar transitions. However, I(t) for countries like Hong Kong, Sri Lanka, Australia, and Belgium directly transitioned to the linear regime from an exponential phase. We make a cautionary remark that the coefficients of the polynomials depend quite critically on the choice of endpoints of the fit. Our observations of power-law growth are consistent with earlier results [39,16,22,5,25,18,35,2,36].
In Fig. 1, in the exponential regime, the dI/dt curves (daily counts) are nearly parallel to the I(t) curves. It means that dI/dt increases exponentially in the beginning, similar to I(t). Subsequently, the curves transition to power-law regimes. As discussed by Verma et al. [36], the power law can be approximated as I(t) ~ A t^n and dI/dt ~ I^{1-1/n}, which is slower than for the exponential regime. We also remark that for large n, dI/dt ∝ I, similar to the exponential function.
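The relation between the daily count and the cumulative count in a power-law regime can be obtained by eliminating t. The short derivation below assumes a pure power law with constant prefactor A.

```latex
I(t) = A\,t^{n}
\;\Longrightarrow\;
t = \left(\frac{I}{A}\right)^{1/n},
\qquad
\frac{dI}{dt} = n A\,t^{\,n-1}
             = n A \left(\frac{I}{A}\right)^{(n-1)/n}
             = n A^{1/n}\, I^{\,1-1/n}.

% Special cases
n = 1:\ \frac{dI}{dt} = A \ (\text{constant}),
\qquad
n = \tfrac{1}{2}:\ \frac{dI}{dt} = \frac{A^{2}}{2\,I},
\qquad
n \to \infty:\ \frac{dI}{dt} \propto I .
```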
The linear growth regime has an interesting property. In this regime, dI/dt = const, that is, the daily infection count is constant. The daily infection count starts to decrease after the linear regime; hence the linear regime is the transition point.
We also analyze the data of cumulative infected individuals in the entire world. In Fig. 2, we plot I(t) and dI/dt versus time in semi-logy format. Note that the initial epicenter of the COVID-19 outbreak was in China, and then it shifted to Europe and then to USA. Therefore, we divide the plot in two parts. In the first part [Fig. 2(a)], we illustrate cases that belong mostly to China. After approximately thirty days of outbreak (around February 20), I(t) for China starts to saturate. In Fig. 2(b), we exhibit the curve after t = 41 (March 02) when China had achieved flattening of the curve. The axes of Fig. 2(b) are offset relative to those of Fig. 2(a) due to the coordinate shifts. Both the plots exhibit exponential and power-law regimes, but I(t) is yet to flatten (see Table 2). We hope that there is no third part to this curve, which is possible if the unaffected countries remain so.
The transition from the exponential to power-law behaviour is expected from the nature of the I(t) curve. The I(t) curve is convex during the exponential growth phase, that is, its center of curvature is upward. However, the curve must turn concave for it to flatten. This transformation occurs via a sequence of growth phases: power-law, linear, square-root, and then flat. The curve transitions from convex to concave in the linear regime for which the radius of curvature is infinite. In Sec. 6 we argue that the power-law behaviour is possibly due to lockdown and social distancing.
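The change of curvature can be made explicit with the second derivative of a pure power law, I(t) = A t^n with A > 0 (an idealization of the fitted segments).

```latex
\frac{d^{2}I}{dt^{2}} = n(n-1)\,A\,t^{\,n-2}
\quad\Longrightarrow\quad
\begin{cases}
> 0, & n > 1 \quad (\text{convex: } t^{2},\, t^{3},\, t^{4}),\\[2pt]
= 0, & n = 1 \quad (\text{linear regime, zero curvature}),\\[2pt]
< 0, & n = \tfrac{1}{2} \quad (\text{concave: } \sqrt{t}\,).
\end{cases}
```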
We remark that the death count due to COVID-19 also exhibits similar behaviour as the infection count I(t). It is expected because a fraction of infected individuals, unfortunately, die. However, we expect a small time delay between the death time series and the infection time series. Some researchers have attempted to fit the I(t) and death counts with error functions [25,33].
The above analysis shows that we can track the development of the epidemic locally in time. The best-fit curves in particular segments provide the status of the epidemic. For example, if we have reached the linear regime, then we are not far from flattening the curve. Similarly, a √t regime indicates that the flattening of the curve has begun. Thus, the simple data analytics described above have significant predictive power.
In the next section we compare the functional behaviour of COVID-19’s I(t) curve with those of other major epidemics.
3 Comparison of COVID-19 with other epidemics
A natural question is whether the epidemic evolution of COVID-19 differs from the spread of Ebola and MERS (Middle East Respiratory Syndrome). In this section we perform a comparative study of the Ebola, MERS, and COVID-19 epidemics. We digitized data for these epidemics for their respective time periods: MERS [29] from May 01, 2013 to April 30, 2015; Ebola [21] from May 01, 2014 to April 30, 2015; and COVID-19 [37] from January 22, 2020 to May 18, 2020. In Fig. 3, we plot I(t) vs. normalized time, t/tmax, in a semi-logy format for all three epidemics. Here, tmax (see Table 3) is the time span of the epidemic, except for COVID-19 for which tmax is taken up to May 18.
In Fig. 3 we present the best-fit curves as dotted lines. Clearly, the three curves look similar, with regimes exhibiting exponential, power-law, and linear growth before flattening. A major difference is that COVID-19 has two subparts, which is essentially due to the spread of COVID-19 by asymptomatic carriers. As shown in Table 2 and Fig. 2, the epidemic first spread in China and then in rest of the world. In contrast, the other two epidemics, Ebola and MERS, were somewhat confined.
In the next section, we present a model for COVID-19 whose predictions match the data of several countries.
4 SEIR model for COVID-19 epidemic
In Sec. 2, we analyzed the COVID-19 infection data and observed a power-law growth (followed by a linear regime near saturation) after an exponential growth. In this section, we attempt to get some insights about this transition using the SEIR model [4,10,20,17,28]. Note that the other important epidemic models are regression models [30,12], the ARIMA forecasting model [1,3,11], the SIR model [15,32], etc. All these models have been frequently and successfully used to analyse the transmission dynamics of COVID-19. For example, Labadin and Hong [17] used the SEIR model to predict the second confirmed case in Malaysia.
Recently, Peng et al. [27] constructed a generalised SEIR model for the spread of the SARS-CoV-2 virus in China. López and Rodo [20] modified Peng et al. [27]'s model to analyze the data of Spain and Italy up to the end of March. In this section, we will discuss a simplified version of López and Rodo [20]'s SEIR model and fit it with the real-time data of USA, Italy, Spain and Japan till 18th May 2020.
In the model, we assume the disease transmission to take place only among humans. Further, the natural birth and death rates are assumed to be negligible. We divide the total population (N) at a certain place at time t into seven categories: Susceptible (S(t)), Exposed (E(t)), Infected (I(t)), Recovered (R(t)), Insusceptible (P(t)), Quarantined (Q(t)) and Dead (D(t)). Here, Q(t) is the number of confirmed infected cases at time t. The evolution equations of the seven categories are:
$$\frac{dS}{dt} = -\beta \frac{S I}{N} - \alpha S, \qquad (1)$$
$$\frac{dE}{dt} = \beta \frac{S I}{N} - \gamma E, \qquad (2)$$
$$\frac{dI}{dt} = \gamma E - \delta I, \qquad (3)$$
$$\frac{dQ}{dt} = \delta I - \lambda(t) Q - \kappa(t) Q, \qquad (4)$$
$$\frac{dR}{dt} = \lambda(t) Q, \qquad (5)$$
$$\frac{dD}{dt} = \kappa(t) Q, \qquad (6)$$
$$\frac{dP}{dt} = \alpha S, \qquad (7)$$
where β, α, δ, λ(t), and κ(t) are the infection, protection, average quarantine, recovery and mortality rates respectively; and γ^{-1} is the average latency period for COVID-19. The protection rate α is governed by the intensity of contact tracing, lockdown policies, and improvement of health facilities. The time-dependent parameters λ(t) and κ(t) are modeled as follows [20]:
$$\lambda(t) = \lambda_0 \left[ 1 - \exp(-\lambda_1 t) \right], \qquad (8)$$
$$\kappa(t) = \kappa_0 \exp(-\kappa_1 t), \qquad (9)$$
where λ0, κ0, λ1, and κ1 are constants. The functional forms in Eqs. (8-9) are chosen in such a way that the recovery rate saturates and the death rate vanishes with time. Note that the cumulative number of reported infected cases (denoted by I(t) in Sec. 2) is the sum of Q(t), R(t), and D(t). For further details, refer to Peng et al. [27], and López and Rodo [20].
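A numerical integration of a seven-compartment model of this type can be sketched as below. The right-hand side follows the generalized SEIR structure of Peng et al. [27] with a saturating recovery rate and a decaying mortality rate; this form is our reading of the description above and is an assumption of the sketch. The parameter values and initial conditions are hypothetical and are not the best-fit values of Table 4. A fixed-step Runge-Kutta scheme is used only to keep the example self-contained.

```python
import numpy as np

def seir_rhs(t, y, N, beta, alpha, gamma, delta, lam0, lam1, kap0, kap1):
    """Right-hand side of the seven-compartment model (assumed form, see lead-in)."""
    S, E, I, Q, R, D, P = y
    lam = lam0 * (1.0 - np.exp(-lam1 * t))  # recovery rate, saturates at lam0
    kap = kap0 * np.exp(-kap1 * t)          # mortality rate, decays to zero
    new_exposed = beta * S * I / N
    return np.array([
        -new_exposed - alpha * S,           # dS/dt
        new_exposed - gamma * E,            # dE/dt
        gamma * E - delta * I,              # dI/dt
        delta * I - (lam + kap) * Q,        # dQ/dt
        lam * Q,                            # dR/dt
        kap * Q,                            # dD/dt
        alpha * S,                          # dP/dt
    ])

def integrate_rk4(rhs, y0, t_end, dt, params):
    """Fixed-step fourth-order Runge-Kutta integration; returns times and states."""
    n_steps = int(round(t_end / dt))
    times = np.linspace(0.0, n_steps * dt, n_steps + 1)
    states = np.empty((n_steps + 1, len(y0)))
    states[0] = y0
    for k in range(n_steps):
        t, y = times[k], states[k]
        k1 = rhs(t, y, *params)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1, *params)
        k3 = rhs(t + dt / 2, y + dt / 2 * k2, *params)
        k4 = rhs(t + dt, y + dt * k3, *params)
        states[k + 1] = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return times, states

# Illustrative (hypothetical) values, NOT the best-fit parameters of Table 4.
N = 1.0e7
params = (N, 1.0, 0.05, 0.2, 0.1, 0.03, 0.05, 0.02, 0.05)
E0, I0, Q0, R0, D0, P0 = 200.0, 100.0, 50.0, 0.0, 0.0, 0.0
y0 = np.array([N - E0 - I0 - Q0 - R0 - D0 - P0, E0, I0, Q0, R0, D0, P0])

times, states = integrate_rk4(seir_rhs, y0, t_end=80.0, dt=0.1, params=params)
reported = states[:, 3] + states[:, 4] + states[:, 5]  # Q + R + D
print(f"reported cases after 80 days: {reported[-1]:.0f}")
```

Because the seven right-hand sides sum to zero, the total S + E + I + Q + R + D + P stays equal to N throughout the run, which is a convenient sanity check on any implementation.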
We compare the model predictions [Eqs. (1-9)] with the available data [37] for USA, Italy, Spain and Japan. For Spain and Japan, t = 0 is taken to be the starting date shown in Table 1. For USA and Italy, t = 0 corresponds to 29th and 24th February respectively. The end date for all the four countries is 18th May. The initial values of Q, R, and D are taken to be the total active cases, recovered cases and deaths respectively at t = 0 for each country. The number of initial insusceptible cases (P(t = 0)) is assumed to be zero. We adjust the parameters {α, β, γ, δ, {λ0, λ1}, {κ0, κ1}}, E(t = 0) and I(t = 0) such that the relative error between the model and actual data is minimized. Note that the initial condition satisfies the relation S(t = 0) + E(t = 0) + I(t = 0) + Q(t = 0) + R(t = 0) + D(t = 0) + P(t = 0) = N, where N is the total population of the country.
In Fig. 4, we present the best-fit curves from the SEIR model along with the actual real-time data. This model fits well with the data for Italy and Spain. In Table 4 we list the numerical values of the best-fit parameters and the relative errors between the predictions and data. Note that for Spain and Italy, López and Rodo [20] considered natural birth and death rates in their model and obtained fits for Q, R and D separately. In contrast, we stick to the fundamental assumption regarding natural birth and death rates of the basic SEIR model [17] and obtain the fits for Q(t) + R(t) + D(t) till May 18.
Our best-fit values of parameters for Spain and Italy are nearly consistent with those of López and Rodo [20]. The model shows that high infection rates (β) and small average latency periods (γ^{-1}) tend to push the cumulative number of infected cases (reported) to a large saturation value via an exponential growth. On the other hand, high protection and quarantine rates, α and δ, slow down the growth and minimize the saturation level of the cumulative infected (reported) cases. Thus, the values of the control parameter set {β, γ, α, δ} in Table 4 determine the nature of the power law after the exponential growth. In addition, the linear regime (for Italy, Spain) near the saturation is determined well by the removal rate set {λ0, λ1, κ0, κ1}. Thus, the present model is consistent with the results presented in Sec. 2.
In the next section, we will present another model which is based on delay differential equations.
5 Model based on delay differential equations
In this section we consider a class of models based on delay differential equations (DDE), which are different from SEIR model. Here, the equations often look simpler than their SEIR counterparts since delay can be used to account for multiple features without increasing the number of variables. The flip side, however, is that delays can be analytically intractable.
In this section, we focus on one particular delayed model [34], which uses delays to account for the pre-symptomatic period and the infection period. This model has been used to track the evolution of the epidemic, especially in the post-linear regime. It describes a potential new route to the end of the pandemic through a combination of social distancing, sanitization, contact tracing and preventive testing. In the controlled endgame phase of the epidemic, which we call self-burnout, we have a slightly different equation. In this phase, there is extensive enforcement of separation minima (a term we prefer to social distancing as it does not carry connotations of emotional isolation) so the rate of new cases does not depend on the number of healthy and susceptible people at large (i.e. not in quarantine). Rather, we assume that each sick person spreads the disease at a constant rate m0. Under these conditions, the dynamic model for the spread of cases (y, which is the same as I(t)) is
$$\frac{dy}{dt} = m_0 \left[ y(t) - (1-\mu_3)\, y\!\left(t - \frac{\tau_2}{2}\right) - (1-\mu_1)\mu_3\, y(t-\tau_2) - \mu_1 \mu_3\, y(t-\tau_1) \right], \qquad (11)$$
where μ1, μ2, μ3, τ1, τ2 are parameters. In our model, the contact tracing manages to capture a fraction 1 − μ3 of all the sick patients and places them into quarantine.
A solution to the above equation is y = const. It has been shown in [34] that this solution is stable if and only if
$$m_0 \left[ (1-\mu_3)\frac{\tau_2}{2} + (1-\mu_1)\mu_3 \tau_2 + \mu_1 \mu_3 \tau_1 \right] < 1.$$
This identifies a maximum value of m0 for which the solution is stable, i.e., the epidemic gets over in time. For the plausible parameter values τ1 = 7, τ2 = 3, μ1 = 1/5 and μ3 = 1/2, the critical value of m0 turns out to be 20/53. Here we assume that the test results are instantaneous due to high testing capacity present in the region.
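The quoted critical value can be checked with exact rational arithmetic. The sketch below assumes that the stability threshold has the form m0 [(1 − μ3) τ2/2 + (1 − μ1) μ3 τ2 + μ1 μ3 τ1] < 1; this expression is our assumption (Ref. [34] gives the actual condition), and the function name is hypothetical.

```python
from fractions import Fraction

def critical_m0(tau1, tau2, mu1, mu3):
    """Largest m0 for which y = const is stable, under the assumed threshold
    m0 * [(1 - mu3) * tau2 / 2 + (1 - mu1) * mu3 * tau2 + mu1 * mu3 * tau1] < 1."""
    mean_delay = (1 - mu3) * tau2 / 2 + (1 - mu1) * mu3 * tau2 + mu1 * mu3 * tau1
    return 1 / mean_delay

# Parameter values quoted in the text: tau1 = 7, tau2 = 3, mu1 = 1/5, mu3 = 1/2.
m0_crit = critical_m0(Fraction(7), Fraction(3), Fraction(1, 5), Fraction(1, 2))
print(m0_crit)          # 20/53
print(float(m0_crit))
```

The bracketed quantity evaluates to 3/4 + 6/5 + 7/10 = 53/20 days, whose reciprocal reproduces the value 20/53 stated above.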
We perform simulation runs of Eq. (11) with the above parameter values and m0 having the values 70, 80 and 90 percent of the critical limit. We seed the equation with the linear function y = 1000t for the first ten days. We find a considerable region thereafter where the case histories show a √t profile before saturating. The errors between the observed data and the best-fit curves are less than 1 percent in each case. This explains why the countries which are achieving saturation are showing a pronounced √t phase after the linear phase.
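Simulation runs of this kind can be sketched with a simple forward-Euler scheme. The sketch assumes that the self-burnout equation has the form dy/dt = m0 [y(t) − (1 − μ3) y(t − τ2/2) − (1 − μ1) μ3 y(t − τ2) − μ1 μ3 y(t − τ1)]; this form, the step size, the run length and the function name are assumptions of the example and are not taken from the source or from the solver used in [34].

```python
def simulate_burnout(m0, mu1, mu3, tau1, tau2, seed_slope=1000.0,
                     seed_days=10, run_days=60, steps_per_day=10):
    """Forward-Euler integration of the assumed delay equation
    dy/dt = m0 * [y(t) - (1-mu3) y(t - tau2/2)
                  - (1-mu1) mu3 y(t - tau2) - mu1 mu3 y(t - tau1)]."""
    dt = 1.0 / steps_per_day
    # Delays expressed as whole numbers of time steps.
    d_half = round(tau2 / 2 * steps_per_day)
    d_tau2 = round(tau2 * steps_per_day)
    d_tau1 = round(tau1 * steps_per_day)
    # Seed history: linear growth y = seed_slope * t for the first seed_days days.
    y = [seed_slope * k * dt for k in range(seed_days * steps_per_day + 1)]
    for _ in range(run_days * steps_per_day):
        n = len(y) - 1
        bracket = (y[n]
                   - (1 - mu3) * y[n - d_half]
                   - (1 - mu1) * mu3 * y[n - d_tau2]
                   - mu1 * mu3 * y[n - d_tau1])
        y.append(y[n] + dt * m0 * bracket)
    return y, dt

m0_crit = 20 / 53  # critical value quoted in the text
for fraction in (0.7, 0.8, 0.9):
    y, dt = simulate_burnout(fraction * m0_crit, mu1=0.2, mu3=0.5, tau1=7, tau2=3)
    final_rate = (y[-1] - y[-2]) / dt
    print(f"m0 = {fraction:.0%} of critical: "
          f"y_end = {y[-1]:.0f}, final rate = {final_rate:.1f} per day")
```

In this sketch the daily rate falls below the seeded 1000 cases per day as soon as m0 is below the critical value, and it keeps decreasing, so the case count approaches a plateau. Fitting the intermediate portion of y against √t would be the natural next step to compare with the profile described above.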
To further bolster the validity of our model, we consider two countries (South Korea and Austria) which have shown a very good linear region followed by saturation. South Korea showed a linear regime from March 28 to April 04 (with 9478 and 10156 cases respectively), after which it entered the burnout phase. Using this as the seeding data and taking the parameter values mentioned above, we find the best fit for the next 20 days for m0 equal to 77 percent of the critical value. The error between the best-fit curve and the data is 0.34 percent. Note that Shayak and Rand [34] found an m0 of 75 and not 77 percent of the critical value, because their fit was over a shorter duration.
Austria showed linear regime from March 28 to April 01 (with 6250 and 10711 cases respectively) before entering self-burnout phase. We find the best fit for m0 to be 79 percent of the critical. The error is 2.6 percent. However, the actual data for the 8th to the 15th day appears to be too low—the curve has a convex profile which is probably unrealistic. If we consider the error from the 16th to the 30th day then we find a value of 1.1 percent only. We present these best-fit curves in Fig. 6. Other regions which are in the self-burnout phase are Vietnam, Australia, New Zealand, and Goa, Kerala and Odisha in India. We have chosen South Korea and Austria since their data shows the smoothest profile on account of high testing capacity.
Both the SEIR and the DDE models describe the evolution of the COVID-19 epidemic quite well for many countries. A detailed comparison between the two models will be performed in the future. Also, we plan to employ the two models to understand the epidemic evolution for many nations.
6 Discussions and Conclusions
COVID-19 pandemic involves many factors, for example, asymptomatic carriers, lockdown, social distancing, quarantine, etc. Considering these complex issues, we focus on data analysis. In particular, we analyze the real-time infection data of COVID-19 epidemic for 21 nations up to May 18, 2020. Our analysis shows that many nations are close to flattening the epidemic curve.
A key feature of our analysis is the emergence of power-law behavior after an exponential growth, which has also been observed by other researchers [39,16,22,5,25,18,35,2]. The exponential growth is easily explained using the relation dI/dt ∝ I, which arises due to the spread by contact. For power-law growth, I(t) ~ t^n, the above relation is modified to dI/dt ∝ I^{1-1/n}. The suppression factor I^{-1/n} in dI/dt could be attributed to lockdowns, social distancing, etc. A careful analysis of the epidemic models should yield this feature. Interestingly, Ebola and MERS also exhibit similar behavior. This generic feature is very useful for the forecast of the epidemic evolution.
Note that the I(t) curve needs to turn from convex (during the exponential growth) to concave for flattening. Hence, a transition from exponential growth to a power-law growth is expected. The lockdowns and social distancing are likely to make the transition earlier, thus suppressing the exponential growth to some degree. Earlier, Verma et al. [36] had conjectured that the power-law growth might occur due to asymptomatic carriers and/or community spread. This conjecture needs a closer examination.
In this paper, we only studied the infection counts. However, it is evident that during the growth phase, the active cases and death counts would follow a pattern similar to that of I(t). The total death count too flattens along with the infection count, but the number of active cases decreases with time during the saturation.
Prakash et al. [31] studied the phase space portraits, that is, dI/dt vs. I plots. They observed the phase-space curves to be linear. This is natural for the exponential growth (dI/dt ∝ I), as well as for the power-law growth with large exponent n because dI/dt ∝ I^{1-1/n} ≈ I. In another interesting analysis of the COVID-19 epidemic, Schüttler et al. [33] and Marsland and Mehta [25] argued that I(t) or the total death count could be modelled using the error function. Using this result, we may be able to predict the asymptotic behavior of I(t) that may yield valuable clues regarding the extent and duration of the epidemic.
Epidemic spread has similarities with rumor spread and the growth of a network [10,26]. A comparison of the power-law growth in these systems will yield fruitful results for the epidemic forecast.
In summary, COVID-19 epidemic data reveal interesting properties that can be used for its forecast. The √t dependence of total infections after the linear regime is a striking feature that might have connections with other aspects of physics and mathematics. We leave these considerations for future studies.
Data Availability
We used the publicly available data from the website worldOmeter [37].
7 Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgements
The authors thank Santosh Ansumali for useful discussions. We also thank the worldOmeter for the data that made this work possible. Ali Asad is supported by Indo-French (CEFIPRA) project 6104-1, and Soumyadeep Chatterjee is supported by INSPIRE fellowship (IF180094) of Department of Science & Technology, India.