Abstract
Predicting the peak time of COVID-19 virus spread is crucial to decisions on the lockdown or closure of cities and states. In this paper, we design a recursive bifurcation model for analyzing COVID-19 virus spread in different countries. The bifurcation facilitates recursive processing of the infected population through linear least-squares fitting. In addition, nonlinear least-squares fitting is used to predict future values of the infected population. Numerical results on data from three countries (South Korea, the United States, and Germany) indicate the effectiveness of our approach.
1. Introduction
Coronavirus disease (COVID-19) is a novel respiratory illness that originated in 2019 and can spread from person to person, as defined by the CDC [1]. The disease was first publicly reported as an outbreak in Wuhan, China. So far, its original source has not been clearly identified, and the disease continues to spread in over 70 countries.
Lockdowns of towns, cities, states, and countries inflict severe damage on the well-being and economic growth of society. Because the peak of the virus spread is unknown, decisions on lockdown or closure are difficult to plan in advance. This calls for an accurate model to predict the peak time of the ongoing spread of the COVID-19 virus.
2. Literature Review
Many studies have investigated the epidemiology of COVID-19 spread. The first category of studies is purely statistical analysis. Important epidemic parameters were estimated [2, 3], including the basic reproduction number [4], doubling time [5], and serial interval [6]. In addition, some advanced models were developed to handle untraced contacts [7], undetected international cases [8], and actual infected cases [9]. Statistical reasoning [10, 11] and stochastic simulation [12, 13] were also explored by a few researchers.
The second group of investigations was based on dynamic modelling. The susceptible-exposed-infectious-recovered (SEIR) model was used to assess various measures in the COVID-19 outbreak [14-17]. Furthermore, it was used to investigate the effect of lockdown [18], the transmission process [19], transmission risk [14], and the effect of quarantine [14]. The SEIR model with time delays was also developed for studying the periods of incubation and recovery [20, 21].
Although there have been many recent studies with respect to the COVID-19 virus spread, an accurate model to pinpoint the peak time of the virus spread is still elusive. Such a model is crucial to a decision-making process for strategic plans to achieve a balance between reduction in life loss and avoidance of economic crisis due to lockdown.
The rest of this paper is organized as follows. In Section 3, a recursive bifurcation model is introduced to model the COVID-19 spread. Section 4 gives a bifurcation analysis of infection data from South Korea. Section 5 describes the prediction of COVID-19 virus spread based on our model, followed by concluding remarks in Section 6.
3. Recursive Bifurcation Model
In this paper, we focus on the size of the infected population, which is an important metric of the extent of the COVID-19 spread in different countries. Although the infected population in most countries follows an exponential or sigmoid pattern, the logarithm of the infected population may provide more information, as shown in Fig. 1(b).
The countries that exhibit a bifurcation pattern include South Korea, the United States, France, Canada, Germany, Australia, Malaysia, and Ecuador. By exploiting the bifurcation, we can estimate the intrinsic parameters in cycle 1 and use them as a set of starting values for the prediction in cycle 2 and beyond.
Following the above idea, we introduce a recursive tanh function, Equation (1), to describe the infected population within each cycle of the entire virus spread process. In Equation (1), i refers to the i-th cycle, P is the infected population in the i-th cycle, D is the number of days since the initiation of virus spread, P_i is the infected population at the end of the i-th cycle, r_i is the spread rate in the i-th cycle, and D_i is the number of days at the end of the i-th cycle. The purpose of adding 1 inside the logarithm is to avoid the singularity that would otherwise occur when P = 0.
Note that Equation (1) is not strictly a recursive formula in the conventional sense. We call it recursive because it must be solved cycle by cycle, starting from cycle 1 and proceeding to cycle n, where n is the last cycle of the virus spread. When n = 1, the equation degenerates to a regular tanh function.
4. Bifurcation Analysis of COVID-19 Virus Spread
To validate Equation (1) for the analysis of COVID-19 virus spread, we need data from a complete virus spread process. Among all the countries, South Korea seems to be the best choice for this validation because it provides reasonably reliable data and the virus spread there has stabilized.
The parameter r_i in Equation (1) represents the intrinsic rate of virus spread. It can be estimated by a linear least-squares fit of a linear equation in a parameter space, Equation (2), where X = D - D_i.
Figure 2(a) shows the determination of the virus spread rate, r_1. Using this value of r_1, we predict the infected population, y_p, which is very close to the true data, y, as shown in Figure 2(b).
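The linear least-squares step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a cycle follows y = beta * tanh(r * (D - d_ref)), where y is the log-population above the cycle baseline and beta is its plateau; this form, the reference day d_ref, and the function name fit_spread_rate are assumptions of the sketch and are not taken from Equation (1) or Equation (2). Under that assumption, atanh(y / beta) is linear in D - d_ref, so r is the slope of a line through the origin.

```python
import math

def fit_spread_rate(days, y, beta, d_ref=0.0):
    """Least-squares slope through the origin for atanh(y / beta) = r * (D - d_ref).

    Assumed (hypothetical) cycle shape: y = beta * tanh(r * (D - d_ref)).
    """
    xs, zs = [], []
    for d, yi in zip(days, y):
        ratio = yi / beta
        if 0.0 < ratio < 1.0:  # atanh is finite only strictly inside (-1, 1)
            xs.append(d - d_ref)
            zs.append(math.atanh(ratio))
    # Closed-form least-squares slope for a line constrained through the origin.
    return sum(x * z for x, z in zip(xs, zs)) / sum(x * x for x in xs)

# Synthetic, noiseless check data (not from the paper) with a known rate.
true_r, beta = 0.1, 9.0
days = list(range(1, 31))
y = [beta * math.tanh(true_r * d) for d in days]
r_hat = fit_spread_rate(days, y, beta)
print(r_hat)
```

On noiseless data the fit recovers the rate used to generate it; with real counts, points near the plateau are the most sensitive to noise because atanh grows steeply as the ratio approaches 1.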
Furthermore, applying r_1 to cycle 2 of the South Korea data also yields an accurate prediction of the infected population and confirms that α is close to unity (Figure 3). Here, α is a fictitious variable, defined in Equation (3), whose value should be unity.
The bifurcation in Figure 1(a) is easy to identify visually. An automatic algorithm could be built on the discontinuity in the tangent direction encountered when traversing the curve. Since this is not the main focus of this paper, we do not explore it further.
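One possible way to automate the detection is sketched below. It is a simple slope-jump heuristic, not an algorithm from this paper: it scans the logarithmic curve and returns the day at which the average slope over the next few days most exceeds the average slope over the previous few days. The function name find_bifurcation_day, the window size min_gap, and the use of the natural logarithm are all assumptions of the sketch.

```python
import math

def find_bifurcation_day(infected, min_gap=3):
    """Return the index where the slope of log(P + 1) jumps upward the most.

    Heuristic only: compares the mean slope over the next min_gap days
    with the mean slope over the previous min_gap days.
    """
    y = [math.log(p + 1) for p in infected]
    slopes = [y[k + 1] - y[k] for k in range(len(y) - 1)]
    best_day, best_jump = None, 0.0
    for k in range(min_gap, len(slopes) - min_gap):
        before = sum(slopes[k - min_gap:k]) / min_gap
        after = sum(slopes[k:k + min_gap]) / min_gap
        jump = after - before
        if jump > best_jump:
            best_day, best_jump = k, jump
    return best_day

# Synthetic two-cycle curve (illustrative only, not real case counts):
# cycle 1 saturates by day 20, then a second cycle starts from that plateau.
y = [1.5 * math.tanh(0.3 * d) for d in range(20)]
y += [1.5 * math.tanh(6.0) + 2.5 * math.tanh(0.2 * (d - 20)) for d in range(20, 40)]
infected = [math.exp(v) - 1 for v in y]
print(find_bifurcation_day(infected))
```

With noisy daily counts, a larger window or prior smoothing would likely be needed, since day-to-day reporting fluctuations also produce slope jumps.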
5. Prediction of Peak Time of COVID-19 Virus Spread
Based on the model in Section 4, we design an algorithm to predict the upcoming peak time of COVID-19 virus spread in the United States and Germany, as given in Table 1. Since the infected population has not yet stabilized in these two countries, it is important to estimate the ultimate infected population at the end of the last cycle, n.
We first obtain one starting value through a linear least-squares fit of Equation (4), where y = log(P + 1) - log(P_{n-1} + 1).
Then, Equation (1) is used to estimate r_n for cycle n through a linear least-squares fitting. With these two estimates available as a pair of starting values, a nonlinear Levenberg-Marquardt least-squares fitting [22] is computed to determine the two unknown parameters (β_n and r_n) simultaneously in Equation (5). Once β_n and r_n are determined, Equation (5) can be used to predict future values of the infected population. To define the peak time of the virus spread, a termination condition, Equation (6), is proposed, where j refers to the j-th day in cycle n. Equation (6) means that the virus spread is considered to have stabilized when the difference in the logarithm of the infected population between two consecutive days is less than 0.01.
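The fitting and stopping steps can be sketched as follows. The sketch assumes, for illustration only, that the last cycle follows y = beta * tanh(r * (D - d_ref)), with y the log-population above the cycle baseline; this form and the names model, fit_lm, peak_day, and d_ref are hypothetical and are not taken from Equation (5) or Equation (6). The loop is a minimal hand-written Levenberg-Marquardt iteration for two parameters, followed by the 0.01 day-over-day stopping rule applied to the fitted curve.

```python
import math

def model(d, beta, r, d_ref):
    # Assumed (hypothetical) cycle shape: log-population above the baseline.
    return beta * math.tanh(r * (d - d_ref))

def fit_lm(days, y, beta0, r0, d_ref=0.0, iters=200):
    """Minimal Levenberg-Marquardt loop for the two parameters (beta, r)."""
    beta, r, lam = beta0, r0, 1e-3

    def sse(b, rr):
        return sum((yi - model(d, b, rr, d_ref)) ** 2 for d, yi in zip(days, y))

    err = sse(beta, r)
    for _ in range(iters):
        # Accumulate J^T J and J^T e for the two-parameter problem.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for d, yi in zip(days, y):
            x = d - d_ref
            t = math.tanh(r * x)
            e = yi - beta * t
            j1 = t                         # d(model)/d(beta)
            j2 = beta * x * (1.0 - t * t)  # d(model)/d(r)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * e
            g2 += j2 * e
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = J^T e.
        m11, m22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        if det == 0.0:
            break
        s1 = (m22 * g1 - a12 * g2) / det
        s2 = (m11 * g2 - a12 * g1) / det
        new_err = sse(beta + s1, r + s2)
        if new_err < err:   # accept the step and relax the damping
            beta, r, err, lam = beta + s1, r + s2, new_err, lam / 10.0
        else:               # reject the step and damp harder
            lam *= 10.0
        if abs(s1) + abs(s2) < 1e-12:
            break
    return beta, r

def peak_day(beta, r, d_ref=0.0, tol=0.01, max_days=1000):
    """First day j of the cycle whose day-over-day log change is below tol."""
    for j in range(1, max_days):
        d = d_ref + j
        if model(d, beta, r, d_ref) - model(d - 1, beta, r, d_ref) < tol:
            return j
    return None

# Synthetic, noiseless data (not from the paper) generated with known parameters.
days = list(range(25))
y = [model(d, 5.0, 0.08, 0.0) for d in days]
beta_n, r_n = fit_lm(days, y, beta0=4.0, r0=0.1)
j_peak = peak_day(beta_n, r_n)
print(beta_n, r_n, j_peak)
```

The starting values beta0 and r0 play the role of the pair obtained from the two linear fits. For real data, a tested library routine such as SciPy's curve_fit with method="lm" would be a more robust choice than this hand-written loop.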
Figure 4 shows the prediction results for the infected population in the United States. The bifurcation pattern of the infected population is given in Figure 4(a), and the determination of the virus spread rate is presented in Figure 4(b). The virus spread rate in the United States (r_1 = 0.072) is smaller than that in South Korea (r_1 = 0.106) because the population density in South Korea is much higher. This may also mean that the virus spread will take longer to peak than in South Korea. Figures 4(c) and 4(d) show the predicted data for cycles 1 and 2, respectively. According to Figure 4(d), the COVID-19 virus spread in the United States will roughly peak on April 26, 2020.
COVID-19 data in Germany can be analyzed in a similar way. Figure 5(b) indicates that the virus spread will approximately peak on May 1, 2020. The virus spread rate, r_1, in Germany is 0.108, which is close to that in South Korea. These two countries have a higher virus spread rate than the United States because of their higher population density.
6. Conclusions
In this paper, we propose a recursive bifurcation approach to estimate the peak time of COVID-19 virus spread. The infected population data in South Korea are analyzed as an example of stabilized virus spread. An algorithm is developed to predict the future infected population based on the data available as of April 6, 2020. In terms of infected population, our model predicts that the COVID-19 virus spread will approximately peak on April 26, 2020 in the United States and on May 1, 2020 in Germany.
Data Availability
All infected-population data were obtained from the Coronavirus Resource Center of Johns Hopkins University.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors acknowledge the Coronavirus Resource Center of Johns Hopkins University as the source of the infected-population data.