Abstract and Findings
Confirmed infection cases in mainland China were analyzed using the data up to January 28, 2020 (first 13 days of reliable confirmed cases). For the first period the cumulative number of cases followed an exponential function. However, from January 28, we discerned a downward deviation from the exponential growth. This slower-than-exponential growth was also confirmed by a steady decline of the effective reproduction number. A backtrend analysis suggested the original basic reproduction number R0 to be about 2.4 to 2.5. As data become available, we subsequently analyzed them during three consecutive periods obtaining a sequence of model predictions. All available data up were processed the same way. We used a simple logistic growth model that fitted very well with all data. Using this model and the three sets of data, we estimated maximum cases as about 21,000, 28,000 and 35,000 cases refining these predictions in near-real time. With slightly different approach (linearization in time) the estimate of maximum cases was even higher (about 65,000). Although the estimates of maximum cases increase as more data were reported all models show reaching a peak in mid-February in contrast to the unconfined exponential growth. These predictions do not account for any possible other secondary sources of infection.
Introduction
Recent outbreak of a novel coronavirus (designated 2019-nCoV) originated in Wuhan, China raised serious public health concerns and many human tragedies. To manage the epidemics resulting from virus spreading across China and other countries forecasting the occurrence of future cases is extremely important. Such forecasting is very complicated and uncertain since many factors are poorly understood or estimated with a large possible error. Two major factors include spatial movement of virus carriers (i.e., infected individuals) and the basic reproduction number R0 (average number of new infections originated from an infected individual).
As a result of great public attention to the virus spreading several approaches were recently reported attempting to model the epidemics and infection dynamics (Chen et al., 2020; Hui et al., 2020; Imai, Cori, et al., 2020; Imai, Dorigatti, Cori, Donnelly, et al., 2020; Li et al., 2020; Linton et al., 2020; Liu et al., 2020; Shen, Peng, Xiao, & Zhang, 2020; Wu, Leung, & Leung, 2020; Zhang & Wang, 2020b; Zhao et al., 2020a). These models are certainly useful to understand the implications of various quarantine procedures, public health actions, and possible virus modifications. However, the reported models are, by its nature, very complicated with numerous assumptions and requiring many parameters values of which are not known with good accuracy. As a result, some predictions (Nishiura et al., 2020) were more educated guesses although eventually they were confirmed qualitatively.
In this work, we present the results of fitting a very simple (perhaps even simplistic) model to the available data and a forecast of new infections.
Logistic Model and its Application
The logistic model has been used in population dynamics and specifically in epidemics for a long time (Bailey, 1950; Bangert, Molyneux, Lindsay, Fitzpatrick, & Engels, 2017; Cockburn, 1960; Jowett, Browning, & Haning, 1974; Koopman, 2004; Mansfield & Hensley, 1960; Waggoner & Aylor, 2000). Mathematically, the model describes dynamic evolution of a population P (in our case the number of infected individuals) being controlled by the growth rate r and population capacity K due to limiting resources. In continuous time t the change of P is Initially, the growth of P is close to exponential since the term (1 − P/K) is almost one. When P becomes larger (commensurate with K) the growth rate slows down with becoming an effective instantaneous growth rate.
In discrete time, more appropriate to daily reported infection cases, the logistic model becomes where P(t) and P(t+1) are populations on consecutive days, R 0* is the growth rate (basic reproduction number in epidemiology) at the beginning of the logistic growth, and K is the limiting population. When plotted as a function of time t, both Eqs. (1) and (3) result in a classic sigmoidal curve with being the effective reproduction rate at time t.
The logistic model may be adequate for the analysis of mainland China since the country can be treated as a unit where a vast majority of cases occurred without any significant “import” or “export” of cases.
Data Analysis
We analyzed infection cases in mainland China as reported by the National Health Commission of the People’s Republic of China (www.nhc.gov.cn/xcs/xxgzbd/gzbd_index.shtml). As of the time of writing the number of cases is shown in Table 1. The data are also plotted in Figure 1. The first column in Table 1 represents days from the beginning of the outbreak. There is a considerable controversy as to the exact date of the outbreak with most reports pointing to mid-December (Li et al., 2020; Wang, Horby, Hayden, & Gao, 2020) while one analysis suggest multiple sources of original infection (Nishiura et al., 2020). Initially, the outbreak was not recognized and number of confirmed cases is not fully known (P. Wu et al., 2020). Initially, (Figure 1), the number of cases increased exponentially. This feature was also clearly recognized in a previous report (Zhao et al., 2020b).
However, starting from January 28 (day 42) there was a marked deviation from the exponential curve, in line with logistic growth. This decline of the growth rate is clearly demonstrated by a steady decline of the effective reproduction number Re (fourth column in Table 1) calculated simply as a ratio of cases on consecutive days Eq. (5) also shows a linearization of the discrete logistic model (Eq. (3)) that was used to estimate the Re (t=30) = R0* and capacity K. An example of such linearization for Period 1 is shown in Figure 2 with the initial 16 points (blue points and line in Figure 2). The same figure also shows linearization using for Period 2 (orange points and line in Figure 2). For all periods the effective reproduction numbers decrease in time.
As new data were reported daily, we followed with subsequent analysis in near-real time with three periods as shown in Table 2
In contrast, a simple linearization of Re in time (Figure 3) back-estimated the original basic reproduction number R0 at about 2.4 to 2.5, agreeing well with other values recently reported (Imai, Dorigatti, Cori, Riley, & Ferguson, 2020; Liu et al., 2020; Majumder & Mandl, 2020; Zhang & Wang, 2020a, 2020b; Zhao et al., 2020a). The discrepancy between these two estimates can be attributed a potential loss of virulence of the virus but most likely due to extreme measures to contain virus spread in China.
This linearization provides yet another method for the use of the logistic model. We use linear regression as shown in Figure 3 to calculate the effective reproduction numbers for a series of days Re* (t) with the * superscript denoting that the values are calculated from the regression. Using xsEqs. (3) and (4) the predicted values of P(t) can be calculated regressively as
Forecast and Discussion
Based on the linearization of Eq. (5), we obtained the parameters of the logistic model for three periods as shown in Table 2. The resulting prediction of future infection cases are shown in Figure 4
The model predicted further increases of the infected cases but slower than the initial exponential growth. As expected, new data resulted in further model refinement primarily in the values of K and the resulting maximum cases.
Following a parallel approach and using Eq. (6) we obtained yet another set of predictions shown in together with those from Figure 4. This approach results in a dramatically higher predictions of maximum case (at about 65,000) but remarkably shows a very close time for the peak (Figure 5).
Despite differences among different estimate, the critical feature of the model predictions is the stabilization of the total number of cases in the next several days (by mid-February) and not a dramatic exponential growth. This prediction is made purely based on the described analysis and may or may not happen in the future. One significant factor that could invalidate the predictions is a possibility of secondary or parallel outbreaks perhaps with different etiology as previously suggested for the original Wuhan outbreak (P. Wu et al., 2020)
Data Availability
not applicable - all data in public domain
Declarations
Ethics approval and consent to participate
The ethical approval or individual consent was not applicable.
Availability of data and materials
All data and materials used in this work were publicly available.
Consent for publication
Not applicable.
Funding
This work was not funded.
Disclaimer
The funding agencies had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Conflict of Interests
The author declared no competing interests.
Footnotes
e-mail: hermanowicz{at}ce.berkeley.edu