Estimation of the final size of the COVID-19 epidemic
=====================================================

* Milan Batista

## Abstract

In this short paper, the logistic growth model and the classic susceptible-infected-recovered dynamic model are used to estimate the final size of the coronavirus epidemic.

## 1 Introduction

One of the common questions regarding an epidemic is its final size. Various models are used to answer this question: analytical (Danby 1985, Brauer 2019a, b, Murray 2002), stochastic (Miller 2012), and phenomenological (Fisman, Khoo, and Tuite 2014, Pell et al. 2018). In this note, we attempt to estimate the final epidemic size using the phenomenological logistic growth model (Pell et al. 2018, Chowell et al. 2014) and the classic susceptible-infected-recovered (SIR) model (Hethcote 2000). With both models, we obtain a series of daily predictions of the final size. The limits of these series are then estimated using the iterated Shanks transformation (Shanks 1955, Bender and Orszag 1999). The data used for the calculations are taken from *worldometers*¹.

Before proceeding, we note that the final size of the epidemic was estimated at its early stage by Wu et al. (Wu, Leung, and Leung 2020) using the susceptible-exposed-infected-resistant model, by Xiong and Yan (Xiong and Yan 2020) using the exposed-infected-resistant model, by Nesteruk (Nesteruk 2020) using the SIR model, and by Anastassopoulou et al. (Anastassopoulou et al. 2020) using the SIR/death model. These early predictions range from 65000 to a million cases. Roosa et al. (Roosa et al. 2020) recently gave short-term forecasts of the epidemic.

## 2 Logistic growth model

The logistic growth model originates from population dynamics (Haberman 1998). The underlying assumption of the model is that the per-capita growth rate of the number of cases decreases linearly with the number of cases.
Hence, if $C$ is the number of cases and $t$ is time, the model is expressed as

$$\frac{dC}{dt} = rC\left(1 - \frac{C}{K}\right) \qquad (1)$$

where $r$ is the infection rate and $K$ is the final epidemic size. If $C(0) = C_0$ is the initial number of cases, the solution of (1) is

$$C(t) = \frac{K}{1 + A e^{-rt}} \qquad (2)$$

where $A = K/C_0 - 1$. The growth rate, $dC/dt$, reaches its maximum when $d^2C/dt^2 = 0$. From this condition, we obtain the time $t_p$ at which the growth rate peaks:

$$t_p = \frac{\ln A}{r} \qquad (3)$$

At this time, the number of cases and the growth rate are

$$C(t_p) = \frac{K}{2}, \qquad \left.\frac{dC}{dt}\right|_{t = t_p} = \frac{rK}{4} \qquad (4)$$

Now, if $C_1, C_2, \dots, C_n$ are the numbers of cases at times $t_1, t_2, \dots, t_n$, then fitting (2) to the data available up to each time gives a series of final size predictions $K_1, K_2, \dots, K_n$. Using the Shanks transformation, the predicted final epidemic size is

$$S(K_n) = \frac{K_{n+1}K_{n-1} - K_n^2}{K_{n+1} - 2K_n + K_{n-1}}$$

For the practical calculation of the parameters $K$ and $r$, we use the MATLAB functions *lsqcurvefit* and *fitnlm*.

## 3 SIR model

The model equations are

![Formula][9]

![Formula][10]

![Formula][11]

where $t$ is time, $S(t)$, $I(t)$, and $R(t)$ are the numbers of susceptible, infected, and recovered persons at time $t$, $\beta$ is the contact rate, and $1/\gamma$ is the average infectious period. Adding (1), (2), and (3) shows that the total population size $N$ is constant:

![Formula][12]

The initial conditions are $S(0) = S_0$, $I(0) = I_0$, and $R(0) = R_0$. Eliminating $I$ from (1) and (3) yields

![Formula][13]

In the limit $t \to \infty$, the number of susceptible people left, $S_\infty$, is

![Formula][14]

where $R_\infty$ is the final number of recovered persons. As the final number of infected people is zero, (4) gives

![Formula][15]

From this and (6), the equation for $R_\infty$ is

![Formula][16]

To use the model, we must estimate the parameters $\beta$ and $\gamma$ and the initial values $S_0$ and $I_0$ from the available data (we set $R_0 = 0$ and $I_0 = C_0$).
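The final-size relation can be checked numerically. The Python sketch below is illustrative only and rests on an assumption the text does not state explicitly: that the contact term is written in the normalized form $\beta SI/N$, i.e. $dS/dt = -\beta SI/N$, $dI/dt = \beta SI/N - \gamma I$, $dR/dt = \gamma I$, with $R(0) = 0$. Under that assumption the final-size equation reads $R_\infty = N - S_0 e^{-\beta R_\infty/(\gamma N)}$; if the contact term is $\beta SI$ instead, the exponent becomes $-\beta R_\infty/\gamma$. The function names `final_size` and `simulate_sir` are hypothetical, and a fixed-step fourth-order Runge-Kutta integrator stands in for MATLAB's adaptive *ode45*.

```python
import math


def final_size(n, s0, beta, gamma, iterations=200):
    """Solve R_inf = N - S0 * exp(-beta * R_inf / (gamma * N)) for R_inf by bisection.

    Assumes the normalized SIR model (contact term beta*S*I/N) with R(0) = 0
    and I(0) = N - S0 > 0.
    """
    def residual(r):
        return r - n + s0 * math.exp(-beta * r / (gamma * n))

    # residual(0) = -I0 < 0 and residual(N) > 0; the residual is convex, so the
    # bracket [0, N] contains exactly one root.
    lo, hi = 0.0, float(n)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_sir(n, s0, i0, beta, gamma, t_end, dt=0.1):
    """Integrate the normalized SIR model with fixed-step classical RK4.

    Starts from R(0) = 0 and returns (S, I, R) at time t_end.
    """
    def rhs(s, i):
        infection = beta * s * i / n  # new infections per unit time
        return -infection, infection - gamma * i, gamma * i

    s, i, r = float(s0), float(i0), 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s, i)
        k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = rhs(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        r += dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
    return s, i, r
```

With, for example, $N = 1000$, $I_0 = 1$, $\beta = 0.5$, and $\gamma = 0.2$, the value returned by `final_size` and the value of $R$ reached by `simulate_sir` after a long horizon (say 400 days) should agree closely, which is a cheap consistency check on the integration. Bisection is used because the residual is negative at 0, positive at $N$, and convex, so exactly one root lies in between.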
The available data are a time series of the total number of cases $C$, that is,

![Formula][17]

We estimate the parameters and initial values by minimizing the difference between the actual and predicted numbers of cases, i.e., by minimizing

![Formula][18]

where $C_1, C_2, \dots, C_n$ are the numbers of cases at times $t_1, t_2, \dots, t_n$ and ![Graphic][19] are the corresponding estimates calculated by the model. For the minimization, we use the MATLAB function *fminsearch*, and for the integration of the model equations, the MATLAB function *ode45*. Given the resulting series of predicted final numbers of recovered persons, $R_{\infty,1}, R_{\infty,2}, \dots, R_{\infty,n}$, we estimate the limit of the series by the Shanks transformation:

![Formula][20]

## 4 Results

The results of the logistic regression and the SIR model simulations are given in Tables 1 and 2, respectively. The predicted final sizes are compared in Figure 1. Both methods converge, and as more data become available, the discrepancy between their predictions falls below 5%. From Table 1, the peak of the epidemic was probably on 9 Feb 2020.

[Table 1.](http://medrxiv.org/content/early/2020/02/28/2020.02.16.20023606/T1) Data and results of logistic regression (see Eqs. (2), (3), and (4)).

[Table 2.](http://medrxiv.org/content/early/2020/02/28/2020.02.16.20023606/T2) Results of SIR simulations. After day 28, the method of data collection changes.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2020/02/28/2020.02.16.20023606/F1.medium.gif)

Figure 1. Evolution of the estimated final size of the coronavirus epidemic (data until 20 Feb 2020).

Figure 2 shows the time evolution of the number of cases; the models agree well with the actual data.
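The post-processing behind these results can be sketched in a few lines of Python. The sketch below evaluates the logistic peak quantities of Eqs. (3) and (4) from given $K$, $r$, and $C_0$, and applies the Shanks transformation repeatedly to a series of final-size predictions. It is a minimal illustration, not the author's code: the function names are hypothetical, the parameter fitting (done in the paper with MATLAB's *lsqcurvefit*, *fitnlm*, and *fminsearch*) is not reproduced, and the Shanks step is evaluated in the algebraically equivalent Aitken $\Delta^2$ form, $A_{n+1} - (A_{n+1} - A_n)^2/(A_{n+1} - 2A_n + A_{n-1})$, which suffers less from cancellation.

```python
import math


def logistic(t, K, r, C0):
    """Logistic solution C(t) = K / (1 + A exp(-r t)), with A = K / C0 - 1."""
    A = K / C0 - 1
    return K / (1 + A * math.exp(-r * t))


def logistic_peak(K, r, C0):
    """Return (t_p, C(t_p), dC/dt at t_p) = (ln(A) / r, K / 2, r K / 4).

    Requires K > C0 so that A > 0; t_p is positive only if A > 1.
    """
    A = K / C0 - 1
    return math.log(A) / r, K / 2, r * K / 4


def shanks(seq):
    """One Shanks step over consecutive triples (A_{n-1}, A_n, A_{n+1}).

    Uses the Aitken form A_{n+1} - (A_{n+1} - A_n)**2 / (A_{n+1} - 2 A_n + A_{n-1}),
    which equals (A_{n+1} A_{n-1} - A_n**2) / (A_{n+1} - 2 A_n + A_{n-1}).
    """
    out = []
    for a_prev, a, a_next in zip(seq, seq[1:], seq[2:]):
        d2 = a_next - 2 * a + a_prev  # second difference
        if d2 == 0:
            # Transform undefined when the second difference vanishes; keep the latest term.
            out.append(a_next)
        else:
            out.append(a_next - (a_next - a) ** 2 / d2)
    return out


def iterated_shanks(seq):
    """Apply shanks() repeatedly; each row is two entries shorter than the previous."""
    rows = [list(seq)]
    while len(rows[-1]) >= 3:
        rows.append(shanks(rows[-1]))
    return rows


# Usage on a synthetic series of predictions converging geometrically to 83000.
predictions = [83000 - 5000 * 0.8 ** n for n in range(7)]
for row in iterated_shanks(predictions):
    print([round(x, 1) for x in row])
```

For a sequence of the form $A_n = L + c\,q^n$, a single Shanks step recovers $L$ exactly up to rounding, which makes a convenient test. On real daily predictions, later rows of the table are read off once they stabilize; once the entries agree to machine precision, further rows mainly amplify rounding noise, so they should be inspected rather than trusted blindly.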
Table 3 shows that the logistic regression model has a high coefficient of determination, 0.996, and the p-values (p < 0.001) indicate that all the regression parameters are statistically significant.

[Table 3.](http://medrxiv.org/content/early/2020/02/28/2020.02.16.20023606/T3) Estimated logistic model parameters for data until 25 Feb 2020.

[Table 4.](http://medrxiv.org/content/early/2020/02/28/2020.02.16.20023606/T4) Iterated Shanks transformation for the logistic model.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2020/02/28/2020.02.16.20023606/F2.medium.gif)

Figure 2. Predicted evolution of the coronavirus epidemic (data until 20 Feb 2020).

Tables 4 and 5 give the iterated Shanks transformations of the predicted series of final epidemic sizes. The logistic model predictions appear to tend to a final size of 83231 cases, while the SIR model predictions converge to 83640 cases; the discrepancy is thus less than 0.5%.

[Table 5.](http://medrxiv.org/content/early/2020/02/28/2020.02.16.20023606/T5) Iterated Shanks transformation for the SIR model.

## 5 Short term forecasting

The models used are data-driven, so they are only as reliable as the data. As Figure 2 shows, growth is exponential at the beginning. Using the data until 11 Feb, one would predict a final epidemic size of about 55000 cases. However, the method of data collection then changed, producing a jump of about 15000 new cases on 12 Feb. On 20 Feb, the trend changed again: the data began to show an almost linear trend (see Figure 3). While the above models show that the epidemic is slowing down, the linear trend predicts about 873 new cases per day (see Table 6).

[Table 6.](http://medrxiv.org/content/early/2020/02/28/2020.02.16.20023606/T6)
Short term forecasting with the logistic and linear models. The linear model predicts 873 new cases per day.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2020/02/28/2020.02.16.20023606/F3.medium.gif)

Figure 3. Short-term forecasting from 20 Feb 2020.

## 6 Conclusion

On the basis of the available data, the logistic model predicts that the final size of the coronavirus epidemic will be approximately 83700 (±1300) cases and that the peak of the epidemic was on 9 Feb 2020. A more optimistic final size of 83300 cases is obtained using the Shanks transformation. Similar figures are obtained using the SIR model, which predicts a final size of approximately 84500; the Shanks transformation lowers this number to about 83700 cases. Naturally, the accuracy of these estimates remains to be seen.

In conclusion, both models qualitatively show that the epidemic is moderating, but recent data show a linear upward trend. The next few days will, therefore, indicate in which direction the epidemic is heading.

PS. It is now more or less clear that the predictions of this article apply only to China. By 20 February, 99% of the cases were from China. The linear trend in the data from 20 February onward reflected a decreasing number of infected in China and an increasing number of infected elsewhere in the world. In other words, the epidemic is slowing down in China, but it is now developing elsewhere in the world. We note that the forecasting methods used in this article are inapplicable in the early stages of an epidemic.

## Data Availability

The data used are from https://www.worldometers.info/coronavirus/

## Footnotes

* 1 [https://www.worldometers.info/coronavirus/](https://www.worldometers.info/coronavirus/)

* Received February 16, 2020.
* Revision received February 28, 2020.
* Accepted February 28, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Anastassopoulou, Cleo, Lucia Russo, Athanasios Tsakris, and Constantinos Siettos. 2020. “Data-Based Analysis, Modelling and Forecasting of the novel Coronavirus (2019-nCoV) outbreak.” medRxiv:2020.02.11.20022186. doi: 10.1101/2020.02.11.20022186.
2. Bender, Carl M., and Steven A. Orszag. 1999. Advanced mathematical methods for scientists and engineers I: asymptotic methods and perturbation theory. New York: Springer.
3. Brauer, Fred. 2019a. “Early estimates of epidemic final sizes.” Journal of Biological Dynamics 13 (sup1):23–30. doi: 10.1080/17513758.2018.1469792.
4. Brauer, Fred. 2019b. “The Final Size of a Serious Epidemic.” Bulletin of Mathematical Biology 81 (3):869–877. doi: 10.1007/s11538-018-00549-x.
5. Chowell, Gerardo, Lone Simonsen, Cécile Viboud, and Yang Kuang. 2014. “West Africa Approaching a Catastrophic Phase or is the 2014 Ebola Epidemic Slowing Down? Different Models Yield Different Answers for Liberia.” PLOS Currents Outbreaks. doi: 10.1371/currents.outbreaks.b4690859d91684da963dc40e00f3da81.
6. Danby, J. M. A. 1985. Computing applications to differential equations: modelling in the physical and social sciences. Reston, Va.: Reston Publishing Company.
7. Fisman, David, Edwin Khoo, and Ashleigh Tuite. 2014. “Early Epidemic Dynamics of the West African 2014 Ebola Outbreak: Estimates Derived with a Simple Two-Parameter Model.” PLOS Currents Outbreaks. doi: 10.1371/currents.outbreaks.89c0d3783f36958d96ebbae97348d571.
8. Haberman, Richard. 1998. Mathematical models: mechanical vibrations, population dynamics, and traffic flow: an introduction to applied mathematics. Unabridged republication ed. Classics in Applied Mathematics. Philadelphia: SIAM.
9. Hethcote, Herbert W. 2000. “The Mathematics of Infectious Diseases.” SIAM Review 42 (4):599–653. doi: 10.1137/S0036144500371907.
10. Miller, Joel C. 2012. “A note on the derivation of epidemic final sizes.” Bulletin of Mathematical Biology 74 (9):2125–2141. doi: 10.1007/s11538-012-9749-6.
11. Murray, James Dickson. 2002. Mathematical biology. 3rd ed. Interdisciplinary Applied Mathematics. New York: Springer.
12. Nesteruk, Igor. 2020. “Statistics based predictions of coronavirus 2019-nCoV spreading in mainland China.” medRxiv:2020.02.12.20021931. doi: 10.1101/2020.02.12.20021931.
13. Pell, Bruce, Yang Kuang, Cecile Viboud, and Gerardo Chowell. 2018. “Using phenomenological models for forecasting the 2015 Ebola challenge.” Epidemics 22:62–70. doi: 10.1016/j.epidem.2016.11.002.
14. Roosa, K., Y. Lee, R. Luo, A. Kirpich, R. Rothenberg, J. M. Hyman, P. Yan, and G. Chowell. 2020. “Short-term Forecasts of the COVID-19 Epidemic in Guangdong and Zhejiang, China.” Journal of Clinical Medicine 9 (2):596.
15. Shanks, Daniel. 1955. “Non-linear Transformations of Divergent and Slowly Convergent Sequences.” Journal of Mathematics and Physics 34 (1-4):1–42. doi: 10.1002/sapm19553411.
16. Wu, Joseph T., Kathy Leung, and Gabriel M. Leung. 2020. “Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.” The Lancet. doi: 10.1016/S0140-6736(20)30260-9.
17. Xiong, Hao, and Huili Yan. 2020. “Simulating the infected population and spread trend of 2019-nCov under different policy by EIR model.” medRxiv:2020.02.10.20021519. doi: 10.1101/2020.02.10.20021519.
[1]: /embed/graphic-1.gif
[2]: /embed/graphic-2.gif
[3]: /embed/inline-graphic-1.gif
[4]: /embed/inline-graphic-2.gif
[5]: /embed/inline-graphic-3.gif
[6]: /embed/graphic-3.gif
[7]: /embed/graphic-4.gif
[8]: /embed/graphic-5.gif
[9]: /embed/graphic-6.gif
[10]: /embed/graphic-7.gif
[11]: /embed/graphic-8.gif
[12]: /embed/graphic-9.gif
[13]: /embed/graphic-10.gif
[14]: /embed/graphic-11.gif
[15]: /embed/graphic-12.gif
[16]: /embed/graphic-13.gif
[17]: /embed/graphic-14.gif
[18]: /embed/graphic-15.gif
[19]: /embed/inline-graphic-4.gif
[20]: /embed/graphic-16.gif