Fine-tuned Forecasting Techniques for COVID-19 Prediction in India
==================================================================

* Abhinav Gola
* Ravi Kumar Arya
* Animesh
* Ravi Dugh
* Zuber Khan

## Abstract

Estimation of statistical quantities plays a cardinal role in handling complex situations such as the COVID-19 pandemic, and forecasting the number of affected people and fatalities is a major component of such estimations. Past research has shown that simple numerical models fare much better than complex stochastic and regression-based models when predicting for countries such as India, the United States and Brazil, where there is no indication of a peak anytime soon. In this research work, we present two models which gave the most accurate results when compared with other forecasting techniques. We performed both short-term and long-term forecasting based on these models and present the results for two distinct durations.

Keywords

* COVID-19
* Numerical Analysis
* Exponential Curve Fitting
* Regression
* Forecasting
* India

## 1. Introduction

In December 2019, people in Wuhan, China were infected by a novel coronavirus, named 2019-nCoV, and since then the outbreak has spread to more than 200 countries all over the world. This led the World Health Organization (WHO) to declare it a Public Health Emergency of International Concern. Governments of the affected nations are racing to formulate provisions and provide resources to handle the pandemic. Forecasting the infection rate for a nation can be a major asset in planning and formulating its policies. While no model can forecast infection and mortality rates exactly, attempts have been made to analyse the strengths and shortcomings of the many studies and models proposed for the coronavirus.
Although the forecasting models used by the health department and the Government of India have not been disclosed, we can still build on the existing models described in separate publications. Each of these models took a different approach to predicting future rates, and there is a profusion of mathematical techniques available for predicting the infection rate in the ongoing COVID-19 crisis. In previous research [18], the performance of most of these techniques was evaluated, and two models that gave the best predictions were identified for further estimation of the number of coronavirus cases. These two models, exponential curve fitting and the least square fitted model, can be used for short-term and long-term forecasting respectively. In this study, we apply these two techniques to an updated dataset taken from the official website of the Ministry of Health and Family Welfare, Government of India [17]. We estimate the number of confirmed, death and recovered cases for two different durations: one from August 5 to September 3, i.e. about 4 weeks, and the other from August 5 to September 23, i.e. about 7 weeks. We believe these forecasts can assist the government and other official authorities in preparing and organising the resources needed to deal with this pandemic.

This study is organised into five main sections. Section 1 gives general background on the disease. Section 2 surveys the forecasting models previously employed to predict confirmed cases in the Indian context. We present our methodology in Section 3 and discuss our findings and results in Section 4. Section 5 concludes this research work and outlines the scope for future improvements.

## 2. Related Work

Research on estimating the infection rate of COVID-19 has been quite prolific.
The majority of these studies revolve around traditional machine learning methods and neural network-based models. R. Sujath et al. [11] and Ajit Kumar Pasayat et al. [16] used linear regression models, while Gaurav Pandey et al. [12] employed a polynomial regression technique to predict coronavirus cases in the early months of the outbreak. R. Sujath et al. [11] also used multi-layer perceptron models alongside their stochastic vector autoregression (VAR) time-series model. Another example of a complex learning model is the LSTM model applied by Anuradha Tomar et al. [14] to forecast the number of cases. For small epidemics, Meyers [1] studied the forecasted spread using a Susceptible-Infected-Recovered (SIR) model. In simulation experiments of COVID-19 diffusion, Wu et al. [2] applied the Susceptible-Exposed-Infectious-Recovered (SEIR) model. Anastassopoulou et al. [3] performed a simulation study of the COVID-19 situation at the very initial stage of the pandemic using a Susceptible-Infectious-Recovered-Dead (SIRD) model. Ghosh et al. [4] used a Susceptible-Infectious-Susceptible (SIS) pandemic model to forecast the spread of COVID-19 in India. Kumar et al. [5] used the ARIMA time-series analysis technique to analyse the Indian scenario, and their predictions were very close to the actual values reported later. Basu [8] independently studied the time-based viral spread in India; his model estimated that the total number of cases in India would cross 200,000 in early June, and that prediction proved quite accurate. Sudip Ghosh [20] used least square fitted modelling, while Hemanta Kumar Baruah [19] fitted an exponential curve for his predictions.

## 3. Methodology

### 3.1 Short-term forecasting [Exponential Curve fitting]

Short-term forecasting can be done with elementary analytical approaches instead of complex architectures such as disease models or neural networks.
Previous research [18] has shown that, for shorter durations, simple curve fitting models achieve better accuracy than regression and pandemic models. Observing the pattern of the number of cases in countries such as China, Spain and Italy, we can infer that the natural infection rate curve follows a non-linear path initially until it reaches its peak and begins to subside. Nations such as India, the United States of America and Brazil are still in the non-linear portion of the plot, and because their peak point is uncertain, forecasting for such countries can only be done for short durations.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2020/08/13/2020.08.10.20167247/F1.medium.gif)

Figure 1. Plots of cumulative cases in India

Observing the pattern for India, we can discern a definite exponential trajectory, which can be exploited by studying the time series data in inverted (logarithmic) form and then building a numerical model on the latter part of the data. Let *P(t)* be the total number of confirmed cases, *Q(t)* the total number of deaths, and *R(t)* the total number of recovered cases at time *t*. To verify our assumption that the curves are exponential, we took the natural logarithm of *P(t)*, *Q(t)* and *R(t)*. The resulting plots, shown in Fig. 2, are linear for each curve, which supports our assumption.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2020/08/13/2020.08.10.20167247/F2.medium.gif)

Figure 2. Plots of natural logarithm of cumulative cases in India

Fitting exponential functions directly is exceedingly fragile, because tiny variations in the exponent can result in large differences in the result.
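To make this fragility concrete, the minimal Python sketch below evaluates a curve of the assumed form $a e^{bt}$ with hypothetical values $a = 1000$ and $b = 0.05$ (not the fitted coefficients of this study) and perturbs the exponent by only 0.001. The relative gap between the two curves is $e^{0.001\,t} - 1$, independent of $a$, so it grows with the horizon: roughly 3% at 30 days and roughly 13% at 120 days. The helper names `exp_model` and `relative_gap` are illustrative only.

```python
import math


def exp_model(a: float, b: float, t: float) -> float:
    """Exponential growth curve a * e^(b * t)."""
    return a * math.exp(b * t)


def relative_gap(a: float, b: float, db: float, t: float) -> float:
    """Relative change in the curve's value at time t when b becomes b + db."""
    return exp_model(a, b + db, t) / exp_model(a, b, t) - 1.0


if __name__ == "__main__":
    # Hypothetical parameters for illustration only (not the paper's fitted values).
    a, b, db = 1_000.0, 0.050, 0.001
    for t in (30, 60, 120):
        print(f"t = {t:3d} days: relative gap = {relative_gap(a, b, db, t):.1%}")
```

Because the error from a slightly mis-estimated exponent compounds with every additional day, this sensitivity is one reason to restrict the exponential model to short horizons.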
Optimisation is done across many orders of magnitude, and errors near the origin are not weighted equally with errors higher up the curve. The simplest way to handle this is to convert the exponential data to a linear form using a natural logarithm transformation. Considering the equation of the curve to be:

![Formula][1]

where a and b are constants, and taking the natural logarithm of both sides:

![Formula][2]

This allows us to use a linear curve fitting method instead of the slower polynomial fitting method, which is prone to overflow errors when applied to large values. The fitted values are later transformed back from logarithmic space to the original scale for analysis. We used the *polyfit()* function of the *numpy* library in *Python* and obtained the coefficient values as:

![Formula][3]

rendering our equations as:

![Formula][4]

![Formula][5]

![Formula][6]

The covariance matrices obtained for each case are shown in Fig. 3.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2020/08/13/2020.08.10.20167247/F3.medium.gif)

Figure 3. Covariance matrices after curve fitting

### 3.2 Least square fitting method

The least square regression method is widely used to perform regression analysis. It is a statistical technique for determining the line of best fit between an independent and a dependent variable. The least-square method combines measurements in order to extract the parameter estimates that define the curve that best matches the results.
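The idea of a line of best fit can be illustrated with a minimal Python sketch, assuming a straight-line model $y = mx + c$ and synthetic data rather than the case counts used here. The hypothetical helper `least_squares_line` solves the normal equations in closed form, and the result is compared with `numpy.polyfit` at degree 1, which solves the same least-squares problem.

```python
import numpy as np


def least_squares_line(x, y):
    """Return (slope, intercept) minimising sum((y - (slope * x + intercept)) ** 2).

    Closed-form solution of the normal equations for a straight line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)


if __name__ == "__main__":
    # Synthetic noisy data around y = 3x + 5 (hypothetical).
    rng = np.random.default_rng(0)
    x = np.arange(20, dtype=float)
    y = 3.0 * x + 5.0 + rng.normal(scale=0.5, size=x.size)

    slope, intercept = least_squares_line(x, y)
    # np.polyfit with deg=1 solves the same problem; coefficients come highest degree first.
    ref_slope, ref_intercept = np.polyfit(x, y, 1)
    print(f"closed form: slope={slope:.4f}, intercept={intercept:.4f}")
    print(f"np.polyfit : slope={ref_slope:.4f}, intercept={ref_intercept:.4f}")
```

The closed form is specific to a straight line; for higher-degree polynomials `np.polyfit` minimises the same sum of squared residuals over more coefficients.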
Using the least square rule, given a set of N (noisy) measurements $f_i$, $i \in \{1, \dots, N\}$, to be fitted to the curve $f(a)$, where $a$ is the vector of parameter values, we seek to minimise the sum of squared differences between the measurements and the values of the curve, giving the parameter estimate $\hat{a}$ according to ***(7)***:

![Formula][7]

Fitting the data with a polynomial function gives a polynomial curve fit: the same least-squares technique is used to identify the polynomial of a given degree that has the minimum overall error:

![Formula][8]

where M is the order of the polynomial. We obtain the fit by minimising an error function, the sum of squares of the errors between the predictions $y(x_n, \mathbf{w})$ for each data point $x_n$ and the corresponding target value $t_n$:

![Formula][9]

Here, a polynomial of degree 6 is used to fit the dataset.

## 4. Results

Observing the non-linear pattern in India's COVID-19 infection rate, we employed curve fitting techniques to predict the number of confirmed, death and recovered cases for both short-term and long-term durations. Because of the sensitivity of the exponential curve, small modifications in the input can lead to abrupt changes in the output. We therefore used exponential curve fitting for short-term forecasting, over a duration of 4 weeks from August 8, 2020 to September 4, 2020. Polynomial regression modelling is used for long-term forecasting, over a duration of 7 weeks from August 8, 2020 to September 24, 2020.

[Table 1.](http://medrxiv.org/content/early/2020/08/13/2020.08.10.20167247/T1) Forecasted cases from August 5 to August 17 [Exponential Curve Fitting]

Results for the exponential model are presented in Tables 1 and 2 and for the least squares model in Tables 3 to 5, while the respective plots are shown in Figs. 4 and 5. As per our forecasts, the total number of cases in India would cross 30,00,000 (3 million) by August 15, 2020. By August 25, 2020 it would cross 40,00,000 (4 million), and around September 1 it should exceed 50,00,000 (5 million).
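The two fitting procedures can be sketched in Python with `numpy.polyfit`. This is a minimal illustration, not the authors' code: it assumes the exponential curve has the form $P(t) = a e^{bt}$, so that $\ln P(t) = \ln a + bt$, and it runs on synthetic data rather than the MoHFW series. Under that assumed form, a threshold $N$ is reached at $t^{*} = \ln(N/a)/b$, which is one way crossing dates such as the 30,00,000 mark could be read off a fitted curve. The helper names `fit_exponential`, `crossing_time` and `fit_polynomial` are hypothetical.

```python
import numpy as np


def fit_exponential(t, y):
    """Fit y ~ a * exp(b * t) via a degree-1 least-squares fit to ln(y).

    Returns (a, b, cov), where cov is the 2x2 covariance matrix of the
    fitted coefficients [b, ln a] as estimated by np.polyfit.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    (b, ln_a), cov = np.polyfit(t, np.log(y), 1, cov=True)
    return float(np.exp(ln_a)), float(b), cov


def crossing_time(a, b, threshold):
    """Time at which a * exp(b * t) reaches `threshold` (assumes a > 0, b > 0)."""
    return float(np.log(threshold / a) / b)


def fit_polynomial(t, y, degree=6):
    """Least-squares polynomial fit of the given degree, returned as np.poly1d."""
    coeffs = np.polyfit(np.asarray(t, dtype=float), np.asarray(y, dtype=float), degree)
    return np.poly1d(coeffs)


if __name__ == "__main__":
    # Synthetic cumulative counts with small multiplicative noise
    # (hypothetical; NOT the MoHFW data used in the paper).
    rng = np.random.default_rng(42)
    days = np.arange(90, dtype=float)
    cases = 5_000.0 * np.exp(0.045 * days) * rng.lognormal(sigma=0.02, size=days.size)

    # Short-term model: exponential fit, 4-week (28-day) horizon.
    a, b, cov = fit_exponential(days, cases)
    short_horizon = np.arange(90, 90 + 28, dtype=float)
    exp_forecast = a * np.exp(b * short_horizon)
    print(f"a = {a:.1f}, b = {b:.4f}")
    print("covariance of [b, ln a]:\n", cov)
    print(f"fitted curve reaches 3,000,000 on day {crossing_time(a, b, 3_000_000):.1f}")

    # Long-term model: degree-6 polynomial fit, 7-week (49-day) horizon.
    poly = fit_polynomial(days, cases, degree=6)
    long_horizon = np.arange(90, 90 + 49, dtype=float)
    poly_forecast = poly(long_horizon)
    print(f"last exponential forecast: {exp_forecast[-1]:,.0f}")
    print(f"last polynomial forecast: {poly_forecast[-1]:,.0f}")
```

Fitting a line to $\ln P(t)$ rather than fitting the exponential directly keeps the optimisation on a linear scale; the fitted values are exponentiated to return to case counts.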
![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2020/08/13/2020.08.10.20167247/F4.medium.gif)

Figure 4. Forecasting plots for exponential curve fitting

[Table 2.](http://medrxiv.org/content/early/2020/08/13/2020.08.10.20167247/T2) Forecasted cases from August 17 to September 3 [Exponential Curve Fitting]

![Figure 5.](https://www.medrxiv.org/content/medrxiv/early/2020/08/13/2020.08.10.20167247/F5.medium.gif)

Figure 5. Forecasting plots for least squared error fitting

[Table 3.](http://medrxiv.org/content/early/2020/08/13/2020.08.10.20167247/T3) Forecasting from August 8, 2020 to August 17, 2020 [Least Squared Error Fitting]

[Table 4.](http://medrxiv.org/content/early/2020/08/13/2020.08.10.20167247/T4) Forecasting from August 18, 2020 to August 31, 2020 [Least Squared Error Fitting]

[Table 5.](http://medrxiv.org/content/early/2020/08/13/2020.08.10.20167247/T5) Forecasting from September 1, 2020 to September 24, 2020 [Least Squared Error Fitting]

## 5. Conclusion

Building upon previous research [18], the current study implemented two numerical models to forecast the number of COVID-19 cases in India, namely exponential curve fitting and the least square fitted model. Both models forecast upwards of 30 lakh (3 million) cases and 40,000 deaths in the coming months. Unless the curve reaches a peak soon and begins to subside, handling this pandemic will be an enormous challenge. To prevent a shortage of required resources, the government and official organisations should factor the forecasted cases into their planning.
This study can be extended to other mathematical and regression techniques for forecasting COVID-19 cases in the future, which would provide a diverse set of prediction techniques to consider when developing new policies.

## Data Availability

N/A

* Received August 10, 2020.
* Revision received August 10, 2020.
* Accepted August 13, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## 6. References

1. L. A. Meyers, Contact network epidemiology: Bond percolation applied to infectious disease prediction and control, Bulletin (New Series) of the American Mathematical Society, 44(1) (2007) 63–86.
2. J. T. Wu, K. Leung, G. M. Leung, Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study, The Lancet, 395(10225) (2020) 689–697.
3. C. Anastassopoulou, L. Russo, A. Tsakris, Data-based analysis, modelling and forecasting of the COVID-19 outbreak, PLoS ONE, 15(3) (2020) 0230405.
4. P. Ghosh, R. Ghosh, B. Chakraborty, COVID-19 in India: State-wise analysis and prediction, medRxiv preprint, doi: [https://doi.org/10.1101/2020.04.24.20077792](https://doi.org/10.1101/2020.04.24.20077792).
5. P. Kumar, R. K. Singh, C. Nanda, et al., Forecasting COVID-19 impact in India using pandemic waves nonlinear growth models, medRxiv preprint, doi: [https://doi.org/10.1101/2020.03.30.2004703](https://doi.org/10.1101/2020.03.30.2004703).
6. N. Poonia, S. Azad, Short term forecasts of COVID-19 spread across Indian States until May 1, 2020, arXiv: 2004.13538v2 [q-bio.PE].
7. S. Azad, N. Poonia, Short term forecasts of COVID-19 spread across Indian States until 29 May, 2020, under the worst-case scenario, Preprints 202000491, [https://doi.org/10.20944/preprints202004.0491.v1](https://doi.org/10.20944/preprints202004.0491.v1).
8. S. Basu, Model based case studies in the UK, the USA and India, medRxiv preprint, doi: [https://doi.org/10.1101/2020.05.31.20118760](https://doi.org/10.1101/2020.05.31.20118760). Posted on June 3, 2020.
9. Worldometers.info, Total coronavirus cases in India, Publishing Date: June 10, 2020. Place of Publication: Dover, Delaware, U.S.A.
10. H. K. Baruah, The current COVID-19 spread pattern in India, medRxiv preprint, doi: [https://doi.org/10.1101/2020.06.03.20121210](https://doi.org/10.1101/2020.06.03.20121210). Posted on June 8, 2020.
11. R. Sujath, Jyotir Moy Chatterjee, Aboul Ella Hassanien, "A machine learning forecasting model for COVID-19 pandemic in India", Stochastic Environmental Research and Risk Assessment, 34 (2020) 959–972, doi: 10.1007/s00477-020-01827-8.
12. Gaurav Pandey, Poonam Chaudhary, Rajan Gupta, Saibal Pal, "SEIR and Regression Model based COVID-19 outbreak predictions in India", doi: 10.1101/2020.04.01.20049825.
13. Sunita Tiwari, Sushil Kumar, Kalpna Guleria, "Outbreak trends of CoronaVirus (COVID-19) in India: A Prediction", Disaster Med Public Health Prep. 2020 Apr 22: 1–6, doi: 10.1017/dmp.2020.115.
14. Anuradha Tomar, Neeraj Gupta, "Prediction for the spread of COVID-19 in India and effectiveness of preventive measures", Science of The Total Environment, Volume 728, 1 August 2020, 138762, doi: 10.1016/j.scitotenv.2020.138762.
15. Rohit Salgotra, Mostafa Gandomi, Amir H Gandomi, "Time Series Analysis and Forecast of the COVID-19 Pandemic in India using Genetic Programming", Chaos, Solitons & Fractals, Volume 138, September 2020, 109945, doi: 10.1016/j.chaos.2020.109945.
16. Ajit Kumar Pasayat, Satya Narayan Pati, Aashirbad Maharana, "Predicting the COVID-19 positive cases in India with concern to Lockdown by using Mathematical and Machine Learning based Models", doi: 10.1101/2020.05.16.20104133.
17. [https://www.mohfw.gov.in/](https://www.mohfw.gov.in/)
18. Abhinav Gola, Ravi Kumar Arya, Animesh, Ravi Dugh, "Review of Forecasting Models for Coronavirus (COVID-19) Pandemic in India during Country-wise Lockdown", medRxiv preprint, doi: [https://doi.org/10.1101/2020.08.03.20167254](https://doi.org/10.1101/2020.08.03.20167254).
19. Hemanta Kumar Baruah, "Nearly Perfect Forecasting of the Total COVID-19 Cases in India: A Numerical Approach", doi: 10.1101/2020.06.13.20130096.
20. Sudip Ghosh, "An Overview: Situation Assessment and Prediction of Corona Virus in India", Mukt Shabd Journal, Volume IX, Issue V, May 2020, ISSN: 2347-3150.

[1]: /embed/graphic-3.gif
[2]: /embed/graphic-4.gif
[3]: /embed/graphic-5.gif
[4]: /embed/graphic-6.gif
[5]: /embed/graphic-7.gif
[6]: /embed/graphic-8.gif
[7]: /embed/graphic-10.gif
[8]: /embed/graphic-11.gif
[9]: /embed/graphic-12.gif