Adaptive short term COVID-19 prediction for India

Shuvrangshu Jana; Debasish Ghose

doi:10.1101/2020.07.18.20156745

Abstract

In this paper, a data-driven adaptive model for infection of COVID-19 is formulated to predict the confirmed total cases and active cases of an area over 4 weeks. The parameter of the model is always updated based on daily observations. It is found that the short term prediction of up to 3-4 weeks can be possible with good accuracy. Detailed analysis of predicted value and the actual value of confirmed total cases and active cases for India from 1^st June to 3^rd July is provided. Prediction over 7, 14, 21, 28 days has the accuracy about 0.73% ± 1.97%, 1.92% ± 2.95%, 4.34% ± 3.91%, 6.40% ± 9.26% of the actual value of confirmed total cases. Similarly, the 7, 14, 21, 28 days prediction has the accuracy about 1.24% ± 6.57%, 3.04% ± 10.00%, 6.33% ± 16.12%, 10.20% ± 24.14% of the actual value of confirmed active cases.

1. Introduction

The accurate predictions of the spread of the novel coronavirus (COVID-19) are essential for planning and management of medical resources as well as lockdown strategy. A good prediction can avoid the shortage of critical medical resources, economical loss associated with unnecessary lockdowns. Mathematical modelling of infectious disease is generally performed by classifying the population into different compartments such as Susceptible, Infectious, Recovered, Deceased. Various classical compartmental model reported in literature such as SIR, SIS, SIRD, MSIR, SEIR, SEIS, MSEIR, MSEIRS. In order to model the spreading of corona virus, various modification of the SIR model is proposed in literature to include travel information [1], behavioural changes [2], various intervention measures such as quarantine, strict social distancing, tracing and isolation [3],[4], [5]. The model parameters are generally obtained for a specific area as there can be huge variation in these parameters based on the government policies, economical factors, adoption of social distancing measures. Prediction for the Spread of COVID-19 for a specific country is developed such for India [6, 7], China [8, 9, 10], South Korea-Italy [1], Iran [11],Korea [2], UK [12].

The estimation of spread of COVID-19 is made based on the assumption on rate of growth of total cumulative cases following a certain pattern such as gaussian [13], Verhulst equation [14], Lotka-Volterra dynamic model[15], Gompertz equation [16]. Estimation for Greece, Netherlands, Germany, Italy, Spain, France, United Kingdom and United States is developed using Gaussian fitting hypothesis based on the assumptions that evolutions of infected cases are of Gaussian in nature [13]. In [17], estimations for COVID-19 is performed from the fatality data for India under the assumption that the growth of infected cases is exponential. In [15], Lotka-Volterra dynamic model is used to represent the growth rate and the model coefficients are derived using the Extended Kalman filter techniques. In [18], Kalman filter-based techniques are used for short term prediction model.

Time-invariant SIR model is not effective to predict the spread of corona virus[19], considering the fact the large variation in the transmission dynamics. Also, it is difficult to fit a single model in different areas. So, Prediction models are developed based on the time series analysis of the reported data, where the model parameters c are estimated based on the previous observations of the confirmed cases. Data-driven estimation methods using long short-term memory (LSTM) and curve fitting is used for prediction of COVID-19 cases for India in 30 days ahead [20]. Deceased cases of COVID-19 prediction for one month is developed using a hybrid model using discrete wavelet decomposition and autoregressive integrated moving average (ARIMA) models [21]. In [22], Autoregressive Integrated moving average model is used to predict the daily number of COVID-9 cases up to 4 weeks. A mathematical model for prediction is developed using power series polynomials, where the coefficients of the polynomials are obtained using the least square approximations.

Generally, it is observed that it is difficult to predict in the long term due to the various uncertainties involved in the government decision-making process, social distancing measured followed by the people, availability of medical resources. There is a time lag between the time of infection and the time of reporting of infection. So in short term prediction has less uncertainty compared to the long term. A good general prediction model should use information about the local state as less as possible.

In this paper, a short term prediction model is proposed based on the concept of SIR model, where the parameters are updated continuously based on the observations using weighted least square techniques. It is observed that the spread of COVID-19, recovery and deceased rate can be approximated using time-varying polynomials. The proposed model is validated by comparing the true and actual value of June for India over the past predictions on time different time windows such as 7 days, 14 days, 21 days, and 28 days. In case of 2 days time window, It is found that error in prediction is within 6.40 % ± 9.26 % and 10.22 % ± 24.14 % for the prediction of total cases and active cases.

The rest of the paper is described as follows. The basic model is described in Section 2. The model parameters are estimated in Section 3. The proposed model is validated with COVID-19 statistics of India in Section 5. Expected total and active cases for a month are predicted in Section 7. The prediction results are discussed and concluded in Discussions and Conclusions Sections respectively.

2. Model formulation

The total number of people who have been infected with the virus can be classified as active, recovered or deceased cases. Few cases will be unreported due to non-development of symptoms in the infected person. In Fig. 1, different categories of different cases are shown. We will consider the population as susceptible (S), Infectious (I), and Removed (as shown in Fig. 2). Infectious cases are divided into two categories, active cases(A) and active unreported cases A_ur. Similarly, recovered and deceased cases are also classified as those who are reported and unreported cases. So, if N is the total population,then,

Figure 1:

Figure 2:

Different compartment

where, R, R_ur are the reported recovered and unreported recovered cases respectively. Similarly, D, D_ur are the reported and unreported deceased cases respectively. Also, As per standard SIR model, If β is the infection rate, then, Also, we can write, where γ(t) and µ(t) are the rate of recovered and death among the reported cases; and γ_ur and µ_ur are the rate of recovered and death cases among unreported cases.

3. Parameter estimation

The different parameter of the above model is estimated from the observed data and dynamically adjusted using the daily observations. The active cases, recovered and deceased from the reported cases are readily available; whereas, the similar statistics for the unreported cases need to be estimated from random antibody tests or serological tests. The confirmed active cases at each day are obtained by the following equation where, T is the total number of reported infected person. Then we have, From the previous reported daily statistics of daily new infected cases, recovered cases, and deceased cases; the rate of growth of the total number of cases, recovered cases and deceased cases are obtained. The rate of growth of total cases are obtained by taking derivative of the variable T. T can be represented by different functions. In the case of India, it is observed that T can be approximated using a fourth-order polynomial. Let consider T be, The coefficients (a_T, b_T, c_T, d_T, e_T) of the function T is obtained using the weighted least squares method by minimizing the error between the predicted value and the observed daily values. The weighted least squares method is used to provide more weightage on the recent values which in turn reflects the status of lockdown, social distancing measured followed by the citizens. The following cost function is used. where y_t is the observed cumulative total cases, T is expressed as T = Xζ, and ζ are the coefficients of the polynomial. The weights (w_t are selected from the previous observations as 1(last 7 days), 0.9 (last 7-14 days), 0.8 (last 14-21 days), 0.5 (last 21-28 days) and 0.2 for further previous days.

The weights need to be selected properly for an area considering the lockdown/unlock period. The optimum values of ζ are calculated from the equation 14 as follows. where W is the diagonal matrix consist of weights (w_t). It is observed that reported recovered and deceased cases can be approximated using the fourth-order and second-order polynomial respectively.

4. Prediction

Prediction values are updated based on the current observations. The data of total cases recovered cases, and deceased cases from 14^th March to 3^rd July for India is considered in the analysis. The data is collected from the “https://api.covid19india.org/” website, where the daily COVID-19 updates of various states and central government’s of India are recorded. The variation of total cases, cumulative recovered cases, cumulative deceased cases, and corresponding active cases on daily basis are plotted in Figure 3a, Figure 3b, Figure 3c, Figure 3d respectively. Based on the observations till 3^rd July, the prediction for the next 14 days is shown in Figure 4a and Figure 4b. The variation of coefficients of the prediction curve of the total cases is tabulated in Table 1.

View this table:

Table 1:

Total cases prediction curve coefficients

Figure 3:

Total case, Recovered case, Deceased case, Active Case

Figure 4:

Prediction of total cases and active cases

5. Model Validation

5.1. Total case prediction validation

The difference between the predicted value and the actual value is compared to check the efficiency of the prediction algorithm. The predicted value on the 7^th, 14^th, 21^th and 28^th days back is considered for comparison. The predictions from June 1^st to July 3^rd are considered for detailed analysis. The total case prediction over a 7 days time window and the corresponding actual value is shown in Table 2. In 26^th June, the total cases prediction for 3^rd July was 61130 and the actual value reported is 627065. So, the error in prediction is 2.54 % of the actual value of 627065. From Table 2, over the complete duration of June 1^st to July 3^rd, the error between the actual and predicted value is 0.73 % ± 1.97 % of the actual value.

View this table:

Table 2:

Validation of prediction for 7 days window

The predicted and actual values over a 14 days prediction window from June 1^st is tabulated in Table 3. As per Table 3 In case of prediction on 14 days duration, the error between the actual and the predicted value is found to be 1.92 % ± 2.95 % of the actual value. Similary, prediction over 21 days and 28 days time window are tabulated in 4 and 5. The error between the predicted value and the actual value over 21 days and 28 days is 4.34 % ± 3.91 % and 6.40 % ± 9.26 % of the actual value respectively.

View this table:

Table 3:

Validation of prediction for 14 days window

View this table:

Table 4:

Validation of prediction for 21 days window

View this table:

Table 5:

Validation of prediction for 28 days window

The error in prediction from mid-April to July 3^rd is plotted in Figure 5a. In Figure5a, magenta points show the predicted value on 7 days back and the black points are the predicted value on that day. At each day, the difference between the black and magenta point is the difference in predicted and the actual value. Similar plots for the prediction window of 14 days, 21 days, and 28 days are plotted in the Figure 5b, Figure 5c, and Figure 5d respectively.

Figure 5:

Validation of prediction of total cases

5.2. Active case prediction validation

The prediction results in case of active cases are compared with the actual value of the active cases. The prediction values and the error in prediction of active cases over 7 days, 14 days, 21 days, 28 days time duration are tabulated in Table 6, Table 7, Table 8, and Table 9 respectively. The difference between the predicted active case and the actual active case is found to be 1.24 % ± 6.57 %, 3.04 % ± 10.00 %, 6.33 % ± 16.12 %, 10.20 % ± 24.14 % for the prediction window of 7 days, 14 days, 21 days, 28 days respectively. The error in active case prediction from mid April to July 3^rd over different time window is plotted in Figure 6a, Figure 6b, Figure 6c, and Figure 6d.

View this table:

Table 6:

Validation of active case prediction for 7 days window

View this table:

Table 7:

Validation of active case prediction for 14 days window

View this table:

Table 8:

Validation of active case prediction for 21 days window

View this table:

Table 9:

Validation of active case prediction for 28 days window

Figure 6:

Validation of prediction of active cases

6. Discussions

The prediction is good in the smaller time window and error between the actual and predicted case grows with larger time window prediction. The prediction error from June 1^st to 3^rd July is tabulated in Table 10. The prediction is better in case of total cases compared to active cases. The reason behind the higher value of error bound could be the frequent change in discharge policy by the various state governments to adjust the growth of active patients in hospitals, which in turn affected the recovery rates. Also sometimes the deceased cases are adjusted in a single day after detailed accounting which caused a high jump in the deceased curve. Discharge policy by the governments and reporting of deceased cases have caused the large variation in the actual value of active cases over daily-basis. However, the reporting of the total case is smooth (Figure 3a); therefore, the bound of prediction error is also less.

View this table:

Table 10:

Summary of prediction error

7. Future prediction

Future predictions of total cases of India from the proposed algorithm over different time windows are provided in Table 11, Table 12, Table 13, and Table 14. We will compare this table in future to further validate our algorithm. Similar predictions for the active cases are tabulated in Table 15, Table 16, Table 17, and Table 18. The prediction for next 28 days based on the upto date observations are also included in Table 19 for reference.

View this table:

Table 11:

Future Prediction of total cases for 7 days window

View this table:

Table 12:

Future prediction of total cases for 14 days window

View this table:

Table 13:

Future prediction of total cases for 21 days window

View this table:

Table 14:

Future prediction of total cases for 28 days window

View this table:

Table 15:

Future prediction of active cases for 7 days window

View this table:

Table 16:

Future prediction of active cases for 14 days window

View this table:

Table 17:

Future prediction of active cases for 21 days window

View this table:

Table 18:

Future prediction of active cases for 28 days window

View this table:

Table 19:

Future prediction of active and total cases on 13.07.2020

8. Conclusions

In this paper, a data-driven prediction model is proposed to predict the total cases and active cases of an area. The model can apply to an area. It is observed that the error between the actual value and predicted value is 4.34 % ± 3.91 % and 6.33 % ± 16.12 % for total cases and active cases respectively over the prediction window of 21 days. We have also provided the prediction for the next 28 days from today for further validation of the proposed algorithm. The short term prediction can be used in the allocation of scar medical resources among different units, optimal lockdown planning.

Data Availability

All data is available in open source.

Funding

Work is not funded by any funding agency.

Conflicts of interest

Authors have no conflict of interest.

ACKNOWLEDGEMENTS

The authors would like to thank Mayur Shewale and GCDSL lab members for their suggestions in data processing and analysis of results.

References

[1].↵
C. Zhan, C. K. Tse, Z. Lai, T. Hao, J. Su, Prediction of covid-19 spreading profiles in south korea, italy and iran by data-driven coding, medRxiv.
[2].↵
S. Kim, Y. B. Seo, E. Jung, Prediction of covid-19 transmission dynam- ics using a mathematical model considering behavior changes in korea., Epidemiology and Health 42 (2020) 1–6.
OpenUrl
[3].↵
M. Mandal, S. Jana, S. K. Nandi, A. Khatua, S. Adak, T. Kar, A model based study on the dynamics of covid-19: Prediction and control., Chaos Solitons and Fractals 136 (2020) 109889–109889.
OpenUrl
[4].↵
W. C. Roda, M. B. Varughese, D. Han, M. Y. Li, Why is it difficult to ac- curately predict the covid-19 epidemic?, Infectious Disease Modelling 5 (5) (2020) 271–281.
OpenUrl
[5].↵
A. Rajesh, H. Pai, V. Roy, S. Samanta, S. Ghosh, Covid-19 prediction for india from the existing data and sir(d) model study, medRxiv.
[6].↵
S. Mondal, S. Ghosh, Possibilities of exponential or sigmoid growth of covid19 data in different states of india, medRxiv.
[7].↵
J. Singh, P. K. Ahluwalia, A. Kumar, Mathematical model based covid-19 prediction in india and its different states, medRxiv.
[8].↵
Y. Zhang, C. You, Z. Cai, J. Sun, W. Hu, X.-H. Zhou, Prediction of the covid-19 outbreak based on a realistic stochastic model, medRxiv.
[9].↵
A. J. Kucharski, T. W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk, R. M. Eggo, F. Sun, M. Jit, J. D. Munday, N. Davies, A. Gimma, K. van Zandvoort, H. Gibbs, J. Hellewell, C. I. Jarvis, S. Clifford, B. J. Quilty, N. I. Bosse, S. Abbott, P. Klepac, S. Flasche, Early dynamics of transmission and control of covid-19: a mathematical modelling study., Lancet Infectious Diseases 20 (5) (2020) 553–558.
OpenUrl CrossRef PubMed
[10].↵
L. Peng, W. Yang, D. Zhang, C. Zhuge, L. Hong, Epidemic analysis of covid-19 in china by dynamical modeling., arXiv preprint arxiv:2002.06563.
[11].↵
B. Zareie, A. Roshani, M. A. Mansournia, M. A. Rasouli, G. Moradi, A model for covid-19 prediction in iran based on china parameters., Archives of Iranian Medicine 23 (4) (2020) 244–248.
OpenUrl
[12].↵
J. D. Annan, J. C. Hargreaves, Model calibration, nowcasting, and opera- tional prediction of the covid-19 pandemic, medRxiv.
[13].↵
G. Barmparis, G. Tsironis, Estimating the infection horizon of covid-19 in eight countries with a data-driven approach, Chaos, Solitons and Fractals (2020) 109842.
[14].↵
S. Sanchez-Caballero, M. A. Selles, M. A. Peydro, E. Perez-Bernabeu, An efficient covid-19 prediction model validated with the cases of china, italy and spain: Total or partial lockdowns?, Journal of Clinical Medicine 9 (5) (2020) 1547–1547.
OpenUrl
[15].↵
A. B. Younes, Z. Hasan, Covid-19: Modeling, prediction, and control, Ap- plied Sciences 10 (11) (2020) 3666.
OpenUrl
[16].↵
M. C. Sabat, S. A. Muoz, E. lvarez Lacalle, D. L. Codina, P. J. C. Iglesias, C. P. Soler, Empiric model for short-time prediction of covid-19 spreading, PLOS Computational Biology (2020) 1–14.
[17].↵
S. Gupta, R. Shankar, Estimating the number of covid-19 infections in indian hot-spots using fatality data, arXiv preprint arxiv:2004.04025.
[18].↵
D. K. K. Singh, S. Kumar, P. Dixit, D. M. K. Bajpai, Kalman filter based short term prediction model for covid-19 spread, medRxiv.
[19].↵
J.-P. Quadrat, 1-c nonlinear covid-19 epidemic model and application to the epidemic prediction in france, medRxiv.
[20].↵
A. Tomar, N. Gupta, Prediction for the spread of covid-19 in india and effectiveness of preventive measures, Science of The Total Environment (2020) 138762.
[21].↵
S. Singh, K. S. Parmar, J. Kumar, S. J. S. Makkhan, Development of new hybrid model of discrete wavelet decomposition and autoregressive integrated moving average (arima) models in application to one month forecast the casualties cases of covid-19., Chaos Solitons and Fractals 135 (2020) 109866.
OpenUrl CrossRef
[22].↵
S. I. Alzahrani, I. A. Aljamaan, E. A. Al-Fakih, Forecasting the spread of the covid-19 pandemic in saudi arabia using arima prediction model under current public health interventions., Journal of Infection and Public Health.