COVID-19 Trend and Forecast in India: A Joinpoint Regression Analysis
=====================================================================

* Aalok Ranjan Chaurasia

## Abstract

This paper analyses the trend in daily reported confirmed cases of COVID-19 in India using joinpoint regression analysis. The analysis reveals that the nation-wide lockdown and its subsequent extensions have had little impact on the progress of the COVID-19 epidemic in the country, and there is no empirical evidence to suggest that relaxations under the third and the fourth phase of the lockdown have resulted in a spike in the reported confirmed cases of COVID-19. The analysis also suggests that if the current trend continues in the immediate future, the daily reported confirmed cases of COVID-19 in the country are likely to increase to 21 thousand by 15 June 2020, whereas the total number of confirmed cases of COVID-19 will increase to around 422 thousand.

Key Words

* COVID-19
* India
* Trend
* Forecast
* Joinpoint Regression

## Background

The total number of reported confirmed cases of COVID-19 in India crossed the 100 thousand mark on 19 May 2020 according to the database maintained by the World Health Organization. The first confirmed COVID-19 case in India was reported on 30 January 2020, but no confirmed COVID-19 case was reported during 4 February 2020 through 1 March 2020. By 15 March 2020, more than 100 confirmed cases of COVID-19 had been reported, which increased to 500 by 24 March 2020 when the nation-wide lockdown was announced in the country. Since then, the total number of reported confirmed cases crossed the 10000 mark by 14 April 2020 and the 50000 mark by 7 May 2020. An analysis of the trend in the daily reported confirmed cases of COVID-19 may provide an idea about how the COVID-19 epidemic has progressed in the country. The trend analysis also permits forecasting the likely trend in the reported confirmed cases of COVID-19 in the immediate future.

A trend analysis of daily reported confirmed cases of COVID-19 is also needed because it is widely claimed that the imposition of the nation-wide lockdown on 25 March 2020 has significantly decelerated the progress of the COVID-19 epidemic in the country. It has also been claimed that loosening the restrictions under the nation-wide lockdown during its third and fourth phases has primarily been responsible for the recently witnessed spike in the number of daily reported confirmed cases of COVID-19 in the country. It has even been argued that reimposing harsh restrictions as part of the nation-wide lockdown is the only way of stopping or decelerating the progress of the COVID-19 epidemic, despite the fact that the social and economic cost of the nation-wide lockdown has been found to be quite complex and exorbitant. It has repeatedly been stressed that, because of its serious social and economic implications, the nation-wide lockdown cannot be prolonged.

One way of empirically examining these and many other claims regarding the progress of the epidemic is to test whether the trend in the daily reported confirmed cases of COVID-19 has changed after the imposition of the nation-wide lockdown or after loosening the restrictions under the nation-wide lockdown. If the trend in the daily reported confirmed cases of COVID-19 has changed after the imposition of the nation-wide lockdown, then it can be concluded that the nation-wide lockdown has indeed had an impact on the progress of the epidemic. Similarly, if it is found that the trend in the daily reported confirmed cases of COVID-19 has changed after loosening the restrictions under the nation-wide lockdown, then it can be concluded that loosening of the restrictions has been responsible for the spike in reported confirmed COVID-19 cases in the country.
However, if there is no change in the trend, then there is little empirical evidence to suggest that either the nation-wide lockdown or loosening of restrictions under the nation-wide lockdown has had any telling impact on the progress of the COVID-19 epidemic.

A review of the daily reporting of the confirmed COVID-19 cases in India reveals that during the 28 days from 4 February 2020 through 1 March 2020, no confirmed case of COVID-19 was reported in the country (Table 1). Moreover, during the period 2 March 2020 through 31 March 2020, daily reporting of confirmed COVID-19 cases was highly inconsistent. For example, no confirmed case of COVID-19 was reported on 3 March, 20 March and 28 March 2020, whereas on 29 March 2020 alone, 255 confirmed cases of COVID-19 were reported. These inconsistencies in the reporting of daily confirmed cases of COVID-19 may bias any analysis of the trend in the daily reported confirmed cases of COVID-19. It is therefore necessary that these irregular fluctuations in the daily reporting of confirmed cases of COVID-19 are ironed out before any analysis of the trend is carried out.

[Table 1](http://medrxiv.org/content/early/2020/06/02/2020.05.26.20113399/T1): Reported confirmed cases of COVID-19 in India, 1 March 2020 - 23 May 2020.

One approach to minimise the impact of reporting inconsistencies on the analysis of the trend in daily reporting of confirmed COVID-19 cases is to use a moving average instead of the actual daily reported confirmed cases of COVID-19. The same approach has been followed in the present analysis. To minimise irregular fluctuations in the reporting of COVID-19 cases, a five-day moving average has been used for the trend analysis.
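
The smoothing step can be illustrated with a short sketch. It assumes a simple unweighted mean over a centred window; the function name `centred_moving_average` and the daily counts below are hypothetical and are not the figures in Table 1.

```python
from typing import List


def centred_moving_average(values: List[float], window: int = 5) -> List[float]:
    """Return the centred moving average, one value per full window.

    With window=5, output k is the mean of values[k:k + 5] and belongs to
    the day at input position k + 2 (the mid-point of the window).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(values) < window:
        return []
    return [
        sum(values[k:k + window]) / window
        for k in range(len(values) - window + 1)
    ]


# Hypothetical daily counts for nine consecutive days (illustration only).
daily = [0, 2, 0, 1, 22, 5, 0, 3, 10]
smoothed = centred_moving_average(daily)
print(smoothed)  # five values, centred on days 3 to 7 of the input
```

Because a centred five-day window needs two days on either side, a series of 84 daily counts yields 80 smoothed values, which is why the smoothed series is shorter than the raw one.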

In other words, the reported confirmed cases of COVID-19 for a day used in the present analysis are actually the average of the reported confirmed cases of COVID-19 on the two days before the day in question, on the day itself, and on the two days after it. For example, the reported confirmed cases of COVID-19 on 3 March 2020 used in the present analysis are actually the simple average of reported confirmed cases of COVID-19 on 1 March through 5 March 2020.

This paper analyses the trend in daily reported confirmed cases of COVID-19 in the country using joinpoint regression (Kim et al, 2000). Joinpoint regression is used to study a trend that varies over time. It first identifies the time point(s) at which the trend in the reported confirmed cases of COVID-19 has changed, or the joinpoint(s). Once the joinpoint(s) are identified, the average per cent change between two joinpoints is calculated to reflect how the trend in the reported confirmed cases of COVID-19 has varied over time. The goal of the joinpoint regression analysis is not to provide the statistical model that best fits the time series data. Rather, its purpose is to provide the model that best summarises the trend in the data (Marrot, 2010). The underlying assumption of joinpoint regression is that the trend in the data is not the same throughout the period under reference.

## Joinpoint Regression Model

The joinpoint regression model is essentially different from the conventional piecewise or segmented regression model in the sense that the number of joinpoint(s) and their location(s) are estimated within the model and are not set arbitrarily, as is the case with piecewise or segmented regression analysis. The minimum and the maximum number of joinpoint(s) are, however, set in advance, but the final number of joinpoint(s), or the time point(s) when the trend changes, is determined statistically.
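
To illustrate how a joinpoint location can be estimated from the data rather than fixed in advance, the following is a minimal sketch under strong simplifying assumptions. It fixes the number of joinpoints at one, fits a log-linear model by ordinary least squares for each candidate location, and keeps the location with the smallest sum of squared errors. It omits the permutation test and the BIC-based choice of the number of joinpoints, and it is not the procedure of any particular software package. All function names and the synthetic series are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_with_joinpoint(
    t: Sequence[float], y: Sequence[float], k: float
) -> Tuple[List[float], float]:
    """Least-squares fit of ln(y) = a + b*t + d*max(t - k, 0); returns (coefficients, SSE)."""
    rows = [[1.0, ti, max(ti - k, 0.0)] for ti in t]
    log_y = [math.log(v) for v in y]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * ly for r, ly in zip(rows, log_y)) for i in range(3)]
    coef = solve_linear(xtx, xty)
    sse = sum(
        (ly - sum(c * x for c, x in zip(coef, r))) ** 2
        for r, ly in zip(rows, log_y)
    )
    return coef, sse


def best_single_joinpoint(
    t: Sequence[float], y: Sequence[float], min_obs: int = 3
) -> Tuple[float, List[float]]:
    """Grid search over observed time points, keeping min_obs points at each end."""
    candidates = list(t)[min_obs - 1:len(t) - min_obs]
    fits = [(fit_with_joinpoint(t, y, k), k) for k in candidates]
    (coef, _sse), k_best = min(fits, key=lambda item: item[0][1])
    return k_best, coef


def daily_percent_change(slope: float) -> float:
    """Convert a slope on the log scale into a per cent change per day."""
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic, noise-free series (illustration only): log-slope 0.25 up to day 10, 0.05 afterwards.
t = [float(d) for d in range(1, 21)]
y = [
    math.exp(1.0 + 0.25 * d) if d <= 10 else math.exp(3.5 + 0.05 * (d - 10))
    for d in t
]

k_best, (a, b, d) = best_single_joinpoint(t, y)
apc_before = daily_percent_change(b)      # segment before the joinpoint
apc_after = daily_percent_change(b + d)   # segment after the joinpoint
print(k_best, round(apc_before, 2), round(apc_after, 2))
```

The slope after the joinpoint is the sum of the base slope and the hinge coefficient, which is why the per cent change for the second segment is computed from `b + d`.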
The model first identifies the time point(s) when there is a change in the trend and calculates the average percentage change (APC), which reflects the rate of change between two joinpoints. When the number of joinpoints is zero, the model reduces to a simple linear regression model.

Let $y_i$ denote the daily reported confirmed cases of COVID-19 on day $t_i$ such that $t_1 < t_2 < \dots < t_n$. Then, following Kim et al (2000), the joinpoint regression model with $j$ joinpoints can be written as

$$\ln y_i = \alpha + \beta_1 t_i + \delta_1 (t_i - k_1)^+ + \delta_2 (t_i - k_2)^+ + \dots + \delta_j (t_i - k_j)^+ + \varepsilon_i$$

where

$$(t_i - k_m)^+ = \begin{cases} t_i - k_m & \text{if } t_i > k_m \\ 0 & \text{otherwise} \end{cases}$$

and $k_1 < k_2 < \dots < k_j$ are the joinpoints. The minimum and the maximum number of joinpoints must be specified in advance. In the present analysis, the minimum number of joinpoints is specified as 0 while the maximum number of joinpoints is specified as 5. The programme starts with the minimum number of joinpoints (0, in which case the fitted trend is a straight line and the model is a simple linear regression model) and tests whether more joinpoints are statistically significant and must be added to the model (up to the pre-specified maximum number of joinpoints). The test of significance is based on a Monte Carlo permutation method (Kim et al, 2000).

The Bayesian Information Criterion (BIC) was used to identify the number of joinpoints in the model. There are other methods for the purpose as well, including the permutation test and the data-driven BIC methods. The relative merits and demerits of the different methods are discussed elsewhere (National Cancer Institute, 2013). The permutation method is regarded as the best method but it is computationally very intensive. The BIC method, on the other hand, is less computationally intensive. It selects the model that minimises the objective function, which is the sum of a model-fit error term and a penalty term.

## Trend in Daily Reported Confirmed Cases of COVID-19

Results of the joinpoint regression analysis of the five-day moving average of the daily reported confirmed cases of COVID-19 in India for the period 1 March 2020 through 23 May 2020 are summarised in Table 2 and Figure 1. The five-day moving average is centred at the mid-point of the five-day interval.
For example, the five-day moving average of the period 1 March through 5 March 2020 is centred on 3 March 2020. In other words, the joinpoint regression analysis is carried out for the period 3 March 2020 through 21 May 2020, although it uses the data on daily reported confirmed cases of COVID-19 from 1 March 2020 through 23 May 2020. The period prior to 1 March 2020 has not been included in the analysis because the daily reported confirmed cases of COVID-19 during the period 30 January 2020 through 1 March 2020 were mostly zero.

[Table 2](http://medrxiv.org/content/early/2020/06/02/2020.05.26.20113399/T2): Results of the joinpoint regression analysis

The application of the joinpoint regression analysis divides the duration 1 March 2020 through 23 May 2020, a period of 84 days, into five time segments, and the trend in the daily reported confirmed cases of COVID-19 is found to be different in different time segments. During the first five days of the period under reference - 3 March 2020 (day 1) through 7 March 2020 (day 5) - the trend in the daily reported confirmed cases of COVID-19 in the country has been found to be negative, which means that daily reported confirmed cases of COVID-19 in the country actually decreased, instead of increasing, during this period, on average at a rate of around 8 per cent per day. This decrease in the reported confirmed cases of COVID-19 may be attributed to reporting inconsistencies. The daily per cent change during this period has, however, been found to be statistically significant. On the other hand, during the next 10 days - from 7 March 2020 (day 5) through 16 March 2020 (day 14) - the daily reported confirmed cases of COVID-19 increased, on average, at a rate of almost 15 per cent per day.
The increase in the daily reported confirmed cases of COVID-19 accelerated further during the next eight days - from 16 March 2020 (day 14) through 23 March 2020 (day 21) - when the daily reported confirmed cases of COVID-19 in the country increased at a rate of more than 28 per cent per day. However, the increase in the daily reported confirmed cases of COVID-19 decelerated during the period 23 March 2020 (day 21) through 26 March 2020 (day 24) at a rate of almost 9 per cent per day, although the daily per cent change was not found to be statistically significant. The daily reported confirmed cases of COVID-19 increased again at a rate of 27 per cent per day, on average, during the next 10 days - from 26 March 2020 (day 24) through 4 April 2020 (day 33). After 4 April 2020, however, there has been no change in the trend in the daily reported number of confirmed cases of COVID-19 till 21 May 2020. During 4 April 2020 through 21 May 2020, the daily reported confirmed cases of COVID-19 increased, on average, at a rate of almost 5.2 per cent per day.

![Figure 1](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/02/2020.05.26.20113399/F1.medium.gif)

[Figure 1](http://medrxiv.org/content/early/2020/06/02/2020.05.26.20113399/F1): Joinpoint regression of daily reported confirmed cases of COVID-19 in India, 1 March through 23 May 2020

The joinpoint regression analysis suggests that the trend in the daily reported confirmed cases of COVID-19 changed statistically significantly at the 5th day (7 March 2020); 14th day (16 March 2020); 24th day (26 March 2020); and 33rd day (4 April 2020) of the 80-day period beginning 3 March 2020. The nation-wide lockdown in the country was imposed on 25 March 2020, initially for a period of 21 days, and was then extended to 3 May 2020 on 15 April 2020.
On 4 May 2020, the lockdown was again extended up to 17 May 2020 but with a relaxed set of restrictions and, on 18 May 2020, it was further extended to 31 May 2020 with an even more relaxed set of restrictions. The analysis suggests that the day of the fifth change in the trend in daily reported confirmed cases of COVID-19 only matched with the date of third extension of the nation-wide lockdown. The day of the change in the trend in daily reported confirmed cases of COVID-19 has not been found to be linked with either the imposition of the nation-wide lockdown on 25 March 2020 or its extension on 15 April 2020. There has also been no change in the trend in the daily reported confirmed cases of COVID-19 after the fourth extension of the nation-wide lockdown on 18 May 2020, when restrictions under the nation-wide lockdown were significantly loosened. After 4 April 2020, the daily reported confirmed cases of COVID-19 in the country have been found to have increased, on average, at almost 5.2 per cent per day. In other words, the joinpoint regression analysis of the trend in daily reported confirmed cases of COVID-19 does not support the claim that the imposition of the nation-wide lockdown on 25 March 2020 resulted in a deceleration in the increase in daily reported confirmed cases of COVID-19. At the same time, the trend analysis also does not support the claim that the relaxations in the restrictions under the nation-wide lockdown have resulted in a spike in daily reported confirmed cases of COVID-19.

## Forecasting Number of COVID-19 Cases

The average daily per cent change in the reported confirmed cases of COVID-19 during the period 4 April 2020 through 21 May 2020 permits forecasting the daily reported confirmed cases of COVID-19 under the assumption that there is no change in the trend.
This exercise suggests that by 15 June 2020, the daily reported confirmed cases of COVID-19 will increase to 21243 with a 95 per cent confidence interval of 18246 - 24725 (Table 3 and Figure 2). This increase in the reported confirmed cases of COVID-19 may change only when there is a significant change in the trend in the reported confirmed cases of COVID-19. A significant change in the trend is possible only when a new set of interventions is introduced to combat the COVID-19 epidemic in the country. It is already being emphasised that the nation-wide lockdown imposed on 25 March 2020 is becoming increasingly irrelevant in checking the progress of the epidemic because of a host of factors, the most important of which is that the nation-wide lockdown could not prevent large-scale movement, especially of migrant workers from urban areas to the rural hinterland. It is, therefore, being stressed that population-wide testing for COVID-19, followed by active contact tracing and isolation of the positive cases and their contacts, is necessary to stop the increase in, and even decrease, the reported confirmed cases of COVID-19 in the country. The need for such a strategy stems from the fact that almost 40 per cent of the individuals who tested positive for COVID-19 are found to be asymptomatic. Chaurasia (2020a) has suggested a cluster-based approach to population-wide testing for COVID-19 which significantly reduces the number of tests to be done.

![Figure 2](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/02/2020.05.26.20113399/F2.medium.gif)

[Figure 2](http://medrxiv.org/content/early/2020/06/02/2020.05.26.20113399/F2): Forecast of COVID-19 cases in India up to 15 June 2020

[Table 3](http://medrxiv.org/content/early/2020/06/02/2020.05.26.20113399/T3): Forecast of daily reported confirmed cases of COVID-19 till 15 June 2020.
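
The mechanics of a constant-trend forecast can be sketched as follows. The sketch assumes that the last fitted value is simply compounded forward at the daily per cent change of the final segment (almost 5.2 per cent per day, as reported above) from 21 May 2020, the last day of the smoothed series. The base value of 6000 cases is a hypothetical placeholder, not a figure taken from the paper, and the function name `forecast_daily_cases` is likewise an assumption. Confidence intervals are not reproduced here.

```python
from datetime import date, timedelta
from typing import List, Tuple


def forecast_daily_cases(
    base_value: float,
    base_day: date,
    daily_pct_change: float,
    target_day: date,
) -> List[Tuple[date, float]]:
    """Project daily cases forward assuming a constant per cent change per day."""
    growth = 1.0 + daily_pct_change / 100.0
    horizon = (target_day - base_day).days
    return [
        (base_day + timedelta(days=h), base_value * growth ** h)
        for h in range(1, horizon + 1)
    ]


# base_value is hypothetical; the dates and the 5.2 per cent rate come from the text.
path = forecast_daily_cases(
    base_value=6000.0,
    base_day=date(2020, 5, 21),
    daily_pct_change=5.2,
    target_day=date(2020, 6, 15),
)

last_day, last_value = path[-1]
additional_cases = sum(value for _, value in path)  # cases added over the horizon
print(last_day, round(last_value), round(additional_cases))
```

Summing the projected daily values and adding them to the latest cumulative count gives a projected total, which is how a forecast of daily cases translates into a forecast of total confirmed cases.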

The forecast of the daily reported confirmed cases of COVID-19 on the basis of the joinpoint regression analysis presented here also suggests that the total number of confirmed COVID-19 cases in the country is likely to increase to almost 422 thousand by 15 June 2020, with a 95 per cent confidence interval of around 376 thousand to around 473 thousand. According to the latest information available from the database maintained by the World Health Organization, the total number of confirmed COVID-19 cases in the country had crossed the 138 thousand mark by 25 May 2020. This implies that during the next 20 days, there will most probably be around 284 thousand additional COVID-19 cases in the country. This scenario can be changed by introducing appropriate interventions to halt or even reverse the progress of the epidemic. The good sign, however, is that the recovery rate of the disease in the country is quite encouraging while the risk of death from the disease is quite low by international standards.

## Conclusions

The present analysis, based on the daily reported confirmed cases of COVID-19, suggests that the nation-wide lockdown and its subsequent extensions and relaxations in restrictions have had little impact on the progress of the COVID-19 epidemic in India. There is also little empirical evidence to suggest that relaxation in the restrictions under the third and the fourth phase of the nation-wide lockdown has resulted in a spike in the reported confirmed cases of COVID-19 in the country. The analysis also suggests that if the trend in the reported confirmed cases of COVID-19 during the period 4 April through 21 May 2020 continues in the immediate future, then the daily reported confirmed cases of COVID-19 are likely to increase to around 21 thousand by 15 June 2020, whereas the total number of confirmed cases of COVID-19 will increase to around 422 thousand.
This trend can be changed or reversed by introducing appropriate interventions that may help in containing the spread of the disease. In this context, population-wide testing for COVID-19 along with isolation of positive cases and their contacts appears to be the need of the time.

## Data Availability

Data table is included in the paper itself.

* Received May 26, 2020.
* Revision received May 26, 2020.
* Accepted June 2, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory

The copyright holder has placed this preprint in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors.

## References

1. Akinyede O, Soyemi K (2016) Joinpoint regression analysis of pertussis crude incidence rates, Illinois, 1990-2014. American Journal of Infection Control 44(12): 1732–3.
2. Chatenoud L, Garavello W, Pagan E, Bertuccio P, Gallus S, La Vecchia C, Negri E, Bosetti C (2015) Laryngeal cancer mortality trends in European countries. International Journal of Cancer 842: 833–42.
3. Chaurasia AR (2020) Long-term trend in infant mortality in India: a joinpoint regression analysis for 1981–2018. Bhopal, MLC Foundation.
4. Chaurasia AR (2020a) Cluster approach to population-wide testing for COVID-19. Bhopal, MLC Foundation.
5. Doucet M, Rochette, Hamel D (2016) Prevalence and mortality trends in Chronic Obstructive Pulmonary Disease over 2001 to 2011: a public health point of view of the burden. Canadian Respiratory Journal 2016: 1–10.
6. John U, Hanke M (2015) Liver cirrhosis mortality, alcohol consumption and tobacco consumption over a 62 year period in a high alcohol consumption country: a trend analysis. BMC Research Notes 8(1): 822.
7. Kim HJ, Fay MP, Feuer EJ, Midthune DN (2000) Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine 19: 335–351. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/(SICI)1097-0258(20000215)19:3<335::AID-SIM336>3.0.CO;2-Z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10649300&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F06%2F02%2F2020.05.26.20113399.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000084997000005&link_type=ISI)
8. Kim HJ, Fay MP, Yu B, Barrett MJ, Feuer EJ (2004) Comparability of segmented line regression models. Biometrics 60(4): 1005–1014. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.0006-341X.2004.00256.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15606421&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F06%2F02%2F2020.05.26.20113399.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000225939300018&link_type=ISI)
9. Marrot LD (2010) Colorectal cancer network (CRCNet) user documentation for surveillance analytic software: Joinpoint. Cancer Care Ontario: 1–28.
10. Missikpode C, Peek-Asa C, Young T, Swanton A, Leinenkugel K, Torner J (2015) Trends in non-fatal agricultural injuries requiring trauma care. Injury Epidemiology 2(1): 30.
11. Mogos MF, Salemi JL, Spooner KK, McFarlin BL, Salihu HM (2016) Differences in mortality between pregnant and nonpregnant women after cardiopulmonary resuscitation. Obstetrics and Gynecology 128(4): 880–8.
12. National Cancer Institute (2013) Joinpoint Regression Program. Bethesda, MD: National Institutes of Health, United States Department of Health and Human Services.
13. Rea F, Pagan E, Compagnoni MM, Cantarutti A, Pigni P, Bagnardi V, Corrao G (2017) Joinpoint regression analysis with time-on-study as time-scale. Application to three Italian population-based cohort studies. Epidemiology, Biostatistics and Public Health 14(3): e12616.
14. Tyczynski JE, Berkel HJ (2005) Mortality from lung cancer and tobacco smoking in Ohio (U.S.): will increasing smoking prevalence reverse current decreases in mortality? Cancer Epidemiology, Biomarkers & Prevention 14(5): 1182–7.

[1]: /embed/graphic-5.gif
[2]: /embed/graphic-6.gif