Abstract
COVID-19 was declared a global pandemic by the World Health Organization (WHO) on March 11, 2020. In this paper, we investigate various aspects of the clinical recovery of the first 1000 COVID-19 patients in Singapore, spanning January 23 to April 01, 2020. Of these patients, 245 had clinically recovered by the end of this period. The first part of the paper studies the descriptive statistics and the influence of demographic parameters, namely age and gender, on the clinical recovery-period of COVID-19 patients. The second part of the paper identifies the distribution of the length of the recovery-period for the patients. We perform a piecewise analysis over three different periods, identified based on trends in both positive confirmation and clinical recovery of COVID-19. As expected, the overall recovery rate dropped drastically during the exponential increase in incidence. However, our in-depth analysis shows that there is a shift in the age-group of incidence towards the younger population, and the recovery-period of the younger population is considerably shorter. Here, we have estimated the recovery rate to be 0.125. Overall, the prognosis of COVID-19 indicates an improvement in recovery rate owing to the government-mandated practices of restricted mobility of the older population and aggressive contact tracing.
1 Introduction
The viral contagion, named COVID-19, was declared a global pandemic by the World Health Organization on March 11, 2020 2. The pandemic, characterized by atypical pneumonia, is caused by a virus from the coronavirus family, namely SARS-CoV2 (Severe Acute Respiratory Syndrome Coronavirus-2), which is a positive-sense single-stranded RNA virus. As of April 2, 2020, there were 827,419 positively confirmed cases and 40,777 deaths, spread across 206 countries 3. An unofficial count puts the total number of recovered patients at 193,989 out of 935,197 positively confirmed patients, which implies that the ratio of recovered to infected patients, rri, is ∼0.2 as of April 1, 2020 4.
In this paper, we analyze the statistics of hospital recovery of patients tested positive for COVID-19 infection in Singapore [1]. The case study of Singapore has been carefully chosen owing to the reliable, accessible, and available data from the official press releases of the Ministry of Health (MoH), Government of Singapore. The healthcare system of Singapore has been unique in its handling of the widespread contagion in terms of imposing strict lockdown, quarantine, and isolation, and aggressive, large-scale contact tracing and testing. 1000 individuals have been confirmed positive for COVID-19 during January 23-April 01, 2020, of which 245 have been discharged after clinical recovery. This gives an overall rri higher than that of the world, as rri=0.245. The positive confirmation has been made using the real-time reverse transcription polymerase chain reaction (RT-PCR) tests on respiratory samples (sputum or nasal/throat/nasopharyngeal swabs), based on experiential learning from the outbreak of SARS in 2003 5. Similarly, the protocol for clinical recovery or hospital discharge has been based on the results on the RT-PCR tests of two consecutive samples being negative over two days [2].
Since the SARS outbreak in 2003, Singapore has systematically strengthened the system of managing the spread of infectious diseases [2]. The measures include opening dedicated facilities (the National Center for Infectious Diseases (NCID), National Public Health Laboratory, and more biosafety level-3 laboratories), increasing capacity in the public healthcare system (e.g., negative pressure isolation beds, personal protective equipment, trained health professionals), and deploying formal (digital) platforms for inter-governmental agency cooperation. For containing the spread of contagions, systems have been in place for upscaled, quick-responsive, and aggressive contact tracing at entry points of the country (airports) and through local healthcare providers. There has been a holistic improvement, supported by increased economic investment, in building expertise in infectious disease management. This organized system has thus facilitated a controlled management of the pandemic COVID-19 in Singapore with patient-wise reporting to the public. Hence, a case study in Singapore pertaining to the demographic analysis of clinically recovered patients enables a systematic understanding of the recovery rate (γ) of the pandemic.
One of the significant benefits of aggressive contact tracing has been hospital isolation within 5 days from the onset of symptoms [3]. However, 12.6% of transmission has been found to be presymptomatic [4], as analysed for seven clusters found in Singapore. While there is an innate uncertainty in the interval from the onset of symptoms to a positive confirmation in symptomatic cases, the hospital stay from positive confirmation to discharge upon negative results exhibits more cohesive statistics, as can be observed in similar studies of hospital stay [5]. Hence, in this work, we perform statistical analysis of the recovery-period, i.e., length of hospital stay, of COVID-19 patients. A further motivation for studying the clinical recovery of patients is to assess the overall load on the healthcare system in terms of patient occupancy, as the length of hospital stay determines the load. The clinical recovery studied in this paper corresponds to the hospitalization period for each patient. The demographic analysis of the recovered patients gives insight into shifts in gender and age-groups in the recovery of patients. This analysis further complements the observation that the transmission rate is higher in older males with comorbidities [6]. Fitting the data of hospitalization period to a statistical distribution is essential for estimating γ.
Epidemiological models are generally used to simulate the progression of a disease. These models use the proportions of the population that are “susceptible,” “infected,” and “removed.” “Removed” implies both “recovered” and “deceased.” The first two deaths in Singapore owing to the COVID-19 contagion occurred on March 21, 2020. The number increased to 3 deaths by April 01. Owing to the relatively low number of deaths in Singapore due to the COVID-19 contagion during January 23-April 01, 2020, we have assumed the death/mortality/fatality rate to be 0 in our work. Thus, here, “recovery” implies the state of “clinically recovered and discharged from hospital.”
Since the contagion has time-varying reproduction number (Rt) with characteristic trends in specific time-periods [7], we split the time-period of January 23-April 01, to perform a piecewise analysis of the timeline [7, 8]. We perform two analyses on the periodized timeline. Firstly, we study the age-gender distribution of the patients who have been confirmed positive of COVID-19 and those who have clinically recovered. Secondly, we extract the distribution of clinical recovery-periods and fit regression models. In both analyses, we discuss the observable period-wise shifts in trends and their influencing factors, thus estimating γ. The novel contribution of our work is an in-depth analysis of the clinical recovery of COVID-19 patients to estimate the recovery rate γ, which is a key parameter in the SIR (susceptible-infected-recovered) model for the disease [9].
2 Methods
The data for our work has been collated from the public press releases made by the MoH, the Government of Singapore 6. This dataset includes the case IDs, age, gender, positive confirmation date, discharge date, and date of onset of symptoms 7. The data has been cross-verified with the dashboard 8 for case details. We have analyzed this patient-wise data pertaining to age, gender, and timeline of the disease progression.
We define the recovery-period Δtr as the time elapsed between positive COVID-19 confirmation using an RT-PCR test and the discharge date from hospital after two consecutive negative RT-PCR results. Owing to the strict protocols followed in the Singapore healthcare system, the recovery-period can be considered equivalent to the virus shedding period. Earlier studies estimate Δtr to be 15 days [10] or 20 days [11]. We consider Δtr as an observed count variable.
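The definition of Δtr above can be illustrated with a minimal Python sketch. Python is used here only for illustration; the function name and the example dates are hypothetical and are not taken from the MoH records.

```python
from datetime import date

def recovery_period(confirmed: date, discharged: date) -> int:
    """Days elapsed between positive confirmation and hospital discharge."""
    days = (discharged - confirmed).days
    if days < 0:
        raise ValueError("discharge date precedes confirmation date")
    return days

# Hypothetical patient: confirmed on February 10, discharged on February 22.
example_delta_tr = recovery_period(date(2020, 2, 10), date(2020, 2, 22))
```

Treating the result as an integer number of days matches the use of Δtr as a count variable; a value of 0 (same-day discharge) is allowed.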
Age-gender distribution
There has been early evidence of the influence of both age and gender in susceptibility of COVID-19 infection [6, 12]. Hence, we look at the influence of age and gender in clinical recovery of patients, with respect to the recovery-period.
There are 1000 patients (576M, 424F) in the population confirmed COVID-19 positive, of which 245 patients (137M, 108F) have clinically recovered (§Figure 1(i)). The age distribution of the positively confirmed patients is (13 in [0-9], 29 in [10-19], 273 in [20-29], 202 in [30-39], 155 in [40-49], 164 in [50-59], 110 in [60-69], 38 in [70-79], 15 in [80-89], 1 in [100-]), with age groups given in years in square brackets. Similarly, the age distribution of the clinically recovered patients is (4 in [0-9], 4 in [10-19], 36 in [20-29], 58 in [30-39], 44 in [40-49], 55 in [50-59], 34 in [60-69], 8 in [70-79], 2 in [80-89]). The preliminary counts indicate that the restrictions used for slowing the contagion down in the susceptible group of the population, namely the older males [12], have led to shifts in the distribution of the contagion with respect to age and gender. There is a conspicuous shift to the younger population, namely the age groups [20-29] and [30-39], and a more even spread across both genders. Here, we investigate the change in recovery rate γ owing to these shifts.
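As a quick consistency check on the counts above, the following Python sketch converts them to percentage shares. The counts are copied from the text; the dictionary and function names are illustrative choices.

```python
# Counts per age group (years) as listed in the text.
confirmed = {"0-9": 13, "10-19": 29, "20-29": 273, "30-39": 202, "40-49": 155,
             "50-59": 164, "60-69": 110, "70-79": 38, "80-89": 15, "100-": 1}
recovered = {"0-9": 4, "10-19": 4, "20-29": 36, "30-39": 58, "40-49": 44,
             "50-59": 55, "60-69": 34, "70-79": 8, "80-89": 2}

def shares(counts):
    """Return each group's percentage share of the total count."""
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}

confirmed_share = shares(confirmed)
recovered_share = shares(recovered)
```

The group totals add up to 1000 confirmed and 245 recovered patients, and the [20-29] group accounts for 27.3% of the confirmed cases.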
Periodization
As the pandemic progresses, the evolution needs to be studied piecewise in different time periods [7, 8]. From the timeline of disease progression in Singapore, we have identified the following significant dates:
On January 23, the first patient was confirmed COVID-19 positive.
On February 4, the first clinically recovered patient was discharged from the hospital.
On March 17, the total/cumulative number of positively confirmed patients started increasing exponentially with 44 new cases on a single day.
On March 21, the first two deaths owing to complications from COVID-19 were recorded.
On April 1, the number of confirmed patients reached 1000.
These trends can be observed in the daily profile of patient counts (§Figure 1 (ii),(iii)). Assuming a zero death rate owing to the low number of deaths in Singapore from COVID-19, we consider the following three periods:
Period P1 during January 23-February 3, which is the period with no clinically recovered cases.
Period P2 during February 4-March 16, which is the period of slow growth in the total/cumulative number of positively COVID-19 confirmed cases, Ni, with an increase in the total/cumulative number of clinically recovered cases, Nr, and zero deaths.
Period P3 during March 17-April 1, which is the period of exponential growth in Ni, reaching Ni = 1000, slow growth in Nr, and having the first 3 deaths.
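The three periods can be expressed as a simple date lookup. This is a minimal Python sketch; the period boundaries come from the definitions above, while the names `PERIODS` and `period_of` are hypothetical.

```python
from datetime import date

# Period boundaries as defined in the text (all dates in 2020, inclusive).
PERIODS = {
    "P1": (date(2020, 1, 23), date(2020, 2, 3)),
    "P2": (date(2020, 2, 4), date(2020, 3, 16)),
    "P3": (date(2020, 3, 17), date(2020, 4, 1)),
}

def period_of(day: date) -> str:
    """Return the label of the period containing `day`."""
    for label, (start, end) in PERIODS.items():
        if start <= day <= end:
            return label
    raise ValueError(f"{day} lies outside the study window")
```

Applying `period_of` to a confirmation date or to a discharge date gives the two ways of assigning a patient to a period.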
Recovery rate γ
The governing differential equations in the simplest SIR model, also known as the Kermack-McKendrick model [9], are given as follows:
$$\frac{dN_s}{dt} = -\frac{\beta N_s N_i}{N_p}, \qquad \frac{dN_i}{dt} = \frac{\beta N_s N_i}{N_p} - \gamma N_i, \qquad \frac{dN_r}{dt} = \gamma N_i$$
Here, β is the rate at which an infected individual infects others, γ the transition rate in the SIR model 9, Np the size of the population, Ni the number of infected persons (i.e., with positive COVID-19 confirmation), Nr the number of recovered persons (i.e., clinically recovered), and Ns the number of susceptible people. The basic reproduction number or reproduction rate, ℛ0 = β/γ, characterizes an infection. ℛ0 > 1 implies the infection will continue to spread, and ℛ0 < 1 implies that the spread is limited and under control. Currently, ℛ0 for COVID-19 is estimated to be (0.8-5.0) [8, 13]. γ is estimated as the reciprocal of the recovery-period Δtr, which implies that γ ∼ (0.05-0.067), based on estimates of Δtr [10, 11].
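To make the role of γ concrete, the sketch below integrates the standard SIR equations (dNs/dt = −βNsNi/Np, dNi/dt = βNsNi/Np − γNi, dNr/dt = γNi) with forward Euler steps in Python. It is an illustration under stated assumptions, not the computation used in this paper: the value of β, the population size, and the initial number of infected persons are hypothetical, and only γ = 1/15 follows from the 15-day recovery-period cited above.

```python
def simulate_sir(beta, gamma, n_pop, n_inf0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps of size dt (days).

    Returns a list of (N_s, N_i, N_r) tuples, one per whole day.
    """
    n_s, n_i, n_r = float(n_pop - n_inf0), float(n_inf0), 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(n_s, n_i, n_r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * n_s * n_i / n_pop * dt  # susceptible -> infected
            new_rec = gamma * n_i * dt               # infected -> recovered
            n_s -= new_inf
            n_i += new_inf - new_rec
            n_r += new_rec
        trajectory.append((n_s, n_i, n_r))
    return trajectory

def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma for the simple SIR model."""
    return beta / gamma

# Hypothetical parameters for illustration only.
gamma = 1 / 15   # reciprocal of a 15-day recovery-period
beta = 0.2       # assumed infection rate per day
traj = simulate_sir(beta, gamma, n_pop=1_000_000, n_inf0=10, days=60)
```

With these assumed values, ℛ0 = β/γ = 3, which lies within the (0.8-5.0) range quoted above; raising γ while holding β fixed lowers ℛ0.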
The absolute numbers indicate that rri, i.e., Nr/Ni, increased from 0.00 at the end of P1 to a higher value at the end of P2, and dipped again to 0.245 at the end of P3. The dip is unfavorable, given that the number of positive COVID-19 confirmations has increased exponentially since the beginning of P3. Since rri and γ are positively correlated, it implies a further decrease in γ.
However, the absolute counts Ni and Nr do not explicitly show the shifts in the age-gender distribution of the infected population (§Figure 2(i)). Thus, it is difficult to demonstrate the influence of this shift in the recovery rate γ. Social distancing reduces β, thus decreasing ℛ0. At the same time, increasing γ also favours a decrease in ℛ0. In this work, we hypothesize that restricted mobilization and aggressive contact tracing would have indirectly increased γ. Thus, we propose computing the time-varying Δtr using the shifts in the age-gender distribution in the periodized timeline.
Recovery-period Δtr analysis
Our goal is to study the period-wise changes in Δtr owing to the shift in the demographic structure, in order to determine the period-wise change in γ. We performed descriptive statistical analysis using the median and interquartile range (IQR), followed by fitting appropriate regression models. We consider two types of regression models. Firstly, we use the time-series of Δtr and fit a curve using the loess model. Loess is a non-parametric local regression model for smoothing empirical time-series data [14] and scatterplots [15]. Secondly, we use the number of patients recovering for a specific Δtr as a count variable and fit multivariate linear regression models considering age and gender as independent variables, and Δtr as the dependent variable. Since we are using a combination of a categorical variable (gender) and a numerical variable (age), we use generalized linear models (GLMs) for regression, which are semi-parametric. Length of hospital stay (LoS) has a naturally skewed distribution, for which GLMs such as the Poisson regression model (PRM) and negative binomial regression model (NBM) have been used [5]. Hence, we propose the use of PRM and NBM for modeling Δtr.
For the period-wise analysis of Δtr, we group the clinically recovered patients using two strategies.
Grouping based on recovery date, 𝔾−cfrmDt: The two groups of clinically recovered patients can be obtained as per the period in which their discharge/recovery date falls, namely, P2 and P3 (§Figure 2 (ii),(a)). 𝔾−cfrmDt corresponds to a group of 109 patients who were discharged in P2, and 136 patients in P3.
Grouping based on positive confirmation date, 𝔾+cfrmDt: We can also perform a finer-grained analysis by grouping patients based on the period in which their date of positive confirmation/hospital admission falls (§Figure 2 (ii),(b)). Each group then consists of patients who tested positive in a given period and clinically recovered at any time within our study window. This gives us 18 patients in P1, 161 in P2, and 66 in P3.
Symptom-onset period Δtso analysis
We additionally have data of the date of onset of symptoms for 227 of the 245 clinically recovered patients, who were confirmed positive during P1 and P2. We define the symptom-onset period, Δtso, as the number of days between the onset of symptoms and positive confirmation of COVID-19. We use the time-series of Δtso to fit a loess model, similar to Δtr.
3 Results
Table 1 gives the percentage values of the data presented in Figure 2.
Descriptive statistical analysis
We present the descriptive statistics either as a five-number summary 10 or as “median [IQR]” of the observed count variable, i.e., Δtr.
Δtr observed during the entire period, January 23-April 01, has the following median and IQR values:
Overall: 11 [7] days.
Gender-wise: 11 [6.25] days for females and 11 [8] days for males (§Figure 3(i)).
Age-wise: 4.5 [4.75] days for (0-9), 13.5 [4.25] days for (10-19), 9.5 [8] days for (20-29), 11 [5] days for (30-39), 11 [6.5] days for (40-49), 11 [8.5] days for (50-59), 11.5 [5.75] days for (60-69), 9 [7] days for (70-79), and 7.5 [3.5] days for (80-89), with age groups given in years in parentheses (§Figure 3(ii)).
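The “median [IQR]” and five-number summaries used in this section can be computed as in the following Python sketch. The sample values are hypothetical, and the quartile convention (linear interpolation over the closed range of the data) is an assumption, since the text does not state which convention was used.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a sample."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics;
    # this quartile convention is an assumption, not stated in the paper.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])

def median_iqr(values):
    """Return the (median, IQR) pair behind the 'median [IQR]' notation."""
    _, q1, q2, q3, _ = five_number_summary(values)
    return (q2, q3 - q1)

# Hypothetical recovery-periods in days, not taken from the MoH data.
sample = [3, 6, 7, 9, 11, 12, 14, 18, 21]
summary = five_number_summary(sample)
```

For this hypothetical sample the summary is (3, 7, 11, 14, 21), which would be reported as 11 [7] days.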
We observe that Δtr is lower overall in this dataset than reported in early analyses of COVID-19 patients, i.e., 15 days [10] and 20 days [11]. The overall median of Δtr is 11 days, which is the same as the gender-weighted median, and the age-group-weighted median is 10.7 days. Thus, the overall descriptive statistical analysis gives a conservative estimate of γ to be 1/11 ≈ 0.091. The remaining work is to estimate γ more precisely based on the influence of gender and age.
Age has a stronger influence than gender on Δtr, as the median values are similar for both genders. In contrast, the median is relatively lower for the age groups of (0-9), (20-29), and (80-89) years, specifically. These age groups comprise 1.6%, 14.6%, and 0.8% of the clinically recovered patients (§Table 1). This result is significant as the age group of (20-29) years contributes the largest share (27.3%) of the infected population. 88.6% of the patients in this age group have been confirmed positive in P3. Thus, our key conclusion is that since the most susceptible group of people has lower Δtr, the recovery rate γ is bound to increase further in the period after April 01 compared to the value estimated in our work.
The period-wise five-number summaries for Δtr, using 𝔾+cfrmDt grouping, are:
(7, 11.5, 17, 22.75, 28) for P1, (0, 8, 12, 16, 27) for P2, and (1, 6, 9, 11, 15) for P3.
For females: (10, 11, 13, 22, 26) for P1, (0, 7, 12, 15, 26) for P2, and (3, 6.25, 9, 11, 14) for P3.
For males: (11, 17, 20, 23, 29) for P1, (0, 8, 12, 16, 27) for P2, and (1, 5.75, 8, 11, 15) for P3.
We observe similar trends when considering 𝔾−cfrmDt grouping. The range, the IQR, and the median of Δtr decrease from P1 to P3, when we look at the data for each gender as well as the data without the gender information. This indicates that irrespective of gender, the measure of the spread of Δtr decreases with time, similar to the trend in the value of Δtr. The age-wise minima reduce sharply from P1 to P2, and increase slightly further to P3, indicating an overall trend of decrease in minima. This supports the overall decrease in Δtr value from P1 to P3.
Table 2 shows the fine-grained five-number summaries of the age-gender based box and whisker plots in Figure 4. We observe that there is a strong influence of age on Δtr. While the overall median values for females and males are similar, we observe that there are variations across different age groups. In particular, for the age group of (20-29) years, the median [IQR] value of males of 8.5 [8.25] days is lower than the median value of females of 10 [5.75] days. This also shows that there is a higher spread (IQR) of Δtr values in males. We further observe that the gender difference in IQR arises in P3 in the 𝔾+cfrmDt grouping, which happens in P2 in the 𝔾−cfrmDt grouping. The minima and median values are overall lower in P3 in the 𝔾+cfrmDt than the 𝔾−cfrmDt grouping. Since the protocols followed in the hospital can be perceived to be similar for patients with closer hospital admission dates, the Δtr has more cohesive descriptive statistics in the 𝔾+cfrmDt grouping than the 𝔾−cfrmDt one. Hence, we use the 𝔾+cfrmDt grouping exclusively in the regression analysis.
Since the symptom-onset dates have been studied [7, 8], we report the five-number summaries of Δtso, for which data is available for P1 and P2 only.
Overall: (0, 3, 6, 9, 16) for P1 and (1, 2, 5, 8, 17) for P2.
For females: (0, 2, 5, 7, 12) for P1, and (1, 3, 5, 8, 15) for P2.
For males: (1, 4.75, 8, 9, 15) for P1, and (1, 2, 5, 8, 17) for P2.
We observe that Δtso has similar IQR and median across P1 and P2, irrespective of gender. We attribute this to the continuous monitoring of susceptible cases in Singapore, which leads to shorter delays in positive confirmations and thus to a lower measure of spread and lower overall values for Δtso. Overall, we do not emphasize the analysis of Δtso owing to the clinical uncertainties involved [4], which are not reflected in the timeline data that we are using here.
Loess model
We now use the loess model to confirm the trend in the change in Δtr. The loess model has been estimated on the time-series of the Δtr and Δtso values, represented as scatter plots (§Figure 5). The loess model has been implemented using stats.loess 11 in R [16]. We have considered the scatter plots based on the positive COVID-19 confirmation date, akin to the 𝔾+cfrmDt grouping. The loess model for Δtr, considered during January 23-April 01, 2020, uses 245 observations, with 5.37 degrees of freedom (DoF) and residual standard error (RSE) of 145.4 (§Figure 5(i)). The loess models fitted on period-wise data (§Figure 5(ii)) have the following details: the model for P1 uses 18 observations, with 4.66 DoF and RSE of 5.817; the model for P2 uses 161 observations, with 5.23 DoF and RSE of 61.74; and the model for P3 uses 66 observations, with 4.91 DoF and RSE of 109.8. The loess model for Δtso uses 227 observations, with 5.03 DoF and RSE of 70 (§Figure 5(iii)).
The degrees of freedom roughly correspond to the degree of the polynomial used to generate the fitting curve. Thus, both Δtr and Δtso can be modeled using a 5th-degree polynomial. A higher-degree polynomial implies less bias but larger variance. Δtr has a slope of −25° in P3 when using the loess model for the entire time period (§Figure 5(i)) and a slope of −6° when using the loess model for P3 (§Figure 5(ii)). Thus, the key conclusion from the local regression model on the time-series is the negative slope, i.e., a downward trend in Δtr in P3, which is favorable for improving the recovery rate γ.
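The idea behind local regression can be sketched in a few lines of Python. This is a simplified illustration and not the R routine used for the results above: it fits a local straight line (degree 1) with tricube weights and no robustness iterations, whereas R's loess defaults to a local quadratic fit. The span of 0.75 and the synthetic data are assumptions made for the example.

```python
def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_linear_fit(xs, ys, x0, span=0.75):
    """Estimate y at x0 by weighted least squares over the nearest points."""
    n = len(xs)
    k = min(n, max(2, round(span * n)))      # neighbourhood size
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1.0                  # bandwidth: k-th nearest distance
    w = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no neighbours with positive weight at x0")
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Synthetic data: x is a day index, y a recovery-period falling over time.
xs = list(range(20))
ys = [20 - 0.5 * x for x in xs]
early = local_linear_fit(xs, ys, 5)
late = local_linear_fit(xs, ys, 15)
```

On exactly linear data the local fit reproduces the line, so `late` is smaller than `early`, which corresponds to the downward trend described for Δtr.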
Multivariate (linear) regression model
Now that we have observed and concluded from both the descriptive statistical analysis and the loess model that Δtr is decreasing during the period of January 23-April 01, our next step is to predict the value of Δtr. We experiment with the generalized linear model (GLM) for a multivariate linear regression model for Δtr using Poisson (PRM) and negative binomial (NBM) distributions. Our choice of model and distributions follows common practice for count data [5], and hospital length of stay (LoS) is commonly over-dispersed [17]. For each model, we use four scenarios, namely, the entire period and each of the three periods. We use the Akaike Information Criterion (AIC) and its corrected version for a small sample size (AICc) for determining the goodness of fit of our proposed models. We have implemented these models using stats.glm in R [16].
For GLMs with Poisson and binomial families, the dispersion is fixed at 1.0, and the number of parameters (k) is the same as the number of coefficients in the regression model [16]. The negative binomial distribution has an additional parameter to model over-dispersion in the data. For the number of samples (n) in the data, AIC is used if n/k > 40, and AICc is used otherwise [18]. Thus, we use AIC for the scenarios of the entire period and P2, and AICc for P1 and P3, owing to the relatively fewer samples (§Table 3).
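The difference between the two count distributions can be seen from their probability mass functions. The Python sketch below uses the mean-dispersion parameterization of the negative binomial, in which the variance is μ + μ²/θ; the value of θ is hypothetical, and μ is set near the expected recovery-period reported later for P3 purely for illustration.

```python
from math import exp, lgamma, log

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return exp(y * log(mu) - mu - lgamma(y + 1))

def negbin_pmf(y, mu, theta):
    """P(Y = y) for a negative binomial with mean mu and dispersion theta.

    The variance is mu + mu**2 / theta, so a smaller theta means more
    over-dispersion relative to the Poisson case.
    """
    log_p = (lgamma(y + theta) - lgamma(theta) - lgamma(y + 1)
             + theta * log(theta / (theta + mu))
             + y * log(mu / (theta + mu)))
    return exp(log_p)

# Hypothetical parameters: theta is an assumed value, not a fitted one.
mu, theta = 8.05, 10.0
support = range(0, 200)
nb_mean = sum(y * negbin_pmf(y, mu, theta) for y in support)
nb_var = sum((y - nb_mean) ** 2 * negbin_pmf(y, mu, theta) for y in support)
```

Here the negative binomial keeps the same mean as a Poisson distribution with mean μ but has a larger variance, which is why it suits over-dispersed length-of-stay data.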
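The choice between AIC and AICc can be sketched as follows in Python. The threshold n/k > 40 is the common rule of thumb and is treated as an assumption here, as is k = 3 (an intercept plus the age and gender coefficients); the sample sizes are those of the four scenarios in the text.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)

def preferred_criterion(n, k, threshold=40):
    """Pick 'AIC' when n / k exceeds the threshold, otherwise 'AICc'."""
    return "AIC" if n / k > threshold else "AICc"

# Sample sizes from the text; k = 3 is an assumed parameter count.
sizes = {"entire": 245, "P1": 18, "P2": 161, "P3": 66}
choice = {name: preferred_criterion(n, k=3) for name, n in sizes.items()}
```

Under these assumptions the rule selects AIC for the entire period and P2, and AICc for P1 and P3. The same selection results with k = 4, which would count the extra dispersion parameter of the negative binomial model.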
Table 3 gives the results of our models. We infer the following:
The “age” variable is significant only in the PRM, and only for the scenarios of the entire period and P2, with a p-value for the coefficients corresponding to the variable being less than 5%.
The NBM shows lower values for the median and the variance (observable from range and IQR) of deviance residuals, and AIC/AICc than the PRM. Thus, we conclude that NBM is a better fit than PRM. Also, for both PRM and NBM, the models for the scenarios of P1 and P3 are a better fit than those of the entire period and P2.
These observations may be attributed to the relatively small sample size for P1 and P3. Since we do not have a large number of variables to discard, we retain the “gender” variable in the model despite its insignificance.
The key conclusion from the multivariate regression analysis is that a GLM with NBM for P3 is the best model for us to estimate Δtr. This helps us to estimate the value of Δtr to be 8 days, with the maximum likelihood of 11.8%. The expected value of Δtr of NBM for P3 is 8.05 days. Hence, overall, we conclude that the estimated value of γ is 1/8 = 0.125.
4 Discussion
The improved γ, as per our estimate, is an outcome of the existing protocols in Singapore. The approach of containment of the contagion undertaken by Singapore has been government-mandated, which has ensured delays in the spread of the disease. While the spread got contained, there has been a shift in the age-group of the population getting infected. This shift has brought about the decrease in the recovery-period, Δtr.
Our work has two specific limitations. Firstly, our study does not use a non-zero death/mortality/fatality rate for the disease. The number of deaths will continue to increase, warranting its consideration in the SIR model. Secondly, we have modeled recovery in isolation from the infection. Since the number of infected persons in Singapore has increased exponentially since March 17, 2020, the infection rate, β, and consequently, the basic reproduction number ℛ0 have to be re-estimated. Nevertheless, our estimated γ is applicable for improving the estimation of ℛ0 until April 01, and for simulating/predicting disease progression using the SIR model beyond April 01.
In summary, we have looked at the demographic data and timeline of the first 1000 COVID-19 patients in Singapore during January 23-April 01, 2020. We have closely investigated the data on the positive confirmation and discharge/clinical recovery dates of 245 patients who recovered during this time period. We have used regression analysis, subsequent to a descriptive statistical analysis, to get an estimate of the recovery-period Δtr (i.e., hospital length of stay (LoS)). We have found that Δtr is time-varying, after performing periodization to find three significant periods, namely P1 (January 23-February 03), P2 (February 04-March 16), and P3 (March 17-April 01). The estimates of Δtr varied from ∼17 days in P1 to ∼10 days in P2 to ∼8 days in P3. We have used the loess model for time-series data to demonstrate the negative slope of the regression curve of Δtr in P3, in particular. We then estimated period-wise Δtr using generalized linear models for multivariate (linear) regression with Poisson and negative binomial distributions for count data. This shows an improvement over the values published for Δtr, i.e., 20 days [11] and 15 days [10]. This has led us to estimate the current recovery rate γ in the SIR model to be 0.125 from March 17, 2020 onwards.
Data Availability
All the data used for this work has been obtained from the press releases on the official website of the Ministry of Health, the Government of Singapore.
Acknowledgements
The authors would like to thank IIIT Bangalore, and particularly, the Graphics-Visualization-Computing Lab and the E-Health Research Center, for supporting this work.
Footnotes
3 https://www.who.int/emergencies/diseases/novel-coronavirus-2019
4 As retrieved on April 02, 2020, from https://www.worldometers.info/coronavirus/
7 The press releases did not carry patient-wise information after March 17, 2020, owing to which the date of onset of symptoms is not available beyond that date.
8 https://experience.arcgis.com/experience/7e30edc490a5441a874f9efe67bd8b89
9 The transition rate must include both recovery and death. However, since we assume a zero death rate, the transition rate is equivalent to the recovery rate in our work.
10 A five-number summary is the (minimum, first quartile, median, third quartile, maximum) values of a (count) variable.
11 The loess model is a default local regression model used for a sample with less than 1000 observations in the stats package in R.