Summary
Background Severe cases of coronavirus disease 2019 (COVID-19) rapidly develop acute respiratory distress leading to respiratory failure, with high short-term mortality rates. At present, there is no reliable risk stratification tool for non-severe COVID-19 patients at admission. We aimed to construct an effective model for early identifying cases at high risk of progression to severe COVID-19.
Methods SARS-CoV-2 infected patients from one center in Wuhan city and two centers in Guangzhou city, China were included retrospectively. All patients with non-severe COVID-19 during hospitalization were followed for more than 15 days after admission. Patients who deteriorated to severe or critical COVID-19 and patients who kept non-severe state were assigned to the severe and non-severe group, respectively. We compared the demographic, clinical, and laboratory data between severe and non-severe group. Based on baseline data, least absolute shrinkage and selection operator (LASSO) algorithm and logistic regression model were used to construct a nomogram for risk prediction in the train cohort. The predictive accuracy and discriminative ability of nomogram were evaluated by area under the curve (AUC) and calibration curve. Decision curve analysis (DCA) and clinical impact curve analysis (CICA) were conducted to evaluate the clinical applicability of our nomogram.
Findings The train cohort consisted of 189 patients, while the two independent validation cohorts consisted of 165 and 18 patients. Among all cases, 72 (19.35%) patients developed severe COVID-19 and 107 (28.76%) patients had one of the following basic disease, including hypertension, diabetes, coronary heart disease, chronic respiratory disease, tuberculosis disease. We found one demographic and six serological indicators (age, serum lactate dehydrogenase, C-reactive protein, the coefficient of variation of red blood cell distribution width (RDW), blood urea nitrogen, albumin, direct bilirubin) are associated with severe COVID-19. Based on these features, we generated the nomogram, which has remarkably high diagnostic accuracy in distinguishing individuals who exacerbated to severe COVID-19 from non-severe COVID-19 (AUC 0.912 [95% CI 0.846-0.978]) in the train cohort with a sensitivity of 85.71 % and specificity of 87.58% ; 0.853 [0.790-0.916] in validation cohort with a sensitivity of 77.5 % and specificity of 78.4%. The calibration curve for probability of severe COVID-19 showed optimal agreement between prediction by nomogram and actual observation. DCA and CICA further indicated that our nomogram conferred significantly high clinical net benefit.
Interpretation Our nomogram could help clinicians to early identify patients who will exacerbate to severe COVID-19. And this risk stratification tool will enable better centralized management and early treatment of severe patients, and optimal use of medical resources via patient prioritization and thus significantly reduce mortality rates. The RDW plays an important role in predicting severe COVID-19, implying that the role of RBC in severe disease is underestimated.
Funding Science and Technology Program of Guangzhou, China (201804010474)
Introduction
Since the outbreak of novel coronavirus pneumonia (COVID-19) in December 2019, the number of reported cases has surpassed 120,000 with over 4600 deaths worldwide, as of March 12 2020. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a member of coronaviruses known to cause common colds and severe illnesses such as, is the cause of COVID-191. Compared with much higher overall case-fatality rates (CFR) for the severe acute respiratory syndrome (SARS) and middle east respiratory syndrome (MERS), COVID-19 is being responsible for more total deaths because of the increased transmission speed and the growing numbers of cases2. Up to now, the World Health Organization (WHO) has raised global Coronavirus Disease (COVID-19) outbreak risk to “Very High”, and SARS-CoV-2 infection has become a serious threat to public health.
According to a report recently released by the Chinese Center for Disease Control and Prevention (CDC) that included approximately 44,500 confirmed cases of SARS-CoV-2 infections, up to 15.8% were severe or critical. Most COVID-19 patients have a mild disease course, while some patients experience rapid deterioration (particularly within 7-14 days) from onset of symptoms into severe COVID-19 with or without acute respiratory distress syndrome (ARDS)3. Current epidemiological data suggests that the mortality rate of severe COVID-19 patients is about 20 times higher than that of non-severe COVID-19 patients4, 5. This situation highlights the need to identify COVID-19 patients at risk of approaching to severe COVID-19. These severe illness patients often require utilization of intensive medical resources. Therefore, early identification of patients at high risk for progression to severe COVID-19 will facilitate appropriate supportive care and reduce the mortality rate, unnecessary or inappropriate healthcare utilization via patient prioritization.
At present, an early warning model for predicting COVID-19 patients at-risk of developing a costly condition is scarce3, 6. So far, prognosis factors of COVID-19 mainly focus on the immune cells. In our study, we found that older age, higher lactate dehydrogenase (LDH) and C-reactive protein (CRP), RDW (the coefficient of variation of red blood cell distribution width), DBIL (direct bilirubin), blood urea nitrogen (BUN), and lower albumin (ALB) on admission correlated with higher odds of severe COVID-19. Based on these indexes, we developed and validated an effective prognostic nomogram with high sensitivity and specificity for accurate individualized assessment of the incidence of severe COVID-19. Among these indexes, the prognostic role of RDW in COVID-19 is underestimated, which is associated with the increased turnover of erythrocytes. Our results hinted that the turnover of RBC might involve in severe illness.
Material and method
Data collection
Data on COVID-19 inpatients between January 20th 2020 and March 2nd 2020 was retrospectively collected from three clincial centers: Guangzhou Eighth People’s Hospital, Zhongnan Hospital of Wuhan University and the Third Affiliated Hospital of Sun Yat-sen University. A total of 372 patients with COVID-19 were enrolled, 9 patients younger than 15 years of age were excluded from the study. Clinical laboratory test results, including SARS-CoV-2 RNA detection results, biochemical indices, blood routine results, were collected from routine clinical practice. Written informed consent was waived by the Ethics Commission of each hospital for emerging infectious diseases. The study was approved by the Ethics Committee of the Eighth People’s Hospital of Guangzhou, the Ethics Commission of the Third Affiliated Hospital of Sun Yat-sen University and the Ethics Commission of Zhongnan Hospital.
The diagnosis of SARS-CoV-2 infection was based on the Guidelines for Diagnosis and Treatment of Novel Coronavirus Pneumonia (5th version), released by National Health Commission of China. Suspected cases of COVID-19 requires meeting any of the following epidemiology history criteria or any two of the following clinical manifestations: (A) Epidemiological history: a history of travel to or residence in Wuhan in the last 14 days prior to symptom onset; contact with a confirmed or suspected case of 2019-nCOV infection in the last 14 days prior to symptom onset; aggressive disease onset. (B) Clinical manifestation: Fever and/or respiratory infection, or with normal/decreased white blood cells counts and normal/decreased lymphocyte counts. In the absence of the above mentioned criteria for epidemiological history, the suspected case should meet with all of the above mentioned criteria for clinical manifestation. A confirmed case was defined as an individual with laboratory confirmation of SARS-CoV-2 which required positive results of SARS-CoV-2 RNA, irrespective of clinical signs and symptoms. For diagnosis of Severe COVID-19 group, at least one of the following conditions should be met: (1) Shortness of breath, Respiratory rate (RR) ≥30times/min, (2) Arterial oxygen saturation (Resting status) ≤93%, or (3) the ratio of Partial pressure of oxygen to Fraction of inspiration O-2(PaO-2/ FiO-2)≤300mmHg.
Laboratory Methods
Clinical laboratory test results, including SARS-CoV-2 RNA detection results, biochemical indices, blood routine results, were collected from routine clinical practice. Clinical laboratory test results included albumin (ALB), aspartate aminotransferase (AST), alanine transaminase (ALT), blood urea nitrogen (BUN), creatine kinase (CK), creatine kinase-MB (CK-MB), creatinine (Crea), C-reactive protein (CRP), total bilirubin (TBIL), direct bilirubin (DBIL), globulin (GLB), lactate dehydrogenase (LDH), procalcitonin (PCT), total bile acid (TBA), hemoglobin (HB), lymphocyte count, monocyte count, neutrophil count, platelet distribution width (RDW), platelet (PLT), red blood cell (RBC), RDW (red blood cell distribution width-coefficient variation). SARS-CoV-2 RNA were detected using real time quantitive PCR (qPCR) on nucleic acid extracted from upper respiratory swab samples. Upper respiratory swab samples were collected on all suspected cases of SARS-CoV-2 infection on admission and immediately placed into sterile tubes with viral transport medium. All biochemical and hematology parameters were obtained via standard automated laboratory methods and using commercially available kits following to the manufacturers protocols.
Statistical Analysis
Categorical variables were expressed as frequency and percentages, and Fisher’s exact test was performed to analyze the significance. Continuous variables were expressed as mean (standard deviation [SD]), or median (interquartile range [IQR]), as appropriate. Parametric test (T test) and non-parametric test (Mann-Whitney U) were used for continuous variables with or without normal distribution, respectively. A value of p < 0.05 was considered statistically significant. Except for filling missing values, all the statistical analyses were analyzed using R (version 3.6.2) with default parameters.
Of all potential predictors in the dataset, 0.09 % of the fields had missing values. Predictor exclusion was limited to those with more than 7% missing rate to minimize the bias of the regression coefficient. [16]. Little’s MCAR test (R package BaylorEdPsych) was used to assess the suitability of the remaining missing values for imputation. This test is used to test whether missing values are “missing completely at random” (MCAR) or biased. The missing values were imputed by expectation-maximization (EM) method using SPSS statistical software, version 25 (SPSS, Inc., Chicago, IL, USA).
To identify the relative importance of each feature, feature selection was performed using the least absolute shrinkage and selection operator (LASSO) regression method, and prediction models were built using logistic regression, decision tree, random forest (RF) and support vector machine (SVM) using R package Caret, using 300-time repeated random sub-sampling validation for diverse parameter conditions, respectively. As described previously, Nomograms were established with the rms package and the performance of nomogram was evaluated by discrimination (Harrell’s concordance index) and calibration (calibration plots and Hosmer-Lemeshow calibration test) in R. During the external validation of the nomogram, the total points for each patient in the validation cohort were calculated based on the established nomogram.
Results
The selection of the study population is illustrated in Figure 1. A total of 372 COVID-19 patients were enrolled after admission from three centers in Guangzhou and Wuhan (Figure 1). All patients with non-severe COVID-19 during hospitalization were followed for more than 15 days after admission. Patients who deteriorated to severe or critical COVID-19 and patients who kept non-severe state were assigned to the severe and non-severe group, respectively. There were no significant differences in age, sex, disease types between the train cohort and validation cohorts (Table 1). In the train cohort, the non-severe COVID-19 group consisted of 159 (86.41%) patients, with a median age of 45 years of age (range 33-61 years) while 25 patients (13.59%), with a median age of 64 years of age (range 55-72 years) progressed to severe COVID-19. By the end of Feb 25, one patient with severe COVID-19 in the train group died. None of the 189 patients from the train group had a history of exposure to Huanan seafood market in Wuhan, 55 of them (29.1%) had not left Guangzhou recently, but had a close exposure history with COVID-19 patients, and the rest (70.9%) were Wuhan citizens or visited Wuhan recently. Other baseline characteristics in train cohort were shown in Table 2.
A total of 49 features were collected from each patient in the train cohort. After excluding irrelevant and redundant features, 39 features were remained for LASSO regression analysis. The results of the 189 patients showed that age, DBIL, RDW, BUN, CRP, LDH and ALB were predictive factors for severe COVID-19 with maximal Area Under Curve (AUC) (Figure 2B and 2C). Then we built prediction models using logistic regression, decision tree, random forest (RF) and support vector machine (SVM), and evaluated their performance by the receiver operating characteristic curve (ROC) and the precision-recall curve (appendix p1). There were no big difference in performance of these models except for decision tree. Therefore, logistic regression model was used for further analysis owing to its high predictive power and interpretability.
The predictive nomogram that integrated 7 selected features for the incidence of severe COVID-19 in the train cohort is shown (Figure 2C). To evaluate clinical applicability of our risk prediction nomogram, decision curve analysis (DCA) and clinical impact curve analysis (CICA) were performed. The DCA and CICA visually showed that the nomogram had a superior overall net benefit within the wide and practical ranges of threshold probabilities and impacted patient outcomes (Figure 2D and 2E). In Figure 3A and 3B, the calibration plot for severe illness probability showed a good agreement between the prediction by nomogram and actual observation in the train cohort and validation cohort 1, respectively.
In the train cohort, the nomogram had a significantly high AUC 0.912 (95% CI 0.846-0.978) to discriminate individuals with severe COVID-19 from non-severe COVID-19, with a sensitivity of 85.71 % and specificity of 87.58% (Figure 3C, Table 2). Cutpoint R package was used to calculate optimal cutpoints by bootstraping the variability of the optimal cutpoints, which was 188.6358 for our nomogram (corresponding to a threshold probability of 0.190). Then patients in the validation cohorts were divided into the low group (score ≤188.6358) and the high group (score>188.6358) for further analysis. In consistent with the train cohort, in validation cohort 1, AUC was 0.853 for patients with severe COVID-19 versus non-severe COVID-19 with a sensitivity of 77.5 % and specificity of 78.4% (Figure 3D, Table 3). In validation cohort 2, the sensitivity and the specificity of the nomogram were observed to be 75% and 100%, respectively.
Discussion
Early identification of patients approaching to severe COVID-19 patients will lead to better management and optimal use of medical resources. In this research, we identified older age, higher LDH and CRP, DBIL, RDW, BUN, and lower ALB on admission correlated with higher odds of severe COVID-19. Furthermore, we developed an effective prognostic nomogram composed of 7 features, had significantly high sensitivity and specificity to distinguish individuals with severe COVID-19 from non-severe COVID-19. DCA and CICA further indicated that our nomogram conferred significantly high clinical net benefit, which is of great value for accurate individualized assessment of the incidence of severe COVID-19.
So far, several researches reported some risk factors for severe COVID-19. However, the nomogram could present a quantitive and practical predictor tool for risk stratification of non-severe COVID-19 patients at admission. Though Liu et al developed a nomogram from a single center with a small sample size and no external validation6, our nomogram has a significantly higher AUC in the train and validation cohorts than Liu’s nomogram (0.912/0.853 vs 0.849). Our nomogram predicted a total of 188.6358 points at a 19.0 % probability threshold, which was close to the prevalence of severe COVID-19 (14.8%) in the training cohort and hence consistent with the reality. This cut-off value may lead to a slight increase of false positive rates but in the setting of this COVID-19 outbreak, a little high false positive rates are acceptable in order to minimize risks of missed diagnosis. Meanwhile, application of the nomogram in the training cohort and validation cohort showed good differentiation with AUC values of 0.912 and 0.853 respectively, as well as high sensitivity and specificity.
Furthermore, only seven easy-access features were included in our nomogram, including older age, higher LDH and CRP, DBIL, RDW, BUN, and lower ALB. Among of them, age, NLR and LDH has been reported to be risk factors for severe patients with SARS-CoV-2 infection3, 6, 7. NLR, a widely used marker for the assessment of system inflammation, was not identified by LASSO as an important feature instead of LDH and CRP, which are associated with the systemic inflammatory response8. However, LDH could predict severity of tissue damage in early stage of diseases as an auxiliary marker9. These might be reasons why the lasso model did not identified NLR as a more important feature. Consistent with other reports, our results indicate that patients with higher levels of inflammation at admission might be at higher risk for severe COVID-19 as well.
Interestingly, we found RDW was also an important prognostic predictor for severe COVID-19. RDW, one of the numbers or blood cell indices, reflects the variation in the size of RBC (red blood cells), which has been tightly correlated with critical disease10-12 but negligent in COVID-19. It is a robust predictor of the risk of all cause patient mortality and bloodstream infection in the critically ill11-14, including acute exacerbation of interstitial pneumonia, ARDS 10,15. RDW also can predict prognosis of sepsis, which was tied to poor COVID-19 outcomes-death16.
The increased RDW in COVID-19 patients may be due to the increased turnover of erythrocytes: 1) Pro-inflammatory states may be responsible for insufficient erythropoiesis with structural and functional alteration of RBC, such as decreased deformability leading to more rapid clearing of RBC. 2) Plasma cytokines such as interleukin 1 (IL-1) and tumor necrosis factor-α (TNF-α) could not only attenuate the renal erythropoietin (EPO) production, but also blunt the erythroid progenitor response to EPO. In addition, INF-γ contributes to apoptosis of the erythroid progenitors and decrease the EPO receptor expression17. 3) RBC are dynamic reservoirs of cytokines18. Decreased deformability of RBC in severe illness leads to RBC lysis and release of intracellular contents into the circulation19, including some inflammatory cytokines. This positive feedback could greatly promote the apparent shortened RBC survival and ultimately more morphological variations in cell sizes (i.e., elevated RDW), increased inflammatory response, and lead to severe illness. RDW can be regarded as an index of enhanced patient fragility and higher vulnerability to adverse outcomes20. The elevated RDW may explain fatigue experienced by severe COVID-19 patients.
Our study has several strengths: first, we provide a practical quantitive prediction tool based on only 7 features which were relatively inexpensive and easy to be obtained directly from the routine blood tests. Second, to guarantee the robustness of the conclusion, we included the data from three centers with a large sample size and validation in independent cohorts. The performance of our nomogram was efficient for clinical practice.
There were some limitations in the study. First, this is a retrospective study, including 372 patients with non-severe COVID-19 on admission. Second, some patients are still in hospital and their condition may change with follow-up. More comprehensive investigations need to be conducted to explain the characteric of the 7 features.
In summary, our data suggest that our nomogram could early identify the severe COVID-19 patients, and RDW was vaulable for prediction of severe diseases. Our nomogram is especially valuable for risk stratification management, which will be helpful for alleviating insufficient medical resources and reducing mortality.
Data Availability
No additional data available.
Contributors
BH, YLS and FZ designed the study and had full access to all data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. JG, JYO XPQ, YSJ, YQC and MKT contributed to collect data, analyze data, and write the paper. LXY, JC, MKT and WYX contributed to the statistical analysis. All authors contributed to data interpretation, and reviewed and approved the final version.
Conflict of Interest
The author(s) declare(s) that there is no conflict of interest regarding the publication of this paper.
Competing interests
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This work is funded by the Science and Technology Program of Guangzhou, China (201804010474).