Abstract
Objectives The aim of this observational retrospective study is to improve early risk stratification of hospitalized Covid-19 patients by predicting in-hospital mortality, transfer to intensive care unit (ICU) and mechanical ventilation from electronic health record data of the first 24 hours after admission.
Methods and Results Our machine learning models predict in-hospital mortality (AUC=0.918), transfer to ICU (AUC=0.821) and the need for mechanical ventilation (AUC=0.654) from a few laboratory values of the first 24 hours after admission. Models based on dichotomous features indicating whether a laboratory value exceeds or falls below a threshold perform nearly as well as models based on numerical features.
Conclusions We devise completely data-driven and interpretable machine-learning models for the prediction of in-hospital mortality, transfer to ICU and mechanical ventilation for hospitalized Covid-19 patients within 24 hours after admission. Numerical values of CRP and blood sugar and dichotomous indicators for increased partial thromboplastin time (PTT) and glutamic oxaloacetic transaminase (GOT) are amongst the best predictors.
Introduction
Since late 2019, SARS-CoV-2, manifesting as Covid-19, has spread all over the world and caused a pandemic. Infected patients develop a variety of disease symptoms and changes in the hemogram, resulting in a wide range of disease severity, from mild symptoms not requiring any medical intervention to mechanical ventilation, transfer to the intensive care unit (ICU) or even death (Amin et al., 2021; Palladino, 2021; Son et al., 2021). Several drugs for Covid-19 treatment have been developed since the beginning of the pandemic, but most of them are linked to different disease stages. For example, hospitalized patients with severe symptoms can be treated with remdesivir and dexamethasone, whereas antibody-based therapy has to be administered at an early disease stage before a patient has developed severe symptoms (Han et al., 2021; Mechineni et al., 2021). For optimal patient care and treatment in hospitals it is very important to detect patients with an unfavorable expected disease progression early. Hence, there is an urgent need for generalizable clinical prediction models to identify patients with potentially severe disease courses.
Many existing predictive models of severe Covid-19 disease progression are based on data from tertiary care hospitals like university hospitals or from clinical study data repositories. Many scoring models incorporate non-standard laboratory values, which renders their widespread application in daily clinical practice difficult (Sun et al., 2021; Wollenstein-Betech et al., 2020). Here we present personalized and completely data-driven machine-learning models for the prediction of (i) in-hospital mortality, (ii) transfer to ICU and (iii) mechanical ventilation of hospitalized Covid-19 patients. Our models use standard clinical laboratory data from hospitals of medium level of care measured during clinical routine in combination with biological sex and age as covariates. Our purely data-driven approach avoids potential bias or the pure reproduction of well-known results (Yarritu and Matute, 2015) and is an important addition to the landscape of expert knowledge-based Covid-19 risk scores (Häger et al., 2022). We also present simplified models using only dichotomous predictors indicating whether a laboratory value is below or above reference threshold. These might better reflect the daily clinical practice than a complex combination of numerical features. In addition, we report a comprehensive analysis of laboratory values associated with a severe Covid-19 disease progression.
Methods and patients
Study population and inclusion criteria
For model development we conduct an observational retrospective cohort study using data from a hospital of medium level of care located in the federal state of Rhineland-Palatinate in the west of Germany (Table 1, Figure 1). We include 520 patients with a positive RT-PCR for SARS-CoV-2, identified by the ICD code U07.1, who were admitted to the hospital from March 2020 until December 2021. Because of too many missing values, 12 patients were excluded. No patient was transferred from an ICU of another hospital. For model development, we use 80% of the data as training set and 20% as test set. The report is based on the STROBE statement (Vandenbroucke et al., 2007). Ethical approval was obtained from the local ethics commission.
Study design and statistical analysis
We define three Covid-19 associated endpoints (see Supplemental Material for details):
Death during hospital stay, short “in-hospital mortality”
Admission to intensive care unit (ICU), short “transfer to ICU”
Necessity for mechanical ventilation (all OPS beginning with “8-71”), short “mechanical ventilation”
For the training of the prediction models we use the laboratory values obtained during the first 48 hours after admission and average them over this time period. For prediction and model testing, we restrict the time span to 24 hours after admission. For each endpoint we divide the patient cohort into two distinct groups, depending on whether the endpoint occurred or not. To check for differences in the laboratory values between these groups we perform Wilcoxon-rank-sum tests with Bonferroni-Holm adjusted p-values. The p-values are used as a measure of association strength between the laboratory value and the endpoint and enable us to rank the features. We filter the top-10 laboratory values with an adjusted p-value smaller than 5% and less than 10% missing values. These are combined with biological sex and age to form potential features for the machine-learning models.
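The ranking and filtering step described above can be sketched in a few lines. This is a minimal illustration, not the study's code: it assumes that raw Wilcoxon rank-sum p-values and the fraction of missing values per laboratory value are already available, that the Bonferroni-Holm adjustment is computed over all tested laboratory values, and that the significance and missingness filters are applied before the top 10 are taken. The function names and the example numbers are hypothetical.

```python
from typing import Dict, List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Bonferroni-Holm step-down adjustment; result is in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def select_candidates(
    raw_p: Dict[str, float],
    missing_fraction: Dict[str, float],
    alpha: float = 0.05,
    max_missing: float = 0.10,
    top_k: int = 10,
) -> List[str]:
    """Keep values with adjusted p < alpha and few missing entries, ranked by adjusted p."""
    names = list(raw_p)
    adjusted = dict(zip(names, holm_adjust([raw_p[n] for n in names])))
    kept = [
        n for n in names
        if adjusted[n] < alpha and missing_fraction[n] < max_missing
    ]
    return sorted(kept, key=lambda n: adjusted[n])[:top_k]


# Hypothetical raw p-values and missing-value fractions, for illustration only.
example_p = {"urea": 0.0001, "CRP": 0.004, "glucose": 0.01, "sodium": 0.6}
example_missing = {"urea": 0.02, "CRP": 0.01, "glucose": 0.25, "sodium": 0.0}
selected = select_candidates(example_p, example_missing)
print(selected)  # ['urea', 'CRP']
```

In this toy example, glucose passes the significance filter but is dropped because more than 10% of its values are missing, and sodium is dropped because its adjusted p-value is too large.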
We compare three supervised classifiers: logistic regression (LR), random forest (RF) and XGBoost. To select predictive features for each of these three model classes we employ 5-fold cross validation. For LR we perform forward-backward selection. For the random forest and XGBoost classifiers we use the mean feature importance as a criterion for feature selection and, in addition, train these tree-based classifiers using the same features as identified for the LR models. The model (including selected features) with the highest area under the receiver operating characteristic curve (ROC-AUC), averaged over the cross validation folds of the training data set, is selected as the final model for the respective endpoint.
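The selection criterion, the ROC-AUC averaged over five cross validation folds, can be sketched as follows. This is an illustrative sketch under stated assumptions: the folds are stratified by outcome (the text does not say whether stratification was used), the toy data are invented, and the two "candidates" simply score patients by a single feature as stand-ins for fitted LR, RF or XGBoost models. All names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
# A candidate maps (X_train, y_train, X_test) to one risk score per test row.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assumption: folds are stratified so that each fold contains both classes."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds


def mean_cv_auc(fit_predict: FitPredict, X: List[Row], y: List[int],
                k: int = 5, seed: int = 0) -> float:
    """Average the held-out ROC-AUC over k cross validation folds."""
    fold_aucs = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in fold])
        fold_aucs.append(roc_auc([y[i] for i in fold], scores))
    return mean(fold_aucs)


# Invented toy data: columns are (age, CRP); the outcome is fully separated by CRP.
X = [[95, 5], [45, 8], [81, 12], [60, 3], [52, 9],
     [66, 40], [49, 55], [77, 61], [58, 38], [63, 90]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Stand-in candidates that ignore the training data and score by one feature.
candidates = {
    "age_only": lambda X_tr, y_tr, X_te: [row[0] for row in X_te],
    "crp_only": lambda X_tr, y_tr, X_te: [row[1] for row in X_te],
}
cv_auc = {name: mean_cv_auc(fp, X, y) for name, fp in candidates.items()}
best = max(cv_auc, key=cv_auc.get)
print(cv_auc, best)
```

Passing the candidate as a callable keeps the fold logic independent of the model class, so the same loop can compare different classifiers and different feature subsets.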
During model creation we observed that the cross validation performance using the features from the first 48 hours is similar to the performance using the same features observed during the first 24 hours only. Thus, we have trained our models on data from the first 48 hours, but for prediction and testing we restrict to average laboratory values of the first 24 hours.
In addition to these models based on numerical laboratory values, we train models using dichotomous features indicating whether a certain laboratory value exceeds or falls below a predefined reference threshold. In these models, age is also replaced by a dichotomous feature indicating whether the patient is older or younger than 60 years. These models are easier to interpret and might support the need for rapid decision making by physicians in daily clinical practice (see Supplemental Material for details). We exclude blood sugar from the list of possible dichotomous predictors, because reference values depend on the time gap to the last meal before the blood draw. Information about the last meal was not available. The dichotomous models are more sensitive to these variations than the numerical models.
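The dichotomization described above can be sketched as a simple mapping from numerical values to 0/1 indicators. This is an illustration only: the reference limits below are hypothetical placeholders, not the thresholds used in the study (which are laboratory-specific and given in the Supplemental Material), and the feature names are invented.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (lower, upper) reference limits for illustration only;
# None means that no limit is checked on that side.
REFERENCE_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "CRP": (None, 5.0),
    "PTT": (None, 40.0),
    "calcium": (2.2, 2.6),
}


def dichotomize(
    values: Dict[str, float],
    age: float,
    ranges: Dict[str, Tuple[Optional[float], Optional[float]]] = REFERENCE_RANGES,
    age_cutoff: int = 60,
) -> Dict[str, int]:
    """Turn numerical laboratory values and age into 0/1 indicator features."""
    features = {f"age_ge_{age_cutoff}": int(age >= age_cutoff)}
    for name, (low, high) in ranges.items():
        value = values.get(name)
        if value is None:
            continue  # laboratory value not measured; no indicator is created
        if high is not None:
            features[f"{name}_high"] = int(value > high)
        if low is not None:
            features[f"{name}_low"] = int(value < low)
    return features


patient = {"CRP": 48.0, "PTT": 31.0, "calcium": 2.05}  # invented example values
print(dichotomize(patient, age=72))
```

Blood sugar is deliberately absent from the placeholder ranges, mirroring its exclusion from the dichotomous predictors.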
Results
Study population
A total of 520 patients (248 (47.7%) female) admitted to the hospital between March 2020 and December 2021 and diagnosed with SARS-CoV-2 are included in our study (see Table 1). Of these, 87 patients (16.7%) died and 89 patients (17.1%) were transferred to ICU during their hospital stay. Because of do-not-resuscitate/do-not-intubate (DNR/DNI) orders or palliative treatment, only a subgroup of the deceased patients was transferred to ICU. Mechanical ventilation was performed on 59 patients. The mean age of our cohort is 60.4 (45.0 – 82.0), which is expected given that age is a well-known risk factor for severe disease progression (Romero Starke et al., 2021).
For each of the three endpoints we divide the patients into two subgroups, depending on whether the endpoint occurred or not. To identify laboratory values that differ between the two respective subgroups we use Wilcoxon rank-sum tests with Bonferroni-Holm adjustment. We restrict this to the first 48 hours of the hospital stay and use the adjusted p-values to rank the laboratory values according to their association with the respective endpoint, see Fig. 2. All laboratory values with an adjusted p-value smaller than 0.05 are considered to be strongly associated with the endpoint.
For the endpoint in-hospital mortality we find 23 laboratory values to be strongly associated (Fig. 2a). These include well-known biomarkers for a severe Covid-19 progression, e.g. lymphocytes (Lymph) and monocytes (Monoc) as hematological biomarkers, CRP, lactate dehydrogenase (LDH) and procalcitonin (PCT) as inflammatory biomarkers, N-terminal prohormone of brain natriuretic peptide (NTpBNp) and glutamic oxaloacetic transaminase (GOT) as cardiac biomarkers, and calcium as a mineral (Samprathi and Jayashree, 2021). The laboratory values with the smallest p-values, urea and creatinine, are known to be elevated at hospital admission in non-survivors compared to survivors of Covid-19 (Wang et al., 2020). In accordance with our findings, Covid-19 is sometimes associated with a coagulation dysfunction, which could be indicated by the significant partial thromboplastin time (PTT), Quick test and INR values (Lin et al., 2021). We report significantly increased levels of the mean corpuscular volume (MCV) and decreased levels of the mean corpuscular hemoglobin concentration (MCHC) for Covid-19 patients who died during their hospital stay; both are known to be altered in Covid-19 patients (Grau et al., 2022).
For the endpoint transfer to ICU we identify 15 laboratory values (Fig. 2b), nine of them overlapping with the strongly associated laboratory values for in-hospital mortality, including blood sugar (Glucose), calcium and CRP. Interestingly, urea and creatinine, the two laboratory values with the smallest p-values for the endpoint in-hospital mortality, are not strongly associated with a transfer to ICU. We find neutrophil granulocytes (Neutro) to be higher for patients transferred to ICU, but not for patients who died in the hospital. Neutrophil granulocytes were previously reported to play an important role in Covid-19-associated thrombosis (Reusch et al., 2021; Zuo et al., 2021). Reduced levels of eosinophils (Eos) and an increase in segmented neutrophils (Seg) are also strongly associated with a transfer to ICU, but not with in-hospital mortality. Low ionized calcium (iCalcium) and calcium are known indicators of a severe Covid-19 disease progression (Zhou et al., 2020).
We find 12 laboratory values to be strongly associated with the necessity for mechanical ventilation (Fig. 2c). They form a subset of the laboratory values strongly associated with transfer to ICU, which is plausible because most of the patients who received mechanical ventilation were transferred to ICU; only seven of them were not.
Overall, it can be seen that just a fraction of the 85 to 90 tested laboratory values show a strong association with the endpoints in our population. In agreement with previous reports we find CRP, blood sugar (Glucose), LDH, and Lymph as markers for the occurrence of either of the adverse events. However, it is interesting that urea and creatinine are the laboratory values with the strongest associations to in-hospital mortality, but are not strongly associated with the other two endpoints.
CRP and blood sugar are good predictors for the Covid-19 associated endpoints in-hospital mortality, transfer to ICU and mechanical ventilation (Figure 3)
We devise prediction models for the occurrence of the endpoints based on biological sex, age and the top-10 laboratory values with the strongest associations to the respective endpoints from Figure 2. We perform 5-fold cross validation on the training data (80%) to select the models and their respective features with the highest ROC-AUC. In Fig. 3 we present results (ROC curves) for predictions of these selected best models on the test data (20%) not used for training, together with violin plots of the predictors based on the entire dataset. In-hospital mortality can be predicted from the combination of the three laboratory values CRP, urea and blood sugar measured within the first 24 hours after admission, augmented by age (Romero Starke et al., 2021), with an AUC of 0.918 (95% CI: 0.857-0.979) using a logistic regression model (Fig. 3a). Urea, the top laboratory value associated with in-hospital death (Fig. 2a), is chosen as a predictive feature, although it is not strongly associated with the other endpoints (Fig. 2b,c).
A more complex nonlinear XGBoost model based on age and four laboratory values predicts the transfer to ICU (Fig. 3b) with an AUC of 0.821 (95% CI: 0.688-0.954). Note that the age distribution for this endpoint differs from that of the deceased patients in Fig. 3a. Compared with the model for in-hospital mortality, the laboratory values GOT and calcium (Ca) are chosen in addition to CRP and blood sugar as predictors for transfer to ICU, whereas urea was eliminated by the feature selection procedure. Some patients exhibit extreme GOT levels, as indicated by the violin plots.
Most patients who were transferred to the ICU also received mechanical ventilation. Nevertheless, prediction of mechanical ventilation is more difficult (Fig. 3c). The best model is a Random Forest based on calcium, CRP and blood sugar with a test AUC of 0.654 (95% CI: 0.498-0.81). These laboratory values are also in the set of predictors for transfer to ICU.
Increased levels of CRP and blood sugar are strongly associated with and important predictors for all three endpoints.
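The width of the confidence intervals reported above reflects the limited number of events in the test set. The text does not state how the intervals were computed; the sketch below uses the Hanley-McNeil normal approximation purely as one common option, and the case counts in the example are hypothetical rather than the study's test-set sizes.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int,
                     level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval for an AUC (Hanley and McNeil, 1982)."""
    if not 0.0 <= auc <= 1.0 or n_pos < 1 or n_neg < 1:
        raise ValueError("need 0 <= auc <= 1 and at least one case per class")
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance)
    # Clip to the valid AUC range.
    return max(0.0, auc - half_width), min(1.0, auc + half_width)


# Hypothetical counts: 20 patients with the endpoint, 80 without.
small = hanley_mcneil_ci(0.8, n_pos=20, n_neg=80)
# Ten times as many patients with the same AUC give a much narrower interval.
large = hanley_mcneil_ci(0.8, n_pos=200, n_neg=800)
print(small, large)
```

Under this approximation the interval narrows roughly with the square root of the number of cases, which is one reason why evaluation on a larger external cohort would make the reported AUCs more precise.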
PTT and GOT are good dichotomous predictors for the Covid-19 associated endpoints in-hospital mortality, transfer to ICU and mechanical ventilation (Figure 4)
The combination of numerical laboratory values and age might still not be simple enough to guide medical decision making under stressful conditions in hospitals. Therefore, we devise models based on dichotomous features indicating whether the value is higher or lower than a predefined critical threshold. In addition, we also use a dichotomous feature for age, indicating whether the patient was younger than 60 years or not.
In-hospital mortality can be predicted from dichotomous values for urea, PTT, GOT and age by logistic regression with an AUC of 0.865 (95% CI: 0.787-0.943), see Fig. 4a. This is only slightly worse than the prediction from numerical features (compare Fig. 3a). Age and urea are included as predictors in both the numerical and dichotomous model for this endpoint.
Using only dichotomized features, transfer to the ICU can be predicted with an average AUC of 0.748 (95% CI: 0.614-0.883), see Fig. 4b. This is nearly as accurate as the prediction from numerical features (compare Fig. 3b). The selected logistic regression model uses the laboratory values calcium, PTT and GOT in combination with biological sex as predictors (Fig. 4b). GOT and calcium are also part of the numerical model. Predicting the necessity of mechanical ventilation using dichotomous features only (Fig. 4c) appears to be no less accurate (AUC of 0.73, 95% CI: 0.565-0.896) than predictions from numerical features (Fig. 3c). For this endpoint, the best performing model is again XGBoost with calcium, CRP, PTT and GOT as predictors. Calcium and CRP are also selected in the model with numerical features, whereas blood sugar is replaced by a combination of PTT and GOT in the model with dichotomous features only. As for the numerical features, neither age nor biological sex as additional features improves the prediction (cross validation on the training data) of the need for mechanical ventilation.
The differences between the features selected for the numerical and dichotomous models indicate that some laboratory values are more suitable for decisions based on dichotomized values (“too high / too low”) than others. The reference range of a laboratory value is defined such that 95% of a healthy reference population have values lying within the reference range, which does not mean that laboratory values lying outside the reference range are automatically critical values (Boyd, 2010). For example, urea and calcium seem to be robust against dichotomization, whereas the absolute level of CRP seems to be more informative than just an increase above the reference level. In contrast, too high a value of PTT seems to be informative even when the absolute level is not considered.
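The definition of a reference range as the interval containing 95% of a healthy reference population can be made concrete with a small sketch. It assumes the common nonparametric convention of taking the 2.5th and 97.5th percentiles as limits; the "healthy" values are synthetic placeholders and the function name is hypothetical.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def reference_interval(healthy_values: Sequence[float]) -> Tuple[float, float]:
    """Central 95% interval: the 2.5th and 97.5th percentiles of a healthy sample."""
    cuts = quantiles(healthy_values, n=40)  # 39 cut points in steps of 2.5%
    return cuts[0], cuts[-1]


# Synthetic placeholder for measurements in a healthy reference population.
healthy = list(range(1, 1001))
low, high = reference_interval(healthy)
outside = sum(1 for v in healthy if v < low or v > high)
print(low, high, outside / len(healthy))
```

By construction, about 5% of healthy individuals fall outside such an interval, which illustrates why a value outside the reference range is not automatically a critical value.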
Summary and Conclusions
All in all, we devise purely data-driven generalizable predictive machine-learning models for a severe Covid-19 outcome using a small, readily interpretable set of standard laboratory values combined with age and biological sex. The endpoints in-hospital mortality and transfer to the ICU can be predicted with high or good accuracy within the first 24 hours after admission. Predicting the need for mechanical ventilation is much more difficult. For all three endpoints, models using only dichotomous features perform only slightly worse than models based on a complex combination of numerical laboratory values, sometimes complemented by age and/or biological sex. In particular, the models based on dichotomous features are simple to interpret and easily applicable in a real-life hospital setting.
For some laboratory values, including CRP and blood sugar, the numerical values are informative for prediction, whereas other laboratory values like PTT and GOT are suitable as dichotomous features indicating values which are too high or too low. We observe that many features, including CRP, blood sugar, LDH and Lymph, are strongly associated with all three endpoints. Intriguingly, urea and creatinine are the laboratory values most strongly associated with in-hospital mortality, although they are not significantly associated with the other two endpoints.
Please note that we also analyzed ICD codes for diagnosis as additional features.
Although significant differences in the frequency of diagnoses between the two patient groups for each endpoint were observed (Supplemental Material Figure S1), the inclusion of these diagnostic features did not improve the models much. This suggests that laboratory values alone are sufficient to predict Covid-19 outcomes in hospitals. In addition, the time of diagnosis is often not available in our data.
Limitations
In our study we include patients admitted to hospital from the beginning of the pandemic until the end of 2021. Because of the rapidly changing epidemiological circumstances of the pandemic, we were not able to test the generalizability of our models to a population in which Omicron is the dominant virus variant. From 2020 until December 2021, the wild type, Alpha, Beta and Delta were the dominant SARS-CoV-2 variants in Germany (Boehm et al., 2021; Schilling et al., 2021). Unfortunately, we cannot determine the virus variant at the patient level, but it is plausible that these were the dominant variants in our dataset.
Furthermore, we have no data on the vaccination status of the patients. Based on the vaccination rate in Germany, we assume that most patients admitted until spring or summer 2021 were not fully vaccinated against Covid-19, whereas the majority of patients admitted after summer 2021 should have been fully vaccinated (Steffen et al., 2022).
The inclusion of vital parameters and vaccination status could improve our models.
Outlook
To test how well our predictions generalize to other hospitals, we will evaluate the performance of the trained models on a test set from a different patient cohort and different hospitals. This will also include extensions to patient cohorts with other dominant virus variants. Further improvements include time-dependent predictions allowing for online monitoring of patients.
Data Availability
Raw data cannot be made available because of German and EU data privacy regulations.
Conflict of Interest
The authors declare no competing interests.
Funding Source
This work was part of the project “Ein Global-Trigger-Tool für COVID-19-bedingte Schwerstschadenereignisse in Krankenhäusern” (A global trigger tool for Covid-19-caused sentinel events in hospitals) funded by the Ministerium für Wissenschaft und Gesundheit Rheinland-Pfalz, Deutschland (ministry of sciences and health of Rhineland-Palatinate, Germany).
Ethical Approval statement
The retrospective observational study was approved by and performed according to the guidelines of the local ethics committees.
Data availability statement
Due to German data protection law we are not allowed to publicly share the patient data used in this publication.
Acknowledgements
We would like to thank Jan Hasenauer for the fruitful discussions.