Abstract
Objective To create and validate an ensemble of machine learning algorithms to accurately predict ICU admission or mortality upon initial presentation to the emergency department.
Methods This is a retrospective cohort study of a multicenter hospital system in the United States. The electronic health record was queried from March 2020 to December 2021 for patients who presented to the emergency department and subsequently tested positive for COVID-19. Associated patient demographics, vital signs, and laboratory values were obtained. High-risk individuals were defined as those who required ICU admission or died; low-risk individuals did not meet those criteria. The dataset was split 3:1 into training and testing datasets. A machine learning ensemble stack was built to predict ICU admission and mortality.
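The 3:1 training-to-testing split described in the Methods can be illustrated with a minimal sketch. It assumes a stratified split (the abstract does not state whether stratification was used), and the function name `stratified_split`, the seed, and the toy labels are hypothetical; the labels only mirror the reported class counts and are not patient data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in each part.

    Assumption: stratification by outcome label; a 3:1 split means
    one quarter of each class goes to the testing set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within each class before cutting
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels mirroring the reported class balance (1 = high-risk, 0 = low-risk).
labels = [1] * 1128 + [0] * 2014
train_idx, test_idx = stratified_split(labels)
```

Stratifying keeps the share of high-risk individuals roughly equal in both partitions, which matters when the outcome classes are imbalanced.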
Results Of the 3,142 hospital admissions with a positive COVID-19 test, 1,128 (36%) were labeled as high-risk and 2,014 (64%) as low-risk. We obtained 147 unique variables. CRP, LDH, procalcitonin, glucose, anion gap, creatinine, age, oxygen saturation, oxygen device, and whether an ABG was obtained were chosen as predictors. Six machine learning models were then trained over model-specific hyperparameters and assessed on the testing dataset, generating an area under the receiver operating characteristic curve of 0.751 and a specificity of 95% in predicting high-risk individuals based on an initial emergency department assessment.
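For readers unfamiliar with the two reported metrics, the sketch below shows how an AUROC and a specificity can be computed from predicted risk scores. The function names, the 0.5 threshold, and the toy scores are assumptions for illustration only; they are not the study's data, and the abstract does not state the threshold at which specificity was measured.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive case outscores
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(y_true, scores, threshold):
    """True-negative rate, TN / (TN + FP); a score at or above the
    threshold is treated as a high-risk prediction."""
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tn / (tn + fp)


# Hypothetical labels (1 = high-risk) and model scores, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
scores = [0.1, 0.3, 0.2, 0.6, 0.8, 0.4, 0.9, 0.35]

toy_auroc = auroc(y_true, scores)
toy_specificity = specificity(y_true, scores, threshold=0.5)
```

AUROC does not depend on a threshold, whereas specificity does, so a high specificity generally comes at the cost of sensitivity at the chosen cut-off.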
Conclusion A novel machine learning model to predict ICU admission and patient mortality was developed using data from a multicenter hospital system and validated on unseen data.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was reviewed by the Loyola University Medical Center IRB (study ID: 215607), determining that the study meets federal and university criteria for exemption.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The data are not currently available.
Abbreviations
- ABG: arterial blood gas
- aPTT: activated partial thromboplastin time
- AUROC: area under the receiver operating characteristic curve
- BMI: body mass index
- BNP: brain natriuretic peptide
- BTF: between the flags
- CBC: complete blood count
- CMP: complete metabolic panel
- COVID-19: coronavirus disease 2019
- CRP: C-reactive protein
- EHR: electronic health record
- EN: elastic net
- FDA: flexible discriminant analysis
- GFR: glomerular filtration rate
- ICU: intensive care unit
- INR: international normalized ratio
- LASSO: least absolute shrinkage and selection operator
- LDH: lactate dehydrogenase
- ML: machine learning
- MEWS: modified early warning score
- NB: naïve Bayes
- NEWS: national early warning score
- NLR: neutrophil to lymphocyte ratio
- NPV: negative predictive value
- PCR: polymerase chain reaction
- PPV: positive predictive value
- qSOFA: quick sequential organ failure assessment
- RF: random forest
- SIRS: systemic inflammatory response syndrome
- SVM: support vector machine
- XGBoost: extreme gradient boosted trees