ABSTRACT
BACKGROUND Risk prediction scores and classification models are fundamental tools for effectively triaging incoming COVID-19 patients. However, current triaging methods often have poor predictive performance, rely on variables that are expensive to measure, and lead to decisions that can be hard to interpret.
OBJECTIVE We introduce two new classification methods that predict COVID-19 mortality risk with high accuracy and interpretability from the automatic analysis of routine clinical variables. The classifiers, named SVM22-GASS and Clinical-GASS, leverage machine learning methods and clinical expertise, respectively.
METHODS Both classifiers were developed on a derivation cohort of 499 patients and validated on an independent validation cohort of 250 patients. The cohorts included COVID-19-positive patients admitted to two hospitals in the Italian Province of Ferrara between March 2020 and June 2020 (derivation cohort) and between September 2020 and March 2021 (validation cohort). The candidate predictive variables analyzed in this study included demographic, anamnestic, and laboratory data, retrieved from the patients’ electronic health records with their consent.
The SVM22-GASS classifier is based on a Support Vector Machine (SVM) model with a Radial Basis Function (RBF) kernel. Importantly, the model uses only a subset of predictive variables that were automatically selected with the Least Absolute Shrinkage and Selection Operator (LASSO), and the RBF kernel is approximated with random feature expansions to reduce computational requirements. The Clinical-GASS classifier is a threshold-based classifier built on the General Assessment of SARS-CoV-2 Severity (GASS) score, a highly interpretable COVID-19-specific clinical score that was recently shown to predict COVID-19 mortality risk more effectively than standard clinical scores.
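The two ingredients of SVM22-GASS named above, LASSO variable selection and a random-feature approximation of the RBF kernel, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the random feature expansion is assumed to be random Fourier features (one common choice for the RBF kernel), the LASSO is solved by cyclic coordinate descent on standardized inputs without an intercept, and the function names, the `lam`, `gamma`, and `n_features` values, and the synthetic data are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def soft_threshold(rho: float, lam: float) -> float:
    """Proximal operator of lam * |w|: shrinks rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_select(X: List[List[float]], y: List[float], lam: float,
                 n_iter: int = 200) -> Tuple[List[int], List[float]]:
    """Cyclic coordinate descent for min_w (1/2n)||y - Xw||^2 + lam * ||w||_1.

    Assumes standardized columns of X and a centered y (no intercept).
    Returns the indices of non-zero coefficients and the coefficient vector.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    selected = [j for j in range(p) if abs(w[j]) > 1e-8]
    return selected, w


def make_rff(dim: int, n_features: int, gamma: float,
             seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Random Fourier features z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    Assumption: this stands in for the unspecified "random feature expansion".
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)  # RBF spectral density is N(0, 2 * gamma * I)
    W = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(wk * xk for wk, xk in zip(w_row, x)) + bk)
                for w_row, bk in zip(W, b)]

    return transform


# Hypothetical synthetic data: only columns 0 and 2 drive the outcome.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[2] for row in X]
selected, w = lasso_select(X, y, lam=0.1)
print(selected)  # expected: [0, 2]

# Compare the random-feature inner product with the exact RBF kernel value.
z = make_rff(dim=3, n_features=20000, gamma=0.5)
x1, x2 = [0.1, 0.2, 0.3], [0.3, 0.0, 0.1]
approx = sum(a * b for a, b in zip(z(x1), z(x2)))
exact = math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x1, x2)))
print(approx, exact)
```

A linear SVM trained on the mapped features `z(x)` then approximates an RBF-kernel SVM while its cost grows only linearly with the number of patients. In practice, scikit-learn's `Lasso`, `RBFSampler`, and `LinearSVC` cover the same three steps.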
RESULTS The SVM22-GASS model predicted the mortality risk of the validation cohort with an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.87 and an accuracy of 0.88, performing on par with influential classification methods that exploit variables derived from expensive analyses such as medical imaging. Furthermore, variable importance analyses showed that the model relies primarily on eight variables for its predictions: White Blood Cell Count, Lymphocyte Count, Brain Natriuretic Peptide, Creatine Phosphokinase, Lactate Dehydrogenase, Fibrinogen, PaO2/FiO2 Ratio, and High-Sensitivity Troponin I.
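The AUC reported above has a direct interpretation: it is the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The sketch below computes it by exhaustive pairwise comparison (equivalent to the normalized Mann-Whitney U statistic), alongside accuracy at a fixed decision threshold. The 0.5 threshold, the function names, and the toy labels and scores are illustrative assumptions, not values from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of correct predictions when 'died' is predicted for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy example (hypothetical data): 1 = died, 0 = survived.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))   # 0.75
print(accuracy(labels, scores))  # 0.75
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is unproblematic for a cohort of 250 patients; a rank-based formula avoids the quadratic cost on larger data.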
The Clinical-GASS classifier predicted the mortality risk of the validation cohort with an AUC of 0.77 and an accuracy of 0.78, on par with other established and emerging machine-learning-based methods.
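A threshold-based classifier such as Clinical-GASS maps a single clinical score to a binary high-risk or low-risk decision. The abstract does not state how the cutoff is chosen, so the sketch below assumes one common option: maximizing Youden's index (sensitivity + specificity - 1) on the derivation cohort, then applying that cutoff unchanged to the validation cohort. The function names and score values are hypothetical and are not actual GASS scores.

```python
from typing import List, Sequence


def youden_cutoff(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: a patient is classified as high risk when score >= cutoff;
    candidate cutoffs are the distinct observed scores. Ties keep the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcomes to choose a cutoff")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def classify(scores: Sequence[float], cutoff: float) -> List[int]:
    """1 = predicted high mortality risk, 0 = predicted low risk."""
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical derivation data (integer scores, not real GASS values).
deriv_labels = [0, 0, 0, 1, 0, 1, 1, 1]
deriv_scores = [1, 2, 2, 3, 4, 5, 6, 6]
cutoff = youden_cutoff(deriv_labels, deriv_scores)
print(cutoff)                       # 3
print(classify([2, 3, 7], cutoff))  # [0, 1, 1]
```

Fixing the cutoff on the derivation cohort and only applying it to the validation cohort keeps the validation metrics free of threshold tuning on the evaluation data.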
CONCLUSIONS Our results show that COVID-19 mortality risk can be accurately predicted using only routine clinical variables that are readily collected in the very early stages of hospital admission. The classifiers have the potential to help clinicians quickly identify patients’ mortality risk and optimally allocate both human and financial resources.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The local Ethics Committee Comitato Etico Indipendente di Area Vasta Emilia Centro (CE-AVEC) approved the protocol of this study; the protocol code is 712/2020/Oss/AOUFe.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.