Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), is highly transmissible and has been responsible for a pandemic associated with a high number of deaths. The clinical management of patients and the optimal use of resources are two important factors in reducing this mortality, especially in scenarios of high incidence. To this end, tools are needed that allow early triage of patients with minimal use of diagnostic tests, based on readily accessible data such as electronic medical records. This work proposes a machine learning model that predicts mortality and risk of hospitalization from simple demographic characteristics and comorbidities, using a COVID-19 dataset of 86,867 patients. In addition, we developed a new method designed to deal with data imbalance problems. The model predicted the patient's final status (expired/discharged) with high accuracy (89-93%, ROC-AUC = 0.94) and the risk of hospitalization with medium accuracy (71-73%, ROC-AUC = 0.75). These models were obtained by assembling and using easily obtainable clinical characteristics (2 demographic characteristics and 19 predictors of comorbidities). The most relevant features of these models were the following patient characteristics: age, sex, number of comorbidities, osteoarthritis, obesity, depression, and renal failure.
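The abstract reports both accuracy and ROC-AUC on data it describes as imbalanced. The following is a minimal sketch of why the two metrics can diverge in that setting. It uses invented toy labels and scores, not the study's data or model, and the function names `roc_auc` and `accuracy` are illustrative assumptions rather than anything from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, made-up example: 9 negatives (e.g. discharged) and 1 positive (e.g. expired).
labels = [0] * 9 + [1]

# A scorer that gives every patient the same low score.
constant_scores = [0.1] * 10

# A scorer that ranks the positive case highest but never crosses the 0.5 threshold.
ranking_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3, 0.45]

print(accuracy(labels, constant_scores), roc_auc(labels, constant_scores))
print(accuracy(labels, ranking_scores), roc_auc(labels, ranking_scores))
```

On this toy data both scorers reach the same accuracy because the majority class dominates, while ROC-AUC separates the uninformative scorer from the one that ranks the positive case correctly. This is one common reason to report a ranking metric alongside accuracy when classes are imbalanced.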
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was supported by the Science and Technology Agency, Seneca Foundation, Comunidad Autonoma Region de Murcia, Spain. AC was supported by the same foundation through the grant 20762/FPI/18. JB was supported by the same foundation through the research project 00007/COVI/20.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Ethics Committee of the University of Murcia gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors and the Murcia health service.