Abstract
Background Timely identification of COVID-19 patients at high risk of mortality can significantly improve patient management and resource allocation within hospitals. This study seeks to develop and validate a data-driven personalized mortality risk calculator for hospitalized COVID-19 patients.
Methods De-identified data was obtained for 3,927 COVID-19 positive patients from six independent centers, comprising 33 different hospitals. Demographic, clinical, and laboratory variables were collected at hospital admission. The COVID-19 Mortality Risk (CMR) tool was developed using the XGBoost algorithm to predict mortality. Its discrimination performance was subsequently evaluated on three validation cohorts.
Findings The derivation cohort of 3,062 patients has an observed mortality rate of 26.84%. Increased age, decreased oxygen saturation (≤ 93%), elevated levels of C-reactive protein (≥ 130 mg/L), blood urea nitrogen (≥ 18 mg/dL), and blood creatinine (≥ 1.2 mg/dL) were identified as primary risk factors, validating clinical findings. The model obtains an out-of-sample AUC of 0.90 (95% CI, 0.87-0.94) on the derivation cohort. In the validation cohorts, the model obtains AUCs of 0.92 (95% CI, 0.88-0.95) on Seville patients, 0.87 (95% CI, 0.84-0.91) on Hellenic COVID-19 Study Group patients, and 0.81 (95% CI, 0.76-0.85) on Hartford Hospital patients. The CMR tool is available as an online application at covidanalytics.io/mortality_calculator and is currently in clinical use.
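To make the reported metric concrete, the following is a minimal sketch of how an AUC and a 95% confidence interval can be computed from true outcomes and predicted risks. The abstract does not state how the intervals were obtained; the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_auc_ci` are illustrative rather than part of the CMR tool.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen death (label 1)
    receives a higher predicted risk than a randomly chosen survivor
    (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method,
    not confirmed by the source)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so its AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

With cohorts of a few thousand patients the quadratic pairwise loop is still workable, but a rank-based implementation would be the usual choice for larger datasets.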
Interpretation The CMR model leverages machine learning to generate accurate mortality predictions using commonly available clinical features. This is the first risk score trained and validated on a cohort of COVID-19 patients from Europe and the United States.
Evidence before this study We searched PubMed, BioRxiv, MedRxiv, arXiv, and SSRN for peer-reviewed articles, preprints, and research reports in English from inception to March 25th, 2020 focusing on disease severity and mortality risk scores for patients who had been infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Earlier investigations showed promise at predicting COVID-19 disease severity using data at admission. However, existing work was limited by its data scope, relying either on a single center with rich clinical information or on a broader cohort with sparse clinical information. No analysis has leveraged Electronic Health Records data from an international multi-center cohort from both Europe and the United States.
Added value of this study We present the first multi-center COVID-19 mortality risk study that uses Electronic Health Records data from 3,062 patients across four different countries, including Greece, Italy, Spain, and the United States, encompassing 33 hospitals. We employed state-of-the-art machine learning techniques to develop a personalized COVID-19 mortality risk (CMR) score for hospitalized patients upon admission based on clinical features including vitals, lab results, and comorbidities. The model validates clinical findings of mortality risk factors and exhibits strong performance, with AUCs ranging from 0.81 to 0.92 across external validation cohorts. The model identifies increased age as a primary mortality predictor, consistent with observed disease trends and subsequent public health guidelines. Additionally, among the vital and lab values collected at admission, decreased oxygen saturation (≤ 93%) and elevated levels of C-reactive protein (≥ 130 mg/L), blood urea nitrogen (≥ 18 mg/dL), blood creatinine (≥ 1.2 mg/dL), and blood glucose (≥ 180 mg/dL) are highlighted as key biomarkers of mortality risk. These findings corroborate previous studies that link COVID-19 severity to hypoxemia, impaired kidney function, and diabetes. These features are also consistent with risk factors used in severity risk scores for related respiratory conditions such as community-acquired pneumonia.
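As an illustration only, the admission thresholds listed above can be written down as a simple lookup that flags which biomarkers fall in the reported risk range. This is not the CMR model, which is an XGBoost model producing a continuous risk estimate from many features; the field names and the function `flag_risk_factors` are hypothetical, and age is omitted because no cutoff is given for it.

```python
# Cutoffs as reported in the text; the dictionary keys are hypothetical field names.
THRESHOLDS = {
    "oxygen_saturation_pct": ("<=", 93.0),
    "crp_mg_per_l": (">=", 130.0),
    "bun_mg_per_dl": (">=", 18.0),
    "creatinine_mg_per_dl": (">=", 1.2),
    "glucose_mg_per_dl": (">=", 180.0),
}


def flag_risk_factors(admission):
    """Return the names of admission values that meet a reported cutoff.

    `admission` maps field names to numeric values; missing or None
    values are skipped rather than treated as normal or abnormal.
    """
    flags = []
    for name, (op, cutoff) in THRESHOLDS.items():
        value = admission.get(name)
        if value is None:
            continue
        if (op == "<=" and value <= cutoff) or (op == ">=" and value >= cutoff):
            flags.append(name)
    return flags


# Example with made-up admission values (not patient data).
example = {"oxygen_saturation_pct": 91, "crp_mg_per_l": 50, "glucose_mg_per_dl": 180}
print(flag_risk_factors(example))
```

A flag count of this kind ignores interactions between features, which is one reason a learned model such as the one described here may be preferred over fixed cutoffs.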
Implications of all the available evidence Our work presents the development and validation of a personalized mortality risk score. We take a data-driven approach to derive insights from Electronic Health Records data spanning Europe and the United States. While many existing papers on COVID-19 clinical characteristics and risk factors are based on Chinese hospital data, the similarities in our findings suggest consistency in the disease characteristics across international cohorts. Additionally, our machine learning model offers a novel approach to understanding the disease and its risk factors. By creating a single comprehensive risk score that integrates various admission data components, the calculator offers a streamlined way of evaluating COVID-19 patients upon admission to augment clinical expertise. The CMR model provides a valuable clinical decision support tool for patient triage and care management. By improving risk estimation early in admission, it can significantly affect the daily practice of physicians.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
HW is supported by the National Science Foundation Graduate Research Fellowship under Grant No. 174530. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. No further funding was provided for the study.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All independent organizations and the Massachusetts Institute of Technology institutional review boards approved this protocol as minimal-risk research using data collected for standard clinical practice and waived the requirement for informed consent. The survey was anonymous and confidentiality of information was assured.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Electronic Health Record data cannot be shared publicly because it consists of personal information from which it is difficult to guarantee de-identification. As a result, there is a possibility of deductive disclosure of participants, and therefore full data access through a public repository is not permitted by the institutions that provided us the data. The data and associated documentation from each collaborating institution can only be made available under a new data sharing agreement that includes: 1) a commitment to using the data only for research purposes and not to identify any individual participant; 2) a commitment to securing the data using appropriate measures; and 3) a commitment to destroy or return the data after analyses are complete. Requests can be made to the research team of the corresponding institutions.