RT Journal Article
SR Electronic
T1 CovRNN—A recurrent neural network model for predicting outcomes of COVID-19 patients: model development and validation using EHR data
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2021.09.27.21264121
DO 10.1101/2021.09.27.21264121
A1 Laila Rasmy
A1 Masayuki Nigo
A1 Bijun Sai Kannadath
A1 Ziqian Xie
A1 Bingyu Mao
A1 Khush Patel
A1 Yujia Zhou
A1 Wanheng Zhang
A1 Angela Ross
A1 Hua Xu
A1 Degui Zhi
YR 2021
UL http://medrxiv.org/content/early/2021/09/29/2021.09.27.21264121.abstract
AB Background: Predicting outcomes of COVID-19 patients at an early stage is critical for optimized clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, because they require extensive data pre-processing and feature engineering, these models have not been validated or implemented outside of the original study site. Methods: In this study, we propose CovRNN, a set of recurrent neural network (RNN)-based models that predict COVID-19 patients’ outcomes using the electronic health record (EHR) data available on admission, without the need for specific feature selection or missing data imputation. CovRNN is designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and long length of stay (LOS >7 days). Predictions are made for time-to-event risk scores (survival prediction) and all-time risk scores (binary prediction). Our models were trained and validated using heterogeneous, de-identified data from 247,960 COVID-19 patients from 87 healthcare systems, derived from the Cerner® Real-World Dataset (CRWD). External validation was performed using three test sets (approximately 53,000 patients). Further, the transferability of CovRNN was validated using de-identified data from 36,140 patients, derived from the Optum® de-identified COVID-19 Electronic Health Record v. 
1015 dataset (2007–2020). Findings: CovRNN outperforms traditional models. It achieved an area under the receiver operating characteristic curve (AUROC) of 93% for mortality and mechanical ventilation predictions on the CRWD test set (vs. 91.5% and 90% for light gradient boosting machine (LGBM) and logistic regression (LR), respectively) and 86.5% for prediction of LOS >7 days (vs. 81.7% and 80% for LGBM and LR, respectively). For survival prediction, CovRNN achieved a C-index of 86% for mortality and 92.6% for mechanical ventilation. External validation confirmed AUROCs in similar ranges. Interpretation: Trained on a large heterogeneous real-world dataset, our CovRNN model showed high prediction accuracy, good calibration, and transferability through consistently good performance on multiple external datasets. Our results demonstrate the feasibility of a COVID-19 predictive model that delivers high accuracy without the need for complex feature engineering. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by the Cancer Prevention and Research Institute of Texas (CPRIT) Grant No. RP170668 and the UTHealth Innovation for Cancer Prevention Research Training Program Pre-Doctoral Fellowship (CPRIT Grant No. RP160015 and CPRIT Grant No. RP210042). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Committee for the Protection of Human Subjects at the University of Texas Health Science Center in Houston reviewed IRB # HSC-SBMI-20-0836 for the "Analysis of COVID-19 related data in Cerner's HealtheDataLab" project. The committee determined that the project qualifies for exempt status according to 45 CFR 46.101(b). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The data that support the findings of this study, the Cerner® Real-World COVID-19 Q3 Dataset and the Optum® de-identified COVID-19 Electronic Health Record v. 1015 dataset (2007–2020), are available for licensing from Cerner Corporation and Optum, Inc., respectively. Data access may require a data-sharing agreement and may incur data access fees.