RT Journal Article SR Electronic T1 Predicting Critical State after COVID-19 Diagnosis Using Real-World Data from 20152 US Patients JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2020.07.24.20155192 DO 10.1101/2020.07.24.20155192 A1 Rinderknecht, Mike D. A1 Klopfenstein, Yannick YR 2020 UL http://medrxiv.org/content/early/2020/08/11/2020.07.24.20155192.abstract AB The global COVID-19 pandemic caused by the virus SARS-CoV-2 has led to over 10 million confirmed cases and half a million deaths, and is challenging healthcare systems worldwide. With limited medical resources, early identification of patients with a high risk of progression to severe disease or a critical state is crucial. We present a prognostic model predicting critical state within 28 days following COVID-19 diagnosis trained on data from US electronic health records (EHR) within IBM Explorys, including demographics, comorbidities, symptoms, laboratory test results, insurance types, and hospitalization. Our entire cohort included 20152 COVID-19 cases, of which 3160 patients went into critical state or died. Random, stratified train-test splits were repeated 100 times to obtain a distribution of performance. The median and interquartile range of the areas under the receiver operating characteristic curve (ROC AUC) and the precision recall curve (PR AUC) were 0.863 [0.857, 0.866] and 0.539 [0.526, 0.550], respectively. Optimizing the decision threshold led to a sensitivity of 0.796 [0.775, 0.821] and a specificity of 0.784 [0.769, 0.805]. Good model calibration was achieved, showing only a minor tendency to over-forecast probabilities above 0.6. The validity of the model was demonstrated by the interpretability analysis confirming existing evidence on major risk factors (e.g., higher age and weight, male gender, diabetes, cardiovascular disease, and chronic kidney disease). The analysis also revealed higher risk for African Americans and “self-pay patients”.
To the best of our knowledge, this is the largest dataset based on EHR used to create a prognosis model for COVID-19. In contrast to large-scale statistics computing odds ratios for individual risk factors, the present model combining a rich set of covariates can provide accurate personalized predictions enabling early treatment to prevent patients from progressing to a severe or critical state. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: No third party funding was received for this work. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The project was approved by the Data Access and Control Board (IBM Watson Health) to further research into COVID-19 for the greater good. Patients in the US opt in by default and need to actively opt out. If they opt out, their data will not be part of the Explorys database used in this work. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The data are the property of IBM and not in the public domain.