PT - JOURNAL ARTICLE
AU - Chen, Yuanfang
AU - Ouyang, Liu
AU - Bao, Forrest Sheng
AU - Li, Qian
AU - Han, Lei
AU - Zhu, Baoli
AU - Xu, Ming
AU - Liu, Jie
AU - Ge, Yaorong
AU - Chen, Shi
TI - An Interpretable Machine Learning Framework for Accurate Severe vs Non-severe COVID-19 Clinical Type Classification
AID - 10.1101/2020.05.18.20105841
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.05.18.20105841
4099 - http://medrxiv.org/content/early/2020/05/22/2020.05.18.20105841.short
4100 - http://medrxiv.org/content/early/2020/05/22/2020.05.18.20105841.full
AB - Effectively and efficiently diagnosing the clinical type of COVID-19 patients accurately is essential to achieving optimal outcomes for patients and to reducing the risk of overloading the healthcare system. Currently, severe and non-severe COVID-19 types are differentiated by only a few clinical features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 invasion in the different types. In this study, we recruited 214 confirmed COVID-19 patients of non-severe type and 148 of severe type from Wuhan, China. The patients’ comorbidities and symptoms (26 features) and blood biochemistry (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between the two clinical types. Machine learning random forest (RF) models using the features in each modality were developed and validated to classify COVID-19 clinical types. Using comorbidity/symptom and biochemistry features as input independently, the RF models achieved >90% and >95% predictive accuracy, respectively. The importance of the input features was further evaluated based on Gini impurity, and the top five features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes; D-Dimer, hsTNI, neutrophil, IL-6, and LDH). Combining the top 10 multimodal features, the RF model achieved >99% predictive accuracy.
These findings shed light on how the human body reacts to SARS-CoV-2 invasion as a whole and provide insights into effectively evaluating the severity of COVID-19 patients and developing treatment plans accordingly. We suggest that symptoms and comorbidities can be used as an initial screening tool for triaging, while biochemistry features, or the combined features, should be used when accuracy is the priority.
One Sentence Summary: We trained and validated machine learning random forest (RF) models to predict COVID-19 severity based on 26 comorbidity/symptom features and 26 biochemistry features from a cohort of 214 non-severe and 148 severe type COVID-19 patients, identified the top features from both feature modalities to differentiate the clinical types, and achieved predictive accuracy of >90%, >95%, and >99% when comorbidity/symptom, biochemistry, and combined top features were used as input, respectively.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study was jointly supported by the National Science Foundation for Young Scientists of China (81703201), the Natural Science Foundation for Young Scientists of Jiangsu Province (BK20171076), the Jiangsu Provincial Medical Innovation Team (CXTDA2017029), the Jiangsu Provincial Medical Youth Talent program (QNRC2016548), the Jiangsu Preventive Medicine Association program (Y2018086), the Lifting Program of Jiangsu Provincial Scientific and Technological Association, and the Jiangsu Government Scholarship for Overseas Studies.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The code and fully de-identified data will be freely available on GitHub: https://github.com/forrestbao/corona/tree/master/blood
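The abstract ranks input features by Gini impurity and then pools the top five features from each modality into a 10-feature multimodal input. The pure-Python sketch below illustrates those two ideas under stated assumptions: binary severe/non-severe labels, and hypothetical helper names (`gini_impurity`, `impurity_decrease`, `top_k`, `combine_modalities`) and toy data that are illustrations only, not code from the authors' repository. In practice, a library such as scikit-learn exposes the forest-level Gini importance as the `feature_importances_` attribute of a fitted `RandomForestClassifier`.

```python
from collections import Counter
from typing import Dict, List, Sequence


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity, 1 - sum_k p_k**2, of the class labels at one tree node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent: Sequence[str],
                      left: Sequence[str],
                      right: Sequence[str]) -> float:
    """Drop in Gini impurity when `parent` is split into `left` and `right`.

    Child impurities are weighted by the share of parent samples they receive.
    Gini (mean-decrease-in-impurity) importance accumulates this drop, further
    weighted by the fraction of training samples reaching the node, over every
    split on a given feature, and then averages it over all trees.
    """
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))


def top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Names of the k features with the highest importance scores (hypothetical helper)."""
    return sorted(importances, key=lambda name: importances[name], reverse=True)[:k]


def combine_modalities(*modalities: Dict[str, float], k: int = 5) -> List[str]:
    """Pool the top-k features of each modality into one multimodal feature list."""
    selected: List[str] = []
    for importances in modalities:
        selected.extend(top_k(importances, k))
    return selected


if __name__ == "__main__":
    # Toy node (hypothetical): 6 non-severe ("N") and 4 severe ("S") patients.
    parent = ["N"] * 6 + ["S"] * 4
    left = ["N"] * 5 + ["S"]
    right = ["N"] + ["S"] * 3
    print(round(gini_impurity(parent), 3))                    # 0.48
    print(round(impurity_decrease(parent, left, right), 3))   # 0.163

    # Made-up importance scores for two hypothetical modalities.
    clinical = {"feat_a": 0.30, "feat_b": 0.05, "feat_c": 0.20}
    biochem = {"feat_x": 0.10, "feat_y": 0.40}
    print(combine_modalities(clinical, biochem, k=1))         # ['feat_a', 'feat_y']
```

Applied to two 26-feature modalities with `k=5`, `combine_modalities` would yield a 10-feature set analogous to the multimodal input the abstract describes; the actual ranking would come from the importances of the trained forests, not from hand-entered scores.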