RT Journal Article
SR Electronic
T1 Using Machine Learning of Clinical Data to Diagnose COVID-19
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.06.24.20138859
DO 10.1101/2020.06.24.20138859
A1 Li, Wei Tse
A1 Ma, Jiayan
A1 Shende, Neil
A1 Castaneda, Grant
A1 Chakladar, Jaideep
A1 Tsai, Joseph C.
A1 Apostol, Lauren
A1 Honda, Christine O.
A1 Xu, Jingyue
A1 Wong, Lindsay M.
A1 Zhang, Tianyi
A1 Lee, Abby
A1 Gnanasekar, Aditi
A1 Honda, Thomas K.
A1 Kuo, Selena Z.
A1 Yu, Michael Andrew
A1 Chang, Eric Y.
A1 Rajasekaran, Mahadevan “Raj”
A1 Ongkeko, Weg M.
YR 2020
UL http://medrxiv.org/content/early/2020/06/24/2020.06.24.20138859.abstract
AB The recent pandemic of Coronavirus Disease 2019 (COVID-19) has placed severe stress on healthcare systems worldwide, stress that is amplified by the critical shortage of COVID-19 tests. In this study, we aim to generate a more accurate COVID-19 diagnostic model based on patient symptoms and routine test results by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aimed to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model for discriminating between COVID-19 patients and influenza patients based on clinical variables alone. We discovered several novel associations between clinical variables, including correlations between male sex and higher serum levels of lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model that achieved a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients. We demonstrated that computational methods trained on large clinical datasets could yield increasingly accurate COVID-19 diagnostic models that mitigate the impact of the lack of testing.
We also presented previously unknown correlations among COVID-19 clinical variables, as well as previously unknown COVID-19 clinical subgroups.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: University of California, Office of the President/Tobacco-Related Disease Research Program Emergency COVID-19 Research Seed Funding Grant (R00RG2369) to W.M.O.

Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: N/A.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations: CRP, C-reactive protein; ANOVA, analysis of variance; SOM, self-organizing map; XGBoost, Extreme Gradient Boosting; ROC, receiver operating characteristic; AUC, area under the curve; PR, precision-recall.
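The abstract reports the classifier's performance as sensitivity and specificity. The minimal sketch below shows how those two figures are defined, assuming binary labels with 1 = COVID-19 (the positive class) and 0 = influenza. The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating 1 (COVID-19) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 (influenza) or 1 (COVID-19)")
        if t == 1 and p == 1:
            tp += 1  # COVID-19 case correctly flagged
        elif t == 1:
            fn += 1  # COVID-19 case missed
        elif p == 0:
            tn += 1  # influenza case correctly ruled out
        else:
            fp += 1  # influenza case wrongly flagged as COVID-19
    return tp, fn, tn, fp


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    # Both ratios assume at least one positive and one negative example.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels (not study data): 1 = COVID-19, 0 = influenza
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

On these toy labels, 3 of 4 COVID-19 cases are caught (sensitivity 0.750) and 4 of 5 influenza cases are ruled out (specificity 0.800). Read the same way, the reported sensitivity of 92.5% would correspond to roughly 925 of every 1,000 COVID-19 patients being flagged by the model.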