RT Journal Article SR Electronic T1 Routine laboratory blood tests predict SARS-CoV-2 infection using machine learning JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2020.06.17.20133892 DO 10.1101/2020.06.17.20133892 A1 Yang, He Sarina A1 Vasovic, Ljiljana V. A1 Steel, Peter A1 Chadburn, Amy A1 Hou, Yu A1 Racine-Brzostek, Sabrina E. A1 Cushing, Melissa M. A1 Loda, Massimo A1 Kaushal, Rainu A1 Zhao, Zhen A1 Wang, Fei YR 2020 UL http://medrxiv.org/content/early/2020/07/03/2020.06.17.20133892.abstract AB Background Accurate diagnostic strategies to rapidly identify SARS-CoV-2 positive individuals for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results of this test are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available with a turn-around time (TAT) usually within 1-2 hours. Method We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual’s SARS-CoV-2 infection status. Laboratory test results obtained within two days before the release of the SARS-CoV-2 RT-PCR result were used to train a gradient boosted decision tree (GBDT) model from 3,346 SARS-CoV-2 RT-PCR tested patients (1,394 positive and 1,952 negative) evaluated at a large metropolitan hospital. Results The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), validating its generalizability. 
Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within two days. Conclusion This model employing routine laboratory test results offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints. Competing Interest Statement The authors have declared no competing interest. Funding Statement The work of FW and YH is supported by the National Science Foundation under grant numbers 1750326 and 1716432, and the Office of Naval Research under grant number N00014-18-1-2585. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study was approved by the Institutional Review Board (#20-03021671) of Weill Cornell Medicine. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The patient data used in this study were collected from NYP-WCM and NYP-LMH and are thus not publicly available. Abbreviations: COVID-19, coronavirus disease 2019; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; TAT, turn-around time; ED, emergency department; ICU, intensive care unit; RT-PCR, real-time reverse transcription polymerase chain reaction; GBDT, gradient boosted decision tree; HCP, healthcare personnel; WBC, white blood cells; RBC, red blood cells; LDH, lactic acid dehydrogenase; RDW-CV, red blood cell distribution width; ALT, alanine aminotransferase; AST, aspartate aminotransferase; ALK, alkaline phosphatase; BUN, blood urea nitrogen; MCH, mean corpuscular hemoglobin; MCV, mean corpuscular volume; aPTT, activated partial thromboplastin time; CRP, C-reactive protein; INR, international normalized ratio; PT, prothrombin time; AUC, area under the receiver operating characteristic curve.
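
The abstract reports model performance as an AUC with a 95% confidence interval. As a rough illustration of how such a figure can be computed, the sketch below calculates a rank-based AUC and a percentile bootstrap interval using only the Python standard library. The abstract does not say how the authors obtained their interval, so the bootstrap here is an assumption, and the function names (`auc`, `bootstrap_auc_ci`) and the toy labels and scores are hypothetical, not study data.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive case has the higher score, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not the
    one confirmed by the source). Resamples cases with replacement and
    skips resamples that contain only one class."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap the AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = RT-PCR positive, 0 = RT-PCR negative.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

# 15 of the 16 positive/negative pairs are ordered correctly, so AUC = 0.9375.
point_estimate = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of cases, which is fine for a demonstration; for a cohort of several thousand patients a sort-based rank computation would be the more practical choice.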