Abstract
Background Accurate diagnostic strategies to rapidly identify SARS-CoV-2 positive individuals for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results of this test are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available, with a turn-around time (TAT) usually within 1-2 hours.
Method We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual’s SARS-CoV-2 infection status. Laboratory test results obtained within two days before the release of the SARS-CoV-2 RT-PCR result were used to train a gradient boosted decision tree (GBDT) model on data from 3,346 SARS-CoV-2 RT-PCR tested patients (1,394 positive and 1,952 negative) evaluated at a large metropolitan hospital.
Results The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), supporting the model's generalizability. Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within two days.
Conclusion This model, which employs routine laboratory test results, offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints.
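The Results report an AUC with a 95% confidence interval. The abstract does not say how that interval was derived, so the following is only a minimal sketch of one common approach, a percentile bootstrap over patients, and not necessarily the authors' method. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the toy labels and scores are all illustrative assumptions.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney rank-sum formula.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Assign 1-based ranks to the scores, averaging ranks within tied groups.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


# Toy data (hypothetical): 1 = RT-PCR positive, 0 = RT-PCR negative.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

In practice the scores would be the model's predicted probabilities on held-out patients, and resampling would be done at the patient level so that repeated tests from one individual stay together.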
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The work of FW and YH is supported by the National Science Foundation under grant numbers 1750326 and 1716432, and by the Office of Naval Research under grant number N00014-18-1-2585.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the Institutional Review Board (#20-03021671) of Weill Cornell Medicine.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The patient data used in this study were collected from NYP-WCM and NYP-LMH and are therefore not publicly available.
Abbreviations
- COVID-19: coronavirus disease 2019
- SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
- TAT: turn-around time
- ED: emergency department
- ICU: intensive care unit
- RT-PCR: real-time reverse transcription polymerase chain reaction
- GBDT: gradient boosted decision tree
- HCP: healthcare personnel
- WBC: white blood cells
- RBC: red blood cells
- LDH: lactic acid dehydrogenase
- RDW-CV: red blood cell distribution width
- ALT: alanine aminotransferase
- AST: aspartate aminotransferase
- ALK: alkaline phosphatase
- BUN: blood urea nitrogen
- MCH: mean corpuscular hemoglobin
- MCV: mean corpuscular volume
- aPTT: activated partial thromboplastin time
- CRP: C-reactive protein
- INR: international normalized ratio
- PT: prothrombin time
- AUC: area under the receiver operating characteristic curve