Abstract
Effectively identifying COVID-19 patients using non-PCR clinical data is critical for optimal clinical outcomes. Currently, there is a lack of comprehensive understanding of the various biomedical features and technical approaches needed to accurately detect COVID-19 patients. In this study, we recruited 214 confirmed COVID-19 patients of the non-severe (NS) clinical type and 148 of the severe (S) type, 198 non-infected healthy (H) participants, and 129 patients with non-COVID viral pneumonia (V). The participants' clinical information (23 features), lab testing results (10 features), and thoracic CT scans upon admission were acquired as three input feature modalities. To enable late fusion of the multimodal data, we developed a deep learning model to extract a 10-feature high-level representation of the CT scans. Exploratory analyses showed substantial differences in all features among the four classes. Three machine learning models (k-nearest neighbor [kNN], random forest [RF], and support vector machine [SVM]) were developed based on the 43 features combined from all three modalities to differentiate the four classes (NS, S, V, and H) at once. All three models achieved high accuracy in differentiating the four classes overall (95.4%-97.7%) and each individual class (90.6%-99.9%). Multimodal features provided a substantial performance gain over any single feature modality. Compared with existing binary classification benchmarks, which often focus on a single feature modality, this study provides a novel and effective breakthrough for clinical applications. The findings and analytical workflow can be used as clinical decision support for COVID-19 and for other clinical applications with high-dimensional multimodal biomedical features.
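To make the late-fusion step concrete, here is a minimal, stdlib-only Python sketch. It assumes each participant's three modalities are already numeric vectors (the 10 CT values standing in for the deep learning model's output), fuses them by concatenation into 43 features, standardizes with training-set statistics, and classifies with a plain k-nearest-neighbor vote, one of the three model families named above. The function names (fuse, knn_predict, per_class_accuracy), k=3, z-score scaling, and the one-vs-rest definition of per-class accuracy are illustrative assumptions, not the authors' published pipeline; the CNN feature extractor, RF, and SVM are not shown, and the toy data are synthetic.

```python
import math
from collections import Counter

# Class labels from the abstract: non-severe, severe, viral pneumonia, healthy.
CLASSES = ("NS", "S", "V", "H")


def fuse(clinical, lab, ct):
    """Late fusion by concatenation of per-modality vectors (23 + 10 + 10 = 43)."""
    return list(clinical) + list(lab) + list(ct)


def fit_standardizer(rows):
    """Per-feature mean and population standard deviation from training rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid division by zero
    return means, stds


def standardize(row, means, stds):
    """Z-score a single fused feature vector (assumed scaling choice)."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def per_class_accuracy(y_true, y_pred, classes=CLASSES):
    """One-vs-rest accuracy for each class (an assumed definition)."""
    n = len(y_true)
    return {c: sum((t == c) == (p == c) for t, p in zip(y_true, y_pred)) / n
            for c in classes}


def _toy_participant(class_idx, jitter):
    # Synthetic, well-separated values for illustration only; NOT study data.
    v = class_idx * 5.0 + jitter
    return fuse([v] * 23, [v] * 10, [v] * 10)


def demo(k=3):
    """Fit on three synthetic participants per class, predict one held-out each."""
    train_x, train_y = [], []
    for i, c in enumerate(CLASSES):
        for jitter in (0.0, 0.1, 0.2):
            train_x.append(_toy_participant(i, jitter))
            train_y.append(c)
    means, stds = fit_standardizer(train_x)  # statistics from training data only
    train_z = [standardize(r, means, stds) for r in train_x]
    y_true = list(CLASSES)
    y_pred = [knn_predict(train_z, train_y,
                          standardize(_toy_participant(i, 0.05), means, stds), k)
              for i in range(len(CLASSES))]
    return y_true, y_pred


if __name__ == "__main__":
    y_true, y_pred = demo()
    print(y_pred)
    print(per_class_accuracy(y_true, y_pred))
```

Fitting the scaler on training rows only keeps held-out participants from leaking into the feature scaling; with real data, the same fused vectors could be passed to any other classifier, such as an RF or SVM.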
One Sentence Summary
We trained and validated late-fusion deep learning-machine learning models to predict non-severe COVID-19, severe COVID-19, non-COVID viral infection, and healthy classes from clinical, lab testing, and CT scan features (the latter extracted by a convolutional neural network), achieving a predictive accuracy of >96% in differentiating all four classes at once on a large dataset of 689 participants.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was supported by the North Carolina Biotechnology Center Flash Grant on COVID-19 Clinical Research (2020-FLG-3898), the National Science Foundation for Young Scientists of China (81703201), the Natural Science Foundation for Young Scientists of Jiangsu Province (BK20171076), the Jiangsu Provincial Medical Innovation Team (CXTDA2017029), the Jiangsu Provincial Medical Youth Talent program (QNRC2016548), the Jiangsu Preventive Medicine Association program (Y2018086), the Lifting Program of Jiangsu Provincial Scientific and Technological Association, and the Jiangsu Government Scholarship for Overseas Studies.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was rigorously evaluated and approved by the IRB committees of both Wuhan Union Hospital, Huazhong University of Science and Technology (approval number 2020-IEC-J-345), and Kunshan People Hospital, Jiangsu Provincial Center for Disease Control and Prevention (approval number JSJK2020-8003-01).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All code and de-identified data files are freely available on GitHub.