Abstract
With the growing number of COVID-19 cases, especially in developing countries with limited medical resources, it is essential to diagnose COVID-19 accurately and efficiently. Because characteristic ground-glass opacities (GGOs) and other types of lesions are present in both COVID-19 and other acute lung diseases, misdiagnosis is common: it occurs 26.6% of the time in manual interpretations of CT scans. Current deep-learning models can identify COVID-19 but cannot distinguish it from other common lung diseases such as bacterial pneumonia. COVision is a deep-learning model that differentiates COVID-19 from other common lung diseases with high specificity, using CT scans and other clinical factors. COVision was designed to minimize overfitting and complexity by decreasing the number of hidden layers and trainable parameters while still achieving superior performance. Our model consists of two parts: the CNN, which analyzes CT scans, and the CFNN (clinical factors neural network), which analyzes clinical factors such as age and gender. Using federated averaging, we ensembled our CNN with the CFNN to create a comprehensive diagnostic tool. After training, our CNN achieved an accuracy of 95.8% and our CFNN achieved an accuracy of 88.75% on a validation set. COVision performed statistically significantly better than three independent radiologists, each with at least 10 years of experience, especially in differentiating COVID-19 from pneumonia. We analyzed our CNN's activation maps through Grad-CAMs and found that lesions in COVID-19 presented peripherally, closer to the pleura, whereas pneumonia lesions presented centrally.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
1) China Consortium of Chest CT Image Investigation (CC-CCII) Dataset: http://ncov-ai.big.ac.cn/download?lang=en
2) Khorshid COVID Cohort (KCC): https://doi.org/10.6084/m9.figshare.16682422.v1
3) Israeli Ministry of Health: https://data.gov.il/dataset/covid-19/resource/74216e15-f740-4709-adb7-a6fb0955a048
The CT scans of COVID-19, pneumonia, and healthy patients were obtained from the China Consortium of Chest CT Image Investigation (CC-CCII) dataset. Ground truth for the CC-CCII dataset was established via serology tests and confirmed by laboratory findings. Clinical factors for COVID-19 and pneumonia patients were obtained from the Khorshid COVID Cohort (KCC). Clinical factors for healthy patients were obtained from the Israeli Ministry of Health public dataset. We compiled all the clinical-factor data into a CSV file using the pandas and NumPy libraries in Python. We kept only the following clinical factors and removed all others from the dataset: shortness of breath, cough, headache, fever, sore throat, age, and gender. We binarized patient age at a threshold of 60 years (1 if the age is greater than 60 years, 0 if the age is less than 60 years).
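The clinical-factor filtering and age binarization described above could be sketched in pandas roughly as follows. This is a minimal illustration, not the authors' code: the column names, the `patient_id` column, the 0/1 encoding of the symptom and gender columns, and the helper `preprocess_clinical_factors` are all hypothetical. The text does not say how an age of exactly 60 is handled; this sketch assumes it maps to 0.

```python
import pandas as pd

# Hypothetical column names for the seven clinical factors named in the text.
KEPT_COLUMNS = [
    "shortness_of_breath", "cough", "headache",
    "fever", "sore_throat", "age", "gender",
]
AGE_THRESHOLD = 60  # years, as stated in the text


def preprocess_clinical_factors(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the seven clinical factors and binarize age."""
    out = df[KEPT_COLUMNS].copy()
    # 1 if age > 60, else 0. Assumption: an age of exactly 60 maps to 0,
    # since the text leaves that case unspecified.
    out["age"] = (out["age"] > AGE_THRESHOLD).astype(int)
    return out


# Toy rows for illustration only; these are not values from the datasets.
raw = pd.DataFrame({
    "patient_id": [101, 102, 103],  # hypothetical extra column, dropped
    "shortness_of_breath": [1, 0, 0],
    "cough": [1, 1, 0],
    "headache": [0, 1, 0],
    "fever": [1, 0, 0],
    "sore_throat": [0, 1, 0],
    "age": [72, 45, 60],
    "gender": [1, 0, 1],
})

clean = preprocess_clinical_factors(raw)
print(clean)
```

Calling `clean.to_csv(...)` on the result would then produce a single CSV file of the kind the text describes.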
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
The manuscript has been revised for the following reasons:
1) The original introduction highlighted the significance of identifying COVID-19 cases through CT images but did not provide a clear and brief statement of the research question and hypothesis.
2) The methodology section lacked technical details, creating a gap in the paper's depth. It offered only a general overview of the deep learning process and omitted specifics essential for reproducibility and result evaluation. For instance: “The numbers of COVID-19, pneumonia, and healthy slices within the 194,922 isolated CT slices were unspecified. Therefore, it is unclear whether duplicated slices are present in the stratified random samples of 35,000 COVID-19, 35,000 pneumonia, and 35,000 healthy slices, respectively. A similar uncertainty surrounds the clinical factors (features).”
3) There was an inconsistency between the numbers presented in Table 1 and those mentioned in the accompanying text.
4) The ROC curves should have been presented individually for each class, rather than combining all classes together.
5) When comparing the performance of the CNN with that of independent board-certified radiologists, metrics such as AUC, sensitivity, specificity, and other classification-specific measures should have been used instead of a t-test.
Data Availability
All data produced are available online at:
1) China Consortium of Chest CT Image Investigation (CC-CCII) Dataset: http://ncov-ai.big.ac.cn/download?lang=en
2) Khorshid COVID Cohort (KCC): https://doi.org/10.6084/m9.figshare.16682422.v1
3) Israeli Ministry of Health: https://data.gov.il/dataset/covid-19/resource/74216e15-f740-4709-adb7-a6fb0955a048