Abstract
Owing to the wide availability of large-scale annotated image datasets, knowledge transfer from pre-trained models has shown outstanding performance in medical image classification. However, building a robust image classification model for datasets with data irregularities or imbalanced classes can be very challenging, especially in the medical imaging domain. In this paper, we propose a novel deep convolutional neural network, which we call the Self Supervised Super Sample Decomposition for Transfer learning (4S-DT) model. 4S-DT encourages coarse-to-fine transfer learning from large-scale image recognition tasks to a specific chest X-ray image classification task using a generic self-supervised sample decomposition approach. Our main contribution is a novel self-supervised learning mechanism guided by a super sample decomposition of unlabelled chest X-ray images. 4S-DT improves the robustness of knowledge transformation via a downstream learning strategy with a class-decomposition layer that simplifies the local structure of the data. 4S-DT can deal with any irregularities in the image dataset by investigating its class boundaries using a downstream class-decomposition mechanism. We used 50,000 unlabelled chest X-ray images to achieve our coarse-to-fine transfer learning, with COVID-19 detection as an exemplar application. 4S-DT achieved a high accuracy of 99.8% (95% CI: 99.44%, 99.98%) in the detection of COVID-19 cases on a large dataset and an accuracy of 97.54% (95% CI: 96.22%, 98.91%) on an extended test set enriched by augmented images of a small dataset, in which all real COVID-19 cases were detected; this was the highest accuracy obtained when compared to other methods.
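To make the class-decomposition idea concrete, the following is a minimal sketch and not the released 4S-DT code. It assumes that each class is split into sub-classes by clustering feature vectors with a plain k-means (the clustering method, the toy 2-D features, and the names `kmeans`, `decompose` and `recompose` are illustrative assumptions), and that sub-class predictions are mapped back to the original class afterwards.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def decompose(features, labels, k=2):
    """Split every class into k sub-classes, e.g. 'covid' -> 'covid_0', 'covid_1'."""
    sub_labels = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        clusters = kmeans([features[i] for i in idx], k)
        for i, c in zip(idx, clusters):
            sub_labels[i] = f"{cls}_{c}"
    return sub_labels


def recompose(sub_label):
    """Map a sub-class label back to its original class."""
    return sub_label.rsplit("_", 1)[0]


# Toy feature vectors standing in for learned image features (hypothetical data).
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
            (10.0, 0.0), (10.1, 0.0), (0.0, 10.0), (0.1, 10.0)]
labels = ["normal"] * 4 + ["covid"] * 4
sub_labels = decompose(features, labels, k=2)
```

A classifier would then be trained on `sub_labels` rather than `labels`, so that each sub-class has a simpler local structure, and its predictions would be passed through `recompose` to recover the original classes.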
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No external funding was received.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
NO IRB
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The developed code, in test mode, is available at https://github.com/asmaa4may/4S-DT. Datasets: In this work, we used two datasets of unlabelled and labelled chest X-ray images, defined respectively as: 1) Unlabelled chest X-ray dataset, a large set of chest X-ray images used as an unlabelled dataset: a set of 50,000 unlabelled chest X-ray images collected from three different datasets: a) 336 cases with a manifestation of tuberculosis and 326 normal cases from [30, 31]; b) 5,863 chest X-ray images with 2 categories, pneumonia and normal, from [32]; and c) a set of 43,475 chest X-ray images randomly selected from a total of 112,120 chest X-ray images, including 14 diseases, available from [33]. 2) COVID-19 dataset, an imbalanced set of labelled chest X-ray images with COVID-19 cases: 80 normal cases from [34, 35], and the chest X-ray dataset from [36], which contains 105 and 11 cases of COVID-19 and SARS, respectively.
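As a quick consistency check on the figures above, the short sketch below tallies the stated source counts. The dictionary keys are descriptive labels chosen here for illustration; only the numbers come from the statement.

```python
# Image counts of the unlabelled set, as listed in the statement above.
unlabelled_sources = {
    "tuberculosis [30, 31]": 336,
    "normal [30, 31]": 326,
    "pneumonia and normal [32]": 5863,
    "random subset of [33]": 43475,
}
total_unlabelled = sum(unlabelled_sources.values())

# Class counts of the labelled COVID-19 dataset.
covid_dataset = {"normal": 80, "COVID-19": 105, "SARS": 11}
total_labelled = sum(covid_dataset.values())

print(total_unlabelled)  # 50000
print(total_labelled)    # 196
```

The three sources add up to exactly the 50,000 unlabelled images reported, and the labelled set contains 196 images, with the SARS class far smaller than the other two.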