PT - JOURNAL ARTICLE
AU - Rajaraman, Sivaramakrishnan
AU - Sornapudi, Sudhir
AU - Alderson, Philip O
AU - Folio, Les R
AU - Antani, Sameer K
TI - Interpreting Deep Ensemble Learning through Radiologist Annotations for COVID-19 Detection in Chest Radiographs
AID - 10.1101/2020.07.15.20154385
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.07.15.20154385
4099 - http://medrxiv.org/content/early/2020/07/16/2020.07.15.20154385.short
4100 - http://medrxiv.org/content/early/2020/07/16/2020.07.15.20154385.full
AB - Data-driven deep learning (DL) methods using convolutional neural networks (CNNs) demonstrate promising performance in natural image computer vision tasks. However, using these models in medical computer vision tasks suffers from several limitations, viz., (i) adapting to visual characteristics that are unlike natural images; (ii) modeling random noise during training due to stochastic optimization and the backpropagation-based learning strategy; (iii) challenges in explaining DL black-box behavior to support clinical decision-making; and (iv) inter-reader variability in the ground truth (GT) annotations affecting learning and evaluation. This study proposes a systematic approach to address these limitations for COVID-19 detection using chest X-rays (CXRs). Specifically, our contribution benefits from (i) pretraining specific to CXRs in transferring and fine-tuning the learned knowledge toward improving COVID-19 detection performance; (ii) using ensembles of the fine-tuned models to further improve performance compared to individual constituent models; (iii) performing statistical analyses at various learning stages to validate our claims; (iv) interpreting learned individual and ensemble model behavior through class-selective relevance mapping (CRM)-based region of interest (ROI) localization; and (v) analyzing inter-reader variability and ensemble localization performance using Simultaneous Truth and Performance Level Estimation (STAPLE) methods.
We observe that (i) ensemble approaches improved classification and localization performance, and (ii) inter-reader variability and performance level assessment helped guide algorithm design and parameter optimization. To the best of our knowledge, this is the first study to construct ensembles, perform ensemble-based disease ROI localization, and analyze inter-reader variability and algorithm performance for COVID-19 detection in CXRs.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study is supported by the Intramural Research Program (IRP) of the National Library of Medicine (NLM) and the National Institutes of Health (NIH).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: No IRB required since we used publicly available datasets.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Data availability is appropriately cited within the manuscript.