Dynamic Prediction of SARS-CoV-2 RT-PCR status on Chest Radiographs using Deep Learning Enabled Radiogenomics
=============================================================================================================

* Wan Hang Keith Chiu
* Dmytro Poplavskiy
* Sailong Zhang
* Philip Leong Ho Yu
* Michael D. Kuo

## Abstract

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) is the gold standard for diagnosis of SARS-CoV-2 infection, but requires specialized equipment and reagents and suffers from long turnaround times. While valuable, chest imaging currently only detects COVID-19 pneumonia, but if it can predict actual RT-PCR SARS-CoV-2 status is unknown. Radiogenomics may provide an effective and accurate RT-PCR-based surrogate. We describe a deep learning radiogenomics (DLR) model (RadGen) that predicts a patient's RT-PCR SARS-CoV-2 status solely from their frontal chest radiograph (CXR).

## Brief Introduction

Reverse Transcription-Polymerase Chain Reaction (RT-PCR) is the gold standard for diagnosis of SARS-CoV-2 infection1, but requires specialized equipment and reagents and suffers from long turnaround times. While valuable, chest imaging currently only detects COVID-19 pneumonia, but if it can predict actual RT-PCR SARS-CoV-2 status is unknown. Radiogenomics may provide an effective and accurate RT-PCR-based surrogate2. We describe a deep learning radiogenomics (DLR) model (RadGen) that predicts a patient’s RT-PCR SARS-CoV-2 status solely from their frontal chest radiograph (CXR).

## Methods

The RadGen architecture, based on *SE-ResNeXt-50-32×4d*, was pretrained on ImageNet and ChestX-ray14 and 28,430 CXR from PadChest, and Kaggle before fine-tuned using CXR from a multinational cohort of RT-PCR tested patients from Hong Kong, GITHUB, SIRM and BIMCV (6,326 images)3-5. The model first predicted and selected only frontal CXR images, then predicted a segmentation mask of the cropped lung areas to reduce model fitting to unrelated parts of the image before using the segmented area as input for the RT-PCR SARS-CoV-2 binary classification task. The final prediction score was an ensemble consisting of the average of 4 models.

## Results

RadGen achieved a mean Area Under the ROC curve (AUROC) of 0.959 (95%CI 0.955,0.962), sensitivity of 80.8% (3007/3723) and specificity 95.1% (16206/17033) using a pre-determined 0.4 cutpoint. It distinguishes pre-COVID-19, laboratory confirmed pneumonia from SARS2-CoV-2 cases with a specificity of 89.3% (225/252), and a specificity of 96.4% (106/110) on excluding SARS-CoV-2 infection on patients pre-COVID-19 CXR who later became RT-PCR positive.

The RT-PCR-tested cohort in Hong Kong consisted of 314 positive and 2,471 negative patients from 4 hospitals3. The SARS-CoV-2 positive patients had a median of 9 (range 3-24) RT-PCR tests and 3 corresponding (range 1-8) CXR during their hospitalization (range 3-112 days); 90.8% (285/314) had mild/asymptomatic disease. The sensitivity and specificity of RadGen for predicting SARS-CoV-2 infection on initial presenting CXR was 79.5% (225/283) and 85.2% (2105/2471).

RadGen time course analysis by autocorrelation function (ACF) plot, which describes how well RadGen predicts a patient’s RT-PCR SARS-CoV-2 status over the course of their entire SARS-CoV-2 infection period, was performed revealing a peak lag at 2 days for radiogenomic signature manifestation on CXR after initial RT-PCR diagnosis. The per-film false negative rate was 10.0% (26/261) and 9.3% (21/225) within and after 7 days of the first RT-PCR positive test with a false positive rate of 68.1% (32/47) and 11.3% (45/397) within and after one week of achieving RT-PCR confirmed viral clearance (Fig 1).

![Fig. 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/15/2021.01.10.21249370/F1.medium.gif)

[Fig. 1.](http://medrxiv.org/content/early/2021/01/15/2021.01.10.21249370/F1)

Fig. 1. 
Serial CXR of a SARS-CoV-2 RT-PCR positive male patient in his 40s. RadGen correctly predicted the RT-PCR status of the patient on CXR prior to COVID19 (pre-COVID-19), at initial SARS-CoV-2 RT-PCR confirmed diagnosis (Day 0), during his infection period (Day 8) and upon achieving RT-PCR confirmed viral clearance (Day 12).

## Comment

Leveraging a DLR strategy and a rich body of training datasets including Asian and Western countries (reflecting a diverse set of clinical containment protocols), a wide spectrum of clinical presentations including mild and asymptomatic disease, and a prospectively collected multi-timepoint RT-PCR SARS-CoV-2 positive patient cohort, we generated a DLR model capable of predicting a patient’s RT-PCR status from CXR.

Interestingly, we also show that RadGen can non-invasively ‘track’ RT-PCR SARS-CoV-2 status over the course of their infection, from diagnosis to viral clearance. A time-delayed correlation between RadGen and RT-PCR seen at the time of initial RT-PCR positivity and at the time of achieving RT-PCR viral clearance was observed. This is not unexpected as SARS-CoV-2 genomic dosage changes have been shown to take time to accumulate and be phenotypically reflected on a cellular, organ and systems level. Further, it is known that SARS-CoV-2 RNA can persist long after active infectivity and symptom resolution6; thus, that RadGen performs this well, particularly in a mild/asymptomatic cohort, is notable.

In conclusion, the feasibility for DLR models to dynamically track RT-PCR SARS-CoV-2 changes on an individual level significantly expands the scope of radiogenomics.

## Supporting information

Supplement [[supplements/249370_file03.docx]](pending:yes)

## Data Availability

Code and models used in this study are available upon reasonable request to the corresponding author and under a collaboration agreement.

*   Received January 10, 2021.
*   Revision received January 10, 2021.
*   Accepted January 15, 2021.


*   © 2021, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  1.Kuo MD, Jamshidi N. Behind the Numbers: Decoding Molecular Phenotypes with Radiogenomics—Guiding Principles and Technical Considerations. Radiology. 2014;270(2):320–5.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1148/radiol.13132195&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24471381&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.10.21249370.atom) 

2.  2.Segal E, Sirlin CB, Ooi C, Adler AS, Gollub J, Chen X, et al. Decoding global gene expression programs in liver cancer by noninvasive imaging. Nature Biotechnology. 2007;25(6):675–80.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nbt1306&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17515910&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F15%2F2021.01.10.21249370.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000247077500025&link_type=ISI) 

3.  3.Chiu WHK, Vardhanabhuti V, Poplavskiy D, Yu PLH, D.R, Yap AYH, et al. Detection of COVID-19 Using Deep Learning Algorithms on Chest Radiographs. Journal of Thoracic Imaging. 2020;Publish Ahead of Print.
    
    
4.  4. Iglesia la de Vayá M, Saborit JM, Montell JA, Pertusa A, Bustos A, Cazorla M, et al. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients2020 June 01, 2020:[arXiv:2006.01174 p.]. Available from: [https://ui.adsabs.harvard.edu/abs/2020arXiv200601174I](https://ui.adsabs.harvard.edu/abs/2020arXiv200601174I).
    
    
5.  5.Bustos A, Pertusa A, Salinas J-M, de la Iglesia-Vayá M. PadChest: A large chest x- ray image dataset with multi-label annotated reports. arXiv e-prints [Internet]. 2019 January 01, 2019:[arXiv:1901.07441 p.]. Available from: [https://ui.adsabs.harvard.edu/abs/2019arXiv190107441B](https://ui.adsabs.harvard.edu/abs/2019arXiv190107441B).
    
    
6.  6.Cevik M, Kuppalli K, Kindrachuk J, Peiris M. Virology, transmission, and pathogenesis of SARS-CoV-2. Bmj. 2020.