A Multi-Task Pipeline with Specialized Streams for Classification and Segmentation of Infection Manifestations in COVID-19 Scans
================================================================================================================================

* Shimaa EL-Bana
* Ahmad Al-Kabbany
* Maha Sharkas

## Abstract

We are concerned with the challenge of coronavirus disease (COVID-19) detection in chest X-ray and Computed Tomography (CT) scans, and the classification and segmentation of related infection manifestations. Even though it is arguably not an established diagnostic tool, machine learning-based analysis of COVID-19 medical scans has shown the potential to provide a preliminary digital second opinion. This can help in managing the current pandemic, and it has therefore been attracting significant research attention. In this research, we propose a multi-task pipeline that takes advantage of the growing advances in deep neural network models. In the first stage, we fine-tuned an Inception-v3 deep model for COVID-19 recognition using multi-modal learning, i.e., using both X-ray and CT scans. In addition to outperforming other deep models on the same task in the recent literature, with an attained accuracy of 99.4%, we also present a comparative analysis of multi-modal learning against learning from X-ray scans alone. The second and third stages of the proposed pipeline complement one another in dealing with different types of infection manifestations. The former features a convolutional neural network architecture for recognizing three types of manifestations, while the latter transfers learning from another knowledge domain, namely, pulmonary nodule segmentation in CT scans, to produce binary masks for segmenting the regions corresponding to these manifestations. Our proposed pipeline also features specialized streams in which multiple deep models are trained separately to segment specific types of infection manifestations, and we show the significant impact that this framework has on various performance metrics.
We evaluate the proposed models on widely adopted datasets, and we demonstrate an increase of approximately 4% and 7% in the dice coefficient and mean intersection-over-union (mIoU), respectively, while achieving a 60% reduction in computational time, compared to the recent literature.

Keywords
*   COVID-19
*   CAD system
*   DeepLab-v3
*   Pneumonia
*   Transfer learning
*   Inception

## 1. Introduction

The Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) [1] is a strain of Severe Acute Respiratory Syndrome-related CoronaVirus (SARS-CoV or SARSr-CoV). The latter is a species of coronaviruses, which are a group of Ribonucleic Acid (RNA) viruses. SARS-CoV-2 causes an infectious respiratory disease known as Coronavirus Disease 2019 (COVID-19), which was first identified in December 2019 following a pneumonia outbreak [2,3]. The first human-to-human transmission was confirmed in January 2020 [4], and the World Health Organization (WHO) declared a pandemic on the 11th of March 2020. Over three million confirmed cases to date, hundreds of thousands of deaths, and a severe socioeconomic impact in hundreds of countries hit by the virus [5,6] have prompted significant efforts from governmental, public, and private sectors worldwide to manage the pandemic. One principal aspect of pandemic management and future epidemic prevention is the development of effective, efficient, and scalable diagnostic tools.

Several diagnostic tools have been used, or are currently under development, for SARS-CoV-2. To the best of our knowledge, nucleic acid tests are the most established and the most widely used tool to date [7], particularly the Polymerase Chain Reaction (PCR) and its variants, such as Quantitative PCR (qPCR) and Reverse Transcription PCR (RT-PCR). PCR is a technique for amplifying DNA (and, via reverse transcription, RNA) samples, as required in microbiology studies. Even though it is characterized by high sensitivity and specificity, in addition to rapid detection, it is prone to producing false negatives. In part, this is due to the localized nature of the sample acquisition process, mainly through nasal, throat, and nasopharyngeal swabs, i.e., an active virus could be present elsewhere along the respiratory tract. PCR-based tests have other limitations as well, including limited availability, especially amid supply shortages, slow turnaround times that depend on the resources of the lab, and, in many cases, the need to repeat the test several times before a result can be confirmed [8]. Other diagnostic tools include antibody tests, which can indicate whether a person was previously infected by the virus. However, they are still not well established; hence, they are not widely used. It is worth mentioning that the recent literature features recommendations for combining more than one diagnostic tool. The authors of [7], for example, suggested adopting a combination of qRT-PCR and CT scans for robust management of COVID-19.

Using CT scans and other modalities, such as X-ray, falls under an ever-growing, fast-paced area of research, namely, medical imaging diagnostics. It has been emerging as a reliable disease diagnosis tool, with several recent findings reporting performance on par with human performance [9,10]. To a large extent, this is due to the advances in developing new machine learning techniques. This has resulted in the emergence of the Human-in-the-loop (HITL) AI framework [11], which aims to harness the strengths of both human experts and machine learning models while simultaneously avoiding their respective limitations. For the current pandemic, though, using imaging as a first-line diagnostic tool for COVID-19 has been controversial to date [12–15]. Meanwhile, to the best of our knowledge, there is a consensus on the possibility of using medical imaging as a digital second opinion, or a complement, to the *gold standard* PCR-based tests. For example, the authors of [16] and [17] highlighted CT scans as, respectively, a tool with diagnostic performance comparable to that of initial RT-PCR, and an important screening tool, especially for patients with initially negative RT-PCR results. Accordingly, fast-paced research has been devoted to harnessing the potential of deep learning-based medical imaging diagnostics, towards the goal of providing a rapid, accurate, scalable, and affordable diagnosis.

Deep neural network models have shown considerable potential for the automatic detection of lung diseases [18–20]. Thanks to their ability to extract and learn meaningful features, deep models can provide an effective alternative to manual labelling by radiologists, a task that is highly dependent on individual clinical experience. Recent literature highlights the adoption of deep neural networks to analyze X-ray and CT scans in order to distinguish COVID-19 cases from healthy subjects. Moreover, COVID-19 presents with a bilateral distribution of patchy shadows and ground-glass opacity in early stages, which progress to multiple ground-glass opacities and infiltrations in both lungs [21]. These features are very similar to those of other types of pneumonia, with only slight differences that are difficult for radiologists to distinguish. Hence, deep models have been used to distinguish COVID-19 from other types of pneumonia, including bacterial and viral pneumonia [22–24]. Deep models have also been used to quantify and segment infection manifestations such as ground-glass opacity (GGO) and pulmonary consolidation, which appear in the early and late stages of infection, respectively [15,25].

In this research, we are inspired by a typical real-life scenario in which a radiologist would employ a deep learning-empowered screening system, first, to recognize/diagnose COVID-19, and then to quantify and segment infection manifestations in X-ray and CT scans. The development of multi-task pipelines has been the subject of previous research [26]. Nevertheless, we demonstrate either competitive or superior performance compared to the recent literature at every stage of the proposed pipeline. The following points summarize the principal contributions of this research:

1.  We outperformed the recent literature on COVID-19 recognition by attaining a classification accuracy of 99.4% for the two-class problem, i.e., COVID-19 vs. non-COVID-19, and 98.1% for the four-class problem of recognizing COVID-19 among scans that include normal cases and other types of pneumonia, in addition to COVID-19. To achieve this performance, we propose a training procedure that involves fine-tuning an Inception-v3 architecture. We present the performance of this architecture under varying learning parameters and using different performance metrics.

2.  For the same stage, we present a comparative analysis of learning from X-ray scans only against learning from X-ray and CT scans combined, i.e., multi-modal learning, and we demonstrate a potential advantage for the latter.

3.  We propose a CNN architecture for multi-label recognition/classification (ML-CNN) of different types of lung infection manifestations. Particularly, we address the problem of estimating the probabilities of the presence of infection manifestations, such as Ground Glass Opacities (GGO), Pleural Effusion (PE), and Consolidation, in medical scans. This is envisaged to have a potential role in the early diagnosis of COVID-19. It is worth mentioning that this problem was not addressed by previous work on multi-task pipelines for COVID-19 [26].

4.  We adapt knowledge from another domain, namely, pulmonary nodule detection, to enhance the segmentation of lung infections in chest CT scans. Particularly, we build on our own previous work [18] on improving the semantic segmentation of pulmonary nodules using the recently proposed DeepLab-v3+ architecture. Moreover, we evaluate the performance of the DeepLab model using an Xception network as the feature-extractor backbone, a configuration that suits client-side applications.

5.  We propose a new learning procedure for semantic segmentation of infection manifestations. It involves training multiple streams, each of which is specialized to segment a specific type of manifestation. We demonstrate the effectiveness of this procedure over single stream-based segmentation, and, compared to the recent literature, we attain an increase of approximately 7% and 4% in mean intersection-over-union (mIoU) and dice coefficient, respectively.

The rest of the paper is organized as follows: Section 2 reviews previous research that incorporates deep learning methods for COVID-19 diagnosis and infection segmentation. Section 3 discusses the proposed multi-stage pipeline and elaborates on the adopted datasets, data augmentation methods, and pre-processing techniques. Section 4 presents and discusses the experimental results, and finally, Section 5 concludes the paper.

## 2. Related Work

This research intersects with four main areas in the literature, namely, COVID-19 recognition based on deep models, segmentation of COVID-19-related infection manifestations based on deep models, multi-task pipelines that can accomplish both tasks, and multi-stream recognition pipelines. In the rest of this section, we highlight the works most relevant to the proposed approach in each of these four areas.

The literature on COVID-19 diagnosis features end-to-end deep models as well as transfer learning approaches. The authors of [27], for example, proposed a COVID-19 classification method for X-ray images that uses deep features computed by a pre-trained convolutional neural network (CNN) model, followed by an SVM classifier. This method attained an accuracy of 95.38% with the ResNet50 model employed as the feature extractor. In [28], a retrospective single-center study was conducted on 78 patients, aiming to investigate the correlation between CT-based manifestations and the clinical classification of COVID-19. With an attained sensitivity of 82.6% and a specificity of 100.0%, the authors concluded that CT-based quantitative analysis is highly correlated with the clinical classification of COVID-19. They also pointed out that CT visual quantitative analysis is highly consistent in terms of the *Total Severity Score* that was introduced in their research. The authors of [29] used a dataset of 150 CT scans to generate two sub-datasets of 16 × 16 and 32 × 32 patches. Deep features were then computed, and an SVM classifier was trained to produce binary labels. They also proposed a novel method for feature ranking and fusion to enhance the performance of their approach. An accuracy of 98.27% and a sensitivity of 98.93% were attained on the latter sub-dataset of patches. A weakly-supervised software system was developed in [30]. It adopts deep learning and uses 3D CT volumes to detect COVID-19, achieving a specificity and sensitivity of 0.911 and 0.907, respectively.

The U-Net model, a CNN proposed in [31], is among the most widely adopted neural networks in medical image segmentation. It was further extended to 3D U-Net [32] and UNet++ [33], which showed promising performance on various image segmentation tasks. The authors of [34] proposed a U-Net-based segmentation technique for COVID-19 that employs an attention mechanism and was applied to 100 CT slices. They obtained a Dice score of 69.1%. In our previous work [18], DeepLab-v3+ [35] was shown to outperform U-Net in pulmonary nodule segmentation. In [36], Fan et al. proposed a novel deep model for COVID-19 lung infection segmentation (Inf-Net) to automatically identify infected regions in chest CT scans. On ground-glass opacities and consolidation, Inf-Net achieved dice coefficients of 0.646 and 0.238, respectively.

The authors of [26] proposed a deep learning model that jointly identifies COVID-19 and segments the related lesions in chest CT scans. Their *three-arm* model consisted of a common encoder and two decoders, for image reconstruction and segmentation, where the image reconstruction arm is meant to enhance feature representation. The third arm featured a multi-layer perceptron for COVID-19 recognition, i.e., a binary classification problem. For the segmentation task, they achieved a dice coefficient of 0.78. Yu-Huan Wu et al. [37] proposed a COVID-19 classification and segmentation system that was trained on a dataset containing 144,167 CT scans collected from 400 COVID-19 patients and 350 uninfected cases. Their JCS model achieved a 78.3% Dice coefficient on the segmentation test set, and a sensitivity of 95.0% and a specificity of 93.0% on the classification test set.

Deep networks with multiple streams have been employed in visual recognition applications. To the best of our knowledge, the authors of [38] were the first to adopt a two-stream ConvNet architecture, which incorporates spatial and temporal networks, for action recognition in videos. In their architecture, the second stream was trained on optical flow, and it was shown to attain good performance despite limited training data. Following [38], other multi-stream techniques that adopt other modalities [39] were proposed. In contrast to these techniques, our multi-stream approach for segmenting infection manifestations trains each stream on a different label, rather than training each stream on a different modality of the whole dataset (all labels). The latter remains a direction for future research, though.

## 3. Proposed Methods

A machine learning-empowered system for COVID-19 diagnostics inherently involves multiple tasks. As a digital second opinion for radiologists, the system would first be required to recognize COVID-19 in medical scans. It might further be asked to differentiate between COVID-19 and other types of pneumonia. Following the recognition of COVID-19, the system would be required to estimate the probability of the presence of different infection manifestations, and would further be asked to accurately segment the regions corresponding to these manifestations. Figure 1 depicts the proposed pipeline, which realizes the aforementioned tasks. First, we employ the Inception-v3 model for image classification, particularly, for COVID-19 recognition. Second, we train a multi-label classifier to infer the probabilities of different types of infection manifestations, namely, Ground Glass Opacities (GGO), Pleural Effusion (PE), and Consolidation. The third stage involves feeding COVID-19 CT scans to a DeepLab-v3+ model, which produces binary segmentation masks that highlight the regions corresponding to infection manifestations. To alleviate the impact of the limited amount of data, we use data augmentation techniques throughout the proposed pipeline. In the rest of this section, we describe the datasets used for training and testing the proposed models and the adopted data augmentation techniques, and we discuss the implementation details of each of the three stages in the pipeline.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F1.medium.gif)

Figure 1. 
The block diagram of the proposed method.
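The three-stage flow described at the beginning of this section can be summarized as a small orchestration function. This is a minimal sketch under stated assumptions: the stage models are passed in as plain callables standing in for the fine-tuned Inception-v3, the multi-label CNN, and the per-manifestation DeepLab-v3+ streams. The names `Scan`, `PipelineOutput`, and `run_pipeline`, as well as the gating rules (stages 2 and 3 only for CT scans, stage 2 on every CT scan, stage 3 only when stage 1 predicts COVID-19), are hypothetical illustrative choices, not a description of the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Mask = List[List[int]]

@dataclass
class Scan:
    """Hypothetical container for one scan."""
    pixels: List[List[float]]
    modality: str  # assumed values: "xray" or "ct"

@dataclass
class PipelineOutput:
    diagnosis: str
    manifestation_probs: Optional[Dict[str, float]] = None
    masks: Optional[Dict[str, Mask]] = None

def run_pipeline(
    scan: Scan,
    classify: Callable[[Scan], str],                      # stage 1: COVID-19 recognition
    manifestations: Callable[[Scan], Dict[str, float]],   # stage 2: multi-label probabilities
    segmenters: Dict[str, Callable[[Scan], Mask]],        # stage 3: one stream per manifestation
) -> PipelineOutput:
    out = PipelineOutput(diagnosis=classify(scan))
    if scan.modality != "ct":
        # Assumption: stages 2 and 3 operate on CT scans only.
        return out
    # Assumption: stage 2 runs on every CT scan, since it may help even if stage 1 is negative.
    out.manifestation_probs = manifestations(scan)
    if out.diagnosis == "covid":
        # Stage 3: each specialized stream produces its own binary mask.
        out.masks = {name: segment(scan) for name, segment in segmenters.items()}
    return out
```

Keeping the stages as independent callables mirrors the fact that the pipeline has no shared encoder, so each stage can use a different architecture.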

### 3.1. Datasets

To the best of our knowledge, the research community still lacks a comprehensive dataset that involves CT and/or X-ray scans and that suits both diagnosis and segmentation tasks at the same time. This necessitates the reliance on multiple datasets if the goal is to develop a multi-task pipeline. For training the proposed deep models, we used the following datasets, which include two of the most widely used datasets in the recent literature [36]:

1.  [COVID-19 CT Segmentation Dataset](https://bit.ly/34UEMdV): This dataset includes 100 axial CT images from 40 patients with COVID-19. The images were segmented by a radiologist using three labels: ground-glass, consolidation, and pleural effusion. Figure 2 shows an example of a COVID-19 CT slice from the dataset.

2.  [The COVID-19 Image Data Collection Repository on GitHub](https://github.com/ieee8023/COVID-chestxray-dataset): This dataset is shared by Dr. Joseph Cohen. It is a growing collection of deidentified chest X-rays (CXRs) and CT scans from COVID-19 cases collected internationally [40].

3.  [The RSNA Pneumonia Detection Challenge Dataset](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge): This dataset is available on Kaggle; it contains deidentified CXRs, each with a label indicating whether the image shows evidence of pneumonia. Figure 3 shows examples of X-ray images from the dataset.

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F2.medium.gif)

Figure 2. 
An example of a CT scan. Figure 2a shows the COVID-19 axial CT slice, and Figure 2b shows the ground-truth segmentation mask. In the latter, the white regions represent consolidation, the dark gray regions represent pleural effusion, and the light gray regions represent ground-glass opacities. Please see sub-section 3.1.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F3.medium.gif)

Figure 3. 
Examples of input X-ray images from the adopted datasets. Please see sub-section 3.1.

### 3.2. Preprocessing and Data Augmentation

All medical scans were resized to 512 × 512 × 3. The Contrast Limited Adaptive Histogram Equalization (CLAHE) method was used to enhance small details, textures, and the local contrast of the images [41]. Local details can therefore be enhanced even in regions that are darker or lighter than most of the image [42]. Since the number of CT volumes is limited, we applied data augmentation in the form of random transformations to reduce over-fitting. These transformations include rotation, horizontal and vertical translation, zooming, and shearing. For each training sample, the transformation parameters were randomly generated, and the same augmentation was applied to every slice in the sampled image. Table 1 summarizes further details about the medical scans in the adopted datasets, such as the types of cases involved, the number of slices in each case and their modalities, and the total number of slices after augmentation.
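The contrast-limiting step at the core of CLAHE can be illustrated on a single tile. The sketch below assumes 8-bit intensities stored as a flat list: it clips the histogram at `clip_limit`, redistributes the clipped counts uniformly over all bins, and maps pixels through the resulting cumulative distribution. Full CLAHE additionally splits the image into a grid of tiles and bilinearly interpolates between neighbouring tile mappings; in practice one would typically call OpenCV (`cv2.createCLAHE(clipLimit=..., tileGridSize=...)` followed by `.apply(image)`). The clip limit and tile size used in this work are not reported, so the values below are hypothetical.

```python
def clip_limited_equalize(pixels, clip_limit, levels=256):
    """Contrast-limited histogram equalization of ONE tile (no tiling/interpolation)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Clip each bin at clip_limit and collect the excess counts.
    excess = 0
    for v in range(levels):
        if hist[v] > clip_limit:
            excess += hist[v] - clip_limit
            hist[v] = clip_limit
    # Redistribute the excess uniformly; the remainder goes to the lowest bins.
    share, remainder = divmod(excess, levels)
    for v in range(levels):
        hist[v] += share + (1 if v < remainder else 0)
    # Cumulative distribution -> lookup table over [0, levels - 1].
    n = len(pixels)
    lut, running = [0] * levels, 0
    for v in range(levels):
        running += hist[v]
        lut[v] = round((levels - 1) * running / n)
    return [lut[p] for p in pixels]

# Toy low-contrast tile: two adjacent grey levels.
tile = [100] * 50 + [101] * 50
print(sorted(set(clip_limited_equalize(tile, clip_limit=10_000))))  # [128, 255] (plain HE)
print(sorted(set(clip_limited_equalize(tile, clip_limit=10))))      # [230, 255] (clipped)
```

With a very large clip limit the function reduces to plain histogram equalization; lowering the clip limit shrinks the gap between the two output levels, which is the contrast limitation that keeps CLAHE from over-amplifying noise in near-uniform regions.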

[Table 1.](http://medrxiv.org/content/early/2020/06/26/2020.06.24.20139238/T1)

Table 1. 
Details of the medical scans included in the adopted datasets, such as the cases available, the number of slices in each case and their modalities, the total number of slices after augmentation, and the task supported by each type of slice.
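The per-sample augmentation described in this sub-section, in which one set of random parameters is drawn per training sample and applied identically to every slice, might look like the following pure-Python sketch. The parameter ranges, the composition order (rotation, then shear, then zoom, about the slice centre, plus a translation), and the nearest-neighbour resampling with zero fill are all assumptions for illustration; the paper does not report them, and a real implementation would use an image library rather than nested lists.

```python
import math
import random

def sample_params(rng):
    """Draw one set of transformation parameters (illustrative ranges, not the paper's)."""
    return {
        "rotation": math.radians(rng.uniform(-10.0, 10.0)),
        "shear": math.radians(rng.uniform(-5.0, 5.0)),
        "zoom": rng.uniform(0.9, 1.1),
        "ty": rng.uniform(-0.05, 0.05),  # vertical shift, fraction of height
        "tx": rng.uniform(-0.05, 0.05),  # horizontal shift, fraction of width
    }

def warp_slice(img, p, fill=0):
    """Nearest-neighbour affine warp of one 2D slice given as a list of rows.

    Output pixel (r, c) samples the input at A @ (r - cy, c - cx) + centre + shift,
    with A = zoom * R(rotation) @ S(shear).
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = math.cos(p["rotation"]), math.sin(p["rotation"])
    k, z = math.tan(p["shear"]), p["zoom"]
    # R = [[cos, -sin], [sin, cos]], S = [[1, k], [0, 1]], so R @ S scaled by zoom is:
    a00, a01 = z * cos_t, z * (cos_t * k - sin_t)
    a10, a11 = z * sin_t, z * (sin_t * k + cos_t)
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            dy, dx = r - cy, c - cx
            src_r = round(a00 * dy + a01 * dx + cy + p["ty"] * h)
            src_c = round(a10 * dy + a11 * dx + cx + p["tx"] * w)
            inside = 0 <= src_r < h and 0 <= src_c < w
            row.append(img[src_r][src_c] if inside else fill)
        out.append(row)
    return out

def augment_volume(volume, rng):
    """Sample parameters once per training sample and apply them to every slice."""
    p = sample_params(rng)
    return [warp_slice(s, p) for s in volume]

vol = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]] * 2
aug = augment_volume(vol, random.Random(0))
print(aug[0] == aug[1])  # identical slices stay identical after augmentation
```

Drawing the parameters once per sample, rather than once per slice, keeps the slices of a volume geometrically consistent with each other and with their masks.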

### 3.3. COVID-19 Recognition Using Transfer Learning on the Inception-v3 Architecture

The Inception-v3 architecture is an evolution of the GoogLeNet architecture. Prior to GoogLeNet, e.g., in the AlexNet and VGGNet architectures, a standard CNN structure consisted of stacked convolutional layers, max-pooling, and fully connected layers. To mitigate over-fitting, high computational demand, and exploding or vanishing gradients, the Inception architecture encouraged sparsity through local sparse structures, namely, the Inception modules/blocks. Each of these blocks consists of four paths and contains filters (convolutions) of different sizes, providing the ability to extract patterns at different spatial scales. Convolutional layers consisting of 1 × 1 filters were used to make the network deeper, and to reduce the model's complexity and number of parameters by reducing the number of input channels. The 1 × 1 convolutional layers also add non-linearity, since each of them is followed by a ReLU [43]. The fully connected layer in this architecture is replaced with a global average pooling layer. Compared to GoogLeNet, Inception-v2 introduced the factorization of convolutions into smaller convolutions, while Inception-v3 extended Inception-v2 with, among other changes, batch normalization of the fully connected layer of the auxiliary classifier [44]. Figure 4 depicts a compressed view of the Inception-v3 [45] model.
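The parameter savings that motivate 1 × 1 bottlenecks and factorized convolutions can be checked with a few lines of arithmetic. The channel sizes below (256 input channels, 64 output channels, a 32-channel bottleneck) are illustrative assumptions rather than actual Inception-v3 layer widths, and biases are ignored.

```python
def conv_weights(c_in, c_out, k_h, k_w=None):
    """Number of weights (biases ignored) of a k_h x k_w convolution from c_in to c_out channels."""
    k_w = k_h if k_w is None else k_w
    return c_in * c_out * k_h * k_w

# 1) A 1 x 1 bottleneck before an expensive 5 x 5 convolution.
direct_5x5 = conv_weights(256, 64, 5)                              # 409,600
bottleneck = conv_weights(256, 32, 1) + conv_weights(32, 64, 5)    # 8,192 + 51,200 = 59,392

# 2) Factorizing a 5 x 5 convolution into two stacked 3 x 3 convolutions (same receptive field).
c = 64
single_5x5 = conv_weights(c, c, 5)        # 25 * c * c
two_3x3 = 2 * conv_weights(c, c, 3)       # 18 * c * c, i.e. 28% fewer weights

# 3) Asymmetric factorization of a 3 x 3 convolution into 1 x 3 followed by 3 x 1.
single_3x3 = conv_weights(c, c, 3)                            # 9 * c * c
asym = conv_weights(c, c, 1, 3) + conv_weights(c, c, 3, 1)    # 6 * c * c, i.e. 33% fewer weights

print(direct_5x5, bottleneck, two_3x3 / single_5x5, asym / single_3x3)
```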

![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F4.medium.gif)

Figure 4. 
A schematic diagram of the Inception-v3 architecture, inspired by the research in [44].

In the first stage of the proposed pipeline, we fine-tuned an Inception-v3 architecture, which consists of a feature extraction stage followed by a classification stage. Instead of training the whole architecture from scratch, we started from a model pre-trained on ImageNet. We kept the weights of the pre-trained model frozen and retrained only the final layer from scratch. The number of output nodes in the last layer equals the number of classes in the dataset. In section 4, we discuss the impact of varying learning parameters, such as the number of steps and the learning rate, on the attained accuracy. We also compare the performance of fine-tuning on multi-modal data, i.e., X-rays and CT scans, against fine-tuning on X-rays only.

### 3.4. A CNN Architecture for Multi-Label Recognition of Infection Manifestations in Chest CT Scans

There are several differences between the proposed pipeline and previous work on multi-task models for COVID-19 [26]. One principal difference is that the second stage of our pipeline addresses a problem that was not handled by previously proposed models [26], namely, inferring the probabilities of the presence of different infection manifestations: Ground Glass Opacities (GGO), Pleural Effusion (PE), and Consolidation. Since the output of the segmentation stage is a binary mask, it carries no information about the types of manifestations that correspond to the segmented regions, i.e., the white regions in the output mask.

COVID-19 CT scans can feature three types of manifestations, namely, ground-glass opacity, consolidation, and pleural effusion. Moreover, a scan may include one or more types of infection; hence, this is a multi-label image recognition/classification problem. Towards the goal of recognizing different manifestations, we propose the CNN architecture shown in Figure 5. The output of this architecture is a vector of three probabilities for the presence of ground-glass opacities, consolidation, and pleural effusion in a CT scan. In a sense, the output of this stage complements the information obtained from the binary segmentation masks produced by the third stage of the pipeline. In addition, we envisage the second stage to play a significant role in early diagnosis even if the output from the first stage does not indicate signs of COVID-19.

![Figure 5.](https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F5.medium.gif)

Figure 5. 
The proposed CNN model for multi-label classification of infection manifestations. As depicted in the figure, the outputs of the model are the probabilities of different types of infection manifestations being present in chest CT scans.

The convolutional layers consist of *M* kernels of size *N* × *N*. Max-pooling is applied in non-overlapping windows of size 2 × 2, so every max-pooling layer halves the spatial size of the feature maps. Two dense layers with 128 and 64 neurons, respectively, are used with a dropout rate of 0.5 to avoid over-fitting, and the ELU activation function is applied. The last layer is a dense layer with a sigmoid activation, which produces the multi-label predictions, and cross-entropy is used as the loss function. Since this is a multi-label problem, we compute a separate binary cross-entropy loss for each class label of an observation and sum the results:

$$\mathcal{L}(y, \hat{y}) = -\sum_{c=1}^{C} \left[ y_c \log \hat{y}_c + (1 - y_c) \log \left(1 - \hat{y}_c\right) \right]$$

where $C = 3$ is the number of classes, $y_c \in \{0, 1\}$ is the ground-truth label for class $c$, and $\hat{y}_c$ is the predicted probability for class $c$.
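As a minimal sketch of the output layer and loss above, assuming the CNN emits one raw score (logit) per manifestation, the per-class sigmoid, the summed binary cross-entropy, and a thresholded multi-label prediction can be written as follows. The class order, the 0.5 decision threshold, and the clipping constant `eps` are assumptions; the text does not state how probabilities are turned into labels.

```python
import math

CLASSES = ("ggo", "consolidation", "pleural_effusion")  # assumed output order

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(y_true, logits, eps=1e-7):
    """Sum over classes of the per-class binary cross-entropy for one observation."""
    loss = 0.0
    for y, z in zip(y_true, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clip to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

def predict_labels(logits, threshold=0.5):
    """Each class is decided independently, so any subset of labels can be active."""
    return {name: sigmoid(z) >= threshold for name, z in zip(CLASSES, logits)}

print(multilabel_bce([1, 0, 1], [0.0, 0.0, 0.0]))  # 3 * ln 2 when every probability is 0.5
print(predict_labels([2.0, -1.0, 0.3]))
```

Unlike a softmax output, the sigmoid outputs do not have to sum to one, which is what allows a slice to contain, for example, both ground-glass opacities and consolidation.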

Another principal difference between the proposed model and the work in [26] is that we deal with each task in the pipeline separately, i.e., there is no common encoder. Hence, we are able to harness the power of different architectures for each task. This becomes apparent in the third stage, where we adopted the DeepLab-v3+ model for segmentation, which was shown in [18] to achieve significantly better results than U-Net, the architecture adopted in [26].

### 3.5. Segmenting Infection Manifestations with Knowledge Adaptation from Pulmonary Nodule Segmentation

The third stage of the proposed pipeline uses the first dataset in sub-section 3.1 and is concerned with pixel-level segmentation of the regions corresponding to infection manifestations in CT scans. We capitalize on our previous work [18], in which we employed the DeepLab-v3+ model with CT scans to enhance the segmentation of pulmonary nodules and attained competitive results compared to the recent literature. The DeepLab-v3+ model was developed by Google, and it includes a simple decoder module to refine the segmentation masks along object boundaries. The model is fed with a single CT slice and is trained to output the corresponding ground-truth mask showing the lesion locations. We explain the elements of the adopted model as follows:

1.  **Atrous Separable Convolution**: This form of convolution [46] is meant to reduce the complexity of the proposed model without compromising its performance. It is applied in the depth-wise convolution, where a depth-wise separable convolution replaces the standard convolution with two consecutive steps, namely, a depth-wise convolution followed by a point-wise convolution (i.e., a 1 × 1 convolution). For 2D signals, for each location *i* on the output feature map *y* and a convolution filter *w*, atrous convolution is applied over the input feature map *x* as follows: $$y[i] = \sum_{k} x[i + r \cdot k]\, w[k],$$ where the atrous rate *r* determines the stride with which the input signal is sampled. Standard convolution is a special case with *r* = 1.

2.  **Encoder**: In segmentation tasks, the objects in an image and their locations represent the essential information required to compute the segmentation masks successfully. This information is expected to be extracted by the encoder. In the proposed pipeline, the primary feature extractor in the DeepLab-v3+ model is an Aligned Xception model, a modified version of the Xception-65 model [47]. Xception is a modification of the Inception architecture in which Inception modules are replaced with depthwise separable convolutions. Moreover, in Aligned Xception, all max-pooling operations are replaced with depthwise separable convolutions with striding. Extra batch normalization and ReLU activation are applied after each 3 × 3 depthwise convolution. Also, the model is made deeper without modifying the entry flow of the network structure. Figure 6 depicts the modified Xception model.

3.  **Decoder**: In this stage, the features computed by the encoder are employed to compute the segmentation masks. First, the encoder features are bilinearly upsampled by a factor of 4 and then concatenated with the corresponding low-level features. A 1 × 1 convolution is applied to the low-level features before concatenation in order to reduce the number of channels. After the concatenation, 3 × 3 convolutions are applied to refine the features, followed by another bilinear upsampling by a factor of 4, as shown for the DeepLab-v3+ model in Figure 1.
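The atrous convolution formula in item 1, the parameter savings of depth-wise separable convolution, and the resolution bookkeeping of the decoder in item 3 can be made concrete with a small pure-Python sketch. It uses a 1D signal for readability (the 2D case applies the same indexing along both axes). `decoder_sizes` assumes an encoder output stride of 16 and low-level features at stride 4, the usual DeepLab-v3+ configuration, because these strides are not stated here.

```python
def atrous_conv1d(x, w, rate=1):
    """y[i] = sum_k x[i + rate * k] * w[k], evaluated at 'valid' positions only."""
    span = rate * (len(w) - 1)
    return [sum(x[i + rate * k] * w[k] for k in range(len(w))) for i in range(len(x) - span)]

def effective_kernel_size(k, rate):
    """Field of view of a k-tap filter with holes: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)

def standard_weights(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_weights(k, c_in, c_out):
    """Depth-wise k x k filter per input channel, then a 1 x 1 point-wise convolution."""
    return k * k * c_in + c_in * c_out

def decoder_sizes(input_size=512, output_stride=16, low_level_stride=4):
    """Spatial side lengths: encoder output, after the first x4, after the final x4."""
    encoder = input_size // output_stride
    upsampled = encoder * 4
    if upsampled != input_size // low_level_stride:
        raise ValueError("upsampled encoder features must match the low-level features")
    return encoder, upsampled, upsampled * 4

x = [1, 2, 3, 4, 5, 6]
print(atrous_conv1d(x, [1, 1, 1], rate=1))  # standard convolution (r = 1)
print(atrous_conv1d(x, [1, 1, 1], rate=2))  # same 3 taps, wider field of view
print(standard_weights(3, 256, 256), separable_weights(3, 256, 256))
print(decoder_sizes())
```

With *r* = 2, a 3-tap filter covers 5 input samples at no extra parameter cost, which is how atrous convolution enlarges the field of view.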

In this work, we started from a pre-trained DeepLab-v3+ model. Particularly, we adapt knowledge from another domain, namely, pulmonary nodule segmentation, to enhance the segmentation of COVID-19 manifestations in CT scans, using the pre-trained model weights obtained in [18]. Furthermore, since we focus on enhancing the segmentation masks, we propose a new learning procedure that involves specialized streams, each of which features a DeepLab-v3+ model trained to segment a specific type of manifestation. In the next section, we present the results of the proposed pipeline, and we elaborate on the gain of training multiple specialized streams compared to a single-stream pipeline.
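One way to prepare data for the specialized streams is to split each multi-class ground-truth mask into one binary target per manifestation, train one model per target, and combine the per-stream binary predictions at inference time. The sketch below assumes hypothetical integer codes for the three labels (the dataset masks encode them as grey levels, see Figure 2) and merges the streams with a pixel-wise OR; how the stream outputs are actually combined is not specified in this section.

```python
from typing import Dict, List

Mask = List[List[int]]

# Hypothetical integer codes for the multi-class ground-truth masks (0 = background).
STREAM_CODES: Dict[str, int] = {"ggo": 1, "consolidation": 2, "pleural_effusion": 3}

def stream_targets(mask: Mask) -> Dict[str, Mask]:
    """Split one multi-class mask into a binary training target per specialized stream."""
    return {
        name: [[1 if v == code else 0 for v in row] for row in mask]
        for name, code in STREAM_CODES.items()
    }

def merge_streams(binary_masks: Dict[str, Mask]) -> Mask:
    """Combine per-stream binary predictions into one infection mask (pixel-wise OR)."""
    masks = list(binary_masks.values())
    h, w = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(w)] for r in range(h)]

targets = stream_targets([[0, 1], [2, 3]])
print(targets["ggo"], merge_streams(targets))
```

Because each stream only ever sees one label, its model can concentrate on the appearance of a single manifestation instead of trading off between all of them.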

![Figure 6.](https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F6.medium.gif)

Figure 6. 
The modified Xception model [48] which is used as the backbone (feature extractor) for the DeepLab-v3+ model in the segmentation stage of the proposed pipeline.

## 4. Results and Discussion

All simulations were carried out on a machine with a GeForce GTX 1080Ti GPU and 8 GB of VRAM. We used Python as the primary programming language and TensorFlow as the backbone in all experiments. This research implements a new multi-task pipeline that is capable of accomplishing the following tasks: 1) COVID-19 classification in X-rays and CT scans, 2) multi-label recognition of COVID-19 manifestations in CT scans, and 3) segmentation of COVID-19 manifestations in CT scans. We adopted the most commonly used performance metrics in the respective areas, i.e., classification and segmentation, namely, sensitivity, specificity, accuracy, precision, F1-Score, Dice Coefficient (DSC), Intersection over Union (IoU), and Matthews Correlation Coefficient (MCC). These metrics are computed as follows:

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \\
\text{F1-Score} &= \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}, \qquad
\text{DSC} = \frac{2\,TP}{2\,TP + FP + FN}, \\
\text{IoU} &= \frac{TP}{TP + FP + FN}, \qquad
\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},
\end{aligned}
$$

where TP, FP, FN, and TN are the numbers of True Positives, False Positives, False Negatives, and True Negatives, respectively.
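The metric definitions above translate directly into code. The following sketch computes all eight metrics from confusion-matrix counts, plus the mean IoU (mIoU) over classes used when comparing segmentation results. It omits guards against zero denominators (e.g., a class absent from both prediction and ground truth), which a real evaluation script would need. Note that for binary masks the DSC and the F1-Score coincide, since both reduce to 2TP / (2TP + FP + FN).

```python
import math

def metrics(tp, fp, fn, tn):
    """Classification/segmentation metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

def mean_iou(per_class_counts):
    """mIoU: average of per-class IoU values, given (tp, fp, fn) for each class."""
    return sum(tp / (tp + fp + fn) for tp, fp, fn in per_class_counts) / len(per_class_counts)

print(metrics(tp=8, fp=2, fn=2, tn=88))
print(mean_iou([(8, 2, 2), (1, 1, 0)]))
```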

For the segmentation task, the training set contains 5000 COVID-19 images and the test set contains 403 images. For the classification task, the training set contains 9618 images and the test set contains 955 images. For the two-class version of the classification problem, i.e., COVID-19 vs. Normal, the training set contains 5219 images and the test set contains 536 images. In the rest of this section, we refer to the COVID-19 classification of stage 1 as **Task 1**, to the multi-label recognition problem of stage 2 as **Task 2**, and to the segmentation problem of stage 3 as **Task 3**.

### 4.1. Results of Task 1: Classification Using a Fine-Tuned Inception-v3 Model

To fine-tune the Inception-v3 model, we used a batch size of 100 for 2800 steps (iterations). Starting from a model pre-trained on ImageNet, we removed the weights of the last layer and retrained it using X-ray and CT scans. The number of output nodes equals the number of classes: 4 for the four-class version of the recognition problem (COVID-19, Normal, Viral Pneumonia, and Bacterial Pneumonia), and 2 for the two-class version (COVID-19 and Normal). The last layer was trained with back-propagation, with the weights adjusted to minimize the cross-entropy between the softmax output and the sample's class label vector.
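
Retraining only the last layer amounts to fitting a softmax classifier on the frozen network's penultimate-layer ("bottleneck") features. The sketch below illustrates this with NumPy, assuming the features have already been extracted and that plain mini-batch gradient descent is used; the optimizer choice and the function names are assumptions, while the defaults (learning rate 0.01, batch size 100, 2800 steps) mirror the values reported in the text.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_last_layer(features, labels, num_classes, lr=0.01, steps=2800,
                     batch_size=100, seed=0):
    """Fit a softmax output layer on frozen features with mini-batch gradient descent.

    features: (N, D) activations from the frozen Inception-v3 body (assumed
    precomputed); labels: (N,) integer class ids in [0, num_classes).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(steps):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        x, y = features[idx], one_hot[idx]
        probs = softmax(x @ w + b)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (probs - y) / len(idx)
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def cross_entropy(features, labels, w, b):
    """Mean cross-entropy between softmax outputs and integer labels."""
    probs = softmax(features @ w + b)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))
```

With zero-initialized weights the initial loss is exactly log(num_classes), so a successful run should drive the loss below that value.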

Table 2 summarizes the results of the fine-tuned Inception-v3 model, trained with a learning rate of 0.01, on the two-class and four-class problems. After 2800 steps, the four-class model achieved training, validation, and testing accuracies of 99.9%, 97.71%, and 98.1%, respectively, whereas the two-class model achieved 98.84%, 99.08%, and 99.4%, respectively. The confusion matrices of the two-class and four-class cases are shown in Fig. 7a and Fig. 7b, respectively, and Fig. 8 shows the accuracy and cross-entropy curves of the two-class model.


Table 2. 
Classification results of the fine-tuned Inception-v3 model for the two-class and the four-class COVID-19 recognition problems. Please see text for more details.

![Figure 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F7.medium.gif)


Figure 7. 
Confusion matrices of the fine-tuned Inception-v3 model for the two-class and the four-class COVID-19 recognition problems. Figure 7a shows the confusion matrix for two classes, and Figure 7b shows the confusion matrix for four classes.

![Figure 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F8.medium.gif)


Figure 8. 
The variation of accuracy and cross-entropy of the Inception-v3 model on the two-class X-ray dataset.

We also compared the performance of the adopted model with other models in the recent literature. Table 3 summarizes the accuracy, sensitivity, specificity, precision, F1-score, and MCC attained by different architectures. The transfer learning approach with Inception-v3 surpassed all other architectures, achieving 99.4% accuracy when trained on X-rays only. When we trained on multi-modal data, i.e., X-rays and CT scans, the accuracy reached 99.5%. We hypothesize that this increase stems from the 3D cues that are inherent in CT scans but missing from X-rays. To avoid an imbalance between the two modalities when training on multi-modal data, we used equal numbers of X-rays and CT scans: the X-rays were randomly under-sampled to match the number of available CT scans. We report the average over 5 such runs, and the complete results of each run are given in Table 4. Note that, because few CT scans are available, uni-modal learning (X-rays only) used a larger number of scans, yet multi-modal learning attained a slightly higher accuracy (99.5% vs. 99.4%). We report this comparison to highlight that multi-modal learning merits further exploration as larger collections of CT scans become available.
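
The balancing protocol (randomly under-sample the X-rays to the number of CT scans, repeat 5 times, and average) can be sketched as follows. The `evaluate` callback, which would train and test a model on one balanced set, is a hypothetical placeholder, and seeding each run with its index is an assumption made for reproducibility.

```python
import random
from statistics import mean


def undersample(xray_items, n_target, seed):
    """Randomly draw n_target X-ray items without replacement."""
    rng = random.Random(seed)
    return rng.sample(xray_items, n_target)


def multimodal_runs(xray_items, ct_items, evaluate, n_runs=5):
    """Build a modality-balanced training set n_runs times and average the scores.

    evaluate: hypothetical caller-supplied function that trains/tests a model
    on a list of items and returns a score such as accuracy.
    """
    scores = []
    for run in range(n_runs):
        xrays = undersample(xray_items, len(ct_items), seed=run)
        scores.append(evaluate(xrays + ct_items))
    return mean(scores), scores
```

Because each run draws a different X-ray subset, the spread of the per-run scores gives a rough sense of how sensitive the result is to the particular subset drawn.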


Table 3. 
Comparing the recognition performance of the proposed model with other models in the recent literature


Table 4. 
The prediction performance for the 5 runs which were carried out on two-class, multi-modal data (X-ray and CT scans).

### 4.2. Results of Task 2: Multi-Label Classification of Infection Manifestations in CT Scans

In the multi-label classifier, each convolutional layer is followed by max pooling and dropout with a rate of 25% to prevent over-fitting. We used 5 × 5 convolution filters and 2 × 2 max-pooling windows, followed by a flattening operation before classification. All layers use the ELU activation except the last one, which uses a sigmoid so that the model outputs an independent probability for each label: ground-glass opacity, consolidation, and pleural effusion. The loss function is binary cross-entropy, the evaluation metric is accuracy, and the optimizer is Adam [51]. The model was trained for 50 epochs. Figure 9 shows the confusion matrix for the three labels in the COVID-19 dataset, and more performance metrics are given in Table 5. **We do not report a comparison between our performance at this stage and the recent literature** because, to the best of our knowledge, this research is the first to address the recognition of different types of infection manifestations. Even the recently proposed multi-task model in [26] has a *recognition arm* that addresses binary classification, identical to the two-class problem addressed by stage 1 of our pipeline, and its segmentation stage does not address multi-label infection recognition either, as it is limited to producing binary masks.
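
The key difference from the softmax classifier of stage 1 is on the output side: each label gets its own sigmoid, so a scan can carry several manifestations at once. The sketch below shows the sigmoid outputs, the binary cross-entropy loss, and per-label 2 × 2 confusion matrices. The 0.5 decision threshold and the `[[TN, FP], [FN, TP]]` layout are assumptions; the text does not state them.

```python
import numpy as np

LABELS = ("ground_glass", "consolidation", "pleural_effusion")


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(y_true, probs, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels."""
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def per_label_confusion(y_true, probs, threshold=0.5):
    """Return one confusion matrix [[TN, FP], [FN, TP]] per label."""
    y_pred = (probs >= threshold).astype(int)
    out = {}
    for j, name in enumerate(LABELS):
        t, p = y_true[:, j], y_pred[:, j]
        out[name] = np.array([
            [np.sum((t == 0) & (p == 0)), np.sum((t == 0) & (p == 1))],
            [np.sum((t == 1) & (p == 0)), np.sum((t == 1) & (p == 1))],
        ])
    return out
```

Unlike softmax outputs, the three sigmoid probabilities of one scan need not sum to 1, which is what allows several manifestations to be flagged simultaneously.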


Table 5. 
Different performance metrics for the adopted multi-label classifier. We show the performance for individual labels as well as the overall performance.

![Figure 9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F9.medium.gif)


Figure 9. 
The confusion matrix of the adopted multi-label classifier.

### 4.3. Results of Task 3: Semantic Segmentation of COVID-19 Infection Manifestations Using Multiple Specialized Streams

As mentioned in the previous section, we initialized the DeepLab-v3+ model using the checkpoint weights used to segment lung cancer nodules in our previous work [18]. We set the learning rate to 0.0001, the momentum to 0.9, the weight decay to 0.00004, and the number of training steps to 50,000. We also set the atrous rates to [6, 12, 18] with an output stride of 16. Fig. 10 presents the output segmentation masks on the COVID-19 validation set, including the output of each specialized stream and that of the all-class stream, i.e., the single stream trained to segment all classes of manifestations at the same time. To complement these qualitative results with objective measures, Table 6 reports the dice coefficient (DSC) and the mean Intersection over Union (mIoU) attained by the all-class stream, by each of the three specialized streams, and by their average. Since the specialized streams outperformed the single-stream approach, we believe this defines an accuracy-complexity trade-off: attaining better DSC and IoU requires the system to include multiple specialized streams. Given that significant resources have already been invested in managing the COVID-19 pandemic, we also believe that developing highly accurate systems takes priority over developing low-complexity ones.
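
The hyperparameters above map onto training flags of the TensorFlow Models DeepLab implementation (`deeplab/train.py`). The sketch below only assembles a command line for one stream. It assumes that implementation was used, which the text does not confirm. All paths, the dataset name `covid_ct_ggo` (a custom dataset would have to be registered with DeepLab's data generator), `decoder_output_stride=4`, and the last-layer initialization settings are hypothetical placeholders, not values from the paper.

```python
import shlex

# Values from the text: Xception-65 backbone, atrous rates, output stride,
# learning rate, momentum, weight decay, number of steps. Everything marked
# "hypothetical" is an illustrative assumption.
FLAGS = {
    "model_variant": "xception_65",
    "atrous_rates": [6, 12, 18],          # passed as a repeated flag
    "output_stride": 16,
    "decoder_output_stride": 4,           # hypothetical: enables the v3+ decoder
    "base_learning_rate": 0.0001,
    "momentum": 0.9,
    "weight_decay": 0.00004,
    "training_number_of_steps": 50000,
    "tf_initial_checkpoint": "/path/to/nodule_model/model.ckpt",  # hypothetical path
    "initialize_last_layer": False,       # hypothetical: re-initialize the logits layer
    "last_layers_contain_logits_only": True,
    "dataset": "covid_ct_ggo",            # hypothetical per-stream dataset name
    "train_logdir": "/path/to/train_logdir",   # hypothetical path
    "dataset_dir": "/path/to/tfrecords",       # hypothetical path
}


def to_argv(flags):
    """Turn the flag dictionary into an argument list for deeplab/train.py."""
    argv = ["python", "deeplab/train.py"]
    for key, value in flags.items():
        values = value if isinstance(value, list) else [value]
        for v in values:  # e.g. --atrous_rates=6 --atrous_rates=12 --atrous_rates=18
            argv.append(f"--{key}={v}")
    return argv


if __name__ == "__main__":
    print(shlex.join(to_argv(FLAGS)))
```

Under this setup, each specialized stream would reuse the same flags and differ only in its dataset and log directory, while the all-class stream would be trained on the multi-class masks.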


Table 6. 
A comparison between the performance of each of the specialized streams and the all-class stream with regard to dice coefficient (DSC) and mean Intersection over Union (mIoU). For all the streams, a DeepLab-v3+ model with an Xception_65 feature extractor is used.
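
For reference, the following sketch shows how DSC and IoU can be computed on a pair of binary masks, and how mIoU averages the per-class IoU over a label map. Two conventions here are assumptions rather than details from the paper: an empty prediction matched with an empty ground truth counts as a perfect score, and mIoU includes the background class and skips classes absent from both maps.

```python
import numpy as np


def dice_and_iou(pred, target):
    """DSC and IoU for one binary mask pair (1 = manifestation, 0 = background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Convention chosen here: two empty masks count as a perfect match.
    dsc = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dsc), float(iou)


def mean_iou(pred_labels, target_labels, num_classes):
    """mIoU over classes 0..num_classes-1 (background included), skipping absent classes."""
    ious = []
    for c in range(num_classes):
        p, t = pred_labels == c, target_labels == c
        union = np.logical_or(p, t).sum()
        if union:
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

Because DSC counts the intersection twice, it is always at least as large as IoU for the same masks, so the two measures in Table 6 should not be compared against each other directly.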

![Figure 10.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F10.medium.gif)


Figure 10. 
The output segmentation masks of the adopted deep models. Column (a) shows the chest CT images of three scans. Column (b) shows the ground-truth masks for these scans, where white represents consolidation, dark gray represents pleural effusion, and light gray corresponds to ground-glass opacities. Column (c) depicts the segmentation results generated by our all-class stream, where red represents consolidation, green represents pleural effusion, and yellow represents ground-glass opacities. Columns (d), (e), and (f) show the outputs of the specialized streams trained to segment ground-glass opacities, pleural effusion, and consolidation, respectively.

To compare the performance of the proposed approach with other models, we report the results for specific types of infection manifestations as well as the overall performance for all types of manifestations. Table 7 shows a manifestation-specific comparison between our model, namely DeepLab-v3+ with transfer learning from pulmonary nodule detection, and other models from the recent literature, including previous research that adopted DeepLab-v3+. Our approach consistently outperforms the other models for both types of manifestations, with increases of approximately 41% and 290% in DSC for ground-glass opacities and consolidation, respectively, compared to the recent literature. For mIoU, the corresponding increases are approximately 77% and 500%.
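
Assuming these percentages are relative increases over the best competing value $M_{\text{ref}}$ in Table 7 (rather than absolute percentage-point differences), they translate into multiplicative factors as follows:

$$\Delta = \frac{M_{\text{ours}} - M_{\text{ref}}}{M_{\text{ref}}} \times 100\% \quad\Longrightarrow\quad M_{\text{ours}} = \left(1 + \frac{\Delta}{100\%}\right) M_{\text{ref}}$$

Under this reading, $\Delta = 290\%$ means the consolidation DSC is about $3.9$ times the reference value and $\Delta = 500\%$ means the consolidation mIoU is about $6$ times it, while the ground-glass increases of $41\%$ and $77\%$ correspond to factors of about $1.41$ and $1.77$.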


Table 7. 
A quantitative comparison of manifestation-specific DSC and mIoU, for ground-glass opacity and consolidation, between our segmentation method and other methods in the recent literature.

We further make a comparison with the recent literature that is not manifestation-specific. Table 8 shows an increase of approximately 7% in mean intersection-over-union (mIoU) and 4% in dice coefficient compared to the recent literature. Figure 11 depicts a qualitative comparison of the output segmentation masks on the COVID-19 validation set obtained using U-Net [52] and DeepLab-v3+ (ours). Our method also requires less time per case than the conventional RT-PCR test and other diagnostic tools [37,53]. Table 9 reports this comparison, which shows a 60% reduction in diagnosis/computational time per case.
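
The text does not describe how the per-case time was measured. One simple way to obtain such a figure is sketched below; `predict` is a hypothetical stand-in for running the full pipeline on one case, and excluding a warm-up call (graph construction, GPU initialization) from the timing is an assumption of this sketch. The reduction is computed relative to the baseline time.

```python
import time
from statistics import mean


def average_time_per_case(predict, cases, warmup=1):
    """Average wall-clock seconds per case for a caller-supplied predict function."""
    for case in cases[:warmup]:
        predict(case)  # warm-up calls are not timed
    durations = []
    for case in cases:
        start = time.perf_counter()
        predict(case)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def percent_reduction(baseline_s, ours_s):
    """Relative time saving, in percent, of ours_s with respect to baseline_s."""
    return 100.0 * (baseline_s - ours_s) / baseline_s
```

For example, a method that needs 4 s per case against a 10 s baseline yields `percent_reduction(10.0, 4.0) == 60.0`.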


Table 8. 
A quantitative comparison on the COVID-19 segmentation dataset between our segmentation method and other methods in the recent literature. The comparison considers DSC and mIoU. It also considers the overall performance on the three different types of infection manifestations, i.e., it is not a manifestation-specific comparison.


Table 9. 
A comparison between the proposed method and other diagnostic tools in the literature concerning the average diagnosis time per case.

![Figure 11.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/06/26/2020.06.24.20139238/F11.medium.gif)


Figure 11. 
Segmentation output visualization results. Row (a) shows the chest CT images of two scans. Row (b) shows the ground-truth masks for these scans, where white represents consolidation, dark gray represents pleural effusion, and light gray corresponds to ground-glass opacities. Row (c) shows the outputs of U-Net, and row (d) shows the segmentation results generated by our model.

## 5. Conclusion

In this research, we proposed a multi-task pipeline for the recognition of COVID-19 and the classification and segmentation of related infection manifestations in medical scans. We were inspired by the emerging role that medical imaging-based diagnostics can play as a digital second opinion in managing the current pandemic. The proposed pipeline starts with a COVID-19 recognition stage, for which we fine-tuned an Inception-v3 model pre-trained on ImageNet. We evaluated this model on two tasks, namely, the two-class problem of COVID-19/non-COVID-19 recognition, and the four-class problem of distinguishing COVID-19 scans from scans of normal, viral pneumonia, and bacterial pneumonia cases. Our approach consistently outperformed other techniques in the recent literature on both classification problems. To the best of our knowledge, we are also the first to highlight a potential advantage of multi-modal learning, i.e., learning from X-rays and CT scans, over learning from X-rays only. In the second stage, we addressed a problem that had not been addressed in the recent literature, namely, estimating the probability of presence of different types of infection manifestations in medical scans. This stage was implemented using a multi-label CNN classifier, and we envisage its potential to serve in the early detection of infection manifestations. It also complements the third stage, which computes binary masks that segment the infection regions in CT scans. For effective segmentation, we transferred knowledge from another domain, namely, pulmonary nodule segmentation. This approach resulted in increases of approximately 4% and 7% in dice coefficient and mean intersection-over-union (mIoU), respectively, while requiring 60% less computational time, compared to the recent literature.
All stages of the proposed pipeline were trained and tested using widely adopted datasets and evaluated using various objective measures. We also used data augmentation techniques to avoid the over-fitting that might have occurred due to the relatively limited volume of available data. To further enhance the segmentation stage, we showed that using multiple streams, each trained to segment a specific type of infection manifestation, can significantly improve the quality of the output masks, as measured by DSC and mIoU.

## Data Availability

N/A

## Author Contributions

Conceptualization, S.E.-B.; Formal analysis, S.E.-B. and A.A.-K.; Investigation, S.E.-B.; Methodology, S.E.-B. and A.A.-K.; Supervision, A.A.-K. and M.S.; Visualization, S.E.-B.; Writing – original draft, S.E.-B. and A.A.-K.; Writing – review and editing, S.E.-B., A.A.-K. and M.S.

## Funding

This research received no external funding.

## Conflicts of Interest

The authors declare no conflicts of interest.

## Footnotes

*   shimaaelbana{at}yahoo.com

*   msharkas{at}aast.edu

*   Received June 24, 2020.
*   Revision received June 24, 2020.
*   Accepted June 26, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology 2020, p. 1.

2.  Lai, C.C., Shih, T.P., Ko, W.C., Tang, H.J., Hsueh, P.R. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges. International journal of antimicrobial agents 2020, p. 105924.

3.  Sharfstein, J.M., Becker, S.J., Mello, M.M. Diagnostic testing for the novel coronavirus. JAMA 2020.

4.  Chan, J.F.W., Yuan, S., Kok, K.H., To, K.K.W., Chu, H., Yang, J., Xing, F., Liu, J., Yip, C.C.Y., Poon, R.W.S., others. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet 2020, 395, 514–523.

5.  Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., Ren, R., Leung, K.S., Lau, E.H., Wong, J.Y., others. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. New England Journal of Medicine 2020.

6.  Wu, J.T., Leung, K., Leung, G.M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet 2020, 395, 689–697.

7.  Tahamtan, A., Ardebili, A. Real-time RT-PCR in COVID-19 detection: issues affecting the results, 2020.

8.  Chu, D.K., Pan, Y., Cheng, S.M., Hui, K.P., Krishnan, P., Liu, Y., Ng, D.Y., Wan, C.K., Yang, P., Wang, Q., others. Molecular diagnosis of a novel coronavirus (2019-nCoV) causing an outbreak of pneumonia. Clinical chemistry 2020, 66, 549–555. doi:10.1093/clinchem/hvaa029.

9.  Liu, X., Faes, L., Kale, A.U., Wagner, S.K., Fu, D.J., Bruynseels, A., Mahendiran, T., Moraes, G., Shamdas, M., Kern, C., others. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The lancet digital health 2019, 1, e271–e297.

10. Shen, J., Zhang, C.J., Jiang, B., Chen, J., Song, J., Liu, Z., He, Z., Wong, S.Y., Fang, P.H., Ming, W.K. Artificial intelligence versus clinicians in disease diagnosis: Systematic review. JMIR medical informatics 2019, 7, e10010.

11. Patel, B.N., Rosenberg, L., Willcox, G., Baltaxe, D., Lyons, M., Irvin, J., Rajpurkar, P., Amrhein, T., Gupta, R., Halabi, S., others. Human–machine partnership with artificial intelligence for chest radiograph diagnosis. NPJ digital medicine 2019, 2, 1–10.

12. Hope, M.D., Raptis, C.A., Shah, A., Hammer, M.M., Henry, T.S. A role for CT in COVID-19? What data really tell us so far. The Lancet 2020, 395, 1189–1190.

13. Fang, Y., Zhang, H., Xie, J., Lin, M., Ying, L., Pang, P., Ji, W. Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology 2020, p. 200432.

14. Zu, Z.Y., Jiang, M.D., Xu, P.P., Chen, W., Ni, Q.Q., Lu, G.M., Zhang, L.J. Coronavirus disease 2019 (COVID-19): a perspective from China. Radiology 2020, p. 200490.

15. Ai, T., Yang, Z., Hou, H., Zhan, C., Chen, C., Lv, W., Tao, Q., Sun, Z., Xia, L. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 2020, p. 200642.

16. He, J.L., Luo, L., Luo, Z.D., Lyu, J.X., Ng, M.Y., Shen, X.P., Wen, Z. Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China. Respiratory Medicine 2020, p. 105980.

17. Chen, D., Jiang, X., Hong, Y., Wen, Z., Wei, S., Peng, G., Wei, X. Can Chest CT Features Distinguish Patients With Negative From Those With Positive Initial RT-PCR Results for Coronavirus Disease (COVID-19)? American Journal of Roentgenology 2020, pp. 1–5.

18. EL-Bana, S., Al-Kabbany, A., Sharkas, M. A Two-Stage Framework for Automated Malignant Pulmonary Nodule Detection in CT Scans. Diagnostics 2020, 10, 131.

19. Polat, H., Danaei Mehr, H. Classification of pulmonary CT images by using hybrid 3D-deep convolutional neural network architecture. Applied Sciences 2019, 9, 940.

20. Nasrullah, N., Sang, J., Alam, M.S., Mateen, M., Cai, B., Hu, H. Automated lung nodule detection and classification using deep learning combined with multiple strategies. Sensors 2019, 19, 3722.

21. Wang, D., Hu, B., Hu, C., Zhu, F., Liu, X., Zhang, J., Wang, B., Xiang, H., Cheng, Z., Xiong, Y., others. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus–infected pneumonia in Wuhan, China. JAMA 2020, 323, 1061–1069. doi:10.1001/jama.2020.1585.

22. Narin, A., Kaya, C., Pamuk, Z. Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849 2020.

23. Wang, L., Wong, A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images. arXiv preprint arXiv:2003.09871 2020.

24. Song, Y., Zheng, S., Li, L., Zhang, X., Zhang, X., Huang, Z., Chen, J., Zhao, H., Jie, Y., Wang, R., others. Deep learning Enables Accurate Diagnosis of Novel Coronavirus (COVID-19) with CT images. medRxiv 2020.

25. Ye, Z., Zhang, Y., Wang, Y., Huang, Z., Song, B. Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review. European radiology 2020, pp. 1–9.

26. Amyar, A., Modzelewski, R., Ruan, S. Multi-Task Deep Learning Based CT Imaging Analysis for COVID-19: Classification and Segmentation.

27. Sethy, P.K., Behera, S.K. Detection of coronavirus Disease (COVID-19) based on Deep Features 2020.

28. Li, K., Fang, Y., Li, W., Pan, C., Qin, P., Zhong, Y., Liu, X., Huang, M., Liao, Y., Li, S. CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19). European Radiology 2020, pp. 1–10.

29. Ozkaya, U., Ozturk, S., Barstugan, M. Coronavirus (COVID-19) Classification using Deep Features Fusion and Ranking Technique. arXiv preprint arXiv:2004.03698 2020.

30. Zheng, C., Deng, X., Fu, Q., Zhou, Q., Feng, J., Ma, H., Liu, W., Wang, X. Deep Learning-based Detection for COVID-19 from Chest CT using Weak Label. medRxiv 2020.

31. Ronneberger, O., Fischer, P., Brox, T. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.

32. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O. 3D U-Net: learning dense volumetric segmentation from sparse annotation. International conference on medical image computing and computer-assisted intervention. Springer, 2016, pp. 424–432.

33. Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Transactions on Medical Imaging 2019.

34. Zhou, T., Canu, S., Ruan, S. An automatic COVID-19 CT segmentation based on U-Net with attention mechanism. arXiv preprint arXiv:2004.06673 2020.

35. Chen, L.C., Papandreou, G., Schroff, F., Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 2017.

36. Fan, D.P., Zhou, T., Ji, G.P., Zhou, Y., Chen, G., Fu, H., Shen, J., Shao, L. Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Scans. medRxiv 2020.

37. Wu, Y.H., Gao, S.H., Mei, J., Xu, J., Fan, D.P., Zhao, C.W., Cheng, M.M. JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation. arXiv preprint arXiv:2004.07054 2020.

38. Simonyan, K., Zisserman, A. Two-stream convolutional networks for action recognition in videos. Advances in neural information processing systems, 2014, pp. 568–576.

39. Zhang, B., Wang, L., Wang, Z., Qiao, Y., Wang, H. Real-time action recognition with enhanced motion vector CNNs. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2718–2726.

40. Cohen, J.P., Morrison, P., Dao, L. COVID-19 image data collection. arXiv preprint arXiv:2003.11597 2020.

41. Zuiderveld, K. Contrast limited adaptive histogram equalization. Graphics gems IV. Academic Press Professional, Inc., 1994, pp. 474–485.

42. Koonsanit, K., Thongvigitmanee, S., Pongnapang, N., Thajchayapong, P. Image enhancement on digital x-ray images using N-CLAHE. 2017 10th Biomedical Engineering International Conference (BMEiCON). IEEE, 2017, pp. 1–4.

43. Mahdianpari, M., Salehi, B., Rezaee, M., Mohammadimanesh, F., Zhang, Y. Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery. Remote Sensing 2018, 10, 1119.

44. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. Rethinking the inception architecture for computer vision. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.

45. Xia, X., Xu, C., Nan, B. Inception-v3 for flower classification. 2017 2nd International Conference on Image, Vision and Computing (ICIVC). IEEE, 2017, pp. 783–787.

46. Chen, G., Li, C., Wei, W., Jing, W., Wozńiak, M., Blažauskas, T., Damaševičius, R. Fully Convolutional Neural Network with Augmented Atrous Spatial Pyramid Pool and Fully Connected Fusion Path for High Resolution Remote Sensing Image Segmentation. Applied Sciences 2019, 9, 1816.

47. Chollet, F. Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.

48. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801–818.

49. Loey, M., Smarandache, F., Khalifa, N.E.M. Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based on GAN and Deep Transfer Learning. Symmetry 2020, 12, 651.

50. Zhao, J., Zhang, Y., He, X., Xie, P. Covid-ct-dataset: a ct scan dataset about covid-19. arXiv preprint arXiv:2003.13865 2020.

51. Kingma, D.P., Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.

52. Chen, X., Yao, L., Zhang, Y. Residual Attention U-Net for Automated Multi-Class Segmentation of COVID-19 Chest CT Images. arXiv preprint arXiv:2004.05645 2020.

53. Huang, Z., Zhao, S., Li, Z., Chen, W., Zhao, L., Deng, L., Song, B. The Battle Against Coronavirus Disease 2019 (COVID-19): Emergency Management and Infection Control in a Radiology Department. Journal of the American College of Radiology 2020.

54. Won, J., Lee, S., Park, M., Kim, T.Y., Park, M.G., Choi, B.Y., Kim, D., Chang, H., Kim, V.N., Lee, C.J., others. Development of a Laboratory-safe and Low-cost Detection Protocol for SARS-CoV-2 of the Coronavirus Disease 2019 (COVID-19). Molecular Cell 2018, 70, 72–82. doi:10.1016/j.molcel.2018.03.004.

 [1]: /embed/graphic-7.gif
 [2]: /embed/graphic-8.gif
 [3]: /embed/graphic-10.gif
 [4]: /embed/graphic-11.gif
 [5]: /embed/graphic-12.gif
 [6]: /embed/graphic-13.gif
 [7]: /embed/graphic-14.gif
 [8]: /embed/graphic-15.gif
 [9]: /embed/graphic-16.gif
 [10]: /embed/graphic-17.gif