Abstract
The COVID-19 disease has spread widely all over the world since the beginning of 2020. On January 30, 2020 the World Health Organization (WHO) declared a global health emergency. At the time of writing, about 2 million people worldwide had been infected and over 125,000 had died, and the advanced public health systems of European countries as well as of the USA were overwhelmed. In this paper, we propose an eXplainable Deep Learning approach to detect COVID-19 from computed tomography (CT) scan images. The rapid detection of any COVID-19 case is of supreme importance to ensure timely treatment. From a public health perspective, rapid patient isolation is also extremely important to curtail the rapid spread of the disease. From this point of view, the proposed method offers front-line medics a tool that is easy to use and to understand. Statistical accuracy and other measures are important, but so is the ability to understand and interpret how a decision was made. The results demonstrate that the proposed approach outperforms other published results that were obtained with standard deep neural networks. Moreover, it produces highly interpretable results which may be helpful for the early detection of the disease by specialists.
1 Introduction
In December 2019, an outbreak of coronavirus (SARS-CoV-2) infection began in Wuhan, the capital of central China’s Hubei province [1, 2, 3]. On January 30, 2020 the World Health Organization (WHO) declared a global health emergency [4] and, with some delay and hesitation, on 11 March 2020 the WHO declared a pandemic. By 14 April 2020, a cumulative 1,985,135 confirmed cases and 125,344 deaths had been documented [5]. The USA has become the new epicenter of the disease with 605,354 documented cases and 25,394 deaths (14 April 2020) [5].
Researchers from different disciplines are working with public health officials to understand the pathogenesis of COVID-19 and, jointly with policymakers, to urgently develop strategies to control the spread of this new disease [6]. Recent studies have reported characteristic imaging patterns on chest radiography and computed tomography (CT) in patients diagnosed with COVID-19 [7, 8, 9, 10].
A prospective analysis revealed bilateral lung opacities on 40 of 41 (98%) chest CTs in infected patients in Wuhan and described lobular and subsegmental areas of consolidation as the most typical findings [6]. Other investigators found high rates of ground-glass opacities and consolidation, sometimes with a rounded morphology and peripheral lung distribution [11, 10]. Thoracic radiology is often key to the evaluation of patients suspected of COVID-19 infection [12]. Prompt detection and diagnosis of the disease is invaluable in the effort to ensure timely treatment. From a public health perspective, rapid patient isolation is crucial for containment of this communicable disease [4] and for optimal use of the available resources, which quickly become scarce and overwhelmed by the exponentially growing number of patients and prolonged periods of treatment.
Recently, artificial intelligence (AI) and, specifically, deep learning based approaches have demonstrated high levels of performance in the medical imaging domain due to their ability to automatically extract latent features and bypass so-called “handcrafting” [13]. In addition, the technique called transfer learning [14] has made it possible to train a deep neural network on one set of images (e.g. ImageNet [15]) and use it effectively on another set of images. While manual reading of CT and X-ray images takes 15 minutes and involves a highly skilled medical doctor/consultant, who is now in high demand, AI and deep learning can take a few seconds on a computer and can be automated, which provides an opportunity for high-throughput and remote operation. However, a few stumbling blocks still hamper the wider use of deep neural networks. These include: i) their opaque, “black-box” nature and inability to explain any decision [16]; ii) their inability to continue to learn once trained or to learn from a handful of examples, and their appetite for data and compute power [17].
In this paper we present a new deep learning method that is explainable by design. It is able to continue to learn and adapt with each new data sample, which is immensely important in the case of COVID-19 (and other diseases), because new cases accumulate every minute and traditional approaches require either iterative re-training or ignoring the new data. The proposed approach is non-iterative and is entirely based on recursive calculations and the use of prototypes. Therefore, it is computationally very efficient. The architecture of the proposed method combines reasoning and learning in synergy, while alternative approaches focus either on reasoning, which favours interpretability and explainability, or on machine learning and statistical approaches, which favour accuracy and other statistical measures at the expense of interpretability and explainability. In this paper we demonstrate that the proposed approach can be very efficient in detecting COVID-19 via CT scans and can explain its decisions, which may itself be very important for medical doctors.
The main idea of the proposed approach is based on prototypes (images of CT scans, both with and without COVID-19) and uses the density in the data/feature space to build empirical estimates of the distributions [18]. The proposed approach is non-iterative and non-parametric, which explains its efficiency in terms of time and computational resources. From the user perspective, the proposed approach is clearly understandable/explainable, which for the specific case of COVID-19 infections means that it aids the explanations and decisions ultimately made by human doctors (instead of percentages and likelihoods, they can see and understand an image and compare similarities). In this sense, this is an approach of anthropomorphic machine learning [18]. We tested the proposed method on the very recent COVID-CT-Dataset [19], which contains a set of real cases. The results demonstrate that the proposed approach provides superior performance as measured by F1 score and other metrics and, critically, that it also offers explainability and is able to continue to learn from new data.
2 Methods and Algorithms
2.1 Concept and Basic Algorithm
Like most machine learning methods, the method proposed in this paper starts with pre-processing, which involves scaling, augmentation, and rotation. In order to extract features from the CT images we use transfer learning over the GoogleNet Deep Learning structure [14]. It is important to stress that GoogleNet is used only to define the feature space and that it was not trained on the CT images, but on ImageNet [15]. Other approaches could also be used for this purpose.
Prototype-based learning is the core of the proposed method (Fig. (1)). The prototypes are actual training data samples (in this case, images) which are highly representative (local peaks of the density and of the empirically derived probability distributions [18]). They are focal points of locally valid generative models described by a multi-modal Cauchy distribution [18].
The algorithm of the proposed approach is described below. The first observed image (data sample) is converted to a vector of features using transfer learning. In this paper, we use a vector of size 1000 formed from the last fully connected layer of GoogleNet [14]. More information about the pre-processing step of the proposed method can be found in the Supplementary Material available for this paper.
Let $\{(x_i, c_i)\}_{i=1}^{N}$ be the training data set, with $x_i \in \mathbb{R}^n$ denoting the feature vector and $c_i \in \{1, 2\}$ denoting the class (COVID-19 or No COVID-19) for each $i \in \{1, \dots, N\}$. $N$ is the number of training data samples/images used.
The proposed algorithm works per class; therefore, all the calculations are done for each class separately.
The meta-parameters are initialized with the first observed data sample, $x_1$:

$$\mu \leftarrow x_1; \quad V_1 \leftarrow \{x_1\}; \quad p_1 \leftarrow x_1; \quad S_1 \leftarrow 1; \quad r_1 \leftarrow r^*; \quad P \leftarrow 1 \qquad (1)$$

where $\mu$ denotes the mean; $V_1$ denotes the first cluster; $p_1$ is the first prototype of the first cluster, $V_1$; $S_1$ is the corresponding support (number of members); $P$ is the total number of identified prototypes; and $r_1$ is the corresponding radius of the area of influence of $V_1$. In this paper, we use the same value of $r^*$ as in [18]; the rationale is that two feature vectors can be considered similar, i.e. pointing in close directions, if the angle between them is smaller than π/6, or 30°. Note that $r^*$ is data derived, not a problem- or user-specific parameter; in fact, it can be defined without prior knowledge of the specific problem or data.
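The 30° rationale can be turned into a Euclidean radius with a short calculation. The sketch below assumes that the feature vectors are normalized to unit length (an assumption, since normalization is not stated in this section); under that assumption the distance between two vectors separated by an angle θ is sqrt(2 − 2 cos θ). The names `angle_to_distance`, `r_star`, and `check` are hypothetical.

```python
import math

import numpy as np


def angle_to_distance(theta_rad: float) -> float:
    """Euclidean distance between two UNIT vectors separated by theta_rad.

    Follows from ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos(theta)
    when ||a|| = ||b|| = 1 (assumed normalization).
    """
    return math.sqrt(2.0 - 2.0 * math.cos(theta_rad))


# Radius corresponding to the 30 degree similarity threshold (about 0.5176).
r_star = angle_to_distance(math.pi / 6)

# Numerical check with two explicit unit vectors that are 30 degrees apart.
a = np.array([1.0, 0.0])
b = np.array([math.cos(math.pi / 6), math.sin(math.pi / 6)])
check = float(np.linalg.norm(a - b))
```

Under this assumption, two feature vectors closer than roughly 0.52 in Euclidean distance point within 30° of each other.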
The next step is to calculate the data density at the current data point, $x_i$, $i \in \{1, \dots, N\}$.
Starting from the mutual distances (of Euclidean or Mahalanobis type) between the data points (samples) in the feature space, it can be demonstrated theoretically [18] that the data density takes the form of a Cauchy-type function, as in Eq. (2).
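A Cauchy-type density of this kind can be maintained recursively, without storing past samples. The sketch below is illustrative only: it assumes the Euclidean form D(x) = 1 / (1 + ||x − µ||² / σ²), with σ² taken as the running mean of ||x||² minus ||µ||², which may differ in detail from the expression used in the paper. The class name `RunningDensity` and its methods are hypothetical.

```python
import numpy as np


class RunningDensity:
    """Recursively updated Cauchy-type density (assumed Euclidean form)."""

    def __init__(self, x):
        x = np.asarray(x, dtype=float)
        self.k = 1                      # number of samples seen so far
        self.mu = x.copy()              # running mean of the samples
        self.sq = float(x @ x)          # running mean of squared norms

    def update(self, x):
        """Absorb one new sample with O(n) work and no stored history."""
        x = np.asarray(x, dtype=float)
        self.k += 1
        self.mu += (x - self.mu) / self.k
        self.sq += (float(x @ x) - self.sq) / self.k

    def density(self, x) -> float:
        """D(x) = 1 / (1 + ||x - mu||^2 / sigma^2), a value in [0, 1]."""
        x = np.asarray(x, dtype=float)
        var = self.sq - float(self.mu @ self.mu)
        d2 = float((x - self.mu) @ (x - self.mu))
        if var <= 0.0:                  # degenerate case: no spread yet
            return 1.0 if d2 == 0.0 else 0.0
        return 1.0 / (1.0 + d2 / var)


dens = RunningDensity([0.0, 0.0])
dens.update([2.0, 0.0])
at_mean = dens.density([1.0, 0.0])      # density peaks at the mean
at_sample = dens.density([2.0, 0.0])    # lower density away from the mean
```

The density equals 1 at the mean and decays towards 0 with distance, which is what allows local density peaks to serve as prototypes.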
Then the algorithm absorbs the new data samples/images one by one, assigning each of them to the nearest (in the feature space) prototype, $p_{j^*}$:

$$j^* = \underset{j = 1, \dots, P}{\arg\min} \, \| x_i - p_j \|$$

Because of this form of assignment, the shape of the data partitioning is of the so-called Voronoi tessellation type [20]. We call the set of all data points associated with a prototype a data cloud, because its shape is not regular (unlike, e.g., a hyper-spherical or hyper-ellipsoidal cluster) and the prototype is not necessarily the statistical and geometric mean [18].
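The nearest-prototype assignment that induces the Voronoi partition can be sketched in a few lines. This is a minimal illustration using the Euclidean distance; the function name `nearest_prototype` and the toy values are hypothetical.

```python
import numpy as np


def nearest_prototype(x, prototypes):
    """Return (index, distance) of the prototype closest to x.

    prototypes is a (P, n) array-like; x is a length-n feature vector.
    Assigning every sample this way partitions the feature space into
    Voronoi cells, one per prototype.
    """
    x = np.asarray(x, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    dists = np.linalg.norm(prototypes - x, axis=1)
    j = int(np.argmin(dists))
    return j, float(dists[j])


# Toy example with two prototypes in a 2-D feature space.
j_star, dist = nearest_prototype([3.0, 3.0], [[0.0, 0.0], [3.0, 4.0]])
```

Here the sample is assigned to the second prototype (index 1), which lies at distance 1 from it.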
Then, using the density and the distance to the nearest prototype, we check the following conditions [18], which determine whether or not the current data sample/image is added to the set of prototypes as a new prototype:
When a new data cloud is added, the following updates are made:
Alternatively, the meta-parameters of the nearest data cloud are updated as follows [18]:
One of the strongest aspects of the proposed approach is its high level of interpretability which comes from its prototype-based nature. Linguistic IF…THEN expressions that represent human reasoning can be formed around the local generative models:
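As a rough illustration of how such a rule might be assembled from the prototypes of one class, consider the sketch below. The rule layout, the `~` symbol (read as "is similar to"), the function name `build_rule`, and the image file names are all assumptions made for this example, not the notation of the paper.

```python
def build_rule(class_name, prototype_ids):
    """Compose a linguistic IF...THEN rule from the prototypes of one class.

    Each antecedent compares the query image I with one prototype image;
    the antecedents are joined by OR, so similarity to any single
    prototype is enough to fire the rule.
    """
    if not prototype_ids:
        raise ValueError("a rule needs at least one prototype")
    antecedents = " OR ".join(f"(I ~ {pid})" for pid in prototype_ids)
    return f"IF {antecedents} THEN ({class_name})"


# Hypothetical prototype image names for the COVID-19 class.
rule = build_rule("COVID-19", ["ct_017.png", "ct_102.png"])
print(rule)
```

Because every antecedent points to an actual CT image, a clinician can inspect the prototypes that support a decision instead of a bare score.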
The learning procedure of the proposed approach is summarized by the following algorithm.
Learning Procedure
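The steps described in Section 2.1 can be sketched as a single non-iterative pass over the samples of one class. This is a simplified illustration, not the authors' listing: the decision rule (a sample becomes a new prototype if its density exceeds that of every existing prototype, or if it lies farther than the radius from its nearest prototype), the update rule (only the support of the nearest cloud is incremented), and the Euclidean Cauchy-type density are all assumptions, and `learn_class` is a hypothetical name.

```python
import numpy as np


def learn_class(X, r_star):
    """One pass over the feature vectors X (shape (N, n)) of a single class.

    Returns the prototypes and the support (member count) of each cloud.
    """
    X = np.asarray(X, dtype=float)
    mu = X[0].copy()                    # running mean
    sq = float(X[0] @ X[0])             # running mean of squared norms
    protos = [X[0].copy()]              # first sample is the first prototype
    support = [1]
    radius = [r_star]

    def density(v):
        # Assumed Cauchy-type density around the current mean.
        var = sq - float(mu @ mu)
        d2 = float((v - mu) @ (v - mu))
        if var <= 0.0:
            return 1.0 if d2 == 0.0 else 0.0
        return 1.0 / (1.0 + d2 / var)

    for k, x in enumerate(X[1:], start=2):
        # Recursive updates: no past samples are revisited.
        mu += (x - mu) / k
        sq += (float(x @ x) - sq) / k

        d_x = density(x)
        d_protos = [density(p) for p in protos]
        dists = [float(np.linalg.norm(x - p)) for p in protos]
        j = int(np.argmin(dists))       # nearest prototype

        # ASSUMED condition for creating a new data cloud.
        if d_x > max(d_protos) or dists[j] > radius[j]:
            protos.append(x.copy())
            support.append(1)
            radius.append(r_star)
        else:
            support[j] += 1             # ASSUMED update: absorb into cloud j

    return np.array(protos), support


# Toy data: two well-separated groups on a line, radius 1.0.
X_toy = [[0.0, 0.0], [0.25, 0.0], [4.0, 0.0], [4.25, 0.0]]
prototypes, support = learn_class(X_toy, r_star=1.0)
```

On this toy data the pass keeps two prototypes, one per group, each supported by two samples. Each new sample costs one update of the running statistics and one comparison against the existing prototypes, so learning can continue as new images arrive without retraining.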
3 Results
In this section we report the results obtained by the proposed eXplainable Deep Learning classification approach when applied to the COVID-CT-Dataset [19]. The results presented in Table 1 compare the proposed algorithm with other state-of-the-art approaches, including traditional (black-box) deep neural networks, Support Vector Machines, etc. In summary, the advantages of the proposed method include:
– high precision as compared with the top state-of-the-art algorithms.
– high level of explainability.
– no user- or problem-specific algorithmic meta-parameters.
– non-iterative algorithm able to learn continuously.
Using the proposed method we generated (extracted from the data) linguistic IF…THEN rules which involve actual images of both cases (COVID-19 and NO COVID-19), as illustrated in Figs. (3) and (4). Such transparent rules can be used in the decision-making process for early diagnosis of COVID-19 infection. Rapid detection of viral infection with high sensitivity may allow better control of the viral spread. Early diagnosis of COVID-19 is crucial for the treatment and control of the disease.
Fig. (5) illustrates the evolving nature of the proposed approach. The proposed approach is able to learn continuously as new data is presented to the system. Therefore, no full retraining is required, thanks to its life-long learning architecture. In contrast, the baseline approach [19] is based on deep neural networks and requires full retraining for new data samples, which can be very costly in terms of time and computational resources.
Computed tomography is a quick, non-invasive imaging modality with high accuracy. According to [8, 9], almost all patients with COVID-19 had characteristic CT features during the disease, such as different degrees of ground-glass opacities with or without the crazy-paving sign, multifocal organizing pneumonia, and architectural distortion in a peripheral distribution. The proposed approach has demonstrated high efficiency in the identification and classification of such characteristics, and provides highly accurate and interpretable results.
4 Conclusion
In this paper we present a new explainable deep learning approach for COVID-19 detection via CT scans. The proposed approach demonstrates better performance than other state-of-the-art approaches, including the baseline deep neural network approach. Moreover, it also provides explanations in the form of IF…THEN rules using actual images of CT scans with and without COVID-19. This is of great importance for medical specialists to understand and diagnose COVID-19 at early stages via computed tomography. In addition, this method is fast and can continue to learn from new images, which is very important in a real-life application. CT can accurately reflect the disease evolution and monitor the treatment effects [21]. Rapid detection and diagnosis of the disease is of supreme importance to ensure timely treatment and rapid patient isolation in order to slow the spread of the disease [22].
In conclusion, chest CT imaging has high sensitivity for the diagnosis of COVID-19. We offer a highly transparent deep learning approach for detecting COVID-19 via CT which outperforms state-of-the-art approaches.
Data Availability
Data used in this research was provided by: Zhao, Jinyu, et al. "COVID-CT-Dataset: a CT scan dataset about COVID-19." arXiv preprint arXiv:2003.13865 (2020).
Author contributions statement
P. A. conceived and detailed the idea. E. S. designed and implemented the algorithms, designed and performed the experiments. P. A. and E. S. wrote the manuscript and interpreted the results.
Footnotes
e.almeidasoares{at}lancaster.ac.uk
* Honorary Professor, Technical University, Sofia, Bulgaria.