RT Journal Article
SR Electronic
T1 Inherently explainable deep neural network-based interpretation of electrocardiograms using variational auto-encoders
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2022.01.04.22268759
DO 10.1101/2022.01.04.22268759
A1 van de Leur, Rutger R.
A1 Bos, Max N.
A1 Taha, Karim
A1 Sammani, Arjan
A1 van Duijvenboden, Stefan
A1 Lambiase, Pier D.
A1 Hassink, Rutger J.
A1 van der Harst, Pim
A1 Doevendans, Pieter A.
A1 Gupta, Deepak K.
A1 van Es, René
YR 2022
UL http://medrxiv.org/content/early/2022/01/05/2022.01.04.22268759.abstract
AB
Background
Deep neural networks (DNNs) show excellent performance in interpreting electrocardiograms (ECGs), both for conventional ECG interpretation and for novel applications such as detection of reduced ejection fraction and prediction of one-year mortality. Despite these promising developments, clinical implementation is severely hampered by the lack of trustworthy techniques to explain the decisions of the algorithm to clinicians. In particular, currently employed heatmap-based methods have been shown to be inaccurate.

Methods
We present a novel approach that is inherently explainable and uses an unsupervised variational auto-encoder (VAE) to learn the underlying factors of variation of the ECG (the FactorECG) in a database of 1.1 million ECG recordings. These factors are subsequently used in a pipeline with common, interpretable statistical methods. Because the ECG factors can be explained by generating and visualizing ECGs at both the model level and the individual patient level, the pipeline becomes fully explainable.
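Explaining a factor by generating and visualizing ECGs can be thought of as what is commonly called a latent traversal: one factor is swept over a range while the others are held fixed, and each resulting factor vector is decoded into a signal. The sketch below illustrates this idea under stated assumptions. `traverse_factor` is a hypothetical helper, the decoder is a toy random linear map standing in for the trained FactorECG decoder (which is a DNN and is not reproduced here), and the 500-sample single-lead output and the -3 to +3 sweep (assuming factors on the scale of a standard normal prior) are illustrative choices, not values from the study.

```python
import numpy as np

N_FACTORS = 21  # number of FactorECG factors reported in this study


def traverse_factor(decode, z_base, k, values):
    """Decode one signal per value of factor k while all other factors stay fixed."""
    signals = []
    for v in values:
        z = z_base.copy()
        z[k] = v
        signals.append(decode(z))
    return np.stack(signals)


# Hypothetical stand-in for the trained decoder: a fixed random linear map from
# the factor vector to a 500-sample single-lead signal. The real decoder is a DNN.
rng = np.random.default_rng(0)
W_dec = rng.standard_normal((500, N_FACTORS))


def decode(z):
    return W_dec @ z


z_base = np.zeros(N_FACTORS)        # prior mean, i.e. an "average" ECG
values = np.linspace(-3.0, 3.0, 7)  # sweep factor 5 from -3 to +3 (prior SD units)
ecgs = traverse_factor(decode, z_base, k=5, values=values)  # shape (7, 500)
```

Overlaying the rows of `ecgs` in a plot would show how morphology changes as the factor increases. Passing an individual patient's encoded factor values as `z_base`, instead of the prior mean, gives a patient-level view of the same kind.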
The performance of the pipeline is compared with that of a state-of-the-art ‘black box’ DNN in three tasks: conventional ECG interpretation with 35 diagnostic statements, detection of reduced ejection fraction, and prediction of one-year mortality.

Findings
The VAE was able to compress the ECG into 21 generative ECG factors, which are associated with physiologically valid underlying anatomical and (patho)physiological processes. When applied to the three tasks, the explainable FactorECG pipeline performed similarly to state-of-the-art ‘black box’ DNNs in conventional ECG interpretation (AUROC 0·94 vs 0·96), detection of reduced ejection fraction (AUROC 0·90 vs 0·91) and prediction of one-year mortality (AUROC 0·76 vs 0·75). In contrast to the state of the art, our pipeline provided inherent explainability of which morphological ECG features were important for prediction or diagnosis.

Interpretation
Future studies should employ inherently explainable DNNs to facilitate clinical implementation by increasing confidence in artificial intelligence and, more importantly, by making it possible to identify biased or inaccurate models.

Funding
This study was financed by the Netherlands Organisation for Health Research and Development (ZonMw, no. 104021004) and the Dutch Heart Foundation (no. 2019B011).

Evidence before this study
A comprehensive literature survey was performed for research articles on interpretable or explainable artificial intelligence (AI) for interpretation of raw electrocardiograms (ECGs) using the PubMed and Google Scholar databases. Articles in English published up to November 24, 2021, were included, and the following keywords were used: deep neural network (DNN), deep learning, convolutional neural network, artificial intelligence, electrocardiogram, ECG, explainability, explainable, interpretability, interpretable, and visualization.
Many studies that used DNNs to interpret ECGs with high predictive performance were found, some focusing on tasks known to be associated with the ECG (e.g., rhythm disorders) and others identifying completely novel use cases for the ECG (e.g., reduced ejection fraction). All of these studies employed post-hoc explainability techniques, in which the decisions of the ‘black box’ DNN were visualized after training, usually with heatmaps (e.g., Grad-CAM, SHAP or LIME). In these studies, only a few example ECGs were handpicked for visualization, as these heatmap-based techniques only work on single ECGs. Three studies also investigated the global features of the model by taking a summary measure of the heatmaps, by relating heatmaps to known ECG parameters (e.g., QRS duration) or by using prototypes. No studies investigated whether the features found using heatmaps were robust or reproducible.

Added value of this study
Currently employed post-hoc explainability techniques, usually heatmap-based, have limited explanatory value, as they merely indicate the temporal location of a specific feature in an individual ECG. Moreover, these techniques have been shown to be unreliable and poorly reproducible, and to suffer from confirmation bias. To address this gap in knowledge, we designed a DNN that is inherently explainable (i.e., explainable by design rather than investigated post hoc). This DNN is used in a pipeline that consists of three components: (i) a generative DNN (variational auto-encoder) that learned to encode the ECG into its 21 underlying continuous factors of variation (the FactorECG), (ii) a visualization technique to provide insight into these ECG factors, and (iii) a common interpretable statistical method to perform diagnosis or prediction using the ECG factors.
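Component (iii) can be any interpretable model that takes the 21 factor values as input. As a minimal sketch, assuming logistic regression as that model (the specific method is not named here), the example fits it with plain gradient descent in NumPy on synthetic factor values and then splits one patient's predicted log-odds into per-factor contributions, which is one way to obtain a patient-level explanation. `fit_logistic_regression` and `explain_patient` are hypothetical helpers, and the synthetic outcome driven by factor 3 is invented purely for illustration.

```python
import numpy as np

N_FACTORS = 21  # number of FactorECG factors reported in this study


def fit_logistic_regression(Z, y, lr=0.1, n_iter=2000):
    """Fit an unregularized logistic regression by batch gradient descent.

    Z: (n_samples, n_factors) factor values; y: (n_samples,) labels in {0, 1}.
    Returns the weight vector and the intercept.
    """
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted probabilities
        residual = p - y                         # d(mean log-loss) / d(logit)
        w -= lr * (Z.T @ residual) / len(y)
        b -= lr * residual.mean()
    return w, b


def explain_patient(z, w, b):
    """Decompose one patient's log-odds into the intercept plus w_k * z_k per factor."""
    contributions = w * z
    return b + contributions.sum(), contributions


# Synthetic illustration only: random factor values, outcome driven by factor 3.
rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, N_FACTORS))
y = (Z[:, 3] + 0.5 * rng.standard_normal(2000) > 0).astype(float)

w, b = fit_logistic_regression(Z, y)
log_odds, contributions = explain_patient(Z[0], w, b)
```

The fitted weights (or odds ratios, `np.exp(w)`) summarize how each factor relates to the outcome across the population, while `contributions` shows which factors drive one individual prediction; pairing either with a visualization of the corresponding factor turns a numeric explanation into a morphological one.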
Model-level explainability is obtained by varying the ECG factors while generating and plotting ECGs, which allows visualization of detailed changes in morphology that are associated with physiologically valid underlying anatomical and (patho)physiological processes. Individual patient-level explanations are also possible, as every individual ECG has its own representative set of explainable FactorECG values, whose associations with the outcome are known. When used for interpretation of diagnostic ECG statements, detection of reduced ejection fraction and prediction of one-year mortality, the explainable pipeline yielded predictive performance similar to that of state-of-the-art ‘black box’ DNNs. In contrast to the state of the art, our pipeline provided inherent explainability of which ECG features were important for prediction or diagnosis. For example, ST elevation was discovered to be an important predictor of reduced ejection fraction, which is an important finding, as it could limit the generalizability of the algorithm to the general population.

Implications of all the available evidence
A longstanding assumption was that the high-dimensional and non-linear ‘black box’ nature of the currently applied ECG-based DNNs was necessary to achieve the impressive performance shown by these algorithms on conventional and novel use cases. This study, however, shows that inherently explainable DNNs should be the future of ECG interpretation, as they allow reliable clinical interpretation of these models without performance reduction, while also broadening their applicability to detect novel features in many other (rare) diseases.
The application of such methods will lead to more confidence in DNN-based ECG analysis, which will facilitate the implementation of DNNs in routine clinical practice.

Competing Interest Statement
The authors have declared no competing interest.

Funding Statement
This study was financed by the Netherlands Organisation for Health Research and Development (ZonMw) with grant number 104021004 and the Dutch Heart Foundation with grant number 2019B011.

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was approved by the University Medical Center Utrecht ethical committee under number 18-827.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes

The training datasets used in this study are not openly available due to privacy concerns.
The expert-annotated test set is available upon request to the corresponding author. The decoder for the FactorECG is publicly available at https://decoder.ecgx.ai. The code for training and evaluating the β-VAE and the black box DNN is available upon request to the corresponding author.
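Because the β-VAE training code is available only on request, the following is a generic sketch of the standard β-VAE objective, not the authors' implementation. It assumes a diagonal-Gaussian encoder that outputs a mean `mu` and log-variance `logvar` for each factor, a squared-error reconstruction term standing in for a Gaussian likelihood, and an arbitrary illustrative `beta=4.0`; the input shape (8 leads by 600 samples) is likewise hypothetical.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over factors."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Per-sample beta-VAE loss: reconstruction error + beta * KL.

    Assumption: squared error replaces a Gaussian log-likelihood, and
    beta=4.0 is illustrative, not the value used in the study.
    """
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return recon + beta * kl_to_standard_normal(mu, logvar)


rng = np.random.default_rng(0)
n_factors = 21                          # number of FactorECG factors
mu = np.zeros((2, n_factors))           # hypothetical encoder outputs for 2 ECGs
logvar = np.zeros((2, n_factors))
z = reparameterize(mu, logvar, rng)     # sampled factor vectors, shape (2, 21)
x = rng.standard_normal((2, 8, 600))    # hypothetical batch: 2 ECGs, 8 leads, 600 samples
loss = beta_vae_loss(x, x, mu, logvar)  # perfect reconstruction, posterior = prior
```

Setting `beta` above 1 weights the KL term more heavily, which pushes the encoder toward a more factorized (disentangled) latent space at some cost in reconstruction quality; that trade-off is the usual reason for choosing a β-VAE when individual latent factors need to be interpretable.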