Abstract
Background Diagnosis of mitral regurgitation (MR) requires careful evaluation of echocardiography with Doppler imaging. This study presents the development and validation of a fully automated deep learning pipeline for identifying apical-4-chamber view videos with color Doppler and detection of clinically significant (moderate or severe) mitral regurgitation from transthoracic echocardiography studies.
Methods A total of 58,614 studies (2,587,538 videos) from Cedars-Sinai Medical Center (CSMC) were used to develop and test an automated pipeline to identify apical-4-chamber view videos with color Doppler across the mitral valve and then assess mitral valve regurgitation severity. The model was tested on an internal test set of 1,800 studies (80,833 videos) from CSMC and externally evaluated in a geographically distinct cohort of 915 studies (46,890 videos) from Stanford Healthcare (SHC).
Results In the held-out CSMC test set, the view classifier demonstrated an AUC of 0.998 (0.998 - 0.999) and correctly identified 3,452 of 3,539 MR color Doppler videos (sensitivity of 0.975 (0.968-0.982) and specificity of 0.999 (0.999-0.999) compared with manually curated videos). In the external test cohort from SHC, the view classifier correctly identified 1,051 of 1,055 MR color Doppler videos (sensitivity of 0.996 (0.990 – 1.000) and specificity of 0.999 (0.999 – 0.999) compared with manually curated videos). For evaluating clinically significant MR, in the CSMC test cohort, moderate-or-severe MR was detected with AUC of 0.916 (0.899 - 0.932) and severe MR was detected with an AUC of 0.934 (0.913 - 0.953). In the SHC test cohort, the model detected moderate-or-severe MR with an AUC of 0.951 (0.924 - 0.973) and severe MR with an AUC of 0.969 (0.946 - 0.987).
Conclusions In this study, we developed and validated an automated pipeline for identifying clinically significant MR from transthoracic echocardiography studies. Such an approach has potential for automated screening of MR and precision evaluation for surveillance.
Introduction
Mitral regurgitation (MR) is one of the most common forms of valve disease, affecting more than 4 million Americans.1,2,3 Often progressing insidiously and frequently underrecognized3, both primary MR as well as secondary MR can be initially asymptomatic but lead to worsening heart failure and mortality.1,4–6,7,8 There has been an increased focus on early MR diagnosis given advances in surgical and transcatheter treatment options1,9,10. Echocardiography with color Doppler is the most common method of initial evaluation of MR, with a holistic assessment combining left atrial size, effective regurgitant orifice area, regurgitant fraction, regurgitant volume, as well as other key clinical factors to accurately assess disease severity.11,12 Despite ultrasound technology becoming more widely available, accurate assessment of MR still requires experienced expert image acquisition and evaluation.
Recent advances in machine learning offer opportunities to automate time-consuming steps in the interpretation of medical imaging. Artificial intelligence (AI) has the ability to precisely phenotype subtle cardiac physiology as well as identify imaging features of disease severity not recognized by clinicians.13–16 Deep learning has been applied to echocardiography to improve the precision of common measurements, such as left ventricular ejection fraction13 and wall thickness15,17, as well as streamlining assessment of aortic stenosis18, hypertrophic cardiomyopathy (HCM)15, and cardiac amyloidosis (CA).15,19 With increased ultrasound availability, AI guidance has been developed for both image acquisition and interpretation13,20. With the increasing prevalence of MR in an aging population with co-morbid heart failure, AI could aid in MR screening and surveillance.21–25
In this study, we developed and evaluated a deep learning pipeline’s performance in automating identification of MR from standard transthoracic echocardiogram studies. We hypothesized that a deep learning approach can identify color Doppler apical-4-chamber videos and assess MR severity with high-throughput automation, and this automated pipeline was evaluated in two geographically distinct cohorts (Figure 1). Combined with other echocardiography AI algorithms, such an approach can be used for serial surveillance and screening of mitral regurgitation.
Methods
Study Population and Data Source
Cedars-Sinai Medical Center (CSMC) Cohort
A total of 58,614 transthoracic echocardiogram studies from 38,461 patients receiving care at Cedars-Sinai Medical Center (CSMC) between October 11, 2011 and June 04, 2022 were used to train and evaluate the deep learning models. A total of 2,587,538 videos (an average of 44 videos per study after excluding still images) were initially sourced from Digital Imaging and Communications in Medicine (DICOM) files and underwent de-identification, view classification, and pre-processing into AVI videos. 354,117 videos were classified as apical-4-chamber videos using an automated view classifier, and then manually curated to identify 34,714 videos with color Doppler across the mitral valve.26
Following view selection, the CSMC cohort included 34,714 videos across 30,453 unique echocardiogram studies from 22,661 patients, and a subset enriched for moderate and severe MR were used for training. A total of 20,604 videos from 18,133 studies in the dataset were split on a patient level into train (80%), validation (10%), and test (10%) cohorts to train a deep neural network for MR severity classification (Figure 2). MR severity for each study was determined based on the clinical echocardiogram report determined in a high volume echocardiography lab in accordance with ASE guidelines.27 When MR was characterized as an intermediate category (“trace to mild” or “mild to moderate” or “moderate to severe”), videos were placed in the more severe categories. Both primary and secondary MR were included. Studies with concomitant mitral stenosis, other prosthetic valves, and heart failure were also included in both training and validation datasets.
Stanford Healthcare (SHC) Cohort
The pipeline was evaluated on 915 studies (containing a total of 46,890 videos) from SHC’s high-volume academic echocardiography lab. The automated view classification pipeline was compared with manual curation of videos within those studies to evaluate specificity. All videos identified by the view classifier were used for downstream MR severity model validation. Model output was compared with MR severity determined by expert cardiologists from the clinical reports. This study was approved by the Institutional Review Boards at Cedars-Sinai Medical Center and Stanford Healthcare. The need for informed consent was waived as the study involved secondary analysis of existing data.
AI Model Training
Deep learning models were trained using the PyTorch Lightning deep learning framework. When patients had multiple echocardiogram videos and studies, each video was considered an independent example during training, with care not to have patient overlap across training, validation, and test cohorts. Video-based convolutional neural networks (R2+1D) were used for view classification and MR severity assessment.28 This model architecture was previously used for other echocardiography tasks and shown to be effective.17 The models were initialized with random weights and trained using a binary cross entropy loss function for up to 100 epochs, using an ADAM optimizer, an initial learning rate of 1e-2, and a batch size of 24 on two NVIDIA RTX 3090 GPUs. Early stopping was performed based on the validation loss.
The view classifier was trained using the 34,714 manually curated videos with color Doppler across the mitral valve as cases and 49,263 other apical-4-chamber videos as controls. Controls were a combination of videos that did not have color Doppler or had color Doppler window not focused on the mitral valve (videos focused on the tricuspid valve, intra-atrial septum, or ventricular septum). The MR severity model was trained on 6,206 videos without MR, 6,128 videos with mild MR, 6,174 videos with moderate MR and 2,042 videos with severe MR. This process is summarized in Figure 2.
Statistical Analysis
Model performance was evaluated using area under the receiver operating characteristic curve (AUROC) and confusion matrices. F1-score, recall (sensitivity), positive predictive value (PPV), and negative predictive value (NPV) for both greater than moderate MR and severe MR. During external validation, the view classifier and MR classifier were evaluated serially as an automated pipeline. Statistical analysis was performed in Python (version 3.8.0) and R (version 4.2.2). Confidence intervals were computed via bootstrapping with 10,000 samples. Reporting of study results are consistent with guidelines put forth by CONSORT-AI.29,30
Model Explainability
The key imaging features identified by the MR severity model were evaluated using saliency mapping generated using the Integrated Gradients method.31 This method generated a heatmap for every frame of the video, summarized as a final 2-dimensional heatmap generated by using the maximum value along the temporal axis for each pixel location in the video. Pixels brighter in intensity and closer to yellow were more salient to model predictions, while those darker in color were less important to the model’s final prediction. When assessing videos with no MR, heatmaps were obtained by taking the maximum of saliency maps for the moderate and severe class output neurons for each pixel location (Figure 4).
Results
Study Population
A total of 58,614 studies from 38,461 patients were used to train the deep learning pipeline. From 2,587,538 initial videos, a total of 354,117 videos were identified as apical-4-chamber and subsequently manually curated to identify videos that had color Doppler across the mitral valve. The manually curated color Doppler videos were used to train a view-classification model and linked with clinician reports to train the MR severity model. Patient characteristics are presented in Tables 1 & 2 and are representative of the general CSMC patient population that received echocardiograms. The data was split on patient level for training and validation and had similar patient age, ejection fraction, left atrial volume index and proportions of male sex, coronary artery disease, and atrial fibrillation.
View Classifier Performance Across Two Institutions
On a test set of 3,109 studies (132,767 videos) from CSMC not seen during model training, the view classifier identified 3,452 videos (97.5% of manually identified cases). This corresponds to an of AUC of 0.998 (0.998 – 0.999), and at the Youden Index, with a sensitivity of 0.975 (0.968 - 0.982) and specificity of 0.999 (0.999-0.999). To evaluate generalization of the view classification model at a geographically distinct site, we evaluated its performance on 915 studies from SHC. The view classifier isolated 1,091 videos from a total of 46,890 videos, while manual review identified 1,055 videos with color Doppler across the mitral valve. The view classifier correctly identified 1,051 (99.6%) of manually curated videos, with 4 videos not found by the AI pipeline and 40 false positives. This corresponds to a sensitivity of 0.996 (0.990 – 1.000) and specificity of 0.999 (0.999 – 0.999).
Mitral Regurgitation Severity Performance Across Two Institutions
The MR severity model showed strong performance in distinguishing MR severity and identifying clinically significant mitral regurgitation (Figure 3). In the internal CSMC test set not used during model training, the model demonstrated an AUC of 0.916 (0.899 - 0.932) in detecting ≥ moderate MR and an AUC of 0.934 (0.913 - 0.953) for severe MR. The AI model had an NPV of 0.954 (0.940 - 0.967) for severe MR and an NPV of 0.863 (0.835 - 0.890) for ≥ moderate MR. Further information on MR model performance is presented in Table 3. The MR severity model performance was similar across institutions. In the SHC cohort, the model identified severe MR with an AUC of 0.969 (0.946 - 0.987) and ≥ moderate MR with an AUC of 0.951 (0.924 - 0.973). In this cohort, the model had an NPV of 0.977 (0.962 – 0.990) for severe MR and an NPV of 0.986 (0.974 – 0.995) for ≥ moderate MR.
Model Interpretation
Notably, saliency maps for our model demonstrate that the model focuses on the clinically relevant imaging features of mitral regurgitation. Saliency maps from Integrated Gradients were used to identify regions of interest in each video contributing the most to detection of MR severity (Figure 4).31 These interpretability techniques demonstrated localization of the activation signal in the color Doppler window and primarily highlighting the regurgitant jet, indicating that the model used appropriate, physiologic features of the mitral regurgitation to make predictions. Frame-by-frame saliency visualizations are shown in Supplemental Videos S1-S4.
Discussion
We developed and validated an automated pipeline for assessing for mitral regurgitation in echocardiography. From a full transthoracic echocardiogram study, the algorithm automatically screens for appropriate A4C videos with color Doppler on the mitral valve and then assesses MR severity. For both severe MR as well as ≥ moderate MR, the model demonstrated strong performance (> 0.916 AUC and > 0.863 NPV). This automated workflow worked in unselected external validation studies without preselection or exclusion of other concomitant comorbidities. Given these characteristics, our deep learning model could aid in the preliminary assessment of MR, facilitate review of institutional databases, or expand access for screening in low-resource settings.
Our algorithm learns features of mitral regurgitation that generalize across variability in imaging practices in two geographically distinct sites. Many prior echocardiography AI models primarily focused on black-and-white standard 2D B-mode images, while our study focuses on the AI assessment of color Doppler videos and utilized a video-based model for the incorporation of rich temporal information, both crucial for accurate MR assessment. Incorporation of Doppler information greatly expands the opportunities for AI in echo, particularly in valve disease. In expert clinical interpretation, a variety of metrics beyond just color Doppler and views are synthesized together to come up with a holistic assessment of MR severity. Intriguingly, our AI algorithm generally results in concordant interpretations with the comprehensive clinical approach while relying only on the A4C view, suggesting there is significant overlapping information as well as dependence on the A4C view in standard clinical practice.
While promising, the present work carries limitations. Echocardiographic assessment of MR depends on appropriate images being obtained, with different views potentially maximizing the visualized regurgitant jet. This algorithm would not overcome incomplete input information and insufficient images that would result underestimation of MR. Future work could focus on automatically quantifying parameters like valve leaflet thickness, effective regurgitant orifice area, regurgitant volume and fraction.
The present work builds upon prior work in the space of echocardiography and AI. Several recent works have reported strides in computer vision and echocardiography, including automated view classification32,33, phenotyping of left ventricular hypertrophy15, assessment of LV systolic function13, aortic stenosis risk stratification, and detection of complex congenital heart defects.34 Prior work in machine learning applied to MR has primarily focused on structured data and non-deep learning approaches. The combination of our algorithm with previously published works using AI to guide novices in acquiring imaging could potentially increase access to screening of MR.20,35
In summary, we introduce a model to screen for and stratify mitral regurgitation severity from transthoracic echocardiogram videos. To do so, we provide a workflow for isolating mitral valve color Doppler videos and automation of MR severity assessment. The models were evaluated to have good performance in internal and external test cohorts. The use of such a model, with a high AUC, NPV, and generalizability across sites, can open the door for screening of mitral valve disease in the primary care setting or in low-resource environments.
Disclosures
A.V., G.D., M.V., and D.L. report no disclosures. S.C. reports consulting fees from UCB and Viz.ai and research grants NIH R01-HL131532 and NIH R01-HL142983. D.O. reports support from NIH NHLBI, NIH grant R00-HL157421 and AstraZeneca Alexion, as well as consulting from EchoIQ, Ultromics, Pfizer, InVision, Korean Society of Echo, and Japanese Society of Echo.
Code Statement
Our code and model weights are available at https://github.com/echonet/MR
Acknowledgements
A.V. is a research fellow supported by the Sarnoff Cardiovascular Research Award. S.C. acknowledges support from the Erika J Glazer Family Foundation.