Artificial Intelligence-Enhanced Echocardiographic Assessment of the Aortic Valve Stenosis Continuum
====================================================================================================

* Jiesuck Park
* Jiyeon Kim
* Jaeik Jeon
* Yeonyee E. Yoon
* Yeonggul Jang
* Hyunseok Jeong
* Youngtaek Hong
* Seung-Ah Lee
* Hong-Mi Choi
* In-Chang Hwang
* Goo-Yeong Cho
* Hyuk-Jae Chang

## ABSTRACT

**Background** Transthoracic echocardiography (TTE) is the primary modality for diagnosing aortic valve stenosis (AVS), yet it requires skilled operators and can be resource-intensive.

**Objectives** To develop and validate an artificial intelligence (AI)-based system for evaluating AVS that is effective in both resource-limited and advanced settings.

**Methods** We created a dual-pathway AI system for AVS evaluation using a nationwide echocardiographic dataset (developmental dataset, n=8,427): 1) a deep learning (DL)-based AVS continuum assessment algorithm using limited 2D TTE videos, and 2) an automated pipeline for conventional AVS evaluation. We performed internal validation (internal test dataset [ITDS], n=841) and external validation (distinct hospital dataset [DHDS], n=1,696; temporally distinct dataset [TDDS], n=772) to assess diagnostic value across the stages of AVS and prognostic value for a composite endpoint (cardiovascular death, heart failure, and aortic valve replacement).

**Results** The DL index for the AVS continuum (DLi-AVSc, range 0-100) increased with worsening AVS severity and demonstrated excellent discrimination for any AVS (AUC 0.87-0.99), significant AVS (0.93-0.97), and severe AVS (0.97). A 10-point increase in DLi-AVSc was associated with an 85% increased risk for composite endpoints in ITDS and with 53% and 59% increases in DHDS and TDDS, respectively. Automatic measurement of conventional AVS parameters demonstrated excellent correlation with manual measurement, resulting in high accuracy for AVS staging (98.2% for ITDS, 81.0% for DHDS, and 96.8% for TDDS) and prognostic value comparable to that of manually derived parameters.

**Conclusions** The AI-based system provides accurate and prognostically valuable AVS assessment, suitable for various clinical settings. Further validation studies are planned to confirm its effectiveness across diverse environments.

Keywords
*   Aortic valve stenosis
*   artificial intelligence
*   echocardiography
*   diagnostic accuracy
*   prognostic value

## 1. INTRODUCTION

Medical advancements have significantly increased life expectancy: about 10% of the global population is over 60, a share projected to double by 2050.1 This aging demographic has notably increased the incidence of degenerative diseases such as aortic valve stenosis (AVS). Studies have shown that 12.4% of individuals aged 75 and older have some degree of AVS, with severe cases at 3.4%.2 Untreated AVS can cause irreversible myocardial damage, characterized by left ventricular hypertrophy, fibrosis, and functional impairment, leading to increased morbidity, mortality, and socioeconomic burden.3 Therefore, timely detection and management of AVS are essential to mitigate its severe consequences.

Transthoracic echocardiography (TTE) is the primary imaging modality for assessing AVS. Accurate identification and staging of AVS via TTE require advanced expertise in scanning and interpretation, often unavailable in a general community healthcare setting. Even in tertiary care centers, the process is time-consuming and labor-intensive, involving multiple measurements, calculations, and precise interpretation. These complexities highlight the need for innovative solutions that simplify AVS assessment. Such solutions would be particularly beneficial in settings with limited resources by using fewer TTE videos and in more advanced settings by automating the measurement and interpretation processes.

To meet these clinical needs and advance beyond existing research,4–6 we developed a comprehensive artificial intelligence (AI)-based system to evaluate AVS, suitable for both resource-limited and advanced settings. This system uses deep learning (DL) to diagnose and assess AVS from limited 2-dimensional (2D) TTE videos. Importantly, it does not merely classify the AVS severity but is designed to reflect the disease’s progressive continuum. Simultaneously, the system automatically measures a broad spectrum of structural and hemodynamic parameters, facilitating the conventional calculation of the aortic valve area (AVA) and providing a quantitative assessment of AVS. This paper describes the development process of our AI-based system and evaluates its diagnostic and prognostic potential in assessing AVS.

## 2. METHODS

### 2.1. Study Population and Data Sources

The AI-based frameworks utilized in this study were developed and validated using the Open AI Dataset Project (AI-Hub) dataset, an initiative supported by the South Korean government’s Ministry of Science and ICT.7 This dataset consists of 30,000 echocardiographic examinations retrospectively collected from five tertiary hospitals between 2012 and 2021, covering a wide range of cardiovascular diseases. (**Supplemental Methods 1**) The AI-based frameworks introduced here were all developed using data extracted from the AI-Hub dataset.8–10 To develop the DL-based AVS continuum assessment algorithm, a key focus of this study, we assembled the **Development Dataset (DDS)** by deliberately excluding data from Severance Hospital, one of the five hospitals. Data from Severance Hospital were instead used exclusively for external validation (**Distinct Hospital Dataset, DHDS**). Further external validation was conducted using data collected from Seoul National University Bundang Hospital in 2022 (**Temporally Distinct Dataset, TDDS**). Detailed methodologies for data utilization in developing and validating the AI-based system are in **Supplemental Methods 1**. As a result, the DDS comprised TTE images from 8,427 patients, the DHDS included 1,696 patients, and the TDDS included 772 patients. The study followed the Declaration of Helsinki (as revised in 2013). The institutional review board of each hospital approved this study and waived the requirement for informed consent because of the retrospective and observational nature of the study design. All clinical and echocardiographic data were fully anonymized before data analysis.

### 2.2. Echocardiogram Acquisition and Interpretation

All echocardiographic studies were conducted by trained echocardiographers or cardiologists and interpreted by board-certified cardiologists specialized in echocardiography. These reports adhered to the recent guidelines11,12 and were part of routine clinical care. The parameter values in these reports were used as ground truth labels without additional measurements. In the DDS, AVS presence and severity were determined using these values following the standard clinical criteria (**Table 1**).11 In the DHDS and TDDS, the prior clinician’s decision regarding AVS severity in the clinical report was used to reflect actual clinical practice.

[Table 1.](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/T1)

Table 1. Study population

### 2.3. AI-Based System

We have developed a fully automated AI-based framework that addresses AVS evaluation through a dual pathway, leveraging both innovative and conventional methodologies. (**Central Illustration**) The system begins by automatically selecting the necessary views: the parasternal long-axis (PLAX) view, the parasternal short-axis (PSAX) view at the aortic valve (AV) level, AV continuous wave (CW) and pulsed wave (PW) Doppler, and left ventricular outflow tract (LVOT) PW Doppler. In the DL-based AVS continuum assessment pathway, the algorithm evaluates AVS using only the PLAX and PSAX videos. Concurrently, in the automated conventional AVS assessment pathway, the DL segmentation network generates masks for each view. These masks are used to measure the LVOT diameter from the PLAX view and to extract key indicators from spectral Doppler images, such as AV peak velocity (Vmax), AV velocity time integral (VTI), AV mean pressure gradient (mPG), and LVOT VTI. The system then calculates AVA, enabling quantitative evaluation of AVS. This dual approach (DL-based AVS continuum assessment and automated conventional AVS assessment) has the potential to support both resource-limited and advanced settings.

#### 2.3.1. View Classification

To assess AVS, we improved our preexisting view classification algorithm.8 The algorithm could already identify the PLAX view, PSAX at the AV level, AV CW Doppler from apical views, AV PW Doppler, and LVOT PW Doppler. We augmented it to recognize the PLAX-AV zoomed views and the AV CW Doppler obtained from the right parasternal view. Detailed information about this development is in **Supplemental Method 2**.

#### 2.3.2. DL-based AVS Continuum Assessment Algorithm

Our objective was to develop a network that classifies AVS severity in a way that reflects its continuous nature rather than just discrete categories. We used a 3-dimensional (3D) convolutional neural network (CNN; r2plus1d18) as a backbone, which separates spatial and temporal filters.13 (**Supplemental Methods 3**) This network processes input videos from PLAX and PSAX at the AV level to output a score predicting the AVS severity, termed the DL index for the AVS continuum (DLi-AVSc).

To achieve accurate classification reflecting the AVS continuum, we implemented two strategies: 1) continuous mapping with ordered labels and 2) multi-task learning with auxiliary tasks that predict numeric parameters indicative of the AVS continuum, such as AV Vmax, mPG, and AVA. Conventional multi-class classification with cross-entropy loss was unsuitable because the equidistance between one-hot encoded severity levels fails to capture the disease’s progressive nature. Instead, the continuous approach assigns each severity level a value between 0 and 1 (e.g., Normal: 0, Sclerosis: 0.25, Mild: 0.5, Moderate: 0.75, Severe: 1) and trains the model by minimizing the negative Bernoulli likelihood $L_{\mathrm{Bernoulli}}$. While this method reflects AVS progression, it primarily converts discrete labels into continuous values.

To truly capture the continuum and enable nuanced transitions within and between severity levels, we incorporated three auxiliary tasks predicting TTE parameters based solely on 2D TTE videos. These tasks, predicting Vmax, mPG, and AVA, provide rich information content, allowing the network to learn anatomical features and the motion of the AV. The loss function for each auxiliary task $k$ (Vmax, mPG, or AVA) is the mean squared error (MSE) between the predicted values $\hat{y}_i^{(k)}$ and the actual TTE parameter values $y_i^{(k)}$ over $N$ samples:

$$L_{\mathrm{MSE}}^{(k)} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i^{(k)} - y_i^{(k)}\right)^2$$

Training the network to predict continuous TTE parameters allows it to capture both discrete transitions and subtle variations within each severity category. For instance, it can distinguish between cases classified as “moderate” that are closer to mild AVS and those nearing severe AVS. The combined loss function integrates the negative Bernoulli likelihood and the MSE losses for the auxiliary tasks:

$$L = L_{\mathrm{Bernoulli}} + \lambda \sum_{k} L_{\mathrm{MSE}}^{(k)}$$

where λ is a weighting parameter balancing the contributions of the classification and regression tasks. Detailed network configurations and implementation details are in **Supplemental Methods 3**.
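The training objective described above can be illustrated with a minimal, single-sample sketch. It assumes the combined loss is the Bernoulli term plus λ times the sum of the per-task squared errors; the function names, the `lam` default, and the unnormalized auxiliary targets are illustrative assumptions, not the authors' implementation.

```python
import math

# Ordered severity labels mapped onto [0, 1], following the example in the text.
SEVERITY_TARGET = {
    "normal": 0.0, "sclerosis": 0.25, "mild": 0.5, "moderate": 0.75, "severe": 1.0,
}

def bernoulli_nll(p, target, eps=1e-7):
    """Negative Bernoulli log-likelihood of a soft target in [0, 1] under prediction p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def combined_loss(p, severity, aux_pred, aux_true, lam=1.0):
    """Assumed form: L = L_Bernoulli + lam * sum_k (pred_k - true_k)^2 for one sample."""
    classification = bernoulli_nll(p, SEVERITY_TARGET[severity])
    regression = sum((aux_pred[k] - aux_true[k]) ** 2 for k in aux_true)
    return classification + lam * regression

# Hypothetical example: a "moderate" case with slightly off auxiliary predictions.
loss = combined_loss(
    p=0.7,
    severity="moderate",
    aux_pred={"vmax": 3.4, "mpg": 27.0, "ava": 1.2},
    aux_true={"vmax": 3.5, "mpg": 28.0, "ava": 1.2},
    lam=0.1,
)
```

In practice the auxiliary targets would likely be rescaled so that mPG (in mmHg) does not dominate Vmax (in m/s) and AVA (in cm²); that scaling is not specified in the text and is left out of this sketch.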

#### 2.3.3. Automated Conventional AVS Assessment Algorithm

Our AI-based system also automates the conventional method to calculate AVA and assess AVS severity. Automating conventional AVA assessment in our system involves three key steps: 1) segmentation of anatomical structures and spectral Doppler envelopes, 2) uncertainty quantification to assess the confidence of the predicted segmentation masks, and 3) post-processing algorithms to extract clinical measurements from segmentation masks.

We had previously developed and validated algorithms for analyzing spectral Doppler by segmenting the Doppler envelope to capture velocity profiles with essential topological features.9,10 This approach automatically measures AV Vmax, AV VTI, and LVOT VTI by segmenting Doppler envelopes in every analyzable cycle in all provided images. In this study, to quantify AVA, we further developed a DL network based on the SegFormer transformer architecture to measure the LVOT diameter in the PLAX view.14 This advanced model can segment all anatomical structures visible in the PLAX view, including the left ventricle (LV), LV septum and posterior wall, left atrium, right ventricle, aorta, and even the mitral valve and AV. Detailed information is provided in **Supplemental Methods 4** and **Videos S1**.

Deep segmentation networks are highly effective due to their ability to learn complex patterns and features from large datasets. However, quantifying uncertainty in their predictions is crucial because segmentation errors can impact subsequent post-processing for automatic measurement. To address this, we used predictive entropy from the segmentation network’s probability map, which combines two sources of uncertainty: the model’s lack of knowledge (epistemic uncertainty) and poor data quality (aleatoric uncertainty).15 By evaluating the predictive entropy, cases requiring manual review due to poor image quality or model uncertainty can be identified. Detailed methodologies are provided in **Supplemental Method 5** and **Videos S2**.
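As an illustration of the uncertainty step, the sketch below computes the predictive entropy of a per-pixel class-probability map and flags a frame for manual review. The normalization by log(C), the averaging over pixels, and the `threshold` value are assumptions made for this example; the actual procedure is in the supplemental material.

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_predictive_entropy(prob_map):
    """Average per-pixel entropy, divided by log(C) so the result lies in [0, 1].

    prob_map is a nested list: rows -> pixels -> class probabilities.
    """
    n_classes = len(prob_map[0][0])
    n_pixels = sum(len(row) for row in prob_map)
    total = sum(pixel_entropy(px) for row in prob_map for px in row)
    return total / (n_pixels * math.log(n_classes))

def needs_manual_review(prob_map, threshold=0.5):
    """Flag a frame whose mean normalized entropy exceeds a (hypothetical) threshold."""
    return mean_predictive_entropy(prob_map) > threshold

# A confident mask (near one-hot probabilities) versus a maximally uncertain one.
confident = [[[1.0, 0.0], [0.0, 1.0]]]
uncertain = [[[0.5, 0.5], [0.5, 0.5]]]
```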

In the post-processing stage, the segmented masks were used to extract clinical measurements. From the predicted segmentation mask, we identified the points where the mitral valve intersects with the aorta and where the septum intersects with the aorta to determine the annulus points. Considering the differing opinions on the appropriate location for measuring the LVOT diameter,16 our algorithm was designed to measure the LVOT diameter at three locations: at the annulus, and at 2.5 mm and 5 mm from the annulus toward the LV cavity. In this study, the measurements taken at the annulus were used for analysis as they showed the highest agreement with the ground truth. For technical details and performance information, please refer to **Supplemental Method 6** and **Video S1**.
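The geometric part of this step can be sketched as follows, assuming the two annulus points are already available in pixel coordinates and the pixel spacing is known. The function names and the `lv_direction` vector are hypothetical; locating where the shifted line meets the LVOT walls in the mask is outside the scope of this sketch.

```python
import math

def lvot_diameter_mm(septal_pt, mitral_pt, mm_per_pixel):
    """Euclidean distance between the two annulus points, converted to millimetres."""
    dx = septal_pt[0] - mitral_pt[0]
    dy = septal_pt[1] - mitral_pt[1]
    return math.hypot(dx, dy) * mm_per_pixel

def shift_toward_lv(point, lv_direction, offset_mm, mm_per_pixel):
    """Move a point offset_mm along a direction vector pointing into the LV cavity."""
    norm = math.hypot(lv_direction[0], lv_direction[1])
    step_px = offset_mm / mm_per_pixel / norm
    return (point[0] + lv_direction[0] * step_px,
            point[1] + lv_direction[1] * step_px)

# Hypothetical annulus points (x, y) in pixels with 0.4 mm per pixel.
diameter = lvot_diameter_mm((0, 0), (30, 40), mm_per_pixel=0.4)

# Candidate measurement origins 2.5 mm and 5 mm from the annulus toward the LV.
origin_2_5 = shift_toward_lv((10, 10), (-1, 0), offset_mm=2.5, mm_per_pixel=0.5)
origin_5_0 = shift_toward_lv((10, 10), (-1, 0), offset_mm=5.0, mm_per_pixel=0.5)
```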

For spectral Doppler images, AV Vmax and VTI were derived from the segmented Doppler envelope of AV CW Doppler. This analysis included AV CW Doppler obtained from both the apical and right parasternal views, selecting the largest envelope across all cycles in all images to obtain AV Vmax and VTI. The LVOT PW Doppler analysis also spanned all cycles, using the average value of LVOT VTI to avoid overestimating LVOT flow.12 These measurements were then used to calculate mPG and AVA, which were used to assess the presence and severity of AVS.11
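The text does not spell out the formulas, so the following is a sketch of the standard calculations under stated assumptions: VTI by trapezoidal integration of the traced envelope, mPG as the average of the instantaneous simplified Bernoulli gradients (4v²), and AVA from the continuity equation. The sampling interval, units, and function names are illustrative.

```python
import math

def vti_cm(velocities_m_s, dt_s):
    """Velocity time integral (cm): trapezoidal integration of one Doppler envelope."""
    area_m = sum((a + b) / 2.0 * dt_s
                 for a, b in zip(velocities_m_s, velocities_m_s[1:]))
    return area_m * 100.0  # metres to centimetres

def mean_pressure_gradient(velocities_m_s):
    """Mean of the instantaneous simplified Bernoulli gradients, 4 * v^2 (mmHg)."""
    return sum(4.0 * v * v for v in velocities_m_s) / len(velocities_m_s)

def aortic_valve_area(lvot_diameter_cm, lvot_vti_cm, av_vti_cm):
    """Continuity equation: AVA = LVOT area * LVOT VTI / AV VTI (cm^2)."""
    lvot_area = math.pi * (lvot_diameter_cm / 2.0) ** 2
    return lvot_area * lvot_vti_cm / av_vti_cm

# Hypothetical inputs: 2.0 cm LVOT diameter, LVOT VTI 20 cm, AV VTI 80 cm.
ava = aortic_valve_area(2.0, 20.0, 80.0)
```

With these inputs the LVOT area is π cm², so the computed AVA is π/4 cm². Because the diameter is squared, small errors in the LVOT measurement have a disproportionate effect on AVA.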

### 2.4. Ascertainment of Clinical Information and Outcome Definition

The clinical data were acquired by a dedicated review of the electronic health records at the study institutions. The clinical outcome was defined as a composite endpoint of cardiovascular death, hospitalization for heart failure, and AV replacement via surgical or transcatheter approaches.

### 2.5. Validation of AI-Based AVS Evaluation System and Statistical Analysis

Our AI-based framework was validated using an internal test dataset (ITDS) and two external datasets (DHDS and TDDS). The view classification algorithm, the shared initial step, was evaluated against human expert labels. Precision, recall, and F1 scores were calculated for each view, with overall accuracy determined by the ratio of correctly classified images to the total number of images.
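The view classification metrics named above can be computed as in this minimal sketch. The view labels in the example are placeholders, and the function name is hypothetical.

```python
def view_metrics(y_true, y_pred):
    """Per-view precision, recall, and F1, plus overall accuracy."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Overall accuracy: correctly classified images over all images.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report, accuracy

# Toy example with one PLAX image misclassified as PSAX.
report, accuracy = view_metrics(
    ["PLAX", "PLAX", "PSAX", "PSAX"],
    ["PLAX", "PSAX", "PSAX", "PSAX"],
)
```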

Subsequently, we evaluated the two AI-based pathways. The performance of the DL-based AVS continuum assessment algorithm was evaluated by examining the distribution of the DLi-AVSc across various stages using violin plots. We also assessed the correlation of DLi-AVSc with conventional parameters (AV Vmax, mPG, and AVA). To verify that DLi-AVSc accurately reflects the continuum of AVS progression, we used Uniform Manifold Approximation and Projection (UMAP) to visualize this relationship,17 projecting the data into a 2D space, using 15 nearest neighbors, a minimum distance of 0.1, and the Euclidean distance. To highlight the areas with the greatest influence on the model’s prediction, we generated saliency maps using the Gradient-weighted Class Activation Mapping (Grad-CAM).18 We present representative samples for each severity level in both PLAX and PSAX views.

The conventional AVS assessment algorithm was validated by comparing AI-derived parameters with manual measurements. Since these parameters are not typically measured in normal or AV sclerosis groups, the comparison was limited to the AVS group. Manual measurements were not available for all AVS cases; the availability of ground truth measurements and the success rate of automatic measurements are detailed in **Supplemental Methods 7**. The correlation between automated and manual measurements was assessed using the Pearson Correlation Coefficient (PCC). The AVS severity determined from the automatic measurements was also compared to the ground truth label derived from the clinician’s prior decision.
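For reference, the PCC between paired automated and manual measurements can be computed as below. The paired values are made up for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical paired AV Vmax values (m/s): automated vs. manual.
auto_vmax = [2.7, 3.1, 3.8, 4.4, 5.0]
manual_vmax = [2.6, 3.2, 3.7, 4.5, 5.1]
r = pearson_r(auto_vmax, manual_vmax)
```

Note that a high PCC indicates a strong linear relationship but does not by itself rule out systematic bias between the two methods.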

We also evaluated the discrimination ability of the DLi-AVSc and other AI-derived conventional parameters for various stages of AVS, including mild or greater AVS (any AVS), moderate or greater AVS (significant AVS), and severe AVS. This evaluation was conducted through receiver operating characteristic (ROC) curve analysis, from which we calculated the area under the curve (AUC).
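The three discrimination tasks differ only in where the severity scale is dichotomized. The sketch below binarizes the stage labels at a chosen cutoff and computes the AUC as the probability that a positive case receives a higher score than a negative one (ties counted as one half). The stage names and example scores are illustrative assumptions.

```python
SEVERITY_RANK = {"normal": 0, "sclerosis": 1, "mild": 2, "moderate": 3, "severe": 4}

def roc_auc(scores, positives):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_for_stage(scores, severities, min_stage):
    """AUC for detecting `min_stage` or worse (e.g. "mild" corresponds to any AVS)."""
    cutoff = SEVERITY_RANK[min_stage]
    positives = [SEVERITY_RANK[s] >= cutoff for s in severities]
    return roc_auc(scores, positives)

# Hypothetical index values on a 0-100 scale with their reference stages.
scores = [10, 40, 50, 90]
stages = ["normal", "mild", "sclerosis", "severe"]
auc_any = auc_for_stage(scores, stages, "mild")        # any AVS
auc_severe = auc_for_stage(scores, stages, "severe")   # severe AVS
```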

Lastly, we assessed the prognostic capability of AI-derived parameters for composite endpoints. Specifically, we conducted a spline curve analysis for our novel index, the DLi-AVSc, to visualize its predictive power. Additionally, we applied Cox regression analysis to validate the prognostic relevance of the DLi-AVSc and other AI-derived AVS parameters, with adjustment for clinical risk factors (age, sex, body mass index, hypertension, and diabetes).
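When a Cox model is fitted with a continuous index as a covariate, the hazard ratio for an increment of several points follows from the per-point coefficient by exponentiation. The sketch below shows this conversion; the coefficient is back-calculated from the 85% figure reported for ITDS purely for illustration and is not a fitted value.

```python
import math

def hr_per_increment(beta_per_point, increment=10.0):
    """Hazard ratio for an `increment`-point rise, given the per-point Cox coefficient."""
    return math.exp(beta_per_point * increment)

def percent_risk_change(hazard_ratio):
    """Express a hazard ratio as a percentage change in hazard."""
    return (hazard_ratio - 1.0) * 100.0

# Illustrative only: the per-point coefficient implied by HR = 1.85 per 10 points.
beta = math.log(1.85) / 10.0
hr_10 = hr_per_increment(beta, increment=10.0)
hr_20 = hr_per_increment(beta, increment=20.0)
```

Under the proportional hazards assumption the effect is multiplicative, so a 20-point increase corresponds to 1.85² rather than twice the 10-point percentage.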

## 3. RESULTS

### 3.1 Baseline Characteristics

The distribution of AVS severity across three datasets is shown in **Table 1**: ITDS (n=841), DHDS (n=1,696), and TDDS (n=772). ITDS and TDDS exhibited a higher prevalence of mild AVS (28% and 41%, respectively), with fewer moderate and severe cases. Conversely, DHDS displayed a more balanced severity distribution (12% mild, 15% moderate, and 12% severe). Baseline clinical characteristics are available in **Supplemental Result 1**.

### 3.2 View Classification

Our view classification algorithm accurately identified the required images for assessing AVS across all datasets. The overall accuracy rates were 99.6% for ITDS, 99.5% for DHDS, and 99.4% for TDDS. Detailed metrics are in **Supplemental Result 2**.

### 3.3 Performance of DL-Based AVS Continuum Assessment Algorithm

The distribution of the DLi-AVSc, produced by the DL-based AVS continuum assessment algorithm, exhibited a consistent trend of increasing scores with the severity of AVS across all datasets. (**Figure 1A**) Interestingly, the DLi-AVSc was already significantly higher at the AV sclerosis stage than at the normal stage, indicating the algorithm’s ability to detect early changes. When discordant cases excluded from the training dataset were included in the ITDS, mild to moderate and low-flow, low-pressure gradient moderate AVS were distributed between mild and moderate AVS, while moderate to severe and low-flow, low-pressure gradient severe AVS were distributed between moderate and severe AVS. (**Supplemental Results 3**) The DLi-AVSc demonstrated an increasing trend as conventional parameters assessing AVS severity, such as AV Vmax, mPG, and AVA, worsened. (**Supplemental Results 4**)

![Central Illustration](https://www.medrxiv.org/content/medrxiv/early/2024/07/09/2024.07.08.24310123/F1.medium.gif)

[Central Illustration](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/F1)

Central Illustration: AI-Enhanced Echocardiographic Assessment of AVS Continuum
The illustration depicts a dual-pathway AI system for evaluating AVS. The top row illustrates the DL-based assessment of the AVS continuum using limited views, providing a unique DL index for the AVS continuum, termed DLi-AVSc. The bottom row demonstrates the automated AVS assessment, which derives conventional echocardiographic AVS parameters. By integrating both pathways, our AI system enables accurate AVS diagnosis and prognostication, making it broadly applicable in advanced and resource-limited settings.

AVS, aortic valve stenosis; DL, deep-learning; DLi-AVSc, DL index for the AVS continuum

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2024/07/09/2024.07.08.24310123/F2.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/F2)

Figure 1. The Distribution of DLi-AVSc According to AVS Severity and UMAP Visualization
(A) The DLi-AVSc, generated by the DL-based AVS continuum algorithm, showed a consistent trend of increasing scores with the progression of AVS severity across both internal and external datasets. (B) The UMAP plot demonstrates a continuous nonlinear gradient transition from the normal state (grey) through AV sclerosis (yellow) to advanced AVS stages (red), visually underscoring that the DLi-AVSc accurately represents the AVS continuum.

Abbreviations as in Central Illustration: DHDS, distinct hospital dataset; ITDS, internal test dataset; TDDS, temporally distinct dataset; UMAP, uniform manifold approximation and projection

Furthermore, we used UMAP to verify that the DLi-AVSc accurately represents the AVS continuum. The DLi-AVSc, derived from the approach incorporating both ordered labels and multi-task learning, displayed a distinct continuous gradient from normal through AV sclerosis to advancing AVS stages, consistently evident in ITDS and both external datasets. (**Figure 1B**) In contrast, a conventional multi-class classification approach using 5-class cross-entropy loss resulted in stage-based grouping but lacked the continuous progression seen in our approach. Continuous mapping with ordered labels alone, without the additional multi-task learning to predict key TTE parameters, produced a somewhat linear arrangement but did not accurately reflect the severity progression. (**Supplemental Results 5**)

For each severity level, we present representative samples with Grad-CAM saliency maps overlaid on both PLAX and PSAX views, specifically localizing the AV. (**Supplemental Results 6** and **Video S3**) These results demonstrate that our model accurately identifies the relevant regions for evaluating AVS across all severity levels and views without supervision.

### 3.4 Performance of Automated Conventional Assessment Algorithm

Our algorithm’s automatic measurements demonstrated high correlations with the ground truth values for AV Vmax (PCC 0.974-0.991) and mPG (PCC 0.966-0.991). (**Figure 2A**) The correlation for AVA (PCC 0.789-0.887) was also good but lower than those for Vmax and mPG, as AVA is calculated from multiple measurements. Missing measurements resulted in fewer comparison cases (**Supplemental Methods 7**), and accumulated differences affected the overall accuracy. The overall accuracy of AVS severity classification among any AVS based on these automated measurements was 98.2% for ITDS, 81.0% for DHDS, and 96.8% for TDDS. (**Figure 2B**)

![Figure 2.](https://www.medrxiv.org/content/medrxiv/early/2024/07/09/2024.07.08.24310123/F3.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/F3)

Figure 2. Concordance in AVS Diagnosis Between DL-based Automated Assessment and Conventional Evaluation
(A) Across all datasets, the auto-measured AVS parameters (AV maximal velocity, mean pressure gradient, and valve area) strongly correlated with those obtained from manual measurements. (B) Consequently, AVS gradings from both methods exhibited a high concordance rate, ranging from 81.0% to 98.2%.

Abbreviations as in Figure 1: AV, aortic valve; AVA, aortic valve area; mPG, mean pressure gradient; Vmax, maximal velocity.

### 3.5 Comparison of Diagnostic Performance of the Two AI-Based Approaches

The discrimination performance of DLi-AVSc for various stages of AVS was generally excellent: AUC 0.87-0.99 for any AVS, 0.93-0.97 for significant AVS, and 0.97 for severe AVS. (**Figure 3**) In ITDS, the discrimination performance of DLi-AVSc was lower than that of automatically measured Vmax and mPG but comparable to that of AVA. In DHDS, the performance of DLi-AVSc surpassed that of AVA in diagnosing all stages of AVS, while in TDDS, it was comparable to that of AVA for diagnosing significant and severe AVS.

![Figure 3.](https://www.medrxiv.org/content/medrxiv/early/2024/07/09/2024.07.08.24310123/F4.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/F4)

Figure 3. Diagnostic Performances of DLi-AVSc and Other AI-derived Conventional AVS Parameters Across Various Stages.
The discriminative ability of DLi-AVSc and other conventional AVS parameters was consistently excellent for diagnosing any AVS, significant AVS (moderate to severe), and severe AVS across all datasets: (A) ITDS, (B) DHDS, and (C) TDDS.

Abbreviations as in Figures 1 and 2: AUC, the area under the curve; DLi-AVSc, DL index for the AVS continuum

### 3.6 Prognostic Value of AI-Based AVS Assessment

Analysis of spline curves across the ITDS, DHDS, and TDDS showed that an increase in DLi-AVSc correlated with a rising risk of adverse clinical outcomes. (**Figure 4**) The multivariable Cox regression analysis affirmed the strong and independent prognostic value of DLi-AVSc. A 10-point increase in DLi-AVSc from limited TTE videos was associated with an 85% increase in adverse outcome risk in ITDS and with 53% and 59% increases in DHDS and TDDS, respectively. (**Figure 5**) Moreover, the AI-derived parameters, such as Vmax, mPG, and AVA, demonstrated prognostic values comparable to those of manually derived parameters. (**Figure 5**)

![Figure 4.](https://www.medrxiv.org/content/medrxiv/early/2024/07/09/2024.07.08.24310123/F5.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/F5)

Figure 4. Spline Curves for Composite Outcomes Associated with DLi-AVSc
The risk of composite outcome gradually increased with higher DLi-AVSc across all datasets: (A) ITDS, (B) DHDS, and (C) TDDS. The solid lines represent the hazard ratio, and the blue shaded area represents the 95% confidence interval.

Abbreviations as in Figures 1 and 2.

![Figure 5.](https://www.medrxiv.org/content/medrxiv/early/2024/07/09/2024.07.08.24310123/F6.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2024/07/09/2024.07.08.24310123/F6)

Figure 5. Prognostic Value of DLi-AVSc and AVS Parameters
The DLi-AVSc showed independent predictive value for composite outcomes. Similarly, other AI-derived AVS parameters were significant predictors of composite outcomes, as were manually derived AVS parameters.

Abbreviations as in Figures 1 and 2: HR, hazard ratio

## 4. DISCUSSION

We have developed and validated a comprehensive AI-based system to evaluate AVS, suitable for both resource-limited and advanced settings. It addresses AVS through a dual pathway: 1) it can evaluate the presence and severity of AVS using only the PLAX and PSAX videos initially acquired during TTE, and 2) if additional images are obtained in advanced settings, it can automatically analyze these to diagnose and assess AVS using conventional methods. Internal and external validation demonstrated excellent diagnostic accuracy and strong prognostic capabilities.

While our AI-based system is not the first to evaluate AVS, it stands apart from previous studies by enabling both automation of conventional measurements and evaluation using limited 2D TTE videos. Prior research has typically focused on one of these aspects. For instance, Krishna et al. developed an AI model to automate quantitative AVS evaluation.6 However, their model did not include the crucial initial visual analysis of the AV from 2D TTE videos, which is essential for initiating conventional quantitative AVS analysis. Several studies used CNNs to extract AVS-related features from 2D TTE videos through end-to-end learning without requiring Doppler information.5,7,19,20 Although these studies achieved decent performance in classifying AVS severity, they lack conventional evaluation of AVS, compromising trustworthiness, explainability, and interpretation. Our system is the first to integrate both approaches, making it suitable for both resource-limited and advanced settings and even as a hybrid solution. Since PLAX and PSAX views are typically acquired at the initial stage of TTE, our system can use these views to derive the DLi-AVSc, flag a high probability of significant AVS, and prompt the acquisition of additional views for conventional AVS evaluation. This approach can guide less experienced operators, reducing image acquisition and interpretation errors. For example, if AV CW Doppler is not properly acquired, it could lead to AVS underestimation or misinterpretation as low-flow, low-pressure gradient AVS. In that case, a high DLi-AVSc can suggest the likelihood of significant AVS, thereby guiding further necessary evaluations.

Another strength of our study is that, unlike previous research, it reflects the continuous nature of AVS progression. For instance, Wessler et al. trained CNNs to classify AVS severity into three categories (no, early, and significant AVS) using limited 2D images.7 Similarly, Ahmadi et al. proposed a transformer-based spatiotemporal architecture to classify AVS into four categories (normal, mild, moderate, and severe AVS) by capturing anatomical features and AV motion.19 Vaseli et al. focused on model explainability in AVS severity classification, incorporating uncertainty estimation and classifying AVS severity into three classes (no, early, and significant AVS).20 However, these classifiers discretize AVS severity, losing the continuum information of AVS. Recently, Holste et al. proposed a binary classifier based on the 3D-ResNet18 architecture to detect severe AVS, observing that the generated model probabilities increased with AVS severity.5 However, this model was trained only on a binary classification task (non-severe vs. severe) and therefore did not capture the full range of AVS severity levels in the training stage. In contrast, our framework employs continuous mapping with ordered labels, providing a more nuanced representation of AVS severity. Importantly, we use multi-task learning with auxiliary tasks to predict continuous AVS TTE parameters. This approach not only transitions from discrete labels to continuous values but also captures the underlying continuum of the disease more effectively. In UMAP visualizations, our model demonstrates a clear continuous gradient from normal to severe AVS, unlike other classification models. Additionally, the appropriate distribution of DLi-AVSc in discordant cases further supports the performance of our framework. It should be noted that our dataset was collected entirely from tertiary hospitals. Therefore, it is notable that our model can diagnose AVS and predict outcomes at a level comparable to parameters derived in advanced settings.

The implications of our AI-based system extend beyond precise AVS diagnosis. Our DLi-AVSc exhibits significant prognostic capability, comparable to traditional AVS parameters, even when utilizing only PLAX and PSAX views. Moreover, the DLi-AVSc increases notably from normal levels at AV sclerosis and mild AVS stages before significant AVS progression. To our knowledge, this is the first algorithm to achieve such performance. DLi-AVSc is poised to effectively monitor AVS progression from preclinical stages as a score-based tool. We anticipate the clinical utility of our system becoming prominent, especially as new pharmacological treatments are investigated for AVS prevention.21,22 If such treatments become available, our algorithm’s sensitivity in detecting early AVS stages will be highly advantageous.

### Limitations

The present study has some limitations. Although we developed and thoroughly validated our AI-based system using data from multiple centers, including internal and external validation, all the data were obtained from tertiary centers in South Korea. This means that skilled operators acquired the TTE studies, and it remains to be seen whether the DLi-AVSc will perform well on TTE videos acquired in truly resource-limited settings or by novice operators. Further evaluation is needed to confirm its performance in various clinical environments and among different populations. We plan to conduct additional validation in primary clinics and a multi-national study to address these concerns. Additionally, while we designed the DLi-AVSc to reflect the AVS continuum, it remains to be verified whether the DLi-AVSc increases progressively within individual patients as AVS naturally progresses. This issue will be addressed in future studies.

### Conclusions

We developed and validated a comprehensive AI-based system for evaluating AVS. This system operates through a dual pathway: it assesses the presence and severity of AVS using limited TTE videos and simultaneously automates conventional quantitative AVS evaluation. Internal and external validations demonstrated excellent diagnostic accuracy and strong prognostic capabilities. While additional validation in various clinical settings is needed, our system is expected to be suitable for both resource-limited and advanced settings.

## CLINICAL PERSPECTIVES

### Competency in Medical Knowledge

Echocardiography is the primary imaging tool for evaluating aortic valve stenosis (AVS), but it requires advanced expertise. This study demonstrates the feasibility and high accuracy of an AI-enhanced system in diagnosing and assessing the severity of AVS. The AI system provides a severity index derived from limited echocardiographic images and automatically measures conventional AVS parameters, showing high agreement with expert assessments and potential value in predicting outcomes.

### Translational Outlook

The current AI system can accurately identify AVS and assist in the precise clinical evaluation of AVS. The clinical benefit of this AI system in managing AVS patients, particularly regarding long-term improvements in clinical outcomes, needs to be validated in further prospective clinical trials.

## Supporting information

Supplemental material [supplements/310123_file02.pdf]

Supplemental Video S1 [supplements/310123_file03.mp4]

## Data Availability

The AI-based frameworks utilized in this study were developed and validated using the Open AI Dataset Project (AI-Hub) dataset, an initiative supported by the South Korean government's Ministry of Science and ICT.

*   Received July 8, 2024.
*   Revision received July 8, 2024.
*   Accepted July 9, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## REFERENCES

1.  Sixsmith A. Technology and the Challenge of Aging. In: Sixsmith A, Gutman G, editors. Technologies for Active Aging. Boston, MA: Springer US; 2013:7–25.
2.  Osnabrugge RL, Mylotte D, Head SJ, et al. Aortic stenosis in the elderly: disease prevalence and number of candidates for transcatheter aortic valve replacement: a meta-analysis and modeling study. J Am Coll Cardiol. 2013;62:1002–1012.
3.  Iung B, Arangalage D. Community burden of aortic valve disease. Heart. 2021;107:1446–1447.
4.  Holste G, Oikonomou EK, Mortazavi BJ, et al. Severe aortic stenosis detection by deep learning applied to echocardiography. Eur Heart J. 2023;44:4592–4604.
5.  Krishna H, Desai K, Slostad B, et al. Fully Automated Artificial Intelligence Assessment of Aortic Stenosis by Echocardiography. J Am Soc Echocardiogr. 2023;36:769–777.
6.  Wessler BS, Huang Z, Long GM Jr, et al. Automated Detection of Aortic Stenosis Using Machine Learning. J Am Soc Echocardiogr. 2023;36:411–420.
7.  National Information Society Agency. Open AI Dataset Project (AI-Hub). [https://aihub.or.kr/](https://aihub.or.kr/).
8.  Jeon J, Ha S, Yoon Y, et al. Echocardiographic view classification with integrated out-of-distribution detection for enhanced automatic echocardiographic analysis. *arXiv preprint arXiv:2308.16483v1*. 2023.
9.  Jeon J, Kim J, Jang Y, et al. A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography. *arXiv preprint arXiv:2311.08439*. 2023.
10. Park J, Jeon J, Yoon YE, et al. Artificial intelligence-enhanced automation of left ventricular diastolic assessment: a pilot study for feasibility, diagnostic validation, and outcome prediction. Cardiovasc Diagn Ther. 2024;14:352–366.
11. Writing Committee Members, Otto CM, et al. 2020 ACC/AHA Guideline for the Management of Patients With Valvular Heart Disease: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J Am Coll Cardiol. 2021;77:e25–e197.
12. Baumgartner H, Hung J, Bermejo J, et al. Recommendations on the Echocardiographic Assessment of Aortic Valve Stenosis: A Focused Update from the European Association of Cardiovascular Imaging and the American Society of Echocardiography. J Am Soc Echocardiogr. 2017;30:372–392. doi:10.1016/j.echo.2017.02.009
13. Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6450–6459.
14. Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems. 2021;34:12077–12090.
15. Everett D, Nguyen AT, Richards LE, Raff E. Improving Out-of-Distribution Detection via Epistemic Uncertainty Adversarial Training. *arXiv preprint arXiv:2209.03148*. 2022.
16. Baumgartner H, Hung J, Bermejo J, et al. Recommendations on the echocardiographic assessment of aortic valve stenosis: a focused update from the European Association of Cardiovascular Imaging and the American Society of Echocardiography. Eur Heart J Cardiovasc Imaging. 2017;18:254–275.
17. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. *arXiv preprint arXiv:1802.03426*. 2018.
18. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision. 2020;128:336–359.
19. Ahmadi N, Tsang MY, Gu AN, Tsang TSM, Abolmaesumi P. Transformer-Based Spatio-Temporal Analysis for Classification of Aortic Stenosis Severity From Echocardiography Cine Series. IEEE Trans Med Imaging. 2024;43:366–376.
20. Vaseli H, Gu AN, Ahmadi Amiri SN, et al. ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland; 2023:368–378.
21. Ito S, Oh JK. Aortic Stenosis: New Insights in Diagnosis, Treatment, and Prevention. Korean Circ J. 2022;52:721–736.
22. Lindman BR, Sukul D, Dweck MR, et al. Evaluating Medical Therapy for Calcific Aortic Stenosis: JACC State-of-the-Art Review. J Am Coll Cardiol. 2021;78:2354–2376. doi:10.1016/j.jacc.2021.09.1367

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/inline-graphic-2.gif