Abstract
Background Embolic stroke of undetermined source (ESUS) may be associated with carotid artery plaques with <50% stenosis. Plaque vulnerability is multifactorial and may be related to intraplaque hemorrhage (IPH), lipid-rich necrotic core (LRNC), perivascular adipose tissue (PVAT), and calcification morphology. Machine-learning (ML) approaches to plaque classification are increasingly popular, but their black-box nature often limits clinical interpretability. We apply an explainable ML approach that uses noncalcified plaque components and calcification features with the SHapley Additive exPlanations (SHAP) framework to classify calcified carotid plaques as culprit or non-culprit.
Methods In this retrospective cross-sectional study, we analyzed patients with unilateral anterior circulation ESUS who underwent neck CT angiography and had calcific carotid plaque. Calcification-level features were derived from manual segmentations. Plaque-level features were assessed by a neuroradiologist blinded to the stroke side and by semi-automated software. Calcifications and plaques were classified as culprit if they were ipsilateral to the stroke side. Eight baseline ML models were compared. Three CatBoost models were trained: Plaque-level, Calcification-level, and Combined. SHAP was incorporated to explain model decisions.
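To illustrate what SHAP attributes to each feature, the following is a minimal sketch of an exact Shapley value computation for a single prediction. It is not the study's CatBoost model: the scoring function `toy_score`, its coefficients, and the instance and baseline values are hypothetical and chosen only for illustration. The feature names are borrowed from the abstract. In practice the SHAP library uses faster algorithms for tree models, whereas this brute-force version enumerates every feature coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance relative to a baseline.

    Features outside a coalition are set to their baseline value.
    """
    names = list(instance)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {
                    f: instance[f] if (f in subset or f == name) else baseline[f]
                    for f in names
                }
                without_f = {
                    f: instance[f] if f in subset else baseline[f]
                    for f in names
                }
                total += weight * (predict(with_f) - predict(without_f))
        phi[name] = total
    return phi


def toy_score(x):
    """Hypothetical risk score; NOT the model described in the abstract."""
    score = 0.3 * x["plaque_thickness_mm"] + 0.1 * x["pvat_volume"]
    # Hypothetical interaction term between thickness and IPH/LRNC ratio.
    if x["plaque_thickness_mm"] > 2.6 and x["iph_lrnc_ratio"] > 0.5:
        score += 1.0
    return score


# Hypothetical plaque and reference ("average") plaque.
instance = {"plaque_thickness_mm": 3.0, "pvat_volume": 2.0, "iph_lrnc_ratio": 0.8}
baseline = {"plaque_thickness_mm": 2.0, "pvat_volume": 1.0, "iph_lrnc_ratio": 0.2}

phi = shapley_values(toy_score, instance, baseline)
for feature, value in phi.items():
    print(f"{feature}: {value:+.3f}")
```

The attributions sum to the difference between the score for the instance and the score for the baseline, which is the property that makes per-feature explanations additive.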
Results Seventy patients yielded 116 calcific carotid plaques (60 ipsilateral to the stroke) and 270 calcifications (146 ipsilateral). Seventeen plaque-level and 15 calcification-level features were extracted. The baseline CatBoost model outperformed the other baseline models. The Combined model achieved a test AUC of 0.77 (95% CI: 0.59-0.92), an accuracy of 0.82 (95% CI: 0.71-0.91), and a mean cross-validation AUC of 0.78. The Plaque-level and Calcification-level models performed worse (AUC 0.41, 95% CI: 0.15-0.68, and AUC 0.60, 95% CI: 0.44-0.76, respectively). The Combined model used five features: plaque thickness, IPH/LRNC volume ratio, PVAT volume, calcification minimum density, and the ratio of total calcification volume to mean density. Plaque thickness was the most important feature based on SHAP values, with a potential threshold at >2.6 mm.
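The abstract does not state how the 95% confidence intervals were obtained. As one common option, the sketch below computes AUC from pairwise comparisons and a percentile bootstrap interval; the bootstrap choice, the function names, and the example labels and scores are assumptions for illustration, not the study's data or procedure.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one sample of each class")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical test-set labels (1 = culprit) and predicted probabilities.
demo_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
demo_scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3, 0.6, 0.45, 0.35, 0.55]

point = auc(demo_labels, demo_scores)
ci_lo, ci_hi = bootstrap_auc_ci(demo_labels, demo_scores)
print(f"AUC {point:.2f} (95% CI: {ci_lo:.2f}-{ci_hi:.2f})")
```

With small test sets such intervals are wide, which is consistent with the broad ranges reported for all three models.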
Conclusions An ML model trained with both noncalcified plaque and calcification features can classify culprit calcific carotid plaques with greater accuracy than models trained using only plaque-level or only calcification-level features. Because it uses clinically interpretable features with the SHAP framework, the model provides explanations for its decisions and allows identification of potential thresholds for high-risk features.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
YS is supported by an NIH/NIBIB T-32 grant (EB004311, PIs Mankoff, Gade, Schnall). JWS is funded by the American Heart Association (938082) and reports consulting fees from JLK Group. SEK receives/received grant support from Bayer, DiaMedica, and DaiichiSankyo; consulting fees from AstraZeneca (DSMB) and Medtronic (DSMB); and royalties from UpToDate. BC received consulting fees from Anthos Therapeutics and Bayer and royalties from UpToDate.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This study was deemed exempt by the Institutional Review Board at the University of Pennsylvania (853355). Individual patient consent was waived because of the retrospective nature of the study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The data and model can be made available on reasonable written request to the corresponding author. The ML model-building methodology described in this article is based on open-source resources, and the hyperparameters of our final models are shared in the Results.
Nonstandard Abbreviations and Acronyms
- AUC: Area Under the Curve
- CatBoost: Categorical Boosting
- CTA: Computed Tomography Angiography
- ESUS: Embolic Stroke of Undetermined Source
- IPH: Intraplaque Hemorrhage
- LightGBM: Light Gradient Boosting Machine
- LRNC: Lipid-rich Necrotic Core
- MATX: Plaque Matrix
- ML: Machine Learning
- PVAT: Perivascular Adipose Tissue
- ROC: Receiver Operating Characteristic curve
- SHAP: SHapley Additive exPlanations
- SVM: Support Vector Machine
- XGBoost: eXtreme Gradient Boosting