Development of an ensemble machine learning prognostic model to predict 60-day risk of major adverse cardiac events in adults with chest pain

Chris J. Kennedy; Dustin G. Mark; Jie Huang; Mark J. van der Laan; Alan E. Hubbard; Mary E. Reed

doi:10.1101/2021.03.08.21252615

Abstract

Background Chest pain is the second leading reason for emergency department (ED) visits and is commonly identified as a leading driver of low-value health care. Accurate identification of patients at low risk of major adverse cardiac events (MACE) is important to improve resource allocation and reduce over-treatment.

Objectives We assessed machine learning (ML) methods and electronic health record (EHR) covariate collection for MACE prediction. We aimed to maximize the pool of low-risk patients who were accurately predicted to have less than 0.5% MACE risk and who could be eligible for reduced testing (“rule-out” strategy).

Population Studied 116,764 adult patients presenting with chest pain in the ED between 2013 and 2015 and evaluated for potential acute coronary syndrome (ACS). The 60-day MACE rate was 2%.

Setting Data analysis was performed from May 2018 to August 2021.

Methods We evaluated ML algorithms (lasso, splines, random forest, extreme gradient boosting, Bayesian additive regression trees) and SuperLearner stacked ensembling. We tuned ML hyperparameters through nested ensembling, and imputed missing values with generalized low-rank models (GLRM). Performance was benchmarked against individual biomarkers, validated clinical risk scores, decision trees, and logistic regression. We assessed clinical utility through net benefit analysis and explained the models through variable importance ranking and accumulated local effect visualization.
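The net benefit analysis mentioned above can be illustrated with a minimal Python sketch. It uses the standard decision curve formulation, in which false positives are weighted by the odds of the risk threshold, together with the related "net reduction in interventions per 100 patients" measured against a work-up-everyone strategy. The function names and the toy data are hypothetical, and this formulation is an assumption: it may not match the exact definition behind the benefit-adjusted workups reported in the Results.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of working up every patient whose predicted risk >= threshold.

    Standard decision curve formula: TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_reduction_per_100(y_true, risk, threshold):
    """Net reduction in workups per 100 patients versus working up everyone."""
    odds = threshold / (1.0 - threshold)
    nb_model = net_benefit(y_true, risk, threshold)
    # A constant risk of 1.0 flags every patient, i.e. the "treat all" strategy.
    nb_all = net_benefit(y_true, [1.0] * len(y_true), threshold)
    return 100.0 * (nb_model - nb_all) / odds


if __name__ == "__main__":
    # Toy cohort (hypothetical): 100 patients, 2 events, 50 clearly low-risk.
    y = [1, 1] + [0] * 98
    p = [0.20, 0.20] + [0.05] * 48 + [0.001] * 50
    print(net_benefit(y, p, 0.005))
    print(net_reduction_per_100(y, p, 0.005))
```

In this formulation, a model that flags no events as low risk earns credit equal to the number of true negatives per 100 patients, while each missed event is penalized heavily at a low threshold such as 0.5%.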

Results The SuperLearner ensemble provided the best cross-validated discrimination, with areas under the curve of 0.15 for precision-recall (PR-AUC) and 0.87 for receiver operating characteristic (ROC-AUC), and the best accuracy, with an index of prediction accuracy of 0.07. The ensemble’s risk estimates were miscalibrated by 0.2 percentage points on average, and the ensemble dominated the net benefit analysis at all examined thresholds. At a 0.5% threshold, the ensemble model yielded 31 benefit-adjusted workups avoided per 100 patients, compared to 25 for logistic regression and 2 to 14 for clinical risk scores. The most important predictors were age, troponin, clinical risk scores, and electrocardiogram. GLRM achieved a 90% average reduction in reconstruction error compared to median-mode imputation.
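For readers unfamiliar with the index of prediction accuracy (IPA), the sketch below shows its usual definition for a binary outcome: one minus the ratio of the model's Brier score to the Brier score of a null model that predicts the observed event rate for everyone. The function names and toy values are hypothetical, and the cross-validation procedure used to obtain the figure reported here is not reproduced.

```python
def brier_score(y_true, risk):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((r - y) ** 2 for y, r in zip(y_true, risk)) / len(y_true)


def index_of_prediction_accuracy(y_true, risk):
    """IPA = 1 - Brier(model) / Brier(null), where null predicts the event rate.

    Requires at least one event and one non-event, otherwise the null
    Brier score is zero and the ratio is undefined.
    """
    prevalence = sum(y_true) / len(y_true)
    null_brier = brier_score(y_true, [prevalence] * len(y_true))
    return 1.0 - brier_score(y_true, risk) / null_brier


if __name__ == "__main__":
    # Toy data (hypothetical): one event among four patients.
    y = [0, 0, 0, 1]
    print(index_of_prediction_accuracy(y, [0.1, 0.1, 0.2, 0.6]))
```

An IPA of 0 means the model is no better than predicting the overall event rate, and 1 means perfect prediction. With rare outcomes such as a 2% event rate, small positive values are typical even for models that discriminate well.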

Conclusion Combining ML algorithms with a broad set of EHR covariates improved MACE risk prediction and would reduce over-treatment compared to simpler alternatives, while providing calibrated predictions and interpretability. Patients stand to benefit from more targeted care when ML is used to detect nuanced health patterns.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by a Kaiser Permanente Division of Research Delivery Science Research Grant.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Approved by the Kaiser Permanente Division of Research IRB.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

Addition of net benefit analysis, reorganization of figures/tables, expansion of supplemental info, improved writing.

Data Availability

The dataset contains protected health information and is not shareable.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.