PT - JOURNAL ARTICLE
AU - Kim, Chang H.
AU - Al-Kindi, Sadeer
AU - Tarabichi, Yasir
AU - Gohel, Suril
AU - Vyas, Riddhi
AU - Srinivasan, Shankar
TI - Machine Learning to Predict 10-year Cardiovascular Mortality from the Electrocardiogram: Analysis of the Third National Health and Nutrition Examination Survey (NHANES III)
AID - 10.1101/2021.09.09.21263327
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.09.09.21263327
4099 - http://medrxiv.org/content/early/2021/09/14/2021.09.09.21263327.short
4100 - http://medrxiv.org/content/early/2021/09/14/2021.09.09.21263327.full
AB - Background: The value of the electrocardiogram (ECG) for predicting long-term cardiovascular outcomes is not well defined. Machine learning methods are well suited for analysis of highly correlated data such as that from the ECG. Methods: Using demographic, clinical, and 12-lead ECG data from the Third National Health and Nutrition Examination Survey (NHANES III), machine learning models were trained to predict 10-year cardiovascular mortality in ambulatory U.S. adults. Predictive performance of each model was assessed using area under receiver operating characteristic curve (AUROC), area under precision-recall curve (AUPRC), sensitivity, and specificity. These were compared to the 2013 American College of Cardiology/American Heart Association Pooled Cohort Equations (PCE). Results: 7,067 study participants (mean age: 59.2 ± 13.4 years, female: 52.5%, white: 73.9%, black: 23.3%) were included. At 10 years of follow-up, 338 (4.8%) had died from cardiac causes. Compared to the PCE (AUROC: 0.668, AUPRC: 0.125, sensitivity: 0.492, specificity: 0.859), machine learning models required only demographic and ECG data to achieve comparable performance: logistic regression (AUROC: 0.754, AUPRC: 0.141, sensitivity: 0.747, specificity: 0.759), neural network (AUROC: 0.764, AUPRC: 0.149, sensitivity: 0.722, specificity: 0.787), and ensemble model (AUROC: 0.695, AUPRC: 0.166, sensitivity: 0.468, specificity: 0.912).
Additional clinical data did not improve the predictive performance of machine learning models. In variable importance analysis, important ECG features clustered in inferior and lateral leads. Conclusions: Machine learning can be applied to demographic and ECG data to predict 10-year cardiovascular mortality in ambulatory adults, with potentially important implications for primary prevention.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: No funding was received for this manuscript.
Author Declarations: Since NHANES III is a publicly available, de-identified data set, a separate Institutional Review Board review was not required for this study.
NHANES III is a publicly available, de-identified data set.
https://www.n.cdc.gov/nchs/nhanes/nhanes3/default.aspx
ASCVD: Atherosclerotic cardiovascular disease
AUPRC: Area under precision-recall curve
AUROC: Area under receiver operating characteristic curve
ECG: Electrocardiogram
NHANES: National Health and Nutrition Examination Survey
PCE: Pooled Cohort Equations