RT Journal Article
SR Electronic
T1 Machine Learning for Integrating Social Determinants in Cardiovascular Disease Prediction Models: A Systematic Review
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2020.09.11.20192989
DO 10.1101/2020.09.11.20192989
A1 Zhao, Yuan
A1 Wood, Erica P.
A1 Mirin, Nicholas
A1 Vedanthan, Rajesh
A1 Cook, Stephanie H.
A1 Chunara, Rumi
YR 2020
UL http://medrxiv.org/content/early/2020/09/13/2020.09.11.20192989.abstract
AB Background Cardiovascular disease (CVD) is the number one cause of death worldwide, and CVD burden is increasing in low-resource settings and for lower socioeconomic groups worldwide. Machine learning (ML) algorithms are rapidly being developed and incorporated into clinical practice for CVD prediction and treatment decisions. Significant opportunities for reducing death and disability from CVD worldwide lie in addressing the social determinants of cardiovascular outcomes. We sought to review how social determinants of health (SDoH) and variables along their causal pathway are being included in ML algorithms, in order to develop best practices for future ML algorithms that include social determinants. Methods We conducted a systematic review using five databases (PubMed, Embase, Web of Science, IEEE Xplore and ACM Digital Library). We identified English-language articles, published from inception to April 10, 2020, that reported on the use of machine learning for CVD prediction and incorporated SDoH and related variables. We included studies that used data from any source or study type. Studies were excluded if they did not use any machine learning algorithm; were developed for non-humans; had outcomes that were biomarkers, mediators, surgery or medication for CVD, rehabilitation or mental health outcomes after CVD, or cost-effectiveness analyses of CVD; were not written in English; or were reviews or meta-analyses.
We also excluded conference abstracts for which the full texts were not obtainable. The study was registered with PROSPERO (CRD42020175466). Findings Of 2870 articles identified, 96 were eligible for inclusion. Most studies that compared ML and regression showed increased performance of ML, and most studies that compared performance with or without SDoH/related variables showed increased performance with them. The most frequently included SDoH variables were race/ethnicity, income, education and marital status. Studies were largely from North America, Europe and China, limiting the diversity of included populations and variance in social determinants. Interpretation Findings show that machine learning models, as well as SDoH and related variables, improve CVD prediction model performance. The limited variety of sources and data in studies emphasizes that there is opportunity to include more SDoH variables, especially environmental ones, that are known CVD risk factors in machine learning CVD prediction models. Given their flexibility, ML models may provide an opportunity to incorporate and model the complex nature of social determinants. Such data should be recorded in electronic databases to enable their use. Funding We acknowledge funding from Blue Cross Blue Shield of Louisiana. The funder had no role in the decision to publish.
Competing Interest Statement The authors have declared no competing interest. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: N/A (Systematic review). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All papers included in the review are summarized in the Appendix.