Abstract
We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We develop a two-model ensemble consisting of a baseline model, in which prediction is based on demographic and clinical variables only, and a genetic model, in which we also include PRSs. We compare linear and non-linear models at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model’s performance as percentage variance explained (PVE) on a held-out test dataset. Compared with a linear baseline model, a non-linear baseline model improved the PVE from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP). Compared with using a single PRS, including seven PRSs computed from the largest available GWAS of SBP/DBP improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5.0% (DBP). Adding 14 further PRSs computed from two independent GWASs increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, and it was primarily the non-White groups that benefitted from the inclusion of additional PRSs.
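For readers unfamiliar with the metric, the following is a minimal sketch of how a PVE value could be computed on a held-out test set. It assumes the common definition PVE = 100 × (1 − residual sum of squares / total sum of squares); the abstract does not state the exact formula used, and the function name `pve` and the blood pressure values below are hypothetical, made up purely for illustration.

```python
from statistics import fmean


def pve(y_true, y_pred):
    """Percentage of variance explained, assumed here to be
    100 * (1 - SS_res / SS_tot); the study's exact definition may differ."""
    mean_y = fmean(y_true)
    # Sum of squared prediction errors on the test set.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Sum of squared deviations of the outcome from its test-set mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical SBP values in mmHg; NOT study data.
observed = [118.0, 135.0, 142.0, 110.0, 127.0]
predicted = [121.0, 131.0, 138.0, 115.0, 126.0]
print(f"PVE = {pve(observed, predicted):.1f}%")
```

Under this definition, a model that always predicts the test-set mean scores 0% and a perfect model scores 100%, so the reported gains (for example, 28.1% to 30.1% for SBP) can be read as absolute differences on that scale.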
Competing Interest Statement
B Psaty serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. G Lyons is currently an employee of Alexion Pharmaceuticals; however, her contributions to the present manuscript were performed as part of her previous affiliation at the Harvard T.H. Chan School of Public Health, and this work is not related to her current occupation and affiliation. M Moll has received grant funding from Bayer and consulting fees from TriNetX, 2ndMD, TheaHealth, Sitka, Verona Pharma, and Axon Advisors.
Funding Statement
This study was supported by the National Heart, Lung, and Blood Institute (NHLBI) grant R01HL161012 to TS. MM was supported by NHLBI grant K08HL159318.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This work was approved by the Mass General Brigham Institutional Review Board and by the Beth Israel Deaconess Medical Center Committee on Clinical Investigations. All study protocols were approved by the institutional review board at the University of Maryland Baltimore. Informed consent was obtained from each study participant. The ARIC study has been approved by a single Institutional Review Board (sIRB) at Johns Hopkins School of Medicine and Institutional Review Boards (IRB) at all participating institutions: University of North Carolina at Chapel Hill IRB, Johns Hopkins University School of Public Health IRB, University of Minnesota IRB, Wake Forest University Health Sciences IRB, and University of Mississippi Medical Center IRB. Study participants provided written informed consent at all study visits. The BioMe cohort was approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai. All BioMe participants provided written, informed consent for genomic data sharing. All CARDIA participants provided informed consent, and the study was approved by the Institutional Review Boards of the University of Alabama at Birmingham and the University of Texas Health Science Center at Houston. Cleveland Family Study was approved by the Institutional Review Board (IRB) of Case Western Reserve University and Mass General Brigham (formerly Partners HealthCare). Written informed consent was obtained from all participants. All CHS participants provided informed consent, and the study was approved by the Institutional Review Board of the University of Washington. All COPDGene participants provided written informed consent, and the study was approved by the Institutional Review Boards of the participating clinical centers. The Framingham Heart Study was approved by the Institutional Review Board of the Boston University Medical Center. All study participants provided written informed consent.
Written informed consent was obtained from all subjects and approval was granted by participating institutional review boards (University of Michigan, University of Mississippi Medical Center, and Mayo Clinic). All subjects provided informed consent and the GenSalt study was approved by the Institutional Review Boards (IRBs) of all participating institutes in the US and China. This study was approved by the institutional review boards (IRBs) at each field center, where all participants gave written informed consent, and by the Non-Biomedical IRB at the University of North Carolina at Chapel Hill, for the HCHS/SOL Data Coordinating Center. The IRBs approving the study are: Non-Biomedical IRB at the University of North Carolina at Chapel Hill, Chapel Hill, NC; Einstein IRB at the Albert Einstein College of Medicine of Yeshiva University, Bronx, NY; IRB at Office for the Protection of Research Subjects (OPRS), University of Illinois at Chicago, Chicago, IL; Human Subject Research Office, University of Miami, Miami, FL; Institutional Review Board of San Diego State University, San Diego, CA. The Institutional Review Boards at Jackson State University, Tougaloo College, and the University of Mississippi Medical Center approved the study, and all participants provided written informed consent. All MESA participants provided written informed consent, and the study was approved by the Institutional Review Boards at The Lundquist Institute (formerly Los Angeles BioMedical Research Institute) at Harbor-UCLA Medical Center, University of Washington, Wake Forest School of Medicine, Northwestern University, University of Minnesota, Columbia University, and Johns Hopkins University. All THRV participants provided informed consent, and the study was approved by the Institutional Review Board at The Lundquist Institute (formerly Los Angeles BioMedical Research Institute, or LA BioMed) at Harbor-UCLA Medical Center, and at Washington University in St. Louis.
All WHI participants provided informed consent and the study was approved by the Institutional Review Board (IRB) of the Fred Hutchinson Cancer Research Center.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability
TOPMed freeze 8 WGS data and harmonized BP phenotypes are available by application to dbGaP according to the study-specific accessions: Amish: “phs000956”, ARIC: “phs001211”, BioMe: “phs001644”, CARDIA: “phs001612”, CFS: “phs000954”, CHS: “phs001368”, COPDGene: “phs000951”, FHS: “phs000974”, GENOA: “phs001345”, GenSalt: “phs001217”, HCHS/SOL: “phs001395”, JHS: “phs000964”, MESA: “phs001211”, THRV: “phs001387”, WHI: “phs001237”. Summary statistics from the MVP BP GWAS are available from dbGaP by application to study accession “phs001672”. The summary statistics from the UKBB + ICBP BP GWAS are available at https://grasp.nhlbi.nih.gov/FullResults.aspx. MGB Biobank genotyping and phenotypic data are available to Mass General Brigham investigators with required approval from the Mass General Brigham Institutional Review Board (IRB). Data needed to construct the selected BP PRSs generated in this study will become publicly available in a Zenodo repository upon paper acceptance and will include variants, alleles, and weights for each of the PRSs based on GWAS of SBP and DBP. The BED files that define the LD regions used for construction of local PRSs are available in the Bitbucket repository at https://bitbucket.org/nygcresearch/ldetect-xsdata/src/master/.