Abstract
Background and Aims Peripheral artery disease (PAD) is a heritable atherosclerotic condition that is underdiagnosed and undertreated. With growing knowledge of the genetic basis for PAD and related risk factors, this study sought to construct a new polygenic score for PAD (GPSPAD).
Methods GPSPAD was constructed by integrating multi-ancestry summary statistics for PAD and related traits. GPSPAD was trained in a UK Biobank dataset of 96,239 individuals and validated in a holdout UK Biobank dataset (N=304,294), All of Us (AoU; N=237,173), and the Mass General Brigham Biobank (MGBB; N=37,017).
Results GPSPAD was associated with an OR per SD increase of 1.64 in the UK Biobank dataset (95% CI 1.60-1.68). Compared to previously published PAD polygenic scores, GPSPAD was more strongly associated with PAD in AoU and MGBB, including enhanced transferability to non-European subgroups. GPSPAD improved discrimination of incident PAD (ΔC-statistic 0.030) to a degree that was nearly equivalent to the additive performances of diabetes (ΔC-statistic 0.029) and smoking (ΔC-statistic 0.034). GPSPAD was associated with reduced ankle-brachial index (ABI) in the MGBB, with the top 8% of individuals having a mean ABI <0.90 when assessed. Among individuals with prevalent PAD, GPSPAD consistently identified individuals at high risk of major adverse limb events (MALE) in the UK Biobank (HR 1.48; 95% CI 1.24-1.77), MGBB (HR 1.34; 95% CI 1.12-1.60), and AoU (HR 1.33; 95% CI 1.12-1.58).
Conclusions An integrated, multi-ancestry polygenic score for PAD predicts disease and adverse limb outcomes in three diverse cohorts. Incorporating polygenic risk into PAD care has the potential to guide screening and tailor management to prevent MALE.
Introduction
Peripheral artery disease (PAD) is an atherosclerotic vascular condition that affects 230 million adults worldwide, with high resource utilization owing to both systemic and limb ischemic events.1,2 While PAD shares risk factors with coronary artery disease (CAD), 32-54% of individuals presenting with PAD do not have clinically significant coronary or cerebrovascular disease.3,4 In addition, the major etiologies of acute events differ between CAD and PAD. Ischemic CAD events most commonly result from plaque rupture, whereas acute limb ischemia is most commonly caused by embolism or in situ thrombosis, regardless of atherosclerosis extent.5 Indeed, the increasingly recognized role of thrombosis in PAD is also supported by the discovery of genetic variants in coagulation factors, including the Factor V Leiden mutation, which have been uniquely associated with PAD and not CAD.6
Clinical subsets of PAD include asymptomatic disease, claudication, and chronic limb-threatening ischemia (CLTI), which results in tissue loss and major adverse limb events (MALE). MALE are a devastating complication and are often associated with critical illness, numerous resource-intensive attempts at revascularization, and prolonged hospital stays as part of the limb salvage effort. Despite its high morbidity, PAD is grossly underdiagnosed, and there is no consensus on screening indications. For example, while European and American guidelines recommend consideration of screening based on age and risk factors, the United States Preventive Services Task Force (USPSTF) does not recommend screening regardless of risk factors.7-9 Conflicting recommendations have implications for care implementation, for example in the U.S., where preventive service coverage is guided by USPSTF recommendations.10
In addition, there is no standard tool to predict complications before advanced disease develops. Such a tool would be useful because contemporary antithrombotic treatments that target prevention of MALE are available but remain underutilized in PAD management.11
Polygenic risk scores (PRS) provide a quantitative metric for the inherited component of disease risk by integrating common genetic variants discovered from genome-wide association studies (GWAS) into a single instrument. As PAD has an estimated heritability of 20-50%12,13, incorporation of genetic factors offers an opportunity to optimize risk stratification. While there have been advances in PRS for CAD, there has been limited progress thus far in developments for PAD owing to largely European-based data, lack of validation in external datasets, and unclear transferability across diverse ancestry groups.14-16 Additionally, PRS utility for incident prediction of adverse PAD events has not been described.
To address these needs, in the present study, we develop a new genome-wide polygenic score for PAD (GPSPAD) that incorporates multi-ancestry GWAS data for PAD and related traits from over 2 million individuals (Figure 1). We assess the performance of GPSPAD in predicting PAD among individuals of diverse ancestry in an internal validation cohort of 304,294 individuals and two independent study populations comprising 274,190 individuals. Lastly, we apply GPSPAD to predict MALE and identify individuals with clinically important increased risk.
Methods
Study populations
GPSPAD was developed using data from the UK Biobank, a longitudinal cohort study of approximately 500,000 individuals from the United Kingdom aged 40-69 years.17 Participants were followed for outcomes based on International Classification of Diseases 9th and 10th revisions (ICD-9/ICD-10) and Office of Population Censuses and Surveys versions 3 and 4 (OPCS-3/OPCS-4). Ancestry was based on self-identified ethnicity and country of origin with African, East Asian, European, Latino, Middle Eastern/North African (MENA), and South Asian categories (Supplemental Tables 1-2). Coding of comorbidities and lifestyle factors is detailed in the Supplemental Materials (Supplemental Table 3).
GPSPAD performance was evaluated in two external datasets: Mass General Brigham Biobank (MGBB) and All of Us (AoU). The MGBB is a New England healthcare-based cohort that includes >145,000 patients treated at seven regional hospitals and clinics, of whom ∼56,000 have been genotyped.18 Baseline phenotypes are linked to the electronic health record (EHR) with available data from clinical notes, ICD-9/ICD-10, and Current Procedural Terminology (CPT) codes.
The AoU Research program is a United States-based cohort study that aims to recruit individuals who have been historically underrepresented in biomedical research.19 The program has recruited ∼400,000 individuals and includes health questionnaires, biospecimen collection, and longitudinal EHR data.
Genetic data and quality control
In the UK Biobank, individuals were genotyped using the UK BiLEVE Axiom Array or UK Biobank Axiom Array and centrally imputed to the 1000 Genomes (1000G) Panel, Haplotype Reference Consortium, or UK10K Panel.17,20 After performing quality control (Detailed Methods) and excluding individuals of Latino ancestry due to low population representation (N=11 PAD cases) and those with unreported/mixed ethnicity, 304,294 individuals were included for internal validation.
MGBB samples were genotyped on the Illumina Multi-Ethnic Genotyping Array or Global Screening Array.21 Imputation was performed to the multi-ancestry TOPMed r2 reference panel. Given the lack of detailed self-reported ethnicity or ancestry data, ancestry was genetically predicted using a K-nearest neighbor model trained with principal components (PCs) from the 1000G reference panels for European, African, Latino/Admixed American, East Asian, and South Asian ancestry (Detailed Methods).18 We excluded individuals who were in prior discovery GWAS for CAD22 or who did not map to a single genetic ancestry, leaving 37,017 individuals for external validation.
AoU participants were genotyped using the Illumina Global Diversity Array at AoU genome centers.23 As in the MGBB, detailed reported ancestry data were lacking, so ancestry was assigned based on genetic similarity. AoU assigns the categorical ancestries African, Latino/Admixed American, East Asian, South Asian, European, MENA, and Other based on a random forest classifier trained using gnomAD, Human Genome Diversity Project, and 1000G reference labels (Detailed Methods).23 After exclusion of individuals with missing/other ancestry, 237,173 individuals were included.
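As an illustration of the K-nearest neighbor ancestry assignment described above, the following is a minimal sketch rather than the pipeline used in this study. The function name `knn_predict`, the value of k, the use of two PCs, and the toy reference coordinates and labels are all hypothetical; the number of neighbors and PCs actually used is described only in the Detailed Methods.

```python
import math
from collections import Counter

def knn_predict(reference_pcs, reference_labels, query_pc, k=3):
    """Return the majority ancestry label among the k reference samples
    closest to the query in principal-component space (Euclidean distance)."""
    distances = sorted(
        (math.dist(pc, query_pc), label)
        for pc, label in zip(reference_pcs, reference_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical reference panel: two well-separated clusters on two PCs.
reference_pcs = [(0.00, 0.10), (0.10, 0.00), (0.05, 0.05),
                 (1.00, 1.10), (1.10, 0.90), (0.95, 1.00)]
reference_labels = ["EUR", "EUR", "EUR", "AFR", "AFR", "AFR"]

# A genotyped individual projected onto the same PCs (toy values).
predicted = knn_predict(reference_pcs, reference_labels, (0.90, 1.00), k=3)
```

In practice an individual whose neighbors are split across reference groups would not map cleanly to a single genetic ancestry, which corresponds to the exclusion criterion described above; this sketch does not implement that check.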
Clinical endpoints
In the UK Biobank, PAD was defined based on self-reported history, ICD codes, OPCS codes for lower extremity revascularization or major amputation, and cilostazol prescription (Supplemental Table 4). In AoU, PAD was defined on self-report, occurrences of >2 ICD codes, or a single procedure code for revascularization, major amputation or supervised exercise therapy for PAD (Supplemental Table 5).24
In the MGBB, PAD was derived from phenotypes developed by the Mass General Brigham Research Patient Data Registry (RPDR) based on structured and unstructured clinical data from the EHR (Detailed Methods).25 Ankle-brachial index (ABI) values were extracted from imaging reports retrieved from the RPDR. We selected each individual’s minimum ABI and excluded values that were >1.4 or non-compressible.
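The ABI selection rule can be sketched as follows. This is an illustrative helper, not the extraction code used for the RPDR reports: the function name `minimum_valid_abi` is hypothetical, and representing a non-compressible reading as `None` is an assumption made only for this example.

```python
def minimum_valid_abi(abi_values, upper_limit=1.4):
    """Return an individual's lowest ABI after dropping non-compressible
    readings (encoded here as None, an assumption) and values above
    upper_limit. Returns None when no valid measurement remains."""
    valid = [v for v in abi_values if v is not None and v <= upper_limit]
    return min(valid) if valid else None

# Toy readings for one individual: two valid values, one >1.4, one non-compressible.
example_min = minimum_valid_abi([1.10, 0.82, 1.60, None])
```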
MALE was defined as a surrogate of major amputation and acute limb ischemia based on diagnosis and procedure codes (Supplemental Tables 6-8). Revascularization included thrombectomy, thrombolysis, and emergency lower extremity bypass. In the MGBB, CPT codes for catheter-directed thrombolysis/thrombectomy required concomitant coding for aortogram or lower extremity angiography to ensure arterial intervention.
GPSPAD construction
PRS predictive performance is improved by the incorporation of data from increasingly diverse genetic ancestries and consideration of genetically correlated traits.26-28 To leverage the common mechanistic pathways of PAD, other atherosclerotic conditions, and PAD risk factors, GWAS results of PAD and 14 candidate traits were considered in GPSPAD construction: PAD, CAD, ischemic stroke, glomerular filtration rate, diabetes mellitus, smoking, systolic blood pressure (SBP), diastolic blood pressure (DBP), low-density lipoprotein cholesterol (LDL-C), total cholesterol, high-density lipoprotein cholesterol (HDL-C), triglycerides, BMI, carotid plaque burden, and carotid intima-media thickness (Supplemental Table 9). After collection of GWAS summary statistics, GPSPAD was trained using target data from 96,239 European individuals in the UK Biobank. GWAS results from the UK Biobank were not used to construct GPSPAD such that data for score development and training were non-overlapping.
GPSPAD was developed in a two-layer process (Figure 1).26 Layer 1 involved using ancestry-stratified GWAS data for each trait to calculate multi-ancestry polygenic scores that were optimized according to their PAD predictive performance. Separate scores were constructed using LDpred2, a widely used method that adjusts marginal single nucleotide polymorphism effect sizes for linkage disequilibrium (LD) patterns and selects a subset of variants with non-zero effects to calculate the polygenic score.29 Using LDpred2-auto, scores were calculated using GWAS results stratified by African, East Asian, European, Latino, and South Asian ancestry (R bigsnpr v.11.4). This resulted in 100 scores across all ancestries and traits. For each trait, the ancestry-specific scores were combined into the best-performing multi-ancestry score per trait using Least Absolute Shrinkage and Selection Operator (LASSO) regression in models predicting PAD (R glmnet v4.0-2). Feature selection was performed iteratively for all 15 traits in layer 1, yielding 15 multi-ancestry trait-specific scores with mixing weights detailed in Supplemental Table 9.
Layer 2 involved combining trait-specific scores from layer 1 to derive the integrated, multi-trait GPSPAD. Trait scores from Layer 1 were input into a LASSO regression model predicting PAD to construct the final GPSPAD. The final weight in GPSPAD was derived by calculating the proportional weight from layer 1 in layer 2, normalized to 100%. Of the 15 candidate trait scores, 11 traits contributed to GPSPAD with mixing weights in Supplemental Table 10. GPSPAD was then calculated in the validation UK Biobank dataset. See Detailed Methods for details on GPSPAD construction.
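One reading of the weight derivation above is that each ancestry-specific score's final share is its layer 1 weight multiplied by the layer 2 weight of its trait, rescaled so that all shares sum to 100%. The sketch below implements that reading only; it is an assumption about the calculation, not the authors' code. The function name, the use of absolute values for the shares, and every weight shown are hypothetical.

```python
def final_mixing_weights(layer1, layer2):
    """layer1: {trait: {ancestry: weight}} from the per-trait LASSO models.
    layer2: {trait: weight} from the multi-trait LASSO model.
    Returns {(trait, ancestry): percent share}, summing to 100."""
    raw = {
        (trait, ancestry): w1 * layer2.get(trait, 0.0)
        for trait, by_ancestry in layer1.items()
        for ancestry, w1 in by_ancestry.items()
    }
    total = sum(abs(w) for w in raw.values())
    if total == 0:
        raise ValueError("all combined weights are zero")
    return {key: 100.0 * abs(w) / total for key, w in raw.items()}

# Hypothetical weights; a trait dropped by LASSO in layer 2 has weight 0.
layer1 = {"PAD": {"EUR": 0.8, "AFR": 0.2}, "CAD": {"EUR": 1.0}, "HDL-C": {"EUR": 0.5}}
layer2 = {"PAD": 0.5, "CAD": 0.25, "HDL-C": 0.0}
weights = final_mixing_weights(layer1, layer2)
```

With these toy inputs the PAD European score carries about 53% of the total, and the trait removed in layer 2 contributes nothing, mirroring how 11 of the 15 candidate traits were retained.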
GPSPAD validation and benchmarking
GPSPAD was calculated in the MGBB and AoU for external validation. GPSPAD performance was compared with previously published PRS in the Polygenic Score Catalog30 and a recently reported multi-ancestry polygenic score for CAD (GPSCAD).26 We previously developed GPSCAD utilizing a similar two-layer framework for CAD as in the present study with slight modifications including stepwise regression for feature selection.26 From the Polygenic Score Catalog, PRSPLR is a single-trait score for PAD developed using LASSO regression and individual-level European data from the UK Biobank.15 PRSLDpred2 is another single-trait score for PAD trained using UK Biobank data, but was calculated with LDpred2-auto.15
We also calculated the lifetime risk of PAD in individuals in the UK Biobank based on clinical variables using the Johns Hopkins University PAD risk tool (http://ckdpcrisk.org/padrisk, Detailed Methods).31
Statistical analysis
The unadjusted rate of PAD was calculated across percentiles of GPSPAD. Model calibration was assessed with the Hosmer–Lemeshow test (R ResourceSelection v0.3-6) and by comparing the observed and predicted prevalence across percentiles calculated using a logistic regression model with only GPSPAD as a predictor. We estimated PAD risk in the extremes of the GPSPAD distribution using logistic regression models. We also derived the proportion of the population with a given magnitude of risk by calculating the odds ratio (OR) of varying extremes of GPSPAD percentiles compared to the middle quintile group (40-59%).
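The comparison of a score extreme with the middle quintile can be illustrated with an unadjusted 2x2 odds ratio. This is a simplified sketch: the study used logistic regression with covariate adjustment, whereas the example below uses the Woolf (log) method on raw counts. Function names and all counts are hypothetical.

```python
import math

def percentile_group(scores, outcomes, lower, upper):
    """Outcomes (0/1) of individuals whose score rank falls in the
    [lower, upper) fraction of the cohort, e.g. 0.40-0.60 for the middle quintile."""
    ranked = [o for _, o in sorted(zip(scores, outcomes), key=lambda pair: pair[0])]
    n = len(ranked)
    return ranked[round(n * lower):round(n * upper)]

def odds_ratio_with_ci(group, reference, z=1.96):
    """Unadjusted odds ratio of group vs reference with a Woolf 95% CI."""
    a, c = sum(group), sum(reference)          # cases
    b, d = len(group) - a, len(reference) - c  # controls
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Group extraction on a toy cohort of ten individuals.
toy_scores = list(range(10))
toy_outcomes = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1]
middle = percentile_group(toy_scores, toy_outcomes, 0.40, 0.60)

# Odds ratio on hypothetical counts: 20/100 cases in the extreme, 10/100 in the middle.
or_, ci_low, ci_high = odds_ratio_with_ci([1] * 20 + [0] * 80, [1] * 10 + [0] * 90)
```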
The association of GPSPAD and other scores with all PAD (incident and prevalent cases) was assessed using logistic regression. Performance metrics included OR, area under the receiver operating characteristic curve (AUC, R pROC v1.17.0.1), and phenotypic variance explained (Nagelkerke-R2). Incident event analyses were performed using Cox proportional hazards models with metrics including hazard ratio (HR) and C-statistic (R survival v3.5-7). MALE analyses were restricted to individuals with prevalent PAD diagnoses. Time-to-MALE curves were estimated using the Kaplan-Meier method, standardized to mean age and gender (R survminer v0.4.9). Linear regression was used to evaluate the relationship between polygenic scores and minimum ABI.
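To make the C-statistic concrete, the sketch below computes Harrell's concordance index for right-censored data and the change in C between two sets of predictions. It is a didactic simplification and not the implementation in the R survival package, which handles tied event times and weighting more carefully. All data values and names are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C: among comparable pairs (the earlier time is an observed
    event), the fraction in which the earlier-failing individual has the
    higher predicted risk. Tied risk scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy follow-up data: time in years, event indicator (1 = incident PAD, 0 = censored).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
uninformative_risk = [0.5, 0.5, 0.5, 0.5]   # a model that cannot rank anyone
informative_risk = [0.9, 0.7, 0.5, 0.1]     # a model that ranks perfectly

delta_c = harrell_c(times, events, informative_risk) - harrell_c(times, events, uninformative_risk)
```

The difference `delta_c` plays the role of the ΔC-statistic: the gain in discrimination when a predictor is added to a comparison model.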
Logistic, linear, and Cox models including polygenic scores were adjusted for age, sex, genotyping array, and the first ten PCs. Cox models were used to estimate the 10-year incidence of PAD across lifestyle and genetic risk groups, standardized to the mean of covariates in each group. In the UK Biobank, we performed sensitivity analyses to test the strength of GPSPAD associations with PAD and incident MALE after additionally accounting for clinical variables (current smoking, hypertension, diabetes, hyperlipidemia, and chronic kidney disease); and continuous variables for individuals with available physical and laboratory data (SBP, DBP, hemoglobin A1c, HDL-C, estimated untreated LDL-C32, GFR, and anti-hypertensive use).
GPSPAD and other continuous variables were scaled to a mean of zero and a standard deviation (SD) of one such that effect sizes indicate OR or HR per SD. Statistical significance was defined as P<0.05 or a 95% confidence interval (CI) that excluded the null value. Statistical analyses were performed using R v4.1.0.
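The scaling step and the correspondence between the two significance criteria can be sketched as follows. The example assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for ORs and HRs from regression models but is an assumption here. Function names are hypothetical, and the P-values recovered this way are approximate because the published estimates are rounded.

```python
import math
from statistics import NormalDist, mean, stdev

def standardize(values):
    """Scale values to mean 0 and sample SD 1, so a regression coefficient
    is expressed per SD of the original variable."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def wald_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided P-value for a ratio estimate (OR or HR),
    recovering the standard error from its 95% CI on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

scaled = standardize([1.0, 2.0, 3.0])

# A CI that excludes 1 corresponds to P<0.05 ...
p_excludes_null = wald_p_from_ci(1.75, 1.18, 2.57)
# ... and a CI that contains 1 corresponds to P>0.05.
p_includes_null = wald_p_from_ci(1.01, 0.94, 1.08)
```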
Results
Given that complex diseases such as PAD share genetic backgrounds with related traits, we leveraged this genetic correlation to more fully capture human genetic architecture and construct GPSPAD.6,33,34 GPSPAD included 603,595 variants across 11 traits and 5 ancestries, including PAD, smoking, CAD, ischemic stroke, diabetes, SBP, LDL-C, glomerular filtration rate, BMI, carotid IMT, and total cholesterol. While GPSPAD was predominantly derived from European discovery data, the score incorporated genetic variation discovered in African, East Asian, South Asian, and Latino populations (Supplemental Table 10). Of the non-European populations, African GWAS discovery data contributed the most to GPSPAD.
Association of GPSPAD with PAD risk
Within the UK Biobank, GPSPAD was associated with an OR of 1.77 (95% CI 1.70-1.86) for PAD in the European training sample (Supplemental Table 12). The effect size was mildly attenuated in the multiethnic validation cohort of 304,294 individuals, but GPSPAD remained strongly associated with PAD (OR 1.63; 95% CI 1.60-1.68). The holdout validation cohort included 164,108 females (53.9%) and comprised 286,356 participants of European (94%), 7,680 of South Asian (2.5%), 6,939 of African (2.3%), 1,761 of East Asian (0.6%), and 1,558 of MENA (0.51%) ancestry. The association was attenuated but remained significant after adjustment for clinical risk factors (OR 1.35; 95% CI 1.31-1.39) and in a model adjusted for continuous covariates (OR 1.37; 95% CI 1.33-1.41).
We found significant differences in PAD rates across the GPSPAD percentile distribution, ranging from 0.78% in the bottom percentile to 7.91% in the top percentile (Figure 2A, Supplemental Table 13). Predicted PAD prevalence was overall consistent with observed prevalence, except above the 98th percentile, where GPSPAD alone slightly underestimated risk owing to a sharp increase in the observed PAD prevalence (Supplemental Figure 1). GPSPAD stratified individuals into low- and high-risk groups, with up to 4.71-fold increased PAD risk in the top 1% of GPSPAD (OR 4.71; 95% CI 4.07-5.42) compared to the middle quintile (Figure 2B).
Compared to the previously published PRSPLR, GPSPAD showed an improved effect estimate, discrimination, and proportion of phenotypic variance explained. In a model adjusted for age, sex, genotyping array, and the first ten PCs of ancestry, GPSPAD had an OR of 1.63, AUC of 0.74, and Nagelkerke-R2 of 0.087, while PRSPLR had an OR of 1.20, AUC of 0.71, and Nagelkerke-R2 of 0.064. PRSLDpred2 had a larger effect estimate in the UK Biobank than GPSPAD (OR 2.46) and nearly equivalent discriminative ability (AUC 0.75); however, its Nagelkerke-R2 and effect sizes in the extremes of the PRSLDpred2 percentile distribution strongly suggested overfitting. For example, the PRSLDpred2 Nagelkerke-R2 was 0.17 in the UK Biobank validation cohort and the OR for the top 1% of PRSLDpred2 was 71.9 (95% CI 64.9-79.7).
GPSPAD identified individuals with distinct risk compared to an integrated polygenic score for CAD (GPSCAD).26 In the UK Biobank, GPSPAD was more strongly associated with PAD (OR 1.63; 95% CI 1.60-1.68) than GPSCAD (OR 1.48; 95% CI 1.44-1.52) and identified a greater number of high-risk individuals (Supplemental Table 13). For example, the proportion of the UK Biobank population at 3-fold and 4-fold greater odds for PAD was 5.7% (N=17,345) and 1.90% (N=5,782) when classified by GPSPAD, respectively. There was a comparatively small fraction of individuals at similar PAD risk according to GPSCAD with only 1.90% (N=5,782) at 3-fold greater risk and 0.20% (N=609) at 4-fold greater risk.
We then compared polygenic score performance in ancestral subgroups in the UK Biobank (Figure 2C). GPSPAD predicted disease risk in the African, European, East Asian, MENA, and South Asian subgroups with effect estimates of 1.26, 1.66, 1.60, 1.64, and 1.25, respectively. While PRSLDpred2 and PRSPLR predicted PAD risk in European individuals, neither was associated with PAD in non-European subgroups (Supplemental Table 14). In clinical subgroups, GPSPAD performance was fairly stable, with ORs ranging from 1.33 to 1.72 depending on demographics and the presence of risk factors (Supplemental Figure 2).
Validation of GPSPAD in distinct external cohorts
GPSPAD was externally validated in two independent multi-ancestry cohorts and showed improved cross-population predictive performance relative to previously published scores. GPSPAD was significantly associated with an increased risk for PAD (OR 1.26; 95% CI 1.18-1.37, Figure 3A) in the MGBB, which included 5.3% African (N=1,945), 87% European (N=32,139), and 7.9% Latino individuals (N=2,933). GPSPAD performed similarly in AoU (OR 1.21; 95% CI 1.19-1.23), which comprised 22% African (N=51,691), 2.2% East Asian (N=5,268), 50% European (N=119,167), 17% Latino (N=39,576), 0.21% MENA (N=509), and 0.97% South Asian individuals (N=2,289). Of note, while PRSLDpred2 outperformed GPSPAD in the UK Biobank where it was derived, PRSLDpred2 was not significantly associated with PAD risk in the MGBB (OR 1.02; 95% CI 0.95-1.08; P=0.6) and was more weakly associated with PAD than GPSPAD in AoU (OR 1.10; 95% CI 1.08-1.12). GPSPAD consistently had stronger effect sizes than PRSPLR in each external cohort (Figure 3A).
When stratifying by ancestry, GPSPAD demonstrated improved transferability in most subgroups (Supplemental Table 14). Among African (N=51,691), European (N=119,167), Latino (N=39,576), and South Asian (N=2,289) individuals in AoU, GPSPAD was the most strongly predictive of PAD compared to previously published scores (Figure 3B). GPSPAD was also transferable to Europeans in the MGBB with a larger effect size (OR 1.26; 95% CI 1.17-1.34) than PRSPLR (OR 1.09; 95% CI 1.02-1.16) and PRSLDpred2, which was not associated with disease risk (OR 1.01; 95% CI 0.94-1.08). No polygenic scores were associated with PAD in AoU East Asians, AoU MENA, and non-European subgroups in the MGBB (Supplemental Figure 3).
Modeling incident PAD according to polygenic risk and clinical risk factors
We then analyzed 301,932 participants of the UK Biobank who were free of PAD at study enrollment. Over a median follow-up of 12.1 years (interquartile range (IQR) 10.6-13.5 years), incident PAD was observed in 4,202 individuals (1.39%). Within the UK Biobank, the baseline model of age, sex, and ten PCs of ancestry had a C-statistic of 0.731 (Figure 4A). The addition of GPSPAD improved discrimination to 0.761, which was greater than the improvement from most individual risk factors added in isolation and was approximately equivalent to the additive benefit of diabetes (C-statistic 0.760) and smoking (C-statistic 0.765). The greatest improvement in discrimination was observed with the combination of the baseline model, clinical variables, and GPSPAD (ΔC-statistic 0.087).
Compared to the individual performance of other scores, GPSPAD resulted in greater benefit in discrimination of incident PAD than PRSPLR (C-statistic 0.735) and the lifetime PAD risk score31 (C-statistic 0.675). PRSLDpred2 achieved a larger improvement in C-statistic (C-statistic 0.774) from the baseline model compared to GPSPAD, but the HR was drastically large in the UK Biobank where it was wholly derived, and consistent with overfitting (HR 199; 95% CI 172-231), as observed in cross-sectional analyses.
In the baseline model, the GPSPAD HR for incident PAD was 1.66 (95% CI 1.61-1.71), and it was 1.37 (95% CI 1.30-1.46) in a multivariable model adjusted for clinical risk factors. When examining the interaction between GPSPAD and clinical variables, we observed a weak interaction with levels of LDL-C (GPSPAD*LDL-C HR 1.05; 95% CI 1.01-1.08; Pinteraction=0.005) and hemoglobin A1c (GPSPAD*hemoglobin A1c HR 1.02; 95% CI 1.00-1.03; Pinteraction=0.01). Lifestyle factors influenced PAD risk across each stratum of genetic risk (Supplemental Table 15). For the high genetic risk group, the standardized 10-year incidence of PAD was 3.17% for smokers and decreased to 1.05% for those who refrained from smoking (Figure 4A). Dietary patterns showed a similar offset of genetic risk by modifiable lifestyle factors (Figure 4B).
Integration of polygenic risk with clinical guidelines
We then assessed whether GPSPAD could identify individuals at equivalent risk for incident disease as the high-risk groups for whom current guidelines recommend PAD screening.8 We first quantified the 10-year incidence of PAD among subgroups of individuals proposed to be candidates for ABI screening according to the AHA/ACC Guidelines.8 Individuals above 65y had a 10-year incidence of 2.28% (95% CI 2.16-2.41), individuals aged 50-64y with one risk factor for atherosclerosis had an incidence of 1.23% (95% CI 1.17-1.30), individuals <50y with diabetes and 1 additional risk factor had an incidence of 2.90% (95% CI 1.62-3.21), and those with known atherosclerosis in another vascular bed had the greatest incidence of 5.95% (95% CI 5.48-6.42). As a comparison, incidence rates ranged from 2.81% to 4.40% in the top 1-5% of GPSPAD compared to the middle quintile of the distribution (Supplemental Table 16). We also examined age-based combinations of each individual risk factor and compared event rates to subgroups with the combination of risk factors and high polygenic risk.
This analysis revealed that the combination of high GPSPAD with one atherosclerosis risk factor identified more individuals who subsequently developed PAD than many of the subgroups defined by the combination of similar risk factors with age (Figure 5A). For example, 7.96% of individuals with diabetes in the top 5% of GPSPAD developed PAD, as compared to an incidence of 4.02% among individuals aged 50-64y with diabetes.
Similarly, there was a relative increase in PAD cases among individuals in the top 5% GPSPAD with dyslipidemia (5.63%; 95% CI 4.88-6.36) compared to individuals 50-64y with dyslipidemia (1.97%; 95% CI 1.80-2.15). The addition of the top 5% GPSPAD also increased the proportion of additional cases of PAD when combined with other atherosclerotic conditions such as CAD and carotid stenosis compared to consideration of the conditions alone (Supplemental Table 16).
Predicting ankle-brachial indices based on polygenic risk
In addition to predicting binary PAD presence, we evaluated whether GPSPAD was associated with minimum ABI for a subset of individuals in the MGBB (N=883, Supplemental Figure 4). The minimum ABI was first used to validate the PAD phenotype in the MGBB. PAD cases indeed had a lower mean ABI compared to controls (0.78 vs. 1.10; P<2×10^-16, Supplemental Figure 5). After adjusting for age, sex, and the first ten PCs, GPSPAD was significantly associated with lower ABI (Figure 5B), while previously published scores were not (all P>0.1, Supplemental Table 17). Each SD increase in GPSPAD was associated with a 0.088 reduction in ABI (β -0.088; 95% CI -0.161 to -0.015; P=0.019). Based on the percentile distribution in the MGBB, the group in the top 8% of GPSPAD had a mean ABI <0.90, which is diagnostic for PAD (95% CI 0.83-0.97, Figure 5C).35
Association of GPSPAD with incident major adverse limb events
Lastly, we assessed whether polygenic background could identify individuals at increased risk for MALE after PAD diagnosis. In external validation cohorts, GPSPAD was significantly associated with incident MALE and provided better risk stratification than previously published scores. Individuals with high polygenic risk (top 20% of GPSPAD) in the UK Biobank had a 75% higher relative risk of incident MALE (HR 1.75; 95% CI 1.18-2.57; P=0.005) over a median of 12.9 years (IQR 11.7-14.1) compared to the remainder of the PAD population. The top 20% of GPSPAD also had an increased risk of MALE in the MGBB (HR 1.56; 95% CI 1.06-2.30; P=0.02) during 5.19 years of follow-up (IQR 2.63-7.75) and in the AoU cohort (HR 1.51; 95% CI 1.03-2.24; P=0.03) over 2.32 years of follow-up (IQR 0.35-4.29, Figure 6A).
In the overall PAD cohort in the UK Biobank, GPSPAD was significantly associated with MALE (HR 1.48; 95% CI 1.24-1.77; Figure 6B), which was diminished but remained associated after multivariable adjustment (HR 1.30; 95% CI 1.07-1.59; P=0.009).
Associations of GPSPAD with incident MALE remained significant in the prevalent PAD cohorts of the MGBB (HR 1.34; 95% CI 1.12-1.60) and AoU (HR 1.33; 95% CI 1.11-1.58; Figure 6B), while other scores demonstrated weaker or non-significant risk estimates.
Discussion
Using ancestry-stratified GWAS results from multiple populations, we developed a polygenic score that aggregates the effects of variants in PAD and related traits across five ancestries. GPSPAD shows strong disease prediction and is transferable to external cohorts with ancestrally diverse populations, demonstrating associations with lower ABI and greater effect sizes for PAD association in non-European populations.
This new score also identifies more high-risk individuals than a polygenic score for CAD, highlighting the differences in genetic risk for these atherosclerotic conditions.28 Moreover, among individuals with diagnosed PAD, high GPSPAD predicts future risk of MALE, defined as a surrogate of major amputation and acute limb ischemia.
Prior polygenic scores for PAD were solely derived from European data or employed GWAS data from a single population and were not validated in external cohorts.14-16 Our computational framework considered variants discovered from >2 million ancestrally diverse individuals and leveraged the pleiotropy between PAD and related risk factors to enhance genetic risk prediction.26 Such approaches that incorporate genome-wide genetic correlation increase the power of polygenic association analyses and improve prediction accuracy.36,37 GPSPAD was indeed the most strongly predictive of PAD risk in external validation cohorts. Unlike previously published polygenic scores, GPSPAD was able to predict PAD risk in Latino individuals in All of Us and both African and South Asian individuals in two distinct cohorts (All of Us and the UK Biobank).
In prospective analysis in the UK Biobank, GPSPAD was associated with incident PAD with similar effect sizes as cross-sectional analysis. The inclusion of GPSPAD resulted in a comparable benefit in discriminative capacity as those afforded by smoking and diabetes.3 We also note a significant interaction between GPSPAD and LDL-C or hemoglobin A1c in models predicting incident PAD. This suggests that individuals with higher GPSPAD may have a greater benefit from dietary changes, statin therapy, or glycemic control. Indeed, post-hoc analyses of statin therapy trials demonstrate that statins confer a greater cardiovascular risk reduction in those with high genetic risk compared to the general population.38
Current European guidelines more broadly recommend consideration of screening based on age and cardiovascular risk factors, while American cardiovascular society guidelines recommend screening for PAD with ABI in four groups: age>65 years, age 50-64 years with atherosclerotic risk factors, age <50 years with diabetes and 1 risk factor, and history of atherosclerosis in other vascular beds.8,9 Addition of the GPSPAD has the potential to further streamline these screening recommendations and identify higher risk populations. When evaluating the combination of polygenic risk and clinical risk assessment, incorporation of the top 5% polygenic risk with conventional risk factors identified more individuals who went on to develop PAD compared to the combination of risk factors with age. These results suggest the top 5% genetic risk and presence of risk factors could be considered a criteria for screening with ABIs to establish PAD diagnosis. Indeed, when leveraging ABIs, our analysis revealed an ABI diagnostic of PAD associated with high GPSPAD in the MGBB, although replication in other cohorts is needed.
Secondary prevention efforts in PAD focus on limb salvage. MALE are associated with poor prognosis for PAD patients and represent index events for subsequent hospitalizations, reinterventions, major adverse cardiac events, and even mortality.39 Once acute limb ischemia occurs, numerous returns to the operating room for revascularization with combined exposure to anticoagulation and thrombolytic therapy and subsequent reperfusion is a major physiologic stress in an already comorbid patient population. Staging systems such as the Wound, Ischemia, and foot Infection are used to determine amputation risk in the CLTI population, but there are no current vetted clinical tools for predicting MALE early in the disease course.3 Our analysis found that GPSPAD stratified MALE risk specifically in the PAD population and remained consistently predictive of incident MALE in the UK Biobank, Mass General Brigham Biobank, and All of Us.
These findings have several implications. First, an individual’s genetic risk in combination with their cardiovascular risk factor profile may be employed as a criterion for consideration of screening with non-invasive vascular testing. PAD is an underdiagnosed and undertreated condition with guideline-directed therapies that lag those for CAD.8 Polygenic risk may be used as an adjunct to clinical risk assessment to identify a greater proportion of high-risk individuals for medical optimization. In addition, as with studies of polygenic risk and CAD38,40-42, there may be early benefits for targeted intensive lifestyle and risk factor optimization in individuals at high genetic risk of PAD. However, prospective studies are needed to further test the effect of polygenic scores on hard cardiovascular outcomes, particularly among younger age groups. Given that high GPSPAD was strongly associated with MALE in three distinct PAD cohorts, there is promise for incorporating polygenic scores into secondary prevention by vascular practitioners, for example in setting the frequency of surveillance or in guiding more aggressive or prolonged antithrombotic therapies shown to reduce MALE incidence.11,39
Our study has several limitations. First, GPSPAD was trained using data from largely European individuals due to limited sample sizes for other ancestries. Nevertheless, we demonstrated evidence of portability to non-European groups, which was not achieved by prior PAD polygenic scores. Second, this study was restricted to individuals with genetic similarity to single ancestries and included fewer cases in non-European groups, which could have limited power to detect associations, especially among the East Asian and MENA groups. Importantly, race alone was not used to infer ancestry; however, such broad ancestral categorizations can mask heterogeneity in polygenic score performance in some groups. Tailoring of polygenic scores to non-European target data, continued recruitment of underrepresented populations in biobanks, and development of methods to account for admixture should be priorities to improve polygenic score performance. Lastly, PAD and MALE phenotyping were based on diagnosis and procedure codes from the EHR. Variation in the quality of these definitions between cohorts could have biased analyses towards null associations. However, we used non-invasive studies in the MGBB as another surrogate for PAD and observed similar associations of polygenic risk with binary PAD classifications in both the MGBB and AoU, the latter of which comprises several U.S. healthcare institutions.
Ethics Approval
Individuals in the UK Biobank, All of Us, and Mass General Brigham Biobank provided signed consent for genetic sequencing, storage of biological specimens, and access to electronic health record data. This research was conducted using the UK Biobank resource under application number 7089. Secondary analyses of the UK Biobank, MGBB, and AoU were approved by the Massachusetts General Hospital Institutional Review Board.
Data Availability
Individual-level data from the UK Biobank and All of Us are available to researchers upon request to each organization. This study used Controlled Tier data from All of Us, which are available to authorized users on the All of Us Researcher Workbench. The GPSPAD constructed in this paper will be made available in the Polygenic Score Catalog following publication. Polygenic scores used for comparison of GPSPAD performance are available in the Polygenic Score Catalog under accession IDs PGS001843 and PGS002055.
Funding
This work was supported by the Harvard Medical School LaDue Fellowship in Cardiovascular Medicine (to A.M.F.) and the National Institutes of Health (grants R01HL127564 and U01HG011719 to P.N. and K08HL168238 to A.P.P.).
Disclosure of Interest
P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech / Roche, and Novartis, personal fees from Allelica, Apple, AstraZeneca, Blackstone Life Sciences, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Foresite Labs, Genentech / Roche, GV, HeartFlow, Magnet Biomedicine, Merck, and Novartis, scientific advisory board membership of Esperion Therapeutics, Preciseli, TenSixteen Bio, and Tourmaline Bio, scientific co-founder of TenSixteen Bio, equity in MyOme, Preciseli, and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. All other authors have no relevant disclosures.
Acknowledgments
We thank the participants and investigators in the UK Biobank, All of Us, and Mass General Brigham Biobank for their contributions to this study.
Footnotes
# These authors jointly supervised the work
Abbreviations
- AoU: All of Us
- ABI: Ankle-brachial index
- BMI: Body mass index
- CAD: Coronary artery disease
- CPT: Current Procedural Terminology
- DBP: Diastolic blood pressure
- EHR: Electronic health record
- GWAS: Genome-wide association study
- GFR: Glomerular filtration rate
- HDL-C: High-density lipoprotein cholesterol
- ICD: International Classification of Diseases
- LASSO: Least Absolute Shrinkage and Selection Operator
- LDL-C: Low-density lipoprotein cholesterol
- MALE: Major adverse limb events
- MGBB: Mass General Brigham Biobank
- OPCS: Office of Population Censuses and Surveys
- PAD: Peripheral artery disease
- PRS: Polygenic risk scores
- PCs: Principal components
- RPDR: Research Patient Data Registry
- SBP: Systolic blood pressure
- USPSTF: United States Preventive Services Task Force