Abstract
To evaluate how age and Apolipoprotein E-ε4 (APOE4) status interact with the APOE-independent polygenic risk score (PRSnon-APOE), we estimated PRSnon-APOE in superagers (age ≥ 90 years, N=346), 89- controls (age 60-89, N=2,930) and Alzheimer’s Disease (AD) cases (N=1,760). Using superagers as controls, we see an odds ratio (OR) for AD of nearly five when comparing the top PRSnon-APOE decile to the lowest decile (OR=4.82, P=2.5×10⁻⁶), which is twice the OR obtained using 89- controls (OR=2.38, P=4.6×10⁻⁹). Thus PRSnon-APOE is correlated with age, which in turn is associated with APOE. Further exploring these relationships, we find that PRSnon-APOE modifies age-at-onset among APOE4 carriers, but not among non-carriers. More specifically, PRSnon-APOE in the top decile predicts an age-at-onset five years earlier compared to the lowest decile (70.1 vs 75.0 years; t-test P=2.4×10⁻⁵) among APOE4 carriers. This disproportionately large PRSnon-APOE among younger APOE4-positive cases is reflected in a significant statistical interaction between APOE4 status and age-at-onset (β=-0.02, P=4.8×10⁻³) as a predictor of PRSnon-APOE. Thus, the known AD risk variants are particularly detrimental in young APOE4 carriers.
Disclosure Statement
AMG has consulted for Eisai, Biogen, Pfizer, AbbVie, Cognition Therapeutics and GSK. She also served on the SAB at Denali Therapeutics from 2015-2018. YFH co-owns stock and stock options of Regeneron Pharmaceuticals. All other authors have no interests to declare.
1. Introduction
Alzheimer’s disease (AD) has an estimated heritability of 59-79% (Gatz et al., 2006) and a decades long pre-clinical stage, during which AD-specific pathological changes develop without clinical signs of cognitive decline (Bateman et al., 2012; Jack et al., 2010). Low-cost non-invasive testing would ideally identify at-risk individuals in the pre-clinical stage to enable preventive intervention. To this end, genetic risk factors have been proposed as useful predictors. APOE is a major risk gene for AD with dose-dependent risk effects. Compared to non-carriers of the APOE-ε4 allele (APOE4), heterozygous carriers of APOE4 have a three- to fourfold and homozygous carriers up to a 14-fold increased odds of developing AD (Farrer et al., 1997; Genin et al., 2011), whereas the APOE-ε2 allele (APOE2) is protective (odds ratio, OR=0.6) (Bertram et al., 2008; Bickeböller et al., 1997; Corder et al., 1994; Genin et al., 2011; Kunkle et al., 2019a). The effects of APOE variants have been found to be age-dependent (Rasmussen et al., 2018). In addition to the APOE gene, genome-wide association studies (GWAS) and meta-analysis of GWAS have identified 39 risk loci in subjects with European ancestry (Jansen et al., 2019; Kunkle et al., 2019b; Lambert et al., 2013b; Marioni et al., 2018). Desikan et al. constructed a polygenic hazard score (PHS) by combining AD-associated single nucleotide polymorphisms (SNPs) from GWAS from the International Genomics of Alzheimer’s Project (IGAP) and disease incidence estimates from the United States population (Desikan et al., 2017). This PHS was reported to be predictive for age-at-onset (AAO) and useful to identify individuals at greatest risk for developing AD at a given age. However, the interactions between APOE-independent genetic risks, age and APOE have not been evaluated. 
Similar to PHS, individual genetic risk can be aggregated into a polygenic risk score (PRS) by summing the number of risk alleles, each weighted by its effect size derived from independent GWAS summary statistics (Euesden et al., 2015; Khera et al., 2018; Purcell et al., 2009). Both PRS and PHS have been applied to AD and showed similar results (Leonenko et al., 2019) although a conflicting study (Fan et al., 2020) reported that PHS is superior for predicting age at onset and other age-related phenotypes, particularly when stratified by sex. PRS, rather than PHS, is broadly used and may be considered the current state of the art for genetic risk prediction. In addition, because PRS quantifies genetic risk in individuals independently of age, PRS can also be used to explore the interactions between genetics and other causes of disease, including age. PRS has been reported to be useful to identify individuals at high risk for common diseases such as coronary artery disease (Khera et al., 2018). For AD, the predictive power of PRS derived from the GWAS meta-analysis (Lambert et al., 2013a) reached an area under the curve (AUC) of 0.78 in clinically-defined AD when genetic risk factors were combined with demographic factors, and 0.84 in pathologically-confirmed AD (Escott-Price et al., 2017; Escott-Price et al., 2015).
The predictive performance of PRS was shown to depend on the prevalence of the disease in a specific population (Gibson et al., 2019), which is meaningful for AD, given that prevalence of AD increases exponentially with age (Mayeux and Stern, 2012). Furthermore, it is important to note that APOE is not only a major constituent for predicting AD (Desikan et al., 2017; Escott-Price et al., 2015), but is also strongly associated with age itself (Deelen et al., 2019; Sebastiani et al., 2019) as well as AAO of AD (Olarte et al., 2006). Therefore, it is important to investigate whether and how APOE-independent PRS (PRSnon-APOE) depends on age and APOE, both for the interpretation of AD PRS and to understand how genetic risk is distributed in case-control AD genetics studies.
To address this question, we stratified healthy controls into superagers aged 90 years or older without dementia (90+; range 90-109 years) and subjects aged 89 or younger (89-; range 60-89 years). To understand the role of age with regard to PRSnon-APOE, we compared these two control cohorts to AD cases with a broad range of AAO (range 60-99 years). We further investigated whether PRSnon-APOE is correlated with AAO in cases and age of controls and whether the APOE4 risk allele would modify such a correlation.
2. Methods
2.1 Participants
To ensure that there is no overlap with the samples included in the 2013 IGAP meta-analysis (Lambert et al., 2013b), we selected 13 independent cohorts consisting of a total of 1,733 cases and 3,121 controls from the Alzheimer’s Disease Genetics Consortium (ADGC) (Supplementary Methods). Additionally, we included an independent sample of 182 subjects with Ashkenazi ancestry from the Litwin Zucker Alzheimer’s Research Center at the Feinstein Institutes for Medical Research (LZ). The sample ascertainment and assessment methods were described previously (Adelson et al., 2019; Freudenberg-Hua et al., 2016; Koppel et al., 2018). A visual overview of data preparation and analysis workflow is in Supplementary Figure 1.
Age was defined using ADGC criteria. For ADGC cases, age is defined as AAO as reported by the study (n=1,574). If AAO is not reported, we estimated AAO by subtracting ten years from age at death (AAD, n=102) and for those without AAD, age at last examination (AAE) is used (n=57). AAO estimation has been used in previous studies using the same cohorts (Desikan et al., 2017; Jun et al., 2010; Naj et al., 2014). To ensure that estimated AAO does not introduce biases, we further performed sensitivity analysis by removing individuals with estimated AAO from the dataset. For the ADGC controls, AAE is used (n=2,995) and AAD is used for a small proportion of controls (n=126). For the LZ cohort, age is reported as age at enrollment for cases (n=27), and as AAE for controls (89-: n=35, 90+: n=120). The total sample utilized in this study includes 1,760 AD cases (60-99 years), 2,930 89- controls, and 346 90+ controls. Sample and genotype data quality control procedures and principal component analysis are described in Supplementary Methods. The demographics of the sample set following merging and QC are shown in Supplementary Table 1. The distribution of APOE genotypes among cases and controls is described in Supplementary Table 2.
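The fallback order used to assign an age to ADGC cases can be written as a small function. This is only an illustrative sketch: the function name and argument names are hypothetical, and missing values are assumed to be represented as `None`.

```python
from typing import Optional


def case_age(aao: Optional[float],
             aad: Optional[float],
             aae: Optional[float]) -> Optional[float]:
    """Age assigned to an AD case, following the fallback order in the text."""
    if aao is not None:
        return aao            # reported age-at-onset is used directly
    if aad is not None:
        return aad - 10.0     # estimated AAO: age at death minus ten years
    return aae                # otherwise age at last examination (may be None)


# Toy inputs (not study data)
reported = case_age(72.0, 85.0, 80.0)    # AAO reported
from_death = case_age(None, 85.0, None)  # AAO estimated from AAD
from_exam = case_age(None, None, 80.0)   # falls back to AAE
```

Cases that take the second or third branch are the ones removed in the sensitivity analysis.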
2.2 Polygenic Risk Score Estimation
A PRS is a weighted sum of risk-allele counts across genetic variants, with each variant weighted by its effect size for a given trait. The effect sizes are estimated from independent GWAS summary statistics. PRSnon-APOE is calculated at various P-value thresholds (PT) based on GWAS summary statistics (Lambert et al., 2013c) excluding the APOE region (Supplementary Methods). We separately optimized PRSnon-APOE using two control sets – ADGC controls without age restriction and 90+ controls (Supplementary Figure 2) – which resulted in an identical optimal threshold for the most predictive PRS, PT = 1×10⁻⁵. Details of PRS calculation are in Supplementary Methods.
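As a minimal sketch of the weighted sum described above, the snippet below computes a thresholded score for one individual and Z-standardizes a list of scores. It is not the pipeline used in the study: the variant IDs, effect sizes and dosages are invented, the inclusive comparison `p_value <= p_threshold` is an assumption, and variant clumping and exclusion of the APOE region (see Supplementary Methods) are assumed to have been done beforehand.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical summary statistics: variant ID -> (log odds ratio, GWAS P-value)
SumStats = Dict[str, Tuple[float, float]]


def polygenic_risk_score(dosages: Dict[str, float],
                         sumstats: SumStats,
                         p_threshold: float = 1e-5) -> float:
    """Sum of risk-allele dosages weighted by effect size, for variants passing PT."""
    score = 0.0
    for variant, (beta, p_value) in sumstats.items():
        if p_value <= p_threshold and variant in dosages:
            score += beta * dosages[variant]
    return score


def z_standardize(scores: List[float]) -> List[float]:
    """Center scores on the sample mean and scale by the sample SD."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (len(scores) - 1))
    return [(s - mean) / sd for s in scores]


# Toy data (not study data): rs_c fails the threshold and is ignored
toy_sumstats: SumStats = {"rs_a": (0.20, 1e-8), "rs_b": (-0.10, 1e-6), "rs_c": (0.50, 1e-2)}
toy_dosages = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 2.0}
toy_prs = polygenic_risk_score(toy_dosages, toy_sumstats)  # 0.20*2 + (-0.10)*1
```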
2.3 Statistical Analysis
Statistical analysis was performed using the R software version 3.6.3 (R Core Team, 2019). Linear or logistic regression models included covariates for sex and the top ten principal components (PCs), as well as the variables of interest. All plots and statistical tests (except for those in Figure 1a) used Z-standardized PRS. To facilitate comparison with previous findings on polygenic risks and survival, we plotted a Kaplan-Meier survival curve with the primary event being AAO of AD and we calculated hazard ratio using the R packages “survival” and “survminer”.
Figure 1. (a) Violin plots comparing PRSnon-APOE in AD cases (case), controls age 60-89 years (89-), and superager controls age ≥ 90 years (90+). Pairwise comparison Bonferroni P-values: *** highly significant (P < 0.001).
(b) PRS effects on AD risk. Continuous OR (blue): odds ratio of having AD for a one SD increase in PRSnon-APOE. Extremes OR (orange): odds ratio of having AD with PRSnon-APOE in the top decile compared to having PRSnon-APOE in the lowest decile.
Abbreviations: 90+ ctrl, controls ≥ 90 years old were used in analysis; 89- ctrl, controls age 60-89; AD, Alzheimer’s disease; CI, confidence interval; OR, odds ratio; PRSnon-APOE, APOE-independent polygenic risk score.
To compare PRS distributions across cases, 89- controls, and 90+ controls, we performed a one-way ANOVA and pairwise Student’s t-tests. In order to ensure that observed differences were not due to ethnicity or sex, we tested the residuals of a linear model with sex and ten PCs as predictors and PRS as the response variable.
We evaluated the association between PRS and AD status in two datasets: 1) cases and 89- controls; and 2) cases and superager (90+) controls. To determine the difference of odds ratio (OR) for AD between the extremes of PRS, we selected the bottom and top deciles of the PRS distribution and performed logistic regressions between PRS decile and AD status (extremes model). To determine the effect of a one standard deviation (SD) increase in PRS, we performed logistic regressions with PRS – z-standardized across all samples – as the predictor and AD status as the response (continuous model). The effect size of this continuous model is useful because it does not depend on the extremeness of PRS quantization (e.g. top and bottom 10% vs. 5%).
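The extremes model can be illustrated with an unadjusted calculation. With a single binary predictor and no covariates, the odds ratio from logistic regression equals the cross-product ratio of the 2×2 table, so the sketch below uses that shortcut together with a Woolf (log-scale) 95% confidence interval. It deliberately omits the sex and PC covariates used in the study, and the function name, the `n_quantiles` parameter and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def extremes_odds_ratio(scores: List[float],
                        is_case: List[bool],
                        n_quantiles: int = 10) -> Tuple[float, float, float]:
    """Unadjusted OR (top vs. bottom quantile) with a Woolf 95% confidence interval."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = len(scores) // n_quantiles           # individuals per extreme group
    if k == 0:
        raise ValueError("too few individuals for the requested quantiles")
    bottom, top = order[:k], order[-k:]
    a = sum(is_case[i] for i in top)         # cases in the top group
    b = k - a                                # controls in the top group
    c = sum(is_case[i] for i in bottom)      # cases in the bottom group
    d = k - c                                # controls in the bottom group
    if 0 in (a, b, c, d):
        raise ValueError("empty cell in the 2x2 table")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Toy data (not study data): 20 individuals, compared by top vs. bottom fifth
toy_scores = [float(i) for i in range(20)]
toy_status = [True, False, False, False] + [False, True] * 6 + [True, True, True, False]
toy_or, toy_lower, toy_upper = extremes_odds_ratio(toy_scores, toy_status, n_quantiles=5)
```

Changing `n_quantiles` changes the estimate, which is the dependence on the extremeness of quantization that the continuous model avoids.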
It has been previously shown that polygenic risk predicts AAO of AD. In this study we further investigate how age and APOE4 status modify the effects of PRS. Therefore, we modeled the association between age, APOE genotype (i.e., 2/2, 2/3, 3/4, or 4/4, dummy-coded with APOE3/3 as the reference category), and PRS using linear regression, with sex and the top ten PCs as covariates, as before. To determine if these effects are independent of AD status, we performed an additional linear regression controlling for AD status. We also tested for collinearity by calculating the variance inflation factor (VIF) with the R package “car”.
To test the interaction between age and APOE genotype on PRS, we stratified by AD case-control status, categorized samples as APOE4 carriers and non-carriers, and then tested for interactions. To improve interpretability by reducing the number of interactions tested, we collapsed the APOE genotypes into binary categories – an APOE4 carrier was defined as having APOE3/4 or 4/4 (excluding 2/4), and a non-carrier as APOE2/2, 2/3 or 3/3. We first performed linear regressions separately in cases and controls, with APOE4 carrier status and age as predictors and PRS as the response. We then performed a second linear regression stratified by AD status, adding the interaction between APOE4 carrier status and age as a predictor.
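The binary collapsing of APOE genotypes described above can be sketched as follows. The function name and the "a/b" string encoding of genotypes are assumptions made for illustration. Genotypes outside the two listed categories, including 2/4, map to `None` so that they can be dropped.

```python
from typing import Optional

CARRIER_GENOTYPES = {("3", "4"), ("4", "4")}
NON_CARRIER_GENOTYPES = {("2", "2"), ("2", "3"), ("3", "3")}


def apoe4_carrier(genotype: str) -> Optional[bool]:
    """True for APOE4 carriers, False for non-carriers, None for excluded genotypes."""
    alleles = tuple(sorted(genotype.strip().split("/")))  # "4/3" is treated as "3/4"
    if alleles in CARRIER_GENOTYPES:
        return True
    if alleles in NON_CARRIER_GENOTYPES:
        return False
    return None  # e.g. 2/4, which is excluded from the carrier definition


# Toy inputs (not study data)
labels = {g: apoe4_carrier(g) for g in ["2/2", "2/3", "3/3", "3/4", "4/3", "4/4", "2/4"]}
```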
Finally, to test for the effect of these interactions on case-control status, we performed logistic regressions for the relationship between AD case-control status, APOE4 carrier status, age, and PRS, with and without interactions and confirmed independence of predictors with VIF.
3. Results
3.1 Superagers have the lowest PRSnon-APOE
PRSnon-APOE differs between cases, 89- controls, and 90+ controls after residualizing for sex and ancestry PCs (ANOVA P=1×10⁻¹⁹) (Figure 1a). AD subjects have significantly higher PRSnon-APOE compared to both 89- controls (Bonferroni adj. post-hoc t-test, P=5.5×10⁻⁴) and 90+ controls (P=3.6×10⁻²⁰). The 89- controls have significantly higher PRSnon-APOE than 90+ controls (P=5.1×10⁻¹⁴). This difference shows the dependence of PRSnon-APOE on age among healthy controls. To further evaluate this age dependence, we assessed the effect of PRSnon-APOE on AD risk by comparing cases with 89- controls and then with 90+ controls (Figure 1b). We first calculated “extremes” ORs between subjects in the bottom and top deciles of PRSnon-APOE. Comparing 90+ controls and AD cases, subjects having PRSnon-APOE in the top decile have nearly five times the odds of having AD compared with subjects with PRSnon-APOE in the lowest decile (OR=4.82, CI: 2.50-9.26, P=2.5×10⁻⁶). This effect size is double the odds ratio using 89- controls (OR=2.38, CI: 1.78-3.18, P=4.6×10⁻⁹). This difference remains present with the alternative approach of “continuous” ORs measuring the effects of a one SD increase in PRSnon-APOE, albeit less pronounced (90+ controls: OR=1.55, CI: 1.29-1.85, P=1.8×10⁻⁶; 89- controls: OR=1.28, CI: 1.18-1.39, P=2.1×10⁻⁹). The higher effect sizes of PRSnon-APOE observed when comparing with 90+ controls are consistent with the effect sizes seen for the APOE4 alleles and genotypes (Supplementary Table 3).
3.2 The negative association between age and PRSnon-APOE interacts with APOE genotype in cases
The particularly low PRSnon-APOE observed in superagers led us to examine the relationship between PRS and age. Biologically, PRS can affect age of inclusion or onset in AD studies by modifying both AD risk and AAO. Here, we investigated age as a predictor for PRS to explore how non-APOE genetic predictors change with age in case-control studies, and how APOE genotype affects those changes. Age is negatively correlated with PRSnon-APOE among all subjects (Supplementary Table 4). In a linear regression including sex, APOE and ten PCs, age is a significant predictor of PRSnon-APOE (P=2.7×10⁻⁵, β=-0.007). This effect remains significant when accounting for case-control status (P=1.6×10⁻⁴, β=-0.006) (Supplementary Table 4). No APOE genotype is directly associated with PRSnon-APOE.
We next stratified the subjects into APOE4 carriers and non-carriers to investigate whether APOE4 carrier status modifies the correlation between age and PRSnon-APOE. We find that this correlation depends on both case-control status and APOE4 status (Figure 2a). Interestingly, the negative correlation between age and PRSnon-APOE depends on APOE4 carrier status in cases, but not in controls (Figure 2b-c, Supplementary Table 5). More specifically, in AD cases, AAO is correlated with PRSnon-APOE (P=0.029, β=-0.007) and APOE4 carrier status is not associated with PRSnon-APOE (P=0.58). However, when the interaction between AAO and APOE4 status (P=4.8×10⁻³, β=-0.02) is accounted for, APOE4 status becomes a significant predictor of PRSnon-APOE (P=6.1×10⁻³, β=1.29), whereas age is no longer a significant predictor (P=0.72). Thus, the inverse correlation between AAO and PRSnon-APOE among cases is primarily driven by the relationship between PRSnon-APOE and AAO among APOE4 carriers (Figure 2b). Consequently, among AD cases who are APOE4 carriers, those with PRSnon-APOE in the top decile have an AAO five years earlier than those in the bottom decile (70.1 vs. 75.0 years; 95% CI of difference in means: −7.2 to −2.7; P=2.4×10⁻⁵). This difference in AAO remains significant after removing APOE4/4 homozygous cases (71.1 vs. 76.7 years; P=2.1×10⁻⁵). A sensitivity analysis removing cases with estimated AAO further shows that these interaction results are robust (Supplementary Table 6).
Figure 2. (a) Scatter plots and regression models showing the dependence of PRSnon-APOE on age in AD cases (red) and controls (blue). In AD cases age is defined as age-at-onset. The shaded areas show the 95% confidence intervals. PRSnon-APOE in cases and controls over age is shown separately among APOE4 non-carriers and carriers. (b) Age x APOE interaction in AD cases. (c) Age x APOE interaction in controls; age is negatively associated with PRS for both APOE4 carriers (purple) and non-carriers (orange). Abbreviations: AD, Alzheimer’s disease; APOE4, Apolipoprotein E-ε4 allele; PRSnon-APOE, APOE-independent polygenic risk score.
In contrast, among controls the negative correlation between age and PRSnon-APOE (P=0.013, β=-0.005) does not depend on APOE4 carrier status (Figure 2c, Supplementary Table 5), and APOE4 status is not associated with PRSnon-APOE (P=0.27). Accounting for interactions between age and APOE4 status, the negative association between age and PRSnon-APOE remains significant (P=0.021, β=-0.005) and APOE4 status remains non-significant (P=0.78). The interaction between APOE4 and age is not significant (P=0.88). Note that main-effect sizes estimated in the presence of interaction terms have no intuitive interpretation, because they correspond to an extrapolation of the age variable to zero (Aschard, 2016).
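To make the interpretation of these coefficients explicit, the interaction model can be written out as below. This is a sketch of the model form implied by the Methods rather than a formula taken from the manuscript: the symbol names are ours, E4 denotes the binary APOE4 carrier indicator, and the covariate term stands for sex and the ten PCs.

```latex
\begin{align*}
\mathrm{PRS}_{\text{non-APOE}}
  &= \beta_0 + \beta_{\text{age}}\,\mathrm{Age} + \beta_{\text{E4}}\,\mathrm{E4}
     + \beta_{\text{int}}\,(\mathrm{Age}\times\mathrm{E4})
     + \text{covariates} + \varepsilon \\[4pt]
\text{age slope in non-carriers } (\mathrm{E4}=0):\quad
  & \beta_{\text{age}} \\
\text{age slope in carriers } (\mathrm{E4}=1):\quad
  & \beta_{\text{age}} + \beta_{\text{int}} \\
\text{carrier minus non-carrier difference at age } A:\quad
  & \beta_{\text{E4}} + \beta_{\text{int}}\,A
\end{align*}
```

Under this form, the main effect of APOE4 status is the carrier versus non-carrier difference at A = 0, far outside the observed age range. This is why the APOE4 main effect reported for cases (β=1.29) should be read together with the interaction term (β=-0.02) rather than on its own.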
3.3 Interactions between age, PRSnon-APOE, and APOE on AD status
The interactions between AAO and APOE4 status led us to evaluate the effect of the interactions between APOE4 status, age and PRSnon-APOE on AD status using logistic regression (Supplementary Table 7). Without modeling interactions, age (P=2.4×10⁻⁶, β=-0.02), APOE4 carrier status (P=8.6×10⁻⁸⁶, β=1.31) and PRSnon-APOE (P=2.7×10⁻¹², β=0.23) are all predictors of AD status. When all interactions between these three predictors are included in the model, the main effects of age (P=0.60) and PRSnon-APOE (P=0.41) are no longer significant, indicating that the effects of age and PRSnon-APOE on AD status are dependent on APOE. APOE4 carrier status remains a significant predictor (P=8.2×10⁻¹⁵, β=4.79). Similarly, the interactions between age and APOE (P=1.3×10⁻⁸, β=-0.05), between PRSnon-APOE and APOE (P=0.035, β=1.31), and the three-way interaction between age, PRS and APOE (P=0.035, β=-0.02) are all significant predictors of AD status.
The receiver operating characteristic (ROC) curves are similar to previous reports when APOE SNPs are included (Supplementary Figure 3) and the Kaplan-Meier survival curve for PRSnon-APOE stratified by APOE4 carrier status is similar to that previously reported (Supplementary Figure 4) (Desikan et al., 2017).
4. Discussion
We show that APOE-independent polygenic risk has a disproportionally large effect in younger APOE4 carriers. Among APOE4 carriers, the AAO is on average five years earlier for cases with PRSnon-APOE in the top decile compared to those with PRSnon-APOE in the lowest decile. In contrast, no correlation between PRSnon-APOE and AAO was observed among APOE4 non-carriers. We further observe a negative correlation between age and PRSnon-APOE that is independent of APOE4 carrier status among controls, consistent with the notion that APOE-independent genetic risk increases AD incidence throughout late life. Our findings indicate that younger APOE4 carriers bear greater detrimental effect from currently known AD risk variants, as captured by PRSnon-APOE, whereas superager controls have fewer AD risk variants independent from APOE. Even though the age effect per year is small, it can be considerable in a population of subjects with ages ranging from 60 to over 100 years. This pattern may either be inherent to genetically conferred risk for AD or reflect biases in current PRS estimates as a result of ascertainment of previous GWAS subjects.
When healthy superagers serve as controls, subjects having PRSnon-APOE in the top decile have nearly five times the odds of AD compared with those in the bottom decile. This large effect size may be caused by depletion of AD risk alleles among superagers, indicating that many typical controls will develop AD later in life. This is consistent with the long pre-clinical stage of 20 to 25 years during which a subject does not display dementia symptoms (Bateman et al., 2012). Our observation also corroborates the reported higher effect sizes of GWAS risk alleles when superagers are used as controls (Tesi et al., 2019). The lifetime risk of AD for healthy females and males at 75 years old is estimated to be 26.2% and 19.9%, respectively (Brookmeyer et al., 2018). As the majority of controls enrolled in AD genetic studies are in their 70s and 80s, a considerable proportion of them are expected to convert to AD at a later age unless they die from other causes. These cryptic preclinical AD cases reduce the power of case-control studies. The larger magnitude of effect sizes for both PRSnon-APOE and APOE when 90+ controls were deployed in our study highlights the importance of inclusion of cognitively healthy superagers in future studies. Increasing the number of superager controls in case-control studies for AD is expected to increase the power to detect additional risk variants as well as protective variants.
The greater PRSnon-APOE among younger APOE4 positive cases, as reflected in the statistical interaction between AAO and APOE4 carrier status, suggests a greater role for the currently known APOE-independent genetic risk in younger APOE4 carriers. Although PRSnon-APOE alone is not useful to clinically predict individual risk for developing AD – perhaps unlike PHS/PRS including APOE (Desikan et al., 2017; Escott-Price et al., 2015) – the interactions we observed have implications for the design of future genetic association studies. In particular, enrollment of APOE4-negative cases irrespective of age may lead to the identification of novel variants, given the absent correlation of current PRSnon-APOE with AAO in this group. Novel non-genetic protective factors may best be found among healthy elderly APOE4 carriers with high PRSnon-APOE based on known GWAS risk loci, whereas novel risk factors may best be found among AD patients who have both low PRSnon-APOE and no APOE4 alleles.
Recently another study, investigating subjects that do not overlap with our study and using different parameters, reported a greater effect of APOE in younger participants (Bellou et al., 2020). Our stratified interaction analysis indicating a larger effect of PRSnon-APOE in younger APOE4 carriers leads us to a novel conclusion about APOE-independent genetic risk and provides a logical explanation for their observation: among all APOE4 cases, those with a younger onset carry the largest PRSnon-APOE.
Our study has several limitations, some of which are inherently shared by many large-scale AD genetic studies. In order to reach sufficient power, genetic studies of complex diseases require very large sample size which is often achieved by pooling subjects from many studies that were initially independently designed. This approach introduces heterogeneity in diagnostic criteria, recorded demographics, and comorbidities (Jun et al., 2010; Naj et al., 2011). Such heterogeneity typically introduces noise and reduces statistical power for detecting true signals. There are now collaborative efforts by the National Institute on Aging to harmonize genetic, epidemiologic and clinical data for AD. Notwithstanding the importance of clinical details, genetic studies based solely on self-report of parental AD in the UK Biobank have yielded results that are highly correlated with the previously known IGAP loci (Andrews et al., 2020; Marioni et al., 2018). This indicates that genetic studies can generate valid findings without considering or correcting for all clinical details, when large sample sizes are used. In our study, AAO for AD was generally provided by the participating cohorts of ADGC (Desikan et al., 2017; Jun et al., 2010). Our sensitivity analysis shows that removing subjects with an estimated AAO does not substantially affect our overall results. However, a larger study may be required to fully measure the interactions between PRSnon-APOE and the number of APOE4 and APOE2 alleles. Additionally, because of differences across ethnicities in PRS applicability (Martin et al., 2019) and APOE effects (Rajabli et al., 2018), studies in non-European populations are necessary to determine whether the same conclusions apply.
5. Conclusion
In summary, we show that current APOE-independent PRS has a particularly detrimental effect in younger APOE4 risk allele carriers, predicting significantly earlier age at onset for AD. APOE-independent PRS is negatively correlated with dementia-free aging. Future larger studies are needed to evaluate whether interactions between PRS, APOE and additional demographic, clinical, and environmental risk factors, as reflected in health data and biomarkers, can improve AD risk prediction.
Data Availability
All genetic data referred to in the manuscript will be made available in a suitable repository, and are available upon reasonable request made to the corresponding author following peer-reviewed publication of the manuscript.
Acknowledgements
Funding: This work was supported by the National Institutes of Health, National Institute on Aging (NIH-NIA) (ADGC and grant numbers U01 AG032984 and RC2 AG036528).
Samples from the National Cell Repository for Alzheimer’s Disease (NCRAD), which receives government support under a cooperative agreement awarded by the NIH-NIA (grant number U24 AG21886), were used in this study. The NACC database is funded by the NIH-NIA (grant number U01 AG016976). Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (grant number U24-AG041689-01). In addition, this work was funded by the NIH (grant numbers U01AG052411, U01AG058635, K08AG054727, R01 AG618381, and R01 AG046949), and the Einstein Nathan Shock Center (grant number P30 AG038072). Additional support was provided by the Mildred and Frank Feinberg Family Foundation, the Advancing Women in Science and Medicine Foundation, and the JPB Foundation.
The authors would like to thank the contributors who collected samples used in this study during a period of 30 years, as well as patients and their families, whose help and participation made this work possible. A list of ADGC members and their affiliations is included in the Appendix within the Supplementary Content file.