Abstract
The extent to which genetic variation at the APOE locus explains the burden of late-onset Alzheimer’s disease (AD) is poorly understood. We provide new estimates of the proportions of AD and all-cause dementia attributable to carriage of ε3 and/or ε4 alleles of APOE, using data from 171,128 participants of the UK Biobank cohort study. AD and all-cause dementia were ascertained at baseline and during up to 16.8 years of follow-up via linked electronic health records. We estimate that 74.5% (95% CI: 38.6, 89.4%) of AD and 39.5% (95% CI: 10.7, 59.1%) of all-cause dementia burden are attributable to ε3 and ε4 carriage. Thus, differences in the molecular physiology of apolipoprotein E cause most AD and a large fraction of all dementia. Research into this pathway should be prioritised to facilitate dementia prevention.
Main text
The extent to which genetic variation at the APOE locus explains the burden of late-onset Alzheimer’s disease (AD) is poorly understood. Three major apolipoprotein E (apoE) isoforms exist: ε2, ε3 and ε4. Relative to carriage of ε3 – the most common allele with about 95% prevalence worldwide1 – AD risk is higher with ε4 carriage (∼28% prevalence) and lower with ε2 carriage (∼14% prevalence). The proportion of AD cases attributable to the detrimental ε4 allele has been estimated in many settings, with population attributable fractions (PAFs) for this burden ranging considerably, up to approximately 50%.2–4 However, these estimates do not capture the proportion of AD cases attributable to ε3, which is commonly misperceived as neutral for AD risk, even though ε3 substantially increases risk of AD relative to ε2 carriage.4,5 Previous PAF estimates have also likely been deflated by biases in study design: ascertainment of AD/dementia through clinical follow-ups alone (prone to attrition), exclusion of prevalent cases, and/or limited follow-up time in incidence studies.2 PAFs have also been biased downwards when calculated erroneously – for instance, when allele frequency (an allele’s proportion among all alleles in a sample) is used instead of genotype frequency (i.e. the proportion of individuals in a sample with a given genotype) as the prevalence of the exposure.4 Establishing the full extent of the AD burden arising from common differences in apoE is important, because it would indicate the proportion of cases that could be prevented by intervening on this single molecular pathway.
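The distinction between allele and genotype frequency can be made concrete with a short derivation. As a sketch, assuming the widely used Levin formula for a single binary exposure and Hardy–Weinberg proportions (neither is stated explicitly in this section), the PAF depends on the prevalence p of carriers, and the carrier frequency for an allele of frequency q is never smaller than q itself:

```latex
\mathrm{PAF} = \frac{p\,(\mathrm{RR}-1)}{1 + p\,(\mathrm{RR}-1)},
\qquad
p_{\text{carrier}} = 1-(1-q)^2 = 2q - q^2 \;\ge\; q \quad \text{for } 0 \le q \le 1 .
```

For a hypothetical allele with q = 0.15, the carrier frequency is 1 − 0.85² = 0.2775, nearly double q. Because the PAF increases with p whenever RR > 1, substituting q for the carrier frequency understates the attributable fraction.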
Using data from 171,128 participants of the UK Biobank cohort study (UKB) with long-term follow-up via electronic health records, we provide new estimates of the proportions of AD and all-cause dementia attributable to ε3 and ε4 carriage – i.e. the fraction of cases due to the combined impact of alleles inherited by the vast majority of individuals. We conducted a cohort study among participants aged 60 to 72 years at baseline in 2006–2010,6 in which AD and all-cause dementia cases were identified from self-report at baseline (all-cause dementia only; online methods) and via linked electronic health and death records available up to December 2022 (minimum/maximum follow-up: 12.2 / 16.8 years). We coded APOE genotypes as one exposure representing ε3 and/or ε4 carriage, relative to ε2 homozygotes. Individuals with an ε3/ε3 genotype are typically used as the reference group in analyses of APOE because ε3/ε3 is the most common genotype. However, to appropriately calculate attributable risk for an exposure with multiple levels, individuals with the lowest risk should be set as the reference group7 – individuals of ε2/ε2 genotype in this instance. This allows the disease burden attributable to all risk-increasing genotypes to be calculated. Thus, we modelled the full spectrum of risk of AD and all-cause dementia encompassed by combinations of ε3 and ε4 carriage, relative to the lowest-risk group (ε2 homozygotes).
Among the analytical sample, 99.4% had ε3 and/or ε4 carriage (Supplemental Table 1). The odds ratio for AD associated with ε3 and/or ε4 carriage was 3.93 (95% CI: 1.63, 9.48; Table 1). The equivalent for all-cause dementia was 1.66 (1.12, 2.45). Together, the two APOE risk alleles had PAFs of 74.5% (38.6, 89.4%) and 39.5% (10.7, 59.1%) for AD and all-cause dementia, respectively. In secondary analyses in which AD and dementia were ascertained without diagnoses from primary care records (available for only ∼45% of the cohort; online methods), the PAFs were slightly lower for AD and slightly higher for all-cause dementia (Supplemental Table 2).
To put these findings into context, we quantified the proportions of AD attributable to the next nine strongest genetic loci after APOE, as identified by large genome-wide association studies of late-onset AD (Figure 1).4 To compare these magnitudes to effects for another chronic disease, we also calculated PAFs for the proportions of coronary artery disease (CAD) attributable to the disease’s ten strongest genetic risk loci. No PAF for any other locus for either condition exceeded 22%. The preponderance of disease burden attributable to genetic variation at a single locus is exceptional among common, complex chronic diseases.
We then extended calculations to estimate the separate contributions of ε3 and ε4 to the overall burden of AD attributable to these two alleles, on the basis of genotype-specific PAFs (Table 1). For ε4, this meant summing the disease burden attributable to the ε2/ε4 and ε4/ε4 genotypes and the share of the risk among ε3/ε4 carriers due to ε4 specifically, equating to a PAF for AD of 45.5% for ε4 carriage. The remainder of the overall PAF for AD in this genotype-level analysis (75.0%; online methods), i.e. 29.5%, was attributable to ε3 carriage specifically, arising from ε2/ε3 and ε3/ε3 carriage and ε3’s contribution to risk among ε3/ε4 individuals.
In summary, our findings indicate that if interventions could entirely obviate the detrimental effects of ε3 and ε4 carriage in a population akin to the UKB sample, we could expect to prevent approximately three-quarters of AD cases. Put differently, if all individuals inherited an ε2/ε2 genotype, most AD would not occur. An attributable risk of this magnitude has been suggested previously (ε3 and ε4 carriage perhaps accounting for 95% of AD8) but has not been demonstrated directly. Likely reasons include the need for very large analytical samples to use rare ε2 homozygotes as the reference group, and limited recognition among dementia researchers that ε3 genotypes should also be considered risk-increasing for AD. Notably, we estimate that the ε3 allele alone could be responsible for almost a third of AD, and possibly more, because ε3 confers considerable risk on a large fraction of the population. PAFs are distinct from heritability; the heritability of AD has been estimated at up to 79%.9 However, heritability analyses are not informative for assessing disease burden attributable to specific causes, and PAFs are more appropriate and intuitive for this purpose.10
Limitations of this research include incomplete ascertainment of AD and all-cause dementia cases due to limited record linkage in UKB (which does not yet fully extend to primary care or mental health service records)11 and the absence of follow-up cognitive assessments of the whole surviving cohort. Measurement of lifetime risk of the outcomes was incomplete: the youngest participants in our sample were aged 73 years at the end of current follow-up. Recruitment into UKB was not representative of the general UK population and may have been affected by selection effects,12 perhaps including effects of APOE genotypes contributing to cardiovascular and other morbidity and mortality before age 60 years.13 Some AD cases may have been misclassified and had dementia of other aetiologies; APOE associations with AD are stronger in samples where cases have been neuropathologically confirmed.14 However, these biases are likely to lead to underestimated, rather than inflated, PAFs. Attributable fractions also assume that the exposure of interest is a cause of the disease being investigated and that the underlying risk estimates are unbiased.7 Owing to the properties of genetic inheritance, risk estimates for genetic variants such as the APOE alleles are not subject to reverse causation and are unlikely to be affected by confounding.15 The effects of variation in APOE on AD and all-cause dementia risk are also highly unlikely to be due to nearby co-inherited genetic variation (in linkage disequilibrium with APOE variation) rather than to the APOE variants per se.16 Hence, PAFs for these variants provide more robust estimates of the disease burden attributable to the variation in question than, for instance, PAFs estimated for environmental factors.17
Therefore, given that most AD appears to be caused by differences in the molecular physiology of apoE, our findings and those of others18 should motivate proportionate attention and funding for research into the mechanisms linking apoE with AD. This should include efforts to understand the distinct functional properties of the ε3 isoform that confer AD risk (relative to properties of the ε2 isoform and other protective variants18,19), not only further research to elucidate and mitigate ε4’s effects. We note that it is often incorrectly assumed that genetic risk is unmodifiable.20 With the advent of gene editing and silencing, genetic risk may now be directly modifiable; thus, editing APOE alleles or modulating the gene’s expression in relevant cells at the most pertinent life stage(s) could potentially prevent most cases of AD. Moreover, genetic findings point to molecular physiology that can be targeted by other means. For instance, the discovery of mutations in the gene PCSK9 that cause familial hypercholesterolaemia led to the development of proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors.21 Many strategies to target apoE exist, including immunotherapy and small-molecule structural correctors.22,23 However, only one therapy targeting apoE (LX1001) is currently being trialled in humans, representing less than 1% of potential therapies for AD in registered trials.24 To reiterate, findings such as ours should prompt a rebalancing of therapeutic development for AD (as well as basic research) towards apoE. Prioritising direct research into apoE should not preclude research into broader genetic or environmental factors that could mediate or modify the effects of apoE on AD, or into factors that may cause these outcomes independently of apoE (both scenarios include research addressing cerebral amyloidosis and tauopathy).
Nonetheless, establishing precisely how, when and in which cell types apoE influences AD risk – and how its deleterious effects can be mitigated – is paramount to AD prevention and treatment.
Methods
Study design
UKB is a multi-centre cohort study that recruited approximately 502,000 participants aged 39–73 years at assessment sites in England, Scotland, and Wales between 2006 and 2010.6 Here, we studied data from the subset of participants aged ≥60 years at baseline with genotype data, after exclusions for failing sample-level genetic quality control (genetic/phenotypic sex mismatches, excess heterozygosity, aneuploidy) and the random removal of one individual from each related pair. APOE ε2/ε3/ε4 alleles were coded from directly genotyped or hard-called imputed microarray data for the single nucleotide polymorphisms rs7412 and rs429358.6 All-cause dementia was identified using the cohort’s algorithmically defined outcomes, combining self-report at baseline with follow-up via linked electronic health and death records available up to December 2022 (minimum/maximum follow-up: 12.2 / 16.8 years).
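The allele coding from rs429358 and rs7412 can be sketched as follows. This is a minimal illustration, not the authors’ script: it assumes unphased genotypes supplied as allele dosages (copies of the rs429358 C allele and of the rs7412 T allele), and the function names are hypothetical. The four haplotypes, written as (rs429358, rs7412), are ε2 (T, T), ε3 (T, C), ε4 (C, C) and the rare ε3r/ε1 (C, T); any combination with more tagging alleles than chromosomes can only be explained by an ε3r allele and is flagged, in the spirit of the exclusion listed under Missingness.

```python
from typing import Optional

# APOE haplotypes as (rs429358 allele, rs7412 allele):
#   e2 = (T, T), e3 = (T, C), e4 = (C, C), rare e3r/e1 = (C, T)
# Inputs are unphased allele dosages (0, 1 or 2):
#   n_c_rs429358: copies of the rs429358 C allele (tags e4)
#   n_t_rs7412:   copies of the rs7412 T allele (tags e2)


def call_apoe_genotype(n_c_rs429358: int, n_t_rs7412: int) -> Optional[str]:
    """Return an APOE genotype such as 'e3/e4', or None if an e3r/e1 allele is implied."""
    for dosage in (n_c_rs429358, n_t_rs7412):
        if dosage not in (0, 1, 2):
            raise ValueError("allele dosages must be 0, 1 or 2")
    # Assuming no e3r haplotype, each C at rs429358 marks one e4 chromosome and each
    # T at rs7412 marks one e2 chromosome; any remaining chromosomes carry e3.
    n_e4 = n_c_rs429358
    n_e2 = n_t_rs7412
    n_e3 = 2 - n_e4 - n_e2
    if n_e3 < 0:
        # More tagging alleles than chromosomes: only explicable with an e3r/e1 haplotype.
        return None
    # Assumption: the double heterozygote (1, 1) is called e2/e4, although unphased
    # data cannot rule out the extremely rare e1/e3.
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


def carries_e3_or_e4(genotype: str) -> int:
    """Binary exposure as in the main analysis: 1 for any genotype other than e2/e2."""
    return int(genotype != "e2/e2")


if __name__ == "__main__":
    for dosages in [(0, 0), (1, 0), (1, 1), (0, 2), (2, 1)]:
        print(dosages, call_apoe_genotype(*dosages))
```

Keeping the ε3r check separate from the genotype call makes it easy to count and drop the handful of implausible combinations before any modelling.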
Alzheimer’s disease specifically was ascertained only via record linkage and not by baseline self-report, because the survey item at the baseline assessment asked non-specifically about a history of ‘dementia or Alzheimer’s disease or cognitive impairment’. In primary analyses, ascertainment of both outcomes was expanded to include diagnostic codes from primary care records for the ∼45% of the cohort for whom linkage to general practice records has been arranged (code lists in Supplemental Table 3; linkage is being sought for the remainder of the cohort). In secondary (sensitivity) analyses, we limited AD and dementia ascertainment to self-report and to secondary care and death record data available for the entire cohort.
Statistical analysis
Risk of AD and all-cause dementia was modelled using multivariable logistic regression, adjusting for age at baseline, sex, ethnicity (entered as a binary white / other variable because of the small numbers of participants from ethnic minority groups in each outcome group in sex-specific and individual APOE genotype analyses), the first ten genetic principal components supplied by the UK Biobank team,6 and genotyping array. Logistic regression was adopted rather than survival analysis because the sample included both prevalent and incident cases. The prevalence of the exposure (genotype frequency) in the full sample and the estimated odds ratios for AD and all-cause dementia were used to calculate population attributable fractions (PAFs).7 The 95% confidence intervals for PAFs were derived by substituting the lower and upper confidence limits of the odds ratios. PAFs and their confidence intervals were converted from fractions to percentages. Given the low prevalence of the outcomes in the sample (particularly in the reference group), odds ratios were treated as approximations of risk ratios for PAF calculations. We also stratified models by sex to produce PAFs for each outcome in females and males separately.
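The PAF step can be sketched as below. The formula itself is not printed in this section, so the sketch assumes the standard Levin formula for a binary exposure, with the odds ratio standing in for the risk ratio as described above; the function names are hypothetical. Plugging in the reported 99.4% carrier prevalence and the AD odds ratio of 3.93 (1.63, 9.48) gives values within rounding of the reported PAF, which suggests, but does not confirm, that this is the formula used.

```python
from typing import Tuple


def levin_paf(prevalence: float, ratio: float) -> float:
    """Levin's PAF for a binary exposure: p(RR - 1) / (1 + p(RR - 1)).

    Assumption: an odds ratio is passed in place of the risk ratio (rare outcome).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    excess = prevalence * (ratio - 1.0)
    return excess / (1.0 + excess)


def paf_with_ci(prevalence: float, or_point: float, or_lower: float,
                or_upper: float) -> Tuple[float, float, float]:
    """PAF point estimate and 95% limits, in percent, from an odds ratio and its 95% CI.

    For p > 0 the PAF increases monotonically with the ratio, so the odds-ratio
    limits map directly onto PAF limits.
    """
    point, lower, upper = (round(100.0 * levin_paf(prevalence, r), 1)
                           for r in (or_point, or_lower, or_upper))
    return point, lower, upper


if __name__ == "__main__":
    # Reported inputs: 99.4% carry e3 and/or e4; AD odds ratio 3.93 (1.63, 9.48).
    print(paf_with_ci(0.994, 3.93, 1.63, 9.48))
```

Here `paf_with_ci(0.994, 3.93, 1.63, 9.48)` returns approximately (74.4, 38.5, 89.4); the small gap from the reported 74.5% (38.6, 89.4%) presumably reflects rounding of the published prevalence and odds ratios.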
To calculate PAFs for AD attributable to genomic loci other than APOE, we used data for the top nine loci beyond APOE from one of the largest genome-wide association studies (GWAS) of neuropathologically confirmed AD.4 For the risk-increasing allele of each GWAS hit, we first identified its allele frequency p among individuals of European ancestry in the 1000 Genomes Project (phase 3).1 We then calculated genotype frequencies for homozygous (p²) and heterozygous (2p(1 − p)) carriers of the risk allele. Because the GWAS modelled risk assuming additive allelic effects, we estimated the overall PAF for each variant as the sum of a PAF for homozygous carriers and a PAF for heterozygous carriers. The genotype frequencies of homozygotes and heterozygotes were entered as the prevalence of the exposure in equation (1). For homozygotes, odds ratios were recalculated as exp(2 × β), where β is the per-allele log-odds reported by the GWAS; for heterozygotes, odds ratios were exp(β). We note that our PAF calculations for AD risk loci differ from the authors’ own calculations, presented in supplemental table 6 of their publication.4 These appear to have been based on allele frequency among controls in their sample instead of genotype frequencies for exposure to an allele, and on a single odds ratio rather than doubling the log-odds (i.e. squaring the odds ratio) for the proportion of individuals homozygous for a risk allele; hence, their PAFs are substantially lower than ours. To calculate the equivalent statistics for coronary artery disease (CAD), we used summary statistics from one of the largest CAD GWAS to date.25 We applied the same approach as for AD risk loci to calculate PAFs for the top 50 loci identified in this GWAS, then ranked these and used the ten highest PAFs in our figure.
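The per-locus procedure above can be sketched as follows, assuming that equation (1) is Levin’s formula and that genotype frequencies follow Hardy–Weinberg proportions. The function names are hypothetical, and the allele frequency and effect size in the usage lines are invented for illustration, not values from either GWAS.

```python
import math


def levin_paf(prevalence: float, odds_ratio: float) -> float:
    """Levin's formula, assumed here to be equation (1); the OR approximates the RR."""
    excess = prevalence * (odds_ratio - 1.0)
    return excess / (1.0 + excess)


def locus_paf(risk_allele_freq: float, beta: float) -> float:
    """Overall PAF for one GWAS hit as the sum of homozygote and heterozygote PAFs.

    risk_allele_freq: frequency p of the risk-increasing allele in a reference panel.
    beta: per-allele log-odds from an additive GWAS model.
    """
    p = risk_allele_freq
    freq_hom = p ** 2               # Hardy-Weinberg frequency of risk-allele homozygotes
    freq_het = 2.0 * p * (1.0 - p)  # Hardy-Weinberg frequency of heterozygotes
    or_hom = math.exp(2.0 * beta)   # two copies: log-odds doubled, i.e. OR squared
    or_het = math.exp(beta)         # one copy: per-allele OR
    return levin_paf(freq_hom, or_hom) + levin_paf(freq_het, or_het)


if __name__ == "__main__":
    # Hypothetical locus: risk-allele frequency 0.5, per-allele OR of 2.
    print(round(locus_paf(0.5, math.log(2.0)), 3))  # genotype-based sum
    print(round(levin_paf(0.5, 2.0), 3))            # allele frequency with a single OR
```

For this hypothetical locus, the genotype-based sum gives about 0.762, whereas a single calculation using the allele frequency and the per-allele odds ratio gives 0.333, illustrating the direction of the discrepancy described above.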
To evaluate the separate contributions of the ε3 and ε4 alleles to the overall burden of AD attributable to these two APOE alleles, we used a formula for calculating PAFs for multi-level exposures.26 We applied this to each of the five risk-increasing genotypes (ε2/ε3, ε2/ε4, ε3/ε3, ε3/ε4, ε4/ε4) relative to ε2/ε2 carriage, with an indicator variable for genotype entered into the logistic regression modelling described above. The contributions of ε3 and ε4 were then calculated as the sums of PAFs for the individual genotypes containing each allele (e.g. ε2/ε3 and ε3/ε3 for ε3 carriage), plus each allele’s estimated share of the increased risk experienced by ε3/ε4 carriers. The genotype-specific PAF for ε3/ε4 was partitioned into ε4 and ε3 contributions according to the ratio by which the ε2/ε4 and ε2/ε3 genotypes increase AD risk, i.e. according to the individual effects of the two alleles on AD risk. This ratio was 2.46:1 for ε4:ε3, meaning that ε4 was estimated to be responsible for 71.1% of the increased risk experienced by ε3/ε4 carriers, and hence 71.1% of the PAF for the ε3/ε4 genotype, with the remainder attributed to ε3. This assumes that any interaction between the two alleles in relation to AD risk is apportioned in proportion to each allele’s effect on AD risk in isolation among ε2/ε4 and ε2/ε3 carriers. We note that the overall PAF for AD due to genotypes containing ε3 and/or ε4 in this analysis (75.0%) differed slightly from the point estimate in our main analysis (74.5%), owing to differences in precision when making five individual comparisons rather than using one binary exposure.
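The genotype-level partition can be sketched as follows. Reference 26 is not reproduced in this section, so the multi-category formula used here, in which genotype-specific PAFs share a common denominator and sum to the overall PAF, is an assumption; the genotype labels and function names are hypothetical, and the genotype prevalences and odds ratios would have to be supplied from the study tables, which are not shown here.

```python
from typing import Dict, Tuple

# Hypothetical labels; e2/e2 is the reference category and is therefore omitted.
RISK_GENOTYPES = ("e2/e3", "e2/e4", "e3/e3", "e3/e4", "e4/e4")


def genotype_pafs(prevalence: Dict[str, float],
                  odds_ratio: Dict[str, float]) -> Dict[str, float]:
    """Genotype-specific PAFs for a multi-level exposure (assumed formula).

    PAF_g = p_g (OR_g - 1) / (1 + sum_k p_k (OR_k - 1)); the terms sum to the overall PAF.
    Odds ratios stand in for risk ratios, as in the main analysis.
    """
    excess = {g: prevalence[g] * (odds_ratio[g] - 1.0) for g in RISK_GENOTYPES}
    denominator = 1.0 + sum(excess.values())
    return {g: excess[g] / denominator for g in RISK_GENOTYPES}


def e4_share_of_e3e4(e4_to_e3_ratio: float) -> float:
    """Fraction of the e3/e4 PAF assigned to e4 when the e4:e3 ratio is r:1."""
    return e4_to_e3_ratio / (e4_to_e3_ratio + 1.0)


def split_by_allele(pafs: Dict[str, float], e4_to_e3_ratio: float) -> Tuple[float, float]:
    """Return (e4 contribution, e3 contribution) to the overall PAF."""
    share = e4_share_of_e3e4(e4_to_e3_ratio)
    e4 = pafs["e2/e4"] + pafs["e4/e4"] + share * pafs["e3/e4"]
    e3 = pafs["e2/e3"] + pafs["e3/e3"] + (1.0 - share) * pafs["e3/e4"]
    return e4, e3


if __name__ == "__main__":
    # The reported e4:e3 ratio of 2.46:1 gives the e4 share of the e3/e4 PAF.
    print(round(e4_share_of_e3e4(2.46), 3))
```

With the reported ratio of 2.46:1, the ε4 share of the ε3/ε4 PAF is 2.46 / 3.46 ≈ 0.711, matching the 71.1% given above. Because the ε3/ε4 PAF is split rather than counted twice, the ε3 and ε4 contributions add back up to the overall genotype-level PAF.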
Ethics
UKB participants had given written informed consent, and ethical approval for the study was granted by the North West Haydock Research Ethics Committee of the UK’s Health Research Authority.
Missingness
From a starting sample of 217,458 individuals in UK Biobank aged 60 years or older at baseline, 171,128 were included in the analytical sample (78.7%). Sequential exclusions were based on: i) no genotype data available (N=6,242); ii) individuals removed during GWAS quality control steps (N=39,242; primarily due to relatedness among study participants); iii) missing data on self-reported ethnicity (N=841); iv) exclusion of individuals whose genotype data implied that they may have a rare ε3r (also known as ε1) allele (N=5).
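The sequential exclusions can be checked with simple arithmetic. The short script below is a convenience sketch, not part of the original analysis code, and its names are hypothetical; it reproduces the analytical sample size and retention percentage from the counts stated above.

```python
from typing import List, Tuple

# Counts taken from the Missingness paragraph above.
START = 217_458  # UK Biobank participants aged >= 60 years at baseline
EXCLUSIONS = [
    ("no genotype data", 6_242),
    ("removed during GWAS quality control", 39_242),
    ("missing self-reported ethnicity", 841),
    ("possible e3r (e1) allele", 5),
]


def exclusion_flow(start: int, exclusions: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (reason, participants remaining after that exclusion) for each sequential step."""
    remaining = start
    flow = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((reason, remaining))
    return flow


if __name__ == "__main__":
    steps = exclusion_flow(START, EXCLUSIONS)
    for reason, remaining in steps:
        print(f"after excluding {reason}: {remaining:,}")
    final = steps[-1][1]
    print(f"analytical sample: {final:,} ({100 * final / START:.1f}% of the starting sample)")
```

Running it prints the remaining counts after each step (211,216; 171,974; 171,133; 171,128) and a retention of 78.7%.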
We followed STREGA reporting recommendations for genetic association studies.
Data Availability
All data used in this research are available to researchers who register with UK Biobank and request access to them as part of an approved project.
Author information
DMW conceived the study, undertook the analyses and drafted the manuscript. All authors contributed to the interpretation of data and the manuscript’s content, and approved its final version.
We have no conflicts of interest to disclose.
Data sharing statement
All data used in this research are available to researchers who register with UK Biobank and request access to them as part of an approved project: https://www.ukbiobank.ac.uk/. The data fields and script used in the analyses can be viewed at: GitHub link to be inserted upon M/S acceptance.
Acknowledgements
This research has been conducted using the UK Biobank Resource under application number 71702. The MRC Unit for Lifelong Health and Ageing at UCL is funded by the Medical Research Council (MC_UU_00019/3). NMD is supported via a Norwegian Research Council Grant number 295989. ELA is supported by a UKRI Future Leaders Fellowship (MR/W011581/1). For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.
Footnotes
We added a new analysis of the burden of AD attributable to APOE ε3 and ε4 alleles separately, and included a comparison of the burden of AD attributable to ε3 and ε4 with that of other genetic risk loci for AD and with attributable fractions for the top ten genetic risk loci for coronary artery disease (Figure 1). We also implemented minor analytical changes to the main modelling.