ABSTRACT
Alzheimer disease (AD) is the most common type of dementia and is currently estimated to affect 6.2 million Americans. It ranks as the sixth leading cause of death in the United States, and the proportion of deaths due to AD has been increasing since the year 2000 while the proportions of many other leading causes of death have decreased or remained constant. The risk for AD is multifactorial, including genetic and environmental risk factors. Though APOE remains the largest genetic risk factor for AD, more than 26 other loci have been associated with AD risk. Here, we recruited from a population of Amish adults from Ohio and Indiana to investigate AD risk and protective genetic effects. Because AD shows a slightly lower incidence and later age of onset in the Amish, it is thought that the Amish may hold protective genetic variants for AD. As a founder population that typically practices endogamy, variants that are rare in the general population may be at higher frequency in the Amish population. We characterized the genetic architecture of AD risk in the Amish and compared this to a non-Amish population, elucidating the lower relative importance of APOE and differing genetic architecture of the Amish compared to a general European ancestry population.
INTRODUCTION
Alzheimer disease (AD), the most common type of dementia, is the sixth leading cause of death in the United States and occurs in over 35% of individuals aged 85 and older.1,2 It is currently estimated that 6.2 million Americans are living with AD.1 Deaths attributable to AD increased by 146.2% from the years 2000 to 2018 whereas other leading causes of death remained constant or decreased.1 This burden of AD is expected to increase in the coming years due to increased longevity and decreased fertility, known as population aging.1,3,4 The cost of managing AD will continue to increase with an expected annual global cost surpassing $50 billion by 2050.1,5,6 People living with AD also suffer from severe degradation of their quality of life including reduced independence and being at higher risk of somatic and psychiatric comorbidities.7–9 Improved understanding of AD risk and subsequent improvements to screening, prediction, and prevention efforts are needed to reduce these burdens. Because current medications only marginally and temporarily delay the progression and lessen the severity of AD, its growing prevalence is a pressing issue.
Risk for AD is multifactorial, including genetic and environmental risk factors.8,10,11 While only 2-5% of all cases of AD are strongly familial (e.g. result from high penetrance mutations),12 the overall heritability of late-onset AD is estimated to be as high as 70% based on twin studies and genome-wide association studies (GWASs); however, such estimates can vary by population and environment.13–15 Genetic risk for AD is complex, including more than 26 independent associated loci spanning diverse population groups.16–18 Despite this large number of loci, the currently confirmed loci associated with AD risk account for only a small proportion of the overall heritability of AD.15,16 Increased sample sizes and diversity of study populations will help GWASs to elucidate the remainder of the heritability.
The largest genetic risk for AD is conferred by the apolipoprotein E (APOE) locus19 on chromosome 19 with 3 to 15 times increased risk for those holding either one or two copies of the e4 risk allele compared to those holding no risk alleles.20 This association between AD and APOE has been replicated across many different and diverse populations.21–23
One such population is the Amish: descendants of German and Swiss Anabaptist immigrants who settled in the United States during the eighteenth and nineteenth centuries. Communities currently living in Holmes County, Ohio and Elkhart & LaGrange Counties, Indiana are mostly descendants from the German Palatinate, while the communities in Adams County, Indiana largely descend from Swiss Anabaptist immigrants.24–26 The expansion of the Amish from a population of less than 1,000 founders in the United States27 with subsequent cultural and religious isolation has restricted the introduction of new genetic variation. This leads the Amish to be representative of a subset of a more general European gene pool. Because of these factors, the Amish are a unique population that can serve as an ideal candidate for genetic research. Due to endogamy, some variants rare in the general European population may be at higher frequency in the Amish, allowing for detection and consideration of effects that may not otherwise be captured in studies of the general population.28 This situation is ideal for investigation of susceptibility genes for complex traits, including AD.
A slightly lower prevalence of AD has been reported within Amish populations, even after accounting for the effect of a lower frequency of the APOE e4 risk allele.29–31 Improved understanding of what protective or other risk-bearing variants the Amish may be enriched for could prove helpful in improving general understanding of genetic risk of AD.
We have recruited adults from Amish families living in Holmes County, Ohio and Elkhart, LaGrange, and Adams Counties, Indiana. Our current focus is to recruit individuals who are cognitively unimpaired relative to age-normed benchmarks (CN) but at elevated risk for developing AD. We characterize this population and compare with a non-Amish European-ancestry population living in the US for age, APOE genotype, and both a genetic risk score (GRS) using genome-wide significant variants and a polygenic risk score (PRS) spanning the entire genome.
METHODS
Subjects
Individuals included in this study have been recruited over the past 20 years for multiple studies of AD or dementia,29,32–34 age-related macular degeneration,35–37 and successful aging.38–40 For all studies, the primary criteria for enrollment included being age 50 or older, being part of the Amish community, and being of Amish descent. All individuals were screened for cognitive status. In addition, for the current study, individuals were enrolled if they were known not to be cognitively impaired (CI) and were age 76 and older. We prioritized enrollment of individuals with at least one family member with probable or confirmed AD. Participants were recruited from Amish families living in Holmes County, Ohio and Elkhart, LaGrange, and Adams Counties, Indiana.
Cognitive Screening
Depending on the specific study, at time of enrollment, individuals were cognitively screened using a combination of the 3MS education-adjusted examination (all individuals),41 the AD8 checklist,42 the CERAD word list learning test,43 and the Trail Making test (for the Alzheimer disease and successful aging studies).44 Individuals were classified as CN or CI based on established cutoffs.41,44 Individuals initially classified as CI were evaluated by a clinical adjudication board, composed of neurologists and neuropsychologists, to further classify them as having mild cognitive impairment (MCI), AD, cognitive impairment, not dementia (CIND), or an unclear status.
Genotyping
At time of enrollment, 30 milliliters of blood were collected from all participants for use in direct DNA extraction and storage of plasma. Genotype data were collected using an Illumina Expanded Multi-Ethnic Genotyping Array45 with custom content (MEGAex+3k) or an Illumina Global Screening Array46 (GSA). The MEGAex chip includes over 2 million markers whereas the GSA chip includes a base quantity of 660,000 markers. When performing chip genotyping, we also added customized content of up to 6,000 variants to the MEGAex chip, including over 1,100 novel variants that have already been identified from our previous Amish whole exome sequencing (WES) and whole genome sequencing (WGS) studies and other associated variants from GWAS and the National Institute on Aging’s Alzheimer’s Disease Sequencing Project47,48 (ADSP) studies that are not already on the chip. After genotype data were obtained, imputation was performed based on a Haplotype Reference Consortium (HRC) panel.49,50 We investigated genetic relationships of individuals within the overall study population by calculating kinship coefficients using KING 2.26.51 Further, we compared the average genetic relationship across subpopulations based on recruitment site and cognitive status.
Quality Control
Quality control (QC) was performed on MEGAex+3k and GSA genotyping chip sets independently, with each containing samples from both Indiana and Ohio. A total of 774 individuals in the Illumina MEGAex+3k array met the QC threshold of 3% for genotype missingness. There were 1,973,806 SNPs in this initial set of autosomal and X chromosome SNPs. All SNPs genotyped in < 5% of the individuals (n=52,393) were dropped. Additionally, monomorphic (n=1,235,890) and duplicate (n=1,471) SNPs were excluded. Common SNPs (MAF >=1%) were evaluated for deviation from Hardy-Weinberg Equilibrium (HWE) and dropped if the p-value was < 1 × 10−6 (n=5,518). Mendelian error checking was performed on the related individuals within the set and any identified genotypic errors were zeroed out for all members of the affected family. Missingness and HWE were repeated after Mendelian error checking. The final, cleaned MEGAex+3k data set consisted of 774 individuals and 655,441 SNPs (chromosomes 1-22, X).
1,322 individuals had < 3% missing genotypes for the Illumina GSA array. Of the 703,560 genotyped SNPs, 1,470 were genotyped in < 10% of the individuals and were dropped. Additionally, monomorphic (n = 144,333) and duplicate (n = 4,656) SNPs were excluded. Common SNPs (MAF >= 1%) were evaluated for deviation from HWE and dropped for p-values < 1 × 10−6 (n = 847). Mendelian error checking was performed on the related individuals within the set and any identified genotypic errors were zeroed out for all members of the affected family. Checks for missingness and HWE were repeated after Mendelian error checking. The final, cleaned GSA data set consisted of 1,322 individuals and 545,470 SNPs (chromosomes 1-22, X).
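The per-SNP filters described above (call rate, monomorphic markers, and HWE deviation among common SNPs) can be sketched as follows. This is an illustrative sketch only, not the pipeline used in the study: the function name `snp_qc`, the genotype coding (0/1/2 alternate-allele counts, `None` for a missing call), and the use of a 1-degree-of-freedom chi-square test for HWE are all assumptions. The call-rate cutoff is left as a required parameter rather than fixed to a value.

```python
import math

def snp_qc(genotypes, min_call_rate, maf_common=0.01, hwe_p=1e-6):
    """Return (keep, reason) for one SNP.

    genotypes: list of 0/1/2 alternate-allele counts, None for a missing call.
    The chi-square HWE test is an assumption made for this sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False, "low call rate"

    n = len(called)
    n_ref, n_het, n_alt = called.count(0), called.count(1), called.count(2)
    p_alt = (2 * n_alt + n_het) / (2 * n)
    maf = min(p_alt, 1 - p_alt)
    if maf == 0:
        return False, "monomorphic"

    if maf >= maf_common:
        # Compare observed genotype counts with Hardy-Weinberg expectations.
        q = 1 - p_alt
        expected = [n * q * q, 2 * n * p_alt * q, n * p_alt * p_alt]
        observed = [n_ref, n_het, n_alt]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Survival function of a chi-square variable with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(chi2 / 2))
        if p_value < hwe_p:
            return False, "HWE"

    return True, "pass"
```

For example, a SNP with 250, 500, and 250 individuals in the three genotype classes matches HWE proportions exactly and passes, whereas one with 500 of each homozygote and no heterozygotes is dropped for HWE deviation.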
Imputation was run on the Michigan Server using the HRC reference set. The MEGAex+3k and GSA data sets were imputed separately, and each was submitted using the GRCh38 build for autosomes and hg19 for the X chromosome. The reference population for HRC was European and the phasing was done using the Eagle option. Each data set underwent quality control separately after imputation. For rare SNPs (MAF < 0.01) an INFO score minimum of 0.8 was required. Common SNPs were considered to have passed QC with an INFO score of 0.4 and above. The MEGAex+3k set had 1,059,138 rare (MAF < 0.01) SNPs that passed QC and 7,722,065 common SNPs. The GSA dataset had a total of 1,423,947 rare SNPs that passed QC and 7,777,352 common SNPs that met the threshold. The two separately imputed sets were then merged after QC into one set of 2,096 samples using overlapping SNPs contained in both. The final imputed dataset contained 8,311,803 SNPs. Of these, 759,280 were rare and the remaining 7,552,523 were common.
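The frequency-dependent INFO thresholds and the merge on overlapping SNPs can be illustrated with a short sketch. The function names and the dictionary layout (SNP identifier mapped to a `(maf, info)` pair) are hypothetical and chosen only for illustration; the thresholds are those stated in the text.

```python
def passes_imputation_qc(maf, info, rare_maf=0.01, rare_info=0.8, common_info=0.4):
    """Rare SNPs (MAF < rare_maf) need a higher INFO score than common SNPs."""
    threshold = rare_info if maf < rare_maf else common_info
    return info >= threshold

def overlapping_snps(set_a, set_b):
    """Return the SNP IDs that pass QC in both imputed sets.

    set_a, set_b: dict mapping SNP ID -> (maf, info). Hypothetical layout.
    """
    keep_a = {s for s, (maf, info) in set_a.items() if passes_imputation_qc(maf, info)}
    keep_b = {s for s, (maf, info) in set_b.items() if passes_imputation_qc(maf, info)}
    return sorted(keep_a & keep_b)
```

A rare SNP with INFO 0.6 would be dropped under these thresholds, while a common SNP with the same INFO score would be retained.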
We compared the Amish population to an existing source of non-Amish, European-ancestry individuals living within the US. The ascertainment for this population has been described elsewhere.52,53 This population included individuals ascertained from the University of Miami at the John P. Hussman Institute for Human Genomics, the Vanderbilt University Center for Human Genetics Research, and Duke University. After standard quality control, a total of 2,470 adults were included with an approximate 1:1 case-control ratio. Case status was determined by autopsy when possible. Otherwise, diagnoses were evaluated by two independent neurologists. Other phenotype information includes sex, age of exam, and age of onset in cases.
Comparisons in Genetic Risk of AD
The Amish population and comparison group were initially compared for distributions of sex, age, and cognitive status. Comparisons by genetic risk factors were performed in subsets of the overall population after exclusion of individuals under age 75 years old to account for the late age of onset of AD30 in addition to differences in age distribution between the Amish data and the non-Amish comparison data (Supplemental Figure 1).
The GRS was generated using 31 genome-wide significant variants, excluding APOE variants, as reported in the recent Jansen et al. (2019)17 genetic meta-analysis. The GRS was constructed using PRSice-254 and goodness of fit was assessed in R version 3.5.1.55 Dosage information was considered for imputed SNPs. For ease of interpretation, the mean and standard deviation of the GRS were scaled to zero and one, respectively.
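As a rough illustration of the idea behind the score, a GRS can be viewed as a weighted sum of effect-allele dosages followed by standardization. The sketch below is not PRSice-2's implementation (which, for instance, has its own handling of missing genotypes and score averaging); the function names, the data layout, and the SNP identifiers and weights in the example are hypothetical.

```python
import statistics

def risk_scores(dosages, weights):
    """Weighted sum of effect-allele dosages for each person.

    dosages: list of dicts, one per person, mapping SNP ID -> dosage in [0, 2].
    weights: dict mapping SNP ID -> per-allele effect size (e.g. log odds ratio).
    """
    return [sum(weights[snp] * person[snp] for snp in weights) for person in dosages]

def standardize(scores):
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical weights and dosages, for illustration only.
weights = {"rsA": 0.2, "rsB": -0.1}
people = [{"rsA": 2.0, "rsB": 0.0}, {"rsA": 1.0, "rsB": 1.0}, {"rsA": 0.0, "rsB": 2.0}]
grs = standardize(risk_scores(people, weights))
```

Because imputed dosages are continuous values between 0 and 2, the same sum applies to genotyped and imputed SNPs alike.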
The PRS was generated using a pruning and thresholding approach in PRSice-254 and the best-fit PRS model, in terms of correlation coefficient R2, across the combined Amish and non-Amish dataset was used. All SNPs from the Jansen et al. (2019)17 meta-analysis were included for PRS construction except for those within 500 kilobases of either APOE SNP (rs429358 and rs7412). The parameters for clumping in the construction of PRSs included a 500 kilobase window centered on each index SNP and an r2 threshold of 0.1. Dosage information was considered for imputed SNPs. A best-fit PRS was chosen in combined data after applying across different potential p-value thresholds of included index SNPs. For ease of interpretation, the mean and standard deviation of PRS were scaled to zero and one, respectively.
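The clumping and thresholding step can be sketched as a greedy procedure: SNPs passing the p-value threshold are visited from most to least significant, each unassigned SNP becomes an index SNP, and nearby SNPs in linkage disequilibrium with it are absorbed into its clump. This is a simplified sketch of the general technique, not PRSice-2's implementation; the function name `clump`, the record layout, the toy SNPs, and the `r2` lookup are hypothetical. A 500 kilobase window centered on the index SNP is interpreted here as 250 kilobases on either side.

```python
def clump(snps, r2, p_threshold, window_bp=500_000, r2_threshold=0.1):
    """Return index SNP IDs after greedy clumping.

    snps: list of dicts with keys "id", "chrom", "pos", "p".
    r2:   function (id_a, id_b) -> pairwise r-squared between two SNPs.
    """
    candidates = sorted((s for s in snps if s["p"] <= p_threshold), key=lambda s: s["p"])
    assigned = set()
    index_snps = []
    for s in candidates:
        if s["id"] in assigned:
            continue  # already absorbed into a more significant SNP's clump
        index_snps.append(s["id"])
        assigned.add(s["id"])
        for other in candidates:
            if other["id"] in assigned:
                continue
            near = (other["chrom"] == s["chrom"]
                    and abs(other["pos"] - s["pos"]) <= window_bp // 2)
            if near and r2(s["id"], other["id"]) >= r2_threshold:
                assigned.add(other["id"])
    return index_snps

# Toy data, for illustration only.
toy_snps = [
    {"id": "rsA", "chrom": 1, "pos": 1_000, "p": 1e-8},
    {"id": "rsB", "chrom": 1, "pos": 50_000, "p": 1e-5},   # in LD with rsA
    {"id": "rsC", "chrom": 1, "pos": 900_000, "p": 1e-4},  # outside rsA's window
    {"id": "rsD", "chrom": 2, "pos": 1_000, "p": 0.2},
]
toy_ld = {frozenset(("rsA", "rsB")): 0.5}

def toy_r2(a, b):
    return toy_ld.get(frozenset((a, b)), 0.0)
```

Repeating the call over a grid of `p_threshold` values and scoring each resulting SNP set is what allows a best-fit threshold to be selected.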
Distributions of the GRS and PRS were compared across the populations and by Alzheimer disease or other dementia case status. GRS and PRS models were compared with an APOE-only model, covariate-only (sex and age) model, and a combined APOE and covariate model. Additional models were constructed including GRS and PRS to investigate overall predictive ability of the risk scores with and without the presence of the other variables. The predictive value of the constructed models was assessed by area under the receiver operating characteristic (ROC) curve (AUC).
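For reference, the AUC has a simple rank-based interpretation: it is the probability that a randomly chosen affected individual receives a higher predicted value than a randomly chosen unaffected individual, with ties counted as one half. A minimal sketch of that definition follows; the function name and input layout are assumptions, and this pairwise form is meant for clarity rather than efficiency.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted values (e.g. fitted probabilities from a model).
    labels: 1 for affected, 0 for unaffected.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to a model that ranks affected and unaffected individuals no better than chance, and 1.0 to perfect separation.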
RESULTS
After quality control and assurance, the genotype information of 2,096 Amish individuals was available for analysis. Of these, 1,965 had a cognitive exam performed. The final population included 1,146 females and 819 males (Table 1). Of these, 1,367 were classified after consensus expert review as CN, 385 were CI, 18 had mild cognitive impairment (MCI), and 326 were unclear or missing (Table 1). Among the 385 with CI, 152 individuals (7.3% of the total sample) were considered to have probable or confirmed AD or other type of dementia. The mean and median age of the Amish population sample were 75.17 and 79, respectively, with a range of 21 to 110 years old. This includes 1,198 individuals of age 75 years old or older. After exclusion of individuals under age 75 years, APOE genotypes for the Amish and comparison groups demonstrate a lower prevalence of e4 alleles and higher prevalence of e2 alleles in affected Amish individuals than non-Amish cases (Table 2). The unaffected Amish have a similar distribution of APOE genotype to that of the non-Amish controls, except for a lower prevalence of the e2|e3 genotype.
Relatedness
Average kinship coefficient across all individuals in the Amish study population was calculated to be 0.003703 which is equivalent to between third and fourth cousins. Average kinship coefficient across subpopulations by primary study site and CI status were similar (Supplemental Table 1).
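The cousin-equivalent interpretation follows from the expected kinship coefficient of full n-th cousins in an otherwise outbred pedigree, which halves twice per additional generation: 1/16 for first cousins, 1/64 for second, 1/256 for third, and 1/1024 for fourth. The short check below uses a hypothetical helper name; note that in an endogamous population the average reflects many overlapping pedigree paths rather than a single relationship.

```python
def cousin_kinship(degree):
    """Expected kinship coefficient for full n-th cousins with no other shared ancestry."""
    return 0.5 ** (2 * degree + 2)

observed = 0.003703            # average kinship reported in the text
third = cousin_kinship(3)      # 1/256
fourth = cousin_kinship(4)     # 1/1024
between_third_and_fourth = fourth < observed < third
```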
Genetic Risk Score
After GRS construction, we observe, in general, less variance among Amish GRS, regardless of affection status, than in the non-Amish comparison group (Figure 1). Though the mean and median GRS are greater for the affected Amish than in the unaffected Amish, this difference is not statistically significant at α = 0.05. Further, the Amish population has no individuals among the 13 highest values of the GRS in the combined analysis, regardless of affection or case status.
Polygenic Risk Score
We observed that the values of PRS in the affected Amish individuals are lower than in the non-Amish cases. The values of PRS in the non-Amish controls are generally lower than those of the unaffected Amish. Overall, the difference in PRS values between the Amish affected and unaffected is much smaller than between the non-Amish cases and controls. The 29 highest PRS values all belonged to non-Amish individuals with the 13 highest of these belonging to non-Amish cases. PRS was unable to distinguish between affection status in the Amish (p = 0.7) but was able to distinguish between case status in the non-Amish population (p < 0.0001). PRS was also able to distinguish (p < 0.0001) between affected Amish and non-Amish controls in addition to non-Amish cases and unaffected Amish (p < 0.0001).
We evaluated the association of the GRS, PRS, sex, age, and APOE genotype with the primary outcome of AD or other dementia by building a series of logistic regression models, after stratification by source population. Age was associated with the primary outcome across all models at α = 0.05.
We found that none of the APOE genotype categories are associated with affected vs. unaffected status in the Amish whereas each of the APOE genotype categories including at least one e4 allele were associated with case status in the non-Amish population at α = 0.1 (e2|e4 p-value = 0.062; e3|e4 p-value = 0.004; e4|e4 p-value = 0.0003).
In the non-Amish population, the GRS and PRS were associated (p < 0.05) with the primary outcome across all tested models that included them. However, GRS and PRS were not significantly associated with the primary outcome in the Amish population, despite having an odds ratio (OR) > 1 across all models including GRS and PRS.
We also evaluated goodness of fit through AUC across each of these models (Table 3). We determined that the AUC of the sex and age only (covariate) model is larger in the Amish (0.693) than the non-Amish population (0.601). By contrast, we determined that the AUC for an APOE genotype only model is larger in the non-Amish population (0.712) than in the Amish population (0.594). The GRS models performed similarly in the Amish and non-Amish populations. A higher AUC was observed for the PRS models in the non-Amish population than in the Amish population.
DISCUSSION
This study characterized and evaluated the genetic risk for AD in an Amish population and compared it to a non-Amish population of predominantly European ancestry. The results indicate not only that there is less variation in APOE genotype within the Amish, but also that APOE genotype may not play as large a role in development of AD or other dementia as within a typical European ancestry population. Our results support the notion that APOE has a smaller effect on AD risk in the Amish population than in a non-Amish population,30 possibly due to the lower prevalence of APOE e4 in the Amish population.
Non-APOE GRS and PRS have only moderate predictive value on their own but in addition to covariates, they do provide a meaningful increase in predictability in a logistic regression model for case/affected status. We determined that, based on a GRS of genome-wide significant SNPs from a recent meta-analysis of GWASs,17 there exists more variation among genetic risk in a non-Amish population than in an Amish population. When extending to a PRS analysis, this phenomenon is much more prominent. The PRS model also added additional distinguishing ability in AD or other dementia status in the non-Amish population. We also determined that a non-APOE GRS and PRS do not seem to differ greatly between affected Amish and non-Amish cases, suggesting that risk scores created using effect size weights derived from non-Amish European samples may not accurately predict risk in the Amish. This is somewhat similar to previous findings29 of GRSs that included APOE but highlights that APOE still plays an important role in AD prediction in the Amish.
In predicting the primary outcome of AD or other dementia, our results suggest that age is the most crucial risk factor in the Amish population whereas APOE and PRS bear greater importance in the non-Amish population. We observe much worse predictive ability in the Amish population than in the non-Amish population when using a PRS that includes SNPs that do not meet genome-wide significance criteria, suggesting that the underlying genetic architecture for AD risk in the Amish is dissimilar to that of a general European ancestry population, especially among SNPs that do not meet criteria for genome-wide significance in the non-Amish population.
The lower prediction ability in the Amish for GRS and PRS comprising known AD risk factors suggests that the risk profile in the Amish is significantly different – either through variation in the effect size for these known alleles, the existence of unidentified AD risk factors, or, likely, both. When combining this with information that the Amish have lower prevalence of cognitive impairment and dementia,29,30,56 it becomes clear that between their genetic risk factors and lifestyle, the Amish are somewhat protected from these outcomes in a way that using risk estimates from a general European ancestry population cannot explain. This warrants further investigation as the Amish are a sub-population of European immigrants that have practiced endogamy since arriving in the United States. Our results add to mounting evidence that there is genetic risk in the Amish that is not captured by genetic risk scores derived from non-Amish populations.
We conclude that there are evident differences in the genetic architecture for AD risk in the Amish compared to a non-Amish European ancestry population, especially in terms of APOE distribution, PRS distribution, and their conferred risk. Future genomic studies including the Amish should consider using effect estimates from an Amish analysis to determine whether predictive ability differs substantially from that seen after PRS construction using effect estimates from a non-Amish population. Identification of why the Amish appear to be relatively protected from AD and cognitive impairment, in general, warrants further study to identify risk factors enriched in the Amish that may enhance previously identified pathways important in the development of AD and identify additional pathways or mechanisms that contribute to or protect against cognitive decline. Extending this cohort through new recruitment and longitudinal follow-up will increase its power to identify both novel risk and protective genetic loci and potential predictors of progression from normal cognition to AD. This will allow for better detection of rare effects and better understanding of the differences in the genetic risk of AD between the Amish and non-Amish populations.
Data Availability
Data will be made available through NIAGADS (https://www.niagads.org/) upon publication.
ACKNOWLEDGEMENTS
We thank the Amish families for their willing participation in our study. We used the Anabaptist Genealogy Database and Swiss Anabaptist Genealogy Association. This study is supported by National Institutes of Health / National Institute on Aging, grant AG058066 (to Jonathan L. Haines, Margaret A. Pericak-Vance, and William K. Scott). Finally, we acknowledge the resources provided by the Department of Population and Quantitative Health Sciences, School of Medicine at Case Western Reserve University and the John P. Hussman Institute for Human Genomics at University of Miami, Miller School of Medicine.