Abstract
We built a genetic risk score (GRS) from the most complete landscape of Alzheimer's disease (AD) genetics. We extended its analysis to 16 European countries and observed a consistent association of this GRS with AD risk, age at onset and cerebrospinal fluid (CSF) AD biomarker levels regardless of the Apolipoprotein E (APOE) genotype. This GRS was also associated with AD risk (independently of APOE), with a decreasing order of magnitude in those with a European-American, North-African, East-Asian, Latin-American and African-American background, respectively. No association of the GRS with AD was seen in sub-Saharan African and Indian populations. This GRS captures information specific to AD, as its association decreases as the diagnosis broadens. In conclusion, a simple GRS captures shared genetic information specific to AD between multi-ancestry populations. However, more population diversity is needed to better understand the AD genetic complexity across populations.
Over the past 15 years, genome-wide association studies (GWAS) have successfully identified the genetic component of multifactorial diseases. This has led to the development of several powerful approaches that help decipher pathophysiological processes and propose potential diagnostic/prognostic tools. These tools include GRS, a cumulative effect score summarizing the information distributed across individual genome-wide significant susceptibility variants, and polygenic risk scores (PRS), an extension of GRS to loci of small effect without genome-wide significant associations, intended to capture additional predictive capacity1.
Due to the high estimated genetic component of AD, i.e., a heritability of 60-80% in twin studies2, such GRSs/PRSs have logically been of particular interest in this pathology. Many GRSs/PRSs have therefore been developed to assess their associations with AD and related disease (ADRD) risk or with ADRD-related endophenotypes such as age at onset3, brain imaging4-6 or CSF biomarkers7,8. In almost all of these studies, the GRSs/PRSs were significantly associated with the quantitative or binary traits of interest. However, comparisons between publications, even for the same trait such as AD risk, appear to be difficult due to obvious differences in the populations analyzed, heterogeneity in summary statistics and in the variants included, or different statistical methods used to calculate the PRSs9. In addition, most of the GRS/PRS approaches have been developed using European-ancestry populations, and only a few studies have investigated the relevance of such tools in different ancestries10,11.
Recently, a GRS built from the most complete and reliable landscape of AD genetics, including 75 AD risk loci (and 83 independent signals excluding APOE) generated by the European Alzheimer & Dementia Biobank (EADB) consortium, was tested in a limited number of European ancestry cohorts12. This GRS was found to be associated with the risk of developing future AD/dementia in population-based studies, or of progressing from mild cognitive impairment to AD/dementia. In this context, we decided to validate the association of this GRS with AD risk, CSF biomarker levels and age at onset in populations from 16 European countries. We also decided to extend the study of this GRS to include different ancestries from Asia, Africa and Latin America.
The different populations analyzed in this work are described in Supplementary Table 1. Two main GRSs were calculated (see Supplementary Table 1 for the different adjustments used per population and Supplementary Table 2 for the list of SNPs used): (i) a GRS including the genome-wide significant and independent sentinel SNPs at the loci reported by Bellenguez et al. (referred to as GRSALZ) and (ii) a GRS including these sentinel SNPs and adjusted for the number of APOE-ε2 and APOE-ε4 alleles (GRSALZadjAPOE).
We first analyzed the association of these GRSs with the risk of developing AD in case-control studies from 16 countries representative of the European genetic diversity of the EADB. All the GRS distributions in cases and controls are shown in Supplementary Figures 1-3. All countries were analyzed independently, and the odds ratios (ORs) indicate the effect of the GRS as the increase in the risk of AD associated with each additional allele of average risk in the GRS. GRSALZ appeared to be significantly associated with AD in a homogeneous manner across Europe. The association of GRSALZ with AD risk was little affected when the impact of the ε2/ε3/ε4 APOE alleles was considered (GRSALZadjAPOE; Fig. 1). Of note, GRSALZadjAPOE was also associated with a younger age at onset (Supplementary Fig. 4).
We then generated a mega-analysis based on the EADB datasets of the 16 European countries in order to assess, as precisely as possible, the risk of developing AD according to GRS strata, i.e. 0-2%, 2-5%, 5-10%, 10-20%, 20-40%, 60-80%, 80-90%, 90-95%, 95-98% and finally 98-100%. The 40-60% GRS stratum was used as the reference. In this analysis, we also generated a GRS including the sentinel SNPs of the GWAS loci and the two SNPs defining the ε2/ε3/ε4 APOE alleles (GRSALZinclAPOE). Results are shown in Fig. 2A and, as expected, the risk of developing AD in the most extreme strata is particularly high when APOE is included in the GRS. However, the association of GRSALZ was also significant in all the strata analyzed and was not affected by adjustment for the number of APOE-ε2 and APOE-ε4 alleles (GRSALZadjAPOE). In the two extreme strata, 0-2% and 98-100%, both GRSALZ and GRSALZadjAPOE were associated, respectively, with a decrease of more than 2-fold and an increase of more than 3-fold in AD risk when compared with the 40-60% stratum (see Fig. 2A and Supplementary Table 3).
Since all the analyses suggested an association of GRSALZ independent of APOE status, we took advantage of our large mega-analysis to fully address how GRSALZ interacts with the APOE genotypes. For this purpose, we stratified the mega-analysis into 4 groups according to APOE genotype: ε2ε2/ε2ε3, ε3ε3, ε2ε4/ε3ε4 and ε4ε4 carriers. We then assessed, in each of these subpopulations, the association of GRSALZ with AD risk per quintile stratum (0-20%, 20-40%, 60-80% and 80-100%), with the 40-60% stratum as the reference. Remarkably, GRSALZ appeared to be homogeneously associated with AD risk in all the strata, indicating that the APOE and GRSALZ risks are independent (Fig. 2B and Supplementary Table 4).
To determine whether these GRSs are associated with pathophysiological processes known to be involved in the AD pathology, we used a large GWAS on CSF Aβ42, Tau and p-Tau that we generated through EADB, including 5,944 individuals13. GRSALZ was associated with a decrease in Aβ42 levels and with an increase in Tau and p-Tau levels, which fits with the increased AD risk associated with this GRS (Fig. 3). GRSALZadjAPOE was similarly significantly associated with these endophenotype levels (Fig. 3). It should be noted that although the association of these GRSs with the level of Aβ42 in CSF appears to be highly heterogeneous between populations, their associations with the levels of Tau and p-Tau are, on the contrary, highly homogeneous in the same populations. This could be interpreted as Tau and p-Tau concentrations directly reflecting pathophysiological processes (probably neuronal) that depend on the genetic factors of Alzheimer’s disease. CSF Aβ42 concentrations, on the other hand, would depend on multiple intermediate processes, leading to the heterogeneity observed between the different populations.
Following the assessment of these GRS associations with AD risk and endophenotypes in European populations, we sought to extend the analyses to other European-ancestry populations (USA) and also to populations from India, East-Asia (China, Japan and Korea), North-Africa (Tunisia), sub-Saharan Africa (Central African Republic/the Congo Republic) and South-America (Argentina, Brazil, Chile and Colombia), as well as African-, Native- and Latino-American admixture populations from US studies (more than 75% African- or Native-American ancestry, or self-reporting for Latino-American populations) (Fig. 4). With the exception of Korea, where 71 SNPs were available, most GRSs were constructed from 79-85 SNPs including APOE variants (Supplementary Table 2). All the GRS distributions in cases and controls are shown in Supplementary Figures 5-7. Of note, the association of APOE with AD risk was heterogeneous between these different populations, as described in several prior studies (Figure 4 and Supplementary Table 5)14,15.
As expected, the AD risk associated with GRSALZ or GRSALZadjAPOE was highly similar between European and other European-ancestry populations (USA). Both GRSs were significantly associated with AD risk, in decreasing order of magnitude, in Maghreb, East-Asian, Latino-American and African-American admixture populations (Fig. 5 and Supplementary Table 6). However, these GRSs showed no association with AD risk in the sub-Saharan African and Indian populations. This may be related to the small sample sizes, which left these analyses underpowered. However, in the African-American admixture population, the magnitude of association decreased as the percentage of African ancestry increased, finally reaching a level similar to that observed in our sub-Saharan African population: the association of GRSALZadjAPOE with AD risk in populations with more than 90% African ancestry is OR=1.02 (95% CI 1.00-1.05, P=9x10-2). This observation supports the expectation that GRSs constructed on genetic data generated from European-ancestry populations perform poorly in African-ancestry populations. Similarly, in the ADSP Native-American admixture population, the magnitude of association decreased as the percentage of Native-American ancestry increased (OR=1.05, 95% CI 1.03-1.07, P=4.1x10-6; OR=1.04, 95% CI 1.02-1.06, P=9.0x10-4; and OR=1.03, 95% CI 1.01-1.06, P=5.0x10-3 in the populations with more than 50%, 75% and 90% Native-American ancestry, respectively). Notably, a similar finding was seen in Chilean and Argentinian populations, where the GRSALZ association diminishes as Native-American ancestry rises16. Of note, GRSALZadjAPOE was also associated with a younger age at onset in most of the populations studied (with at least 100 AD cases), with the notable exception of the Chinese and Korean populations (Supplementary Fig. 8). Of note, the APOE ε2/ε3/ε4 alleles strongly influence age at onset in these two populations (Supplementary Fig. 9).
To refine our analysis in multi-ancestry populations, we calculated the association of GRSALZadjAPOE and GRSALZinclAPOE with AD risk per quintile strata (0-20%, 20-40%, 60-80% and 80-100%) with the 40-60% stratum as a reference in all our populations. Of note, Salsa, Indian, North-Africa and sub-Saharan Africa populations were excluded because of their small sample size. Results were then meta-analyzed in order to increase statistical power. Consistent with the results shown in Figure 5, the GRSALZadjAPOE association is very similar between the European-ancestry and East-Asian populations, but less marked with the Latino-American populations and the African-American admixture populations (Fig. 6 and supplementary Table 7).
Finally, we took advantage of the Million Veteran Program (MVP) to assess how the association of GRSALZadjAPOE with the risk of AD behaved in multi-ancestry populations according to the diagnosis of dementia, that is, to see how a GRS derived from case/control studies using a specific diagnosis of AD performed when the diagnosis was broadened to dementia. GRSALZadjAPOE showed a decreasing association with AD risk as the diagnosis became less specific, irrespective of the multi-ancestry populations studied (Table 1).
By studying a large multi-ancestry diversity, our work highlights the importance of cross-ancestry studies in unravelling the genetic complexities of this devastating disease and contributes significantly to the understanding of AD risk assessment in diverse populations. The vast majority of GWAS data available worldwide have been generated in European-ancestry populations, and the study of the genetics of AD is no exception, especially as the diagnosis of the disease is most often based on specialized clinical structures. This makes studying this disease more difficult in low- and middle-income countries. This is beginning to change as new case-control studies are developed worldwide. However, it is difficult to compare their results with those obtained in European-ancestry GWASs, since GWASs in these non-European populations are currently of a size that is at best comparable to that of European studies in the early 2010s17,18. These studies are therefore underpowered to detect, at a genome-wide significant level, most of the loci found in European populations. In addition, the small size of some of these populations means that there is a risk of generating false positives, as in silico replication is difficult given the sparsity of samples from that population.
To address some of these limitations, we have chosen to develop a simple approach to better compare AD genetics across many populations worldwide. We used a GRS combining sentinel SNPs associated with the risk of developing AD in populations of European origin, as described previously12. We tested its association with the risk of developing AD in the largest panel of diverse populations studied to date. We were able to observe a systematic association of this GRS with AD risk in all the populations under study, with the exception of the Indian and sub-Saharan African populations, probably partly due to a lack of statistical power. Our results thus suggest that all multi-ancestry populations are likely to be affected by shared pathophysiological processes driven in part by these genetic risk factors.
However, this study also reveals important differences between populations, especially in African-ancestry populations: ORs are lowest for populations from African countries or of African-American admixture, and the ORs in the African-American population, when the percentage of African ancestry reaches 90%, are similar to those measured in populations from sub-Saharan African countries. Our work thus suggests that the AD genetic architecture in African-ancestry populations strongly differs from that in European-ancestry populations. Higher genetic diversity is observed in African-ancestry populations compared to non-African-ancestry populations19-21, and this likely implicates different causal variants in already known loci (as previously described for ABCA7; ref. 22) and/or different genes involved in the AD pathophysiology (as previously proposed23,24).
Such genetic diversity likely accounts for part of the potential methodological reasons for these variations in the GRSALZ association across populations. First, although we attempted to develop a comparable GRS, some SNPs were not present in all the populations studied or were poorly imputed, with the likely consequence of underestimating the association of GRSs with AD risk in these populations. However, as these points mainly affect low-frequency variants (see Supplementary Table 2), their impact on GRSs is likely to be limited. It should be noted that some of the genetic data were generated from whole-genome sequencing (Alzheimer's Disease Sequencing Project (ADSP) and China), which limits this imputation problem. Second, we used sentinel variants defined in populations of European ancestry. Most of these are unlikely to be the causal variants and therefore do not properly capture the association of a locus with the risk of developing AD in other ancestries, as the linkage disequilibrium structure of the genome changes across ancestries. This has already been described for a number of other diseases and populations. The implementation of a PRS calculated from well-powered GWAS data generated in the target population might solve this problem25,26. However, such an approach will require access to raw GWAS data from different populations to generate accurate multi-ancestry PRSs, as already done, for instance, in kidney diseases27. Finally, it has been estimated that around 10-20% of AD dementia is misdiagnosed, and we have observed that the more inaccurate the clinical diagnosis of AD dementia, the smaller the magnitude of the association of the GRS. This observation suggests that our current GRSs capture information specific to AD, but that the quality of the clinical diagnosis may interfere with measuring the association of this GRS with AD risk in a given population.
This last point will have to be better controlled in the future with the introduction of standardized diagnosis between different populations, for example, based on biological markers (although there is also a need for more diversity in this field to define the relevance of biomarkers in multi-ancestry populations).
Finally, our study also confirms the strong heterogeneity in the APOE ε2/ε3/ε4 associations with AD risk in different populations (Figure 4). Differential regulation of APOE expression related to different ancestral genomic backgrounds around the locus has been proposed to account for the differences in risk between populations of various ancestries28-31. In addition, it has also been postulated that interaction with other genetic risk factors for AD may be an explanation for this heterogeneity. However, our data also clearly indicate that GRSALZ appears to be associated with AD risk regardless of APOE status, whatever the population studied. This observation suggests that we may have two independent genetic entities for sporadic AD, which fits remarkably well with the recent hypothesis of three variants of AD (autosomal dominant AD, APOE ε4-associated sporadic AD and APOE ε4-unassociated sporadic AD) that may lead to a possible revision of the current clinical taxonomy of AD32.
In conclusion, our various observations are a call for more genetic diversity in the study of Alzheimer’s disease. Although a simple GRS based on the largest European-ancestry GWAS can capture relevant genetic information in most studied populations worldwide, there appears to be important genetic heterogeneity between these populations. Characterizing the global genetic architecture of AD in multi-ancestry populations (particularly African-ancestry) may have important implications for understanding the pathophysiology of AD and accelerating the development of strategies to prevent and treat AD.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Author contributions
Project coordination: A.N., J-C.L; data collection coordination: A.N., Y.LG., J.G., M.D.G., E.N.D.M., J-F.D., H.A., V.E-P., A.Rui., K.H.L., T.I., A.Ram., M.L., J-C.L; Data analyses: A.N, B.G-B., R.S., Y.K., M.K., I.D.R., C.D., X.Z., Y.L.G. C.E.A-B., M.A.C.B., M.Gue., S.v.d.L., M.Gos., A.C., C.B., F.K; EADB Sample contribution: I.d.R., A.C., S.V.D.L,., C.B., F.K., O.P., A.Sch., M.D., D.R., N.Sch., D.J., S.R-H., L.H., L.M.P., E.D., O.G., J.Wilt., S.H-H., S.Moe., T.T., N.Sca., J.C., F.M., J.P-T., M.J.B., P.P., R.S-V., V.Á. M.B., P.G-G., R.Pue., P.Mir., L.M.R., G.P-R., J.M.G-A., J.L.R., E.R-R., H.S., T.K., A.d.M., S.Meh., J.Hor., M.V., K.L.R., J.Q.T., Y.A.P., H.H. J.v.S., I.Ram., F.V., A.v.d.L. P.Sch., C.G., G.P., V.G., G.N., C.Duf., F.P., O.H., S.D., A.B., J-F.Del., E.G., J.P., D.G., B.Aro., P.Mec., V.S., L.P., A.Squ., L.T., B.Bor., B.N., P.C., D.S., I.Rai., A.Dan., J.Will., C.Mas., P.A., F.J., P.K., C.V.D., R.F-S., T.M., P.S-J., K.S., M.I., G.R., M.H., R.Sim., W.v.d.F., O.A., A.Rui., A.Ram., J-C.L., MVP Sample contribution: R.She., R.H., V.M., M.P., R.Z., M.Gaz., M.L.; Salsa Sample contribution: M.Gos., C.S.B., B.F., Q.Y. S.S.; ADSP Sample contribution: A.Gri., T.F., C.Cru., J.Hai., L.F., A.Des., E.W., R.M., M.P-V., B.K., A.Goa., G.D.S., B.V., L-S.W., Y.Y.L., C.Dalg., A.Say., S.S.; Africa sample contribution: M.Gue., P-M.P., P.Mbe, B.Ban, J-F.Dar.; East-Asia sample contribution: Y.K., M.K., X.Z., H.C., N.Y.I., A.K.Y.F., F.C.F.I., A.M., N.H., K.O., S.N., J.G., V.E-P., K.H.L., T.I.; South America sample contribution: C.Dalm., C.E.A-B., M.A.C.B., N.O., C.Muc., C.Cue., L.Cam., P.Sol., D.G.P., S.K., L.I.B., J.O-R., A.G.C.M.,M.F.M., R.Par., G.A., L.A.d.M., M.A.R.S., B.d.M.V., M.T.G.C., B.Ang., S.G., M.V.C., R.A., P.O., A.Sla., C.G-B., C.A., P.F., E.N.d.M., L.M., H.A., A.Rui., A.Ram; Core writing group: A.N, B.G-B., J-C.L.
Competing Interests Statement
C.C. has received research support from GSK and Eisai. The funders of the study had no role in the collection, analysis, or interpretation of data; in the writing of the report; or in the decision to submit the paper for publication. C.C. is a member of the advisory board of Vivid Genomics and Circular Genomics and owns stocks. L.M.P. received personal fees from Biogen for consulting activities unrelated to the submitted work. T.G. received consulting fees from AbbVie, Alector, Anavex, Biogen, Cogthera, Eli Lilly, Functional Neuromodulation, Grifols, Iqvia, Janssen, Noselab, Novo Nordisk, NuiCare, Orphanzyme, Roche Diagnostics, Roche Pharma, UCB, and Vivoryon; lecture fees from Biogen, Eisai, Grifols, Medical Tribune, Novo Nordisk, Roche Pharma, Schwabe, and Synlab; and has received grants to his institution from Biogen, Eisai, and Roche Diagnostics.
METHODS Online
GRS computation
GRSALZ includes the 83 independent signals associated with AD. GRSALZinclAPOE has two additional SNPs on top of these 83, rs7412 and rs429358, which encode APOE ε2 and APOE ε4, and GRSAPOE includes only the SNPs rs7412 and rs429358. The formula used to calculate the different GRSs is as follows:
\[
\mathrm{GRS} = \left( \sum_{i=1}^{N} \beta_i \times \mathrm{dosage}_i \right) \times \frac{N}{\sum_{i=1}^{N} \beta_i}
\]
where N is the number of variants in the score, dosage_i is the risk allele dosage or genotype of variant i, and β_i is extracted from stage II meta-analysis summary statistics (including the ADGC, CHARGE and FinnGen data), with the exception of the APOE variants, for which the β_i are extracted from the summary statistics of the latest UK Biobank GWAS (unpublished data). The last multiplication term is the number of variants divided by the sum of the βs. Thanks to this term, one unit of this transformed GRS corresponds to one additional risk allele33.
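As an illustration of the scaling described above, the following is a minimal Python sketch, not the authors' code. It assumes that each β is the log odds ratio of the risk allele (so that all βs are positive) and uses made-up weights for three hypothetical variants. It shows that, once the weighted sum is multiplied by the number of variants divided by the sum of the βs, one additional risk allele of average effect raises the score by one unit.

```python
from typing import Sequence


def genetic_risk_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages, rescaled to risk-allele units.

    dosages: risk-allele dosage (0 to 2) per variant for one individual.
    betas:   per-variant log odds ratios taken from summary statistics.
    """
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    n_variants = len(betas)
    weighted_sum = sum(b * d for b, d in zip(betas, dosages))
    # Rescaling term: number of variants divided by the sum of the betas.
    return weighted_sum * n_variants / sum(betas)


# Hypothetical weights for three variants (illustrative, not study values).
example_betas = [0.10, 0.20, 0.30]
baseline = genetic_risk_score([1, 1, 1], example_betas)
# One extra risk allele at the variant whose beta equals the average beta.
one_more = genetic_risk_score([1, 2, 1], example_betas)
difference = one_more - baseline  # about 1.0
```

With this scaling, an individual carrying exactly one risk allele at each of the N variants has a score of N, which makes the per-unit odds ratios easier to read as per-allele effects.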
Sample and variant Quality controls
To ensure complete independence from the βs extracted from the summary statistics, all samples from the ADGC, CHARGE and FinnGen GWASs were filtered out. Samples were included in the analyses only if they met the standard GWAS sample quality-control criteria34. If a variant dosage or covariate was missing, or if a discordance between imputed and genotyped (if available) APOE status was observed, the sample was discarded. A demographic description of each study after QC is shown in Supplementary Table 1.
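The APOE concordance check can be pictured with the short sketch below. It is an illustration under stated assumptions rather than the pipeline used in the study: ε4 is taken to be tagged by the C allele of rs429358 and ε2 by the T allele of rs7412, the rare haplotype carrying both alleles is ignored, imputed genotypes are assumed to have been converted to hard calls, and all function names are hypothetical.

```python
from typing import Optional, Tuple


def apoe_allele_counts(rs429358_c: int, rs7412_t: int) -> Tuple[int, int]:
    """Return (number of e2 alleles, number of e4 alleles).

    rs429358_c: count of C alleles at rs429358 (assumed to tag e4).
    rs7412_t:   count of T alleles at rs7412 (assumed to tag e2).
    """
    if rs429358_c not in (0, 1, 2) or rs7412_t not in (0, 1, 2):
        raise ValueError("allele counts must be 0, 1 or 2")
    if rs429358_c + rs7412_t > 2:
        raise ValueError("combination implies a haplotype outside e2/e3/e4")
    return rs7412_t, rs429358_c


def apoe_genotype(rs429358_c: int, rs7412_t: int) -> str:
    """Build a genotype label such as 'e3/e4' from the two allele counts."""
    n_e2, n_e4 = apoe_allele_counts(rs429358_c, rs7412_t)
    n_e3 = 2 - n_e2 - n_e4
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


def is_discordant(imputed: Tuple[int, int],
                  genotyped: Optional[Tuple[int, int]]) -> bool:
    """True when a directly genotyped APOE status exists and disagrees."""
    if genotyped is None:  # no genotyped APOE status available
        return False
    return apoe_genotype(*imputed) != apoe_genotype(*genotyped)
```

In this sketch a sample would be flagged for removal when `is_discordant` returns `True`, and the two counts returned by `apoe_allele_counts` correspond to the APOE covariates used for adjustment.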
For genotyping data, genotyped variants had to meet the standard GWAS variant QC criteria34. All the studies were also imputed with the TOPMed reference panel35,36, with the exception of the Korean study, which was imputed with the Haplotype Reference Consortium panel37. Imputed variants with an Rsq below 0.3 were excluded. For WGS data, only variants passing the WGS QC were kept (see Supplementary Information for the ADSP and China samples, and Supplementary Table 2). The global ancestry of each individual in ADSP and MVP was determined with SNPweights v.2.138 (see Supplementary Information).
European mega-analysis methods
For the mega-analysis of European countries, we merged samples from 5 datasets: EADB-core, GERAD, EADI, Demgene and Bonn. To adjust for population structure, we computed principal components using the following procedure. From the 146,705 variants used in the principal component analysis of EADB-core39, we extracted the TOPMed-imputed variants with an imputation quality ≥0.9 in each dataset, resulting in 91,353 variants. Then, we set a genotype to missing if none of its genotype probabilities was higher than 0.8. Finally, we merged all datasets and removed variants with a proportion of missing genotypes higher than 0.02, leaving 90,471 variants for the principal component analysis. This analysis was performed with flashPCA2. Analyses were adjusted for the first 14 principal components, the genotyping chip and the center.
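The two genotype-level filters in this procedure (hard-calling at a probability of 0.8 and a 0.02 missingness cut-off) can be sketched as follows. This is a simplified Python illustration with hypothetical function names and toy values; it assumes three genotype probabilities per call and is not the tooling used for the actual analysis.

```python
from typing import Optional, Sequence

PROB_THRESHOLD = 0.8     # a call is kept only if one probability exceeds this
MAX_MISSING_RATE = 0.02  # variants with more missing calls than this are dropped


def hard_call(probs: Sequence[float]) -> Optional[int]:
    """Return the most likely genotype (0, 1 or 2 alternate alleles),
    or None (missing) if no genotype probability is higher than 0.8."""
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] > PROB_THRESHOLD else None


def keep_variant(calls: Sequence[Optional[int]]) -> bool:
    """Keep a variant unless its proportion of missing calls exceeds 0.02."""
    n_missing = sum(1 for c in calls if c is None)
    return n_missing / len(calls) <= MAX_MISSING_RATE


# Toy example: one confident call and one ambiguous call.
confident = hard_call([0.95, 0.04, 0.01])  # genotype 0
ambiguous = hard_call([0.50, 0.45, 0.05])  # None, i.e. set to missing
mostly_complete = keep_variant([0] * 99 + [None])  # 1% missing, kept
too_sparse = keep_variant([0] * 9 + [None])        # 10% missing, removed
```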
Statistical analyses
The association of AD status with the GRSs was tested with different models, named according to the GRS and covariates included in the logistic regression (see supplementary Table 1 for covariates).
Model GRSALZ: AD ∼ GRSALZ + COV
Model GRSALZadjAPOE: AD ∼ GRSALZ + COV + number of APOE ε2 alleles + number of APOE ε4 alleles
Model GRSALZinclAPOE: AD ∼ GRSALZinclAPOE + COV
Model GRSAPOE: AD ∼ GRSAPOE + COV
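To make the meaning of the reported odds ratios concrete, here is a small, dependency-free Python sketch of a logistic regression fitted by Newton-Raphson. It is an illustration only: the study does not state which software fitted these models, the function names are hypothetical, the toy data are invented, and covariates (COV, APOE allele counts) would simply be added as extra columns of each predictor row.

```python
import math
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(rows: Sequence[Sequence[float]], y: Sequence[int],
                 n_iter: int = 25) -> List[float]:
    """Maximum-likelihood logistic regression.

    rows: one predictor vector per sample (intercept is added here).
    Returns [intercept, coefficient_1, ...] on the log-odds scale.
    """
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(x, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))  # predicted probability of AD
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve_linear(hess, grad)  # Newton step: H^-1 * gradient
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Toy data (not study data): 100 people with a GRS of 80, 100 with a GRS of 81.
toy_grs = [[80.0]] * 100 + [[81.0]] * 100
toy_status = [1] * 40 + [0] * 60 + [1] * 60 + [0] * 40  # 1 = case, 0 = control
coefficients = fit_logistic(toy_grs, toy_status)
odds_ratio_per_allele = math.exp(coefficients[1])
```

In this toy example the odds of being a case are 40/60 at a GRS of 80 and 60/40 at a GRS of 81, so the fitted odds ratio per GRS unit equals (60/40)/(40/60) = 2.25.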
Quantile or Percentile analyses
Depending on the value of their GRS, the samples were assigned to the reference group or to one of the test groups. In the mega-analysis, the reference group corresponded to the 40-60% percentile range, and each of the other percentile ranges (0-2%, 2-5%, 5-10%, 10-20%, 20-40%, 60-80%, 80-90%, 90-95%, 95-98%, 98-100%) was tested against it. In the APOE-stratified analysis, the reference group was defined by the 40-60% percentile range and was compared with each of the other quintiles (0-20%, 20-40%, 60-80%, 80-100%). In the multi-ancestry analyses, the reference group was likewise defined by the 40-60% percentile range and compared with each of the other quintiles (0-20%, 20-40%, 60-80%, 80-100%). These analyses were performed by population and then meta-analyzed per ancestry using the inverse-variance method implemented in METAL40. Of note, the Indian, North-African and sub-Saharan African populations were excluded because of their small sample size.
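A possible way to build the quintile groups and the 0/1 group variable is sketched below in Python. Several points are assumptions made for the illustration and are not stated in the text: the cut points are computed on all samples of a population (cases and controls together), each test stratum is compared with the reference in a separate model, and samples outside the two strata being compared are left out of that comparison. The GRS values are invented.

```python
from typing import List, Optional, Sequence

QUINTILE_LABELS = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]
REFERENCE = "40-60%"


def assign_quintiles(scores: Sequence[float]) -> List[str]:
    """Label each sample with the quintile of its GRS within the population."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [""] * n
    for position, index in enumerate(order):
        strata[index] = QUINTILE_LABELS[position * 5 // n]
    return strata


def group_indicator(strata: Sequence[str], test_stratum: str) -> List[Optional[int]]:
    """1 for the test stratum, 0 for the reference stratum, None otherwise."""
    indicator: List[Optional[int]] = []
    for s in strata:
        if s == test_stratum:
            indicator.append(1)
        elif s == REFERENCE:
            indicator.append(0)
        else:
            indicator.append(None)  # not part of this comparison
    return indicator


# Ten hypothetical GRS values.
toy_scores = [83.1, 79.4, 85.0, 80.2, 81.7, 78.9, 84.3, 82.5, 80.9, 86.2]
toy_strata = assign_quintiles(toy_scores)
top_vs_reference = group_indicator(toy_strata, "80-100%")
```

The resulting indicator would then play the role of the 0/1 group term in the logistic models listed for these analyses, after dropping the samples marked `None`.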
Model GRSALZ: AD ∼ Group0/1(GRSALZ) + COV
Model GRSALZadjAPOE: AD ∼ Group0/1(GRSALZ) + COV + number of APOE ε2 alleles + number of APOE ε4 alleles
Model GRSALZinclAPOE: AD ∼ Group0/1(GRSALZinclAPOE) + COV
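The per-population estimates from models of this kind are pooled with an inverse-variance method. The Python sketch below shows the standard fixed-effect calculation on log odds ratios with invented numbers; it is meant to illustrate the principle and is not a reimplementation of METAL. The function name and the input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooling.

    betas: per-population log odds ratios.
    ses:   their standard errors.
    Returns (pooled log odds ratio, pooled standard error).
    """
    weights = [1.0 / se ** 2 for se in ses]  # precision of each estimate
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Two hypothetical populations: the more precise estimate dominates.
pooled_beta, pooled_se = inverse_variance_meta([0.10, 0.04], [0.02, 0.04])
pooled_or = math.exp(pooled_beta)
ci_low = math.exp(pooled_beta - 1.96 * pooled_se)
ci_high = math.exp(pooled_beta + 1.96 * pooled_se)
```

Here the weights are 2,500 and 625, so the pooled log odds ratio is (2,500 × 0.10 + 625 × 0.04) / 3,125 = 0.088, closer to the estimate with the smaller standard error.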
Code availability
Code is available upon request.
Table 1 Association of GRSALZadjAPOE with AD, AD and related dementia (ADRD) and dementia in MVP.
Acknowledgments
We thank the many study participants, researchers, and staff for collecting and contributing to the data and the high-performance computing service at the University of Lille. This work was funded by a grant (European Alzheimer&Dementia DNA BioBank, EADB) from the EU Joint Programme – Neurodegenerative Disease Research (JPND) and the Fondation Recherche Alzheimer. A.N. was supported by Fondation pour la recherche médicale (EQU202003010147). UMR1167 is also funded by the Inserm, Institut Pasteur de Lille, Lille Métropole Communauté Urbaine, and the French government’s LABEX DISTALZ program (development of innovative strategies for a transdisciplinary approach to Alzheimer’s disease). Full consortium acknowledgements and funding are in the Supplementary Note.