Abstract
For methodological reasons, the X-chromosome has been left out of the major genome-wide association studies of Alzheimer’s disease (AD). To address this gap and better characterize the genetic landscape of AD, we performed an in-depth X-Chromosome-Wide Association Study (XWAS) in 115,841 AD cases or AD proxy cases, including 52,214 clinically-diagnosed AD cases, and 613,671 controls. We considered three approaches to account for the different X-chromosome inactivation (XCI) states in females, i.e., random XCI, skewed XCI, and escape XCI. We did not detect any genome-wide significant signals (P ≤ 5 × 10⁻⁸) but identified four X-chromosome-wide significant loci (P ≤ 1.7 × 10⁻⁶). Two signals are located in the FRMPD4 and DMD genes, while the other two are more than 300 kb away from the closest protein-coding genes, NLGN4X and GRIA3. Overall, this XWAS found no common genetic risk factors for AD on the non-pseudoautosomal region of the X-chromosome, but it identified suggestive signals warranting further investigation.
Introduction
Alzheimer’s disease (AD) is a progressive neurodegenerative disease and the most common cause of dementia among the elderly. AD is caused by a combination of modifiable and non-modifiable risk factors, including genetics. Currently, more than 80 genetic loci are associated with AD risk, highlighting several underlying biological mechanisms for AD, including APP metabolism, Tau-mediated toxicity, lipid metabolism and immune-related processes1–6. A greater understanding of the genetics of AD is essential to improve the characterization of the pathophysiological processes involved in the disease. However, although the genetic landscape of AD has been extensively studied on the autosomes, little is known about the association of X-chromosome variants with AD risk. To date, large-scale genome-wide association studies (GWAS) of AD have not included the X-chromosome, because its features require specific analyses.
While women carry two copies of the X-chromosome, men are hemizygous: they carry one X and one Y chromosome. To balance allelic dosage between the sexes, X-chromosome inactivation (XCI) occurs in females: one X-chromosome is transcriptionally silenced during female development7,8. The choice of the silenced copy is most often random, but inactivation can also be skewed toward a specific copy. Such XCI ‘skewness’ can also be acquired during life and has been described to increase with age in adults9–12. Importantly, up to one-third of X-chromosome genes ‘escape’ inactivation and are expressed from both X-chromosomes in female cells, although they tend to be expressed less from the inactive X-chromosome. Notably, all the genes in the pseudoautosomal region (PAR) 1 of the X-chromosome have Y-chromosome homologues and escape inactivation. Additionally, some genes variably escape inactivation: their expression from the inactive X-chromosome differs between individuals or between cells and tissues within an individual8,13. The inactivation process and the distinction between the PAR and non-PAR regions are thus important considerations when performing an X-chromosome-wide association study (XWAS). For all these reasons, the X-chromosome needs to be treated separately from the autosomes in the quality control (QC), the imputation process and the analysis14,15, and it has usually been excluded from GWAS, including the large-scale AD GWAS. Yet, the X-chromosome represents about 5% of the genome in terms of size and number of genes (UCSC Genome Browser, https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=), and thus the study of AD genetics remains incomplete.
Several X-chromosome genes have been associated with brain imaging phenotypes16,17. Furthermore, the X-chromosome carries more than 15% of the known genes related to intellectual disabilities18, a disproportionate share given its size. While genes related to intellectual disabilities are considered to modulate early neurodevelopmental stages well before neurodegenerative processes start, they might affect the development of cognitive abilities and, potentially, the establishment of cognitive reserve and brain resilience19. Additionally, XCI escape or skewness might contribute to the sex differences reported in AD. Women have a higher risk of developing dementia than men: in the 65-69 and 85-89 age groups, the prevalence is 1.5% and 24.9% respectively for women, compared with 1.1% and 16.3% for men20,21. This difference can be partly explained by the greater longevity of women, but other factors may also be involved, such as a selective survival bias in men, socio-environmental factors, or different AD-related biological mechanisms between sexes22. Male and female differences have also been observed for AD and AD-related phenotypes, such as cognitive performance or the impact of APOE variants on disease risk or on Tau concentration, and such differences may be explained by XCI escape and skewness8,23–25. Consistent with this, in AD mouse models, having two X-chromosomes was associated with reduced mortality and cognitive impairment. This advantage conferred by a second X-chromosome could partly relate to the KDM6A gene, which escapes inactivation. A variant of the human version of this gene was associated with an increase in this gene’s expression in the brain, and with less cognitive decline in aging and preclinical AD26. Finally, in humans, the expression or levels of other X-linked genes or proteins are reportedly associated with cognitive change or tau pathology in a sex-specific manner27,28.
To investigate the impact of X-chromosome genetic variants on AD risk, we conducted an in-depth XWAS on 115,841 AD cases or AD proxy cases and 613,671 controls from the IGAP (International Genomics of Alzheimer’s Project), EADB (European Alzheimer & Dementia Biobank), UK Biobank and FinnGen studies (Supplementary Table S1). We considered three approaches to account for the different inactivation states in females, i.e., random XCI (r-XCI), skewed XCI (s-XCI), and escape XCI (e-XCI)15.
Results
A total of 288,320, 276,902 and 263,169 common variants (minor allele frequency or MAF ≥ 1%) were analyzed in the r-XCI, e-XCI and s-XCI approaches, respectively. We observed a minor deviation from expected p-values in the r-XCI and e-XCI models (median genomic inflation factor λ = 1.074 and 1.087, respectively) and a deflation in the s-XCI model (median λ = 0.735), likely related to a lack of power (Supplementary material, Supplementary Figures S1-S3 and Supplementary Table S2). We did not identify any genome-wide significant signals (P ≤ 5 × 10⁻⁸) among X-chromosome common variants in any of the models (Figures 1, 2 and 3). However, three loci exhibited signals that were X-chromosome-wide significant (P ≤ 1.7 × 10⁻⁶) in the r-XCI approach: Xp22.32, FRMPD4 and Xq25 (Figure 1, Table 1). No X-chromosome-wide significant signal was found in the e-XCI or s-XCI analyses (Figures 2 and 3). As expected, the r-XCI and e-XCI meta-analysis results were correlated (Supplementary Table S3).
Table 1. Summary of association analysis results with an X-chromosome-wide significant signal. P values are two-sided raw P values derived from a fixed-effect meta-analysis. CI, confidence interval; OR, odds ratio; MAF, minor allele frequency. a: reference single-nucleotide polymorphism (SNP) (rs) number, according to dbSNP build 153; b: GRCh38 assembly; c: nearest protein-coding gene according to GENCODE release 45; d: from Tukiainen et al., 2017 (ref. 13); e: weighted average MAF across all discovery studies; f: approximate OR calculated with respect to the minor allele.
Figure 1. Manhattan plot of common variants (MAF ≥ 0.01) for the r-XCI approach in a) the meta-analysis including AD-proxy cases, b) the diagnosed AD cases meta-analysis and c) the meta-analysis excluding biobanks. The red and blue lines represent the genome-wide significant threshold (5 × 10⁻⁸) and the X-chromosome-wide significant threshold (1.7 × 10⁻⁶), respectively. The labels show the closest protein-coding gene (according to GENCODE release 45, https://www.gencodegenes.org/human/releases.html) to the index variant of each X-chromosome-wide significant locus.
Figure 2. Manhattan plot of common variants (MAF ≥ 0.01) for the e-XCI approach in a) the diagnosed AD cases meta-analysis and b) the meta-analysis excluding biobanks. The red and blue lines represent the genome-wide significant threshold (5 × 10⁻⁸) and the X-chromosome-wide significant threshold (1.7 × 10⁻⁶), respectively.
Figure 3. Manhattan plot of common variants (MAF ≥ 0.01) for the s-XCI approach meta-analysis, which excludes biobanks. The red and blue lines represent the genome-wide significant threshold (5 × 10⁻⁸) and the X-chromosome-wide significant threshold (1.7 × 10⁻⁶), respectively.
In more detail, rs4364769 (MAF = 0.12, OR = 1.079 [1.048-1.110], P = 2.55 × 10⁻⁷) was identified as the index variant of the Xp22.32 locus in the r-XCI meta-analysis (Table 1, Supplementary Figure S4). Several sensitivity analyses of this signal were performed, for example by excluding AD-proxy or biobank samples, or by further adjusting the analyses on age or APOE (Online Methods). The odds-ratio estimate of rs4364769 shows some variability across sensitivity analyses, but the confidence intervals overlap (Supplementary Table S4). The index variant of the Xp22.32 signal is located more than 300 kb from the closest protein-coding gene, NLGN4X (Neuroligin 4 X-Linked).
The index variant in the FRMPD4 (FERM and PDZ Domain Containing 4) locus was rs5933929 (MAF = 0.38, OR = 0.952 [0.935-0.970], P = 1.98 × 10⁻⁷) in the r-XCI meta-analysis (Table 1, Supplementary Figure S5). This variant is located in an intron within some transcripts of FRMPD4. The odds ratio of rs5933929 was consistent across sensitivity analyses (Supplementary Table S4).
rs191195705 was the index variant in the Xq25 signal in the r-XCI meta-analysis (MAF = 0.11, OR = 0.925 [0.896-0.954], P = 7.09 × 10⁻⁷, Table 1). Here the males and the UK Biobank (UKB)-proxy males carried a large part of the observed effect, leading to a lower signal in the sensitivity analyses excluding proxy or biobank cases, or in the female-only compared to the male-only meta-analyses (Supplementary Table S4, Supplementary Figure S6). However, the difference of effect between males and females was not significant (P = 0.51, Online Methods, Supplementary Table S4). rs191195705 is over 500 kb from the closest protein-coding gene, GRIA3 (Glutamate Ionotropic Receptor AMPA Type Subunit 3).
To check for signals that we may have missed because of false negatives related to proxy samples or biobanks, we also performed the r-XCI and e-XCI analyses on the whole X-chromosome excluding these samples (note: samples from biobanks, including proxy, were not included in the s-XCI analysis in the first place, Online Methods). We did not identify any genome-wide significant signals among X-chromosome common variants in any of the models, nor any X-chromosome-wide significant signal when considering only AD diagnosed cases (Figures 1 and 2, Supplementary Figures S1-S2). However, one X-chromosome-wide significant locus was identified in the r-XCI meta-analysis excluding biobanks (Table 1). The index variant was rs5972406 (MAF = 0.075, OR = 1.143 [1.083-1.207], P = 1.16 × 10⁻⁶), located in an intron of the DMD dystrophin gene (Table 1, Supplementary Figure S7).
As the XCI mechanism induces variability across females, one might expect stronger effects in males compared to females; we therefore performed an additional sex-stratified analysis, excluding proxy cases (Online Methods), and compared the variant effect sizes in males and females. We did not identify any genome-wide or X-chromosome-wide significant signals in either the male-only or female-only meta-analyses (Supplementary Figure S8). We also did not observe any genome-wide or X-chromosome-wide significant difference of effect between males and females for any X-chromosome variant (Supplementary Figure S8).
Discussion
We conducted the most comprehensive XWAS on AD to date, including 115,841 AD or AD-proxy cases and 613,671 controls and using three complementary models to account for the complexity related to this chromosome. Despite not detecting any genome-wide significant signals regardless of the approach used, we identified four X-chromosome-wide significant loci.
The signal in the FRMPD4 locus was consistent across the sensitivity analyses, showing strong robustness, while the other signals, at the NLGN4X, GRIA3 and DMD loci, showed some variability.
FRMPD4 (FERM and PDZ domain containing 4) is mostly expressed in brain tissues (GTEx Portal, https://gtexportal.org/). Through its interaction with other proteins, the FRMPD4 protein is involved in the regulation of the morphogenesis and density of dendritic spines, and in the maintenance of excitatory synaptic transmission29. FRMPD4 is an X-linked intellectual disability gene30 and is associated with low educational attainment31. The associated variant is in an intron within some transcripts of FRMPD4 but is also close to the MSL3 gene, which interacts with KAT8, a reported genetic risk factor for AD2,32,33. In addition, FRMPD4 is subject to inactivation in females, while MSL3 escapes inactivation13.
The signal at the intronic variant within the DMD dystrophin gene decreased when including proxy or biobank cases; further analyses are necessary to determine whether this is due to a falsely inflated signal in the clinically diagnosed samples, or to a less specific diagnosis in the proxy and biobank samples. DMD is inactivated in females13, and mutations in the gene can cause Duchenne muscular dystrophy. Some patients suffering from this disease can exhibit cognitive impairment, and a shift towards amyloidogenesis in memory-specific brain regions was found in mice carrying a mutation in the DMD gene (mdx mice) compared to wild-type mice34. Additionally, the DMD rs5927116 variant was reportedly associated with the volume of the entorhinal cortex in a small sample (N = 792); however, this signal is 1.4 Mb away from our AD signal and the variants are independent (LD measured by r² < 0.2)35.
Identifying putative causal genes in the two other loci, Xp22.32 and Xq25, is more challenging, as the index variants are located more than 300 kb away from the closest protein-coding genes, NLGN4X and GRIA3, respectively. Additionally, those variants are not eQTL/sQTL for any gene according to the GTEx Portal. Expression of the GRIA3 gene in the dorsolateral prefrontal cortex is reportedly associated with cognitive change in women during aging and AD27. However, the rs191195705 index variant of the Xq25 signal is associated with AD risk mainly in males in our analyses (Supplementary Table S4). Regarding the Xp22.32 locus, the rs5916169 variant, located 127 kb from our index variant, is associated with functional connectivity16. However, this variant is not in LD (r² = 0.005) with the AD index variant.
Although this study represents the largest XWAS for AD risk to date, we did not find any genome-wide-significant genetic association with AD risk among X-chromosome common variants. Technical or analytical reasons can partly explain this result, such as: 1) overall lower variant density, 2) lower coverage by genotyping platforms, 3) lower call rate of variants, 4) lower imputation quality, or 5) a lower effective sample size in males on the X-chromosome compared to the autosomes36. However, it is also possible that fewer genome-wide significant associations with AD risk exist on the X-chromosome than on the autosomes due to a lower density of functional variants on the X-chromosome. Indeed, Gorlov et al., 2023 (ref. 36) found a lower density of variants in both exonic and intronic regions on the X-chromosome compared to autosomes, which they link to a stronger selection against X-chromosome mutations.
In conclusion, this XWAS found no common genetic risk factor for AD on the non-pseudoautosomal region of the X-chromosome but identified suggestive signals with moderate impact on AD risk, which warrant further investigation. In particular, future analyses of sequencing data will help to address some of the technical issues described above, and will make it possible to study the impact of X-chromosome rare variants or structural variants on AD risk.
Online Methods
1) Samples
The XWAS is based on 115,841 AD or AD-proxy cases (58% females) and 613,671 controls (55% females) of European ancestry from 35 case-control studies, 2 family studies (LOAD and FHS), and 2 biobanks (UKB and FinnGen) (Supplementary material and Supplementary Table S1). 55,868 of the 115,841 cases were AD-proxy cases. Females were considered as AD-proxy cases if they indicated having at least one parent with dementia37. For males, only the mother’s status was used to define the proxy status (Supplementary material).
In a sensitivity analysis including only the diagnosed AD cases, a total of 63,838 AD-cases (59% females) and 806,335 controls (55% females) was considered (Supplementary Table S1).
In addition to the classical autosomal QC, an X-chromosome specific QC was performed prior to imputation for each study (Supplementary material and Supplementary Table S5). We did not analyze the PAR regions due to a lack of variants on most genotyping chips. Related individuals were excluded from UKB samples but were kept in FinnGen, where excluding related individuals would remove about 40% of the sample38.
Thirty-four studies were imputed with the TOPMed panel (N = 112,690) and three studies were imputed with the 1000 Genomes panel (March 2012) (FHS, CHS and RS, N = 10,102, Supplementary Table S5). FinnGen was imputed with a Finnish reference panel and the UKB with a combination of the 1000 Genomes, HRC and UK10K panels.
2) Main analyses
a. Association tests
Since random X-chromosome inactivation is the most frequent case, we considered the r-XCI approach for our main analysis and the s-XCI and e-XCI approaches for secondary analyses. The approaches are described briefly below, while additional details are provided in the Supplementary material. For all the models, the analyses were adjusted on the principal components (PCs) and/or the genotyping center if necessary (Supplementary Table S5). Dosage or genotype probabilities were used for all studies but FinnGen, where best guessed genotypes were considered (Supplementary material).
r-XCI approach
The r-XCI approach is equivalent to an additive genetic model, where males are considered as homozygous females. Males’ and females’ genotypes were thus coded: genotype (G) = {0, 2} and G = {0, 1, 2} respectively. The association test was performed for each study in men and women jointly using an additive logistic regression model for case-control studies, a generalized estimating equation (GEE) model for family studies and a logistic mixed model for biobanks. To account for differences in genotypic variance between sexes, we considered a robust estimate of the variance for case-control studies39,40 and an adjustment on sex for family studies and biobanks (Supplementary Table S6). The association test on proxy status in UKB was performed separately for males and females, and a correction factor of 2 was applied to the association statistics (effect sizes and standard errors) of the female-only model (Supplementary material)37,41. The results were then combined across studies in a fixed effect meta-analysis with an inverse-variance weighted approach with METAL42.
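The fixed-effect, inverse-variance weighted combination used in the meta-analysis can be sketched as follows. This is a minimal illustration of the standard formula, not the METAL implementation; the function name `ivw_fixed_effect` and its inputs (per-study log-odds estimates and standard errors) are hypothetical.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Combine per-study log-odds estimates with inverse-variance weights.

    betas, ses: per-study effect estimates and standard errors
    (hypothetical inputs; all studies must use the same allele and coding).
    Returns the pooled effect, its standard error, the z-score and a
    two-sided p-value under a normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise studies.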
e-XCI approach
Under the e-XCI hypothesis, males’ and females’ genotypes were coded G = {0, 1} and G = {0, 1, 2} respectively. Variant effects were estimated separately in females and in males, except in FinnGen, where the variant effects were estimated directly in both males and females combined with an adjustment on sex (Supplementary Table S6). Results were then combined across studies, males and females with a fixed effect meta-analysis, inverse variance weighted approach using METAL. We did not include AD-proxy in the e-XCI meta-analysis. As males and females are related in family studies, only female results from LOAD and FHS were included in the meta-analysis. The sex-stratified models were adjusted on PCs and/or the genotyping center only, except for two ADGC studies (PFIZER and TGEN2) and the CHARGE studies (FHS, RS and CHS), where models were additionally adjusted on age (Supplementary Table S7 and Supplementary material).
s-XCI approach
For the skewed XCI approach, males’ and females’ genotypes were coded G = {0, 2} and G = {0, 1, 2} respectively. A general genotypic model, including both an additive and a dominance variable, was estimated in females from case-control studies to account for non-random inactivation through the dominance variable, which equals 1 in female heterozygotes, and 0 otherwise. The χ² test of the dominance effect was then added to the χ² test of the additive effect estimated under r-XCI, which results in a two-degree-of-freedom (df) test of the association of the variant with AD risk including its potential skewness40,43 (Supplementary Table S6). We did not include family studies and biobanks in the s-XCI approach.
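As an illustration of the two-df test, the sketch below sums the additive and dominance χ² statistics and converts the result to a p-value. It assumes the two 1-df statistics are already available and can be treated as independent; the function names are hypothetical, and the closed form exp(-x/2) is the survival function of a χ² distribution with exactly 2 df.

```python
import math

def dominance_indicator(allele_count, is_female):
    """Dominance variable: 1 for female heterozygotes, 0 otherwise.

    allele_count is a hard-called count of the tested allele (0, 1 or 2);
    handling of imputed dosages is not covered by this sketch.
    """
    return 1 if (is_female and allele_count == 1) else 0

def sxci_two_df_test(chi2_additive, chi2_dominance):
    """Sum two 1-df chi-square statistics and return (statistic, p-value)."""
    if chi2_additive < 0 or chi2_dominance < 0:
        raise ValueError("chi-square statistics must be non-negative")
    statistic = chi2_additive + chi2_dominance
    p_value = math.exp(-statistic / 2.0)  # chi-square survival function, df = 2
    return statistic, p_value
```

Because the test spends one extra degree of freedom, it is less powerful than the additive test when there is no dominance effect.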
While analyses and QC of the results (see below) were performed with the coding scheme described above, odds ratios and confidence intervals are provided on the real XCI scale, i.e., G = {0, 1} for males and G = {0, 0.5, 1} for females under r-XCI and s-XCI, but G = {0, 1} for males and G = {0, 1, 2} for females under e-XCI (Supplementary Table S6).
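The three coding schemes and the change of scale can be sketched as below. This is an assumption-laden illustration rather than the authors' pipeline: the function names are hypothetical, and the rescaling assumes that halving the genotype coding (from {0, 1, 2} to {0, 0.5, 1}) simply doubles the log-odds coefficient and its confidence bounds, which follows from the logistic model being linear in G.

```python
import math

ANALYSIS_MODELS = ("r-XCI", "s-XCI", "e-XCI")

def code_genotype(allele_count, is_male, model):
    """Return the analysis coding of a hard-called allele count."""
    if model not in ANALYSIS_MODELS:
        raise ValueError(f"unknown model: {model}")
    if is_male:
        if allele_count not in (0, 1):
            raise ValueError("males carry 0 or 1 copy on the non-PAR X-chromosome")
        # Males are coded as homozygous females under r-XCI and s-XCI.
        return allele_count if model == "e-XCI" else 2 * allele_count
    if allele_count not in (0, 1, 2):
        raise ValueError("females carry 0, 1 or 2 copies")
    return allele_count

def to_reporting_scale(beta, ci_low, ci_high, model):
    """Convert a log-odds estimate and its bounds to an odds ratio on the reporting scale."""
    if model not in ANALYSIS_MODELS:
        raise ValueError(f"unknown model: {model}")
    # Assumption: reporting genotype = analysis genotype / 2 under r-XCI and s-XCI,
    # so the coefficient is multiplied by 2; e-XCI coding is left unchanged.
    factor = 1.0 if model == "e-XCI" else 2.0
    return tuple(math.exp(factor * x) for x in (beta, ci_low, ci_high))
```

For example, `code_genotype(1, True, "r-XCI")` returns 2, whereas `code_genotype(1, True, "e-XCI")` returns 1.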
Sex-stratified analyses
We additionally performed a sex-stratified analysis per study, and we combined the results across studies in males and females separately with a fixed effect meta-analysis and inverse-variance weighted approach using METAL42. Proxy cases were not included in this analysis. We then compared the variant effect sizes of males and females with a Wald test (Supplementary material).
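A common form of the Wald test for a difference between two effect estimates is sketched below. The exact test used here is described in the Supplementary material; this version assumes that the male and female estimates are independent and expressed on the same genotype scale, and the function name is hypothetical.

```python
import math

def sex_difference_wald(beta_male, se_male, beta_female, se_female):
    """Two-sided Wald test of beta_male == beta_female.

    Assumes independent male and female estimates (no shared samples),
    so the variance of the difference is the sum of the variances.
    """
    if se_male <= 0 or se_female <= 0:
        raise ValueError("standard errors must be positive")
    diff = beta_male - beta_female
    se_diff = math.sqrt(se_male ** 2 + se_female ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p
```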
b. Quality control of the results and definition of associated loci
A QC of the results was carried out for all the studies. We filtered out variants with at least one missing value (effect, standard error, or p-value), an absolute effect size greater than 5, or an imputation quality less than 0.3. We also filtered out the variants whose effective allele count (product of the imputation quality and the expected minimum minor allele count between the cases and the controls) was less than 5 (less than 10 for LOAD)44. For datasets imputed with 1000G and the UKB, we excluded variants for which the conversion of position or alleles from GRCh37 to GRCh38 was not possible or problematic, and variants with a difference in frequency > 0.5 compared with the reference panels TOPMed or 1000G.
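The per-study filters on missingness, effect size, imputation quality and effective allele count can be sketched as a single predicate. The record layout and names (`StudyResult`, `passes_study_qc`) are hypothetical, the expected minor allele counts are taken as given inputs, and the liftover and frequency checks are not covered.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyResult:
    beta: Optional[float]        # effect estimate (log-odds)
    se: Optional[float]          # standard error
    p_value: Optional[float]
    imputation_quality: float
    mac_cases: float             # expected minor allele count in cases
    mac_controls: float          # expected minor allele count in controls

def effective_allele_count(r: StudyResult) -> float:
    """Imputation quality times the smaller of the two expected minor allele counts."""
    return r.imputation_quality * min(r.mac_cases, r.mac_controls)

def passes_study_qc(r: StudyResult, study: str = "") -> bool:
    """Return True if the variant result survives the per-study filters."""
    if r.beta is None or r.se is None or r.p_value is None:
        return False                      # at least one missing value
    if abs(r.beta) > 5:
        return False                      # implausibly large effect
    if r.imputation_quality < 0.3:
        return False                      # poorly imputed
    min_count = 10 if study == "LOAD" else 5
    return effective_allele_count(r) >= min_count
```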
After the meta-analysis, we filtered out rare variants (MAF < 1%), variants analyzed in less than 40% of AD cases (considering the effective sample size of UKB female proxy cases, which is the raw sample size divided by four37), variants with a heterogeneity p-value < 5 × 10⁻⁸ and variants where the difference between the maximum and the minimum frequency across studies was higher than 0.4.
Inflation of the test statistics was checked in each study and in the meta-analysis by computing a genomic inflation factor lambda with the median approach implemented in the GenABEL 1.8-0 R package45, on common variants in low LD (r² < 0.2) (Supplementary material). A signal was considered genome-wide or X-chromosome-wide significant in any approach if associated with AD risk with P ≤ 5 × 10⁻⁸ or P ≤ 1.7 × 10⁻⁶, respectively. This X-chromosome-wide threshold is based on R = 2.93%, the relative number of tests performed on the X-chromosome (n = 257,766) versus on the autosomes (n = 8,525,514) in the EADB-core study, the largest dataset imputed with the TOPMed reference panel. As the genome-wide threshold of 5 × 10⁻⁸ corresponds to the Bonferroni correction for one million tests, we computed the corresponding threshold for the X-chromosome as 0.05 / (R × 1,000,000) = 1.7 × 10⁻⁶.
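The threshold arithmetic and the median-based inflation factor can be reproduced with the short sketch below. Two points are assumptions rather than statements from the text: R = 2.93% is recovered when the X-chromosome tests are divided by the total number of tests (X plus autosomes), and lambda is computed as the median χ² statistic divided by the median of a 1-df χ² distribution (about 0.4549), which is the usual definition of the median approach. Function names are hypothetical.

```python
import statistics

N_TESTS_X = 257_766            # X-chromosome tests in EADB-core
N_TESTS_AUTOSOMES = 8_525_514  # autosomal tests in EADB-core
CHI2_1DF_MEDIAN = 0.4549364    # median of a chi-square distribution with 1 df

def x_chromosome_threshold(n_x=N_TESTS_X, n_autosomes=N_TESTS_AUTOSOMES,
                           alpha=0.05, n_genome_tests=1_000_000):
    """Return (R, threshold) for X-chromosome-wide significance.

    Assumption: R is the share of X-chromosome tests among all tests.
    """
    r = n_x / (n_x + n_autosomes)
    return r, alpha / (r * n_genome_tests)

def genomic_inflation(chi2_stats):
    """Median-based genomic inflation factor for 1-df chi-square statistics."""
    if not chi2_stats:
        raise ValueError("at least one statistic is required")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

r, threshold = x_chromosome_threshold()
print(f"R = {r:.2%}, threshold = {threshold:.2e}")
```

A lambda close to 1 indicates well-calibrated statistics; values below 1, as observed for the s-XCI model, indicate deflation.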
Several sensitivity analyses of the signals were performed. Sensitivity analyses excluding AD-proxy or biobank samples were performed for the r-XCI and e-XCI meta-analyses (samples from biobanks, including proxy, were not included in the s-XCI analysis in the first place). Additionally, for the r-XCI signals, an analysis adjusted on sex, without robust variance, was performed. The results were obtained by meta-analyzing the sex-stratified models for all case-control studies and UKB, and a sex-combined model adjusted on sex for FinnGen, with males coded as homozygous females for all models (family studies were excluded) (Supplementary material, Supplementary Table S7). Sensitivity analyses including an adjustment on age and the number of APOE ε4 and APOE ε2 alleles were also performed for all signals. Results were obtained from the meta-analysis of adjusted sex-stratified models with the appropriate coding of males and excluding family studies. Finally, a sensitivity analysis was performed using a stricter imputation quality filter (r² > 0.8).
Data availability
Summary statistics will be made available upon publication through the European Bioinformatics Institute GWAS Catalog (https://www.ebi.ac.uk/gwas/).
Acknowledgments
EADB: This study was supported by grants from the Fondation pour la Recherche sur Alzheimer (convention 2022-A-01 and cluster grant), and the JPco-fuND-2 ‘Multinational research projects on Personalized Medicine for Neurodegenerative Diseases’ PREADAPT project (ANR-19-JPW2-0004). We thank the many study participants, researchers and staff for collecting and contributing to the data, the high-performance computing service at the University of Lille and the staff at CEA-CNRGH for their help with sample preparation and genotyping and excellent technical assistance. We thank Antonio Pardinas for his help. We thank the Netherlands Brain Bank. This research was conducted using the UKBB resource (application number 61054). This work was funded by a grant (EADB) from the EU Joint Programme – Neurodegenerative Disease Research. Inserm UMR1167 is also funded by the Inserm, Institut Pasteur de Lille, Lille Métropole Communauté Urbaine and French government’s LABEX DISTALZ program (Development of Innovative Strategies for a Transdisciplinary Approach to ALZheimer’s disease). This work was also supported by the Research Council of Finland grants 338182 and 334802, the Sigrid Jusélius Foundation, and the Strategic Neuroscience Funding of the University of Eastern Finland.
Full consortium acknowledgements and funding are in the Supplementary Note.