Abstract
Genome-wide association studies (GWAS) have identified more than 75 genetic risk loci for Alzheimer’s Disease (AD), however for a substantial portion of these loci the genetic variants or genes directly involved in AD risk remain to be found. A GWAS locus defined by the index SNP rs1476679 in ZCWPW1 is one of the largest AD loci as the association signal spans 56 potential risk genes. The three most compelling candidate genes in this locus are ZCWPW1, PILRA and PILRB, based on genetic, transcriptomic, and proteomic evidence. We performed amplicon-based target enrichment and next-generation sequencing of the exons, exon-intron boundaries, and UTRs of ZCWPW1, PILRA and PILRB on an Illumina MiSeq platform in 1048 Flanders-Belgian late-onset AD patients and 1037 matched healthy controls. Along with the single-marker association testing, the combined effect of Sanger-validated rare variants was evaluated in SKAT-O. No common variants (n = 40) were associated with AD. We identified 20 validated deleterious rare variants (MAF < 1%, CADD score ≥ 20), 14 of which in ZCWPW1. This included 4 predicted loss-of-function (LoF) mutations that were exclusively found in patients (P = 0.011). Haplotype sharing analysis revealed distant common ancestors for two LoF mutations. Single-molecule long-read Nanopore sequencing analysis unveiled that all LoF mutations are phased with the risk haplotype in the locus. Our results support the recent report for the role of ultra-rare LoF ZCWPW1 variants in AD and suggest a potential risk mechanism for AD through ZCWPW1 haploinsufficiency.
Introduction
Alzheimer’s disease (AD) is a neurodegenerative brain disorder that is the most common cause of dementia1. AD is a complex disease with a high heritability estimate2; over the past decade more than 75 genetic susceptibility loci have been identified in genome-wide association studies (GWAS)3–10. One of these risk loci is denoted as the Zinc finger CW-type and PWWP domain containing 1 (ZCWPW1) locus, because the discovery lead SNP rs1476679 was located intronic in ZCWPW13. The ZCWPW1 locus is among the 10 most significant genetic risk loci for AD, but is also one of the most complex loci, as the association signal spans 620 Kb, comprising 56 genes of which 29 are protein-coding, including nearby genes harboring other genome-wide significant SNPs, such as the Paired Immunoglobulin-Like Type 2 Receptor Alpha (PILRA) and Beta (PILRB) genes (Figure 1). Four variants across the locus were reported as lead variant with similar effect size and direction in seven subsequent large-scale AD GWAS: (i) the discovery lead SNP rs14766794, (ii) exonic common variant rs18597885,7,8 in PILRA (p.Gly78Arg), (iii) 3’ UTR variant rs125391726 in Neuronal Tyrosine Phosphorylated Phosphoinositide-3-Kinase Adaptor 1 (NYAP1), and (iv) intronic variant rs73848789,10 in PMS2P1 (PMS1 Homolog 2, Mismatch Repair System Component Pseudogene 1) which is a pseudogene located between SPDYE3 (Speedy/RINGO Cell Cycle Regulator Family Member E3) and PILRB (Figure 1). This phenomenon, caused by linkage disequilibrium (LD) and relatively high gene density in the locus, complicates the interpretation of this association signal in terms of mechanism of action in Alzheimer’s disease. Subsequent whole exome and whole genome sequencing studies, expression quantitative trait loci (eQTL) based studies, and functional studies have proposed several different candidate genes in the locus.
First, a UK whole exome sequencing (WES) project observed an increased burden of PILRA variants in AD patients as their top-associated hit, suggesting a direct role of PILRA in AD pathogenesis11. In contrast, one of the largest WES studies on AD reported to date pinpointed significant enrichment of ultra-rare (allele frequency < 0.01%) ZCWPW1 predicted loss-of-function (LoF) variants in AD patients12, and additionally, a nominally significant association of genic rare (allele frequency < 1%) variants within ZCWPW1 was reported in a recent large-scale WGS study on AD13. Second, the discovery lead SNP in the locus, rs1476679, has strong regulatory potential, with the top RegulomeDB14 rank score among the lead SNPs of the significant loci from all published AD GWAS, and two independent studies observed that rs1476679 is an eQTL for PILRB15,16. However, annotation with other eQTL catalogues (GTEx17, Brain xQTL Serve18, eQTLGen19, and numerous datasets analyzed uniformly by eQTL Catalogue database20) of AD-relevant tissues and cell types reveals that rs1476679 is also a significant eQTL for other genes in the locus, including ZCWPW1 and PILRA. Furthermore, the GWAS association signal in the locus was found to be highly colocalized with lipopolysaccharide-stimulated monocyte eQTL signals for ZCWPW1 and PILRA8. Moreover, it was suggested that rs1476679 might be controlling the expression of nearby genes via CTCF-mediated chromatin loops21. Third, a ligand binding assay based functional study proposes that the lead SNP rs1859788, encoding the amino acid substitution p.Gly78Arg in PILRA, might be a protective variant in the locus as it potentially reduces inhibitory signaling in microglia22.
The literature on the function of PILRB and PILRA proteins indicates that they operate as activating and inhibitory receptors (respectively) for the regulation of the immune response. PILRA contains an immunoreceptor tyrosine-based inhibitory motif (ITIM) in its cytoplasmic tail that recruits phosphatases such as PTPN6/SHP-1, meanwhile PILRB is thought to act through pairing with immunoreceptor tyrosine-based activation motif (ITAM)-containing proteins such as TYROBP/DAP1223. On the other hand, until recently, the function of ZCWPW1 protein was largely unknown, and a role in epigenetic regulation as a histone modification reader (especially for H3K4me3) was proposed based on the structure and biochemical information of its zf-CW domain24. Recently, ZCWPW1 function was studied for the first time in consecutive studies which showed that it is required during meiosis prophase I in a sex-dependent manner in mice25 and it is essential for repairing of double strand breaks (DSBs) marked by histone modification writer PRDM9 during meiosis26–28.
Taken together, it is not clear yet how this locus contributes to AD; however, the current literature prioritizes three genes in the locus: ZCWPW1, PILRA and PILRB. To this end, in this study we aimed to investigate genetic variation in the locus by sequencing UTRs, exons and exon-intron boundaries of these three genes in our well-characterized case-control cohort.
Materials and Methods
Study Cohort
The study cohort consists of 2085 Flanders-Belgian individuals. Patients were ascertained at the Memory and Neurology Clinics of the BELNEU consortium. The patient group consisted of 1048 late-onset AD cases, of whom 65.4% female, with a mean onset age of 78.8 ± 5.7 years. Possible, probable, and definite AD diagnosis was based on NINCDS-ADRA29 and/or NIA-AA30,31 criteria. Control individuals were recruited from partners of patients, or were volunteers from the Belgian community, and included 1037 age-matched healthy individuals (64.8% female and mean inclusion age of 70.2 ± 9.1 years). All control individuals scored >25 on the Montreal Cognitive Assessment (MoCA)32 test, and were negative for subjective memory complaints, neurological or psychiatric antecedents, and family history of neurodegeneration. Both patient and control group individuals originated from the same geographical area (Flanders-Belgium), with no evidence of population stratification (Supplementary Figure 1). All participants and/or their legal guardian signed written informed consent forms before inclusion. The study protocols were approved by the ethics committees of the Antwerp University Hospital and the University of Antwerp, and the ethics committees of the participating neurological centers of the BELNEU consortium.
Targeted Resequencing
For sequencing of regions of interest, we used genomic DNA (gDNA) samples that were amplified using Illustra GenomiPhi v2 kit (Thermo Fisher, Waltham, MA, USA). We targeted UTRs, exon-intron boundaries (20 bp long) and exons of all protein-coding transcripts of PILRB, PILRA and ZCWPW1 genes based on GENCODE v19. Total length of targeted regions was 9185 bp. A custom-designed multiplex target enrichment assay (Agilent, Santa Clara, CA, USA) was prepared and optimized prior to sequencing. We amplified 47 amplicons targeting 43 exons of interest using primers with universal adapter sequences (primer sequences available upon request). Prepared amplicon libraries were then barcoded for each sample with Nextera XT sequences (Illumina, San Diego, CA, USA) targeting previously incorporated adapters. Indexed libraries were 2 × 300 bp paired-end (MiSeq Reagent Kit v3) sequenced on a MiSeq platform (Illumina, San Diego, CA, USA) in a total of 5 runs. Two amplicons (PILRB exon 5 and ZCWPW1 exon 14) that did not achieve sufficient coverage were additionally sequenced by Sanger sequencing, consisting of PCR amplification, followed by dideoxy-termination with BigDye termination cycle sequencing kit v3.1 (Thermo Fisher) and sequencing with ABI3730 DNA Analyzer (Thermo Fisher). Resulting chromatograms were analyzed independently and blindly by two investigators using novoSNP33 and Seqman (DNAstar, Madison, WI, USA). All deleterious rare variants (CADD score ≥ 20 and MAF < 1%) and other rare variants of interest were validated using Sanger sequencing on gDNA of all carriers, as described above.
Bioinformatic Processing
Adapter clipping of demultiplexed FASTQs was performed with fastq-mcf. Reads were aligned to the hg19 reference genome with Burrows-Wheeler Aligner MEMv0.7.534. We merged duplicate sample aligned reads belonging to the same individual using SAMtools35 (v1.6) merge. Sequencing coverage was analyzed using SAMtools depth and visualized with Circos36 plot (Supplementary Figure 2).
Variants were called with GATK 4.1.437 HaplotypeCaller. Multiallelic variant sites were split into biallelic records and raw indels were left-aligned and normalized using BCFtools (v1.9) norm. Variants were annotated with dbSNP build 151. Variant quality control was done by using standard hard filtering parameters according to the GATK Best Practices38,39. Additionally, with VCFTools40 (v0.1.16), we applied genotype level QC by assigning individual genotypes with genotype quality (GQ) < 20 and depth (DP) < 8 as missing. Moreover, using vcffilterjdk741 with a custom filtering java code (available upon request), we also assigned genotypes as missing if heterozygous genotypes have allele depth ratios exceeding 1:3 ratio, and if homozygous genotypes do not have allele depth ratios larger than 9:1 or smaller than 1:9 ratios. Finally, we removed variants that were missing in more than 20% of the study cohort, deviated from Hardy-Weinberg Equilibrium (HWE P < 10−6 in controls) or differentially missing between cases and controls (P < 10−6). Variants were annotated with ANNOVAR42, SnpEff43 and CADD v1.644. We used the latest Ensembl release (104) to select the canonical transcripts of PILRB (ENST00000609309.2), PILRA (ENST00000198536.7), and ZCWPW1 (ENST00000684423.1), which were used as reference transcripts in this study to report the predicted functional consequences of the identified variants. Variant frequencies were queried in the most recent versions of the Genome Aggregation Database45 (gnomAD, v2.1.1), the Exome Aggregation Consortium46 (ExAC, v1) and Healthy Exomes47 (HEX, v1) browsers.
Genotyping Array Data
Genotyping array data were available for n=1803 samples included in this study through the framework of European DNA Biobank (EADB) project. All pre- and post-genotyping sample and variant quality control procedures were described previously9. These data were used for (i) investigating the genetic ancestry of the included samples to control for population stratification, (ii) refining possible haplotypes of predicted LoF mutation carriers, (iii) investigating pairwise identity by descent (IBD) estimates between predicted deleterious mutation carriers both in a genome-wide (for checking cryptic relatedness) and locus specific manner, and (iv) obtaining genotypes for intronic rs34919929 variant that is in LD with GWAS discovery lead SNP rs1476679 and more proximal to ZCWPW1 LoF variants of interest for Nanopore sequencing experiments. Both genetic principal components and pairwise IBD estimates were calculated in PLINK on quality-controlled, common (MAF ≥ 1%), non-missing (variant missingness ≤ 0.02), and LD-pruned (PLINK parameters: “-- indep-pairwise 500kb 1 0.2”) variants that were out of the long-range LD loci that are likely to confound genomic scans: LCT (2q21), HLA (including MHC), 8p23 and 17q21.31 inversions, and 24 other long-range LD regions48. The genetic principal components were analyzed in a PCA plot together with the n=2504 individuals from the 1000 Genomes (1KG) Project49.
Statistical Analyses
Single marker association testing of the variants in the targeted regions was performed in PLINK50 (v1.9), using a χ2 allelic test to compare allele frequencies of rare variants (MAF < 1%) between patients and controls, and a logistic regression model for common variants (MAF ≥ 1%) to investigate the additive effect of the minor allele on Alzheimer’s disease status, adjusted by sex and age at onset (for patients) or age at inclusion (for control individuals) covariates. Odds ratios are reported for the minor allele with 95% confidence intervals (CI). Statistical significance thresholds are adjusted for multiple testing by Bonferroni-correction.
For gene-based rare variant (MAF < 1%) association analysis, the rare variants were classified into two groups based on their predicted functional consequences on the canonical transcripts of the genes as protein-altering (i.e. missense, nonsense, frameshift, and splice site disrupting) and predicted LoF (i.e. nonsense, frameshift, and splice site disrupting) variants and collapsed per gene. Gene-based optimized sequence kernel association test (SKAT-O)51 was used (“SKATBinary” function with ‘method = “SKATO”’ option, available in R package SKAT v1.3.2.1) to assess the combined effect of these rare variants on Alzheimer’s disease.
Haplotype Sharing Analysis
A short tandem repeat (STR) panel was designed to investigate the possible haplotypes shared by non-singleton ZCWPW1 predicted LoF mutation carriers. We preselected STRs described in Marshfield catalogue52 based on the following criteria: (i) at least 70% heterozygosity, (ii) proximity to locus of interest up to 5 million base-pairs, (iii) non-overlapping STR with different sizes. Eleven Marshfield STRs following these criteria and spanning 8.9 Mb and 5.33 cM were selected; six upstream STRs: D7S2431, D7S1796, D7S554, D7S651, D7S647, D7S2480, and five downstream STRs: D7S477, D7S666, D7S2448, D7S2504, D7S2494. The primers targeting these STRs were designed to contain HEX or FAM fluorescent labels, and the reactions were run in two-plex format. STR lengths were scored independently and blindly by two investigators using LGV (v3.01) in-house software. A subset of Flanders-Belgian control individuals (n = 172) were also genotyped with the same STR panel to estimate the frequencies of STR lengths in the study population.
Nanopore Sequencing
Single-molecule long-read gDNA sequencing was performed on a MinION platform (Oxford Nanopore Technologies (ONT), Oxford, UK) to investigate phase information of the risk haplotype with ZCWPW1 predicted LoF mutations. We chose rs34919929 for phasing with predicted LoF variants in the locus as rs34919929 is in linkage disequilibrium with GWAS index SNP rs1476679 (r2=0.98 in European population) and more proximal to ZCWPW1 LoF variants of interest, allowing the design of shorter amplicons. We designed primers (Supplementary Table 1) with ONT adapter using Primer353 (v4.0.1) to generate two amplicons spanning rs34919929 and ZCWPW1 LoF mutations. The first amplicon (∼2.2 Kb) contained sites for rs34919929 and LoF mutation rs774275324. The second amplicon (∼6.5 Kb) contained sites for rs34919929, and LoF mutations rs774275324 and rs1180932049. The long amplicon spanned the short amplicon, however we decided to include both amplicons in the experiment to make use of the relatively better amplification efficacy of the short amplicon. Two amplicons were generated for two rs774275324 carriers that are heterozygous for the risk haplotype, two rs1180932049 carriers that are heterozygous for the risk haplotype and one rs1180932049 carrier that is homozygous for risk haplotype (as positive control). Amplicons generated for another three control samples without LoF mutations and with homozygous reference, heterozygous and homozygous alternative genotypes for rs34919929 variant were included in the experiment as negative controls. All PCR amplifications were performed with 35 cycles using KAPA LongRange HotStart PCR Kit KK3502 (KAPA Biosystems, MA, USA). After the reactions, excess primers and nucleotides were cleaned with ExoSAP-IT (Thermo Fisher).
Amplicons then were barcoded with the PCR Barcoding Expansion 1-96 kit (Oxford Nanopore Technologies) using 20 amplification cycles on a 1/200 diluted template. After purification with Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA) and concentration measurement with Qubit (Thermo Fisher), amplicons were pooled equimolarly. The sequencing library was prepared as previously described54. SQK-LSK109 chemistry and FLO-MIN106 flow cell were used for sequencing.
Base calling of the raw reads was performed with ONT basecaller Guppy (v3.0.3) on the Promethion compute device. After demultiplexing of basecalled FASTQ reads with qcat (v1.0.1), alignment of demultiplexed reads to hg19 reference genome was performed with minimap255 (v2.17). Variants were called with bcftools mpileup. Aligned reads were visualized in Integrative Genomics Viewer56 (IGV; v2.8.0) and phasing of variants of interest was performed using WhatsHap57 (v0.18).
Results
Single Marker Association
A total of 292 quality-controlled variants were identified in the targeted regions, of which 40 were common (MAF ≥ 1%). Chi-squared allelic association and logistic regression tests showed no association of any variants after Bonferroni correction. The GWAS index variants in the locus did show similar size and direction of effects to the original GWAS reports (Supplementary Table 2).
Four rare variants were nominally associated with Alzheimer’s disease (Supplementary Table 3). Of those, rs531189257 was exclusively found in 10 patients (MAF in AD patients = 0.53%). rs531189257 is a rare (gnomAD NFE MAF = 0.44%) intronic variant in the canonical PILRA transcript, and a 3’ UTR variant on the short transcript of PILRA (ENST00000484934.1).
Deleterious Rare Variants and Gene-based Rare Variant Association
We then sought to investigate the rare variants (MAF < 1%) that are predicted to be deleterious (with either CADD ≥ 20 or “high impact” annotation by SnpEff), which includes mostly predicted LoF variants and high impact missense variants. We identified 20 validated deleterious variants (Table 1 and Figure 2).
Three deleterious rare variants are located in PILRB, three in PILRA and fourteen in ZCWPW1. These include one novel frameshift mutation (p.Arg111fs) in PILRB, one novel stop-gain (p.Glu458*) and one novel missense (p.Thr4Met) mutation in ZCWPW1. All deleterious variants have a CADD score above 20, with the exception for PILRA p.Met1? start-loss mutation. As a result of the PILRB p.Arg111fs frameshift mutation (identified in a patient), a premature stop codon is introduced on the last exon of PILRB (at amino acid position 219), thus nonsense mediated decay (NMD) is not expected for this transcript. However, PILRB function of the mutated product is most likely impaired due to loss of nearly half of the canonical amino acid sequence that translates into topological and functional domains. PILRA seemed to be more tolerant against predicted LoF variants, as both variants were observed more in controls than in patients. On the other hand, we observed that predicted LoF variants (n=4) in ZCWPW1 were exclusive to seven patients. These validated ultra-rare variants (gnomAD MAF < 0.01%) consisted of one novel stop-gain mutation and three splice site mutations. The stop-gain mutation, p.Glu458* that occurs at the 458th amino acid of the canonical ZCWPW1 isoform whose total length is 649 amino acids, is predicted to result in haploinsufficiency through NMD. Furthermore, we identified two splice donor mutations (ZCWPW1 c.631+1G>T and c.-30+1G>A) and one splice acceptor mutation (ZCWPW1 c.-29-1G>A) in five patients. Interestingly, one predicted LoF mutation carrier (ZCWPW1 c.-30+1G>A) also carried two predicted deleterious rare missense mutations (ZCWPW1 p.Glu105Gly and ZCWPW1 p.Glu95Lys, occurring in cis according to IGV investigation of sequencing reads as shown in the Supplementary Figure 4) within the coiled coil structure motif of the protein.
We investigated the combined effect of rare variants per gene using two classes of mutation: protein-altering and predicted LoF, revealing significant (P = 0.011) enrichment for ZCWPW1 predicted LoF rare variants (Table 2). We did not observe any significant associations for protein-altering variants with Alzheimer’s disease in any of the genes.
ZCWPW1 predicted LoF carriers were all clinically diagnosed with probable or possible AD. Only carriers of the c.-29-1G>A mutation were diagnosed with additional features (frontal deficits (n=2), evidence of a vascular component (n=1)). The onset ages were rather high (mean onset age 77.7 ± 5.87 years), with exception of one carrier (69 years) who was also APOE ε4 homozygous. Otherwise, we did not observe a correlation between onset age and APOE genotype. Two of the three ZCWPW1 c.-29-1G>A carriers had a familial history of the disease in the first degree. We did not have the biomaterials of these relatives to explore possible segregation of the mutation.
Haplotype Sharing Analysis
Because we observed two of these ultra-rare predicted LoF variants in multiple patients, we investigated allele sharing in the ZCWPW1 susceptibility locus in these carriers (LOAD-P2 and LOAD P3 for ZCWPW1 c.631+1G>T and LOAD-P4, LOAD-P5 & LOAD-P6 for ZCWPW1 c.-29-1G>A, see Supplementary Figure 3).
STR-based haplotype analyses revealed that ZCWPW1 c.631+1G>T carriers shared alleles at eight consecutive STR markers, spanning a region of 5.5 to 8.3 Mb. The three carriers of ZCWPW1 c.-29-1G>A shared alleles at six consecutive STR markers spanning a region of 4 to 5.5 Mb. Frequencies of these STR lengths are varying between 18.8% and 34% in the Flanders-Belgian population. LOAD-P4 and LOAD-P5 shared alleles at three additional distal markers, LOAD-P5 and LOAD-P6 shared alleles at three additional distal markers, and LOAD-P4 and LOAD-P6 shared additional alleles at five distal markers. Moreover, we used available genotyping array data of ZCWPW1 predicted LoF carriers (excluding LOAD-P5 for whom these data were not available) to further refine these possible haplotypes constructed by STR markers. Comparison of n=1789 quality-controlled common variants between D7S2431 and D7S2494 revealed that the common haplotype between ZCWPW1 c.631+1G>T carriers is limited by rs6970990 upstream and rs727708 downstream (the most proximal segments where identity by state [IBS] is not observed), spanning a region of 3.5 to 5.4 Mb. Similarly, the common SNPs rs1488514 and rs10245317 adjusted possible common haplotype length between LOAD-P4 and LOAD-P6 to a region of 6.9 to 7.9 Mb. All common SNPs in the highlighted haplotypes (Supplementary Figure 3) were shared by ZCWPW1 predicted LoF carriers, supporting STR genotyping results.
Furthermore, we investigated pairwise identity-by-descent (IBD) between six ZCWPW1 predicted LoF carriers based on whole-genome and locus-specific (limited to possible haplotypes shown in Supplementary Figure 3) SNP genotype data. Genome-wide pairwise IBD estimates (Supplementary Table 4) confirmed that these carriers are not close relatives (the highest pairwise IBD estimate PI-HAT = 0.015 between LOAD-P4 and LOAD-P3). However, when only locus-specific LD-pruned high-quality common variants are considered, we found strongly increased IBD proportions for carriers of the same mutation (LOAD-P2 – LOAD-P3 PI-HAT = 0.65 and LOAD-P4 – LOAD-P6 PI-HAT = 0.5), indicative of ancestral DNA segments (Supplementary Table 4).
Nanopore Sequencing
We further investigated the possible relationship between identified ZCWPW1 predicted LoF variants and the risk haplotype in the locus defined by genome-wide associated variant(s) in the ZCWPW1 locus. We observed that 3 out of 7 ZCWPW1 LoF variant carriers (LOAD-P1, LOAD-P5, LOAD-P7) have homozygous genotype for the risk allele of the index variant rs1476679 (TT), therefore these LoF variants are on the risk haplotype. In order to assess the phasing of the risk haplotype (using the proxy SNP rs34919929) and the LoF variants of the other four carriers, we performed single-molecule long-read sequencing as shown in Figure 2.
WhatsHap phasing (Supplementary Table 5) of these single molecule long-reads revealed that all LoF variants are phased with the risk haplotype in ZCWPW1 locus. We confirmed these results by viewing WhatsHap-processed haplotype-tagged aligned reads on IGV (Supplementary Figures 5-7).
Discussion
ZCWPW1, PILRA and PILRB have all been proposed as functional candidate genes in the ZCWPW1 risk locus for Alzheimer’s disease. Here we report a targeted-resequencing study covering the UTRs, exons and splice sites of PILRB, PILRA and ZCWPW1 in an AD case-control cohort of Flanders-Belgian origin.
The most significant result of this study was identification of ultra-rare ZCWPW1 predicted LoF variants that were exclusively found in patients. Despite their low expected allele frequency based on gnomAD non-Finnish European (NFE) genotype frequency data, ZCWPW1 c.631+1G>T and ZCWPW1 c.-29-1G>A mutations were strongly enriched in our patient cohort (ORs for increased frequency compared to gnomAD NFE frequency were calculated as 20.56 [95% CI = 1.71-179.74] and 30.84 [95% CI = 4.13-230.11] respectively). Increased allele sharing in the locus between carriers of these mutations is compatible with a shared common ancestor, indicative of a founder effect in the Flanders-Belgian population. Long range haplotype phasing further demonstrated that all predicted LoF variants were located on the ZCWPW1 risk haplotype defined by rs1476679 risk-increasing allele T, although it should be noted that this is the more common allele (gnomAD NFE allele frequency 70%).
Our findings are in line with a recent report from one of the largest WES study to date12, reporting enrichment of ultra-rare ZCWPW1 predicted LoF variants in patients compared to controls (3.42 times enriched in patients, P = 0.02). In fact, this was the second most significant result among 24 AD-implicated genes after enrichment of ultra-rare SORL1 predicted LoF variants in patients.
The significant enrichment identified for the predicted ZCWPW1 LoF mutations in our cohort is mainly driven by three splice site mutations. ZCWPW1 c.631+1G>T mutation occurs at the splice donor site of ZCWPW1 exon 7, and a potential skipping event of exon 7 would disrupt the open reading frame (ORF) and introduce a premature stop codon in exon 8 (canonical amino acid position 221), meanwhile potential intron retention of intron 7 would also introduce a premature stop codon, thus both scenarios would lead to NMD of the mutated transcript. The other two splice site variants are in the 5’ UTR of ZCWPW1 and occur in the intron 2 acceptor site (c.-29-1G>A) and donor site (c.-30+1G>A). Skipping exon 3 would cause exclusion of canonical start codon of ZCWPW1, and the earliest downstream ATG site on the same ORF could only produce a truncated protein lacking the first 68 amino acids if translation efficiency is adequate. A possible intron retention effect of these mutations would introduce a GC-poor (40%) 4.3 Kb long fragment before the canonical start codon and significantly decreasing the GC content of 5’ UTR of ZCWPW1 that is naturally at 60%. Meanwhile, skipping of exon 2 would cause the exclusion of 109 bases prior to the canonical start codon that could alter translational efficacy and alternative splicing regulation. Moreover, the LoF effects of such 5’ UTR splice variants were previously reported for neurodegenerative diseases58,59.
The function of ZCWPW1 protein is not well-known, except for its involvement in epigenetic regulation as a histone modification reader24 and for its the recently proposed role during meiosis25–27. ZCWPW1 has a consistent expression in different brain regions including hippocampus17, however its exact function in brain is yet to be examined. The current literature points out that loss of ZCWPW1 might be contributing to AD through inadequate epigenetic regulation as epigenetic mechanisms are suggested to be playing an important role in AD60. Furthermore, even though the only known significant brain eQTL effect of rs1476679 for ZCWPW1 (identified in putamen part of basal ganglia) suggests a ZCWPW1-decreasing effect of the rs1476679-C protective allele17, the same protective allele is also significantly associated with increased ZCWPW1 levels in blood19 and in stimulated monocytes20,61 for which substantial colocalization with GWAS was also observed8. The observation of predicted LoF mutations in ZCWPW1 in AD patients aligns well with these observations. A recent single-nuclei RNAseq study62 on parietal lobe of AD brains showed upregulation of ZCWPW1 in oligodendrocytes and downregulation in neurons. This suggests that these cell types should be prioritized when further investigating ZCWPW1 loss-of-function variants. Moreover, as ZCWPW1 is thought to be an epigenetic regulator, it could be interesting to compare epigenetic profiles of brain tissues of LoF carriers with non-carriers in the future; differentially regulated regions could hint at the targets of ZCWPW1.
We should note that even though we assessed the rare coding variants in the most compelling genes of the ZCWPW1 locus based on literature evidence, we cannot rule out the possibility that the rare variants in other genes more distal to the GWAS index variants could be contributing to Alzheimer’s disease.
Furthermore, the strongest association we obtained from our single marker testing was observed for rs531189257 intronic variant in PILRA that was exclusive to 10 patients (P = 0.002). This rare variant is also a 3’ UTR variant in a noncoding short transcript of PILRA (ENST00000484934.1) and it could be speculated that its possible effect on AD risk might be through regulatory effects on PILRA. However, it is important to note that replication is needed to further comment on the possible effects of this rare variant.
In conclusion, in this study we assayed three genes in the ZCWPW1 AD risk locus and found four ZCWPW1 predicted LoF mutations that were exclusively and significantly found in seven LOAD patients. Further investigation on these carriers revealed possible shared haplotypes hinting at distant common ancestors of non-singleton LoF carriers, meanwhile long-read Nanopore sequencing analysis unveiled that all LoF mutations are phased with the risk haplotype in the locus. Taken together, our results suggest a potential risk mechanism for AD through ZCWPW1 haploinsufficiency, which should be supported with functional experiments in the future. Identification of ZCWPW1 LoF variants in AD patients may open up new opportunities to study downstream effects of these mutations in Alzheimer’s disease.
Web Sources
fastq-mcf, https://github.com/ExpressionAnalysis/ea-utils/blob/wiki/FastqMcf.md
Phase 3 VCFs of 1KG samples, http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
GENCODE, https://www.gencodegenes.org/
BCFtools, http://samtools.github.io/bcftools/bcftools.html
qcat, https://github.com/nanoporetech/qcat
RegulomeDB, http://www.regulomedb.org/
CADD, https://cadd.gs.washington.edu/
Ensembl, https://www.ensembl.org/index.html
GTEx, https://gtexportal.org/home/index.html
eQTL Catalogue database, https://www.ebi.ac.uk/eqtl/
gnomAD browser, https://gnomad.broadinstitute.org/
ExAC browser, http://exac.broadinstitute.org/
Healthy Exomes (HEX) browser, https://www.alzforum.org/exomes/hex
Single-nuclei Brain RNA-seq expression browser, http://ngi.pub/snuclRNA-seq/
UniProt, https://www.uniprot.org/
pyGenomeTracks, https://github.com/deeptools/pyGenomeTracks
ggplot2: https://ggplot2.tidyverse.org/
Acknowledgements
The authors thank the personnel of the VIB Neuromics Support Facility of the VIB-UAntwerp Center for Molecular Neurology and the personnel of the Human Biobank of the VIB NBD group. The research was funded in part by the Alzheimer Research Foundation (SAO-FRA), the Research Foundation Flanders (FWO), and the University of Antwerp Research Fund, Belgium. FK is a supported by a PhD fellowship from the University of Antwerp Research Fund. We thank the high-performance computing service of the University of Lille. We thank all the CEA-CNRGH staff who contributed to sample preparation and genotyping for GSA genotyping for their excellent technical assistance. This work was supported in part by a grant (European Alzheimer DNA BioBank, EADB) from the EU Joint Programme – Neurodegenerative Disease Research (JPND). Inserm UMR1167 is also funded by Inserm, Institut Pasteur de Lille, the Lille Métropole Communauté Urbaine, the French government’s LABEX DISTALZ program (development of innovative strategies for a transdisciplinary approach to Alzheimer’s disease).
Footnotes
↵$ The following members of the BELNEU Consortium have contributed to the clinical and pathological phenotyping and follow-up of the Belgian patient cohorts: Johan Goeman, Roeland Crols (Hospital Network Antwerp (ZNA) Middelheim and Hoge Beuken, Antwerp, Belgium); Dirk Nuytten (Hospital Network Antwerp (ZNA) Stuivenberg, Antwerp, Belgium); Rudy Mercelis (Antwerp University Hospital, Edegem); Mathieu Vandenbulcke (University of Leuven and University Hospitals Leuven, Leuven, Belgium); Anne Sieben, Jan L. De Bleecker, Patrick Santens (University Hospital Ghent, Ghent, Belgium); Jan Versijpt, Alex Michotte (University Hospital Brussels, Brussels, Belgium); Olivier Deryck, Ludo Vanopdenbosch, Bruno Bergmans (AZ Sint-Jan Brugge, Bruges, Belgium); Christiana Willems, Nina De Klippel (Jessa Hospital, Hasselt, Belgium); Jean Delbeck (General Hospital Sint-Maria, Halle); Adrian Ivanoiu (Saint-Luc University Hospital, Université Catholique de Louvain, Louvain-la-Neuve, Belgium); and Eric Salmon (University of Liege and Memory Clinic, CHU Liege, Liege, Belgium).