Abstract
Recently, large-scale case-control analyses have been prioritized in the study of ALS. Yet the same effort has not been put forward to investigate additive moderate phenotypic effects of genetic variants in genes driving ALS risk, despite case-level evidence suggesting a potential oligogenic risk model. Considering its direct clinical and therapeutic implications, a large-scale robust investigation of oligogenicity in ALS is greatly needed.
Here, we leveraged the Project MinE ALS Sequencing Consortium genome sequencing datasets of individuals with ALS (n = 6711) and controls (n = 2391) to identify signals of association between oligogenicity in known ALS genes (n=26) and disease risk, as well as clinical outcomes.
Applying regression models to a discovery and replication cohort, we observed that the risk imparted from carrying rare variants in multiple known ALS genes was significant and was greater than the risk associated with carrying only a single rare variant, both in the presence and absence of variants in the most well-established ALS genes, such as C9orf72. However, in contrast to risk, the relationships between oligogenicity and ALS clinical outcomes, such as age of onset and survival, might not follow the same pattern as we did not observe any associations.
Our findings represent the first large-scale, case-control assessment of oligogenic associations in ALS to date and confirm that oligogenic events involving known ALS risk genes are indeed relevant for the risk of disease in approximately 6% of ALS but not necessarily for disease onset and survival. This must be considered in genetic counselling and testing by ensuring the use of comprehensive gene panels even when a potential pathogenic variant has already been identified. Moreover, in the age of stratified medication and gene therapy, it supports the need of a complete genetic profile for the correct choice of therapy in all ALS patients.
Introduction
As an intermediary between monogenic and polygenic disease transmission, oligogenic inheritance refers to the additive moderate phenotypic effect of genetic variants in a few genes that together drive disease presentation. Establishing whether oligogenicity plays a role in the development of a disease is important for more accurate diagnosis, as it may clarify the role of low penetrance variants or explain atypical clinical presentations. It could also drive the development of treatment by highlighting multiple potential targets. Therefore, identifying an oligogenic component of risk in a disease considered to be predominantly monogenic has direct implication for genetic counseling and risk assessment1. Although not well studied, oligogenicity has been described in a selected number of diseases such as Bardet-Biedl syndrome, Charcot-Marie-Tooth (CMT), and long QT syndrome2–8. Interestingly, all three forms of transmission — monogenic, polygenic and oligogenic — have been reported in amyotrophic lateral sclerosis (ALS).
In recent years, large-scale case-control analyses have been prioritized in the study of monogenic and polygenic forms of ALS, often including thousands of samples to maximize the statistical power of discovery9–16. However, the same effort has not been put forward to investigate oligogenic events driving ALS risk. Case-level evidence has suggested an oligogenic risk model in ALS17,18, which conforms with the proposed multistep hypothesis of ALS that describes multiple molecular events — including the possibility of multiple genetic variants — occurring to trigger ALS onset19,20. Yet large case-control analyses have not been performed using ALS cohorts to confirm the role of oligogenicity in the disease. Previous reports of the phenomenon had modest sample sizes, or were limited to case studies that did not include control cohorts, and did not directly assess the disease risk or influence on clinical outcomes imparted by carrying multiple variants in ALS genes with respect to carrying only one17,18,21–27.
Here, we leveraged the Project MinE ALS Sequencing Consortium large-scale, genome sequencing datasets of individuals with ALS (n = 6711) and controls (n = 2391) to identify signals of association between oligogenicity in known ALS genes and disease risk. Further, we investigated whether carrying multiple rare variants influences clinical features of disease, including age of onset and survival. The presented analyses represent the first large-scale, case-control assessment of oligogenic associations in ALS to date.
Methods
Study cohort and sample sequencing
Genome sequencing data obtained from the Project MinE ALS Sequencing Consortium were comprised of a discovery subset (individuals with ALS = 4518 and controls = 1821) and a replication cohort (individuals with ALS = 2193 and controls = 570) corresponding to the two main releases of Project MinE (data freeze 1 and 2). Details regarding sequencing methodology and quality control were previously described28,29. Briefly, all samples were sequenced using either the Illumina HiSeq 2000 platform or Illumina HighSeq X platform (San Diego, CA, USA), and sequencing reads were aligned to the hg19 reference genome to call single nucleotide variants, insertions, and deletions. Quality control was performed at both an individual and variant level, and included assessment of read depth and coverage, ancestry-defining principal component analysis (PCA), and identity-by-descent analysis. Controls were matched to the individuals with ALS based on age, sex, and geographical region.
Post quality control, 4299 individuals with ALS and 1815 controls from the Project MinE discovery subset, and 2057 individuals with ALS and 513 controls from the replication cohort were retained for analysis. Genetic data and clinical outcomes, including age of ALS onset and survival period, were available for the ALS patients. Survival period was defined as years from diagnosis to death, or years from diagnosis to last follow-up, as appropriate.
Variant annotation and filtering
Variants were annotated using the Ensembl variant effect predictor (VEP) and Annovar30,31. Variants were further annotated based on minor allele frequencies (MAFs) using the Genome Aggregation Database (gnomAD) v2.1.1 non-neurological dataset32. Only variants considered rare, with an MAF < 0.01 in both gnomAD and the Project MinE controls, were retained for further analysis.
Twenty-six genes previously associated with ALS in the literature were selected for the analysis, as previously described33, and are hereafter referred to as known ALS genes (Supplemental Table 1). Only genes with at least two publications reporting rare coding variants in individuals with ALS were included. Genes with unclear inheritance patterns or limited or refuted gene-disease validity classification from the ClinGen ALS Gene Curation Expert Panel (GCEP) as of January 2022 were excluded (Clinical Genome ALS). Genes were also subclassified based on the strength of evidence regarding their association with ALS. “Established ALS genes” included those with a definitive relationship with ALS based on curation by the ClinGen ALS GCEP. All other genes were considered “ALS-associated genes”. For C9orf72 and ATXN2 only the ALS-associated repeat expansions were included, while for the remaining 24 known ALS genes only variants within the protein coding regions were retained.
The rare, protein-coding variants were binned into three classes: 1) synonymous variants; 2) missense variants (missense single nucleotide variants and in-frame insertions or deletions); and 3) protein truncating variants (PTVs; stop lost and gained, start lost, transcript amplification, frameshift, transcript ablation, and splice acceptor and donor variants).
Pathogenic repeat expansion detection
All genome samples were also assessed for the C9orf72 GGGGCC and AXTN2 CAG repeat expansions using ExpansionHunter34. ExpansionHunter has been previously validated for the repeat expansions in both C9orf72 and ATXN235–37. A hexanucleotide repeat expansion of >30 copies in C9orf72 is considered pathogenic for ALS, whereas a trinucleotide intermediate repeat expansion of 29-32 copies in ATXN2 is considered a risk factor for ALS38–40.
Statistical analyses
To model the influence of carrying a single variant in a known ALS gene (singleton) or carrying at least one variant in more than one known ALS genes (oligogenic) on ALS risk and clinical outcomes, regression analyses were applied. More specifically, risk of ALS as a function of singleton or oligogenic carrier status were assessed in individuals with ALS and controls using logistic regressions adjusting for sex, ten ancestry defining PCs, and total genetic load (summation of synonymous, missense, and PTVs, as previously described)9. Associations between ALS age of onset and singleton or oligogenic carrier status in individuals with ALS were assessed using linear regressions adjusting for sex, site of onset, ten ancestry defining PCs, and total genetic load. Finally, associations between ALS survival period and singleton or oligogenic carrier status in individuals with ALS were assessed using Cox proportional hazard regressions adjusting for the same co-variates described above.
To control for any residual synonymous inflation not fully corrected by population structure based PCs, and to determine whether the C9orf72 or ATXN2 repeat expansions were driving associations, all regression models assessed associations with five variant type bins: 1) synonymous variants; 2) missense variants and PTVs; 3) missense variants, PTVs, and ATXN2 repeats; 4) missense variants, PTVs, and C9orf72 repeats; and 5) missense variants, PTVs, ATXN2 repeats and C9orf72 repeats. Statistical analyses were performed using the logistf library from R statistical software [4.2.2]41 in R Studio [1.1.414]. Data visualization was performed using the ggplot2 R package (v3.3.6)42.
Results
Oligogenic risk of ALS
Following the assessment for rare single nucleotide variants and repeat expansions in known ALS genes, we determined the number of singleton and oligogenic carriers (Table 1). In total, 1161 individuals with ALS (n = 4299) were considered singleton carriers, defined as carrying one non-synonymous, rare variant in one known ALS gene, in the Project MinE discovery subset. Whereas 255 individuals with ALS were considered oligogenic carriers, defined as carrying at least one non-synonymous, rare variant in two or more known ALS genes, in the Project MinE discovery subset. Similarly, 540 individuals with ALS (n = 2057) and 131 individuals with ALS were considered singleton and oligogenic carriers, respectively in the Project MinE replication subset.
Using our model, an enrichment analysis indeed demonstrated that carrying ≥ 2 rare (MAF < 0.01), non-synonymous variants in multiple ALS genes was significantly associated with disease risk, with greater odds than observed in the singleton analysis (Figure 1). The increased odds relative to singleton carrier status was observed for oligogenic carriers when only missense variants or PTVs were considered (singleton OR = 1.22 [1.09-1.37], p = 1.77e-03; oligogenic OR = 1.46 [1.20-1.78], p = 1.70e-04), as well as when C9orf72 repeat expansions (singleton OR = 1.42 [1.34-1.51], p = 3.21e-08; oligogenic OR = 1.82 [1.50-2.22], p = 4.99e-10), AXTN2 repeat expansions (singleton OR = 1.22 [1.09-1.37], p = 2.50e-03; oligogenic OR = 1.65 [1.23-2.21], p = 5.00e-04), and both repeat expansions (singleton OR = 1.49 [1.30-1.71], p = 1.22e-09; oligogenic OR = 2.27 [1.69-3.05], p = 1.70e-09) were included in the analysis.
Upon confirmation of our risk analysis using the Project MinE replication subset, we observed that singleton carrier status was only significantly associated with ALS when missense variants or PTVs in known ALS genes and the C9orf72 and ATXN2 repeat expansions were included in the analysis (OR = 1.30 [1.05-1.61], p = 1.70e-02; Supplemental Figure 1). In contrast, oligogenic carrier status was significantly associated with ALS when only missense variants or PTVs were considered (OR = 1.39 [1.02-1.90], p = 4.00e-02), as well as when C9orf72 repeat expansions (OR = 1.57 [1.15-2.15], p = 4.50e-03), AXTN2 repeat expansions (OR = 1.49 [1.07-2.08], p = 1.30e-02), and both repeat expansions (OR = 1.73 [1.24-2.42], p = 5.70e-04) were included in the analysis.
We also assessed whether oligogenic carrier status of variants with lower MAFs were associated with an increased risk of ALS in comparison to singleton carrier status. In a similar manner to the MAF < 0.01 rare variant assessment, oligogenic carrier status for variants of MAF < 0.001, MAF < 0.0001, and AC = 1 were significantly associated with disease risk, with greater odds than observed in the singleton analyses of the Project MinE discovery subset (Supplemental Figure 2). These findings were replicated in the analyses of singleton and oligogenic carrier statuses of variants with lower MAFs in the Project MinE replication cohort (Supplemental Figure 1).
Oligogenic influence on ALS clinical outcomes
From the Project MinE discovery cohort, 4299 individuals with ALS had their age of onset and survival periods captured. Summary of the clinical outcomes of individuals with ALS carrying zero, singleton, or oligogenic non-synonymous, rare variants in known ALS genes are presented in Table 1.
Linear regressions adjusting for sex, site of onset, ten ancestry defining PCs, and total variant count were applied to determine whether singleton or oligogenic carrier statuses were associated with ALS age of onset. We found that carrying a singleton, rare (MAF < 0.01), non-synonymous variant was significantly associated with lower age of onset when the C9orf72 repeat expansion was included in the analysis (β = -1.40 [-2.24 - -0.56], p = 1.4e-03; Figure 2). The singleton association was also observed for variants with an MAF < 0.001 and MAF < 0.0001 when the C9orf72 repeat expansion was included in the analyses and for MAF < 0.0001 when the C9orf72 repeat expansion was excluded (p = 0.031) (Supplemental Figure 3).
Carrying oligogenic, rare (MAF < 0.01), non-synonymous variants in multiple known ALS genes was only marginally significantly associated with age of onset when the C9orf72 repeat expansion was included in the analysis (p = 0.042, Figure 2), while a lack of oligogenic association was observed for variants of lower MAF (Supplemental Figure 3).
Similarly, Cox proportional hazard models adjusting for sex, site of onset, ten ancestry defining PCs, and total variant count, were applied to determine whether singleton or oligogenic carrier statuses were associated with ALS survival period. We found that carrying a singleton, rare (MAF < 0.01), non-synonymous variant was not significantly associated with ALS survival period (Figure 3); however, carrying a singleton missense variant or PTV of MAF < 0.001 and MAF < 0.0001, or a C9orf72 repeat expansion were significantly associated with an increased hazard ratio (p0.001 = 0.041 and p0.0001 = 0.0056, Supplemental Figure 4).
Carrying oligogenic, rare, non-synonymous variants in multiple known ALS genes was not associated with decreased or increased survival period, for any MAF classes (Supplemental Figure 4).
Gene involvement in oligogenic events
In the Project MinE discovery cohort, individuals with ALS had the highest frequency of oligogenic carriers when one of the rare variants carried was in NEK1 (1.84%) closely followed by ANXA11 (1.81%) and the C9orf72 repeat expansion (1.48%; Figure 4A,B). However, ANXA11 and NEK1 were also the genes with the highest oligogenic carrier frequencies in controls (1.37% and 0.99%, respectively), whereas the C9orf72 repeat expansion was only observed in one control with another rare variant in a known ALS gene, specifically VAPB (Supplemental Figure 5-6).
The only genes for which oligogenic carriers with ALS but no controls were observed carrying at least one rare variant in the gene along with a rare variant in another known ALS gene were CHMP2B, PFN1, SOD1, and UBQLN2 (Supplemental Figure 5). Oligogenic carriers with ALS with two or more rare variants in the most well-established ALS genes are displayed in Figure 4B. Although these are considered very rare events, 129 carriers with ALS were observed across the Project MinE discovery cohort (2.8%). Oligogenic carriers with ALS with one or more rare variant(s) in a well-established ALS gene in addition to one or more rare variant(s) in an ALS-associated gene are displayed in Figure 4C.
Discussion
While evidence from previous case reports has suggested that there could be an oligogenic burden to developing ALS, our study is the first at scale involving large discovery and replication datasets of thousands of people with ALS and controls. Across the two subsets of the Project MinE Sequencing Consortium, 6% of individuals with ALS were considered oligogenic carriers. This proportion was relatively comparable to the 6.82% of Australian individuals with sporadic ALS and 3.8% of North American individuals with ALS that were found to be oligogenic carriers in cohorts of more modest sample size18,21. In this oligogenic subgroup of people with ALS, we observed that the risk imparted from carrying rare variants in multiple known ALS genes was significant and was greater than the risk associated with carrying only a single rare variant, in principle consistent with the multi-step hypothesis of ALS19,20. Further, our results have direct implication for genetic counselling and testing. Having shown that variants in more than one known ALS gene affect risk in approximately 6% of patients, it follows that all ALS-associated genes must be included in clinical genetic testing even when a potential pathogenic variant has already been identified. Without the use of comprehensive gene panels, complications may arise in cases of familial testing, whereby even in families for which there is a known pathogenic gene variant, the lack of knowledge regarding any additional variants contributing to disease risk may result in false reassurance in the case of a negative genetic test. Additionally, a lack of understanding regarding all potentially pathogenic variants carried by ALS patients may limit their enrolment in precision medicine-based clinical trials.
In contrast to the risk associations, our results suggested that the association between oligogenicity and clinical outcomes of ALS remains unclear and might not involve the same genes. The only association between ALS age of onset and carrying more than one rare variant in a known ALS gene was observed when considering missense variants and PTVs with an MAF < 0.01, and the C9orf72 repeat expansion (p = 0.042). Yet, this association was only nominally significant, which — in addition to the absence of further oligogenic associations with ALS age of onset — results in a lack of clarity regarding the validity of this relationship. We only observed an association between singleton carriers and survival period when the C9orf72 repeat expansion was included in the analysis, which may be explained by the C9orf72 pathogenic repeat expansion being associated with a decreased survival period individually43,44. Overall, consistent with recent reports11,45,46, these results suggest the genetic architecture underlying ALS risk is decoupled from that underlying survival. Moreover, considering that every modification of risk is expected to correspond to an effect on age of onset16,47, our results suggest that this relationship might only explain a limited proportion of the age of onset variability in ALS and the presence of concurring independent mechanisms with large effect is possible.
Oligogenic events involving the C9orf72 repeat expansion were particularly frequent among the individuals with ALS (1.48%), the statistical power from which may contribute to our observations of the repeat’s contribution to clinical outcomes described above. As one of the most commonly inherited forms of ALS in Europeans48, we sought to examine whether the oligogenic effect was primarily driven by this repeat expansion. Excluding C9orf72, our risk assessments confirmed that oligogenic events involving missense variants and PTVs, both in the presence and absence of the ATXN2 repeat expansion, conferred significant risk to ALS. While a previous assessment of oligogenicity involving C9orf72 suggested the repeat expansion was sufficient to cause ALS alone49, our results suggested an increased risk from oligogenic events involving C9orf72 in comparison to C9orf72 singleton events. Oligogenic events could help explain the recent incomplete penetrance estimates of C9orf72 expansions50. Further, Ciura et al identified pathways by which the C9orf72 and ATXN2 pathogenic repeat expansions may genetically interact, suggesting an actual biological impact of oligogenicity involving C9orf72 in ALS pathogenesis51. Additional functional analyses will be required to determine how variants in multiple known ALS genes may synergistically induce pathology.
Many other known ALS genes were also more commonly observed oligogenic events carried by individuals with ALS than controls. Oligogenic events in the genes CHMP2B, PFN1, SOD1, and UBQLN2 were entirely unique to individuals with ALS in the discovery subset (absent in controls). Oligogenic events in the genes DNAJC7, FUS, HNRNPA1, TUB4A4, and VCP or the C9orf72 repeat expansion were only each observed in one control (0.055%). Of these ten genes, seven are established ALS genes — referring to those that have been classified as having a ‘definitive’ relationship with ALS according to the ClinGen ALS GCEP. While it could be proposed that oligogenic events involving established ALS genes are driving the observed risk associations, a large proportion of individuals with ALS carried oligogenic events involving only ALS-associated genes — referring to genes that have not been defined as having a definitive relationship with ALS according to the ClinGen ALS GCEP — or involving at least one variant in an established ALS gene and at least one variant in an ALS-associated gene (2.8%). We suspect that these oligogenic events involving variants in ALS-associated genes may encompass a large subset of cases in which only a singleton variant may not have contributed enough risk to drive disease onset. Yet how the two variants within each identified oligogenic event interact remains unknown, and it is possible that some events may represent cases of genetic modification — encompassing a variant driving disease risk in combination with a variant modifying disease presentation — as has been observed for oligogenic cases of complex neuropathy and retinal degeneration, among other diseases8,52–56.
Collectively, our results reveal that oligogenic events contribute significant risk to ALS, both in the presence and absence of variants in the most well-established ALS genes, such as C9orf72. Moreover, the observed lack of influence of oligogenicity on survival of ALS supports the recent hypothesis of decoupling between mechanisms underlying the risk of ALS and its progression. Although our study represents the largest systematic analysis of oligogenicity to date, even greater sample sizes and variant effect studies are required to determine the exact consequences of carrying multiple ALS-associated variants on disease progression and outcomes. Nevertheless, our findings confirm that oligogenic events are relevant in ALS, which may be of particular importance when the variants involved have uncertain pathogenic significance or are observed in genes with probable ALS associations. The potential implications of these variants on ALS clinical correlates and molecular pathology warrant further exploration. In the age of stratified medication and gene therapy, implicating oligogenicity in a relevant proportion of ALS patients supports the need for a complete genetic profile for accurate genetic counselling and the correct choice of therapy in all ALS patients.
Data availability
Individual whole-genome sequencing data are available and can be requested through Project MinE (https://www.projectmine.com/research/data-sharing/). A data access committee controls access to raw data, ensuring a FAIR data setup (https://www.datafairport.org). Details on the frequencies and gene burden test results are available on the ProjectMinE databrowser 29 (http://databrowser.projectmine.com).
Funding
This is an EU Joint Programme-Neurodegenerative Disease Research (JPND) project. The project is supported through the following funding organizations under the aegis of JPND– http://www.neurodegenerationresearch.eu/ [United Kingdom, Medical Research Council (MR/L501529/1 and MR/R024804/1) and Economic and Social Research Council (ES/L008238/1)]. AAC is a NIHR Senior Investigator. AAC receives salary support from the National Institute for Health and Care Research (NIHR) Dementia Biomedical Research Unit at South London and Maudsley NHS Foundation Trust and King’s College London. The work leading up to this publication was funded by the European Community’s Health Seventh Framework Program (FP7/2007–2013; grant agreement number 259867) and Horizon 2020 Program (H2020-PHC-2014-two-stage; grant agreement number 633413). This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement no. 772376–EScORIAL. This study represents independent research part funded by the NIHR Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, King’s College London, or the Department of Health and Social Care. AAD is supported by the Canadian Institute of Health Research Banting Postdoctoral Fellowship Program. AI is funded by South London and Maudsley NHS Foundation Trust, MND Scotland, Motor Neurone Disease Association, National Institute for Health and Care Research, Spastic Paraplegia Foundation, Rosetrees Trust, Darby Rimmer MND Foundation, the Medical Research Council (UKRI) and Alzheimer’s Research UK. SMKF is supported by grants from ALS Canada, Brain Canada, the Michael J. Fox Foundation, and the Montreal Neurological Institute-Hospital. Project MinE Belgium was supported by a grant from IWT (n° 140935), the ALS Liga België, the National Lottery of Belgium and the KU Leuven Opening the Future Fund. AAK is funded by the ALS Association Milton Safenowitz Research Fellowship, The Motor Neurone Disease Association (MNDA) Fellowship, The Darby Rimmer Foundation, and The NIHR Maudsley Biomedical Research Centre.
Acknowledgments
Samples used in this research were in part obtained from the UK National DNA Bank for MND Research, funded by the MND Association and the Wellcome Trust. Part of the samples were obtained from The Project MinE and MND centres internationally. We thank people with MND and their families for their participation in this project. The authors acknowledge use of the King’s Computational Research, Engineering and Technology Environment (CREATE) (https://create.kcl.ac.uk), which is delivered in partnership with the National Institute for Health and Care Research (NIHR) Biomedical Research Centres at South London and Maudsley and Guy’s and St. Thomas’ NHS Foundation Trusts, and part-funded by capital equipment grants from the Maudsley Charity (award 980) and Guy’s and St. Thomas’ Charity (TR130505). We also acknowledge Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (United Kingdom), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome Trust.