Abstract
Alzheimer’s Disease (AD) is a highly prevalent neurodegenerative disorder. Despite increasing evidence of important metabolic dysregulation in AD, the underlying metabolic changes that may impact amyloid plaque formation are not understood, particularly for late onset AD. This study analyzed genome-wide association studies (GWAS), transcriptomics and proteomics data obtained from several data repositories to obtain differentially expressed (DE) multi-omics elements in mouse models of AD. We characterized the metabolic modulation in these datasets using gene ontology, and transcription factor, pathway and cell-type enrichment analysis. A predicted lipid signature was extracted from genome-scale metabolic networks (GSMN) and subsequently validated in a lipidomic dataset derived from cortical tissue of ABCA7-null mice, a mouse model of one of the genes associated with late onset AD. Moreover, a metabolome-wide association study (MWAS) was performed to further characterize the association between dysregulated lipid metabolism in human blood serum and AD.
We found 203 DE transcripts, 164 DE proteins and 58 DE GWAS-derived mouse orthologs associated with significantly enriched metabolic biological processes. Lipid and bioenergetics metabolic pathways were significantly over-represented across the AD multi-omics datasets. Microglia and astrocytes were significantly enriched in the lipid-predominant AD-metabolic transcriptome. We also extracted a predicted lipid signature that was validated and robustly modelled class separation in the ABCA7 mice cortical lipidome, with 11 of these lipid species exhibiting statistically significant modulations. MWAS revealed 298 AD single nucleotide polymorphisms (SNP)-metabolite associations, of which 70% corresponded to lipid classes.
These results support the importance of lipid metabolism dysregulation in AD and highlight the suitability of mapping AD multi-omics data into GSMNs to identify metabolic alterations.
1 Introduction
Alzheimer’s Disease (AD) is a neurodegenerative disorder prevalent in later life characterized by amyloid deposition, hyperphosphorylated tau aggregation into neurofibrillary tangles and a sustained neuroinflammatory response (DeTure & Dickson 2019). With the proportion of the population over 65 years of age increasing annually, a mechanistic understanding of the disease is urgently needed (Xie et al. 2020). There are several emerging lines of evidence highlighting the importance of metabolic dysfunctions in AD. Impaired glycolysis and bioenergetics shifts towards fatty-acid and amino-acid metabolism seem to indicate that mitochondrial dysfunction or substrate switch play a role in AD pathogenesis (Perez Ortiz & Swerdlow 2019). Cholesterol metabolism can also exert lipotoxic effects in the AD brain via ceramide production modulation (Cutler et al. 2004). Furthermore, there are several genes linked to AD onset and progression that are also related to brain lipid metabolism. The apolipoprotein epsilon4 (APOE4) allele, the strongest risk factor for AD development, is known to cause significant disruptions in brain lipid homeostasis in both human carriers and transgenic animals (Fernandez et al. 2019). Similarly, triggering receptor expressed on myeloid cells-2 (TREM2), another gene strongly associated with AD, actively undergoes lipid-sensing and consequently induces changes in the microglia lipidome (Nugent et al. 2020). Finally, loss-of-function variant of the ATP - binding-casette, subfamily-A, member-7 gene (ABCA7) has been strongly associated with late-onset AD (De Roeck et al. 2019). ABCA7 has been implicated in AD pathology through amyloid-precursor protein (APP) endocytosis, impaired amyloid-beta (Aβ) clearance and, although not fully elucidated, lipid metabolism dysregulation via sterol regulatory element binding protein 2 (SREBP2) (Aikawa et al. 2018).
Despite all the accumulating evidence, mechanistic explanations of AD have mostly been centered around amyloid or tau-centric hypotheses, and therefore much remains to be understood regarding the underlying metabolic processes (Johnson et al. 2020). Multi-omics approaches have the potential to overcome the limitations of the current knowledge in this field. These approaches can provide a comprehensive view of a particular pathophysiological state by interrogating molecular changes across several levels of biological functions (Canzler et al. 2020). A promising methodological approach relevant to the study of metabolites is genome scale metabolic networks (GSMN), which uses genomics and transcriptomics data to predict metabolic pathway modulations (Pinu et al. 2019). GSMN also allow for the interpretation of multi-omics data via metabolic subnetwork curation, thus providing an attractive metabolic framework which can be effectively validated using metabolomics and lipidomics data (Frainay & Jourdan 2017).
The aim of this study was to validate the presence of metabolic perturbations in AD using multi-omics pathway-based integration and extraction of metabolic subnetworks from open source data (Figure 1). We found consistent perturbations of lipid and energy metabolism across three AD multi-omics datasets compiled from previous studies, from which we extracted 133 lipid species predicted to be dysregulated in AD which we then validated in an ABCA7 knock-out (KO) mouse dataset acquired with ultraperformance liquid chromatography-mass spectrometry (UPLC-MS). The importance of this association was explored further by performing a metabolome-wide association study (MWAS) of the blood plasma metabolome for AD risk loci carriers in two human cohorts using 1H NMR spectroscopy.
This study also highlights the suitability of interpreting multi-omics data in the context of GSMNs, as the predicted lipid terms and species were not only found in the cortical ABCA7 lipidome, but its associated multivariate model robustly separated ABCA7 mice from their wild-type (WT) litter-mates.
2 Materials and Methods
2.1 Data collection of AD mouse brain transcriptomics and proteomics data
The gene expression omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/) (Clough & Barrett 2016) was queried on 15/06/20 for gene expression studies using “Alzheimer’s Disease” as our search term. The following criteria were employed for dataset selection: Mus musculus organism, expression profiling by array as study-type, tissue as attribute, brain tissue expression compared to WTs and a minimum of 3 animals per condition. This search yielded 11 datasets (GSE25926, GSE53480, GSE60460, GSE77574, GSE77373, GSE109055, GSE111737, GSE113141, GSE141509 and GSE74441) from 9 studies (Aydin et al. 2011; Polito et al. 2014; Hamilton et al. 2015; Marsh et al. 2016; Wang et al. 2017; Faivre et al. 2018; Hou et al. 2018; Fang et al. 2019; Preuss et al. 2020). The proteomics identifications (PRIDE) repository (Jones et al. 2006) was queried on 01/07/20 for proteomics studies applying the following filters: Alzheimer’s Disease as disease, brain as organism-part and Mus musculus as organism. Datasets comparing the AD proteome against WTs, with minimum 3 animals per condition and with deposited proteinGroups.txt files were included. This search yielded 4 datasets (PXD007795, PXD011068, PXD012238, and PXD007813) from 4 publications (Palomino-Alonso et al. 2017; Hamezah et al. 2019; Kim et al. 2019; Lachen-Montes et al. 2019). However, differences in protein expression failed to reach statistical significance after controlling for the false discovery rate (FDR) in two studies (Palomino-Alonso et al. 2017; Hamezah et al. 2019), and thus their corresponding datasets were excluded. A description of all included datasets can be found in Table 1.
2.2 Differential expression (DE) analysis of AD mouse transcriptomics and proteomics data
Processed transcriptomics datasets were retrieved from GEO using the GEOquery Bioconductor-based package (version 2.54.1) (Davis & Meltzer 2007) in the R environment, version 3.6.2 (https://www.R-project.org). Datasets were log-2 transformed and graphically inspected to verify appropriate data normalization; probes that were not mapped to any genes, mapped to more than one gene and probes with missing values (N/As) were filtered out. Differential expression analysis was performed using significance analysis of microarray (SAM) with samr package (version 3.0) (Tusher et al. 2001) within the R environment. SAM can control for the total number of false positives through both gene specific t-tests and a maximum local tolerable FDR (Tusher et al. 2001). Upon 200 permutation-based SAM analysis, multiple testing correction was applied by adjusting the total false positives to 3% and the local FDR for 90th percentile of DE genes to 5% in every dataset.
Proteomics datasets were analyzed using Perseus (version 1.6.5) (Tyanova et al. 2016). Initially, proteins only identified by reverse-decoy, site or known contaminants were excluded, as well as proteins with 2/3 of replicates per group reporting N/As. Protein intensities were then log-2 transformed and remaining N/As were replaced using normal distribution values, as most proteomics studies assume N/As are indicative of low-expression proteins (Tyanova et al. 2016). DE proteins were determined using a two-tailed Student’s t-test with a 200 FDR permutation-based method and a 0.050 p-value cut-off (Tusher et al. 2001). In isobaric tag for relative and absolute quantification (iTRAQ) experiments, an additional fold change (FC) 1.17-0.83 cut-off was introduced to determine DE proteins. iTRAQ experiments are prone to interference/ratio distortion (Pappireddi et al. 2019), and thus a combination of p-value, FDR and FC cut-off is the most suitable approach to detect biological variability (Oberg & Mahoney 2012).
2.3 AD genome-wide association studies (GWAS) gene-based analysis and mouse ortholog determination
AD GWAS summary statistics were obtained from a meta-analysis of the UK-Biobank and International Genomics of Alzheimer’s Project (IGAP) cohorts, which evaluated GWAS with AD by-proxy in 388364 individuals across both cohorts (Marioni et al. 2018). Summary statistics (ID: GCST005922) were retrieved from the NHGRI-EBI GWAS-Catalog (https://www.ebi.ac.uk/gwas/) (Buniello et al. 2019) on 07/07/2020.
Gene-based analysis was performed with multi-marker analysis of genomic annotation (MAGMA, version 1.07bb) (de Leeuw et al. 2015), using gene locations from the genome reference consortium-human build-37 (GRCh37, NCBI) and a reference panel of European ancestry from the 1000 genomes project phase-3 (Auton et al. 2015). MAGMA provides a combined p-statistic of genes significantly associated with single nucleotide polymorphisms (SNPs) (de Leeuw et al. 2015); we used a combined 0.050 p-value as a significance cut-off. Significant genes were imported into Ensembl –Biomart on 20/07/2020 (version GrCh37.13; https://grch37.ensembl.org/biomart/martview) to determine high confidence mouse orthologs (Zerbino et al. 2018). Upon excluding genes associated with either several or no mouse orthologs, only those exhibiting one-to-one bidirectional orthology with 60% protein sequence similarity across both species were considered high-quality mouse orthologs (Mancuso et al. 2019).
2.4 Gene ontology (GO) analysis and AD-metabolic multi-omics extraction
DE transcripts, proteins and GWAS-orthologs were initially mapped onto the BioCyc Mus musculus GSMN (Caspi et al. 2016) using MetExplore, which provides a framework for metabolic subnetwork extraction(Cottret et al. 2018). DE transcripts, protein-coding and GWAS-orthologs genes that were not mapped onto the GSMN were removed; the resulting omics lists are referred to as “all-mapped” data throughout this study. Significantly enriched functional terms were identified in all-mapped AD omics datasets using the database for annotation, visualization and integrated discovery (DAVID, version 6.8) (https://david,neifcrf.gov/) (Dennis et al. 2003). and the Mus musculus genome as background. GO analysis was performed using a hypergeometric test with an EASE score of 0.1 and a count threshold of 2. Terms with both raw p-value and Benjamini-Hochberg (B-H) FDR-adjusted p-value (α) below 0.050 were considered statistically significant. Metabolism - related transcripts, proteins and GWAS-orthologs were manually extracted from significantly enriched biological processes (BP).
2.5 Transcription Factor (TF) enrichment analysis
TF enrichment analysis was performed on all-mapped AD genes and proteins, as well as their metabolic counterparts, using ChIP-X enrichment analysis 3 (ChEA3) (https://maayanlab.cloud/chea3/). ChEA3 performs enrichment analysis based on TF’s target genes coverage using the Fishers exact test and B-H adjusted p-value at 0.050 threshold (Keenan et al. 2019). The ENCODE library was chosen as our reference set, as it incorporates TF-target associations from human and mouse data (Davis et al. 2018). Significantly enriched TF were manually cross-referenced with the mouse transcription factor atlas to verify its mouse tissue expression (Zhou et al. 2017).
2.6 Pathway enrichment analysis of AD-metabolic multi-omics data
AD metabolic transcripts, proteins and GWAS-orthologs lists were mapped onto the BioCyc Mus musculus GSMN (Caspi et al. 2016) in MetExplore (Cottret et al. 2018). Metabolic pathway enrichment analysis was performed using hyper-geometric tests with right-tailed Fishers exact tests with B-H correction for multiple testing (α=0.050).
2.7 Expression-weighted cell-type enrichment (EWCE) of AD-metabolic multi-omics data
EWCE was conducted on AD-metabolic transcriptomics, proteomics and GWAS-orthologs datasets using the EWCE package in R (version 0.99.2)(Skene & Grant 2016). EWCE computes an enrichment p-value that describes the probability of an input gene list having a meaningful expression within a specific cell-type upon 10000 random permutations (Skene & Grant 2016). A cortical and hippocampal single-cell RNA-sequencing dataset with large coverage was used as background (Zeisel et al. 2015); B-H adjusted p-values were calculated using the R base package. A conditional EWCE analysis was also performed on the combined AD-metabolic multi-omics dataset to probe the relationships between enriched cell-types, using an approach originally developed for GWAS data analysis (Skene et al. 2018).
2.8 Metabolic subnetwork extraction
To ultimately validate lipid alterations highlighted during pathway enrichment analysis, a metabolic subnetwork containing all lipid terms or species in significantly enriched lipid pathways was mined across the AD-metabolic transcriptome and proteome using MetExplore (Cottret et al. 2018). After excluding non-lipid metabolites, a combined predicted lipid signature across the AD multi-omics datasets was created, which was visualized using MetExploreViz (Chazalviel et al. 2018). Lipid identifiers were then retrieved from LIPIDMAPS (Fahy et al. 2009).
2.9 Cortical ABCA7-KO lipidomics dataset
We also employed a lipidomics dataset of cortical extracts of 7 WT and 7 ABCA7-KO 11-months old mice, with 3 females and 4 males per group, as described previously (Aikawa et al. 2018). Lipidomic extraction was performed on ∼50mg cortex tissue using a modified Folch extraction (Su et al. 2019). Global lipidomic profiling of the cortical extracts and 3 pooled samples was acquired using a reverse-phase ultraperformance liquid chromatography - mass spectrometry (RP-UPLC-MS) on a Synapt Quadruple-Time of Flight mass spectrometer (Waters Corp., Manchester, UK) in positive and negative mode. Details of systems configuration and analytical conditions have been previously reported (Andreas et al. 2020). Data processing was performed with KniMet (Liggi et al. 2018). Briefly, signals extracted using the R library XCMS (Tautenhahn et al. 2012) were retained if present in at least 50% of the pooled samples with a Coefficient of Variation <= 20. Remaining signals were subjected to imputation of N/As using K-Nearest Neighbour (KNN), probabilistic quotient normalization (PQN) based on pooled samples, and annotation using LIPID MAPS (https://lipidmaps.org/; (Fahy et al. 2009)), retention time matching to standards and fragmentation data.
2.10 Multivariate statistical analysis
Multivariate statistical analysis was performed on both positive and negative mode for the original ABCA7-KO and validated lipid signature subsets using P-SIMCA (Umetrics, Sweden) following log-transformation of intensities and Pareto-scaling. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) models, which allow to evaluate the impact of group membership by separating the variance attributed or orthogonal to class membership into components, were created for both original datasets and validated subset in positive and negative ion mode (Griffin et al. 2020). Lipids in the validated subset in positive and negative mode with variable influence of projection (VIP) > 1 were retained for univariate analysis, as OPLS-DA generated VIP > 1 indicate specific variables with important contributions to the model (Liu et al. 2020). The suitability of the models were assessed through inspection of their R2(cum)X and Q2 values, which respectively represent the percentage of model-captured variation and predictive capability (Liu et al. 2020). Models were further validated with a 100 permutation-based test, in which the correlation coefficient for the permuted class-membership variable is plotted against the R 2(cum)X and Q2(cum) (Murgia et al. 2017).
2.11 Univariate statistical analysis
AD multi-omics lipid species that had an associated VIP score above 1 in the original ABCA7 KO lipidomics dataset underwent univariate statistical analysis using GraphPad Prism (p<0.05). Negative-mode acquired lipids underwent both a Student t-test and Mann-Whitney non-parametric test comparing genotype (p<0.05), whereas positive-mode acquired lipids were analyzed using One-way ANOVA comparing genotype and sex correcting for multiple testing using B-H method (α<0.05).
2.12 Metabolome-wide association study (MWAS) of the blood plasma metabolome for AD risk loci carriers
We performed an MWAS using nuclear magnetic resonance (NMR) spectra of blood from 3258 individuals from the Airwave Health Monitoring Study (Airwave) and the Rotterdam Study (RS) prospective cohorts (Elliott et al. 2014; Ikram et al. 2020). Ethical approval for access to the Airwave cohort was granted following application to the access committee via the Dementia Platform UK (https://portal.dementiasplatform.uk/). Access to the RS cohort was granted following access to the Management Committee and conducted under approval from the Ministry of Health, Welfare and Sport of the Netherlands. Blood samples were heparin plasma for Airwave and serum for RS. Average age at enrolment in 2004 was 40.9 years for men and 38.5 years for women in the Airwave cohort; the RS cohort mean age of recruitment was 55 for both genders in 1990 (Elliott et al. 2014; Ikram et al. 2020)
Sample preparation and metabolic profiling in these cohorts have been extensively described (Tzoulaki et al. 2019; Robinson et al. 2020). Briefly, 1H NMR solvent suppression pulse and T2-Carr-Purcell-Meiboom-Gill (CPMG) spectra were acquired per sample (Dona A.C. et al. 2014) and additionally lipid quantification was applied on the 1H NMR solvent suppression pulse spectra using a commercial package (Jiménez et al. 2018). Resonances associated with both protons attached to the fatty acid and the head group (largely choline and glycerol) along with protons from cholesterol and cholesterol esters were classified as belonging to the lipid class.
MWAS was performed using 47 unique genetic loci based on three recent GWAS meta-analysis on AD to identify AD risk loci carriers (Lambert et al. 2013; Jansen et al. 2019; Kunkle et al. 2019). These studies evaluated genome-wide associations with late-onset AD (LOAD) in individuals across the IGAP and UK-Biobank cohorts.
2.13 MWAS association statistics
We carried out a linear regression to calculate the effect estimates of each SNP with all metabolomic features (23,571 data points for original NMR spectra and 105 features for the fitted lipid data) with adjustment for age, sex, and cohort. Prior to the analysis, each cohort data was residualised using 10 principal components from genome-wide scans to adjust for population stratification. To account for multiple testing, we used a permutation-based method to estimate the Metabolome Wide Significance Level (MWSL) to consider the high degree of correlation in metabolomics datasets (Chadeau-Hyam M et al. 2010; Castagné R et al. 2017). A P-value threshold giving a 5% Family-Wise Error Rate was computed for each SNP in each data platform.
3 Results
3.1 DE analysis of mapped AD mouse transcriptomics and proteomics data
DE transcripts and proteins in the AD mouse brain with potential metabolic functions were extracted from the GEO and PRIDE repositories, respectively. Microarray expression profiles from 11 datasets were obtained from 5 distinct brain regions (frontal cortex, hippocampus, sub-ventricular zone, brain hemisphere and whole-brain) and 5 AD mouse models (APP/PS1, 5xFAD, 3xTgAD, APP-KI and Tg4510; Table 1). SAM revealed 2884 DE genes with a 90th percentile FDR below 5%. Of these, 594 were accurately mapped onto the GSMN, which were used to generate the all-mapped AD transcriptomics dataset. Furthermore, proteomics datasets from the hippocampus and olfactory bulb of 5xFAD and Tg2576 mice, respectively, were also obtained (Table 1). Permutation-based analysis revealed 1537 DE proteins (FDR p < 0.050), of which 392 were mapped onto the GSMN and therefore constituted the all - mapped AD proteomics dataset. DE proteins from two additional studies (Palomino-Alonso et al. 2017; Hamezah et al. 2019) failed to reach statistical significance upon FDR correction and thus these datasets were removed from further analysis.
3.2 Mapped high-quality mouse orthologs identification from gene-based AD GWAS analysis
High-confidence mouse orthologs of significantly associated genes in human AD GWAS studies were also identified to gain a more comprehensive view of metabolic perturbations in AD. Gene-based analysis with MAGMA (de Leeuw et al. 2015) using summary statistics from 388364 individuals in the UK-Biobank and IGAP cohorts (Marioni et al. 2018) revealed 18178 gene-level associations with human AD SNPs, of which 1664 were considered significant (combined p-value < 0.05). After applying high-quality mouse orthology criteria (Mancuso et al. 2019), 1356 high-quality orthologs of AD SNPs-associated human genes were identified. The all-mapped AD GWAS-orthologs dataset was generated by accurately mapping 258 GWAS-orthologs onto the GSMN.
3.3 Differential GO and TF enrichment analysis across AD multi-omics datasets
Potential TF and GO enrichment were investigated across the AD multi-omics datasets. More than 25% of mapped AD protein-coding genes were also found in the AD transcriptomics dataset (Figure 2A). In terms of up-stream regulation, 67 TF were significantly enriched in the all-mapped AD proteome, whereas only 17 TF were enriched in the all-mapped AD transcriptome (Table S1). Despite these differences, CCCTC-binding factor (CTCF), TAL BHLH transcription factor 1 (TAL1), MYC associated factor X (MAX) and basic helix-loop-helix family member E40 (BHLHE40) were among the top10 potential enriched TFs across both datasets (FDR p<0.050, Figure 2B, Table S1).
GO analysis revealed shared functional terms across the three datasets (Figure 2C-D). Oxidation-reduction, lipid and fatty-acid metabolic processes were enriched in all-mapped AD transcriptomics and proteomics (FDR p<0.050, Figure 2C). Six additional lipid-related BP terms were over-represented in all-mapped AD transcriptomics data, whereas the TCA cycle was only enriched in the AD proteome (FDR p<0.050, Figure 2C). Transferase, catalytic, ATP binding, kinase activity, nucleotide binding and serine/threonine-kinase activity were among the top10 over-represented terms across all-mapped AD multi-omics datasets (FDR p<0.050, Figure 2D). Cytosol and mitochondria were the cellular compartment (CC) terms most over-represented in the all-mapped AD transcriptome and proteome respectively; membrane was the only significant CC term in the AD GWAS - orthologs dataset (Table S2).
3.4 Lipid-related metabolic pathways and regulators are enriched across AD-metabolic multi-omics datasets
Given the elevated number of metabolic BP significantly enriched across the three multi-omics datasets, the DE 203 transcripts, 164 proteins and 58 GWAS-orthologs genes mapped to these BP were subjected to further characterization. The largest degree of overlap was again found between AD-metabolic transcripts and proteins (Figure 3A). Although there were substantially more enriched TFs in the AD-metabolic proteome (Table S3), lipid-associated TFs such as estrogen-related receptor alpha (ESRRα) and sterol regulatory element binding transcription factor 1 (SREBF1) were overrepresented in the AD-metabolic transcriptome and proteome (FDR p<0.050, Figure 3B). Pathway enrichment analysis reflected differential metabolic processes across the multi-omics datasets (Figure 3C). Pathways related to cholesterol, phospholipases and fatty-acid metabolism were significantly over-represented in the AD-metabolic transcriptomics dataset, whereas the AD-metabolic proteome was associated with mitochondrial processes such as TCA cycle, glycolysis and NADH electron transfer (FDR p<0.050, Figure 3C). Lipid processes such as CPD-diacylglycerol and phosphatidylglycerol synthesis were also enriched in AD-metabolic proteome (Figure 3C). Thyroid hormone metabolism was significantly enriched in the GWAS-orthologs dataset with 66% pathway coverage (Table S4).
3.5 Astrocytes and microglia are independently enriched in the AD-metabolic transcriptome
To determine whether cell-type enrichment differences across the AD-metabolic multi-omics datasets could account for the differential pathway over-representation described previously, unconditional EWCE was performed. Significant astrocyte (FDR p-value=0.0000001, standard deviation from the bootstrapped mean or S.D.f.M=7.266) and microglia enrichment (FDR p-value=0.0000001, S.D.f.M=5.770) was found in the AD-metabolic transcriptomics dataset (Figure 4A). Oligodendrocyte and astrocyte enrichment in the AD-metabolic proteome lost significance upon multiple-testing correction (FDR p-value=0.07 & S.D.f.M=2.474 and FDR p-value=0.095 & S.D.f.M=2.049 respectively, Figure 4B). Astrocyte enrichment was also similarly lost in the GWAS-orthologs dataset (FDR p-value=0.336, S.D.f.M=1.75, Figure 4C).
Conditional cell-type enrichment was performed on a combined AD-metabolic multi-omics dataset to investigate enrichment relationships. Controlling for microglia did not ablate astrocytic enrichment (FDR p-value=0.0000001, S.D.f.M=7.540) and vice-versa (FDR p-value=0.0000001, S.D.f.M=4.476), suggesting astrocyte and microglia enrichments were independent of each other (Figure 4D). Oligodendrocyte enrichment was however dependent on microglia and astrocytes, as significance was lost upon controlling for either of them (FDR p-value=0.0389 & S.D.f.M=2.531 and FDR p-value=0.0389 & S.D.f.M= 2.387 respectively, Figure 4D). Cell-type enrichment statistics can be found in Table S5.
3.6 Validation of AD multi-omics lipid signatures in ABCA7 KO mice cortex
Given the number of significantly enriched lipid pathways, the results obtained from the multi-omics datasets were validated by comparing them to an internally acquired lipidomics UPLC-MS dataset from cortical extracts of ABCA7-KO and WT mice. To do so, a metabolic subnetwork containing all the significantly enriched lipid pathways was extracted from the generic mouse GSMN (Figure 3C, Table S4). This subnetwork involved 119 genes, 81 reactions and 107 metabolites. Of these, 73 were lipid species or terms, as some of them referred to a lipid sub-class, for example a CDP-diacylglycerol, rather than unique species. Upon lipid identifier retrieval, those 73 terms were associated with 133 lipid species, which generated the AD multi-omics predicted lipid signature.
Twenty-eight terms and 60 lipid species from the predicted AD multi-omics lipid signature were found and therefore validated in the ABCA7-KO and WT lipidomes. In particular, 40 lipid species were validated in the negative-mode dataset and 20 species in the positive-mode dataset. The original MS data, containing 5025 and 5811 features in positive and negative ionization mode, respectively, was hence filtered based on these two subsets of lipid species. OPLS-DA was then performed on both original and filtered datasets to assess the presence of any possible separation based on gender and/or genotype, and the potential impact of this feature reduction procedure on the model robustness.
The OPLS-DA model for the negative-mode validated lipid signature was able to separate ABCA7 and WT samples with an even higher degree of robustness than the original dataset (Q2cum=0.74 and Q2cum=0.56, respectively), which was validated via permutation testing (Table 2, Figure 5A-B). As illustrated by the model’s score plot, sample separation was substantially influenced by genotype rather that by variation orthogonal to class membership (Figure 5B).
Genotype separation was also captured in the OPLS-DA models for the positive-mode original dataset, although less readily differentiated than its negative-mode counterpart (Table 2). The robustness of the OPLS-DA model assessing genotype separation for the positive-mode validated lipid signature was impacted by the presence of an outlier (Table 2). A strong genotype-sex interaction influenced sample separation in the original positive-mode cortical dataset (Q2cum=0.406, Figure 5C), but not in the negative mode cortical dataset (Q2cum=0.36, Table 2). Since the AD multi-omics datasets did not consider sex composition, the positive-mode validated lipid signature should not account for genotype-sex interactions either. Indeed, the genotype-sex interaction was not recapitulated in the positive–mode validated signature subset (Q 2cum=0.20, Table 2, Figure 5D), while the same model for the negative subset was not calculated due to the lack of statistical power on the correspondent analysis of the original dataset. Therefore, the validated lipid signature in the negative mode seemed robustly influenced by ABCA7 genotype.
We then inspected the VIP scores of the original ABCA7 datasets to investigate whether the predicted lipid signature could play a role in driving class separation in relation to the entire ABCA7 lipidome. Out of the 17 predicted lipid species with a VIP score above 1 in the original ABCA7 lipidome (Table 3), 11 were significantly modulated, suggesting the AD multi-omics lipid signature was able to successfully predict significant changes in the ABCA7 cortical lipidome.
Predicted lipid signature was derived from an extracted metabolic subnetwork containing all significantly enriched lipid metabolic pathways in the AD transcriptomics and proteomics datasets, which contained 73 lipid terms. If species in the predicted lipid signature referred to a lipid class, all of the detected compounds belonging to that lipid class were considered for the analysis. This approach yielded 133 unique lipid species, which were mapped to 60 and 20 lipids detected in negative and positive ion mode, respectively. Of these predicted lipid species, 17 had a VIP score > 1 in the OPLS - DA models for the original ABCA7 datasets. *References to p <0.050 significance upon unpaired t-test and Mann Whitney non-parametric testing on intensity differences between ABCA7 and WT mice in the original negative mode ABCA7 dataset. $ refers to significance upon One-way ANOVA using B-H correction for multiple testing on differences between ABCA7-males and ABCA7-females, WT-females or WT-males in the original positive mode ABCA7 dataset.
3.7 Validation of lipid-AD risk loci associations in the Airwave and RS cohorts
Lastly, we performed a MWAS using 1H NMR spectra of human blood serum from 3258 individuals from the Airwave and RS cohorts (Elliott et al. 2014; Ikram et al. 2020). As these cohorts consist of predominantly healthy individuals, we used 47 known AD risk loci to identify AD risk carriers (Lambert et al. 2013; Jansen et al. 2019; Kunkle et al. 2019).
After performing MWAS, we detected 298 SNP-metabolite associations from the three NMR pulse sequences, out of which 107 in the lipoprotein, 13 in the CPMG, and 178 in the solvent suppression pulse sequence spectra datasets. Association with APOE was found for 83% of these, reflecting the importance of this gene in regulating components of the blood metabolome (Figure 6a).
To examine the associations further we classified the detected metabolites according to their chemical characteristics and biological role into lipids, amino acids, carbohydrates, glycolysis intermediates, TCA cycle intermediates, ketone bodies and other metabolites. Lipids included resonances that were associated with both protons attached to the fatty acid and the head group (largely choline and glycerol) along with protons from cholesterol and cholesterol esters. The dominant class was represented by lipids, comprising over 70% of the associations (Figure 6b).
4 Discussion
The main aim of this study was to validate the presence of metabolic perturbations in AD using multi-omics pathway-based integration and metabolic-subnetwork extraction. We hypothesized that metabolic alterations detected at multiple omics levels could predict a robust metabolic signature in the AD metabolome. If validated, these results would provide a comprehensive perspective on AD metabolism while supporting the use of GSMNs to identify consistent metabolic alterations in AD.
GO analysis of AD transcriptomics, proteomics and GWAS-orthologs data revealed numerous enriched metabolic BP. Although the initial mapping of DE transcripts, proteins and GWAS-orthologs certainly removed elements with no metabolic roles, this step did not disproportionately influence metabolic BP term over-representation per se, as only 3 out of 11 BP in mapped GWAS-orthologs were metabolic. Lipid and fatty-acid BP enrichment was found across the AD all-mapped transcriptome and proteome. This observation was further supported by TAL1, MAX and BHLHE40 over-representation in both datasets. TAL1 modulates lipid metabolism in the context of cell membrane integrity (Kassouf et al. 2010), MAX-MYC interaction strongly dysregulates fatty-acid metabolism in neurodegeneration (Carroll et al. 2018) and BHLHE40 is necessary for insulin-mediated SREBP1 induction, a lipid homeostasis regulator (Tian et al. 2018).
Pathway and TF enrichment analysis implicated differential metabolic processes across the AD multi-omics datasets, which also exhibited different cell-type enrichments. Cholesterol biosynthesis, phospholipases, fatty-acid metabolism and SREBF1 were strongly enriched in the AD-metabolic transcriptome, which also exhibited astrocyte and microglia cell-type enrichment. These multi-level results provide further evidence supporting the existence of wide-spread lipidomic alterations in AD microglia (Wang et al. 2015). Previously, an allele variant in the SREPF1 gene was found to be neuroprotective in APOE4 carriers in terms of dementia incidence (Spell et al. 2004). Extensive lipidome changes are present in TREM2 - defficent microglia, another gene variant heavily implicated in AD pathogenesis (Nugent et al. 2020). Phospholipase-amyloid interactions seem to facilitate microglia Aβ endocytosis, therefore contributing to neuroinflammation (Teng et al. 2019). Aerobic respiration, TCA cycle and glycolysis were enriched in the AD-metabolic proteome; these pathways are consistent with signs of mitochondrial dysfunction that are commonly found in neurodegeneration (Wang et al. 2020). Indeed, significant energy metabolism deficits have been detected in human(Johnson et al. 2020) and AD mice brain proteomes (Yu et al. 2018).
The main finding in this study is the validation of a predicted lipid signature derived from an extracted metabolic subnetwork with all significantly enriched lipid pathways in AD multi-omics datasets. The OPLS-DA model for the validated lipid signature in negative ion mode LC-MS dataset was capable of driving class separation based on ABCA7 genotype with a higher degree of robustness than in the original dataset; the reduced number of features was not a confounding factor for the model, but instead allowed for the removal of features originally decreasing the model robustness. Multi-omics integration is being increasingly used to draw biologically meaningful conclusions over large datasets (Pinu et al. 2019), and has been previously applied to AD data to infer metabolic perturbations using protein ranking and gene-set enrichment (Bundy et al. 2019; Bai et al. 2020), gene-protein interaction networks (Canchi et al. 2019) and protein-protein interaction networks (Zhang et al. 2020). To our knowledge, this is the first study using multi-omics pathway-based integration and metabolic subnetwork extraction to identify and subsequently validate a lipid metabolic signature in the AD lipidome.
Eleven lipid species from the validated lipid signature were significantly modulated in the cortical ABCA7 lipidome, of which four belonged to the cholesterol biosynthesis pathway. Lathosterol and cholesterol were significantly decreased in the ABCA7-KO lipidome compared to WT, whereas 7-dehydro-cholesterol and 4α-hydroxymethyl-4β-methyl-5α-cholesta-8,24-dien-3β-ol were significantly decreased in ABCA7-females compared to ABCA7-males. The evidence is mixed regarding cholesterol and intermediate sterols changes in ABCA7 mice. One study showed no cholesterol changes in ABCA7-KO mice brains (Satoh et al. 2015); serum cholesterol levels were however decreased in female ABCA7-KO mice (Kim et al. 2005). This study appears more aligned with the latter, as decreased free - cholesterol levels and sex-specific sterol intermediates differences were detected. This discrepancy is extended to other AD mouse models. Free-cholesterol and lathosterol levels exhibited non-significant changes in TgCRND8 (Yang et al. 2014) and APP/PS1 mice (Bogie et al. 2019), whereas lanosterol and cholesteryl acetate were up-regulated in APOE4 mice (Nuriel et al. 2017). Despite these disagreements, the importance of sterol intermediates in AD is reflected therapeutically, as a recent drug-repurposing screen identified several tau - reducing compounds which targeted cholesterol-esters (van der Kant et al. 2019).
We also performed an MWAS analysis using SNPs previously associated with LOAD and metabolites detected in blood plasma from the Airwave and Rotterdam cohorts using 1H NMR spectroscopy. Mean ages of recruitment in these cohorts are relatively young, and thus our reported 298 SNP-metabolite associations may represent early stages of the disease, as the brain begins to accumulate neurodegenerative features that ultimately results in Mild Cognitive Impairment (MCI) and AD. Using three distinct NMR pulse sequences, we were able to detect a range of metabolites including lipids, amino acids, glycolysis, TCA cycle intermediates and ketone bodies. Lipids were the commonest metabolite class represented in metabolite-SNP associations, suggesting that dysregulation of lipid metabolism may be some of the earliest events in AD.
There are important limitations associated with this study. Firstly, this study included multi-omics data from several brain regions, ages and AD mouse models. Therefore, region and age-specific TF upstream-regulation and metabolic alterations that are frequent in AD (González-Domínguez et al. 2014) were not assessed. It is also notoriously difficult to annotate lipid species into GSMNs due to the complexities associated with lipid nomenclature and identification (Poupin et al. 2020). This study successfully overcame this limitation by allowing second-order lipid species matching to their associated broader lipid term whenever unique lipid species matching was not possible (Poupin et al. 2020). This study was also limited in that cell type enrichment analysis could not distinguish whether astrocytic and microglia enrichment was associated with gliosis in disease rather than AD pathology per se, as cell type proportions could not be adequately controlled in silico. Additionally, APOE-associated SNPs dominated our MWAS analysis, which could be attributed to the known association of ApoE with dyslipidemia and atherosclerosis (Bouchareychas & Raffai 2018). Furthermore, 1H NMR spectra of blood plasma detect a high proportion of lipids compared with other classes of metabolites and is relatively insensitive as a technique. We are currently performing mass spectrometry to expand the coverage of the metabolome to further investigate the earliest molecular events in AD.
5 Conclusions
In summary, this study highlights the suitability of integrating multi-omics data into GSMNs to identify metabolic alterations in AD. Pathway-based integration of multi-omics data revealed distinct perturbations in lipid metabolism in the AD mouse brain. Predicted lipids extracted from the over-represented lipid pathway’s metabolic subnetwork was validated in the ABCA7 lipidome, with its associated multi-variate model robustly modelling class separation. Furthermore, more than 70% of 298 SNP-metabolite associations in a MWAS corresponded to lipid species, thus validating the presence of lipidomic dysregulation in AD.
Data Availability
We reference all data used on mouse studies which is freely available through omics repositories. These links are provided in the text. In addition, ethical approval for access to the Airwave cohort was granted following application to the access committee via the Dementia Platform UK (https://portal.dementiasplatform.uk/). Access to the RS cohort was granted following access to the Management Committee and conducted under approval from the Ministry of Health, Welfare and Sport of the Netherlands.
Author Contributions
M.E.G.S. and J.L.G. conceived and designed the study. M.E.G.S. retrieved and analyzed the transcriptomics, proteomics, GWAS and lipidomics data. B.R.D. acquired the lipidomic data. S.L. and B.R.D. processed the lipidomic data. I.K. performed the MWAS study, which used data from two on-going cohorts oversaw by P.E. M.E.G.S. and J.L.G. interpreted the data. M.E.G.S. drafted the manuscript, which received critical input from J.L.G. All authors have read and approved the published version of the manuscript.
Funding
This work was supported by the Medical Research Council UK, the UK Dementia Research Institute, National Institute for Health Research (NIHR) and Imperial Biomedical Research Centre.
Conflicts of Interest
The authors declare no conflict of interest.
Supplementary Materials
The following are available : Table S1: Transcription Factor enrichment analysis of all-mapped AD transcriptomics and proteomics datasets; Table S2: Biological Process (BP), Molecular Function (MF) and Cellular Compartment (CC) enrichment analysis of all-mapped AD transcriptomics, proteomics and GWAS-orthologs datasets; Table S3: Transcription Factor enrichment analysis of AD-metabolic transcriptomics and proteomics datasets; Table S4: Metabolic pathway enrichment analysis of AD-metabolic transcriptomics, proteomics and GWAS-orthologs datasets; Table S5: Unconditional and conditional EWCE analysis of AD-metabolic multi-omics datasets.
Acknowledgments
The authors would like to acknowledge Dr. Tomonori Aikawa and Professor Takahisa Kanekiyo from the Mayo Clinic, Jacksonville, Florida for providing the ABCA7 cortical mouse tissue.
Abbreviations
- Airwave
- Airwave Health Monitoring Study
- AD
- Alzheimer’s Disease
- Aβ
- amyloid-beta
- APP
- amyloid-precursor protein
- ABCA7
- ATP-binding-cassette subfamily-A member-7 gene
- APOE
- apolipoprotein epsilon
- ChEA3
- ChIP-X enrichment analysis 3
- DAVID
- database for annotation, visualization and integrated discovery
- DE
- differentially expressed
- EWCE
- expression weighted cell-type enrichment
- FDR
- false discovery rate
- FC
- fold change
- GEO
- gene expression omnibus
- GRCh37
- genome reference consortium-human build-37
- GSMN
- genome-scale metabolic networks
- GWAS
- genome-wide association studies
- IGAP
- international genomics of Alzheimer’s cohorts
- iTRAQ
- isobaric tag for relative and absolute quantification
- KO
- knock-out
- MAGMA
- multi-marker analysis of genomic annotation
- MWAS
- metabolome-wide association study
- NMR
- nuclear magnetic resonance
- OPLS-DA
- orthogonal projections to latent structures-discriminant analysis
- PQN
- probabilistic quotient normalization
- PRIDE
- protein identification database
- RP-UPLC-MS
- reverse-phase ultraperformance liquid chromatography-mass spectrometry
- RS
- Rotterdam study
- SAM
- significance analysis of microarray
- SNPs
- single nucleotide polymorphisms
- S.D.f.M
- standard deviation from the bootstrapped mean
- SREBP2
- sterol regulatory element binding protein 2
- TF
- transcription factor
- TREM2
- triggering receptor expressed on myeloid cells-2
- UPLC-MS
- ultraperformance liquid chromatography-mass spectrometry
- VIP
- variable influence of projection
- WT
- wild-type