Abstract
Background Decline in cognitive function is the most feared aspect of ageing. Poorer midlife cognitive function is associated with increased dementia and stroke risk. The mechanisms underlying variation in cognitive function are uncertain.
Methods We assessed associations between 1160 proteins’ plasma levels and two measures of cognitive function, the digit symbol substitution test (DSST) and the Montreal Cognitive Assessment in 1198 PURE-MIND participants. We assessed key MRI-ascertained structural brain phenotypes as potential mediators of associations between plasma protein levels and cognitive function. Potentially causal effects of protein levels on structural brain phenotypes and neurological outcomes were assessed using Mendelian randomisation (MR) analyses.
Results We identified five DSST performance-associated proteins (NCAN, BCAN, CA14, MOG, CDCP1), with NCAN and CDCP1 showing replicated association in an independent cohort, Generation Scotland (GS; N = 1053). MRI-assessed structural brain phenotypes partially mediated (8–19%) the associations of NCAN, BCAN, and MOG with DSST performance. MR analyses suggested higher CA14 levels might cause larger hippocampal volume and increased stroke risk, whilst higher CDCP1 levels might increase stroke and intracranial aneurysm risk.
Conclusions We identify cognition-associated plasma proteins with potentially causal effects on brain structure and risk for neurological diseases. Our findings highlight candidates for further study and the potential for drug repurposing to reduce risk of stroke and cognitive decline.
1. Background
Decline in cognitive ability and dementia are the most feared aspects of ageing [1], providing a strong rationale for investigating the mechanisms underlying cognitive function. Poorer cognitive function is associated with a greater risk of Alzheimer’s dementia and stroke [2, 3]. This may be due to reduced “cognitive reserve”, which postulates that lower premorbid cognitive function leads to worse cognitive impairment for a given degree of neuropathology [4]. Better understanding of these mechanisms could inform strategies for the prevention and treatment of dementia and stroke. Recent studies have highlighted the potential for investigating cognition and structural brain phenotypes through the study of plasma proteins; however, they have been limited by the assessment of a small set of proteins, and/or a reliance on observational association analyses [5–8].
Here, we investigated associations between 1160 plasma proteins and cognitive function in the Prospective Urban and Rural Epidemiology (PURE)-MIND cohort [9], and sought replication in the independent imaging subsample of the Generation Scotland cohort (henceforth referred to as “GS”). Using a simple measure of processing speed, the digit symbol substitution test (DSST), and a cognitive screening tool, the Montreal Cognitive Assessment (MoCA), we carried out a screen for cognition-associated proteins, and then employed mediation analyses to assess the proportion of the protein expression-cognition relationship that could be explained by structural brain phenotypes, including measures of brain volume and white matter hyperintensity (WMH) volume. WMH is a magnetic resonance imaging (MRI) marker of white matter damage and is one of the manifestations of age-related cerebral small vessel disease. Two-sample Mendelian randomisation (MR) analyses were performed to assess potentially causal effects of genetically predicted protein levels on genetically predicted cognitive function, brain structure, stroke subtypes, and Alzheimer’s disease (see Figure 1 for an overview of the study design).
2. Methods
Sample information
This study used data from participants with European (N = 3514), Latin (N = 4309), or Persian (N = 1332) ancestry in the Prospective Urban and Rural Epidemiology (PURE) biomarker sub-study [32]. Participants of other ancestries were excluded from the present analyses due to the need to align the PURE genetic data with external genetic datasets, which are predominantly from European participants. Participants of Latin and Persian ancestry were included due to their genetic overlap with European participants [32].
The PURE biomarker study is a nested case-cohort study of the original PURE study [33] with protein biomarkers and genotyping data [32]. Cases were selected if they had experienced at least one of the following adverse health events: myocardial infarction, stroke, heart failure, type II diabetes, or death from any cause. Cohort members were selected by random sampling to obtain a group of participants who were frequency-matched by major country-specific ethnicity to the cases. The PURE biomarker study also included European participants enrolled in PURE-MIND (N = 1198) [9]. Participants from selected countries in PURE [33] were invited to participate in PURE-MIND if they were aged ≥ 39 years, had no history of stroke, dementia, or other neurological disease; had no contraindication to MRI; and could complete cognitive assessments [9].
The European, Latin, and Persian PURE biomarker cohort participants were used to identify protein quantitative trait loci (pQTLs) for the protein biomarkers [32], for use in Mendelian randomisation analyses, whilst data from the European PURE-MIND biomarker participants were used for observational association analyses. We sought replication of our observational findings in GS (max. N = 1053) [34, 35], which was recruited through re-contact of the Generation Scotland: Scottish Family Health Study (GS:SFHS) [36, 37].
Ethical approval
All centres contributing to PURE were required to obtain approval from their respective ethics committees (Institutional Review Boards). Participant data are confidential and only authorised individuals can access study-related documents. The participants’ identities are protected in documents transmitted to the Coordinating Office, as well as in biomarker and genetic data. Participants provided informed consent for the collection of baseline information and for the collection and storage of genetic and other biological specimens.
The GS:SFHS obtained ethical approval from the NHS Tayside Committee on Medical Research Ethics, on behalf of the National Health Service (reference: 05/S1401/89). All participants provided broad and enduring written informed consent for biomedical research. GS:SFHS has Research Tissue Bank Status (reference: 15/ES/0040), providing generic ethical approval for a wide range of uses within medical research. The imaging subsample of GS:SFHS (referred to as “GS” herein) received ethical approval from the NHS Tayside committee on research ethics (reference 14/SS/0039). All experimental methods were in accordance with the Helsinki declaration.
Brain imaging
PURE-MIND participants enrolled in the PURE biomarker cohort were scanned at four sites in Canada (three at 1.5T (two on General Electric (GE) scanners, one on a Philips scanner), one at 3T (GE)). The brain imaging phenotypes assessed in this study were total brain volume (excluding ventricles), total white matter volume, hippocampal volume, average cortical thickness, a multi-region composite thickness measure designed to differentiate Alzheimer’s disease patients from clinically normal participants [38], silent brain infarcts (SBI), cerebral microbleeds (CMB), and WMH volumes. These will henceforth be referred to as the “structural brain phenotypes”. SBI and CMB were defined as described previously [9]. WMH volumes were estimated using the Lesion Segmentation Tool (LST) in Statistical Parametric Mapping 12 (SPM12) [39]. For analyses, WMH volumes were natural log-transformed, after adding one to account for values of zero, to reduce positive skew [40]. Brain volume measurements, intracranial volumes (ICV), and average cortical thicknesses were derived from T1-weighted images using FreeSurfer v5.3 (http://surfer.nmr.mgh.harvard.edu/) [41, 42]. GS participants were scanned at two sites in Scotland (both 3T) [34]. Brain volumes and ICVs were derived from T1-weighted images using FreeSurfer v5.3 [41, 42]. As in PURE-MIND, WMH volumes in GS were obtained using LST [8].
Assessment of cognitive function
General cognitive ability was measured in PURE-MIND and GS by trained assessors using the DSST (Wechsler Adult Intelligence Scale, 3rd Edition) [43]. Participants were scored according to the number of correct matches made within two minutes (maximum score: 133). PURE-MIND participants completed the Montreal Cognitive Assessment (MoCA) [10], a questionnaire-based test with scores 0 to 30.
Measurement of plasma protein expression
Proteomic and genetic analyses were conducted in the Clinical Research Laboratory & Biobank – Genetic & Molecular Epidemiology Laboratory (CRLB-GMEL), Hamilton, Canada. In the PURE biomarker cohort, plasma protein levels were measured by proximity extension assay using the Olink Proseek Target 96 reagent kit (Olink, Uppsala, Sweden). Thirteen panels (Cardiometabolic; Cardiovascular Disease II and III; Cell Regulation; Development; Immune Response; Inflammation; Metabolism; Neuro Exploratory; Neurology; Oncology I and III; and Organ Damage) were used to measure a total of 1196 biomarkers in 12066 participants, of whom 3735 were European, 4695 were Latin, and 1436 were Persian. The analytical performance of these panels has been validated previously and further information can be found elsewhere (https://www.olink.com/products-services/target/). Quality control and pre-processing were performed as described previously for the Cardiovascular Disease II panel [32], with the exception that the data were quantile normalised within three, rather than two, reagent lots. Missing biomarker values were imputed by the mean, separately for each reagent lot, and all values were rank-based inverse normalised by reagent lot, sex and ethnic group. Where multiple biomarker measurements from different proximity extension assays were available for a single protein, the mean value was taken. Following quality control, measurements were available for 1160 biomarkers in between 8369 and 9154 European, Latin, or Persian participants (depending on biomarker-specific missingness).
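As an illustration of the rank-based inverse normal transformation mentioned above, the following is a minimal sketch rather than the pipeline used in the study. The function name `rank_inverse_normal`, the Blom offset (c = 3/8), and the use of average ranks for ties are assumptions, as the text does not specify them; in practice the function would be applied separately within each reagent lot, sex, and ethnic group.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (hypothetical helper).

    Ties receive their average rank. The offset c (Blom's 3/8) is an
    assumption; the paper does not state which offset was used.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # Map each rank to a standard normal quantile
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

The transform preserves the ordering of the measurements while forcing an approximately standard normal distribution, which is why a one-unit difference in a transformed protein level can be read as roughly one standard deviation.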
In GS, plasma protein levels were measured with the SOMAscan assay platform (SomaLogic Inc.), as described previously [44]. Following initial data processing and quality control steps, measures of 4058 proteins were available in 1095 participants. Prior to analysis, protein abundance measurements were log-transformed and rank-based inverse normalised.
Genotyping and imputation of the PURE-MIND discovery sample
PURE participant genotypes (Thermofisher Axiom Precision Medicine Research Array r.3) were called using Axiom Power Tools and in-house scripts. Samples were removed if: they had a low signal-to-noise contrast (Dish Quality Control < 0.82); low quality control rate (QCCR < 0.97); <95% call rate; disagreement between self-reported sex and/or ethnicity and genetically determined sex and/or ethnicity; were duplicated; or had excess heterozygosity. We removed variants with: a call rate <98.5%; Hardy-Weinberg equilibrium p < 1 × 10⁻⁵; plate or batch effects; non-Mendelian segregation within families; and/or a minor allele frequency <0.005%. Following quality control, 749,783 variants remained [32].
Imputation was performed on the 749,783 variants following the TOPMed Imputation server pipeline (https://imputation.biodatacatalyst.nhlbi.nih.gov/), using the TOPMed release 2 reference panel [45]. EAGLE v2.4 [46] and Minimac4 programs were applied for phasing and imputation, respectively. Imputed variants with an info score ≥ 0.3 and MAF ≥ 0.01, which did not deviate from Hardy-Weinberg equilibrium (p ≥ 1 × 10⁻⁵), were retained.
Assessment of the association between protein biomarkers and cognitive function and structural imaging phenotypes
We assessed the association between standardised protein levels and cognitive and structural brain phenotypes using linear (DSST, MoCA, total brain volume, white matter volume, hippocampal volume, WMH volume, cortical thickness) or logistic (CMB, SBI) regression. The cognitive or structural brain phenotype-of-interest was modelled as the dependent variable with the standardised protein expression level, age, age², sex, ethnicity, and first ten genetic principal components included as independent variables. A sensitivity analysis was performed for DSST-associated proteins in which we further adjusted for education (a categorical variable with levels: (i) no education; (ii) high school or less; (iii) trade school; and (iv) college or university). We calculated Pearson’s correlation coefficient to assess the pairwise correlations between DSST-associated proteins. Within each analysis, we applied a Bonferroni correction to determine statistical significance, yielding the following significance thresholds: p < 4.31 × 10⁻⁵ when assessing associations with 1160 proteins; p < 2.5 × 10⁻³ when assessing associations with the five DSST-associated proteins across four DSST-associated structural brain phenotypes; and p < 5 × 10⁻³ when assessing 10 pairwise correlations between proteins.
We performed replication analyses in GS for the significant proteins identified in PURE-MIND. Mixed effects models were fitted using the lmekin function from the R package coxme v.2.2.17 [47] to assess the association of the outcome variable (DSST performance, total brain volume (excluding ventricles), cerebral white matter volume, hippocampal volume, and WMH volume) with standardised protein expression, covarying for age, age², sex, study site (Dundee or Aberdeen), the delay between blood sampling and protein extraction, depression (a binary variable representing lifetime depression status), and a kinship matrix. When a brain volume phenotype was the outcome variable, additional covariates were included to account for ICV, the interaction between ICV and study site (to account for a site-associated batch effect on ICV measurement), and whether there was manual intervention using tools within FreeSurfer during the quality control process. Replication was defined as a concordant direction of effect, meeting a Bonferroni-corrected threshold of p < 1.67 × 10⁻² (accounting for the assessment of three DSST-associated proteins) or p < 7.14 × 10⁻³ (accounting for the assessment of seven structural brain phenotype-protein combinations).
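The Bonferroni thresholds quoted above follow from dividing a family-wise error rate by the number of tests in each analysis. The short sketch below reproduces that arithmetic; the function name is hypothetical, and α = 0.05 is an assumption that is implied by, but not stated alongside, the reported thresholds.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold (alpha = 0.05 is assumed)."""
    return alpha / n_tests


thresholds = {
    "1160 proteins": bonferroni_threshold(1160),
    "5 proteins x 4 phenotypes": bonferroni_threshold(5 * 4),
    "10 pairwise correlations": bonferroni_threshold(10),
}

for label, value in thresholds.items():
    print(f"{label}: p < {value:.3g}")
```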
Assessment of the association of DSST performance with MRI-derived structural brain phenotypes
To identify mediators of the association between protein expression and DSST performance, we first established the structural brain phenotypes that satisfied the requirements of potential mediators (i.e. associated with both DSST performance and at least one DSST-associated protein), and then formally tested the mediation relationship by bootstrap mediation analyses.
We estimated the association between DSST performance and structural brain phenotypes in PURE-MIND using linear models. All brain volume measurements were normalised to ICV and the models included covariates for age, age², sex, and the first ten genetic principal components. We defined statistical significance as p < 0.00625 (Bonferroni correction for eight phenotypes) and sought replication of significant associations (N = 4) in GS. In GS, brain volumes were residualised for ICV, scanner location, the interaction between ICV and scanner location, and whether there was manual intervention during the quality control process. The resultant residuals were included as the dependent variable in a mixed effects model with DSST score, age, age², sex, depression, and a kinship matrix as independent variables. Statistical significance was defined as p < 0.0125.
The DSST performance-associated brain MRI phenotypes (N = 4) were assessed as potential mediators of the protein level-DSST associations (N = 3) using bootstrap mediation analysis in PURE-MIND. Only protein-phenotype pairs showing a significant association were carried forward, yielding a total of N = 9 mediations to assess. Analyses were performed using the R package “mediation” [48] with 1000 bootstraps. We corrected for the nine potential mediation relationships assessed using a Bonferroni-corrected threshold of p < 5.56 × 10⁻³.
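To make the quantity being estimated concrete, the following is a simplified sketch of a bootstrap estimate of the proportion mediated. It is not the algorithm implemented in the “mediation” package: it uses a product-of-coefficients estimate from ordinary least squares, omits all covariates, and takes a percentile confidence interval. The function names and the single-mediator, covariate-free model are assumptions made for illustration.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _cov(u, v):
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def proportion_mediated(x, m, y):
    """Proportion of the effect of x (protein) on y (DSST) carried by m (mediator)."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx                            # slope of m on x
    det = sxx * smm - sxm ** 2
    direct = (smm * sxy - sxm * smy) / det   # coefficient of x in y ~ x + m
    b = (sxx * smy - sxm * sxy) / det        # coefficient of m in y ~ x + m
    indirect = a * b
    return indirect / (indirect + direct)


def bootstrap_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(proportion_mediated(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

Here `x`, `m`, and `y` would hold, for example, a standardised protein level, cerebral white matter volume, and DSST score for each participant.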
Functional and tissue-specific expression enrichment analyses
Proteins associated with DSST performance at p < 0.05 in PURE-MIND were included in functional and tissue-specific expression analyses in three groups: (i) all proteins; (ii) positively associated proteins; and (iii) negatively associated proteins. Enrichment was assessed relative to all measured proteins (n=1160). Functional enrichment analyses were performed using WebGestalt (http://www.webgestalt.org/) [49] using default parameter settings for the over-representation analysis method to assess enrichment for: (i) gene ontology categories (biological processes, molecular functions, and cellular compartments); (ii) Reactome pathways; and (iii) disease-associated genes (Disgenet). Tissue-specific enrichment analyses were performed using the “GTEx v8: 54 tissue types” and “GTEx v8: 30 general tissue types” gene expression datasets in FUnctional Mapping and Annotation (FUMA) [50]. For both the functional enrichment and tissue expression analyses, enrichment was assessed using a hypergeometric test and significant enrichment was defined as a Benjamini-Hochberg-adjusted p < 0.05, correcting for the number of tests performed within each analysis platform. Analyses were performed using web interfaces accessed on 18/04/2022 (WebGestalt and FUMA) and 14/01/2023 (FUMA).
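The enrichment statistics described above can be sketched as follows. This is an illustration of a hypergeometric over-representation test followed by Benjamini-Hochberg adjustment, not the WebGestalt or FUMA implementation; the function names are hypothetical, and any annotation counts passed in would come from the relevant database rather than from this paper.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_annotated, n_background):
    """P(X >= k): chance that at least k of n_drawn proteins carry an
    annotation held by n_annotated of the n_background measured proteins."""
    total = comb(n_background, n_drawn)
    upper = min(n_drawn, n_annotated)
    hits = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In the setting described here, `n_background` would be 1160 and `n_drawn` the number of proteins in the group under test, with one p-value per annotation category adjusted together.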
Two-sample forward MR analyses
We performed two-sample forward MR analyses to identify potentially causal associations between genetically predicted plasma protein levels and: (i) cognitive function; (ii) structural brain phenotypes (total brain volume, cerebral white matter volume, hippocampal volume, WMH volume, and CMB); and (iii) disease outcomes (Alzheimer’s disease, all stroke, stroke subtypes (ischaemic, cardioembolic, large artery, and small vessel), and intracranial aneurysm).
Associations between single nucleotide polymorphisms (SNPs) and plasma protein expression levels were calculated in PURE. SNPs located within 200 kilobases up- or downstream of the RefSeq transcript corresponding to a protein-of-interest were assessed as potential pQTLs through separate GWASs of the European (N = 3514), Latin (N = 4309), and Persian (N = 1332) participants, with significant association defined as p < 5 × 10⁻⁶. Missense variants and SNPs affecting splice sites were excluded. The GWAS model has been described previously [32]. Effect estimates were then combined by inverse variance-weighted fixed effects meta-analysis using METAL [51], and an independent set of pQTLs obtained by pruning (r² < 0.1 within the European, Latin, and Persian subgroups from the PURE cohort). Sensitivity analyses were performed in which the pruning threshold was adjusted to r² < 0.01.
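The inverse variance-weighted fixed effects combination of ancestry-specific estimates can be written in a few lines. The sketch below shows the standard formula only; it is not METAL itself, and the function name and example inputs are hypothetical.

```python
from math import sqrt


def fixed_effects_meta(betas, ses):
    """Combine per-group effect estimates, weighting each by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se


# Hypothetical SNP-protein estimates from three ancestry-specific GWASs
beta, se = fixed_effects_meta([0.12, 0.10, 0.15], [0.03, 0.03, 0.05])
```

Groups with smaller standard errors, typically the larger samples, contribute more to the pooled estimate.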
The independent set of pQTLs were assessed for their associations with cognitive function, structural brain phenotypes, and disease outcomes using summary statistics from published studies [52–59].
MR analyses were performed using the R packages MRBase for TwoSample MR v.0.5.6 [60], mr.raps v.0.4.1 [61], and MRPRESSO v.1.0 [62]. We employed several complementary MR approaches: inverse variance weighted (IVW) [63], weighted median [64], robust adjusted profile scores (RAPS) [61], MR-Egger [65], and MR-PRESSO [62]. We adopted the IVW approach as our primary methodology and defined statistical significance using a liberal within-outcome variable Bonferroni correction for the two proteins (CA14 and CDCP1) that could be assessed, yielding a significance threshold of p < 0.025 (or p < 0.05 when an outcome could only be assessed for one protein). The IVW approach has the greatest statistical power but also makes the most assumptions. Hence, we reported IVW findings only where there was: (i) no evidence of pleiotropy; and (ii) corroboration of the direction of effect from at least two other MR approaches. An MR-Egger intercept p < 0.05 was deemed to indicate directional pleiotropy. Heterogeneity amongst instrumental variables, suggestive of horizontal pleiotropy, was indicated by a significant Cochran’s Q (p < 0.05). If Cochran’s Q was significant, MR-PRESSO [62] was performed, and, if the MR-PRESSO global test was significant (p < 0.05), MR-PRESSO with outlier removal was performed. In addition to the above conditions, we only reported results where there were at least three IVs, there was no evidence of weak instrument bias (F-statistic > 10) [66], and when the correct causal direction had been assessed (indicated by the instrumental variables explaining a greater proportion of the variance in the exposure than in the outcome, and a Steiger test p < 0.05). For the sensitivity analyses, in which a more stringent r² threshold was used to select independent pQTLs, only one or two IVs were available for each analysis. When two IVs were available, results from the IVW approach were reported, and when one IV was available, results from the Wald ratio test were reported.
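For readers unfamiliar with the estimators named above, the following sketch shows the Wald ratio for a single instrument, a fixed-effect IVW estimate across several instruments, and a per-instrument F-statistic. It is an illustration under stated assumptions rather than the code of the R packages cited: the function names are hypothetical, the Wald ratio standard error uses a first-order approximation, the IVW estimate is the simple fixed-effect form, and the F-statistic is approximated as (β/SE)² for each SNP.

```python
from math import sqrt
from statistics import NormalDist


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one instrument (first-order standard error)."""
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted estimate across instruments."""
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return estimate, se


def two_sided_p(estimate, se):
    """Two-sided p-value from a normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(estimate / se)))


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-instrument F-statistic; values above 10 are
    conventionally taken to indicate adequate instrument strength."""
    return (beta_exposure / se_exposure) ** 2
```

With a single instrument the IVW estimate reduces to the Wald ratio, which is consistent with the Wald ratio being reported when only one IV was available.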
Pairwise Conditional Analysis and Co-localisation Analysis (PWCoCo)
PWCoCo [30, 67] was performed to assess the existence of a shared causal variant between (i) pQTLs for each of the five proteins-of-interest and (ii) variants associated with the outcomes assessed in the two-sample MR analyses. PWCoCo analyses were performed for all conditionally distinct pQTLs and all conditionally distinct association signals in the outcome data. Analyses were performed using SNP-protein associations calculated in the European PURE participants (N = 3514). As for the MR analyses, SNPs had to be located within 200 kilobases up- or downstream of the RefSeq transcript corresponding to a protein-of-interest.
Analyses were performed using a C++ implementation of the PWCoCo algorithm, which utilises methods from the GCTA-COJO [68] and coloc [69] R packages. PWCoCo calculates the posterior probabilities (PP) for: the existence of no causal variant(s) for either trait (PP0); the existence of causal variant(s) for trait one or trait two (PP1 and 2, respectively); both traits being associated with the same region, with different causal variants (PP3); and both traits being associated with the same region, with a shared causal variant(s) (PP4). Failure to find evidence in support of colocalisation can be due to lack of power [69]; therefore, we limited our analyses to those where PP3 + PP4 ≥ 0.8. Colocalisation was defined as PP4/PP3 > 5, as suggested previously [70].
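The two decision rules stated above (a power filter on PP3 + PP4 and a colocalisation call on PP4/PP3) can be expressed as a small classifier. This is a sketch of the stated thresholds only, not part of PWCoCo; the function name and the returned labels are hypothetical.

```python
def classify_colocalisation(pp3, pp4, power_min=0.8, ratio_min=5.0):
    """Apply the PP3 + PP4 >= 0.8 filter, then the PP4/PP3 > 5 rule."""
    if pp3 + pp4 < power_min:
        return "underpowered"
    # With pp3 == 0 the ratio is unbounded, so the rule is met
    if pp3 == 0 or pp4 / pp3 > ratio_min:
        return "colocalised"
    return "not colocalised"
```

For example, posterior probabilities of PP3 = 0.05 and PP4 = 0.90 pass the power filter and give a ratio of 18, which would be classed as colocalised under these thresholds.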
Software
Statistical analyses and plot generation were performed in R (versions 3.6.0, 4.1.1, 4.1.2, 4.2.0, 4.2.1).
3. Results
Participants in PURE-MIND (N = 1198) and GS (N = 1053) (Table 1) were similar in age, sex distribution, and clinical characteristics.
Identification of protein biomarkers of cognitive function and enrichment analyses
Five proteins were associated with DSST performance in PURE-MIND (Figure 2; Figure 3; Supplementary Table 1). Higher plasma levels of neurocan (NCAN; β = 2.03 (indicating a 2.03-point higher DSST score per one standard deviation higher NCAN level), p = 9.11 × 10⁻⁸), brevican (BCAN; β = 1.91, p = 5.56 × 10⁻⁷), carbonic anhydrase 14 (CA14; β = 1.90, p = 5.90 × 10⁻⁷), and myelin-oligodendrocyte glycoprotein (MOG; β = 1.82, p = 2.29 × 10⁻⁶), and lower levels of CUB domain-containing protein 1 (CDCP1; β = −1.57, p = 3.97 × 10⁻⁵) were associated with better DSST performance at the Bonferroni-corrected significance threshold. Adjustment for educational attainment modestly attenuated the effect estimate for all five proteins (Supplementary Table 1). Levels of NCAN, BCAN and MOG were positively correlated (0.251 ≤ r ≤ 0.615; all p < 2.20 × 10⁻¹⁶), whilst CDCP1 and CA14 expression levels were negatively correlated (r = −0.101, p = 4.82 × 10⁻⁴; Supplementary Table 2). Three proteins (NCAN, BCAN and CDCP1) were also measured in GS, of which two replicated their association with DSST performance: NCAN (β = 1.40, p = 1.07 × 10⁻³) and CDCP1 (β = −1.99, p = 9.21 × 10⁻⁶; Figure 3). MoCA performance was not associated with the level of any protein (all p ≥ 2.34 × 10⁻⁴; Supplementary Table 3).
Proteins nominally associated (p < 0.05) with DSST performance (N = 184) were enriched for brain-expressed proteins, most significantly for proteins with hippocampal expression (FDR-corrected p = 0.0154; Supplementary Table 4). Better DSST performance was nominally associated with lower levels of 90 proteins. These proteins mapped to the following immune pathways: “interleukin-10 signalling”, “glomerulonephritis”, “regulation of granulocyte chemotaxis”, “positive regulation of leukocyte chemotaxis”, “positive regulation of leukocyte migration”, and “inflammation” (FDR-corrected p ≤ 0.0337; Supplementary Table 5).
Structural brain phenotypes as mediators of protein biomarker-DSST performance associations
In PURE-MIND, better DSST performance was associated with greater cerebral white matter volume (β = 0.0615, p = 4.34 × 10⁻⁷), greater total brain volume (β = 0.0349, p = 9.64 × 10⁻⁶), greater hippocampal volume (β = 2.97, p = 4.79 × 10⁻³), and lower log-transformed WMH volume (β = −3.20, p = 1.18 × 10⁻⁶). These associations replicated in GS (Supplementary Table 6).
Assessment of the relationships between protein levels and DSST-associated structural brain phenotypes in PURE-MIND revealed systematic differences between those proteins for which higher levels were associated with better DSST performance (NCAN, BCAN, CA14, and MOG), and CDCP1, which was negatively associated with DSST performance (Figure 4). Whilst NCAN, BCAN, CA14, and MOG showed a positive direction of association with total brain, cerebral white matter, and hippocampal volume measurements and a negative association with WMH volume, the converse was true for CDCP1. The associations between NCAN levels and total brain, cerebral white matter, and hippocampal volumes reached statistical significance (p ≤ 2.56 × 10⁻⁵) and were replicated in GS (p ≤ 6.70 × 10⁻³). BCAN levels were significantly associated with all four brain volumes (p ≤ 4.36 × 10⁻⁴), with the associations with total brain and cerebral white matter volumes replicating in GS (p ≤ 3.44 × 10⁻³). The associations between MOG levels and total brain and cerebral white matter volumes attained statistical significance (p ≤ 2.63 × 10⁻⁹), but could not be assessed in GS. We did not identify any significant associations with CA14 or CDCP1 levels after correction for multiple testing.
In PURE-MIND, cerebral white matter volume explained a significant proportion of variance in the relationship between MOG (19%), BCAN (15%), and NCAN (13%) levels and DSST performance (all p < 2 × 10⁻¹⁶) (Supplementary Table 7). Log-transformed WMH volume was a significant partial mediator of the association between BCAN levels and DSST performance (p = 0.002), mediating 8% of the relationship.
Identification of potentially causal relationships between protein levels and cognitive function, structural brain phenotypes, and disease outcomes
Inverse variance weighted (IVW) Mendelian randomisation (MR) analyses were performed to assess the effects of genetically predicted CA14 and CDCP1 levels on cognitive function, structural brain phenotypes, and Alzheimer’s disease and stroke. A lack of instrumental variable (IV) SNPs precluded the assessment of BCAN, NCAN and MOG.
A one standard deviation higher level of genetically predicted plasma CA14 was associated with a larger hippocampal volume (β = 0.0990 [95% CI: 0.0272 to 0.171], p = 6.87 × 10⁻³), and a greater risk of all stroke (odds ratio (OR) = 1.07 [95% CI: 1.01 to 1.14], p = 0.0153; Supplementary Table 8). A one standard deviation higher level of genetically predicted plasma CDCP1 was associated with an increased risk of all stroke (OR = 1.12 [95% CI: 1.03 to 1.22], p = 0.0116), ischaemic stroke (OR = 1.13 [95% CI: 1.03 to 1.23], p = 9.65 × 10⁻³), and intracranial aneurysm (OR = 1.28 [95% CI: 1.06 to 1.55], p = 9.84 × 10⁻³). These associations were corroborated by similar effect estimates from weighted median and MR-RAPS analyses. No evidence of directional or horizontal pleiotropy was observed, and the correct causal direction was assessed. Neither the level of CA14 nor that of CDCP1 was associated with risk of Alzheimer’s disease (p ≥ 0.129).
Sensitivity analyses were performed in which instrumental variables (IVs) were selected using a stricter threshold for independence. For genetically predicted CA14, these analyses supported the association with hippocampal volume (β = 0.144 [95% CI: 0.0435 to 0.244], p = 4.97 × 10⁻³), and produced a consistent, although non-significant, effect estimate for the association with risk for all stroke (Supplementary Table 8). For CDCP1, the sensitivity analyses demonstrated a consistent, although non-significant, effect estimate for the association with risk for intracranial aneurysm. The associations between genetically predicted CDCP1 and risk for (i) all stroke and (ii) ischaemic stroke could not be meaningfully interpreted due to heterogeneity between the two available IVs (Q-test p ≤ 0.0187).
Pairwise Conditional Analysis and Co-localisation Analyses (PWCoCo) were performed to assess the presence of a shared variant for each of the five proteins-of-interest and the same outcomes as assessed by two-sample MR analyses. We were only adequately powered to assess co-localisation between SNPs associated with one pair of traits: MOG plasma level and cognitive function. We did not observe any evidence in support of co-localisation or conditional co-localisation (posterior probability (PP)4/PP3 ≤ 4.81 × 10⁻⁴).
4. Discussion
In this large-scale analysis of the associations between the plasma levels of 1160 proteins and cognitive function, we identify CA14 and CDCP1 as being associated with processing speed, as measured by the DSST, and having potentially causal effects on hippocampal volume (CA14), and risk of stroke (both) and intracranial aneurysm (CDCP1).
Other proteins (BCAN, NCAN, and MOG) were associated with DSST performance and important structural brain phenotypes, with cerebral white matter volume mediating a significant proportion (13-19%) of the relationship between the levels of all three proteins and DSST performance, and WMH volume mediating 8% of the relationship between BCAN levels and DSST performance. Potentially causal effects of these proteins could not be assessed due to a lack of genetic instruments. Enrichment analyses of proteins that were nominally significantly associated with DSST performance revealed a significant enrichment for brain-expressed proteins.
There were no significant associations between plasma protein levels and performance on the MoCA. This might reflect the fact that the MoCA is a screening tool for mild cognitive impairment [10], meaning its sensitivity to detect variation in cognitive function in non-clinical groups is likely to be limited.
CA14 is one of fifteen isoforms of the carbonic anhydrase family of zinc metalloenzymes, which catalyse the reversible hydration of carbon dioxide [11]. CA14 is expressed by neurons [12] and involved in regulating extracellular pH following synaptic transmission [13, 14]. Consistent with our findings, acute inhibition of CA14 leads to impaired performance on cognitive tasks in mice [15]. Carbonic anhydrase activation may lead to beneficial cognitive effects in rodents [16]. In keeping with our MR results, there are neuroprotective effects of carbonic anhydrase inhibition in models of amyloidosis, Huntington’s disease, and ischaemic and haemorrhagic stroke [16]. The mechanisms by which carbonic anhydrase inhibition and activation exert their effects are uncertain [15, 16]. FDA-approved carbonic anhydrase inhibitors, and thus the majority of carbonic anhydrase inhibitors investigated to date, are pan-carbonic anhydrase inhibitors. Of the carbonic anhydrase family members measured in our study (CA1, 2, 3, 4, 5A, 6, 9, 12, 13, and 14), only CA14 levels were significantly associated with DSST performance. Further studies are required to determine the therapeutic potential for carbonic anhydrase modulation in the context of cognitive impairment, Alzheimer’s disease, and stroke.
The extracellular matrix (ECM) proteins NCAN and BCAN are brain-specific chondroitin sulfate proteoglycans, which are expressed by neurons and astrocytes (NCAN and BCAN), and oligodendrocytes (BCAN). They contribute to the formation of a specialised structure, the perineuronal net (PNN), which plays a key role in memory and neuronal plasticity, and which is disrupted in Alzheimer’s disease [17]. Our findings are consistent with those of Harris et al. (2020) [5], who found plasma levels of NCAN and BCAN were positively associated with brain volume. Plasma levels of BCAN have previously been found to be positively associated with Mini Mental State Examination performance and reduced in patients with Alzheimer’s disease or mild cognitive impairment [7]. Mice lacking either NCAN or BCAN expression show normal development and memory function but reduced hippocampal long-term potentiation [18, 19], whilst quadruple knock-outs, which lack NCAN, BCAN, and two additional ECM proteins (tenascin-C and tenascin-R), show an altered ratio of excitatory to inhibitory synapses and a reduction in the number and complexity of hippocampal PNNs [20]. Genetic variation in the gene encoding A Disintegrin and Metalloproteinase with Thrombospondin Motifs 4 (ADAMTS4), which degrades the four members of the lectican family (including NCAN and BCAN), has been implicated in Alzheimer’s disease [21]. Taken together, the evidence suggests NCAN, BCAN and their regulators as molecules-of-interest in Alzheimer’s disease.
MOG is an oligodendrocyte-expressed membrane glycoprotein, the exact function of which is unknown [22].
CDCP1 is a widely expressed transmembrane glycoprotein that acts as a ligand for T cell-expressed Cluster of Differentiation 6 (CD6), and is implicated in autoimmune conditions [23]. CDCP1 is amenable to modulation by approved drug treatments: Itolizumab, which is used to treat psoriasis, disrupts CDCP1-CD6 binding and downregulates T-cell-mediated inflammation [24], whilst atomoxetine, a treatment for attention deficit hyperactivity disorder, which is being considered for the treatment of mild cognitive impairment, reduced cerebrospinal fluid (CSF) CDCP1 levels [25]. Intriguingly, findings in mice suggest a functional link between CDCP1 and MOG [6].
Our study has several strengths. We measured 1160 proteins, associated with a wide range of physiological processes, in a large, well-characterised cohort. Replication analyses, where possible, were performed in an independent cohort in which proteins were measured using an independent methodology. The availability of genetic and brain MRI data permitted an exploration of causality and putative causal pathways. The use of MR to identify potentially causal associations offers protection against some of the common confounders of observational analyses [26], and the use of multiple MR methods, which generally gave concordant estimates of effect, mitigates the individual biases of different MR methodologies [27]. Moreover, by requiring instrumental variables to be located in cis to their target protein, we limited the chance of pleiotropic effects [28].
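To make the logic of an MR estimate concrete, the sketch below computes per-variant Wald ratios and a fixed-effect inverse variance weighted (IVW) estimate from summary statistics. It is a simplified illustration under stated assumptions rather than the implementation used here: the function names are hypothetical, the input values are invented, the instruments are assumed to be independent, and first-order weights are used. The three-instrument minimum mirrors the requirement applied to the primary MR analyses.

```python
import math

def wald_ratios(beta_exposure, beta_outcome):
    """Per-variant causal estimates: variant-outcome effect / variant-exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]

def ivw_estimate(beta_exposure, beta_outcome, se_outcome, min_instruments=3):
    """Fixed-effect IVW estimate and standard error (first-order weights).

    Assumes independent instruments and ignores uncertainty in the
    variant-exposure effects.
    """
    if len(beta_exposure) < min_instruments:
        raise ValueError(f"need at least {min_instruments} instruments")
    numerator = sum(bx * by / se**2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Invented summary statistics for three cis instruments (illustration only).
bx = [0.10, 0.20, 0.15]   # variant effects on protein level
by = [0.02, 0.04, 0.03]   # variant effects on the outcome
se = [0.01, 0.01, 0.01]   # standard errors of the outcome effects

ratios = wald_ratios(bx, by)
estimate, std_err = ivw_estimate(bx, by, se)
print(ratios, estimate, std_err)
```

When every Wald ratio is identical, as in this toy input, the IVW estimate equals that common ratio. With real data the ratios differ, and their heterogeneity is one signal of possible pleiotropy.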
There are also several limitations to consider.
First, the 1160 proteins measured represent a small subset of the circulating proteome [29]. Replication analyses were only performed for those proteins for which data were available in the GS cohort, meaning that we did not assess replication of CA14 or MOG.
Second, the limited availability of suitable IVs meant that our primary MR analyses were only performed for CA14 and CDCP1. Whilst we required a minimum of three IVs for the primary MR analyses, our sensitivity analyses, in which a stricter threshold for independence was applied to the IVs, necessitated the use of fewer than three IVs in each analysis. As such, the results of the sensitivity analyses should be interpreted with this caveat in mind.
Third, for all but one pair of traits, we were insufficiently powered to assess co-localisation between genetic variants associated with protein level and cognition, structural brain phenotypes, and disease outcomes. Significant MR findings might therefore reflect the presence of separate causal variants in linkage disequilibrium (LD) with one another [30].
Fourth, we measured protein levels in the plasma, rather than in the brain or CSF. It is, however, important to note the striking enrichment for brain-expressed proteins amongst the DSST-associated proteins. Previous analyses of the GS cohort, in which replication was sought in the present study, have identified the levels of several plasma proteins as being associated with multiple markers of brain health [8]. These findings support the use of the plasma to assess brain-related phenotypes and emphasise the need for additional research to explain the mechanisms controlling the efflux of brain-expressed proteins into the bloodstream in non-clinical populations. Moreover, the use of cis pQTLs, which are likely to be shared across tissues [31], as IVs in our MR analyses, supports the possibility that the MR-identified associations reflect the actions of the proteins-of-interest in the brain.
5. Conclusions
We identified protein biomarkers of cognitive function that may causally affect brain structure and risk for stroke and intracranial aneurysm. Notwithstanding the need for replication, our findings prompt several hypotheses that should be assessed by future studies. Our apparently paradoxical findings of higher CA14 levels being associated with both better cognitive function and increased stroke risk suggest that molecular findings can inform a more nuanced understanding of the relationship between premorbid cognitive function and neurological disease risk. It is possible that improved risk stratification may be achieved through the combination of cognitive assessment and biomarker measurement. The availability of approved drugs targeting our identified proteins raises the possibility of drug repurposing for novel therapeutic interventions to prevent cognitive decline, stroke, and intracranial aneurysm.
Declarations
Ethics approval and consent to participate
All centres contributing to PURE were required to obtain approval from their respective ethics committees (Institutional Review Boards). Participant data is confidential and only authorized individuals can access study-related documents. Participants’ identities are protected in documents transmitted to the Coordinating Office, as well as in biomarker and genetic data. Participants provided informed consent to obtain baseline information, and to collect and store genetic and other biological specimens.
All components of Generation Scotland received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89). All participants provided broad and enduring written informed consent for biomedical research. Generation Scotland has also been granted Research Tissue Bank status by the East of Scotland Research Ethics Service (REC Reference Number: 15/0040/ES), providing generic ethical approval for a wide range of uses within medical research. The imaging subsample of Generation Scotland received ethical approval from the NHS Tayside committee on research ethics (reference 14/SS/0039). This study was performed in accordance with the Declaration of Helsinki.
Consent for publication
Not applicable
Availability of data and materials
The PURE datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
According to the terms of consent for GS participants, access to individual-level data (omics and phenotypes) must be reviewed by the GS Access Committee. Applications should be made to access@generationscotland.org.
Competing interests
MC is supported by a Canadian Institutes of Health Research doctoral award and has received consulting fees from Bayer AG. MP is supported by the EJ Moran Campbell Internal Career Research Award from McMaster University. DAG is a part-time employee of Optima Partners, a health data consultancy based at the Bayes Centre, The University of Edinburgh. SH is an employee of Bayer AG. AMM has previously received speaker’s fees from Illumina and Janssen and research grant funding from The Sackler Trust. SY is supported by the Heart and Stroke Foundation/Marion W Burke Chair in Cardiovascular Disease. GP is supported by the CISCO Professorship in Integrated Health Systems. The other authors declare no competing interests.
Funding
PURE
The PURE study is an investigator-initiated study that is funded by the Population Health Research Institute, the Canadian Institutes of Health Research (CIHR), Heart and Stroke Foundation of Ontario, support from CIHR’s Strategy for Patient Oriented Research, through the Ontario SPOR Support Unit, as well as the Ontario Ministry of Health and Long-Term Care and through unrestricted grants from several pharmaceutical companies (with major contributions from AstraZeneca [Canada], Sanofi-Aventis [France and Canada], Boehringer Ingelheim [Germany and Canada], Servier, and GlaxoSmithKline), and additional contributions from Novartis and King Pharma and from various national or local organisations in participating countries as follows: Argentina—Fundacion ECLA; Bangladesh—Independent University, Bangladesh and Mitra and Associates; Brazil—Unilever Health Institute, Brazil; Canada—Public Health Agency of Canada and Champlain Cardiovascular Disease Prevention Network; Chile—Universidad de la Frontera; Colombia—Colciencias (grant number 6566–04–18062); South Africa—The North-West University, SANPAD (SA and Netherlands Programme for Alternative Development), National Research Foundation, Medical Research Council of South Africa, The South Africa Sugar Association, Faculty of Community and Health Sciences; Sweden—grants from the Swedish State under the Agreement concerning research and education of doctors, the Swedish Heart and Lung Foundation, the Swedish Research Council, the Swedish Council for Health, Working Life and Welfare, King Gustaf V’s and Queen Victoria Freemasons Foundation, AFA Insurance, Swedish Council for Working Life and Social Research, Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning, grant from the Swedish State under (LäkarUtbildningsAvtalet) Agreement, and grant from the Västra Götaland Region; and United Arab Emirates— Sheikh Hamdan Bin Rashid Al Maktoum Award for Medical Sciences, Dubai Health Authority, Dubai. 
The PURE biomarker project was supported by Bayer and the CIHR. The biomarker project was led by PURE investigators at the Population Health Research Institute (Hamilton, Canada) in collaboration with Bayer scientists. Bayer directly compensated the Population Health Research Institute for measurement of the biomarker panels, scientific, methodological, and statistical work. Genetic analyses were supported by CIHR (G-18–0022359) and Heart and Stroke Foundation of Canada (application number 399497) in the form of funding to GP.
GS
This work was supported by the Wellcome Trust [104036/Z/14/Z, 220857/Z/20/Z, and 216767/Z/19/Z] and an MRC Mental Health Data Pathfinder Grant [MC_PC_17209] to AMM. DAG is funded by the Wellcome Trust Translational Neuroscience PhD Programme at the University of Edinburgh [108890/Z/15/Z]. LS and ANH are supported by Medical Research Council [MR/L023784/2]: Dementias Platform UK. LS is also supported by a Medical Research Council Award to the University of Oxford [MC_PC_17215]. AS is supported through the Wellcome-University of Edinburgh Institutional Strategic Support Fund (Reference 204804/Z/16/Z), and indirectly through the Lister Institute of Preventive Medicine award with reference 173096. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006]. Genotyping of the GS:SFHS samples was carried out by the Genetics Core Laboratory at the Clinical Research Facility, Edinburgh, Scotland, and was funded by the UK’s Medical Research Council and the Wellcome Trust [104036/Z/14/Z].
Authors’ contributions
Conception and design: RMW, WNW, GP; data analysis: RMW, MC, NP; drafting the article: RMW, WNW, GP; data preparation: MC, NP, MP, DAG, AC, HCW, AK, LS, XS, EE, MOD; data collection: ANH, AMM, EE, SH, SR, MOD, SY, WNW, GP; revision of the article: RMW, MC, NP, MP, WNW, GP; all authors read and approved the final manuscript.
Acknowledgements
We are grateful to all the families who took part in Generation Scotland, the general practitioners and the Scottish School of Primary Care for their help in recruiting them, and the whole Generation Scotland team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, healthcare assistants and nurses.
We thank Dr Alison Offer for assistance in producing the forest plots.
Abbreviations
- ACME: average causal mediation effect
- BCAN: brevican
- CA14: carbonic anhydrase 14
- CD6: cluster of differentiation 6
- CDCP1: CUB-domain containing protein 1
- CI: confidence interval
- CMB: cerebral microbleed
- CSF: cerebrospinal fluid
- DSST: digit symbol substitution test
- ECM: extracellular matrix
- FDR: false discovery rate
- GS: Generation Scotland imaging subsample
- GS:SFHS: Generation Scotland: Scottish Family Health Study
- IV: instrumental variable
- IVW: inverse variance weighted
- MOG: myelin oligodendrocyte glycoprotein
- MoCA: Montreal Cognitive Assessment
- MR: Mendelian randomisation
- MRI: magnetic resonance imaging
- NCAN: neurocan
- PNN: perineuronal net
- OR: odds ratio
- PP: posterior probability
- pQTL: protein quantitative trait loci
- PWCoCo: pairwise conditional analysis and co-localisation analyses
- PURE: Prospective Urban and Rural Epidemiology
- RAPS: robust adjusted profile score
- SBI: silent brain infarct
- SD: standard deviation
- SVD: small vessel disease
- WMH: white matter hyperintensity