Abstract
Delirium is an acute change in cognition, common in hospitalised older adults, and associated with high healthcare and human cost. In this work we shed light on the currently poorly understood genetic and proteomic background of delirium. We conducted the largest multi-ancestry analysis to date of genetic variants associated with delirium (1,059,130 individuals, 11,931 cases), yielding the Apolipoprotein E (APOE) gene as a strong risk factor with possible population and age-varying effects. A multi-trait analysis of delirium with Alzheimer disease identified 5 delirium genetic risk loci. Investigation of plasma proteins associated with incident delirium over up to 16 years of follow-up (32,652 individuals, 541 cases) revealed known and novel protein biomarkers, implicating brain vulnerability, inflammation and immune response processes. Integrating proteins and APOE genetic risk with demographics significantly improved incident delirium prediction compared to demographics alone. Our results pave the way to better understanding delirium’s aetiology and guiding further research on clinically relevant biomarkers.
Delirium is a complex neurocognitive condition, affecting nearly 25% of hospitalised older adults1. It is characterised by an acute, often reversible, disturbance of the patient’s cognitive ability, attention and awareness2. Multiple adverse outcomes have been strongly associated with delirium, including increased mortality, prolonged hospitalisation and accelerated dementia onset3. It has also been estimated that delirium costs European healthcare systems more than $182 billion per year4. Despite this high healthcare burden, the current understanding of the genetic and biological mechanisms underlying delirium’s pathophysiology is still limited, hindering personalised medicine efforts to predict, prevent and treat the condition. Given delirium’s increasing presence in ageing populations worldwide5, alleviating its human and economic cost6 through personalised medicine is all the more important.
Previous studies on the genetic determinants of delirium have been small in scale and inconclusive3,5,7, largely focusing on a single candidate gene or small sets of candidate genes8–17. The apolipoprotein E gene (APOE), specifically its ε4 haplotype, is the most intensively studied gene10,12,14–16, although without firm conclusions on its relationship with delirium5,18. The advancements in Genome-Wide Association Studies (GWAS) over the last decades19 have allowed researchers to search across the full spectrum of the human genome for genetic risk factors involved in neurocognitive disorders20–22, offering invaluable insight into disease mechanisms. However, research on delirium has fallen behind in this respect, with only a few, relatively underpowered delirium GWAS having been conducted so far23–25. Moreover, those studies did not comprehensively address the biological implications of the potential gene-disease associations and were conducted primarily on individuals of European descent. Apart from genomics, studies of high-throughput layers of molecular data, such as proteins (proteomics)26–29, metabolites (metabolomics)30, lipids (lipidomics)31, gene expression (transcriptomics)32 or DNA methylation sites (epigenomics)33,34 have also gained traction recently with regard to delirium. Although promising in identifying delirium risk biomarkers, these “omics” studies have so far been limited in sample size35 and have lacked validation in many cases28.
In the current study we aim to upscale the efforts in identifying genetic and proteomic determinants of delirium risk. To achieve this, we: (1) conducted the largest so far meta-analysis of delirium GWAS datasets, the first to include individuals from diverse ancestries; (2) tested for plasma proteome signatures of incident delirium for up to 16 years of follow-up in UK Biobank (UKB)36; (3) conducted a multi-trait meta-analysis between delirium and Alzheimer Disease, leveraging the shared genetic basis of the two conditions5 and boosting the power to detect genetic associations for delirium37.
Results
APOE gene as delirium genetic risk factor
To identify genetic variants associated with delirium, a multi-ancestry genome-wide association meta-analysis (GWAMA) was conducted on eight sub-cohorts (Supplementary Table 1) from five global ancestries: European (EUR, ncases = 7,988, ncontrols = 549,568, 52.6% of total sample size), Finnish (FIN, ncases = 3,371, ncontrols = 388,560, 37%), African (AFR, ncases = 348, ncontrols = 59,780, 5.7%), South Asian (SAS, ncases = 107, ncontrols = 9,356, 0.9%) and admixed American / Hispanic (AMR, ncases = 117, ncontrols = 39,977, 3.8%). In total, the GWAMA comprised up to 1,059,130 individuals and 11,931 delirium cases, yielding results for 24,951,028 genetic variants.
Variants at the Apolipoprotein E (APOE) gene and within its close genomic region on chromosome 19 (Figure 1) were significantly associated with delirium. The lead variant rs429358 (C>T, Odds Ratio [95% Confidence Interval]: 1.6 [1.55 – 1.65], p = 9.7×10−177) is an APOE missense variant, which together with the rs7412 C>T variant forms the APOE-ε4 haplotype (the rs429358-C and rs7412-C alleles), an established risk factor for Alzheimer Disease38. rs7412 was also significantly associated with delirium in our meta-analysis (C>T, OR [95% CI]: 0.84 [0.79 – 0.88], p = 1.8×10−11).
The lead variant rs429358 showed population specific genetic effects for delirium in the contributing sub-cohorts (Figure 2). Significant associations were observed in all, except European [Michigan Genomics Initiative (MGI), p = 0.27] and admixed American [All of Us (AoU), p = 0.12] populations from the USA, with USA populations generally showing smaller effect sizes.
Conditional GWAS analysis on the APOE-ε4 haplotype resulted in all variants in the APOE region losing significance (Supplementary Figure 1), suggesting that APOE-ε4 is the sole independent genetic risk factor in the region. In the same analysis, an intronic variant within the ADAM32 gene on chromosome 8 gained significance (rs531178459, p = 3.3×10−8).
Furthermore, we tested to what extent the APOE association with delirium is driven by underlying dementia. Variants in the APOE region remained significant after adjusting for dementia status in the delirium GWAS (Supplementary Figure 2). Specifically, the lead variant from the unadjusted GWAMA, rs429358, again showed a strong association (p = 3.7×10−15). Variants in the SEC14L1 gene on chromosome 17 were also significant in the dementia-adjusted analysis.
Multi-trait analysis between delirium and Alzheimer disease
We conducted a multi-trait analysis of GWAS summary statistics (MTAG) between delirium and Alzheimer disease. MTAG can increase statistical power to detect new genetic associations, by leveraging the shared genetic information between related traits37. Our MTAG analysis identified 10 independent genetic loci associated with delirium, of which 5 replicated in the held-out set (Figure 3 and Supplementary Table 2). The closest genes mapped to the lead replicated variants included: CR1 (rs4844610 A>C; OR [95% CI]: 1.01 [1.006 - 1.014]; p = 1.4×10−8), BIN1 (rs6733839 T>C; OR [95% CI]: 1.015 [1.012 - 1.018]; p = 7×10−25), CLU (rs2279590 T>C; OR [95% CI]: 0.992 [0.989 – 0.995]; p = 4.2×10−8), MS4A4A (rs1582763 A>G; OR [95% CI]: 0.991 [0.988 - 0.994]) and TOMM40 (rs117310449 T>C; OR [95% CI]: 1.09 [1.07 - 1.1]; p = 7×10−38).
Protein risk factors for incident delirium
The proteome-wide association analysis on 32,652 European UKB participants (541 cases) revealed 109 out of the 2,919 total proteins, whose plasma levels were significantly associated with incident delirium up to 16-years of follow-up (Figure 4 and Supplementary Table 3). The APOE protein had a negative effect on incident delirium risk, meaning that higher plasma levels of the protein are associated with reduced future risk. The association was significant at the nominal level (0.05), but not after multiple test correction (OR [95% CI]: 0.86 [0.79 - 0.94], p = 7×10−4, Supplementary Table 3).
Further adjusting the individual protein models for APOE-ε4 status did not substantially alter the results (Supplementary Table 4), with effect size estimates highly correlated with those from the models adjusted only for age, sex and BMI (Pearson’s correlation r=0.99). Additionally including a protein*APOE-ε4 interaction term also yielded highly similar results (main effect correlation r=0.93). No significant protein*APOE-ε4 interaction was observed at a Bonferroni adjusted threshold of p-value < 1.7×10−5 (Supplementary Table 5).
The 109 proteome-wide significant proteins were found to be significantly enriched (q-value < 0.05) in several important inflammation and immune response biological pathways, such as interleukin and Tumour Necrosis Factor (TNF) signalling (Supplementary Table 6).
Protein selection and prediction models
A machine learning framework was applied to further pin down which plasma proteins are robustly associated with incident delirium. This approach revealed 19 proteins (stability-selected proteins, Supplementary Table 7) consistently selected as predictive of incident delirium in the training set. All of the stability-selected proteins represent a subset of the top individually significant proteins identified through the proteome-wide association analysis. The FGL1 protein was removed from subsequent analyses, as it had a non-significant contribution to the re-fit prediction models and was dropped during stepwise regression. The 18 remaining stability-selected proteins provided marginal prediction improvements of incident delirium on an independent test set, compared to predictions based on demographic factors alone (Figure 5 and Supplementary Tables 8-9). Specifically, adding the 18 stability-selected proteins to the “basic” model that includes age, sex and BMI as predictors increased the AUC from 0.764 to 0.791, but the increase was not significant (DeLong test p-value = 0.09, Supplementary Table 9). However, adding proteins and APOE-ε4 status to the basic model showed a significant prediction improvement (AUC from 0.764 to 0.794, p-value = 0.049, Supplementary Table 9). Finally, the model fit only with the selected proteins performed worse than the basic model, although not significantly so (AUC from 0.764 to 0.729, p-value = 0.21, Figure 5a and Supplementary Table 9). The precision-recall performance showed a similar pattern (Figure 5b), with the full models having higher PR-AUC: 0.065 and 0.06 for the “APOE+proteomic+basic” and the “proteomic+basic” models respectively, compared to 0.043 for the basic model.
Discussion
In this analysis we conducted the largest to our knowledge multi-ancestry genome-wide meta-analysis on delirium. Genetic variants on the APOE gene on chromosome 19 were identified as significantly associated with delirium, with the top variant hit, rs429358, showing population-specific association patterns. The APOE gene encodes for APOE, a lipid transporter protein in the periphery and the brain. APOE is a strong risk factor for Alzheimer Disease (AD), through its diverse roles in pathways such as amyloid-β plaque deposition, neuroinflammation and dysregulation of lipid metabolism in the brain39.
The role of the APOE gene in delirium is currently unclear, with previous meta-analyses reporting no association between APOE and delirium7,18. In UKB, a previous study on European participants found an association between APOE-ε4 status and delirium (hazard ratio = 3.73 [2.68 - 5.21])40. It has been suggested that interactions between APOE-ε4 and inflammation-related proteins can drive delirium development5. To assess this hypothesis, we tested whether the interaction term between each plasma protein level and APOE-ε4 status was significantly associated with incident delirium (Supplementary Table 5). No protein x APOE-ε4 interaction reached significance after adjustment for multiple testing (p-value threshold = 1.7×10−5). However, the CEND1 protein exhibited an interaction with APOE-ε4 that was nominally significant but did not reach this threshold (beta(interaction) [95% CI] = 0.27 [0.10 – 0.42]; p(interaction) = 8×10−4). CEND1 is a mitochondrial neural differentiation protein, expressed in the nervous system41. It has previously been implicated in cognitive impairment in mice41 and in AD in human brains42. Additionally, APOE-ε4 expression in astrocytes has been implicated in impaired mitochondrial function43, although to our knowledge CEND1 and APOE have not been linked in previous studies. The role of CEND1 in delirium has also not been investigated so far.
Ancestry-dependent APOE-ε4 genetic effects on delirium have not been systematically assessed previously. For AD, the risk conferred by APOE-ε4 varies by ancestral background, with African/African Americans and Hispanics having less pronounced risk than white Europeans and Asians44,45. Additionally, higher APOE-ε4 expression levels have been observed in carriers of European compared to African ancestry46. Here, our findings suggest a similar pattern, with rs429358-C having a larger effect in European, Finnish and South Asian populations than in African and Hispanic/admixed American populations. Overall, sub-populations from the USA (AoU, MGI) have weaker APOE effects than those from the UK or Finland (UKB, FinnGen). This might reflect the younger age of participants in USA-based studies (Supplementary Table 1) or phenotypic differences in delirium diagnoses across the healthcare systems of different countries. Age-dependent genetic effects have been previously described for APOE-ε4 with regard to AD44,47 and progression to mild cognitive impairment and AD48, showing increasing effects until an age of 70 – 75, with reduced effects at later ages44,47,48.
It is also possible that underlying dementia is driving the strong association between APOE and delirium observed here. Among UKB European participants, 36% of delirium cases had a dementia diagnosis, compared to 1.4% of controls (Supplementary Table 1). This is to be expected given the close relationship of the two disorders5, but it may hide delirium-specific genetic effects and overemphasise the role of APOE. In our sensitivity analysis, the APOE region remained significant after adjusting for all-cause dementia, although with a weaker effect. This result may suggest that the association of APOE with delirium is not entirely explained by its role in dementia.
Adjusting for APOE-ε4 attenuated the genetic effects within the whole APOE region. This observation suggests that the significance of the genetic variants in close proximity is driven by linkage disequilibrium with the APOE-ε4 haplotype, rather than by secondary independent signals. Moreover, an intronic variant in the ADAM32 gene gained significance after adjusting for APOE-ε4. ADAM32 belongs to the ADAM family of metalloproteinases, some of which have been implicated in AD49. ADAM proteins are involved in diverse functions, including immunity-related pathways50.
Regarding our multi-trait analysis of delirium with AD (MTAG), five genetic loci were found to have a significant effect on delirium, supported by replication in the AoU EUR cohort. Among the replicated loci, several important AD risk genes51 were detected: BIN1, CLU, CR1, MS4A4A, TOMM40. The MS4A4A gene is expressed in macrophages and has been linked with AD52, vascular dementia and systemic lupus erythematosus53. The minor allele of the lead variant at the MS4A4A gene, rs1582763, has been previously associated with decreased risk of AD52. This variant’s association with delirium has not been reported before, but we also found a protective effect of the rs1582763 minor allele on delirium. BIN1 has been recently implicated in the regulation of calcium homeostasis in glutamatergic neurons, and its expression in AD human brains is reduced compared to healthy brains54. The clusterin gene (CLU), also named Apolipoprotein J (APOJ), codes for a multifunctional protein with an apparent role in neurodegenerative diseases55. CLU, much like APOE, is thought to be involved in amyloid-β plaque deposition in AD pathologies. With regard to delirium, protein expression of apolipoproteins including CLU and APOE was previously found to be downregulated in the cerebrospinal fluid (CSF) of delirium subjects compared to mild AD controls29. In our proteomic analysis, CLU protein levels were also lower in the plasma of incident delirium subjects (Supplementary Table 3), but not significantly so (p-value > 0.05). This discrepancy may reflect different CLU protein abundance between CSF and plasma. The CR1 gene, implicated in complement activation, is believed to exert its role in AD pathogenesis through amyloid-β clearance, neuroinflammation and tauopathy (the deposition of abnormal tau protein in the brain)56. To the best of our knowledge, the role of CR1 in delirium has not been investigated previously.
In our study, CR1 plasma protein levels had a nominally significant association with incident delirium (p-value = 0.013). TOMM40 genetic variants have been associated with AD before57. Given TOMM40’s close proximity and high linkage disequilibrium with the APOE gene, its role in AD has frequently been contested57. However, it has also been suggested that TOMM40 independently affects AD risk through its role in regulating protein transportation in the mitochondria57,58. TOMM40 has not been previously implicated in delirium. Overall, given the small effect sizes of the MTAG-identified genes for delirium (Supplementary Table 2), and their prominent role in AD, it is possible that their significance in our analysis is mainly driven by their role in AD. Nonetheless, our findings suggest novel genes (e.g. MS4A4A, CR1, TOMM40) that may be of relevance to future delirium research and therapeutic target investigations.
Several neurologically relevant and immune system related proteins have been robustly associated with incident delirium in our proteomic study. For example, plasma GFAP and NEFL (neuronal injury biomarkers) have been previously found increased in postoperative delirium patients59,60. Similarly, BCAN, a protein with a role in brain extracellular matrix formation, has been observed to be downregulated in brains of post-infection delirium and AD patients61. Lower plasma levels of SELENOP, an important selenium transporter in the brain, have been associated with worse global cognition and AD62. To the best of our knowledge, this is the first time that an association between SELENOP and delirium has been reported. Additionally, systemic inflammation markers have been observed among the delirium-associated proteins in our study. For instance, C7, BTLA and FGL1 participate in the immune response, whereas LRG1 and LTA4H participate in inflammatory processes. LRG1 has been implicated in brain injury after sepsis in mice63, sepsis being a main driver of delirium aetiology3. Interleukins, identified through our enrichment analysis, play an important role in regulating immune response and inflammation and have frequently been implicated in delirium26,35. These results align with the proposed mechanisms of delirium pathophysiology, in which brain vulnerability (indicated by brain injury marker proteins) together with systemic and nervous system inflammation are driving factors for delirium3,5. At the same time, our results could inform future research in delirium prediction biomarkers, some of which are novel in delirium research (e.g. SELENOP, CEND1).
The main strength of this analysis is the large-scale investigation of delirium genetic and proteomic risk factors. Both in terms of sample size and number of genetic variants / proteins tested, this is to our knowledge the largest study on the molecular background of delirium risk conducted so far. Additionally, the long follow-up period after protein measurements allows identification of biomarkers early on, prior to disease manifestation. On the other hand, the limitations of the study include the underdiagnosis of delirium in the hospital health records from which the phenotype was derived64. This misclassification could introduce noise to the results, limiting discovery of genetic effects. Also, the small sample sizes of the non-European sub-populations hinder the identification of risk factors specific to them. Moreover, the assayed plasma proteins do not capture the full spectrum of the human proteome, potentially missing proteins important for delirium biology and prognosis. Finally, although the use of plasma proteins as predictive biomarkers is of great significance, proteomic profiles of delirium-relevant tissues, such as the brain, would be invaluable.
In conclusion, our results point to an oligogenic genetic architecture for delirium, with the APOE gene identified as a strong, potentially population-specific genetic risk factor. However, further replication in larger non-European cohorts is required. Our plasma proteome analysis supports previous findings and discovers novel proteins implicated in delirium. Taken together, genetic and proteomic risk factors suggest a shared aetiology between delirium and dementias, possibly contributing to a better understanding of delirium’s complex biological origin and the discovery of clinically relevant biomarkers.
Methods
Study populations
The project utilises biomedical data from ancestrally diverse large-scale cohorts. Included cohorts were either (a) databases containing individual-level genomic measurements linked to healthcare records, or (b) previously published summary results from genomic studies on delirium phenotypes. Contributing individual-level cohorts include the UK Biobank65,66 (UKB) and the All of Us Research Program67 (AoU). Summary results have been obtained from ancestrally Finnish (FinnGen68; ncases = 3,371; ncontrols = 388,560) and European participants (Michigan Genomics Initiative cohort (MGI)69; ncases = 160; ncontrols = 44,654).
The UKB is a population-based prospective study, containing a rich set of genetic and phenotypic data for approximately 500,000 participants living across the United Kingdom. Participants, aged 40 to 69 years old at recruitment between 2006 – 2010, have been linked to their annually updated electronic health records, allowing longitudinal investigation of healthcare outcomes. Similarly, AoU includes genomic data and healthcare outcomes for approximately 245,000 individuals from diverse populations in the USA.
Delirium phenotype
The analysis focused on delirium episodes that were not triggered by substance intoxication or withdrawal2. For convenience, such delirium episodes will hereafter be referred to simply as delirium. Delirium cases were defined as individuals with one or more delirium-corresponding codes in their electronic health records (EHR), that is hospital inpatient, death register or primary care data. The relevant codes were: “F05” (delirium, not induced by alcohol and other psychoactive substances) for International Classification of Diseases, 10th version (ICD-10)70 and “293.0” (Acute confusional state) for ICD-971. Read v2 and v3 codes72 for primary care data mapping to delirium were obtained from a previously defined list by Kuan et al (2019)73, published in the HDRUK Phenotype Library (https://phenotypes.healthdatagateway.org/).
Discovery of genetic risk factors
A Genome-Wide Association Study (GWAS) framework was implemented in order to identify genetic variants associated with delirium in UKB’s and AoU’s ancestrally distinct sub-populations. For this purpose, the REGENIE software (version 3.2.2) was used, which carries out a logistic regression analysis between a disease phenotype and each genetic variant, accounting for covariates, population structure and relatedness of participants74.
In UKB, the set of imputed genotypes was used65 (Data-Field 22828), filtered to retain variants with > 5 minor alleles in cases and controls, imputation score > 0.5 and missingness rate < 3%, and to remove variants deviating from Hardy-Weinberg Equilibrium (p-value < 10−6). Individuals were retained if they had a missingness rate ≤ 5%, no mismatch between reported and genetically inferred sex (Data-Field 22001), no sex chromosome aneuploidy (Data-Field 22019), no excessive heterozygosity (Data-Field 22027) and no more than ten 3rd degree relatives (Data-Field 22021). The covariates considered for the UKB GWAS included: age, sex, genotyping batch (Data-Field 22000) and the first 20 pre-computed genomic principal components (Data-Field 22009). Here, age was defined as age at first delirium occurrence for cases and age at last data freeze (31 October 2022) or age at death for controls. GWAS were conducted separately for sub-populations of white British ancestry (EUR; ncases = 7,176; ncontrols = 385,097), African (AFR; black / black British; ncases = 115; ncontrols = 7,480) and south Asian (SAS; ncases = 107; ncontrols = 9,356) ethnic backgrounds. Summary statistics from the UKB GWAS were converted from GRCh37 to GRCh38 genomic coordinates using the LiftOver software75. In total the analysis covered approximately 22.5, 13.6 and 8.8 million genetic variants in EUR, AFR and SAS ancestries respectively.
In AoU, short read whole genome sequencing genotypes were used67 for conducting GWAS on European (EUR; ncases = 652; ncontrols = 119,817), African/African American (AFR; ncases = 233; ncontrols = 52,300), and admixed American/Hispanic (AMR; ncases = 117; ncontrols = 39,977) sub-populations. The same GWAS framework as described for UKB was followed, with the exception of not including genotyping batch and principal components 11-20 as covariates, as they were not available in the AoU datasets. Sub-populations with a low number of delirium cases (<20) were excluded from the analysis. Those consisted of the east Asian ethnic background in UKB and the east Asian and middle Eastern genetic ancestries in AoU.
In order to increase power to detect genetic associations, our GWAS summary statistics and previously published GWAS results were combined into a multi-ancestry genome-wide meta-analysis. The METAL software (version 2020-05-05)76 was used to conduct a fixed effects inverse-variance meta-analysis on the set of 24,951,029 variants that were present in at least two studies. The genomic control correction method was applied in METAL to allow for multi-ancestry analysis. In total, up to 1,059,130 individuals (ncases = 11,931; ncontrols = 1,047,199) were included in the meta-analysis. Genome-wide significance was considered at a 5×10−8 p-value threshold. Significantly associated variants at the multi-ancestry meta-analysis were inspected for consistency at each contributing sub-cohort.
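As an illustration of the pooling step, a fixed-effects inverse-variance meta-analysis of one variant can be sketched as follows. This is a minimal sketch and not the METAL implementation: the function name and the per-study estimates are hypothetical, and the genomic control correction mentioned above is not included.

```python
import math
from statistics import NormalDist

def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance pooling of per-study log-odds ratios.

    betas: per-study effect estimates (log-odds scale)
    ses:   per-study standard errors
    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z))        # two-sided normal test
    return pooled_beta, pooled_se, p_value

# Hypothetical per-study estimates for a single variant (not from the paper).
beta, se, p = inverse_variance_meta([0.50, 0.40, 0.10], [0.02, 0.03, 0.10])
```

Studies with smaller standard errors dominate the pooled estimate, which is why the large European and Finnish sub-cohorts carry most of the weight in an analysis of this kind.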
Conditional GWAS analysis on the APOE-ε4 haplotype count (0, 1 or 2) was conducted on the UKB EUR sub-cohort, to identify genetic variants associated with delirium independently of APOE. We inferred APOE-ε4 haplotypes for each participant based on their rs429358 and rs7412 genotypes, as described in previous studies38. Additionally, a sensitivity delirium GWAS adjusting for all-cause dementia was conducted in the UKB EUR sub-cohort, by including all-cause dementia status as covariate. The same framework as described for the UKB GWAS using REGENIE was implemented.
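The haplotype inference can be illustrated with a small lookup from the two genotypes to the ε4 count. This is a sketch under stated assumptions rather than the exact procedure used here: the function name is hypothetical, genotypes are assumed to be unphased allele counts, and the double heterozygote is assigned to ε2/ε4 because the alternative (ε1/ε3) involves the very rare ε1 haplotype.

```python
def apoe_e4_count(rs429358_c, rs7412_t):
    """Return the number of APOE-e4 haplotypes (0, 1 or 2), or None if unresolved.

    rs429358_c: number of C alleles at rs429358 (0, 1 or 2)
    rs7412_t:   number of T alleles at rs7412 (0, 1 or 2)

    Haplotypes: e2 = rs429358-T + rs7412-T, e3 = T + C, e4 = C + C.
    """
    table = {
        (0, 0): 0,  # e3/e3
        (0, 1): 0,  # e2/e3
        (0, 2): 0,  # e2/e2
        (1, 0): 1,  # e3/e4
        (1, 1): 1,  # e2/e4 (assumed; the rare e1/e3 gives the same genotypes)
        (2, 0): 2,  # e4/e4
    }
    # Remaining combinations require the rare e1 haplotype and are left unresolved.
    return table.get((rs429358_c, rs7412_t))
```

The resulting count (0, 1 or 2) is the kind of covariate that a conditional analysis would include alongside the other covariates.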
Wherever reported, Odds Ratios (OR) were calculated as $\mathrm{OR} = e^{\beta}$, where $\beta$ is the logistic regression coefficient. 95% Confidence Intervals (CI) for the ORs were calculated as $\mathrm{OR}_{95\%\,\mathrm{CI}} = e^{\beta} \pm 1.96 \times \mathrm{SE}_{\beta} \times e^{\beta}$.
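A short worked sketch of this calculation is given below, reading the confidence-interval formula as the OR plus or minus 1.96 times the standard error of β times the OR. The function name and the standard error in the example are hypothetical; only the point estimate of 1.6 is taken from the text.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and symmetric 95% CI from a log-odds coefficient.

    Implements OR = exp(beta) and CI = OR +/- z * SE(beta) * OR.
    """
    odds_ratio = math.exp(beta)
    half_width = z * se * odds_ratio
    return odds_ratio, odds_ratio - half_width, odds_ratio + half_width

# Example: an OR of 1.6 with a hypothetical standard error of 0.016 for beta.
or_, lower, upper = odds_ratio_ci(math.log(1.6), 0.016)
```

Intervals built this way are symmetric around the OR. A common alternative, exponentiating β ± 1.96 × SE, gives slightly asymmetric bounds; the two agree closely when the standard error is small.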
Multi-trait analysis
Given the close inter-relationship between delirium and Alzheimer Disease (AD)5, we applied a joint analysis of summary statistics between delirium and AD. Such an approach increases statistical power to detect genetic associations for each trait37. The multi-trait analysis of GWAS (MTAG) software was used for this purpose37, jointly analysing summary statistics from our delirium meta-analysis – excluding the AoU sets – and the largest AD GWAS meta-analysis to date (Bellenguez et al., 2022)51. The AD GWAS was conducted on 487,511 European individuals (Stage I: 39,106 clinically diagnosed AD cases and 46,828 proxy AD cases) and 21 million variants. AD summary statistics were obtained from the European Bioinformatics Institute GWAS Catalog (https://www.ebi.ac.uk/gwas/) under accession no. GCST90027158. The MTAG analysis for the discovery of delirium genetic risk variants was conducted on 9,883,704 SNPs that overlapped across the two disorders, filtered for minor allele frequency >= 0.01 and sample size N >= (2/3) * 90th percentile for each trait. MTAG results are trait-specific summary statistics (i.e., effect estimates, standard errors and p-values), interpreted similarly to single-trait GWAS results. The genome-wide significance threshold was defined as a p-value = 5×10−8. Independent lead variants were defined as the most significant variants within a ±500kb region, using the GWASLab python package (version 3.4.46)77.
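The two variant filters applied before MTAG can be sketched for a single trait as below. This is an illustrative sketch, not MTAG's own code: the record layout (`maf`, `n`) and function name are hypothetical, and the percentile is computed with the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the definition MTAG uses.

```python
from statistics import quantiles

def passes_mtag_filters(variants, min_maf=0.01):
    """Keep variants with MAF >= min_maf and N >= (2/3) * 90th percentile of N.

    variants: list of dicts with hypothetical keys "id", "maf" and "n"
              (per-variant sample size) for one trait.
    """
    sample_sizes = [v["n"] for v in variants]
    # Last of the nine decile cut points is the 90th percentile.
    n_p90 = quantiles(sample_sizes, n=10, method="inclusive")[-1]
    n_min = (2.0 / 3.0) * n_p90
    return [v for v in variants if v["maf"] >= min_maf and v["n"] >= n_min]

# Toy input: eight well-covered common variants, one rare, one poorly covered.
toy = [{"id": i, "maf": 0.05, "n": 1000} for i in range(8)]
toy += [{"id": 8, "maf": 0.005, "n": 1000}, {"id": 9, "maf": 0.05, "n": 500}]
kept = passes_mtag_filters(toy)
```

The sample-size filter removes variants that were available in only a fraction of the contributing studies, which matters in a meta-analysis where per-variant N varies widely.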
The AoU EUR set was held out for replication of the MTAG lead hits. A multi-trait analysis with AD was conducted on the replication set as described above. Lead variants were considered replicated if they had a Bonferroni adjusted p-value < (0.05 / number of lead variants) and same direction of effect across the discovery and replication set.
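The replication rule can be written down compactly as follows. This is a minimal sketch with hypothetical field names and toy numbers; it only encodes the two criteria stated above, a Bonferroni-adjusted p-value threshold and a consistent direction of effect.

```python
def replication_threshold(n_lead_variants, alpha=0.05):
    """Bonferroni-adjusted p-value threshold for the replication set."""
    return alpha / n_lead_variants

def replicated_leads(lead_variants, alpha=0.05):
    """Return ids of lead variants that replicate.

    lead_variants: list of dicts with hypothetical keys "id", "beta_disc"
                   (discovery effect), "beta_rep" and "p_rep" (replication).
    """
    threshold = replication_threshold(len(lead_variants), alpha)
    return [
        v["id"]
        for v in lead_variants
        if v["p_rep"] < threshold and v["beta_disc"] * v["beta_rep"] > 0
    ]

# Toy example (values are illustrative, not from the paper).
toy_leads = [
    {"id": "v1", "beta_disc": 0.10, "beta_rep": 0.20, "p_rep": 0.001},
    {"id": "v2", "beta_disc": 0.10, "beta_rep": -0.20, "p_rep": 0.001},
    {"id": "v3", "beta_disc": -0.05, "beta_rep": -0.04, "p_rep": 0.04},
]
```

With the 10 lead variants reported in the Results, this rule corresponds to a replication threshold of 0.05 / 10 = 0.005.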
Proteomic study population
Plasma proteome data were available in UKB for a subset of 53,075 participants. Protein measurements of 2,923 unique plasma proteins78 were derived from blood samples taken during randomly selected participants’ initial UKB assessment visit between 2006-2010. Proteins were measured using the antibody-based Olink Explore 3072 proximity extension assay. Proteome data have previously undergone extensive quality control78. As additional filtering in the present analysis, European participants from batches 0 to 6 were extracted, as they have been reported to be highly representative of the UKB European population78. Moreover, protein measurements with >20% missing data were removed and the remaining proteins were mean-imputed, inverse-rank normalised and standardised to ensure homogeneity across the proteins. Delirium incident cases were defined as the participants whose first reported delirium episode was > 1 year after baseline, that is, the date of blood sample collection at the first UKB assessment visit (Data-Field 53-0.0). Delirium data were available for up to 16 years of follow-up after baseline. The final population consisted of 32,652 European participants and 2,919 plasma proteins, including 32,111 controls and 541 delirium incident cases.
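The per-protein preprocessing (mean imputation followed by inverse-rank normalisation) can be illustrated as below. This is a sketch under assumptions: the function name is hypothetical, missing values are represented as `None`, ties receive average ranks, and the Blom offset of 3/8 is assumed because the exact rank-to-quantile formula is not stated in the text.

```python
from statistics import NormalDist, mean

def inverse_rank_normalise(values, offset=0.375):
    """Mean-impute missing values (None), then apply a rank-based inverse
    normal transform: rank r maps to Phi^-1((r - offset) / (n - 2*offset + 1))."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    n = len(filled)

    # Assign 1-based ranks, averaging over ties.
    order = sorted(range(n), key=lambda i: filled[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and filled[order[j + 1]] == filled[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]

# Toy protein with one missing measurement.
transformed = inverse_rank_normalise([3.0, None, 1.0, 2.5])
```

The transform forces each protein onto an approximately standard normal scale regardless of its original distribution, so that effect sizes are comparable across proteins.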
Discovery of protein risk factors
To explore the relationships between baseline protein levels and incident delirium, we performed a proteome-wide association analysis on the UKB proteomic study population. Multivariable logistic regression models were fit between each protein as predictor and incident delirium status as outcome. In total, 2,919 models were fit, equal to the number of proteins. The models were adjusted for sex, Body Mass Index (BMI) and age at baseline. Associations were deemed significant at a Bonferroni adjusted p-value threshold: p-value < 1.7×10−5. For sensitivity analyses, we further adjusted protein models for APOE-ε4 haplotype status – zero, one or two copies of the haplotype – and for interaction between each protein and APOE-ε4.
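The multiple-testing threshold follows directly from the number of models, as the short sketch below shows. The constant and function names are hypothetical; the two p-values are those quoted in the Results and Discussion for the APOE protein main effect and the CEND1 x APOE-ε4 interaction.

```python
N_PROTEINS = 2919   # one logistic regression model per protein
ALPHA = 0.05

# Bonferroni threshold: 0.05 / 2919, approximately 1.7e-5.
bonferroni_threshold = ALPHA / N_PROTEINS

def is_proteome_wide_significant(p_value, threshold=bonferroni_threshold):
    """True if a per-protein p-value passes the Bonferroni threshold."""
    return p_value < threshold

# p-values quoted in the text; both are nominally significant (< 0.05)
# but do not pass the proteome-wide threshold.
reported = {
    "APOE protein (main effect)": 7e-4,
    "CEND1 x APOE-e4 (interaction)": 8e-4,
}
flags = {name: is_proteome_wide_significant(p) for name, p in reported.items()}
```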
We performed a pathway enrichment analysis on the set of Bonferroni-adjusted significant proteins emerging from the proteome-wide association analysis. We used the Enrichr79 web-based tool and pathway annotations based on the Reactome80, Molecular Signatures Database (MSigDB)81 and Kyoto Encyclopedia of Genes and Genomes (KEGG)82 databases. The full set of Olink proteins was used as background genes. Fisher’s exact tests were implemented to assess whether the identified proteins significantly overlap with the proteins in any of the pathways. Q-values were obtained by adjusting p-values for multiple testing using the Benjamini-Hochberg method.
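The two statistical steps of the enrichment analysis can be sketched as follows. This is not Enrichr's code: the function names are hypothetical, and the Fisher test is assumed to be one-sided (over-representation), computed as a hypergeometric upper tail.

```python
from math import comb

def fisher_enrichment_p(k, n_hits, pathway_size, background_size):
    """One-sided Fisher's exact p-value for over-representation.

    k:               significant proteins that belong to the pathway
    n_hits:          number of significant proteins
    pathway_size:    pathway proteins present in the background
    background_size: number of background proteins
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(background_size, n_hits)
    upper = min(pathway_size, n_hits)
    tail = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, n_hits - i)
        for i in range(k, upper + 1)
    )
    return tail / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

In this setting `n_hits` would be the 109 significant proteins and `background_size` the 2,919 assayed proteins; the pathway-specific counts are not given in the text and would come from the annotation databases.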
Protein selection and prediction models
To investigate whether baseline plasma proteome can improve prediction of incident delirium, a supervised machine learning approach was implemented. The LASSO (Least Absolute Shrinkage and Selection Operator) method83 was utilised for the selection of important proteins and to avoid overfitting given the high multicollinearity of proteomics data. For this analysis, the full set was randomly split into a training (80%; ncases = 436; ncontrols = 25,733) and test (20%; ncases = 105; ncontrols = 6,378) set. A LASSO model for binary outcomes was implemented in the training set using the glmnet R package (version 4.1.8)84. Here, the whole set of 2,919 proteins adjusted for demographic covariates: age, sex and BMI were used as predictors of incident delirium. In brief, the coefficients penalty parameter lambda was tuned using a 10-fold Cross-Validation (CV) framework for 100 lambdas between 10−6 and 0.07. The model with lambda.1se was chosen as the most parsimonious, giving the strictest model such that cross-validated error is within one standard error of the minimum84. To increase the robustness of the LASSO protein selection, a stability selection85 approach was additionally applied. For each of 100 random subsampling iterations of the training set, including all 436 delirium cases and an equal number of 436 randomly selected controls, a LASSO model as described above was fit using the penalty factor tuned in the full training set. The proteins that were selected on at least half of the subsampling iterations were chosen as robustly selected (stability-selected proteins).
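The subsampling logic of the stability selection step can be sketched as below. This is an outline under assumptions rather than the analysis code: the function names are hypothetical, and `select_features` is a placeholder for the LASSO fit (done with glmnet in R in this study), which is not reimplemented here.

```python
import random
from collections import Counter

def stability_selection(case_ids, control_ids, select_features,
                        n_iter=100, min_fraction=0.5, seed=0):
    """Return features selected in at least `min_fraction` of balanced subsamples.

    case_ids, control_ids: lists of participant identifiers
    select_features:       callable taking a list of participant ids and returning
                           the features with non-zero coefficients (placeholder
                           for a LASSO model with a pre-tuned penalty)
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        # All cases plus an equal number of randomly drawn controls.
        sampled_controls = rng.sample(control_ids, len(case_ids))
        subsample = list(case_ids) + sampled_controls
        counts.update(set(select_features(subsample)))
    return sorted(f for f, c in counts.items() if c / n_iter >= min_fraction)

# Toy selector standing in for LASSO: always picks "protein_A" and never picks
# "protein_B" (participant 999 is not part of the toy data).
def toy_selector(subsample):
    picked = ["protein_A"]
    if 999 in subsample:
        picked.append("protein_B")
    return picked

stable = stability_selection([0, 1, 2], list(range(10, 30)), toy_selector)
```

Balancing each subsample at a 1:1 case-control ratio gives the selector many different views of the controls while reusing every case, which suits an outcome as rare as incident delirium in this cohort.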
Four logistic regression models with incident delirium as the outcome were subsequently fit in the training set: (a) using only demographic covariates (age, sex and BMI) as predictors (basic model); (b) using only the stability-selected proteins as predictors (proteomic model); (c) using both demographic covariates and the stability-selected proteins as predictors (proteomic + basic model); and (d) using demographics, stability-selected proteins and APOE-ε4 haplotype status as predictors (APOE + proteomic + basic model). For the models that included proteins as predictors (b, c and d), stepwise regression models were fit, starting with all the stability-selected proteins and removing predictors until no further improvement in the Akaike Information Criterion (AIC) was observed.
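The backward elimination step can be sketched generically. The sketch assumes a greedy rule (at each step, drop the single predictor whose removal gives the lowest AIC, and stop when no removal lowers it), which is one common form of stepwise selection and may differ in detail from the procedure used in the study. `aic_fn` is a hypothetical callable that would fit a logistic regression on the given predictors and return AIC = 2k − 2 ln L; here it is replaced by a toy function with invented values.

```python
def backward_stepwise(predictors, aic_fn):
    """Greedy backward elimination guided by AIC.

    `aic_fn(subset)` is a hypothetical callable returning the AIC of a
    model fit on `subset` (a list of predictor names).
    Returns the retained predictors and the AIC of the final model.
    """
    current = list(predictors)
    best_aic = aic_fn(current)
    while current:
        # AIC of every model obtained by dropping exactly one predictor.
        candidates = [
            (aic_fn([p for p in current if p != drop]), drop) for drop in current
        ]
        cand_aic, drop = min(candidates)
        if cand_aic >= best_aic:
            break  # no single removal improves the AIC
        current.remove(drop)
        best_aic = cand_aic
    return current, best_aic


# Toy AIC: each predictor lowers the deviance by an invented amount and
# costs 2 units (one parameter). Names and values are placeholders.
deviance_gain = {"protein_1": 10.0, "protein_2": 0.5, "protein_3": 6.0}


def toy_aic(subset):
    return 100.0 - sum(deviance_gain[p] for p in subset) + 2 * len(subset)


kept, final_aic = backward_stepwise(list(deviance_gain), toy_aic)
```

In the toy example only the predictor whose deviance gain is smaller than its 2-unit penalty is removed, which is the trade-off AIC encodes.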
The performance of the models was evaluated in the held-out test set. Receiver operating characteristic (ROC) curves with the Area Under the ROC Curve (AUC), and Precision-Recall curves with the corresponding area under the curve (PR-AUC), were used to compare the predictive performance of the four models in the test set. Precision-Recall metrics were chosen because they are more sensitive to imbalance in binary outcomes86, which is present here. Two-sided DeLong tests were used to assess whether AUCs differed significantly between each pair of models87.
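As an illustration of the two summary metrics, the sketch below computes the ROC AUC through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control) and approximates the PR-AUC by average precision. Average precision is one common estimator of the area under the Precision-Recall curve and is an assumption here, since the estimator used in the study is not specified; the labels and scores are invented.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of case/control pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision at each true case.

    Samples are ranked by decreasing score; tied scores keep input order,
    which is a simplification.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented labels (1 = incident delirium) and predicted risks.
labels = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, risks)
ap = average_precision(labels, risks)
```

For context, a non-informative classifier has an expected ROC AUC of 0.5 regardless of prevalence, whereas its expected average precision is close to the case prevalence, which in the test set described above is 105 / (105 + 6,378), roughly 1.6%. This is why Precision-Recall metrics are the more demanding yardstick for a rare outcome.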
Data availability
Details for accessing individual-level data can be found here:
for UK Biobank https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access
for All of Us Research Programme https://www.researchallofus.org/register
Details on obtaining delirium GWAS summary statistics used in this work can be found here:
for FinnGen R10 release https://www.finngen.fi/en/access_results
for MGI freeze 3 https://precisionhealth.umich.edu/our-research/michigangenomics
Summary statistics generated in this work will be made publicly available upon publication.
Code availability
All software used in the present study is publicly available. The code used for all analyses in the study will be made available via GitHub upon publication.
Acknowledgments
This work used the Edinburgh Compute and Data Facility (ECDF) (http://www.ecdf.ed.ac.uk/).
This research has been conducted using the UK Biobank Resource project 788. This work uses data provided by patients and collected by the NHS as part of their care and support.
We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Programme for making available the participant data examined in this study.
We want to acknowledge the participants and investigators of the FinnGen study.
The authors acknowledge the Michigan Genomics Initiative participants, Precision Health at the University of Michigan, the University of Michigan Medical School Central Biorepository, and the University of Michigan Advanced Genomics Core for providing data and specimen storage, management, processing, and distribution services, and the Center for Statistical Genetics in the Department of Biostatistics at the School of Public Health for genotype data curation, imputation, and management in support of the research reported in this publication.
This research was funded by the Legal & General Group (research grant to establish the independent Advanced Care Research Centre at the University of Edinburgh). The funder had no role in the conduct of the study, the interpretation of the results or the decision to submit for publication. The views expressed are those of the authors and not necessarily those of Legal & General.
For the purpose of open access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript version arising from this submission.