Structured Abstract
Importance Psychiatric disorders display high levels of comorbidity and genetic overlap, necessitating multivariate approaches for parsing convergent and divergent psychiatric risk pathways. Identifying gene expression patterns underlying cross-disorder risk also stands to propel drug discovery and repurposing in the face of rising levels of polypharmacy.
Objective To identify gene expression patterns underlying genetic convergence and divergence across psychiatric disorders along with existing pharmacological interventions that target these genes.
Design This genomic study applied a multivariate transcriptomic method, Transcriptome-wide Structural Equation Modeling (T-SEM), to investigate gene expression patterns associated with four genomic factors indexing shared risk across 11 major psychiatric disorders. Follow-up tests, including overlap with gene sets for other outcomes and phenome-wide association studies, were conducted to better characterize T-SEM results. Public databases describing drug-gene pairs were used to identify drugs that could be repurposed to target genes found to be associated with cross-disorder risk.
Main Outcomes and Measures Gene expression patterns associated with genomic factors or disorder-specific risk and existing drugs that target these genes.
Results In total, T-SEM identified 451 genes whose expression was associated with the genomic factors and 41 genes with disorder-specific effects. We find the most hits for a Thought Disorders factor defined by bipolar disorder and schizophrenia. We identify 39 existing pharmacological interventions that could be repurposed to target gene expression hits for this same factor.
Conclusions and Relevance The findings from this study shed light on patterns of gene expression associated with genetic overlap and uniqueness across psychiatric disorders. Future versions of the multivariate drug repurposing framework outlined here have the potential to identify novel pharmacological interventions for increasingly common, comorbid psychiatric presentations.
Psychiatric disorders are highly polygenic and genome-wide analyses have identified hundreds of loci associated with these disorders.1–3 However, translating these findings into biological and clinical insights has been difficult. Many causal genetic variants are likely to be expression quantitative trait loci (eQTL) that affect trait variation via modulation of gene expression.4–6 Transcriptome-wide association studies (TWAS) can be used to quantity the effect of gene expression itself, the more proximal predictor, on a particular trait. TWAS carries the additional advantage of distilling hundreds of associated genetic variants into the more biologically interpretable subunit of gene-level results. Despite its scientific value, a rate limiting challenge for transcriptomic analyses is the pragmatic difficulty of obtaining gene expression data in relevant tissues, and most such studies of psychiatric disorders have relied on limited resources of postmortem brain tissue samples. Summary-based TWAS circumvents this issue by imputing gene expression for traits for which only genome-wide association study (GWAS) summary statistics are available. This approach has yielded insight into gene expression patterns associated with ADHD,7 major depression,8 treatment resistant depression,9 and autism spectrum disorder.10 This is with the interpretive caveat that prior findings do not account for pervasive genetic overlap across disorders.
Many gene expression patterns associated with psychiatric risk are unlikely to show diagnostic specificity given genetic correlations that indicate highly overlapping genetic signal across the psychiatric phenome.11–13 Transcriptome-wide Structural Equation Modeling (T-SEM)14 is a recently introduced extension of Genomic SEM15 for modeling tissue-specific gene expression within a multivariate network of genetically overlapping traits. We apply T-SEM here to investigate the effect of gene expression on four psychiatric genomic factors defined by the shared genetic signal across 11 childhood- and adult-onset psychiatric disorders. We also estimate the T-SEM heterogeneity metric (QGene), an index of genes with disorder-specific effects that allows for unpacking genetic sources of divergence for correlated traits.
Transcriptomic results can also be used to inform drug repurposing when combined with open-source databases that describe existing drugs that target specific genes. Drug repurposing can circumvent an otherwise costly and time-consuming drug development pipeline16 and may propel novel avenues for pharmacotherapy. A transcriptomics informed drug discovery approach has already been applied to identify potential pharmacological interventions for individual psychiatric disorders.9,17 Given psychiatric comorbidity that is the norm rather than the exception,18,19 it is unsurprising that polypharmacy is equally common, with most help-seeking children20 and adults21 prescribed ≥ 2 psychiatric medications from a single community visit. As T-SEM identifies gene expression patterns that transcend diagnostic boundaries, these may function as efficacious pharmacological targets that aid in reducing polypharmacy for comorbid or mixed clinical presentations.
Method
Univariate TWAS in FUSION
Transcriptomic imputation was performed using the FUSION software6 applied to publicly available, European GWAS summary statistics for 11 major psychiatric disorders (Table 1). Imputation was performed using 16 sets of functional weights including the 13 brain-based tissue types from the Genotype-Tissue Expression project (GTEx v8),22 the two dorsolateral prefrontal cortex (dlPFC) datasets from the CommonMind Consortium (CMC),23 and the prefrontal cortex data from PsychENCODE.24 Reference alleles were aligned across univariate GWAS summary statistics and, when this information was available, restricted to SNPs with a minor allele frequency minor allele frequency (MAF) > 1% and an imputation (INFO) score > 0.6. We restricted results to genes with an imputation accuracy (R2) > 0.7 and only performed imputation for genes with a maximum of 50% of the SNPs with functional weights missing. There were 77,943 tissue-specific gene expression estimates across all 11 disorders.
LD-score Regression and Psychiatric Factor Model
GWAS summary statistics were processed with the munge function prior to running multivariable LD-score Regression (LDSC)25 implemented within Genomic SEM.15 Multivariable LDSC produces the genetic covariance matrix and sampling covariance matrix that index genetic overlap and sampling dependencies across estimates, respectively, and are used as input to T-SEM. We used the LD score for the European subsample of 1000 Genomes, excluding the major histocompatibility complex (MHC) region, for LDSC estimation. All heritability estimates were converted to the liability scale by inputting the population prevalence and the sum of the effective sample size.26 To produce a set of findings comparable to previous work, we used the same model of these 11 disorders from Grotzinger et al. 2022,13 that provided good fit to the current data (CFI = .976, SRMR = .097; Figure 1). This model consists of four correlated genomic factors reflecting subsets of (i) Compulsive, (ii) Thought, (iii) Neurodevelopmental, and (iv) Internalizing disorders.
Transcriptome-wide Structural Equation Modeling
Transcriptome-wide Structural Equation Modeling (T-SEM) is a two-stage approach for examining the effect of tissue-specific gene expression within a multivariate model of genetically overlapping traits. In Stage 1, FUSION and LDSC are applied and the tissue-specific estimates for each gene produced by TWAS are used to expand both the genetic covariance and sampling covariance estimates. The read_fusion function was used to scale the univariate TWAS output to be partially standardized gene-phenotype covariances. Genomic inflation was corrected for by multiplying TWAS standard errors by the univariate LDSC intercept when this intercept was > 1. In Stage 2, the effect of tissue-specific gene expression on the four psychiatric factors was estimated. We also produce gene- and factor-specific metrics of heterogeneity via the QGene statistic, a metric that identifies genes likely to have disorder specific effects.14 QGene is estimated as the model χ2 difference test comparing a (i) a common pathways model in which the gene solely predicts the psychiatric factor to an (ii) an independent pathways model in which the gene directly predicts the psychiatric disorders that define the factor.
To better characterize transcriptomic results, we ran five follow-up tests (described in detail in Online Supplement). First, Bayesian colocalization analyses were utilized to determine the probability of shared (i.e., colocalized) causal variants across gene expression and the psychiatric factors. Second, the variance explained in nearby multivariate GWAS estimates by gene expression results was calculated. Third, joint analyses were conducted to estimate the conditional significance of physically proximal genes and genes with expression weights across multiple tissues. Conditional findings should be interpreted with caution as genes with nonsignificant conditional associations may simply have higher correlations with other genes. Fourth, Over Representation Analyses (ORA) were conducted using WebGestalt27 to identify overlap between factor hits and gene sets for external traits. The number of gene expression hits was either too small for these analyses or did not produce significant findings for any outcome except the Thought disorders factor. Fifth, we performed phenome-wide association studies (PheWAS) within the Vanderbilt BioVU European (n=70,439) and African Ancestry (n=15,174) participants to assess the clinical phenotypes associated with gene expression hits for the psychiatric genomic factors. PheWAS analyses used a strict Bonferroni correction for the number of tested associations across factors, genes, and phecodes (p < 1.81E-7). We did not detect any significant associations for individuals of African ancestry, which is likely due to reduced power. Nominally significant associations with psychiatric phenotypes are considered in the Online Supplement.
Multivariate GWAS of Psychiatric Factors
Multivariate GWAS summary statistics were estimated for the psychiatric factors to perform the follow-up colocalization and conditional analyses described above. This was achieved by estimating the SNP effects on the four correlated factors using the userGWAS function within Genomic SEM. Prior to running userGWAS, the 11 GWAS summary statistics for the disorders were run through the sumstats function to align results to the same reference allele, partially standardize estimates with respect to the trait variance, and restrict to MAF > 0.5% and INFO > 0.9 when this information was available. This resulted in a final set of 4,863,931 genetic variants.
Drug Repurposing
We utilize the Drug-Gene Interaction Database (DGIdb v.4.2.0) and the Broad Institute Connectivity Map (C-MAP) Drug Repurposing Database28 to identify existing pharmacological interventions that target T-SEM gene expression hits. These databases include information on the mechanisms of action (MOA) for each drug (e.g., antagonist). This allowed for matching drugs to genes that are likely to have therapeutic, as opposed to adverse, effects based on whether upward or downward expression was associated with psychiatric risk. Oncology, vaccine, and antibody drugs were removed as these forms of drug administration are less likely to be relevant to psychiatric treatment. In line with prior quality control of the DGIdb resource, only drug-gene pairs with interaction scores > 0.5 were retained.29 The interaction score reflects the product of supporting publications and the relative drug- and gene-specificity, such that higher values indicate greater support of a drug-gene interaction.30 This resulted in a final list of 3,678 and 5,638 drug-gene pairs from DGIdb and C-MAP, respectively. Drug-gene pairs were cross-referenced with the factor and QGene hits, though we find no significant results for QGene. In line with prior drug repurposing pipelines,31 we remove genes whose predicted expression was directionally discordant across tissue types for a given factor as the desired MOA is unclear.
Results
Hits on the factors reflect genes that were significant at p < 1.60E-7, reflecting the Bonferroni corrected threshold for the number of tested genes and factors (.05/[77,943Genes × 4Factors]), and were not QGene significant using the same threshold. As many genes are present across multiple tissues, we focus on results for unique gene IDs.
Compulsive Disorders Factor
Results revealed 2 hits for the Compulsive disorders factor, both of which were hits for anorexia nervosa (AN): C3orf62 and DALRD3 (Supplementary Table 1; Figure 2 for Miami plot). C3orf62 was also conditionally significant, explained 84.1% of the variance in nearby GWAS results, and was supported by a model of colocalized gene expression and GWAS associations (Supplementary Table 2). There were 2 significant QGene hits (SLC25A27; PPAPDC1A) with directionally opposing effects across AN, obsessive compulsive disorder, and Tourette’s syndrome (Supplementary Table 3; Supplementary Figure 1). No significant PheWAS associations or drug-gene pairs were identified for this factor.
Thought Disorders Factor
We identify 394 gene expression hits for the Thought disorders factor defined by bipolar disorder (BIP) and schizophrenia (SCZ; Supplementary Table 4). Highlighting the ability of T-SEM to leverage shared power across genetically overlapping traits, this included 79 novel genes that were not significant for any of the univariate disorder TWAS for any reference tissue. Most hits (63.1%) were supported by a model of colocalized gene expression and GWAS (mean poster probability across all hits = .528), while a smaller percentage (27.3%) indicated independent associations (mean posterior probability = .275). There were 198 conditionally significant factor hits that explained an average of 67.6% of the variance in nearby GWAS estimates (Supplementary Table 5). There were 31 significant QGene hits (Supplementary Table 6), 24 of which were significant hits for SCZ.
WebGestalt
ORA analyses identified significant overlap across Thought Disorders factor hits and eight gene sets, including gene sets for BIP and SCZ (Supplementary Table 7). The most significant gene set was for Alexander Disease, a rare and fatal neurological disorder that causes enlargement of astrocytes.32 In addition, there were several autoimmune related gene sets, including: autoimmune thyroid disease, systemic lupus erythematosus, and both immunoglobin A (IgA) deficiency and dysgammaglobulinaemia, which are themselves linked to increased risk for autoimmune diseases.33 PheWAS results in BioVU participants of European ancestry revealed 31 significant gene expression-phenotype associations, including 7 genes within the human leukocyte antigen (HLA) region linked to autoimmune diseases such as type I diabetes, multiple sclerosis, and celiac disease (Supplementary Table 8; Supplementary Figure 2). PheWAS results also identified several significant associations between GAS8 and CPNE7 expression and several skin cancer phenotypes.
We removed 36 unique genes with a mixture of positive and negative associations across tissue types for the factor hits. This yielded a final list of 358 unique genes associated with the Thought disorders factor for drug discovery follow-up. Drug repurposing results revealed a total of 39 drug-gene pairs across the DGIdb and C-MAP databases (Table 2). These pharmacological interventions may target the shared risk pathways across BIP and SCZ.
Neurodevelopmental Disorders Factor
Results revealed 1 hit for the Neurodevelopmental disorders factor for the KDM4A gene that explained 94.4% of the variance in nearby GWAS estimates (Supplementary Tables 9-10). There were 4 hits for QGene, 3 of which showed moderate, directionally opposing effects across traits (Supplementary Table 11). No significant PheWAS associations or drug-gene pairs were identified for this factor.
Internalizing Disorders Factor
There were 54 hits for the Internalizing disorders factor, 51 of which overlapped with univariate hits for major depressive disorder (MDD; Supplementary Table 12). Most hits (56.1%) were supported by a model of colocalized gene expression and GWAS (mean posterior probability = .543), while ∼30% were supported by a model of independent expression and GWAS associations (mean posterior probability = .305). There were 25 conditionally significant genes that explained an average of 83.4% of the variance in nearby GWAS estimates (Supplementary Table 13). There were 4 significant hits for QGene (Supplementary Table 14). PheWAS results revealed associations with 14 autoimmune phenotypes driven by chromosome 6 genes (Supplementary Table 15). No significant drug-gene pairs were identified.
Discussion
The current findings reflect a comprehensive transcriptomic examination of 11 major psychiatric disorders and the first strictly multivariate analyses of gene expression patterns associated with psychiatric risk. Applying T-SEM, we identified a cumulative set of 451 genes associated with the genomic factors. Results for the Compulsive and Neurodevelopmental factors were limited, likely reflecting the lower power for the traits defining these factors. Hits for the Internalizing factor included RPL31P12, a gene previously shown to be potentially causally associated with MDD34 and a top hit for a cross-trait analysis of MDD and insomnia.35 Another Internalizing hit, RAB27B, was recently associated with MDD, risk for Alzheimer’s disease, and a range of associated outcomes, including increased tangles, β-amyloid and cognitive decline.36
Among the top hits for the Thought disorders factor were NEK4 in the prefrontal cortex, which has been previously linked to BIP and SCZ,37 and ZSCAN2 expression in the prefrontal cortex, which has been associated with SCZ.38 Our multivariate findings indicate these prior univariate results likely index gene expression patterns underlying the shared risk pathways across BIP and SCZ. We also observed 24 QGene hits that were specific to SCZ, including the CRHR-IT1 that has been linked to loneliness,39 and VPS29, a gene uniquely associated with risk for SCZ in prior conditional analyses.40 In addition, the TBC1D5 SCZ and QGene hit has been found to interact with VPS29 to regulate the retromer complex,41 a highly conserved subunit associated with early onset Alzheimer’s disease.42 These SCZ-specific genes may reflect central components of the divergent risk pathways across SCZ and BIP.
We identified 39 drug-gene pairs for the Thought disorders factor. Lending support for the proposed link between immune dysfunction and psychiatric risk,43 several identified drugs were originally developed for autoimmune disorders. For example, Teneliximab was designed as a treatment for autoimmune diseases as a drug that targets the immune system involved CD40 gene.44 In addition, Larazotide Acetate is prescribed for the autoimmune disorder celiac disease.45 This subset of drugs is consistent with the ORA and PheWAS follow-up analyses that revealed overlap across Thought Disorders factor hits and autoimmune outcomes. These outcomes included autoimmune thyroid disease, which has itself been linked to increased risk for schizophrenia46 and psychosis.47
Antihypertensive drugs (e.g., Amlodipine, Nitrendipine, Nimodipine) that target the CACNA1D Thought disorders hit were also identified. The CACNA1D gene codes for the Cav1.3 α1 subunit, which regulates L-type calcium channels (LTCC) that are increasingly associated with psychiatric risk48 and considered viable therapeutic targets.49 Consistent with our results that CACNA1D is specifically associated with shared genetic risk pathways across BIP and SCZ, LTCC antagonist prescriptions were recently associated with reduced psychiatric hospitalizations for both disorders.50 Nimodipine has also been linked to improved cortical efficiency during working memory tasks, indicating this may have therapeutic value for the treatment of negative symptoms, such as cognitive dysfunction.51
Finally, we highlight that Riluzole, an antiglutamatergic agent targeting the KCNN3 hit, was found to be effective in treating negative symptoms in a randomized control trial for schizophrenia.52 The therapeutic potential of a pharmacological intervention targeting the glutamatergic system is further supported by increasing evidence linking this system to psychiatric risk.53 Prior findings that indicate the potential of Nimodipine and Riluzole to alleviate negative symptoms is exciting as these are typically untreated using current standard approaches.54 Within BIP, these two drugs may be especially useful for treating the cognitive deficits more often observed in individuals who experience mania with psychotic features, as compared to those with nonpsychotic mania.55
Limitations and Future Work
Several limitations should be noted. Most importantly, we highlight that results were performed strictly in European samples due to the limited availability of data for other ancestries. It is our hope that in the next few years the expansion of GWAS datasets for different ancestries will reach a sufficient size to extend these findings more representatively. As this work was conducted using common variants with an MAF > 1%, it will also be important to extend this framework and analyses to utilize whole-exome and -genome sequencing datasets. We note also that an analytic pipeline that utilizes a GWAS of the disorders does not account for individual differences in treatment response itself. Future work might then seek to identify the gene expression patterns for individuals with a treatment-resistant form of the disorder. Examining gene expression patterns associated with treatment nonresponse could be used to identify interventions for the subset of individuals most likely to require novel pharmacological approaches.
Conclusion
We apply T-SEM to 11 major psychiatric disorders to identify patterns of gene expression underlying both genetic convergence and divergence. As GWAS, gene expression, and drug-gene datasets continue to rapidly expand, the multivariate analyses employed here have the potential to identify novel drug targets with relevance for comorbid clinical presentations. Highlighting the promise of T-SEM informed drug repurposing, our analyses identify 39 drugs that target genes associated with the shared risk pathways across BIP and SCZ. Lending support to these findings, a subset of identified drugs have prior evidence from clinical trials indicating their therapeutic potential for one or both disorders. With recent advances in psychiatric genetics, we are finally beginning to deliver on one of the initial promises of psychiatric GWAS to propel drug discovery.
Data Availability
The data that support the findings of this study are all publicly available or can be requested for access. Specific download links for various datasets are directly below. Summary statistics for data from the Psychiatric Genomics Consortium (PGC) can be downloaded or requested here: https://www.med.unc.edu/pgc/download-results/ Summary statistics for the Anxiety phenotype can be downloaded here: https://drive.google.com/drive/folders/1fguHvz7l2G45sbMI9h_veQun4aXNTy1v Links to the LD-scores, reference panel data, and the code used to produce the current results can all be found at: https://github.com/GenomicSEM/GenomicSEM/wiki Links to the functional reference weights from GTEx version 8 and CMC can be found here: http://gusevlab.org/projects/fusion/ Links to the functional reference weights from PsychENCODE can be found here: http://resource.psychencode.org/ The DGIdb database can be found here (results were downloaded on March 4th, 2022) https://www.dgidb.org/ The C-MAP Repurposing database can be found her (results were downloaded on March 4th, 2022): https://clue.io/repurposing-app
https://www.med.unc.edu/pgc/download-results/
https://drive.google.com/drive/folders/1fguHvz7l2G45sbMI9h_veQun4aXNTy1v
https://github.com/GenomicSEM/GenomicSEM/wiki
http://gusevlab.org/projects/fusion/
Code Availability
GenomicSEM software (including the T-SEM extension), is an R package that is available from GitHub at the following URL: https://github.com/GenomicSEM/GenomicSEM
Directions for installing the GenomicSEM R package can be found at: https://github.com/GenomicSEM/GenomicSEM/wiki
The FUSION software used for univariate TWAS analyses is available here: http://gusevlab.org/projects/fusion/
The WebGestalt software documentation can be found here and the R package directly installed from the R console: http://www.webgestalt.org/
Data Availability
The data that support the findings of this study are all publicly available or can be requested for access. Specific download links for various datasets are directly below.
Summary statistics for data from the Psychiatric Genomics Consortium (PGC) can be downloaded or requested here:
https://www.med.unc.edu/pgc/download-results/
Summary statistics for the Anxiety phenotype can be downloaded here: https://drive.google.com/drive/folders/1fguHvz7l2G45sbMI9h_veQun4aXNTy1v
Links to the LD-scores, reference panel data, and the code used to produce the current results can all be found at: https://github.com/GenomicSEM/GenomicSEM/wiki
Links to the functional reference weights from GTEx version 8 and CMC can be found here: http://gusevlab.org/projects/fusion/
Links to the functional reference weights from PsychENCODE can be found here: http://resource.psychencode.org/
The DGIdb database can be found here (results were downloaded on March 4th, 2022) https://www.dgidb.org/
The C-MAP Repurposing database can be found her (results were downloaded on March 4th, 2022): https://clue.io/repurposing-app
Declaration of Interests
J.W.S. is a member of the Leon Levy Foundation Neuroscience Advisory Board, the Scientific Advisory Board of Sensorium Therapeutics (with equity), and PI of a collaborative study of the genetics of depression and bipolar disorder sponsored by 23andMe for which 23andMe provides analysis time as in-kind support but no payments.
Acknowledgements
The work presented here was supported by a gift from the Tommy Fuss Fund. ADG was supported by NIH Grants R01MH120219 and RF1AG073593. KS is supported by AHA827137. The dataset(s) used for the BioVU analyses described were obtained from Vanderbilt University Medical Center’s BioVU which is supported by numerous sources: institutional funding, private agencies, and federal grants. These include the NIH funded Shared Instrumentation Grant S10RR025141; and CTSA grants UL1TR002243, UL1TR000445, and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962, R01HD074711; and additional funding sources listed at https://victr.vumc.org/biovu-funding/. We add that the current analyses would not have been possible without the enormous efforts put forth by the investigators and participants from ENIGMA, the Psychiatric Genetics Consortium, iPSYCH, and UK Biobank.