Subset-based method for cross-tissue transcriptome-wide association studies improves power and interpretability

Xinyu Guo; Nilanjan Chatterjee; Diptavo Dutta

doi:10.1101/2023.01.11.23284454

Abstract

Integrating results from genome-wide association studies (GWAS) and studies of molecular phenotypes like gene expressions, can improve our understanding of the biological functions of trait-associated variants, and can help prioritize candidate genes for downstream analysis. Using reference expression quantitative trait loci (eQTL) studies, several methods have been proposed to identify significant gene-trait associations, primarily based on gene expression imputation. Further, to increase the statistical power by leveraging substantial eQTL sharing across tissues, meta-analysis methods aggregating such gene-based test results across multiple tissues or contexts have been developed as well. However, most existing meta-analysis methods have limited power to identify associations when the gene has weaker associations in only a few tissues and cannot identify the subset of tissues in which the gene is “activated” in. For this, we developed a novel cross-tissue subset-based meta-analysis (CSTWAS) method which improves power under such scenarios and can extract the set of potentially “active” tissues. To improve applicability, CSTWAS uses only GWAS summary statistics and pre-computed correlation matrices to identify a subset of tissues that have the maximal evidence of gene-trait association. We further developed an adaptive monte-carlo procedure with the generalized Pareto distribution (GPD) to accurately estimate highly significant p-values for the test statistics. Through numerical simulations, we found that CSTWAS can maintain a well-calibrated type-I error rate, improves power especially when there is a small number of “active” tissues for a gene-trait association and identifies an accurate “active” tissue-set. By analyzing several GWAS summary statistics of three complex traits and diseases, we demonstrated that CSTWAS could identify novel biological meaningful signals while providing an interpretation of disease etiology by extracting a set of potentially “active” tissues.

Introduction

Genome-wide association studies (GWAS) have been immensely successful in identifying hundreds and thousands of genetic variants associated with a spectrum of diseases and traits^1,2. Despite this success, the majority of genetic variants identified by GWAS are in the noncoding region^3–5; hence, their functions are often challenging to interpret. Several studies have shown that the variants identified through GWAS are enriched in gene regulatory elements^6,7. Further, these variants have substantial overlap with those associated with different molecular phenotypes, including gene expression levels⁸, protein levels⁹, metabolites¹⁰, and others¹¹, indicating that they might have a regulatory role in the disease etiology. Thus, large-scale studies integrating such molecular data with genetic studies to identify potentially associated genes and intermediate molecular phenotypes have become increasingly important.

Transcriptomic studies that identify expression quantitative trait loci (eQTLs) meaning, variants associated with the expression levels of a gene in a particular tissue, have become one of the standard approaches to understand the regulatory role of variants. By integrating eQTL information of variants with the results of GWAS, a number of statistical tools have been developed to identify potential target genes through which genetic associations may be mediated^12–15. Given the lack of availability of data on outcome, genetic variants, and gene expression simultaneously on the same individuals, the major body of methodological research in this domain has adopted a two-step approach. First, a genetically regulated expression (GReX) score is constructed for a given gene, based on the local genetic variants (usually within 500kb ∼ 1Mb) using a reference gene expression study. Pre-computed models for constructing GReX using reference transcriptomic studies like GTEx¹⁶, TCGA¹⁷, and others^18–20 are publicly available. Subsequently, we test the association of this constructed GReX with the outcome, for significance. If significant, we can infer that the expression levels of the gene have a significant effect on the outcome, and this test can be conducted for each gene across the transcriptome. In the post-GWAS era, such transcriptome-wide association studies (TWAS) have emerged as an effective downstream GWAS analysis²¹ to identify trait-associated candidate genes. Many methods have been developed for conducting such TWAS using individual-level as well as summary-level data, like PrediXcan (for individual-level GWAS data)¹², S-PrediXcan¹³, and FUSION (for summary-level GWAS data)¹⁴.

The overall power of such expression-imputation methods depends on building an accurate predictive GReX from the reference data²². However, most gene expression studies till date have relatively small sample sizes^16,17. Further, the usual approach of TWAS is to perform an agnostic scan across all or multiple available tissues to identify tissue-gene pairs significantly associated with the outcome, which can increase the multiple testing burden. One potential way to improve the power of such tests is to aggregate or meta-analyze results across multiple tissues or contexts. Although gene expression depends on various factors, like tissue type, cell type, developmental stages, and others^23,24, there are substantial overlap between eQTLs across multiple related tissues¹⁶, which can be effectively exploited to share information through meta-analysis. Therefore, to increase the power of TWAS results, existing approaches either leverage information across multiple tissues or meta-analyze results from multiple tissues, such as UTMOST²⁵, MultiXcan¹⁵ and FUSION-OMNIBUS¹⁴.

However, even though these standard meta-analysis methods have proved to be effective in increasing the power to identify gene-trait associations, there are some important limitations. Firstly, these methods tend to demonstrate lower power in situations where the causal gene is activated in only a small number of tissues. Furthermore, it remains important to identify the tissues in which a trait-associated gene is “active”. Due to aggregation across multiple tissues, the current methods lack information on the set of potentially activated tissues for a gene-trait association, thus failing to provide effective interpretability.

Here, we developed a cross-tissue subset-based meta-analysis (CSTWAS) method that aims to increase power over existing methods while maintaining proper type-I error control and improves interpretation by extracting a subset of tissues in which the gene is potentially activated. Similar approaches have been utilized previously in the context of single variant meta-analysis²⁶, multiple phenotypes²⁷, gene-sets²⁸, and gene-environment interactions²⁹. CSTWAS, being a meta-analysis, can leverage association summary statistics across multiple tissues or contexts, and increase statistical power by aggregating weaker associations. Compared to standard meta-analysis approaches, the major advantage of CSTWAS is that it can identify a potential “active” subset of tissues or contexts, which helps in interpreting the association of the gene with the outcome. Further, through simulations, we show that CSTWAS can improve statistical power over standard approaches, especially when the number of active tissues for a gene is relatively low and maintains comparable power otherwise. CSTWAS only needs summary-level GWAS data and pre-computed covariance matrices (only need to be computed once for each transcriptomic reference panel) and hence can be performed with publicly available data. Through analysis of publicly available GWAS summary statistics for three distinct traits and diseases, we show that CSTWAS can identify important associations missed by standard methods as well as provide effective interpretation.

Methods

CSTWAS (Cross-tissue Subset-based Transcriptome Wide Association Study) is a subset-based meta-analysis of individual gene-based test results across different tissues resulting in a p-value corresponding to the null hypothesis that there is no association of the gene with the outcome across any of the tissues. Additionally, it outputs a set of potential active tissues for each gene with aggregated evidence of association, which can be interpreted as the set of active tissues. Starting from GWAS summary statistics, CSTWAS can be implemented in the following steps (Figure 1):

Perform gene-based tests with summary-level GWAS data across multiple tissues: Using GWAS summary statistics, perform standard gene-based expression imputation tests of associations (e.g., S-PrediXcan, FUSION, or others) for a given gene across different available tissues. In simulations and data analysis, we have used p-values from FUSION test with GTEx v7 reference panel.
Construct the test statistic for the gene by selecting a set of potential active tissues:
Next, we convert the p-values of the gene across different tissues to z-values, which quantifies the evidence of association on a gaussian scale. Let p₁, p₂, …, p_T be the FUSION p-values for specific gene expression across T tissues, we first convert these p-values to z-values z₁, z₂, …, z_T as: Where, Φ⁻¹ is the inverse standard normal cumulative distribution function. Suppose z_j is the converted z-value for a specific gene expression in tissue j, H is the entire set of tissues (such as all 48 tissues in GTEx v7), B is a non-empty subset of tissues in H, and |B| is the number of tissues in subset B, we construct a test statistic that maximizes the average association among all possible subsets of tissues. T : the number of tissues
H : the entire set of tissues
B : a subset of tissues
|B|: the number of tissues in subset B
Since there are 2^T − 1 non-empty possible subset B, calculation of this test statistics is computationally expensive, especially with a large number of reference tissues. To select the subset of tissues (B) more efficiently, we simplified the enumeration process by:
- Sort all z-values in decreasing order:
- Calculate the test statistics, starting with the largest z-value:
Calculating p-values by a stepwise approach:
The tissue-specific association test statistics (z_i) are correlated due to the correlation among GReX of multiple related tissues. So, it is not straightforward to derive the analytical null distribution of the CSTWAS test statistic. Instead, we adopt an efficient simulation-based approach. We first estimated the null covariance for across T tissues:
- For a specific gene across all tissues, ran gene-based test (we used FUSION here) analysis with a simulated null phenotype and obtained the converted z-values z₁, z₂, …, z_T
- Repeated the previous step K (=1000) times, getting a K × N matrix for the gene.
- Calculated the covariance matrix across tissue-specific association statistics (z_i)

Figure 1: Overview of CSTWAS Methods.

1. Construct tissue-specific GReX levels from GWAS summary-level statistics with GTEx as the reference panel. Here, many tools can be utilized, such as S-Predixcan and FUSION. 2. Aggregate gene expression information across tissues in a matrix. Then, for each gene expression, run CSTWAS to select a set of active tissues, aggregating the evidence of the associations between GReX and complex trait. 3. Estimate p-values for each CSTWAS statistic with null simulation and GPD estimation.

Given the estimated null covariance for , we assume that under the null hypothesis that the gene has no association with the outcome for any tissues, the tissue-specific association test statistics matrix Z follows a multivariant normal distribution. Subsequently, we estimate the p-value through a Monte Carlo (or simulation) procedure:

Generated a null gene expression z-value vector across tissues from the multivariate normal distribution.
Calculated the test statistics T(H)_null from simulated z-values and sorted decreasingly.
Repeated previous two steps M times and estimated the p-value by

It is to be noted that the precision of the p-value estimation is constrained by the number of monte-carlo iterations (M in this case). However, if there is strong evidence to support the identified set of potential active tissues, the p-value would be very small. Hence accurately estimating the low p-value by simulation would be computationally expensive. Therefore, we utilized the generalized Pareto distribution (GPD)-based tail estimation method³⁰ to estimate p-values less than 10⁻⁶. Overall, we fit the GPD to the right tail of the simulated test statistics and estimate the p-value by inverting the distribution function of the GPD. More specifically, we first model the right tail of test statistics by GPD. The GPD has the density function and cumulative distribution function Where, a is the scale parameter and k is the shape parameter. Also, 0 ≤ z < ∞ for k ≤ 0 and . We use the MLE approach³¹ to estimate the GPD distribution with the right tail of test statistics. The scale parameter a can be estimated by and the shape parameter k can be estimated by Here, and s² are the mean and the standard deviation of simulated test statistics. With the estimated parameters for GPD, we can construct the p-value as Here, N_exc is the exceedances threshold (usually use 250), N_sim is the total number of the simulated test statistics, F (.) is the GPD cumulative distribution, T(H) is the test statistics, and t is the threshold value of simulated statistics. In this case, controls that the estimation only focuses on the right tail of simulated test statistics.

Results

Simulations

We used numerical simulations to evaluate the type-I error rate and power of CSTWAS in comparison to standard methods of analysis. To reflect realistic LD patterns, we used genotypes for N unrelated individuals in the cis-region of a given gene G from UK Biobank. Using this, first, we estimate the GReX for G across T different tissues. This can be done using the publicly available reference models precomputed from transcriptomic studies. In particular, we use the reference FUSION models in GTEx v7 for this. Further, we normalize the GReX such that for a given tissue, the GReX across N individuals have mean = 0 and variance = 1. We denote the normalized estimated GReX for individual i at the t^th tissues as r_i(t), i = 1, 2, …, N and t = 1, 2, …, T. We next simulated a continuous phenotype for the i^th individual under a normal error model as: Where β_t denotes the effect of the GReX for the t^th tissue and ϵ_i is the normal error term. Subsequently, we conducted the GWAS using the individual-level genotype and simulated phenotype data and further conducted the FUSION test using the GWAS summary statistics to extract the gene-level p-values across T tissues for which pre-computed reference models of gene G were available in GTEx v7. Using these outputs from this gene-level test, we perform the cross-tissue subset-based TWAS. Across all the simulation scenarios, we set the sample size N to be unrelated White British individuals. We performed the simulations for two different genes: APOE (T = 24) and ABO (T = 45).

Type-I error

To evaluate the type-I error rate, we set each β_t = 0. Hence the phenotype y_i is generated independently of the genotypes from a univariate standard normal distribution. For both the genes, the results (Table 1) show that the type-I error rate for the cross-tissue subset-based test is relatively well calibrated across different levels (1 × 10⁻⁴, 1 × 10⁻⁵) and at the exome-wide threshold (2.5 × 10⁻⁶). This indicates that the chances of false discovery through this method are controlled at the desired level.

View this table:

Table 1: Type-1 error rates for the CSTWAS for genes APOE and ABO.

Empirical type-1 error rates across different levels, expressed as the proportion of corresponding levels.

Power

We compared the power of the CSTWAS test to several standard approaches: (a) Bonferroni corrected minimum p-value across all available tissues for a gene, which we call minP, (b) FUSION-Omnibus, which performs a multiple degree-of-freedom fixed effects meta-analysis of FUSION results across tissues for the gene and (c) UTMOST, which performs cross-tissue association test using a generalized Berk-Jones statistic³² and performs well in moderately sparse scenarios. As mentioned above, we focus on two different genes: APOE and ABO, which are significantly expressed in T = 24 and T = 45 tissues respectively. To evaluate the power of CSTWAS in comparison to the other methods, we randomly designated T_a tissues within the T tissues to be active, meaning the tissues for which the gene has non-zero effects on the phenotype. For these tissues, we generated the corresponding β_t from a normal distribution such that β_t ∼ N (0, v), where v is a variance parameter that was selected such that, the total variance explained by the gene in active tissues (h²) remained at a desired level (usually between 1-5%; Figure 2). We simulated a continuous phenotype using the above Gaussian error model, performed the FUSION across the T tissues and recorded the corresponding p-values. Subsequently, we applied CSTWAS, FUSION-Omnibus, and UTMOST to the results and calculated the empirical power as the proportion of simulation iterations that identified the gene to be significant at an exome-wide threshold of 2.5 × 10⁻⁶. To reflect a spectrum of possible association scenarios, we studied the empirical power across various values of and T_a and h².

Figure 2: Simulation Results.

Empirical power for CSTWAS. Simulation results with (a) APOE gene and (b) ABO gene. Estimated empirical power for CSTWAS compared to other methods for different values of h² and number of active tissues (T_a). See Simulation for more details.

The results (Figure 2) show that no test is uniformly most powerful across all the scenarios. We find that for both genes, CSTWAS is substantially more powerful under a sparse scenario, meaning a smaller number of active tissues (T_a). UTMOST has a slightly lower power compared to CSTWAS, in such scenarios, with the difference in power decreasing with increasing number of active tissues. With a larger number for T_a, FUSION-Omnibus and minP also, have a comparable or better performance than CSTWAS. This is potentially because, for smaller T_a with weaker signals, an increased number of null values for a larger number of tissues dilutes the signal and hence lowers the power. However, our method, being subset-based, can separate an optimal subset of tissues with potentially weaker signals. The pattern remains consistent across different h² values, although the difference in power at low T_a is reduced for higher h² which corresponds to the scenario where the gene has strong signal in each tissue.

Since, one principal motivation and advantage of CSTWAS is identifying potentially active tissues for a gene-trait association, we further inspected the sensitivity and specificity of the selected subset of tissues. Sensitivity is defined as the proportion of active tissues correctly identified by CSTWAS in the selected subset, and specificity is defined as the proportion of inactive genes correctly identified by CSTWAS. Because no other methods currently identify the active tissues, we compared the performance of CSTWAS to minP by defining the significant tissues (Bonferroni corrected p-value < 2.5×10⁻⁶) as active. We performed this numerical experiment with ABO gene only and set the total variance explained by the gene in active tissues (h²) to be 3% throughout. Across a spectrum different number of active tissues, CSTWAS maintained a higher sensitivity and specificity, with both the values decreasing, with increasing T_a as expected (Supplementary Figure 1). However, for low T_a, we found a substantial difference in specificity between the methods, indicating that the minP might select some correlated tissues as well, while CSTWAS accounting for the cross-tissue correlation pattern, selects the active set with high accuracy.

Simulation results highlight the utility of CSTWAS compared to the standard methods. Especially in a scenario, when the gene is activated weakly in only a few tissues, CSTWAS has substantially higher power to identify gene-trait associations which might not be adequately captured by FUSION-Omnibus or minP, while the performance of UTMOST is comparable. However, CSTWAS provides an additional biological interpretation by extracting the potential set of tissues where the gene is active and might be associated with the outcome.

Association analysis of three traits and diseases using CSTWAS

We applied CSTWAS to publicly available GWAS summary statistics for three different representative diseases and trait outcomes. For that, we constructed a null covariance matrix () for each gene using the FUSION reference models provided for GTEx v7. Further, we compared the results and genes identified by CSTWAS to those identified by FUSION in each tissue adjusting for multiple comparisons, which we call tissue-specific TWAS.

Bipolar Disorder (BD)

Bipolar disorder is a chronic psychiatric disorder that affects about 60 million people worldwide. Although primarily neurological, recently evidence has emerged that the genetic etiology of BD has complex involvement of multiple distinct tissues other than brain³³. To investigate that, we applied CSTWAS to the bipolar disorder GWAS summary statistics provided by the Psychiatric Genetics Consortium (PGC), containing 29,764 cases and 169,118 controls³⁴. Using CSTWAS, we identified 205 genes that are significantly associated with BD at an α level of 2.5 × 10⁻⁶ which is the exome-wide significance level using the Bonferroni correction for 20,000 genes (Figure 3). Among these, 55 associations are novel in that they are identified through CSTWAS alone and are not significant in the tissue-specific TWAS results. Meaning, these genes did not have a strong or significant association with BD in any of the individual tissues and can only identified by aggregating weaker associations through CSTWAS. Several of the novel genes identified have evidence in existing literature to be associated with BD and other neuropsychiatric processes (Table 2). For example, ZNF280D, a protein-coding gene, is closely related to its family member ZNF804A, which has been identified as an important factor in bipolar disorder development³⁵; SULT1A3, another protein-coding gene, has negative effects on sulfation and the impaired sulfation has been proved to have associations with many neurological diseases, such as bipolar disease³⁶. Since such gene-based results primarily identify genetic loci, where multiple highly correlated genes can reside, we additionally clumped (See Supplementary for details) the associations based on genomic positions to identify the unique loci or genomic regions identified by each test. CSTWAS identifies 82 unique and significant loci while 67 significant loci are identified by tissue-specific TWAS. Out of these, 57 were also identified by CSTWAS, indicating a high concordance between the methods along with several novel discoveries through CSTWAS (Supplementary Figure 2).

View this table:

Table 2: Significant genes associated with Bipolar disorder (BD) identified by CSTWAS.

Several example genes associated to BD, their CSTWAS p-value, minimum TWAS p-value across all available tissues in GTEx v7 and extracted set of potentially active tissues are shown. See Supplementary Table 1 for full results. Tissues related to central nervous system are highlighted in bold.

View this table:

Table 3: Significant genes associated with Breast Cancer (BC) identified by CSTWAS.

Several example genes associated to BC, their CSTWAS p-value, minimum TWAS p-value across all available tissues in GTEx v7 and extracted set of potentially active tissues are shown. See Supplementary Table 2 for full results. Tissues related to breast, female-specific tissues, and tissues which are known to be involved in BC are highlighted in bold.

View this table:

Table 4: Significant genes associated with Serum Urate Level (SUL).

Several example genes associated to SUL, their CSTWAS p-value, minimum TWAS p-value across all available tissues in GTEx v7 and extracted set of potentially active tissues are shown. See Supplementary Table 3 for full results. Liver and lower alimentary tissues are highlighted in bold.

Figure 3: Results for tissue-specific TWAS across multiple tissues and CSTWAS of bipolar disorder (BD).

(a) P-values for TWAS results in tissues reported in GTEx v7 using FUSION. Each dot represents the TWAS -log10(p-value) for a tissue-specific GReX association of bipolar disorder (BD) colored by tissues. Highly significant p-values (< 1 × 10⁻¹⁰: the ceiling cutoff) were shown by 1 × 10⁻¹⁰ (represented by triangles). P-values > 1 × 10⁻⁰³ are hidden for ease of viewing. (b) P-values for CSTWAS. Each dot represents the p-value for integrated GReX associations of BD across tissues colored by chromosomes. P-values < 1 × 10⁻¹⁰ were shown by 1 × 10⁻¹⁰. Five example genes, MEI4, HIST1H1D, ZNF280D, SULT1A3, and FAM65A, are annotated in both plots, which do not have significant associations (P-values > 2.5 × 10⁻⁶) in any of the tissues but are significant (P-values < 2.5 × 10⁻⁶) using CSTWAS.

In addition to increased power, CSTWAS can also provide important insights into the identified associations by extracting active tissues (Table 2 and Supplementary Table 1). For example, NTM, a neurotrimin protein-coding gene, is identified to be associated with BD (p-value = 2.72 × 10⁻¹⁰ ; See Link Availability). By considering each tissue independently, the tissue-specific TWAS only identified significant expression in Heart Left Ventricle (p-value = 5.96 × 10⁻¹¹) which might not be readily interpretable in the context of neurological processes. However, the subset-based approach successfully detected the weaker associations in Brain Anterior cingulate cortex BA24 (tissue-specific p-value = 4.45 × 10⁻⁴), Brain Spinal cord cervical c.1 (tissue-specific p-value = 5.63 × 10⁻³). and Nerve Tibial (tissue-specific p-value = 4.65 × 10⁻²) all of which are relevant in context of BD. These weaker associations may be lower power to be detected due to the smaller reference panel size of these tissues (N = 126 to 532). Thus, CSTWAS provides a novel interpretation of the association of NTM with BD by extracting the potential active tissue set, which includes several weaker signals not identified in tissue-specific analysis. It is to be noted that for several of the top significant genes, alongside CNS tissues, other tissues are selected. This can be caused due to many factors, including high eQTL sharing between tissues and pleiotropy.

Breast Cancer (BC)

According to CDC, breast cancer is the second most common cancer among women in the U.S. Evidence has been shown that some BC development factors might not only be restricted to breast tissue but can span multiple cell types and organs³⁷. To identify BC-related candidate genes and corresponding sets of active tissues, we applied CSTWAS to the breast cancer GWAS summary statistics, containing 17,881 cases and 410,350 controls³⁸. Using CSTWAS, we identified 37 genes that are significantly associated with BC at an exome-wide threshold of 2.5 × 10⁻⁶ (Figure 4). Among these associations, 11 novel associations are identified as significant in CSTWAS and are not significant in the tissue-specific TWAS results, meaning they did not have any strong association across the different tissues. For example, CSTWAS identifies FAT4 (p-value = 1.79 × 10⁻⁶), a protein-coding gene, which is identified to be a tumor suppressor in triple-negative breast cancer³⁹ and is repressed in several cancers due to promoter hypermethylation; FGF7 (p-value = 1.82 × 10⁻⁷), which increases breast cancer cell proliferation and migration in vitro, is also identified to be associated⁴⁰. There are 25 significant loci (clumped as before) identified by CSTWAS and 20 significant loci identified by tissue-specific TWAS. Among these identified significant loci, 14 are overlapped between the two approaches, indicating CSTWAS can recover most of result from tissue-specific TWAS with some novel findings (Supplementary Figure 3).

Figure 4: Results for tissue-specific TWAS across multiple tissues and CSTWAS of breast cancer (BC).

(a) P-values for TWAS results in multiple tissues. Each dot represents the TWAS p-value for a tissue-specific gene-GReX association of breast cancer (BC). Different tissues are plotted in different colors. Here, P-values < 1 × 10⁻¹⁰ (the ceiling cutoff) were shown by 1 × 10⁻¹⁰ (represented by triangles). And P-values > 1 × 10⁻³ are hidden for simplicity. (b) P-values for CSTWAS. Each dot represents the p-value for integrated GReX associations of breast cancer (BC) across tissues. P-values < 1 × 10⁻¹⁰ were shown by 1 × 10⁻¹⁰. Five genes, RP11-108M9.5, SCARNA6, FAT4, FGF7, and RP5-1039K5.16, are annotated in both plots, which do not have significant associations (P-values > 2.5 × 10⁻⁶) in any of the tissues but are significant (P-values < 2.5 × 10⁻⁶) using CSTWAS.

Moreover, several genes have weaker associations, which CSTWAS can successfully identify but are not identified through tissue-specific TWAS. For example, CSTWAS identified ORC2 (p-value = 6.41 × 10⁻⁷), a protein-coding gene, that has been shown to have abnormal behavior in many types of cancers, including breast cancer⁴¹. Tissue-specific TWAS only identified significant expression in Colon Sigmoid (p-value = 1.05 × 10⁻⁷). This result might not be readily interpretable for breast cancer. However, CSTWAS successfully detected the weaker signals—gene expression in Breast Mammary Tissue (tissue-specific p-value = 2.04 × 10⁻⁷), Ovary (tissue-specific p-value = 7.81 × 10⁻³) and Whole Blood (tissue-specific p-value = 3.55 × 10⁻²), which are all relevant tissues for BC. Thus CSTWAS, beyond improving power to identify associations, can additionally identify potential active tissue-set (p-value for overall association = 6.41 × 10⁻⁷), including the weaker signals which were not detected in the tissue-specific approach.

Serum Urate Level (SUL)

Human serum urate level is an indicator of uric acid production and excretion. A recent study has shown that SUL has a strong genetic component with heritability between 30% and 60% with the most relevant tissues for SUL being kidney and liver⁴². To identify potential genes and corresponding active tissues for SUL, we applied CSTWAS to the SUL GWAS summary statistics, containing 457,690 individuals⁴². In our analysis, we identified 243 genes that are significantly associated with SUL at an α level 2.5 × 10⁻⁶, which is the exome-wide significance level using the Bonferroni correction for 20,000 genes (Figure 5). Among these, 44 associations are novel that they are identified through CSTWAS alone and are not significant in the tissue-specific TWAS results. As before, we clumped the associations to identify the unique loci highlighted by each test. CSTWAS identifies 81 significant loci while 71 significant GReX associations identified by tissue-specific TWAS. Out of these, 65 were also identified by CSTWAS indicating high overlap with several novel discoveries (Supplementary Figure 4).

Figure 5: Results for tissue-specific TWAS across multiple tissues and CSTWAS of human serum urate levels (SUL).

(a) P-values for TWAS results in multiple tissues. Each dot represents the TWAS p-value for a tissue-specific GReX association of human serum urate level (SUL). Different tissues are plotted in different colors. Here, P-values < 1 × 10⁻¹⁰ (the ceiling cutoff) were shown by 1 × 10⁻¹⁰ (represented by triangles). And P-values > 1 × 10⁻³ are hidden for simplicity. (b) P-values for CSTWAS. Each dot represents the p-value for integrated GReX associations of human serum urate levels (SUL) across tissues. P-values < 1 × 10⁻¹⁰ were shown by 1×10⁻¹⁰. Five genes, PARL, CPSF7, RP11-794G24.1, CHD9, and ASGR1, are annotated in both plots, which do not have significant associations (P-values > 2.5 × 10⁻⁶) in any of the tissues but are significant (P-values < 2.5 × 10⁻⁶) using CSTWAS.

We further demonstrate that, CSTWAS can also provide important insights into the identified associations by extracting potentially active tissues. For example, LRRC16A, a protein-coding gene, has been shown to be associated with SUL⁴³. Tissue-specific TWAS only identified significant gene expression in Heart Atrial Appendage (p-value = 3.25 × 10⁻¹²), Heart Left Ventricle (p-value = 5.89 × 10⁻¹²), Muscle Skeletal (p-value = 2.93 × 10⁻⁹), Ovary (p-value = 2.26 × 10⁻⁴¹) which might be difficult to interpret since most of these tissues are not directly involved in the serum urate processes. However, CSTWAS, on the other hand, successfully detected additional weaker signals—LRRC16A gene expression in Colon Transverse (tissue-specific p-value = 5.25 × 10⁻⁷) and Thyroid (tissue-specific p-value = 5.98 × 10⁻⁴), which have been shown to be relevant to SUL processes. Thus, CSTWAS identified the potential active tissue set (p-value for overall association = 2.39 × 10⁻¹⁰), including the weaker signals which were not identified in the tissue-specific approach, possibly due to the smaller reference panel size of these tissues (N = 305 to 574) and provided an improved interpretation to the association of LRRC16A.

Discussions

Here we developed a subset-based cross-tissue meta-analysis method, CSTWAS, that integrates gene-based results across multiple tissues to identify associations of a gene with an outcome, and additionally identifies a potential set of tissues in which the gene is activated. Through simulations, we showed CSTWAS maintains desired type-I error rate and can outperform standard methods in several scenarios, especially where the gene has a weaker association in a small number of tissues. Also, through analysis of existing GWAS summary statistics of three traits and diseases, we demonstrated that CSTWAS could successfully detect weaker signals as well as provide interpretations for corresponding results.

The gain of statistical power over standard methods (like UTMOST, FUSION Omnibus or tissue-specific TWAS) is one of the key advantages of CSTWAS. The standard meta-analysis-based approaches fail to identify gene-trait association in scenarios, where the association signature is present in only a few tissues, since the majority of the current methods aim to aggregate results across all the tissues instead of choosing the “active” tissues. In the presence of few weaker associations, the signal is diluted by many null associations. In contrast, CSTWAS can extract a subset of tissues that are maximally associated with the outcome and hence improve statistical power by considering only potential associations. Further, the simulation results show that the CSTWAS can control type-I errors at exome-wide level as well. However, when the gene is activated in many tissues, the UTMOST or FUSION-OMNIBUS approach would provide higher or comparable power by selecting substantive associations.

One of the most prominent advantages of CSTWAS is the subset-based approach provides an interpretation of disease etiology through a potential subset of active tissues. The resulting subset of tissues can be interpreted as the tissues where the gene is actively mediating disease-related signals. While standard meta-analysis methods provide only a p-value for the association of the gene with the outcome, our method additionally identifies the potential active subsets of tissues and contexts as well. In addition, unlike the standard subset-based method searches across all possible subsets, which is exponentially scaled as the number of elements increases, we used an order-based method to improve the computational efficiency, which has previously been proposed in the context of gene-set associations and gene-environment associations. CSTWAS can search the potential active tissue set with computational complexity O(nlogn). Further, CSTWAS can be requires GWAS summary statistics as input only, using a correlation matrix which can be calculated using publicly available reference data. In our software implementation, we provide this precomputed correlation matrix that can be used with FUSION results.

However, CSTWAS also has some limitations. Since the method highly relies on the correlation structure of a gene, across tissues, it is difficult to distinguish between highly correlated tissues, such as brain tissues. Hence selection consistency needs to be further studied. Further, ancestry mismatches between GWAS and transcriptomic reference panels may lead to increased false discoveries and complicate interpretation. In addition, when constructing the test statistics, instead of enumerating through all possible subsets, our method uses ordered Z-statistics for computational efficiency, which loses the directions of effects. In the future, we plan to extend our method to efficiently incorporate both direction and magnitude of effects in the analysis.

Overall, CSTWAS provides a powerful approach to meta-analyze tissue-specific gene-based test results to identify active tissues for an associated gene. The simulations and analysis of GWAS summary statistics have demonstrated the statistical properties and computational efficiency of CSTWAS. However, meta-analyzing results across tissues is one of the many possible applications of our generalizable subset-based approach. Like expressions across tissues, the gene expressions are also highly correlated across multiple cell types, developmental stages, and contexts. With massive publicly available single-cell data as well as emerging cell-type-specific eQTL studies, the subset-based approach can be translated to identify cell types and stages where genes are actively associated with diseases. In the future, we will seek applications and adaptations of the CSTWAS to different multi-omics data, which will provide further insights into the complex genetic architecture of diseases.

Data Availability

All data produced in the present work are contained in the manuscript

https://github.com/Thewhey-Brian/CSTWAS

Link & Data Availability

Gene description (NTM): https://www.ncbi.nlm.nih.gov/gtr/genes/50863/791

Gene description (IDI1): https://www.genecards.org/cgi-bin/carddisp.pl?gene792=IDI1keywords=IDI1793

Bipolar Disorder GWAS Summary Statistics: https://doi.org/10.6084/m9.figshare.14671998

Serum Urate Level GWAS Summary Statistics: http://ckdgen.imbi.uni-freiburg.de

Breast Cancer GWAS Summary Statistics: https://www.ebi.ac.uk/gwas/studies/GCST90011804

GitHub for CSTWAS package: https://github.com/Thewhey-Brian/CSTWAS

References

1.↵
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
OpenUrl CrossRef PubMed
2.↵
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
OpenUrl CrossRef PubMed
3.↵
Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).
OpenUrl CrossRef PubMed
4.
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
OpenUrl CrossRef PubMed
5.↵
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
OpenUrl CrossRef PubMed
6.↵
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
OpenUrl Abstract/FREE Full Text
7.↵
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
OpenUrl CrossRef PubMed
8.↵
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
OpenUrl
9.↵
Zhang, J. et al. Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. 2021.03.15.435533 Preprint at https://doi.org/10.1101/2021.03.15.435533 (2022).
10.↵
Rhee, E. P. et al. Trans-ethnic genome-wide association study of blood metabolites in the Chronic Renal Insufficiency Cohort (CRIC) study. Kidney Int. 101, 814–823 (2022).
OpenUrl
11.↵
Huan, T. et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat. Commun. 10, 4267 (2019).
OpenUrl CrossRef PubMed
12.↵
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
OpenUrl CrossRef PubMed
13.↵
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
OpenUrl CrossRef PubMed
14.↵
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
OpenUrl CrossRef PubMed
15.↵
Barbeira, A. N. et al. Integrating predicted transcriptome from multiple tissues improves association detection. PLOS Genet. 15, e1007889 (2019).
OpenUrl CrossRef PubMed
16.↵
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
OpenUrl Abstract/FREE Full Text
17.↵
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
OpenUrl CrossRef PubMed
18.↵
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
OpenUrl CrossRef PubMed Web of Science
19.
Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24 (2014).
OpenUrl Abstract/FREE Full Text
20.↵
Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
OpenUrl CrossRef PubMed
21.↵
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
OpenUrl CrossRef PubMed
22.↵
Cao, C. et al. Power analysis of transcriptome-wide association study: Implications for practical protocol choice. PLOS Genet. 17, e1009405 (2021).
OpenUrl
23.↵
Price, A. L. et al. Single-Tissue and Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated Individuals. PLOS Genet. 7, e1001317 (2011).
OpenUrl CrossRef PubMed
24.↵
Fu, J. et al. Unraveling the Regulatory Mechanisms Underlying Tissue-Dependent Genetic Variation of Gene Expression. PLOS Genet. 8, e1002431 (2012).
OpenUrl CrossRef PubMed
25.↵
Hu, Y. et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 51, 568–576 (2019).
OpenUrl CrossRef PubMed
26.↵
Bhattacharjee, S. et al. A Subset-Based Approach Improves Power and Interpretation for the Combined Analysis of Genetic Association Studies of Heterogeneous Traits. Am. J. Hum. Genet. 90, 821–835 (2012).
OpenUrl CrossRef PubMed
27.↵
Qi, G. et al. Genome-Wide Large-Scale Multi-Trait Analysis Characterizes Global Patterns of Pleiotropy and Unique Trait-Specific Variants. 2022.06.03.494686 Preprint at https://doi.org/10.1101/2022.06.03.494686 (2022).
28.↵
Dutta, D. et al. A powerful subset-based method identifies gene set associations and improves interpretation in UK Biobank. Am. J. Hum. Genet. 108, 669–681 (2021).
OpenUrl
29.↵
Yu, Y. et al. Subset-Based Analysis Using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes. Hum. Hered. 83, 283–314 (2018).
OpenUrl
30.↵
Knijnenburg, T. A., Wessels, L. F. A., Reinders, M. J. T. & Shmulevich, I. Fewer permutations, more accurate P-values. Bioinformatics 25, i161–i168 (2009).
OpenUrl CrossRef PubMed Web of Science
31.↵
Hosking, J. R. M. & Wallis, J. R. Parameter and Quantile Estimation for the Generalized Pareto Distribution. Technometrics 29, 339–349 (1987).
OpenUrl CrossRef Web of Science
32.↵
Sun, R. & Lin, X. Genetic Variant Set-Based Tests Using the Generalized Berk–Jones Statistic With Application to a Genome-Wide Association Study of Breast Cancer. J. Am. Stat. Assoc. 115, 1079–1091 (2020).
OpenUrl
33.↵
Judd, L. L. et al. A Prospective Investigation of the Natural History of the Long-term Weekly Symptomatic Status of Bipolar II Disorder. Arch. Gen. Psychiatry 60, 261–269 (2003).
OpenUrl CrossRef PubMed Web of Science
34.↵
Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. 51, 793–803 (2019).
OpenUrl CrossRef PubMed
35.↵
Calabř, M. et al. ZNF804A Gene Variants Have a Cross-diagnostic Influence on Psychosis and Treatment Improvement in Mood Disorders. Clin. Psychopharmacol. Neurosci. 18, 231– 240 (2020).
OpenUrl
36.↵
Ross, K. A. Evidence for somatic gene conversion and deletion in bipolar disorder, Crohn’s disease, coronary artery disease, hypertension, rheumatoid arthritis, type-1 diabetes, and type-2 diabetes. BMC Med. 9, 12 (2011).
OpenUrl CrossRef PubMed
37.↵
Ferreira, M. A. et al. Genome-wide association and transcriptome studies identify target genes and risk loci for breast cancer. Nat. Commun. 10, 1741 (2019).
OpenUrl PubMed
38.↵
Rashkin, S. R. et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat. Commun. 11, 4423 (2020).
OpenUrl
39.↵
Hou, L. et al. FAT4 functions as a tumor suppressor in triple-negative breast cancer. Tumor Biol. 37, 16337–16343 (2016).
OpenUrl
40.↵
Huang, T. et al. FGF7/FGFR2 signal promotes invasion and migration in human gastric cancer through upregulation of thrombospondin-1. Int. J. Oncol. 50, 1501–1512 (2017).
OpenUrl CrossRef
41.↵
Wang, X.-K. et al. Novel candidate biomarkers of origin recognition complex 1, 5 and 6 for survival surveillance in patients with hepatocellular carcinoma. J. Cancer 11, 1869–1882 (2020).
OpenUrl
42.↵
Tin, A. et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat. Genet. 51, 1459–1474 (2019).
OpenUrl PubMed
43.↵
Dong, Z. et al. Effects of multiple genetic loci on the pathogenesis from serum urate to gout. Sci. Rep. 7, 43614 (2017).Get tissue-specific gene expression
OpenUrl

View the discussion thread.

Posted January 12, 2023.

Download PDF

Supplementary Material

Data/Code

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (386)
Allergy and Immunology (701)
Anesthesia (193)
Cardiovascular Medicine (2859)
Dentistry and Oral Medicine (326)
Dermatology (244)
Emergency Medicine (431)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1011)
Epidemiology (12569)
Forensic Medicine (10)
Gastroenterology (807)
Genetic and Genomic Medicine (4447)
Geriatric Medicine (402)
Health Economics (716)
Health Informatics (2856)
Health Policy (1050)
Health Systems and Quality Improvement (1050)
Hematology (376)
HIV/AIDS (893)
Infectious Diseases (except HIV/AIDS) (13986)
Intensive Care and Critical Care Medicine (831)
Medical Education (415)
Medical Ethics (114)
Nephrology (464)
Neurology (4201)
Nursing (223)
Nutrition (617)
Obstetrics and Gynecology (788)
Occupational and Environmental Health (723)
Oncology (2205)
Ophthalmology (626)
Orthopedics (254)
Otolaryngology (319)
Pain Medicine (269)
Palliative Medicine (83)
Pathology (488)
Pediatrics (1172)
Pharmacology and Therapeutics (489)
Primary Care Research (483)
Psychiatry and Clinical Psychology (3658)
Public and Global Health (6787)
Radiology and Imaging (1494)
Rehabilitation Medicine and Physical Therapy (869)
Respiratory Medicine (902)
Rheumatology (430)
Sexual and Reproductive Health (433)
Sports Medicine (369)
Surgery (473)
Toxicology (57)
Transplantation (202)
Urology (174)

[1] 1.↵
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
OpenUrl CrossRef PubMed

[2] 2.↵
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
OpenUrl CrossRef PubMed

[3] 3.↵
Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).
OpenUrl CrossRef PubMed

[4] 4.
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
OpenUrl CrossRef PubMed

[5] 5.↵
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
OpenUrl CrossRef PubMed

[6] 6.↵
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
OpenUrl Abstract/FREE Full Text

[7] 7.↵
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
OpenUrl CrossRef PubMed

[8] 8.↵
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
OpenUrl

[9] 9.↵
Zhang, J. et al. Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. 2021.03.15.435533 Preprint at https://doi.org/10.1101/2021.03.15.435533 (2022).

[10] 10.↵
Rhee, E. P. et al. Trans-ethnic genome-wide association study of blood metabolites in the Chronic Renal Insufficiency Cohort (CRIC) study. Kidney Int. 101, 814–823 (2022).
OpenUrl

[11] 11.↵
Huan, T. et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat. Commun. 10, 4267 (2019).
OpenUrl CrossRef PubMed

[12] 12.↵
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
OpenUrl CrossRef PubMed

[13] 13.↵
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
OpenUrl CrossRef PubMed

[14] 14.↵
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
OpenUrl CrossRef PubMed

[15] 15.↵
Barbeira, A. N. et al. Integrating predicted transcriptome from multiple tissues improves association detection. PLOS Genet. 15, e1007889 (2019).
OpenUrl CrossRef PubMed

[16] 16.↵
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
OpenUrl Abstract/FREE Full Text

[17] 17.↵
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
OpenUrl CrossRef PubMed

[18] 18.↵
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
OpenUrl CrossRef PubMed Web of Science

[19] 19.
Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24 (2014).
OpenUrl Abstract/FREE Full Text

[20] 20.↵
Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
OpenUrl CrossRef PubMed

[21] 21.↵
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
OpenUrl CrossRef PubMed

[22] 22.↵
Cao, C. et al. Power analysis of transcriptome-wide association study: Implications for practical protocol choice. PLOS Genet. 17, e1009405 (2021).
OpenUrl

[23] 23.↵
Price, A. L. et al. Single-Tissue and Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated Individuals. PLOS Genet. 7, e1001317 (2011).
OpenUrl CrossRef PubMed

[24] 24.↵
Fu, J. et al. Unraveling the Regulatory Mechanisms Underlying Tissue-Dependent Genetic Variation of Gene Expression. PLOS Genet. 8, e1002431 (2012).
OpenUrl CrossRef PubMed

[25] 25.↵
Hu, Y. et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 51, 568–576 (2019).
OpenUrl CrossRef PubMed

[26] 26.↵
Bhattacharjee, S. et al. A Subset-Based Approach Improves Power and Interpretation for the Combined Analysis of Genetic Association Studies of Heterogeneous Traits. Am. J. Hum. Genet. 90, 821–835 (2012).
OpenUrl CrossRef PubMed

[27] 27.↵
Qi, G. et al. Genome-Wide Large-Scale Multi-Trait Analysis Characterizes Global Patterns of Pleiotropy and Unique Trait-Specific Variants. 2022.06.03.494686 Preprint at https://doi.org/10.1101/2022.06.03.494686 (2022).

[28] 28.↵
Dutta, D. et al. A powerful subset-based method identifies gene set associations and improves interpretation in UK Biobank. Am. J. Hum. Genet. 108, 669–681 (2021).
OpenUrl

[29] 29.↵
Yu, Y. et al. Subset-Based Analysis Using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes. Hum. Hered. 83, 283–314 (2018).
OpenUrl

[30] 30.↵
Knijnenburg, T. A., Wessels, L. F. A., Reinders, M. J. T. & Shmulevich, I. Fewer permutations, more accurate P-values. Bioinformatics 25, i161–i168 (2009).
OpenUrl CrossRef PubMed Web of Science

[31] 31.↵
Hosking, J. R. M. & Wallis, J. R. Parameter and Quantile Estimation for the Generalized Pareto Distribution. Technometrics 29, 339–349 (1987).
OpenUrl CrossRef Web of Science

[32] 32.↵
Sun, R. & Lin, X. Genetic Variant Set-Based Tests Using the Generalized Berk–Jones Statistic With Application to a Genome-Wide Association Study of Breast Cancer. J. Am. Stat. Assoc. 115, 1079–1091 (2020).
OpenUrl

[33] 33.↵
Judd, L. L. et al. A Prospective Investigation of the Natural History of the Long-term Weekly Symptomatic Status of Bipolar II Disorder. Arch. Gen. Psychiatry 60, 261–269 (2003).
OpenUrl CrossRef PubMed Web of Science

[34] 34.↵
Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. 51, 793–803 (2019).
OpenUrl CrossRef PubMed

[35] 35.↵
Calabř, M. et al. ZNF804A Gene Variants Have a Cross-diagnostic Influence on Psychosis and Treatment Improvement in Mood Disorders. Clin. Psychopharmacol. Neurosci. 18, 231– 240 (2020).
OpenUrl

[36] 36.↵
Ross, K. A. Evidence for somatic gene conversion and deletion in bipolar disorder, Crohn’s disease, coronary artery disease, hypertension, rheumatoid arthritis, type-1 diabetes, and type-2 diabetes. BMC Med. 9, 12 (2011).
OpenUrl CrossRef PubMed

[37] 37.↵
Ferreira, M. A. et al. Genome-wide association and transcriptome studies identify target genes and risk loci for breast cancer. Nat. Commun. 10, 1741 (2019).
OpenUrl PubMed

[38] 38.↵
Rashkin, S. R. et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat. Commun. 11, 4423 (2020).
OpenUrl

[39] 39.↵
Hou, L. et al. FAT4 functions as a tumor suppressor in triple-negative breast cancer. Tumor Biol. 37, 16337–16343 (2016).
OpenUrl

[40] 40.↵
Huang, T. et al. FGF7/FGFR2 signal promotes invasion and migration in human gastric cancer through upregulation of thrombospondin-1. Int. J. Oncol. 50, 1501–1512 (2017).
OpenUrl CrossRef

[41] 41.↵
Wang, X.-K. et al. Novel candidate biomarkers of origin recognition complex 1, 5 and 6 for survival surveillance in patients with hepatocellular carcinoma. J. Cancer 11, 1869–1882 (2020).
OpenUrl

[42] 42.↵
Tin, A. et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat. Genet. 51, 1459–1474 (2019).
OpenUrl PubMed

[43] 43.↵
Dong, Z. et al. Effects of multiple genetic loci on the pathogenesis from serum urate to gout. Sci. Rep. 7, 43614 (2017).Get tissue-specific gene expression
OpenUrl

Subset-based method for cross-tissue transcriptome-wide association studies improves power and interpretability

Abstract

Introduction

Methods

Results

Simulations

Type-I error

Power

Association analysis of three traits and diseases using CSTWAS

Bipolar Disorder (BD)

Breast Cancer (BC)

Serum Urate Level (SUL)

Discussions

Data Availability

Link & Data Availability

References

Citation Manager Formats

Subject Area