Fine-mapping causal tissues and genes at disease-associated loci

Benjamin J. Strober; Martin Jinye Zhang; Tiffany Amariuta; Jordan Rossen; Alkes L. Price

doi:10.1101/2023.11.01.23297909

Abstract

Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene expression in assayed genes and tissues. TGFM accounts for both co-regulation across genes and tissues and LD between SNPs (generalizing existing fine-mapping methods), and incorporates genome-wide estimates of each tissue’s contribution to disease as tissue-level priors. TGFM was well-calibrated and moderately well-powered in simulations; unlike previous methods, TGFM was able to attain correct calibration by modeling uncertainty in cis-predicted expression models. We applied TGFM to 45 UK Biobank diseases/traits (average N = 316K) using eQTL data from 38 GTEx tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease/trait, of which 11% were gene-tissue pairs. Implicated gene-tissue pairs were concentrated in known disease-critical tissues, and causal genes were strongly enriched in disease-relevant gene sets. Causal gene-tissue pairs identified by TGFM recapitulated known biology (e.g., TPO-thyroid for Hypothyroidism), but also included biologically plausible novel findings (e.g., SLC20A2-artery aorta for Diastolic blood pressure). Further application of TGFM to single-cell eQTL data from 9 cell types in peripheral blood mononuclear cells (PBMC), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs at PIP > 0.5—primarily for autoimmune disease and blood cell traits, including the well-established role of CTLA4 in CD8⁺ T cells for All autoimmune disease. In conclusion, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci.

Introduction

Heritable diseases often manifest in a highly tissue-specific manner, motivating intense efforts to elucidate tissue-specific mechanisms of disease¹. Previous studies have identified disease-critical tissues/cell-types based on genome-wide patterns^2–11, and have deeply dissected a limited number of GWAS loci^12–16. However, different GWAS loci may be mediated by different tissues, motivating genome-wide efforts to fine-map causal tissues and genes at individual GWAS loci.

Existing approaches, including colocalization^17–19 and transcriptome wide association studies (TWAS)^20–22, have implicated disease genes via the integration of GWAS data with expression quantitative trait loci (eQTLs) while considering the effect of each gene-tissue pair on disease in isolation. However, it is likely that most of these disease-implicated genes are not actually causal in the analyzed tissue; analogous to non-causal tagging variants implicated by linkage disequilibrium (LD) between variants²³, non-causal gene-tissue pairs can be implicated by correlations with causal gene-tissue pairs (involving a different gene and/or different tissue)^{11,22,24–27}. In addition, false-positive gene-tissue pairs can arise from correlations with non-mediated genetic variants, i.e., variants whose causal effects are not mediated by assayed expression levels^22,27,28. Previous fine-mapping approaches such as FOCUS²⁴ have proven valuable in disentangling causal effects across correlated genes in a single tissue (also see ref. ²⁷), but have not considered causal gene-tissue pairs.

Here, we introduce a new method, Tissue-Gene Fine-Mapping (TGFM), that infers the posterior inclusion probability (PIP) for each gene-tissue pair to mediate a disease association at a given locus; TGFM also assigns PIPs to causal genetic variants whose effects are not mediated by gene expression in assayed tissues and genes. TGFM models both gene-tissue pairs (using cis-predicted expression^20,21) and non-mediated genetic variants as potential causal genetic elements, and accounts for both correlations in cis-predicted expression across genes and tissues and LD between genetic variants, generalizing existing fine-mapping methods^{23,24,27,29–31}. TGFM incorporates genome-wide estimates of each tissue’s contribution to disease as tissue-level priors and employs a sampling approach to account for uncertainty in cis-predicted gene expression. We validated TGFM using extensive simulations with real genotypes, including comparisons to coloc¹⁷ and FOCUS²⁴. We applied TGFM to 45 UK Biobank traits³² using eQTL data from 38 GTEx tissues²⁵ and 9 fine-grained single-cell PBMC cell-types³³.

Results

Overview of TGFM

TGFM estimates the posterior inclusion probability (PIP) for each genetic element (gene-tissue pair or genetic variant) to have a non-zero causal effect on disease, in a model that includes mediated causal effects of each gene-tissue pair (via the cis-genetic component of expression of a given gene in a given tissue) and non-mediated causal effects of each genetic variant: where Y denotes phenotypes, g indexes genes, t indexes tissues, X is the matrix of genotypes, δ_gt is the vector of causal cis-eQTL effect sizes of each variant on gene expression in gene g and tissue t (thus Xδ_gt is the cis-genetic component of gene expression in gene g and tissue t), α_gt denotes the (scalar) effect of cis-genetic expression in gene g and tissue t on the disease or trait, β is the vector of non-mediated causal effects of each genetic variant on the disease or trait, and ϵ denotes environmental noise. We caution that, analogous to previous studies, inference of causal genetic elements relies on the assumption that all causal genetic elements have been assayed, which may not be true in practice (see Discussion); we use the word “causal” for simplicity, with this caveat in mind.

TGFM estimates the PIP of each genetic element by generalizing the Sum of Single Effects (SuSiE)^30,31 fine-mapping method to include both gene-tissue pairs and genetic variants; gene-tissue pairs are included via cis-predicted expression^20,21 (using an external eQTL data set such as GTEx²⁵ to build prediction models), which is an approximation to true cis-genetic expression (Methods). This approach allows for fine-mapping multiple causal genetic elements in a given locus, inferring causal effects underlying both marginal GWAS³⁴ and marginal TWAS^20–22 (i.e., the association between cis-predicted expression of a single gene-tissue pair and disease) associations by accounting for correlations between gene-tissue pairs due to co-regulation across genes/tissues^{11,22,24–27}, correlations between genetic variants due to LD²³, and/or correlations between gene-tissue pairs and genetic variants due to the inclusion of a genetic variant in a model of cis-predicted gene expression^22,27,28. TGFM employs a sampling approach to account for uncertainty in cis-predicted expression, avoiding false positives that arise from noisy estimation of cis-genetic expression.

In detail, TGFM consists of four steps. In step 1, we apply SuSiE to perform eQTL fine-mapping of each gene-tissue pair in the external gene expression data set (estimating a posterior distribution of the causal cis-eQTL effect sizes for each gene-tissue pair). In step 2, we randomly sample 100 cis-predicted expression models for each gene-tissue pair from the posterior distributions of causal cis-eQTL effect sizes estimated in step 1 (Methods). In step 3, we apply SuSiE to perform disease fine-mapping in the target data set (estimating the PIP of each genetic element) 100 times, iterating over the sampled cis-predicted expression models for each gene-tissue pair from step 2. In step 4, we average the results of step 3 across the 100 disease fine-mapping runs. TGFM utilizes a custom implementation of the SuSiE algorithm that provides efficient estimation of PIPs across 100 parallel SuSiE runs that differ only in their cis-predicted expression models (Methods). TGFM inference requires only summary-level GWAS data^31,35 consisting of GWAS z-scores for each variant and in-sample LD between genetic variants, in addition to external eQTL data sets across tissues of interest.

TGFM increases fine-mapping power by specifying tissue-specific prior probabilities for each genetic element in a locus that are informed by genome-wide data, analogous to functionally informed variant-level fine-mapping methods such as PolyFun³⁶; TGFM assigns one prior causal probability π_t for each gene-tissue pair from tissue t and one prior causal probability π_nm for each non-mediated genetic variant. We estimate π_t and π_nm in each disease/trait separately by iteratively running a computationally efficient approximation to TGFM (Methods), starting with flat priors and updating π_t and π_nm at each iteration until convergence. When analyzing a given locus with TGFM, we normalize the prior causal probabilities to sum to 1, analogous to PolyFun³⁶. We account for uncertainty in estimates of π_t and π_nm by using genomic bootstrapping, randomly sampling 100 sets of values of π_t and π_nm (one for each of the 100 disease fine-mapping runs in step 3) and averaging TGFM results across the random samples.

We restrict cis-predicted expression models to cis-eQTLs within 500kb of each gene’s transcription start site (TSS). We only assign cis-predicted expression models to gene-tissue pairs that are well-predicted by genetic variants, using the SuSiE “purity filter”³⁰ (see Methods). We apply TGFM to fine-map any of the 2,682 overlapping 3Mb loci spanning the entire genome³⁶ that contain at least 50 genetic variants and at least one genetic variant with marginal GWAS p-value less than 1e-5. Further details, including sampling cis-predicted expression models from SuSiE posterior distributions of causal cis-eQTL effect sizes, the custom implementation of the SuSiE algorithm providing efficient estimation of PIPs across 100 parallel SuSiE runs, and the computationally efficient approximation to TGFM used when estimating tissue-specific prior causal probabilities, are provided in the Methods section. We have released open-source software implementing TGFM (see Code availability), as well as posterior distributions of causal eQTL effect sizes across tissues and genes, and TGFM PIPs from this study (see Data availability).

Simulations

We performed simulations using real genotypes to assess the calibration and power of TGFM to identify causal tissues and genes underlying GWAS associations. We used real genotypes from unrelated UK Biobank (UKBB) British samples³² to simulate both gene expression phenotypes (for each gene-tissue pair) and quantitative trait phenotypes. Default simulation parameters were specified as follows: the gene expression sample size ranged from 300 to 1000; the quantitative trait sample size was set to 100,000 (disjoint from gene expression samples); we analyzed 426,593 SNPs and 1,976 genes on chromosome 1 (following ref. ¹¹); the number of tissues was set to 10, of which 2 were causal for the quantitative trait; the quantitative trait architecture was simulated to have average polygenicity³⁷, consisting of 2,700 causal non-mediated variants and 300 causal gene-tissue pairs (150 for each causal tissue) with the expected heritability per causal genetic element (non-mediated variant or gene-tissue pair) set to 0.0001 (expected quantitative trait heritability of 0.3, 10% of which was mediated through gene expression, consistent with genome-wide estimates from MESC²⁸); causal non-mediated variants were randomly selected with probability proportional to their expected per-variant heritability based on baseline-LD model annotations^3,38–40 (estimated using S-LDSC³ applied to the UKBB trait White blood cell count) in order to make the simulations as realistic as possible; the genetic architecture of gene expression across tissues was specified following ref. ¹¹: roughly, each heritable gene-tissue pair was randomly assigned 5 causal cis-eQTLs (expected per-SNP heritability: 0.015), 2 of the 5 causal eQTLs were specific to each tissue, and 3 of the 5 causal eQTLs were shared across tissues with effect size covariance set to mimic that of GTEx tissues²⁵; and causal gene-tissue pairs were randomly selected from all genetically heritable genes (true cis-SNP-heritability > 0) in each of the causal tissues. Non-default simulation parameter values were also explored. We performed 100 independent simulations, and averaged results across simulations. Further details of the simulation framework are provided in the Methods section.

We compared TGFM to two previously published methods, coloc¹⁷ and FOCUS²⁴. Briefly, coloc calculates the posterior probability of a shared causal variant between a GWAS disease/trait and a gene expression trait from a single gene-tissue pair without considering correlations between genes or gene-tissue pairs. FOCUS assigns PIPs for the expression of each gene in a given tissue to have non-zero causal effect on disease, while modeling correlations between genes in that tissue but not modeling correlations between different tissues or correlations between genes and non-mediated genetic variants. FOCUS can naturally be extended to model correlations between all gene-tissue pairs (without modeling correlations between genes and non-mediated genetic variants); we refer to the resulting method as FOCUS-TG. (In contrast, coloc does not model correlations between genes, and cannot be extended in this way.)

We first evaluated the calibration of TGFM, coloc, FOCUS and FOCUS-TG to fine-map causal gene-tissue pairs. Calibration was assessed using empirical false discovery rate (FDR), estimated as the proportion of false-positive gene-tissue pairs among all gene-tissue pairs above a given PIP threshold. Following ref. ³⁶, we assessed whether the empirical FDR is less than or equal to (1 – PIP threshold), a more conservative choice than (1 – average PIP) (which has been shown to be slightly mis-calibrated in previous fine-mapping simulations^36,41; also see Methods and secondary analyses below). Results are reported in Figure 1a-b and Supplementary Table 1. TGFM produced well-calibrated PIPs at all eQTL sample sizes and PIP thresholds. In contrast, coloc, FOCUS, and FOCUS-TG were poorly calibrated across all PIP thresholds, even at large eQTL sample sizes. We attribute the superior calibration of TGFM over other approaches to its joint modeling of gene-tissue pairs and non-mediated variants, as well as its sampling procedure that accounts for uncertainty in genetically predicted gene expression (see secondary analyses below). Results were similar at other PIP thresholds (Supplementary Figure 1).

Figure 1: Calibration and power of tissue-gene fine-mapping methods in simulations.

(a,b) Average gene-tissue pair fine-mapping FDR across 100 simulations for various fine-mapping methods (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (a) and PIP=0.9 (b). Dashed horizontal line denotes 1 – PIP threshold (see main text). Numerical results are reported in Supplementary Table 1. (c,d) Average gene-tissue pair fine-mapping power across 100 simulations for various fine-mapping methods (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (c) and PIP=0.9 (d). Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 2.

We next evaluated the power of TGFM, coloc, FOCUS and FOCUS-TG to fine-map causal gene-tissue pairs. Results are reported in Figure 1c-d and Supplementary Table 2. TGFM was moderately well-powered to detect causal gene-tissue pairs at larger eQTL sample sizes, with power ranging from 0.02-0.17 across eQTL sample sizes at a PIP threshold of 0.5. Other methods (coloc, FOCUS, FOCUS-TG) achieved higher power than TGFM, but this is largely moot due to the poor calibration of those methods (Figure 1a-b). Results were similar at other PIP thresholds (Supplementary Figure 2).

We compared the calibration and power of TGFM for fine-mapping causal gene-tissue pairs, genes, or non-mediated genetic variants. Gene PIPs were computed by aggregating gene-tissue PIPs across all gene-tissue pairs corresponding to the gene (defining a gene as causal if at least one corresponding gene-tissue pair is causal; Methods). Causal eQTL variants for causal gene-tissue pairs were not considered to be false positives for variant-level calibration but were not included as true positives in variant-level power computations (also see secondary analyses below). Calibration results for TGFM (Gene-Tissue), TGFM (Gene) and TGFM (Variant) are reported in Figure 2a-b and Supplementary Table 3. TGFM produced well-calibrated gene-level and variant-level PIPs. In contrast, gene-level coloc, FOCUS, and FOCUS-TG PIPs were poorly calibrated across all PIP thresholds even at large eQTL sample sizes (Supplementary Figure 3, analogous to Figure 1a-b). Results were similar at other PIP thresholds (Supplementary Figure 4).

Figure 2: Calibration and power of fine-mapping different classes of genetic elements with TGFM in simulations.

(a,b) Average fine-mapping FDR across 100 simulations using TGFM for different classes of genetic elements (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (a) and PIP=0.9 (b). Dashed horizontal line denotes 1 – PIP threshold (see main text). Numerical results are reported in Supplementary Table 3. (c,d) Average fine-mapping power across 100 simulations using TGFM for different classes of genetic elements (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 (c) and PIP=0.9 (d). Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 4.

Power results for TGFM (Gene-Tissue), TGFM (Gene) and TGFM (Variant) are reported in Figure 2c-d and Supplementary Table 4. TGFM attained higher power to fine-map causal genes than causal gene-tissue pairs, which is expected as fine-mapping causal genes is an easier problem. Power for variant-level fine-mapping was invariant to eQTL sample size, such that variant-level fine-mapping was more powerful than gene-tissue or gene-level fine-mapping at smaller eQTL sample sizes—particularly at the stringent PIP>0.9 threshold, at which the latter were severely underpowered. Results were similar at other PIP thresholds (Supplementary Figure 5).

We performed 9 secondary analyses. First, we ran TGFM with a uniform prior (same π_nm and π_t for all tissues) instead of the default tissue-specific priors inferred from genome-wide data. TGFM with a uniform prior remained well-calibrated (Supplementary Figure 6a-b) but suffered substantially reduced power (Supplementary Figure 6c-d), highlighting the benefit of tissue-specific priors informed by genome-wide data. Second, we ran TGFM with a uniform prior and a single cis-predicted expression model (based on posterior mean causal cis-eQTL effect sizes) instead of averaging results across 100 sampled cis-predicted expression models. TGFM without sampling cis-predicted expression models suffered poor calibration, particularly at smaller eQTL sample sizes (Supplementary Figure 6a-b), highlighting the advantages of the sampling approach to account for uncertainty in cis-predicted expression. However, the calibration of this method was still better than the calibration of FOCUS-TG (Figure 1a-b), perhaps because FOCUS-TG does not account for non-mediated genetic variants. Third, we performed an alternative calibration analysis of TGFM (Variant) where causal eQTL variants for causal gene-tissue pairs were considered false positives for variant-level calibration. TGFM (Variant) was miscalibrated at small eQTL sample sizes and high PIP thresholds in this alternative calibration analysis (Supplementary Figure 7). The calibration worsened at small eQTL sample sizes, likely due to decreased power to detect causal gene-tissue pairs at small eQTL sample sizes, forcing the unmodeled gene-tissue pair effects to be captured by non-mediated variants. Fourth, we modified our calibration analyses to assess whether the empirical FDR is less than or equal to (1 – average PIP), a less conservative choice than (1 – PIP threshold) (ref. ^36,41; see Methods). We determined that TGFM (Gene-Tissue) was slightly miscalibrated only at small eQTL sample sizes and low PIP thresholds, TGFM (Gene) was well-calibrated across all eQTL sample sizes and PIP thresholds analyzed, TGFM (Variant) was slightly mis-calibrated at high PIP thresholds regardless of eQTL sample size, and coloc, FOCUS, and FOCUS-TG were severely miscalibrated across all eQTL sample sizes and PIP thresholds analyzed (Supplementary Figures 1, 4). The slight miscalibration of TGFM (Gene-Tissue) and TGFM (Variant) when using (1 – average PIP) is consistent with previous simulations of variant-level fine-mapping methods using polygenic trait architectures^36,41. Fifth, we ran TGFM at different simulated GWAS sample sizes ranging from 50,000 to 200,000 (instead of the default sample size of 100,000). TGFM remained well-calibrated regardless of GWAS sample size but attained increased power at larger GWAS sample sizes (Supplementary Figure 8). TGFM fine-mapping power at PIP > 0.5 increased 2.0-fold when doubling the eQTL sample size vs. 1.2-fold when doubling the GWAS sample size (relative to an eQTL sample size of 500 and GWAS sample size of 100,000), suggesting that TGFM attains a greater benefit from increasing the eQTL sample size under our default parameter settings. Sixth, we ran TGFM at different values of heritability of gene expression, ranging from 0.05 to 0.1 (instead of the default value of 0.075). TGFM remained approximately well-calibrated regardless of gene expression heritability but attained increased power at larger values of gene expression heritability (Supplementary Figure 9), analogous to the impact of varying eQTL sample size (Figure 2). Seventh, we investigated TGFM’s ability to provide unbiased estimates of the proportion of causal genetic elements (gene-tissue pairs or non-mediated variants) that are gene-tissue pairs, a parameter closely related to the proportion of disease heritability mediated by gene expression (estimated in ref. ²⁸). We first calculated the proportion of fine-mapped genetic elements at various PIP thresholds that are gene-tissue pairs. This approach yielded either upward or downward biased estimates of the true proportion depending on the eQTL sample size and PIP threshold (Supplementary Figure 10a), reflecting differential discovery power of non-mediated variants and gene-tissue pairs as a function of both eQTL sample size and PIP threshold (Figure 2c-d). We next calculated the expected proportion of fine-mapped genetic elements that are gene-tissue pairs by summing PIPs across genetic elements. This approach yielded conservative estimates of the true proportion, becoming less conservative at larger eQTL sample sizes (Supplementary Figure 10b), suggesting that this statistic can provide a conservative lower bound on the true proportion of causal genetic elements that are gene-tissue pairs. Eighth, we investigated the calibration and power of inference of disease-critical tissues via the TGFM tissue-specific prior, using genomic bootstrap to assess significance (see Methods); although inference of disease-critical tissues is not a primary goal of TGFM (and there exist previous methods for inferring disease-critical tissues using eQTL data^5,11), assessing the TGFM tissue-specific prior is of interest. We determined that inference of disease-critical tissues via the TGFM tissue-specific prior was well-calibrated and well-powered (Supplementary Figure 11). Ninth, we compared TGFM (Variant) PIPs with variant-level PIPs inferred by SuSiE. The variant-level PIPs were strongly correlated and consistent in magnitude, particularly after excluding from the analysis any variant that was correlated with a TGFM fine-mapped gene-tissue pair (Supplementary Figure 12).

Tissue-gene fine-mapping of 45 diseases and complex traits

We applied TGFM to fine-map tissues and genes for 45 diseases and complex traits from the UK Biobank (average N = 316K; previously analyzed with functionally informed variant-level fine-mapping³⁶; Methods and Supplementary Table 5) using gene expression data from 47 GTEx tissues²⁵, which were aggregated into 38 meta-tissues¹¹ (average N=259; Supplementary Table 6) to minimize eQTL sample size differences across tissues; below, we refer to these as “tissues” for simplicity. For each disease/trait we applied TGFM to 2,682 overlapping 3-Mb loci³⁶ spanning 119,270 (protein-coding) gene-tissue pairs with cis-predicted expression models (3,139 genes per tissue on average, Supplementary Table 6; 13,700 unique genes) and 10,545,304 genetic variants with MAF ≥ 0.005. We assigned a PIP to each gene-tissue pair, gene, and non-mediated genetic variant using the locus in which the genetic element was most central³⁶; the position of a gene was determined by its TSS. TGFM running times are reported in Supplementary Table 7. We have publicly released PIPs for all gene-tissue pairs, genes, and non-mediated variants for each disease/trait (see Data Availability).

Results are summarized in Figure 3 (16 independent traits³⁶), Supplementary Figure 13 (all 45 traits), and Supplementary Table 8. Across all 45 traits, TGFM identified 711 gene-tissue-trait triplets, 2,800 gene-trait pairs (aggregating gene-tissue PIPs across tissues; see above), and 5,893 non-mediated genetic variant-trait pairs at PIP > 0.5 (43 gene-tissue-trait triplets, 382 gene-trait pairs, and 2,675 non-mediated genetic variant-trait pairs at PIP > 0.9). The number of gene-tissue pairs with PIP > 0.5 ranged from 0 (Number of children) to 56 (White blood cell count) across traits, and ranged from 0 (coronary artery) to 197 (whole blood) across tissues. Of the 711 gene-tissue-trait triplets with PIP > 0.5, 180 (25%) had TWAS p-value > 0.05/119,270 = 4.2 × 10⁻⁷ (the Bonferroni significance threshold based on 119,270 gene-tissue pairs with cis-predicted expression models⁴²) and 136 (19%) had no nearby variants in the same fine-mapping region with GWAS p-value ≤ 5 × 10⁻⁸. Of the 110,828 gene-tissue-trait triplets with TWAS p-value ≤ 4.2 × 10⁻⁷, only 531 (0.5%) had TGFM PIP > 0.5. The proportion of causal genetic elements (variants or gene-tissue pairs) that were gene-tissue pairs was equal to 8.1% when counting PIP > 0.5 genetic elements across 16 independent traits (271 gene-tissue pairs and 3,074 non-mediated genetic variants), or 10.1% when summing PIPs across 16 independent traits (Methods), consistent with previous estimates of the proportion of trait heritability mediated by gene expression²⁸.

Figure 3: Summary results of fine-mapping genetic elements with TGFM for 16 independent UK Biobank diseases and traits.

We report the number of (a) Gene-tissue pairs, (b) Genes, and (c) (non-mediated) Variants fine-mapped using TGFM (y-axis; square root scale) across 16 independent UK Biobank traits (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bars). Horizontal black lines denote the number of genetic elements fine-mapped at PIP=0.5. FEV1:FVC, ratio of forced expiratory volume in 1 second to forced vital capacity; Platelet volume, Mean platelet volume; Diastolic BP, Diastolic blood pressure; Reticulocyte count, High-light scatter reticulocyte count; Corp. hemoglobin, Mean corpuscular hemoglobin; FVC, Forced vital capacity. Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 13, and numerical results are reported in Supplementary Table.

For each trait, we identified the most frequently implicated tissues, computing the proportion of fine-mapped gene-tissue pairs in each tissue by counting gene-tissue pairs with PIP > 0.5 (Methods). Results are reported in Figure 4a (14 representative traits), Supplementary Figure 14 (all 45 traits), and Supplementary Table 9. Implicated tissue-trait pairs were consistently concentrated in expected trait-critical tissues, e.g., 50% in spleen and 30% in lymphocytes for All autoimmune disease, 71% in skin (sun-exposed) for Vitamin D level, 60% in liver and 25% in whole blood for Total cholesterol, and 36% in artery tibial and 32% in artery aorta for Diastolic blood pressure. Results were similar at other PIP thresholds (Supplementary Figure 15). Separately, we assessed the statistical significance of implicated tissue-trait pairs by applying genomic bootstrapping to the TGFM tissue-specific prior (Methods). Results are reported in Supplementary Figure 14 and Supplementary Table 9. This approach identified 23 tissue-trait pairs with FDR ≤ 0.05 (64 tissue-trait pairs with FDR ≤ 0.2). Despite limited power, the TGFM tissue-specific prior identified 6 traits with more than one significantly associated tissue (FDR ≤ 0.05; 17 traits at FDR ≤ 0.2); this result motivates the use of TGFM over a two-step approach of separately identifying the causal gene using a gene-level fine-mapping method²⁴ and identifying the causal tissue using a method for identifying trait-critical tissues^5,7,11. Significant tissue-trait pairs were consistently concentrated in expected trait-critical tissues, analogous to Figure 4a. Although inference of trait-critical tissues is not a primary goal of TGFM (and there exist previous methods for inferring trait-critical tissues using eQTL data^5,11), these results validate the TGFM tissue-specific prior.

Figure 4: Validation of fine-mapped tissues and genes.

(a) Proportion of fine-mapped gene-tissue pairs in each tissue (x-axis) for 14 representative traits (y-axis). Proportions for each trait were calculated by counting the number of gene-tissue pairs with TGFM PIP > 0.5 in each tissue and normalizing the counts across tissues. Tissues are only displayed if their proportion is > 0.2 for at least one of the 14 representative traits. Results for all remaining traits and tissues are reported in Supplementary Figure 14, and numerical results are reported in Supplementary Table 9. The 14 representative traits were selected by including 12 of the 16 independent traits (Figure 3) with many high PIP gene-tissue pairs and two additional, interesting traits (All autoimmune and Vitamin D levels). (b) Average PoPS score (y-axis) of genes stratified by TGFM (Gene) PIP (x-axis). Averages were computed across genes for the 16 independent traits listed in Figure 3, as both PoPS score and TGFM gene PIPs are trait-specific. Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 11.

We observed instances where TGFM was unable to distinguish the causal tissue within a small set of highly correlated tissues. For example, for waist-hip-ratio adjusted for BMI (WHRadjBMI), TGFM fine-mapped only 1 gene-tissue pair in adipose visceral and 0 gene-tissue pairs in adipose subcutaneous tissue at PIP > 0.5, despite strong prior evidence of the role of adipose tissue in WHRadjBMI^7,43; however, TGFM fine-mapped 10 gene-tissue pairs in adipose (defined as adipose subcutaneous U adipose visceral) at PIP > 0.5 when summing PIPs of gene-tissue pairs across the two tissues (Methods). Unsurprisingly, the average correlation in cis-predicted gene expression of adipose subcutaneous vs. adipose visceral across all genes included in TGFM was very large (0.92). Average correlations for all pairs of tissues are reported in Supplementary Figure 16 and Supplementary Table 10; the average correlation ranged from 0.48 (whole blood and brain cerebellum) to 0.96 (brain substantia nigra and brain spinal cord), and the correlation patterns reflected known relationships between tissues.

We sought to validate genes prioritized by TGFM by assessing their overlap with genes prioritized by independent gene sets/scores. First, we assessed overlap with PoPS⁴⁴, a similarity-based gene score that prioritizes trait-relevant genes from gene-level features such as cell-type-specific expression. Results are reported in Figure 4b and Supplementary Table 11. The average PoPS score increased as a function of TGFM (Gene) PIPs in different bins, from −0.0046 (s.e. 0.0031) for trait-gene pairs with PIP < 0.01 to 0.34 (s.e. 0.047) for trait-gene pairs with PIP ≥ 0.9; this provides an external validation of genes prioritized by TGFM. Second, we assessed overlap with 10 non-disease-specific gene sets (e.g. High-pLI genes⁴⁵) that are known to be enriched/depleted for disease heritability (from Figure 4 of ref. ²⁶). Results are reported in Supplementary Figure 17a and Supplementary Table 12. Genes with TGFM (Gene) PIP > 0.5 were significantly (FDR < 0.05) overrepresented in Epigenetic modifier genes⁴⁶ (odds ratio: 1.78), High-pLI genes⁴⁵ (odds ratio: 1.31) and Mouse Genome Informatics (MGI) essential genes⁴⁷ (odds ratio: 1.31) and underrepresented in genes with the most SNPs within 100kb (odds ratio: 0.48), consistent with previous findings²⁶. Results were similar at other PIP thresholds (Supplementary Figure 17b).

TGFM pinpoints disease genes and tissues of action

We highlight 6 examples of fine-mapped (PIP > 0.5) gene-tissue-trait triplets that recapitulate known biology or nominate biologically plausible mechanisms. First, TGFM fine-mapped TPO (Thyroid Peroxidase) in thyroid for Hypothyroidism (Figure 5a; gene-tissue PIP: 0.88; gene PIP: 0.88). TPO is an enzyme involved in thyroid hormone biosynthesis and its involvement in Hypothyroidism has been well-studied^48,49, and TPO has been linked to Hypothyroidism in genetic association studies⁵⁰. Thyroid was also identified as a Hypothyroidism-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.21, bootstrap p = 0.04 for tissue-specific prior; Supplementary Figure 14).

Figure 5: Examples of fine-mapped gene-tissue-disease triplets identified by TGFM.

We report 6 example loci for which TGFM fine-mapped a gene-tissue pair (PIP > 0.5). In each example we report the marginal GWAS and TWAS association -log₁₀ p-values (y-axis) of non-mediated variants (blue circles) and gene-tissue pairs (red triangles). Marginal TWAS association -log₁₀ p-values were calculated by taking the median -log₁₀ TWAS p-value across the 100 sets of sampled cis-predicted expression models for each gene-tissue pair. The genomic position of each gene-tissue pair (x-axis) was based on the gene’s TSS. The color shading of each variant and gene-tissue pair was determined by its TGFM PIP. Any genetic element with TGFM PIP > 0.5 was made larger in size. Dashed horizontal blue and red lines represent GWAS significance (5 × 10⁻⁸) and TWAS significance (4.2 × 10⁻⁷) thresholds, respectively. Numerical results are reported in Supplementary Table 13.

Second, TGFM fine-mapped OVOL1 in lymphocytes for Eczema (Figure 5b, gene-tissue PIP: 0.75, gene PIP: 0.76). Recent work demonstrated that loss of OVOL1 results in skin inflammation and Eczema via an immune-mediated mechanism in T cells (a lymphocyte cell type)^51–53, and OVOL1 has previously been linked to Eczema in genetic association studies⁵⁴. Lymphocyte was suggestively implicated as an Eczema-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.36, bootstrap p = 0.08 for tissue-specific prior; Supplementary Figure 14). There exist 28 other gene-tissue pairs within 1 Mb of the TSS of OVOL1 (4 of which correspond to OVOL1 in a tissue other than lymphocytes) that had significant TWAS p-values (p ≤ 0.05 / 119,270 = 4.2 × 10⁻⁷) but were not fine-mapped by TGFM (all with PIP ≤ 0.01), underscoring the benefit of joint fine-mapping of gene-tissue pairs. TGFM also fine-mapped one non-mediated variant (rs56225074; PIP: 0.55) within 1 Mb of the TSS of OVOL1, perhaps due to finite eQTL sample size and/or absence of the causal cell-type or context in GTEx expression data^28,55 (see Discussion).

Third, TGFM fine-mapped PADI1 in skin (sun exposed) for Vitamin D level (Figure 5c; gene-tissue PIP: 0.64, gene PIP: 0.65). TGFM assigned PADI1 a PIP of 0.64 in skin (sun exposed) tissue (TWAS p = 1.7 × 10⁻²⁰) and a PIP of 0.01 in skin (not sun exposed) tissue (TWAS p = 3.8 × 10⁻¹⁹), demonstrating TGFM’s ability to distinguish trait-critical tissues from closely related tissues. PADI1 is known to interact with keratins during epidermal differentation^56,57, and has previously been linked to Vitamin D level in genetic association studies^58,59. Skin (sun exposed) was also identified as a Vitamin D level-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.72, bootstrap p = 0.008 for tissue-specific prior; Supplementary Figure 14).

Fourth, TGFM fine-mapped MC4R in brain cerebellum for Systolic blood pressure (Figure 5d, gene-tissue PIP: 0.66, gene PIP: 0.66), despite a non-significant TWAS p-value (p = 3.3 × 10⁻⁵ > 0.05/119,270 = 4.2 × 10⁻⁷). Previous work has shown that activation of MC4R in the central nervous system increases sympathetic nervous system activity and blood pressure^60–62, and MC4R has previously been linked to hypertension in genetic association studies⁶³. Brain cerebellum was identified as a Systolic blood pressure-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.16, bootstrap p = 0.002 for tissue-specific prior; Supplementary Figure 14), consistent with previous studies^61,64,65. Strong support of brain cerebellum genes from the tissue-specific prior enabled TGFM to prioritize MC4R-brain cerebellum instead of more significant nearby GWAS associations of non-mediated genetic variants.

Fifth, TGFM fine-mapped SLC20A2 in artery aorta for Systolic blood pressure (Figure 5e, gene-tissue PIP: 0.91, gene PIP: 0.91). Previous work has shown that loss of SLC20A2 results in Human idiopathic basal ganglia calcification^66,67 and well as arteriolar calcification⁶⁸, but SLC20A2 has not previously been linked to Systolic blood pressure to our knowledge. Artery aorta was also identified as a Systolic blood pressure-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.36, bootstrap p = 0.0008 for tissue-specific prior; Supplementary Figure 14). We note that we have highlighted two fine-mapped gene-tissue pairs for Systolic blood pressure involving two different tissues (artery aorta and brain cerebellum); this demonstrates the advantages of TGFM over a two-step approach of separately identifying the causal gene using a gene-level fine-mapping method²⁴ and identifying the causal tissue using a method for identifying trait-critical tissues^5,7,11.

Sixth, TGFM fine-mapped NMT1 in brain cerebellum for Menarche age (Figure 5f, gene-tissue PIP: 0.53, gene PIP: 0.86); TGFM also assigned NMT1 in brain limbic a PIP of 0.10. Recent work demonstrated that NMT1 cis-predicted expression in brain tissues was marginally associated with image-derived brain phenotypes^69,70, but NMT1 has not previously been linked to Menarche age to our knowledge. Brain cerebellum was also identified as a Menarche age-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.25, bootstrap p = 0.05 for tissue-specific prior; Supplementary Figure 14), consistent with previous studies^3,71. There exist 20 other gene-tissue pairs within 1 Mb of the TSS of NMT1 (11 of which correspond to NMT1 in a tissue other than brain cerebellum) that had significant TWAS p-values (p ≤ 0.05 / 119,270 = 4.2 × 10⁻⁷) but were not fine-mapped by TGFM (all with PIP ≤ 0.01), underscoring the benefit of joint fine-mapping of gene-tissue pairs. Additional examples are discussed in the Supplementary Note and Supplementary Figure 18 (including an additional implicated tissue for Vitamin D level).

TGFM pinpoints disease genes and fine-grained cell-types in single-cell eQTL data

It is widely hypothesized that eQTL in fine-grained cell types/contexts may help resolve the limited proportion of disease heritability explained by eQTL in bulk tissues^28,55. Accordingly, we applied TGFM to fine-map disease genes and fine-grained cell types in 45 disease and complex traits from the UK Biobank (same diseases/traits as above) using a recently generated single-cell eQTL data set spanning 9 fine-grained peripheral blood mononuclear cell (PBMC) cell types³³ (average N=112; Methods and Supplementary Table 14). We converted single-cell expression measurements in each cell type to pseudobulk expression (see Methods), and also included the 38 GTEx tissues analyzed above. For each disease/trait, we applied TGFM to 2,682 overlapping 3-Mb loci spanning 1,851 (protein-coding) gene-PBMC cell type pairs with cis-predicted expression models (206 genes per cell type on average, Supplementary Table 14; 1,066 unique genes), 119,270 (protein-coding) gene-tissue pairs with cis-predicted expression models (GTEx tissues; 3,139 genes per tissue on average, Supplementary Table 6; 13,700 unique genes) and 10,545,304 genetic variants with MAF ≥ 0.005. We assigned a PIP to each gene-PBMC cell type pair, gene-tissue pair, gene, and non-mediated genetic variant, analogous to above. We have publicly released PIPs for all gene-PBMC cell type pairs, gene-tissue pairs, genes, and non-mediated variants for each disease/trait (see Data Availability).

Results are reported in Figure 6a-b (18 representative traits), Supplementary Figure 19 (all 45 traits), and Supplementary Table 15. Across all 45 traits, TGFM identified 30 gene-PBMC cell type-trait triplets at PIP > 0.5; TGFM was not sufficiently powered to detect any gene-PBMC cell type-trait triplets at PIP > 0.9, likely due to the limited single-cell eQTL sample size. Of the 30 gene-PBMC cell type-trait triplets with PIP > 0.5, 25 involved a trait locus that had no confidently fine-mapped gene-tissue pair (no TGFM PIP > 0.5 in GTEx-only analysis corresponding to Figures 3-5). For the 18 representative traits, TGFM identified 12 gene-PBMC cell type pairs at PIP > 0.5 for autoimmune diseases and blood cell traits (Figure 6a) vs. 5 gene-PBMC cell type pairs at PIP > 0.5 for non-blood-related traits (Figure 6b; includes 2 gene-PBMC cell type pairs for Menarche age, which we conservatively labeled as non-blood-related even though it has been reported to be partially immune-mediated^72,73), increasing to 23 vs. 7 for all 45 traits; this validates the importance of gene expression in blood cell types for blood-related traits.

Figure 6: Summary results of fine-mapping gene-PBMC cell type pairs with TGFM for 18 representative UK Biobank diseases and traits.

(a-b) Number of gene-PBMC cell type pairs fine-mapped using TGFM (y-axis; square root scale) across 18 representative UK Biobank traits (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bar), distinguishing between (a) autoimmune diseases and blood cell traits and (b) non-blood-related traits. Horizontal black lines denote the number of gene-PBMC cell type pairs fine-mapped at PIP=0.5. The 18 representative traits consist of the 16 independent traits (Figure 3) and two additional, interesting traits (All autoimmune and Vitamin D levels). Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 19. (c-e) Number of gene-PBMC cell type pairs fine-mapped using TGFM (y-axis; square root scale) in each of the 9 PBMC cell types (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bar) for (c) Monocyte count, (d) Lymphocyte count, and (e) All autoimmune disease. Horizontal black lines denote the number of gene-PBMC cell type pairs fine-mapped at PIP=0.5. Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 20. Numerical results are reported in Supplementary Table 15.

For each trait, we identified the most frequently implicated PBMC cell types, computing the proportion of fine-mapped gene-PBMC cell type pairs in each PBMC cell type by counting gene-PBMC cell type pairs with PIP > 0.5. Results are reported in Figure 6c-e (3 representative blood-related traits), Supplementary Figure 20 (all 45 traits), and Supplementary Table 15. Despite low power, the fine-mapped gene-PBMC cell type pairs at PIP > 0.5 were concentrated in expected trait-critical PBMC cell types, e.g. 62.5% in non-classical monocyte (ncM) cells and 37.5% in classical monocyte (cM) cells for Monocyte count, 100% in CD4⁺ T (CD4) cells for Lymphocyte count, and 100% in CD8⁺ T (CD8) cells for All autoimmune disease^7,74,75. Results were similar at other PIP thresholds (Supplementary Figure 20).

We highlight 4 examples of fine-mapped (PIP > 0.5) gene-PBMC cell type-trait triplets that recapitulate known biology or nominate biologically plausible mechanisms. First, TGFM fine-mapped CTLA4 (Cytotoxic T-lymphocyte associated protein 4) in CD8⁺ T (CD8) cells for All autoimmune disease (Figure 7a; gene-PBMC cell type PIP: 0.84; gene PIP: 0.85). CTLA4 is a well-studied regulator of immune responses in T cells^76,77 and has previously been linked to autoimmune diseases (rheumatoid arthritis, systemic lupus erythematosus, and type 1 diabetes) in genetic association studies^78,79. CD8 cells were also identified as an All autoimmune disease-critical cell type genome-wide (bootstrap p = 0.05 for tissue-specific prior; Supplementary Table 17). CTLA4 did not meet the criteria for having a cis-predicted expression model in any GTEx tissue, underscoring the advantages of modeling gene expression in fine-grained PBMC cell types.

Figure 7: Examples of fine-mapped gene-PBMC cell type-disease triplets identified by TGFM.

We report 4 example loci for which TGFM fine-maps a gene-PBMC cell type pair (PIP > 0.5). In each example we report the marginal GWAS and TWAS association -log₁₀ p-values (y-axis) of non-mediated variants (blue circles) and gene-tissue (or gene-PBMC cell type) pairs (red triangles). Marginal TWAS association - log₁₀ p-values were calculated by taking the median -log₁₀ TWAS p-value across the 100 sets of sampled cis-predicted expression models for each gene-tissue (or gene-PBMC cell type) pair. The genomic position of each gene-tissue (or gene-PBMC cell type) pair (x-axis) was based on the gene’s TSS. The color shading of each variant and gene-tissue (or gene-PBMC cell type) pair was determined by its TGFM PIP. Any genetic element with TGFM PIP > 0.5 was made larger in size. Dashed horizontal blue and red lines represent GWAS significance (5 × 10⁻⁸) and TWAS significance (4.1 × 10⁻⁷) thresholds, respectively. Numerical results are reported in Supplementary Table 16.

Second, TGFM fine-mapped CD52 in classical monocyte (cM) cells for Monocyte count (Figure 7b; gene-PBMC cell type PIP: 0.55; gene PIP: 0.88; gene-tissue PIP ≤ 0.05 in all GTEx tissues); TGFM also assigned non-negligible PIPs to CD52 in multiple correlated PBMC cell types and GTEx tissues including 0.29 in non-classical monocyte (ncM) cells and 0.05 in GTEx whole blood. Previous work demonstrated that CD52 regulates immune homeostasis in monocytes and T cells by inhibiting NF-κB signalling^80–82, but CD52 has not previously been linked to Monocyte count to our knowledge. cM cells were also identified as a Monocyte count-critical cell type genome-wide (bootstrap p = 0.04 for tissue-specific prior; Supplementary Table 17). There exist 56 other gene-tissue pairs and 3 other gene-PBMC cell type pairs within 1 Mb of the TSS of CD52 (14 and 2 of which correspond to CD52 in a cell type or tissue other than cM, respectively) with significant TWAS p-values (p ≤ 0.05 / 121,121 = 4.1 × 10⁻⁷, where 121,121 = 119,270 + 1,851) but not fine-mapped by TGFM (all with PIP ≤ 0.01), underscoring the benefit of joint fine-mapping of gene-tissue and gene-PBMC cell type pairs. CD52 was not prioritized in any tissue in the GTEx-only analysis of Figure 3 (highest gene-tissue PIP = 0.07 (whole blood)), underscoring the advantages of modeling gene expression in fine-grained PBMC cell types.

Third, TGFM fine-mapped KLF13 in CD4⁺ T (CD4) cells for Lymphocyte count (Figure 7c; gene PBMC cell type PIP: 0.75; gene PIP: 0.75; gene-tissue PIP ≤ 0.01 in all GTEx tissues) despite a non-significant TWAS p-value (p = 3.5 × 10⁻⁶ > 0.05 / 121,121 = 4.1 × 10⁻⁷). KLF13 has been shown to regulate lymphocyte development and survival^83,84, but KLF13 has not previously been linked to Lymphocyte count in genetic association studies to our knowledge. CD4 cells were suggestively implicated as a Lymphocyte count-critical cell type genome-wide (bootstrap p = 0.12 for tissue-specific prior; Supplementary Table 17). KLF13 was not prioritized in any tissue in the GTEx-only analysis of Figure 3 (highest gene-tissue PIP = 0.002 (brain cerebellum)), again underscoring the advantages of modeling gene expression in fine-grained PBMC cell types.

Fourth, TGFM fine-mapped HMGB1 in B cells for Mean corpuscular hemoglobin (Figure 7d; gene-tissue PIP: 0.75; gene PIP: 0.75). HMGB1 has previously been shown to mediate anemia of inflammation (i.e. anemia resulting from a prolonged immune response) in mice^85,86, but HMGB1 has not previously been linked to Mean corpuscular hemoglobin to our knowledge. B cells were also identified as a Mean corpuscular hemoglobin-critical cell type genome-wide (bootstrap p = 0.05 for tissue-specific prior; Supplementary Table 17). HMGB1 did not meet the criteria for having a cis-predicted expression model in any GTEx tissue, again underscoring the advantages of modeling gene expression in fine-grained PBMC cell types. TGFM also fine-mapped one non-mediated variant (rs149180914; PIP: 0.59) within 1 Mb of the TSS of HMGB1 (see Discussion).

Discussion

We developed a new method, TGFM, that jointly fine-maps causal gene-tissue pairs and non-mediated genetic variants at disease-associated loci. We applied TGFM to 45 UK Biobank diseases and traits using 38 GTEx tissues and identified many causal gene-tissue pairs (PIP > 0.5), which were concentrated in known disease-critical tissues^2–11 and strongly enriched in known disease-relevant genes^44,45. Causal gene-tissue pairs identified by TGFM recapitulated known biology, but also included biologically plausible novel findings. We further applied TGFM to single-cell eQTL data from 9 cell types in PBMC (analyzed jointly with GTEx tissues) and identified additional causal gene-PBMC cell type pairs (PIP > 0.5), primarily for autoimmune disease and blood cell traits.

TGFM is distinct from previous methods for fine-mapping causal genes in three ways. First, TGFM identifies causal gene-tissue pairs, not just causal genes. Second, TGFM jointly models the disease contribution of each gene-tissue pair and non-mediated variant, disentangling causal gene-tissue pairs from both tagging gene-tissue pairs and tagging non-mediated genetic variants. Third, TGFM employs a sampling procedure that accounts for uncertainty in cis-predicted expression models. Our simulations show that TGFM attains accurate calibration, in contrast to previous methods such as coloc¹⁷ and FOCUS²⁴ (Figure 1). We attribute the superior calibration to both our joint modeling of gene-tissue pairs and non-mediated variants, and our sampling procedure that accounts for uncertainty cis-predicted expression models (Supplementary Figure 6a-b). We note that a recent preprint developed a method, causal-TWAS²⁷, to jointly fine-map genes (from a single tissue) and non-mediated genetic variants; however causal-TWAS only fine-maps genes, not fine-map gene-tissue pairs, and does not account for uncertainty in cis-predicted expression models. Another method, CAFEH¹⁹, was recently developed to identify causal variants underlying GWAS and eQTL in multiple tissues; however, CAFEH does not model co-regulation between gene-tissue pairs and is thus unable to distinguish causal from tagging gene-tissue pairs. Other recent studies have made valuable contributions in nominating causal gene-tissue pairs for disease by linking fine-mapped causal variants to causal genes using tissue-specific SNP-to-gene linking strategies such as Activity-By-Contact or EpiMap enhancer-gene linking^12–14. However, this approach, unlike TGFM, is based on heuristic prioritization and does not provide direct evidence that the regulatory variant’s effect is mediated by the nominated gene-tissue pair.

We note several limitations of our work. First, TGFM leverages in-sample LD from the GWAS sample^35,87 (analogous to other fine-mapping methods³⁶), but in-sample LD may be unavailable in some applications (e.g. disease consortium meta-analyses). Following ref. ³⁶, our recommendation when in-sample LD is unavailable is as follows: if there exists a LD reference panel from a population closely matching the GWAS sample population spanning at least 10% of the GWAS sample size, run the default version of TGFM with the LD reference panel; otherwise, run TGFM assuming a single causal genetic element per locus (in the latter case, no LD reference panel is needed). Second, TGFM may be susceptible to false positives in the case of unassayed causal genetic elements (analogous to other fine-mapping methods^23,30,36,41). Specifically, if the causal gene-tissue pair is not assayed, TGFM may prioritize a correlated assayed gene-tissue pair or a correlated non-mediated genetic variant. We anticipate that this limitation will be mitigated over time as emerging eQTL data sets increasingly capture diverse tissues, cell types, and cellular contexts⁸⁸. Third, TGFM is only moderately well-powered to detect causal gene-tissue pairs, particularly at lower eQTL sample sizes (Figure 1). In addition, at low eQTL sample sizes, undiscovered causal gene-tissue pairs may be falsely prioritized as non-mediated genetic variants (Supplementary Figure 7). We anticipate this limitation will be mitigated over time as eQTL data sets increase in size⁸⁹. Fourth, a recent study showed that GWAS and eQTLs studies are well-powered to detect different types of genetic variants, limiting the number of GWAS associations that can be explained by eQTL data at current sample sizes⁹⁰ Nevertheless, TGFM identified causal gene-tissue pairs at hundreds of GWAS loci using current eQTL data, and we anticipate that TGFM will identify a larger number of causal gene-tissue pairs over time as eQTL data sets increase in size⁸⁹ and capture increasingly diverse tissues, cell types, and cellular contexts⁸⁸. Fifth, it is theoretically possible that TGFM could prioritize gene-tissue pairs due to reverse causality, whereby the disease/trait causally impacts the gene-tissue pair. However, we believe that this is very unlikely in practice, as cis-eQTL variants with a detectable effect on gene expression at current eQTL sample sizes explain a substantial proportion of gene expression variation^25,91,92 whereas diseases/traits are highly polygenic with each causal variant (outside the HLA locus, which we exclude from our analyses) explaining a small proportion of disease/trait variation^37,93–95. Sixth, we have focused here on single-ancestry fine-mapping, but an important future direction is to extend this work to multi-ancestry fine-mapping (incorporating multi-ancestry eQTL analysis^96,97), which is likely to further increase fine-mapping power^98–100. Seventh, an important future direction is to extend TGFM to incorporate variant-level functional annotations that are enriched for disease heritability^3,38–40, which is likely to further increase fine-mapping power^36,101. Eighth, another important future direction is to extend TGFM to incorporate gene sets that are enriched for disease heritability explained by cis-predicted expression²⁶, which may also further increase fine-mapping power. Finally, we have focused here on cis-genetic components of gene expression, but TGFM could be extended to genetic components of other molecular traits^102–107. Despite these limitations, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci.

Methods

TGFM model overview

TGFM estimates the posterior inclusion probability (PIP) for each genetic element (gene-tissue pair or genetic variant) to have a non-zero causal effect on disease using a model that includes mediated causal effects of each gene-tissue pair (via the cis-genetic component of expression of a given gene in a given tissue) and non-mediated causal effects of each genetic variant: where Y denotes the phenotype vector across GWAS individuals, g indexes genes, t indexes tissues, X is the matrix of genotypes, W_gt is the vector of the cis-genetic component of gene expression across GWAS individuals in gene g and tissue t, α_gt denotes the (scalar) effect of cis-genetic expression in gene g and tissue t on the disease or trait, β denotes the vector of non-mediated causal effects of each genetic variant on the disease or trait, and ϵ denotes environmental noise. We assume that the trait Y, the cis-genetic component of gene expression W_gt in each gene g and tissue t, and the genotype vector of each variant (each column of X) are standardized to have mean 0 and variance 1. We model the cis-genetic component of gene expression as a linear combination of variant-level effects: where δ_gt denotes the vector of causal cis-eQTL effect sizes of each variant on gene expression in gene g and tissue t. We emphasize that we model the phenotype Y as a linear combination of the unobserved true cis-genetic component of gene expression W_gt (a deterministic function of the unobserved true causal eQTL effect sizes Δ_gt) in each gene and tissue. The predicted cis-genetic component of gene expression can be estimated (according to predicted causal eQTL effect sizes ), with uncertainty, from finite sample-size eQTL data sets in the specified tissue t, and provide noisy estimates of the true unobserved cis-genetic component of gene expression W_gt. Later (see below), we explain how TGFM models uncertainty in predicted cis-genetic expression.

TGFM places the Sum of Single Effects (SuSiE)^30,31 fine-mapping prior distribution on the vector of causal disease (mediated and non-mediated) effect sizes: where α denotes the vector of expression-mediated causal effects of genetic gene expression of each gene-tissue pair on the disease or trait, β denotes the vector of non-mediated causal effects of each genetic variant on the disease or trait, [α, β] denotes the concatenated vector of mediated and non-mediated genetic effects, l indexes fine-mapping components where a single component represents the disease signal from a single genetic element, γ_l denotes a Categorical random variable indicating which one of the genetic elements disease component l comes from, π denotes the simplex vector of prior probabilities on each genetic element being causal, d_l denotes a Gaussian random variable specifying the causal effect size of component l, and denotes the prior variance on d_l. This approach assumes the true causal disease effect sizes originate in a small number (l) of genetic elements with non-zero effects.

TGFM will automatically infer posterior distributions on the random variables defining equations 5 and 6 (γ_l, d_l, ; inference details provided below). Posterior inclusion probabilities (PIPs), or the probability that a genetic element has a non-zero effect on disease, can be calculated for each genetic element from these inferred posterior distributions as follows: where j indexes genetic elements, l indexes components, PIP_j denotes the PIP for genetic element j, and denotes the expected value of the posterior distribution on γ_l for genetic element j.

Sum of Single Effects (SuSiE) fine-mapping prior distribution and inference

The SuSiE prior distribution was developed in refs. ^30,31 for the purpose of fine-mapping trait-causal variants. We briefly summarize the SuSiE fine-mapping prior distribution here: where β denotes the vector of causal effects of each genetic variant on the disease or trait, l indexes fine-mapping components where a single component represents the disease signal from a single genetic variant, γ_l denotes a Categorical random variable indicating which one of the genetic variants disease component l comes from, π denotes the simplex vector of prior probabilities on each genetic variant being causal, d_l denotes a Gaussian random variable specifying the causal effect size of component l, and denotes the prior variance on d_l.

Briefly, the SuSiE prior distribution assumes that only a subset of variants (L total) have non-zero effect (i.e., are causal) on the trait, the effect sizes of each causal variant are independent, and the trait causal effect sizes can be calculated by summing the causal effect size of each of the L causal variants. A “single effect” refers to the effect of one of the L causal variants; hence why the model is called Sum of Single Effects.

Ref. ³⁰ proposed a simple model-fitting approach (to infer posterior distributions on random variables in equations 9 and 10), which the authors referred to as Iterative Bayesian Stepwise Selection (IBSS). Briefly, IBSS iteratively updates each of the L single effects while keeping all other single effects fixed. It is computationally simple to update a single effect given fixed values of all other single effects (details of updating a single effect provided in the supplement of Ref. ³⁰).

Overview of TGFM inference

TGFM inference consists of four steps. In step 1, we apply SuSiE to perform eQTL fine-mapping of each gene-tissue pair in the external gene expression data set (estimating a posterior distribution of the causal cis-eQTL effect sizes for each gene-tissue pair). In step 2, we randomly sample 100 cis-predicted expression models for each gene-tissue pair from the posterior distributions of causal cis-eQTL effect sizes estimated in step 1 (Methods). In step 3, we apply SuSiE to perform disease fine-mapping in the target data set (estimating the PIP of each genetic element) 100 times, iterating over the sampled cis-predicted expression models for each gene-tissue pair from step 2. In step 4, we average the results of step 3 across the 100 disease fine-mapping runs. TGFM utilizes a custom implementation of the SuSiE algorithm that provides efficient estimation of PIPs across 100 parallel SuSiE runs that differ only in their cis-predicted expression models.

TGFM inference Step 1: Estimating causal eQTL effect size distributions from external eQTL data

TGFM inference relies on probability distributions defining the causal eQTL effects for each gene-tissue pair. These causal eQTL effect size distributions are estimated by applying SuSiE³⁰ to eQTL data; SuSiE infers the following posterior distribution on the causal eQTL effect sizes for a given gene-tissue pair from eQTL data: where δ_gt denotes the vector of causal eQTL effect sizes corresponding to the effects of standardized cis-variants on gene g in tissue t, l indexes fine-mapping components where a single component represents the eQTL signal from a single cis-genetic variant, λ_l denotes a Categorical random variable indicating which one of the genetic variants component l comes from, denotes the simplex vector of inferred posterior probabilities on each genetic variant being causal for component l, d_lk|λ_lk = 1 denotes a Gaussian random variable specifying the causal effect size of component l for variant k conditioned on variant k being causal for component l, and and define the inferred posterior mean and variance on d_lk|λ_lk. The posterior mean causal eQTL effect sizes of this distribution are .

We restrict TGFM to gene-tissue pairs that are well-predicted by genetic variants, using the SuSiE “purity filter”³⁰. Specifically, this filter removes any gene-tissue pair such that all components defining the gene-tissue pair have a minimum absolute correlation between all variants in the component’s 95% credible set less than 0.5. The 95% credible set for a given component is calculated by selecting the minimum set of variants that contain the causal variant with 95% confidence (according to the inferred posterior distribution ).

Although in the present study we utilized SuSiE for generating distributions of causal eQTL effect sizes due to its computational efficiency, the TGFM inference procedure is generalizable to a variety of methods that generate probabilistic cis-predicted expression models potentially including probabilistic/Bayesian multivariable regression methods (for example, BSLMM¹⁰⁸, SBayesR¹⁰⁹, or LDpred2¹¹⁰) or bootstrapping¹¹¹.

TGFM inference Step 2: Randomly sample 100 cis-predicted expression models for each gene-tissue pair

Instantiations of the causal eQTL effect sizes (which determine instantiations of cis-predicted expression models) for each gene-tissue pair can be randomly sampled from that gene-tissue pair’s SuSiE-inferred posterior distribution (equations 11-13). To draw a single random sample of the causal eQTL effects for a given gene-tissue pair: (1) randomly sample the causal variant for each of the l components from (2) randomly sample the causal eQTL effect size of the sampled causal variant k (identified in Step 1) for each of the l components from (3) sum the sampled causal eQTL effects across the l components.

TGFM inference step 3: Fine-mapping inference conditioned on a sampled cis-predicted expression models

We describe here tissue-gene fine-mapping inference conditioned on setting the cis genetic component of gene expression for each gene-tissue pair equal to a sampled cis-predicted expression model (see TGFM inference step 2) for that gene-tissue pair. In this setting: where g indexes genes, t indexes tissues, denotes the predicted cis-genetic component of gene expression in gene g and tissue t (as determined by sampled cis-predicted expression model’s causal eQTL effect sizes ), Y denotes the phenotype vector across GWAS individuals, X is the matrix of genotypes, α_gt denotes the (scalar) effect of cis-genetic expression in gene g and tissue t on the disease or trait, β denotes the vector of non-mediated causal effects of each genetic variant on the disease or trait, and ϵ denotes environmental noise. The trait Y, the predicted cis-genetic component of gene expression in each gene g and tissue t, and the genotype vector of each variant (each column of X) are standardized to have mean 0 and variance 1. We place SuSiE fine-mapping prior distributions on the disease/trait causal effect sizes (equations 4-6).

In this setting, we use the existing SuSiE software for inference of posterior distributions on the fine-mapping random variables (γ_l, d_l, and in equations 5-6) and generation of corresponding PIPs for each genetic element (equation 7). We refer to these PIPs as conditional PIPs as they are conditional upon the predicted causal eQTL effect sizes. SuSiE inference applied to this task requires the following input: GWAS summary statistic z-scores for each non-mediated variant, transcriptome-wide association study (TWAS) summary statistic z-scores for each gene-tissue pair corresponding to the marginal association between predicted genetic gene expression and the trait, in-sample correlations between all pairs of genetic elements (variants and predicted genetic gene-tissue pairs) and specified prior probabilities π. We assume the user provides GWAS summary statistic z-scores for each variant, in-sample LD (ie. correlations between all pairs of genetic variants based on the GWAS samples), predicted causal eQTL effect sizes for each gene-tissue pair, and the prior causal probabilities π (we discuss below how π can be inferred). TWAS summary statistic z-scores and in-sample correlations between all genetic elements can be computed from the user-provided input as follows:

TWAS summary statistic z-scores for a particular gene-tissue pair can be easily calculated from GWAS summary statistic z-scores, in-sample variant LD, and predicted causal cis-eQTL effect sizes defining the genetic component of gene expression for the gene-tissue pair following ref. ²¹: where z_gt denotes the TWAS summary statistic z-score for gene g in tissue t, Z_gwas denotes the vector GWAS summary statistic z-scores across variants, denotes the vector of predicted causal eQTL effect sizes for gene g in tissue t, and Σ denotes the in-sample variant LD. We note that ref. ²¹ only requires LD from a reference panel (instead of in-sample LD) for TWAS; TGFM instead requires in-sample LD (see Discussion).

The correlation between the predicted genetic gene expression of two genes can be computed from in-sample variant LD and the predicted causal eQTL effect sizes of both genes: where gt indexes one gene-tissue pair and g′t′ indexes the other gene-tissue pair, denotes the predicted genetic component of gene expression in gene g in tissue t across in-sample GWAS individuals, denotes the vector of predicted causal eQTL effect sizes for gene g in tissue t, and Σ denotes the in-sample variant LD. Similarly, the in-sample correlation between a non-mediated variant and the predicted genetic component of gene-tissue pair is: where k indexes the non-mediated variant, gt indexes the gene tissue pair, X_k is the genotype of variant k across in-sample GWAS individuals, denotes the cis-predicted genetic component of gene expression in gene g and tissue t across in-sample GWAS individuals, Σ_k denotes row k of the in-sample variant LD matrix, and denotes the vector of causal eQTL effect sizes for gene g in tissue t.

TGFM inference step 4: Marginalize uncertainty in cis-predicted expression on fine-mapping PIPs by averaging fine-mapping results across 100 runs

TGFM PIPs for a given locus are calculated by marginalizing out the uncertainty in the cis-predicted causal eQTL effect sizes on the conditional PIPs: where is the TGFM PIP for genetic element j, δ is the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, is a specific instantiation of the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, is the conditional PIP of genetic element j conditioned on genetic gene expression determined by , and is the probability of estimating causal eQTL effect sizes given the eQTL data. The conditional PIPs, , can be inferred using SuSiE fine-mapping of both non-mediated variants and gene-tissue pairs where cis-predicted genetic gene expression is determined by (described in previous Methods subsection, TGFM inference step 3). The causal eQTL effect size distribution, p(δ), is approximated by the posterior distribution on causal eQTL effect sizes for each gene-tissue pair estimated by applying SuSiE to eQTL data (described in previous Methods subsection, TGFM inference step 1). In practice, we approximate in equation 18 by drawing 100 random samples of causal eQTL effect sizes across gene-tissue pairs from p(Δ) (described in previous Methods subsection, TGFM inference step 2), calculating the conditional PIP for each of the 100 random samples, and taking the average across 100 conditional PIPs. In the next methods subsection, we discuss how this approach can be extended to integrate the inferred tissue-specific prior.

In practice, we do not run the existing SuSiE package^30,31 100 times in series for each fine-mapping region. TGFM utilizes a custom implementation of the SuSiE algorithm that provides efficient estimation of PIPs across 100 parallel runs where each of 100 parallel SuSiE runs differ only in their cis-predicted expression models. The rate-limiting step of each iteration of the SuSiE algorithm involves multiplying the LD matrix with that iteration’s estimate of the trait-causal effect sizes. The custom TGFM implementation exploits the fact that the variant LD matrix (which constitutes the majority of the full correlation matrix of all pairs of genetic elements) is identical across all 100 runs; TGFM uses matrix multiplication to multiply the variant LD matrix with the current causal non-mediated variant effects for each of the 100 runs (corresponding to a single (KXK)X(KX100) matrix multiplication instead of 100 (KXK)X(KX1) multiplications in series where K is the number of variants in the region). In addition, the custom TGFM implementation of SuSiE does not compute the ELBO at each iteration for each of the 100 runs; computing the ELBO is computationally intensive and primarily utilized to assess convergence. Instead, we run each of the 100 runs for a pre-specified number of iterations; we use a default of 5 iterations which performed well in simulations (Figure 2). We set the default number of components l underlying each of the 100 runs to 10.

The iterative algorithm underlying SuSiE is not guaranteed to reach a global optimum and can get stuck in local optima³⁰. We found TGFM was prone to reaching local optima in which a non-causal gene-tissue pair was confidently fine-mapped (i.e., a false positive) when the gene-tissue pair was moderately correlated with multiple independent causal non-mediated variants. This is likely due to the greedy nature of the SuSiE iterative algorithm. We use the following initialization strategy to mitigate convergence on local optima: (1) Run TGFM where the causal effects are initialized to be zero (this is the default initialization used by SuSiE) (2) If the TGFM PIP for any gene-tissue pair in the fine-mapping region is greater than 0.2: (2a) Run SuSiE fine-mapping on only the non-mediated variants (2b) Run TGFM where the non-mediated variant effects are initialized to the converged values from step 2a and the gene-tissue pair effects are initialized to zero (2c) For each of the 100 TGFM runs, select the fitted TGFM model (either (1) or (2b)) with the larger ELBO.

Inference of tissue-specific TGFM prior causal probabilities

TGFM increases fine-mapping power by specifying tissue-specific prior probabilities for each genetic element in a locus that are informed by genome-wide data, similar to PolyFun³⁶. For each trait separately, TGFM assigns one prior causal probability π_t for each gene-tissue pair from tissue t and one prior causal probability π_nm for each non-mediated genetic variant where π_t reflects the prior probability that an arbitrary gene from tissue t at a disease-associated locus has a non-zero causal effect on the disease/trait and π_nm reflects the prior probability that an arbitrary non-mediated variant at a disease-associated locus has a non-zero causal effect on the disease/trait. We note that π_t is a genome-wide parameter reflecting the probability that an arbitrary gene from tissue t has non-zero effect on disease, which is related but distinct from genome-wide expression-mediated disease heritability parameters previously estimated in refs. ^11,26,28. We infer π_t and π_nm separately for each disease/trait using an iterative algorithm, starting with flat priors, and at each iteration: (1) updating the PIP of each genetic element using a computationally efficient approximation of TGFM (see next paragraph for details) given the current prior causal probabilities, which are normalized to sum to one across genetic elements at each locus, analogous to PolyFun³⁶. (2) updating the prior causal probabilities according to: where c is the genetic element class (for example a specific tissue or non-mediated variant), π_c is the prior causal probability corresponding to genetic element class c, k indexes genetic elements from genetic element class c, PIP_k is the current PIP of genetic element k, and |c| is the number of genetic elements belonging to genetic element class c.

It is computationally prohibitive to run TGFM genome-wide tens to hundreds of times for each trait while updating the prior probabilities at each iteration. We make two approximations to TGFM to allow for efficient computation of genome-wide PIPs at each iteration (and we emphasize that these approximations are only used for inference of the prior). First, we run TGFM with a single cis-predicted expression model for each gene-tissue pair (based on SuSiE posterior mean causal cis-eQTL effect sizes) instead of averaging results across 100 sampled cis-predicted expression models. We refer to this approach as TGFM (no sampling). Thus, PIPs at each iteration can be inferred by applying SuSiE to fine-map both non-mediated variants and gene-tissue pairs where genetic gene expression of each gene-tissue pair is determined by the posterior mean causal eQTL effect sizes. While we show TGFM (no sampling) generates poorly calibrated PIPs (Supplementary Figure 6), using TGFM (no sampling) to infer prior causal probabilities results in well-calibrated causal prior probabilities in simulations (Supplementary Figure 11a), perhaps because the causal prior probabilities integrate information across the genome (equation 19) and fine-mapping errors resulting from uncertainty in the genetic component of gene expression will be averaged out across all genes in the genome. Second, we only run TGFM (no sampling) inference once, during the first iteration. After running TGFM (no sampling) in the first iteration, we save the resulting Bayes Factors³⁰ for each component-genetic element pair; a Bayes Factor reflects the relative support for including that genetic element in that component of the model, irrespective of the prior probabilities. PIPs can be easily calculated based on the Bayes Factors and the current prior: where l indexes fine-mapping components, j and k index genetic elements in the fine-mapping region, denotes the expected value of the posterior distribution on γ_l for genetic element j, BF_lj denotes the Bayes Factor for genetic element j on component l, θ_j denotes the normalized prior causal probability of genetic element j, and PIP_j is the TGFM (no sampling) PIP for genetic element j. We do not recalculate the Bayes Factors after the first iteration, and simply used the saved Bayes Factors from the first iteration in all subsequent iterations. This approximation is reasonable as while we expect PIPs to change with the evolving prior, we do not expect posterior mean causal effect sizes of each fine-mapping component to drastically change with the evolving prior, ultimately leaving the Bayes Factors stable.

During inference of the causal prior probabilities, TGFM (no sampling) is run genome-wide on overlapping 3Mb windows³⁶. The prior probability updates at each iteration (equation 16) are calculated from PIPs corresponding to genetic elements located in the middle Mb of these 3 Mb windows. To limit to windows with at least one disease causal signal, we remove 3Mb windows from the prior probability inference procedure that do not pass the SuSiE “purity filter” (see above) after running TGFM (no sampling) with a uniform prior.

In addition, TGFM inference can also rely on probability distributions defining the uncertainty in our estimated prior causal probabilities (see below). Empirical distributions, as well as significance testing, on the causal prior probabilities can be calculated using 100 iterations of bootstrapping¹¹¹ across the genome (we refer to this as “genomic bootstrapping”). Specifically, for each of the 100 bootstraps, we randomly sample T fine-mapping 3Mb windows with replacement (assuming T total 3Mb fine-mapping windows for the disease/trait) and run the iterative algorithm on the bootstrapped regions. This procedure results in 100 empirical samples of the causal prior probabilities which reflect their estimation uncertainty. These 100 empirical samples are input directly into the sampling procedure underlying TGFM inference (see below). Significance testing of whether the prior causal probability is greater than zero for a particular genetic element class can be generated using a z-score computed from the mean and standard error of the bootstrapped distribution. For a single analyzed trait, we assess significance using Benjamini-Hochberg¹¹² FDR correction across all tissues and/or cell types included in the analysis.

Inference of TGFM (no sampling) is performed using the function ‘susie_rss’ from the SuSiE package^30,31 with default parameters. We used the same initialization strategy that was used by TGFM (described above). The iterative algorithm for inference of prior causal probabilities was run for 400 iterations.

TGFM inference including uncertainty in tissue-specific prior causal probabilities

The previous subsection described inference of probability distributions defining the tissue-specific prior causal probabilities. Here we described an extension of TGFM inference step 4 that integrates out uncertainty in both cis-predicted causal eQL effect sizes and the prior causal probabilities on the conditional PIPs: where is the TGFM PIP for genetic element j, δ is the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, is a specific instantiation of the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, π are the causal prior probabilities, are a specific instantiation of the causal prior probabilities, is the conditional PIP of genetic element j conditioned on genetic gene expression determined by and causal prior probabilities equal to is the probability of estimating causal eQTL effect sizes given the eQTL data, and is the probability of estimating prior causal probabilities . In practice, we approximate by drawing 100 random samples of causal eQTL effect sizes across gene-tissue pairs from p(δ) and prior causal probabilities from p(π), calculating the conditional PIP for each of the 100 random samples, and taking the average across 100 conditional PIPs.

Calculating gene-level PIPs with TGFM

We define a gene as causal for a disease/trait if there exists at least one tissue where the gene-tissue pair is causal for the trait. Gene-level PIPs can be computed by aggregating gene-tissue pair fine-mapping results across all gene-tissue pairs corresponding to the gene of interest: where l indexes fine-mapping components, g indexes genes, k ∈ g indexes all gene-tissue pairs corresponding to gene g, denotes the expected value of the posterior distribution on γ_l for gene g, denotes the expected value of the posterior distribution on γ_l for gene-tissue pair k, and is the gene-level PIP corresponding to gene g. Similar to TGFM PIPs for variants and gene-tissue pairs, we are describing here the calculation of conditional gene PIPs (conditional upon a given instantiation of predicted causal cis-eQTL effect sizes and prior causal probabilities). These conditional gene PIPs will be averaged across 100 samples of cis-predicted genetic gene expression and predicted causal prior probabilities (equation 22).

This approach can also be used to calculate PIPs for causal genes in a specified subset of tissues, or gene-tissue subset PIPs. For example, identifying gene-tissue subset pairs for the subset of adipose tissues (defined as adipose subcutaneous U adipose visceral). Gene-tissue subset PIPs can be computed by aggregating (as done in equations 23-24) gene-tissue pair fine-mapping results across all gene-tissue pairs corresponding to the gene of interest from tissues in the tissue subset of interest.

Simulation framework

We used real genotypes from unrelated UK Biobank British (UKBB) samples³² to simulate both gene expression phenotypes (for each gene-tissue pair) and quantitative trait phenotypes. Default simulation parameters were specified as follows: the gene expression sample size ranged from 300 to 1000; the quantitative trait sample size was set to 100,000 (disjoint from gene expression samples); we analyzed 426,593 SNPs and 1,976 genes on chromosome 1; the number of tissues was set to 10, of which 2 were causal for the quantitative trait; the quantitative trait architecture was simulated to have average polygenicity³⁷, consisting of 2,700 causal non-mediated variants and 300 causal gene-tissue pairs (150 for each causal tissue allowing for the option of two causal gene-tissue pairs from the same gene) with the expected heritability per causal genetic element (non-mediated variant or gene-tissue pair) set to 0.0001 (expected quantitative trait heritability of 0.3, 10% of which was mediated through gene expression, consistent with genome-wide estimates from MESC²⁸); causal non-mediated variants were randomly selected with probability proportional to their expected per-variant heritability based on baseline-LD model annotations^3,38,39 (estimated using S-LDSC³ applied to the UKBB trait White blood cell count) in order to make the simulations as realistic as possible. We simulated the genetic architecture of gene expression across related tissues using an approach similar to the approach implemented in ref. ¹¹: we simulated all 1,976 protein-coding genes on chromosome 1 to be expressed in all tissues, with 50% of these gene-tissue pairs being cis-genetically heritable (only cis-genetically heritable gene-tissue pairs could have a simulated causal effect on the trait, though we considered all expressed gene-tissue pairs for fine-mapping, not just those that were heritable); each heritable gene-tissue pair was randomly assigned 5 causal cis-eQTLs within 100Kb of the gene’s TSS, 3 of the 5 causal cis-eQTLs were shared across tissues and 2 of the 5 causal cis-eQTLs were specific to each tissue; each causal cis-eQTL explains 1.5% of the variance of each gene-tissue pair (resulting in an average gene heritability of .075); effect sizes of shared causal cis-eQTLs covaried across tissues as follows (based on ref. ¹¹): the tissues were split into 3 categories to mimic biological tissue modules in GTEx²⁵ (tissues 1-3, tissues 4-6, and tissues 7-10) and the correlation of shared cis-eQTL effect sizes across tissues was set to 0.8 and 0.74 for tissues in the same tissue category and across tissues in different tissue categories, respectively.

Our simulation indicates that PIPs reported by TGFM are slightly anti-conservative with respect to (1 – average PIP) (Supplementary Figures 1, 4), consistent with previous simulations of variant-level fine-mapping methods using polygenic trait architectures^36,41. Following ref. ³⁶, we circumvent this problem by using an alternative FDR estimator given by (1 – PIP threshold); setting all PIPs greater than the specified PIP threshold equal to the PIP threshold. For example, at a PIP threshold of 0.9, we treat all genetic elements with PIP ≥ 0.9 as if they had PIP = 0.9.

UK Biobank GWAS summary statistics and in-sample LD

We applied to TGFM to 45 of the 49 UK Biobank traits analyzed via functionally informed fine-mapping in ref. ³⁶ (average N=316K; Supplementary Table 5); the four excluded traits were Dermatology, Diabetes (any), Endocrine Diabetes, and Childless; these four diseases and traits were excluded due to redundancy and low heritability. We considered the set of 10,545,304 UK Biobank imputed variants with MAF ≥ 0.5% and INFO score ≥ 0.6, similar to previous work^36,113. We used GWAS summary statistics that were generated and described in ref. ³⁶. Briefly, the summary statistics were computed in ref. ³⁶ from n=337,426 unrelated British-ancestry individuals in UK Biobank using BOLT-LMM¹¹⁴ adjusting for sex, age and age squared, assessment center, genotyping platform, the top 20 genotyping principal components, and dilution factor for biochemical traits (see ref. ³⁶ for complete details). We used Liftover¹¹⁵ to convert variant positions from hg19 to hg38. Z-scores used by TGFM were computed from the Bolt-LMM output as follows: as the noninfinitesimal version of BOLT-LMM does not calculate effect sizes, we calculated z-scores by taking the square root of the BOLT-LMM chi-squared statistics and multiplying them by the sign of the effect size estimate from the infinitesimal version of BOLT-LMM.

We computed in-sample variant LD matrices using 337,426 unrelated British-ancestry individuals in UK Biobank (same individuals as summary statistics); missing values were imputed by the mean of a variant across individuals.

Overlapping 3Mb loci used for fine-mapping

We applied TGFM to fine-map each of the 2,682 overlapping 3Mb loci spanning the entire genome. Analogous to ref. ³⁶, the overlapping 3Mb loci had 1 Mb spacing between the start points of consecutive loci, were limited to autosomal chromosomes, and did not include 3 long-range LD regions including the MHC region (chromosome 6 positions 25499772-33532223, chromosome 8 positions 8142478-12142491, and chromosome 11 positions 45978449-57232526 in hg38; lifted over from long-range LD regions ignored in ref. ³⁶). Distinct from the windows generated in ref. ³⁶, for each disease/trait, we limited TGFM fine-mapping to loci with at least 50 genetic variants and at least one genetic variant with marginal GWAS p-value less than 1e-5.

GTEx cis-predicted expression models

We analyzed GTEx²⁵ data from 47 GTEx tissues, which were aggregated into 38 meta tissues of similar sample size consisting of European-ancestry individuals (average N=259, range: N=101-320 individuals, 23 meta-tissues with N=320; Supplementary Table 6) to reduce heterogeneity in eQTL sample sizes across tissues. Testis tissue was removed from analysis as it has outlier (cis- and trans-) eQTL discovery power after controlling for sample size (see ref. ²⁵ Figure 2C). Meta-tissues were constructed using the same individuals and tissue aggregation strategy as described in ref. ¹¹. Normalized expression matrices and covariates used in GTEx consortium’s single-tissue cis-eQTL analysis²⁵ were downloaded from the GTEx portal (https://gtexportal.org/home/datasets) for each of the 47 analyzed tissues. In each tissue, we subset individuals to those composing the corresponding meta-tissue, and then re-standardize gene expression of each gene to have mean zero and variance one in each subsetted tissue. Cis-eQTLs were called in each tissue independently while controlling for covariates and limiting to variant-gene pairs such that the variant is within 500Kb of the gene’s TSS. For each meta-tissue, we removed variants with MAF < .05 across samples in the meta-tissue, were strand-ambiguous, or did not overlap the 10,545,304 analyzed UK Biobank variants. Cis-predicted expression models were generated using SuSiE^30,31 applied to eQTL summary statistics and eQTL in-sample LD matrices (using the function ‘susie_rss’ from the SuSiE package^30,31 with default parameters) for each (gene, meta-tissue) pair, independently. If a meta-tissue is composed of more than 1 constituent tissues, meta-analyzed eQTL summary statistics were generated using a fixed-effect meta-analysis across constituent tissues and meta-analyzed in-sample LD was generated by computing variant-variant correlations across all samples composing the meta-analyzed tissue. After removing gene-tissue pairs that did not pass the SuSiE “purity filter” (see above), we identified 119,270 gene-tissue pairs with a cis-predicted expression model; all 119,270 cis-predicted expression models are publicly available (see Data availability).

PBMC cis-predicted expression models

PBMC single-cell eQTL (described and generated in ref. ³³) expression data was downloaded from the Human Cell Atlas Data Coordination Platform and genotype data downloaded from dbGaP (accession number phs002812.v1.p1). We removed individuals that were not European ancestry or had fewer than 2,500 detected cells. We generated pseudobulk expression in the 9 most abundant cell types (cell type assignment determined by ref. ³³) for all individuals in each cell type with greater than 5 cells. In each of the 9 cell types separately, we removed genes that were expressed in less than 80% of that cell type’s pseudobulk samples. Pseudobulk expression in each cell type was transformed using EdgeR’s logCPM function¹¹⁶ followed by normalizing each gene to have mean 0 and variance 1. The top 10 expression PCs for each cell type were calculated based on the normalized expression matrix. The number of pseudobulk samples per cell type is described in Supplementary Table 14 (average N=112).

Cis-eQTLs were called in each cell type independently while controlling for covariates (the top 10 expression PCs) and limiting to variant-gene pairs such that the variant is within 500Kb of the gene’s TSS. For each cell type, we removed variants with MAF < 0.05 across samples in the cell type, that were strand-ambiguous, or did not overlap the 10,545,304 analyzed UK Biobank variants. Cis-predicted expression models were generated using SuSiE^30,31 applied to eQTL summary statistics and eQTL in-sample LD matrices (using the function ‘susie_rss’ from the SuSiE package^30,31 with default parameters) for each (gene, cell type) pair, independently. After removing gene-PBMC cell type pairs that did not pass the SuSiE “purity filter” (see above), we identified 1,851 gene-PBMC cell type pairs with a cis-predicted expression model; all 1,851 cis-predicted expression models are publicly available (see Data availability).

Data Availability

We have made TGFM PIPs for gene tissue pairs, gene PBMC cell type pairs, genes, and non mediated variants across 45 diseases/traits (for both analyses of 38 GTEx tissues + analyses of 38 GTEx tissues and 9 PBMC cell types) publicly available at https://doi.org/10.7910/DVN/S26PFI, GTEx cis predicted expression models for all gene tissue pairs publicly available at https://doi.org/10.7910/DVN/8IPOPK, PBMC cis predicted expression models for all gene PBMC cell type pairs publicly available at https://doi.org/10.7910/DVN/A6K9QW, GWAS summary statistics for all 45 diseases/traits publicly available at https://doi.org/10.7910/DVN/GTEGPE. To limit the use of computational resources, we refer the reader to UK Biobank in sample LD (337K unrelated British ancestry samples) from ref. 36, which is publicly available at https://registry.opendata.aws/ukbb-ld/. The UK Biobank resource is publicly available via application (http://www.ukbiobank.ac.uk/).

Data availability

We have made TGFM PIPs for gene-tissue pairs, gene-PBMC cell type pairs, genes, and non-mediated variants across 45 diseases/traits (for both analyses of 38 GTEx tissues + analyses of 38 GTEx tissues and 9 PBMC cell types) publicly available at https://doi.org/10.7910/DVN/S26PFI, GTEx cis-predicted expression models for all gene-tissue pairs publicly available at https://doi.org/10.7910/DVN/8IPOPK, PBMC cis-predicted expression models for all gene-PBMC cell type pairs publicly available at https://doi.org/10.7910/DVN/A6K9QW, GWAS summary statistics for all 45 diseases/traits publicly available at https://doi.org/10.7910/DVN/GTEGPE. To limit the use of computational resources, we refer the reader to UK Biobank in-sample LD (337K unrelated British-ancestry samples) from ref. ³⁶, which is publicly available at https://registry.opendata.aws/ukbb-ld/. The UK Biobank resource is publicly available via application (http://www.ukbiobank.ac.uk/).

Code availability

Software implementing TGFM will be made publicly available at https://github.com/BennyStrobes prior to publication.

Acknowledgements

We are grateful to Arun Durvasula and Xilin Jiang for helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and funded by National Institutes of Health (NIH) grants R01 MH101244, R37 MH107649, R01 HG006399, R01 MH115676, U01 HG012009, and F32 HG012889. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Footnotes

New supplementary figure (Supplementary Figure 18) and Supplementary Note now added to describe 6 additional examples of fine-mapped disease loci

References

1.↵
Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).
OpenUrl PubMed
2.↵
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
OpenUrl CrossRef PubMed
3.↵
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
OpenUrl CrossRef PubMed
4.
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
OpenUrl CrossRef PubMed
5.↵
Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).
OpenUrl CrossRef PubMed
6.
Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell gene expression. Am. J. Hum. Genet. 101, 686–699 (2017).
OpenUrl CrossRef PubMed
7.↵
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
OpenUrl CrossRef PubMed
8.
Shang, L., Smith, J. A. & Zhou, X. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet. 16, e1008734 (2020).
OpenUrl CrossRef
9.
Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).
OpenUrl
10.
Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022).
OpenUrl CrossRef
11.↵
Amariuta, T., Siewert-Rocks, K. & Price, A. L. Modeling tissue co-regulation estimates tissue-specific contributions to disease. Nat. Genet. 55, 1503–1511 (2023).
OpenUrl
12.↵
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
OpenUrl CrossRef PubMed
13.
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
OpenUrl
14.↵
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
OpenUrl
15.
Downes, D. J. et al. Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus. Nat. Genet. 53, 1606–1615 (2021).
OpenUrl
16.↵
Kosoy, R. et al. Genetics of the human microglia regulome refines Alzheimer’s disease risk loci. Nat. Genet. 54, 1145–1154 (2022).
OpenUrl CrossRef
17.↵
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
OpenUrl CrossRef PubMed
18.
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
OpenUrl CrossRef PubMed
19.↵
Arvanitis, M., Tayeb, K., Strober, B. J. & Battle, A. Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity. Am. J. Hum. Genet. 109, 223–239 (2022).
OpenUrl CrossRef
20.↵
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
OpenUrl CrossRef PubMed
21.↵
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
OpenUrl CrossRef PubMed
22.↵
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
OpenUrl CrossRef PubMed
23.↵
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
OpenUrl CrossRef PubMed
24.↵
Mancuso, N. et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet. 51, 675–682 (2019).
OpenUrl CrossRef PubMed
25.↵
The GTEx Consortium et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
OpenUrl Abstract/FREE Full Text
26.↵
Siewert-Rocks, K. M., Kim, S. S., Yao, D. W., Shi, H. & Price, A. L. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am. J. Hum. Genet. 109, 393–404 (2022).
OpenUrl
27.↵
Zhao, S. et al. Adjusting for genetic confounders in transcriptome-wide association studies leads to reliable detection of causal genes. bioRxiv 2022.09.27.509700 (2022) doi:10.1101/2022.09.27.509700.
OpenUrl Abstract/FREE Full Text
28.↵
Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).
OpenUrl
29.↵
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
OpenUrl CrossRef PubMed
30.↵
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).
OpenUrl
31.↵
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 18, e1010299 (2022).
OpenUrl
32.↵
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
OpenUrl CrossRef PubMed
33.↵
Perez, R. K. et al. Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science 376, eabf1970 (2022).
OpenUrl
34.↵
Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: Realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
OpenUrl CrossRef
35.↵
Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).
OpenUrl CrossRef PubMed
36.↵
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
OpenUrl
37.↵
O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).
OpenUrl CrossRef
38.↵
Gazal, S. et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
OpenUrl CrossRef PubMed
39.↵
Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018).
OpenUrl
40.↵
Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
OpenUrl CrossRef
41.↵
Cui, R. et al. Improving fine-mapping by modeling infinitesimal effects. bioRxiv (2022) doi:10.1101/2022.10.21.513123.
OpenUrl Abstract/FREE Full Text
42.↵
Mancuso, N. et al. Large-scale transcriptome-wide association study identifies new prostate cancer risk regions. Nat. Commun. 9, 1–11 (2018).
OpenUrl CrossRef PubMed
43.↵
Emdin, C. A. et al. Genetic association of waist-to-hip ratio with cardiometabolic traits, type 2 diabetes, and coronary heart disease. JAMA 317, 626 (2017).
OpenUrl CrossRef PubMed
44.↵
Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat. Genet. 55, 1267–1276 (2023).
OpenUrl
45.↵
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
OpenUrl CrossRef PubMed Web of Science
46.↵
Boukas, L. et al. Coexpression patterns define epigenetic regulators associated with neurological dysfunction. Genome Res. 29, 532–542 (2019).
OpenUrl Abstract/FREE Full Text
47.↵
Georgi, B., Voight, B. F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013).
OpenUrl CrossRef PubMed
48.↵
Bikker, H. et al. A 20-basepair duplication in the human thyroid peroxidase gene results in a total iodide organification defect and congenital hypothyroidism. J. Clin. Endocrinol. Metab. 79, 248–252 (1994).
OpenUrl CrossRef PubMed Web of Science
49.↵
Bakker, B. et al. Two decades of screening for congenital hypothyroidism in The Netherlands: TPO gene mutations in total iodide organification defects (an update). J. Clin. Endocrinol. Metab. 85, 3708–3712 (2000).
OpenUrl CrossRef PubMed Web of Science
50.↵
Hwangbo, Y. & Park, Y. J. Genome-wide association studies of autoimmune thyroid diseases, thyroid function, and thyroid cancer. Endocrinol. Metab. (Seoul) 33, 175 (2018).
OpenUrl
51.↵
Furue, K. et al. The IL -13– OVOL 1– FLG axis in atopic dermatitis. Immunology 158, 281–286 (2019).
OpenUrl
52.
Sun, P. et al. OVOL1 regulates psoriasis-like skin inflammation and epidermal hyperplasia. J. Invest. Dermatol. 141, 1542–1552 (2021).
OpenUrl
53.↵
Dragan, M. et al. Ovol1/2 loss-induced epidermal defects elicit skin immune activation and alter global metabolism. EMBO Rep. 24, (2023).
54.↵
the EArly Genetics and Lifecourse Epidemiology (EAGLE) Eczema Consortium. Multi-ancestry genome-wide association study of 21,000 cases and 95,000 controls identifies new risk loci for atopic dermatitis. Nat. Genet. 47, 1449–1456 (2015).
OpenUrl CrossRef PubMed
55.↵
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).
OpenUrl
56.↵
Nachat, R. et al. Peptidylarginine deiminase isoforms are differentially expressed in the anagen hair follicles and other human skin appendages. J. Invest. Dermatol. 125, 34–41 (2005).
OpenUrl CrossRef PubMed
57.↵
Zhang, X. et al. Peptidylarginine deiminase 1-catalyzed histone citrullination is essential for early embryo development. Sci. Rep. 6, 1–11 (2016).
OpenUrl CrossRef PubMed
58.↵
Revez, J. A. et al. Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration. Nat. Commun. 11, 1647 (2020).
OpenUrl CrossRef PubMed
59.↵
Manousaki, D. et al. Genome-wide association study for vitamin D levels reveals 69 independent loci. Am. J. Hum. Genet. 106, 327–337 (2020).
OpenUrl CrossRef PubMed
60.↵
Maier, T. & Hoyer, J. Modulation of blood pressure by central melanocortinergic pathways. Nephrol. Dial. Transplant 25, 674–677 (2010).
OpenUrl CrossRef PubMed
61.↵
da Silva, A. A., do Carmo, J. M., Wang, Z. & Hall, J. E. The brain melanocortin system, sympathetic control, and obesity hypertension. Physiology (Bethesda) 29, 196–202 (2014).
OpenUrl CrossRef PubMed
62.↵
do Carmo, J. M. et al. Role of the brain melanocortins in blood pressure regulation. Biochim. Biophys. Acta Mol. Basis Dis. 1863, 2508–2514 (2017).
OpenUrl
63.↵
Wang, Y. & Wang, J.-G. Genome-wide association studies of hypertension and several other cardiovascular diseases. Pulse (Basel) 6, 169–186 (2019).
OpenUrl
64.↵
Jacob, J., Chopra, S. & Baby, C. Neuro-endocrine regulation of blood pressure. Indian J. Endocrinol. Metab. 15, 281 (2011).
OpenUrl
65.↵
Figueira, L. & Israel, A. Role of cerebellar adrenomedullin in blood pressure regulation. Neuropeptides 54, 59–66 (2015).
OpenUrl
66.↵
de Oliveira, D. F., de Lemos, R. R. & de Oliveira, J. R. M. Mutations at the SLC20A2 gene and brain resilience in families with idiopathic basal ganglia calcification (“Fahr’s disease”). Front. Hum. Neurosci. 7, 420 (2013).
OpenUrl
67.↵
Inden, M., Kurita, H. & Hozumi, I. Characteristics and therapeutic potential of sodium-dependent phosphate cotransporters in relation to idiopathic basal ganglia calcification. J. Pharmacol. Sci. 148, 152–155 (2022).
OpenUrl
68.↵
Wallingford, M. C. et al. SLC20A2 deficiency in mice leads to elevated phosphate levels in cerbrospinal fluid and glymphatic pathway-associated arteriolar calcification, and recapitulates human idiopathic basal ganglia calcification. Brain Pathol. 27, 64–76 (2017).
OpenUrl CrossRef
69.↵
Sargurupremraj, M. et al. Cerebral small vessel disease genomics and its implications across the lifespan. Nat. Commun. 11, 1–18 (2020).
OpenUrl CrossRef PubMed
70.↵
Zhao, B. et al. Transcriptome-wide association analysis of brain structures yields insights into pleiotropy with complex neuropsychiatric traits. Nat. Commun. 12, (2021).
71.↵
Herting, M. M. & Sowell, E. R. Puberty and structural brain development in humans. Front. Neuroendocrinol. 44, 122–137 (2017).
OpenUrl CrossRef
72.↵
Kane, L. & Ismail, N. Puberty as a vulnerable period to the effects of immune challenges: Focus on sex differences. Behav. Brain Res. 320, 374–382 (2017).
OpenUrl
73.↵
Resztak, J. A. et al. Analysis of transcriptional changes in the immune system associated with pubertal development in a longitudinal cohort of children with asthma. Nat. Commun. 14, 1–14 (2023).
OpenUrl CrossRef
74.↵
Alberts, B. et al. Lymphocytes and the cellular basis of adaptive immunity. (Garland Science, 2002).
75.↵
Lloyd, C. M. & Hessel, E. M. Functions of T cells in asthma: more than just TH2 cells. Nat. Rev. Immunol. 10, 838–848 (2010).
OpenUrl CrossRef PubMed Web of Science
76.↵
Walker, L. S. K. & Sansom, D. M. The emerging role of CTLA4 as a cell-extrinsic regulator of T cell responses. Nat. Rev. Immunol. 11, 852–863 (2011).
OpenUrl CrossRef PubMed
77.↵
Oyewole-Said, D. et al. Beyond T-cells: Functional characterization of CTLA-4 expression in immune and non-immune cell types. Front. Immunol. 11, (2020).
78.↵
Ueda, H. et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 423, 506–511 (2003).
OpenUrl CrossRef PubMed Web of Science
79.↵
Wang, K. et al. CTLA-4 +49 G/A polymorphism confers autoimmune disease risk: An updated meta-analysis. Genet. Test. Mol. Biomarkers 21, 222–227 (2017).
OpenUrl CrossRef
80.↵
Bandala-Sanchez, E. et al. T cell regulation mediated by interaction of soluble CD52 with the inhibitory receptor Siglec-10. Nat. Immunol. 14, 741–748 (2013).
OpenUrl CrossRef PubMed
81.
Rashidi, M. et al. CD52 inhibits Toll-like receptor activation of NF-κB and triggers apoptosis to suppress inflammation. Cell Death Differ. 25, 392–405 (2018).
OpenUrl CrossRef
82.↵
Bhamidipati, K. et al. CD52 is elevated on B cells of SLE patients and regulates B cell function. Front. Immunol. 11, (2021).
83.↵
Zhou, M. et al. Krüppel-like transcription factor 13 regulates T lymphocyte survival in vivo. J. Immunol. 178, 5496–5504 (2007).
OpenUrl Abstract/FREE Full Text
84.↵
Outram, S. V. et al. KLF13 influences multiple stages of both B and T cell development. Cell Cycle 7, 2047–2055 (2008).
OpenUrl CrossRef PubMed Web of Science
85.↵
Valdés-Ferrer, S. I. et al. HMGB1 mediates anemia of inflammation in Murine sepsis survivors. Mol. Med. 21, 951–958 (2015).
OpenUrl
86.↵
Dulmovits, B. M. et al. HMGB1-mediated restriction of EPO signaling contributes to anemia of inflammation. Blood 139, 3181–3193 (2022).
OpenUrl CrossRef PubMed
87.↵
Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).
OpenUrl CrossRef
88.↵
Cuomo, A. S. E., Nathan, A., Raychaudhuri, S., MacArthur, D. G. & Powell, J. E. Single-cell genomics meets human genetics. Nat. Rev. Genet. 24, 535–549 (2023).
OpenUrl
89.↵
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
OpenUrl CrossRef
90.↵
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. (2023) doi:10.1038/s41588-023-01529-1.
OpenUrl CrossRef
91.↵
Jansen, R. et al. Conditional eQTL analysis reveals allelic heterogeneity of gene expression. Hum. Mol. Genet. 26, 1444–1451 (2017).
OpenUrl CrossRef PubMed
92.↵
Liu, C. et al. Whole genome DNA and RNA sequencing of whole blood elucidates the genetic architecture of gene expression underlying a wide range of diseases. Sci. Rep. 12, 1–11 (2022).
OpenUrl CrossRef PubMed
93.↵
Zhang, Y., Qi, G., Park, J.-H. & Chatterjee, N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326 (2018).
OpenUrl CrossRef PubMed
94.
O’Connor, L. J. The distribution of common-variant effect sizes. Nat. Genet. 53, 1243–1249 (2021).
OpenUrl CrossRef
95.↵
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
OpenUrl
96.↵
Bhattacharya, A. et al. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: Lessons from the Global Biobank Metaanalysis Initiative. Cell Genom. 2, 100180 (2022).
OpenUrl
97.↵
Kachuri, L. et al. Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. Nat. Genet. 55, 952–963 (2023).
OpenUrl
98.↵
Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).
OpenUrl CrossRef PubMed
99.
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
OpenUrl CrossRef PubMed
100.↵
Lu, Z. et al. Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. Am. J. Hum. Genet. 109, 1388–1404 (2022).
OpenUrl CrossRef
101.↵
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
OpenUrl CrossRef PubMed
102.↵
McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science 342, 747–749 (2013).
OpenUrl Abstract/FREE Full Text
103.
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
OpenUrl Abstract/FREE Full Text
104.
Xu, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616, 123–131 (2023).
OpenUrl
105.
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
OpenUrl
106.
Hou, L. et al. Multitissue H3K27ac profiling of GTEx samples links epigenomic variation to disease. Nat. Genet. 55, 1665–1676 (2023).
OpenUrl
107.↵
Oliva, M. et al. DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat. Genet. 55, 112–122 (2023).
OpenUrl CrossRef
108.↵
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).
OpenUrl CrossRef PubMed
109.↵
Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 1–11 (2019).
OpenUrl CrossRef PubMed
110.↵
Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
OpenUrl
111.↵
Marriott, P., Efron, B. & Tibshirani, R. J. An introduction to the bootstrap. J. R. Stat. Soc. Ser. A Stat. Soc. 158, 347 (1995).
OpenUrl
112.↵
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
OpenUrl CrossRef PubMed
113.↵
Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. bioRxiv 2023.01.07.23284293 (2023) doi:10.1101/2023.01.07.23284293.
OpenUrl CrossRef
114.↵
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
OpenUrl CrossRef PubMed
115.↵
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
OpenUrl Abstract/FREE Full Text
116.↵
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
OpenUrl CrossRef PubMed Web of Science

View the discussion thread.

Posted November 08, 2023.

Download PDF

Supplementary Material

Data/Code

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (396)
Allergy and Immunology (708)
Anesthesia (199)
Cardiovascular Medicine (2909)
Dentistry and Oral Medicine (330)
Dermatology (249)
Emergency Medicine (437)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1027)
Epidemiology (12683)
Forensic Medicine (10)
Gastroenterology (824)
Genetic and Genomic Medicine (4542)
Geriatric Medicine (413)
Health Economics (723)
Health Informatics (2898)
Health Policy (1066)
Health Systems and Quality Improvement (1067)
Hematology (383)
HIV/AIDS (919)
Infectious Diseases (except HIV/AIDS) (14059)
Intensive Care and Critical Care Medicine (842)
Medical Education (422)
Medical Ethics (115)
Nephrology (466)
Neurology (4305)
Nursing (233)
Nutrition (633)
Obstetrics and Gynecology (797)
Occupational and Environmental Health (733)
Oncology (2251)
Ophthalmology (640)
Orthopedics (258)
Otolaryngology (324)
Pain Medicine (276)
Palliative Medicine (83)
Pathology (496)
Pediatrics (1192)
Pharmacology and Therapeutics (498)
Primary Care Research (492)
Psychiatry and Clinical Psychology (3721)
Public and Global Health (6889)
Radiology and Imaging (1518)
Rehabilitation Medicine and Physical Therapy (890)
Respiratory Medicine (912)
Rheumatology (433)
Sexual and Reproductive Health (437)
Sports Medicine (381)
Surgery (482)
Toxicology (60)
Transplantation (208)
Urology (178)

[1] 1.↵
Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020).
OpenUrl PubMed

[2] 2.↵
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
OpenUrl CrossRef PubMed

[3] 3.↵
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
OpenUrl CrossRef PubMed

[4] 4.
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
OpenUrl CrossRef PubMed

[5] 5.↵
Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).
OpenUrl CrossRef PubMed

[6] 6.
Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell gene expression. Am. J. Hum. Genet. 101, 686–699 (2017).
OpenUrl CrossRef PubMed

[7] 7.↵
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
OpenUrl CrossRef PubMed

[8] 8.
Shang, L., Smith, J. A. & Zhou, X. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet. 16, e1008734 (2020).
OpenUrl CrossRef

[9] 9.
Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022).
OpenUrl

[10] 10.
Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022).
OpenUrl CrossRef

[11] 11.↵
Amariuta, T., Siewert-Rocks, K. & Price, A. L. Modeling tissue co-regulation estimates tissue-specific contributions to disease. Nat. Genet. 55, 1503–1511 (2023).
OpenUrl

[12] 12.↵
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
OpenUrl CrossRef PubMed

[13] 13.
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
OpenUrl

[14] 14.↵
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
OpenUrl

[15] 15.
Downes, D. J. et al. Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus. Nat. Genet. 53, 1606–1615 (2021).
OpenUrl

[16] 16.↵
Kosoy, R. et al. Genetics of the human microglia regulome refines Alzheimer’s disease risk loci. Nat. Genet. 54, 1145–1154 (2022).
OpenUrl CrossRef

[17] 17.↵
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
OpenUrl CrossRef PubMed

[18] 18.
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
OpenUrl CrossRef PubMed

[19] 19.↵
Arvanitis, M., Tayeb, K., Strober, B. J. & Battle, A. Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity. Am. J. Hum. Genet. 109, 223–239 (2022).
OpenUrl CrossRef

[20] 20.↵
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
OpenUrl CrossRef PubMed

[21] 21.↵
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
OpenUrl CrossRef PubMed

[22] 22.↵
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
OpenUrl CrossRef PubMed

[23] 23.↵
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
OpenUrl CrossRef PubMed

[24] 24.↵
Mancuso, N. et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet. 51, 675–682 (2019).
OpenUrl CrossRef PubMed

[25] 25.↵
The GTEx Consortium et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
OpenUrl Abstract/FREE Full Text

[26] 26.↵
Siewert-Rocks, K. M., Kim, S. S., Yao, D. W., Shi, H. & Price, A. L. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am. J. Hum. Genet. 109, 393–404 (2022).
OpenUrl

[27] 27.↵
Zhao, S. et al. Adjusting for genetic confounders in transcriptome-wide association studies leads to reliable detection of causal genes. bioRxiv 2022.09.27.509700 (2022) doi:10.1101/2022.09.27.509700.
OpenUrl Abstract/FREE Full Text

[28] 28.↵
Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).
OpenUrl

[29] 29.↵
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
OpenUrl CrossRef PubMed

[30] 30.↵
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).
OpenUrl

[31] 31.↵
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 18, e1010299 (2022).
OpenUrl

[32] 32.↵
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
OpenUrl CrossRef PubMed

[33] 33.↵
Perez, R. K. et al. Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science 376, eabf1970 (2022).
OpenUrl

[34] 34.↵
Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: Realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
OpenUrl CrossRef

[35] 35.↵
Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).
OpenUrl CrossRef PubMed

[36] 36.↵
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
OpenUrl

[37] 37.↵
O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).
OpenUrl CrossRef

[38] 38.↵
Gazal, S. et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
OpenUrl CrossRef PubMed

[39] 39.↵
Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018).
OpenUrl

[40] 40.↵
Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
OpenUrl CrossRef

[41] 41.↵
Cui, R. et al. Improving fine-mapping by modeling infinitesimal effects. bioRxiv (2022) doi:10.1101/2022.10.21.513123.
OpenUrl Abstract/FREE Full Text

[42] 42.↵
Mancuso, N. et al. Large-scale transcriptome-wide association study identifies new prostate cancer risk regions. Nat. Commun. 9, 1–11 (2018).
OpenUrl CrossRef PubMed

[43] 43.↵
Emdin, C. A. et al. Genetic association of waist-to-hip ratio with cardiometabolic traits, type 2 diabetes, and coronary heart disease. JAMA 317, 626 (2017).
OpenUrl CrossRef PubMed

[44] 44.↵
Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat. Genet. 55, 1267–1276 (2023).
OpenUrl

[45] 45.↵
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
OpenUrl CrossRef PubMed Web of Science

[46] 46.↵
Boukas, L. et al. Coexpression patterns define epigenetic regulators associated with neurological dysfunction. Genome Res. 29, 532–542 (2019).
OpenUrl Abstract/FREE Full Text

[47] 47.↵
Georgi, B., Voight, B. F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013).
OpenUrl CrossRef PubMed

[48] 48.↵
Bikker, H. et al. A 20-basepair duplication in the human thyroid peroxidase gene results in a total iodide organification defect and congenital hypothyroidism. J. Clin. Endocrinol. Metab. 79, 248–252 (1994).
OpenUrl CrossRef PubMed Web of Science

[49] 49.↵
Bakker, B. et al. Two decades of screening for congenital hypothyroidism in The Netherlands: TPO gene mutations in total iodide organification defects (an update). J. Clin. Endocrinol. Metab. 85, 3708–3712 (2000).
OpenUrl CrossRef PubMed Web of Science

[50] 50.↵
Hwangbo, Y. & Park, Y. J. Genome-wide association studies of autoimmune thyroid diseases, thyroid function, and thyroid cancer. Endocrinol. Metab. (Seoul) 33, 175 (2018).
OpenUrl

[51] 51.↵
Furue, K. et al. The IL -13– OVOL 1– FLG axis in atopic dermatitis. Immunology 158, 281–286 (2019).
OpenUrl

[52] 52.
Sun, P. et al. OVOL1 regulates psoriasis-like skin inflammation and epidermal hyperplasia. J. Invest. Dermatol. 141, 1542–1552 (2021).
OpenUrl

[53] 53.↵
Dragan, M. et al. Ovol1/2 loss-induced epidermal defects elicit skin immune activation and alter global metabolism. EMBO Rep. 24, (2023).

[54] 54.↵
the EArly Genetics and Lifecourse Epidemiology (EAGLE) Eczema Consortium. Multi-ancestry genome-wide association study of 21,000 cases and 95,000 controls identifies new risk loci for atopic dermatitis. Nat. Genet. 47, 1449–1456 (2015).
OpenUrl CrossRef PubMed

[55] 55.↵
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).
OpenUrl

[56] 56.↵
Nachat, R. et al. Peptidylarginine deiminase isoforms are differentially expressed in the anagen hair follicles and other human skin appendages. J. Invest. Dermatol. 125, 34–41 (2005).
OpenUrl CrossRef PubMed

[57] 57.↵
Zhang, X. et al. Peptidylarginine deiminase 1-catalyzed histone citrullination is essential for early embryo development. Sci. Rep. 6, 1–11 (2016).
OpenUrl CrossRef PubMed

[58] 58.↵
Revez, J. A. et al. Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration. Nat. Commun. 11, 1647 (2020).
OpenUrl CrossRef PubMed

[59] 59.↵
Manousaki, D. et al. Genome-wide association study for vitamin D levels reveals 69 independent loci. Am. J. Hum. Genet. 106, 327–337 (2020).
OpenUrl CrossRef PubMed

[60] 60.↵
Maier, T. & Hoyer, J. Modulation of blood pressure by central melanocortinergic pathways. Nephrol. Dial. Transplant 25, 674–677 (2010).
OpenUrl CrossRef PubMed

[61] 61.↵
da Silva, A. A., do Carmo, J. M., Wang, Z. & Hall, J. E. The brain melanocortin system, sympathetic control, and obesity hypertension. Physiology (Bethesda) 29, 196–202 (2014).
OpenUrl CrossRef PubMed

[62] 62.↵
do Carmo, J. M. et al. Role of the brain melanocortins in blood pressure regulation. Biochim. Biophys. Acta Mol. Basis Dis. 1863, 2508–2514 (2017).
OpenUrl

[63] 63.↵
Wang, Y. & Wang, J.-G. Genome-wide association studies of hypertension and several other cardiovascular diseases. Pulse (Basel) 6, 169–186 (2019).
OpenUrl

[64] 64.↵
Jacob, J., Chopra, S. & Baby, C. Neuro-endocrine regulation of blood pressure. Indian J. Endocrinol. Metab. 15, 281 (2011).
OpenUrl

[65] 65.↵
Figueira, L. & Israel, A. Role of cerebellar adrenomedullin in blood pressure regulation. Neuropeptides 54, 59–66 (2015).
OpenUrl

[66] 66.↵
de Oliveira, D. F., de Lemos, R. R. & de Oliveira, J. R. M. Mutations at the SLC20A2 gene and brain resilience in families with idiopathic basal ganglia calcification (“Fahr’s disease”). Front. Hum. Neurosci. 7, 420 (2013).
OpenUrl

[67] 67.↵
Inden, M., Kurita, H. & Hozumi, I. Characteristics and therapeutic potential of sodium-dependent phosphate cotransporters in relation to idiopathic basal ganglia calcification. J. Pharmacol. Sci. 148, 152–155 (2022).
OpenUrl

[68] 68.↵
Wallingford, M. C. et al. SLC20A2 deficiency in mice leads to elevated phosphate levels in cerbrospinal fluid and glymphatic pathway-associated arteriolar calcification, and recapitulates human idiopathic basal ganglia calcification. Brain Pathol. 27, 64–76 (2017).
OpenUrl CrossRef

[69] 69.↵
Sargurupremraj, M. et al. Cerebral small vessel disease genomics and its implications across the lifespan. Nat. Commun. 11, 1–18 (2020).
OpenUrl CrossRef PubMed

[70] 70.↵
Zhao, B. et al. Transcriptome-wide association analysis of brain structures yields insights into pleiotropy with complex neuropsychiatric traits. Nat. Commun. 12, (2021).

[71] 71.↵
Herting, M. M. & Sowell, E. R. Puberty and structural brain development in humans. Front. Neuroendocrinol. 44, 122–137 (2017).
OpenUrl CrossRef

[72] 72.↵
Kane, L. & Ismail, N. Puberty as a vulnerable period to the effects of immune challenges: Focus on sex differences. Behav. Brain Res. 320, 374–382 (2017).
OpenUrl

[73] 73.↵
Resztak, J. A. et al. Analysis of transcriptional changes in the immune system associated with pubertal development in a longitudinal cohort of children with asthma. Nat. Commun. 14, 1–14 (2023).
OpenUrl CrossRef

[74] 74.↵
Alberts, B. et al. Lymphocytes and the cellular basis of adaptive immunity. (Garland Science, 2002).

[75] 75.↵
Lloyd, C. M. & Hessel, E. M. Functions of T cells in asthma: more than just TH2 cells. Nat. Rev. Immunol. 10, 838–848 (2010).
OpenUrl CrossRef PubMed Web of Science

[76] 76.↵
Walker, L. S. K. & Sansom, D. M. The emerging role of CTLA4 as a cell-extrinsic regulator of T cell responses. Nat. Rev. Immunol. 11, 852–863 (2011).
OpenUrl CrossRef PubMed

[77] 77.↵
Oyewole-Said, D. et al. Beyond T-cells: Functional characterization of CTLA-4 expression in immune and non-immune cell types. Front. Immunol. 11, (2020).

[78] 78.↵
Ueda, H. et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 423, 506–511 (2003).
OpenUrl CrossRef PubMed Web of Science

[79] 79.↵
Wang, K. et al. CTLA-4 +49 G/A polymorphism confers autoimmune disease risk: An updated meta-analysis. Genet. Test. Mol. Biomarkers 21, 222–227 (2017).
OpenUrl CrossRef

[80] 80.↵
Bandala-Sanchez, E. et al. T cell regulation mediated by interaction of soluble CD52 with the inhibitory receptor Siglec-10. Nat. Immunol. 14, 741–748 (2013).
OpenUrl CrossRef PubMed

[81] 81.
Rashidi, M. et al. CD52 inhibits Toll-like receptor activation of NF-κB and triggers apoptosis to suppress inflammation. Cell Death Differ. 25, 392–405 (2018).
OpenUrl CrossRef

[82] 82.↵
Bhamidipati, K. et al. CD52 is elevated on B cells of SLE patients and regulates B cell function. Front. Immunol. 11, (2021).

[83] 83.↵
Zhou, M. et al. Krüppel-like transcription factor 13 regulates T lymphocyte survival in vivo. J. Immunol. 178, 5496–5504 (2007).
OpenUrl Abstract/FREE Full Text

[84] 84.↵
Outram, S. V. et al. KLF13 influences multiple stages of both B and T cell development. Cell Cycle 7, 2047–2055 (2008).
OpenUrl CrossRef PubMed Web of Science

[85] 85.↵
Valdés-Ferrer, S. I. et al. HMGB1 mediates anemia of inflammation in Murine sepsis survivors. Mol. Med. 21, 951–958 (2015).
OpenUrl

[86] 86.↵
Dulmovits, B. M. et al. HMGB1-mediated restriction of EPO signaling contributes to anemia of inflammation. Blood 139, 3181–3193 (2022).
OpenUrl CrossRef PubMed

[87] 87.↵
Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).
OpenUrl CrossRef

[88] 88.↵
Cuomo, A. S. E., Nathan, A., Raychaudhuri, S., MacArthur, D. G. & Powell, J. E. Single-cell genomics meets human genetics. Nat. Rev. Genet. 24, 535–549 (2023).
OpenUrl

[89] 89.↵
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
OpenUrl CrossRef

[90] 90.↵
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. (2023) doi:10.1038/s41588-023-01529-1.
OpenUrl CrossRef

[91] 91.↵
Jansen, R. et al. Conditional eQTL analysis reveals allelic heterogeneity of gene expression. Hum. Mol. Genet. 26, 1444–1451 (2017).
OpenUrl CrossRef PubMed

[92] 92.↵
Liu, C. et al. Whole genome DNA and RNA sequencing of whole blood elucidates the genetic architecture of gene expression underlying a wide range of diseases. Sci. Rep. 12, 1–11 (2022).
OpenUrl CrossRef PubMed

[93] 93.↵
Zhang, Y., Qi, G., Park, J.-H. & Chatterjee, N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326 (2018).
OpenUrl CrossRef PubMed

[94] 94.
O’Connor, L. J. The distribution of common-variant effect sizes. Nat. Genet. 53, 1243–1249 (2021).
OpenUrl CrossRef

[95] 95.↵
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
OpenUrl

[96] 96.↵
Bhattacharya, A. et al. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: Lessons from the Global Biobank Metaanalysis Initiative. Cell Genom. 2, 100180 (2022).
OpenUrl

[97] 97.↵
Kachuri, L. et al. Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. Nat. Genet. 55, 952–963 (2023).
OpenUrl

[98] 98.↵
Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).
OpenUrl CrossRef PubMed

[99] 99.
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
OpenUrl CrossRef PubMed

[100] 100.↵
Lu, Z. et al. Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. Am. J. Hum. Genet. 109, 1388–1404 (2022).
OpenUrl CrossRef

[101] 101.↵
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
OpenUrl CrossRef PubMed

[102] 102.↵
McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science 342, 747–749 (2013).
OpenUrl Abstract/FREE Full Text

[103] 103.
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
OpenUrl Abstract/FREE Full Text

[104] 104.
Xu, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616, 123–131 (2023).
OpenUrl

[105] 105.
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
OpenUrl

[106] 106.
Hou, L. et al. Multitissue H3K27ac profiling of GTEx samples links epigenomic variation to disease. Nat. Genet. 55, 1665–1676 (2023).
OpenUrl

[107] 107.↵
Oliva, M. et al. DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat. Genet. 55, 112–122 (2023).
OpenUrl CrossRef

[108] 108.↵
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).
OpenUrl CrossRef PubMed

[109] 109.↵
Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 1–11 (2019).
OpenUrl CrossRef PubMed

[110] 110.↵
Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
OpenUrl

[111] 111.↵
Marriott, P., Efron, B. & Tibshirani, R. J. An introduction to the bootstrap. J. R. Stat. Soc. Ser. A Stat. Soc. 158, 347 (1995).
OpenUrl

[112] 112.↵
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
OpenUrl CrossRef PubMed

[113] 113.↵
Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. bioRxiv 2023.01.07.23284293 (2023) doi:10.1101/2023.01.07.23284293.
OpenUrl CrossRef

[114] 114.↵
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
OpenUrl CrossRef PubMed

[115] 115.↵
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
OpenUrl Abstract/FREE Full Text

[116] 116.↵
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
OpenUrl CrossRef PubMed Web of Science