Fine-mapping causal tissues and genes at disease-associated loci ================================================================ * Benjamin J. Strober * Martin Jinye Zhang * Tiffany Amariuta * Jordan Rossen * Alkes L. Price ## Abstract Heritable diseases often manifest in a highly tissue-specific manner, with different disease loci mediated by genes in distinct tissues or cell types. We propose Tissue-Gene Fine-Mapping (TGFM), a fine-mapping method that infers the posterior probability (PIP) for each gene-tissue pair to mediate a disease locus by analyzing GWAS summary statistics (and in-sample LD) and leveraging eQTL data from diverse tissues to build cis-predicted expression models; TGFM also assigns PIPs to causal variants that are not mediated by gene expression in assayed genes and tissues. TGFM accounts for both co-regulation across genes and tissues and LD between SNPs (generalizing existing fine-mapping methods), and incorporates genome-wide estimates of each tissue’s contribution to disease as tissue-level priors. TGFM was well-calibrated and moderately well-powered in simulations; unlike previous methods, TGFM was able to attain correct calibration by modeling uncertainty in cis-predicted expression models. We applied TGFM to 45 UK Biobank diseases/traits (average *N* = 316K) using eQTL data from 38 GTEx tissues. TGFM identified an average of 147 PIP > 0.5 causal genetic elements per disease/trait, of which 11% were gene-tissue pairs. Implicated gene-tissue pairs were concentrated in known disease-critical tissues, and causal genes were strongly enriched in disease-relevant gene sets. Causal gene-tissue pairs identified by TGFM recapitulated known biology (e.g., *TPO*-thyroid for Hypothyroidism), but also included biologically plausible novel findings (e.g., *SLC20A2*-artery aorta for Diastolic blood pressure). Further application of TGFM to single-cell eQTL data from 9 cell types in peripheral blood mononuclear cells (PBMC), analyzed jointly with GTEx tissues, identified 30 additional causal gene-PBMC cell type pairs at PIP > 0.5—primarily for autoimmune disease and blood cell traits, including the well-established role of *CTLA4* in CD8+ T cells for All autoimmune disease. In conclusion, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci. ## Introduction Heritable diseases often manifest in a highly tissue-specific manner, motivating intense efforts to elucidate tissue-specific mechanisms of disease1. Previous studies have identified disease-critical tissues/cell-types based on genome-wide patterns2–11, and have deeply dissected a limited number of GWAS loci12–16. However, different GWAS loci may be mediated by different tissues, motivating genome-wide efforts to fine-map causal tissues and genes at individual GWAS loci. Existing approaches, including colocalization17–19 and transcriptome wide association studies (TWAS)20–22, have implicated disease genes via the integration of GWAS data with expression quantitative trait loci (eQTLs) while considering the effect of each gene-tissue pair on disease in isolation. However, it is likely that most of these disease-implicated genes are not actually causal in the analyzed tissue; analogous to non-causal tagging variants implicated by linkage disequilibrium (LD) between variants23, non-causal gene-tissue pairs can be implicated by correlations with causal gene-tissue pairs (involving a different gene and/or different tissue)11,22,24–27. In addition, false-positive gene-tissue pairs can arise from correlations with non-mediated genetic variants, i.e., variants whose causal effects are not mediated by assayed expression levels22,27,28. Previous fine-mapping approaches such as FOCUS24 have proven valuable in disentangling causal effects across correlated genes in a single tissue (also see ref. 27), but have not considered causal gene-tissue pairs. Here, we introduce a new method, Tissue-Gene Fine-Mapping (TGFM), that infers the posterior inclusion probability (PIP) for each gene-tissue pair to mediate a disease association at a given locus; TGFM also assigns PIPs to causal genetic variants whose effects are not mediated by gene expression in assayed tissues and genes. TGFM models both gene-tissue pairs (using cis-predicted expression20,21) and non-mediated genetic variants as potential causal genetic elements, and accounts for both correlations in cis-predicted expression across genes and tissues and LD between genetic variants, generalizing existing fine-mapping methods23,24,27,29–31. TGFM incorporates genome-wide estimates of each tissue’s contribution to disease as tissue-level priors and employs a sampling approach to account for uncertainty in cis-predicted gene expression. We validated TGFM using extensive simulations with real genotypes, including comparisons to coloc17 and FOCUS24. We applied TGFM to 45 UK Biobank traits32 using eQTL data from 38 GTEx tissues25 and 9 fine-grained single-cell PBMC cell-types33. ## Results ### Overview of TGFM TGFM estimates the posterior inclusion probability (PIP) for each *genetic element* (gene-tissue pair or genetic variant) to have a non-zero causal effect on disease, in a model that includes *mediated* causal effects of each gene-tissue pair (via the cis-genetic component of expression of a given gene in a given tissue) and *non-mediated* causal effects of each genetic variant: ![Formula][1] where *Y* denotes phenotypes, *g* indexes genes, *t* indexes tissues, *X* is the matrix of genotypes, *δ**gt* is the vector of causal cis-eQTL effect sizes of each variant on gene expression in gene *g* and tissue *t* (thus *Xδ**gt* is the cis-genetic component of gene expression in gene *g* and tissue *t*), *α**gt* denotes the (scalar) effect of cis-genetic expression in gene *g* and tissue *t* on the disease or trait, *β* is the vector of non-mediated causal effects of each genetic variant on the disease or trait, and *ϵ* denotes environmental noise. We caution that, analogous to previous studies, inference of causal genetic elements relies on the assumption that all causal genetic elements have been assayed, which may not be true in practice (see Discussion); we use the word “causal” for simplicity, with this caveat in mind. TGFM estimates the PIP of each genetic element by generalizing the Sum of Single Effects (SuSiE)30,31 fine-mapping method to include both gene-tissue pairs and genetic variants; gene-tissue pairs are included via cis-*predicted* expression20,21 (using an external eQTL data set such as GTEx25 to build prediction models), which is an approximation to true cis-*genetic* expression (Methods). This approach allows for fine-mapping multiple causal genetic elements in a given locus, inferring causal effects underlying both marginal GWAS34 and marginal TWAS20–22 (i.e., the association between cis-predicted expression of a single gene-tissue pair and disease) associations by accounting for correlations between gene-tissue pairs due to co-regulation across genes/tissues11,22,24–27, correlations between genetic variants due to LD23, and/or correlations between gene-tissue pairs and genetic variants due to the inclusion of a genetic variant in a model of cis-predicted gene expression22,27,28. TGFM employs a sampling approach to account for uncertainty in cis-predicted expression, avoiding false positives that arise from noisy estimation of cis-genetic expression. In detail, TGFM consists of four steps. In step 1, we apply SuSiE to perform eQTL fine-mapping of each gene-tissue pair in the external gene expression data set (estimating a posterior distribution of the causal cis-eQTL effect sizes for each gene-tissue pair). In step 2, we randomly sample 100 cis-predicted expression models for each gene-tissue pair from the posterior distributions of causal cis-eQTL effect sizes estimated in step 1 (Methods). In step 3, we apply SuSiE to perform disease fine-mapping in the target data set (estimating the PIP of each genetic element) 100 times, iterating over the sampled cis-predicted expression models for each gene-tissue pair from step 2. In step 4, we average the results of step 3 across the 100 disease fine-mapping runs. TGFM utilizes a custom implementation of the SuSiE algorithm that provides efficient estimation of PIPs across 100 parallel SuSiE runs that differ only in their cis-predicted expression models (Methods). TGFM inference requires only summary-level GWAS data31,35 consisting of GWAS z-scores for each variant and in-sample LD between genetic variants, in addition to external eQTL data sets across tissues of interest. TGFM increases fine-mapping power by specifying tissue-specific prior probabilities for each genetic element in a locus that are informed by genome-wide data, analogous to functionally informed variant-level fine-mapping methods such as PolyFun36; TGFM assigns one prior causal probability *π**t* for each gene-tissue pair from tissue *t* and one prior causal probability *π**nm* for each non-mediated genetic variant. We estimate *π**t* and *π**nm* in each disease/trait separately by iteratively running a computationally efficient approximation to TGFM (Methods), starting with flat priors and updating *π**t* and *π**nm* at each iteration until convergence. When analyzing a given locus with TGFM, we normalize the prior causal probabilities to sum to 1, analogous to PolyFun36. We account for uncertainty in estimates of *π**t* and *π**nm* by using genomic bootstrapping, randomly sampling 100 sets of values of *π**t* and *π**nm* (one for each of the 100 disease fine-mapping runs in step 3) and averaging TGFM results across the random samples. We restrict cis-predicted expression models to cis-eQTLs within 500kb of each gene’s transcription start site (TSS). We only assign cis-predicted expression models to gene-tissue pairs that are well-predicted by genetic variants, using the SuSiE “purity filter”30 (see Methods). We apply TGFM to fine-map any of the 2,682 overlapping 3Mb loci spanning the entire genome36 that contain at least 50 genetic variants and at least one genetic variant with marginal GWAS p-value less than 1e-5. Further details, including sampling cis-predicted expression models from SuSiE posterior distributions of causal cis-eQTL effect sizes, the custom implementation of the SuSiE algorithm providing efficient estimation of PIPs across 100 parallel SuSiE runs, and the computationally efficient approximation to TGFM used when estimating tissue-specific prior causal probabilities, are provided in the Methods section. We have released open-source software implementing TGFM (see Code availability), as well as posterior distributions of causal eQTL effect sizes across tissues and genes, and TGFM PIPs from this study (see Data availability). ### Simulations We performed simulations using real genotypes to assess the calibration and power of TGFM to identify causal tissues and genes underlying GWAS associations. We used real genotypes from unrelated UK Biobank (UKBB) British samples32 to simulate both gene expression phenotypes (for each gene-tissue pair) and quantitative trait phenotypes. Default simulation parameters were specified as follows: the gene expression sample size ranged from 300 to 1000; the quantitative trait sample size was set to 100,000 (disjoint from gene expression samples); we analyzed 426,593 SNPs and 1,976 genes on chromosome 1 (following ref. 11); the number of tissues was set to 10, of which 2 were causal for the quantitative trait; the quantitative trait architecture was simulated to have average polygenicity37, consisting of 2,700 causal non-mediated variants and 300 causal gene-tissue pairs (150 for each causal tissue) with the expected heritability per causal genetic element (non-mediated variant or gene-tissue pair) set to 0.0001 (expected quantitative trait heritability of 0.3, 10% of which was mediated through gene expression, consistent with genome-wide estimates from MESC28); causal non-mediated variants were randomly selected with probability proportional to their expected per-variant heritability based on baseline-LD model annotations3,38–40 (estimated using S-LDSC3 applied to the UKBB trait White blood cell count) in order to make the simulations as realistic as possible; the genetic architecture of gene expression across tissues was specified following ref. 11: roughly, each heritable gene-tissue pair was randomly assigned 5 causal cis-eQTLs (expected per-SNP heritability: 0.015), 2 of the 5 causal eQTLs were specific to each tissue, and 3 of the 5 causal eQTLs were shared across tissues with effect size covariance set to mimic that of GTEx tissues25; and causal gene-tissue pairs were randomly selected from all genetically heritable genes (true cis-SNP-heritability > 0) in each of the causal tissues. Non-default simulation parameter values were also explored. We performed 100 independent simulations, and averaged results across simulations. Further details of the simulation framework are provided in the Methods section. We compared TGFM to two previously published methods, coloc17 and FOCUS24. Briefly, coloc calculates the posterior probability of a shared causal variant between a GWAS disease/trait and a gene expression trait from a single gene-tissue pair without considering correlations between genes or gene-tissue pairs. FOCUS assigns PIPs for the expression of each gene in a given tissue to have non-zero causal effect on disease, while modeling correlations between genes in that tissue but not modeling correlations between different tissues or correlations between genes and non-mediated genetic variants. FOCUS can naturally be extended to model correlations between all gene-tissue pairs (without modeling correlations between genes and non-mediated genetic variants); we refer to the resulting method as FOCUS-TG. (In contrast, coloc does not model correlations between genes, and cannot be extended in this way.) We first evaluated the calibration of TGFM, coloc, FOCUS and FOCUS-TG to fine-map causal gene-tissue pairs. Calibration was assessed using empirical false discovery rate (FDR), estimated as the proportion of false-positive gene-tissue pairs among all gene-tissue pairs above a given PIP threshold. Following ref. 36, we assessed whether the empirical FDR is less than or equal to (1 – PIP threshold), a more conservative choice than (1 – average PIP) (which has been shown to be slightly mis-calibrated in previous fine-mapping simulations36,41; also see Methods and secondary analyses below). Results are reported in Figure 1a-b and Supplementary Table 1. TGFM produced well-calibrated PIPs at all eQTL sample sizes and PIP thresholds. In contrast, coloc, FOCUS, and FOCUS-TG were poorly calibrated across all PIP thresholds, even at large eQTL sample sizes. We attribute the superior calibration of TGFM over other approaches to its joint modeling of gene-tissue pairs and non-mediated variants, as well as its sampling procedure that accounts for uncertainty in genetically predicted gene expression (see secondary analyses below). Results were similar at other PIP thresholds (Supplementary Figure 1). ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F1) Figure 1: Calibration and power of tissue-gene fine-mapping methods in simulations. **(a**,**b)** Average gene-tissue pair fine-mapping FDR across 100 simulations for various fine-mapping methods (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 **(a)** and PIP=0.9 **(b)**. Dashed horizontal line denotes 1 – PIP threshold (see main text). Numerical results are reported in Supplementary Table 1. **(c**,**d)** Average gene-tissue pair fine-mapping power across 100 simulations for various fine-mapping methods (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 **(c)** and PIP=0.9 **(d)**. Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 2. We next evaluated the power of TGFM, coloc, FOCUS and FOCUS-TG to fine-map causal gene-tissue pairs. Results are reported in Figure 1c-d and Supplementary Table 2. TGFM was moderately well-powered to detect causal gene-tissue pairs at larger eQTL sample sizes, with power ranging from 0.02-0.17 across eQTL sample sizes at a PIP threshold of 0.5. Other methods (coloc, FOCUS, FOCUS-TG) achieved higher power than TGFM, but this is largely moot due to the poor calibration of those methods (Figure 1a-b). Results were similar at other PIP thresholds (Supplementary Figure 2). We compared the calibration and power of TGFM for fine-mapping causal *gene-tissue pairs, genes*, or *non-mediated genetic variants*. Gene PIPs were computed by aggregating gene-tissue PIPs across all gene-tissue pairs corresponding to the gene (defining a gene as causal if at least one corresponding gene-tissue pair is causal; Methods). Causal eQTL variants for causal gene-tissue pairs were not considered to be false positives for variant-level calibration but were not included as true positives in variant-level power computations (also see secondary analyses below). Calibration results for TGFM (Gene-Tissue), TGFM (Gene) and TGFM (Variant) are reported in Figure 2a-b and Supplementary Table 3. TGFM produced well-calibrated gene-level and variant-level PIPs. In contrast, gene-level coloc, FOCUS, and FOCUS-TG PIPs were poorly calibrated across all PIP thresholds even at large eQTL sample sizes (Supplementary Figure 3, analogous to Figure 1a-b). Results were similar at other PIP thresholds (Supplementary Figure 4). ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F2) Figure 2: Calibration and power of fine-mapping different classes of genetic elements with TGFM in simulations. **(a**,**b)** Average fine-mapping FDR across 100 simulations using TGFM for different classes of genetic elements (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 **(a)** and PIP=0.9 **(b)**. Dashed horizontal line denotes 1 – PIP threshold (see main text). Numerical results are reported in Supplementary Table 3. **(c**,**d)** Average fine-mapping power across 100 simulations using TGFM for different classes of genetic elements (see legend) across eQTL sample sizes (x-axis) at PIP=0.5 **(c)** and PIP=0.9 **(d)**. Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 4. Power results for TGFM (Gene-Tissue), TGFM (Gene) and TGFM (Variant) are reported in Figure 2c-d and Supplementary Table 4. TGFM attained higher power to fine-map causal genes than causal gene-tissue pairs, which is expected as fine-mapping causal genes is an easier problem. Power for variant-level fine-mapping was invariant to eQTL sample size, such that variant-level fine-mapping was more powerful than gene-tissue or gene-level fine-mapping at smaller eQTL sample sizes—particularly at the stringent PIP>0.9 threshold, at which the latter were severely underpowered. Results were similar at other PIP thresholds (Supplementary Figure 5). We performed 9 secondary analyses. First, we ran TGFM with a uniform prior (same *π**nm* and *π**t* for all tissues) instead of the default tissue-specific priors inferred from genome-wide data. TGFM with a uniform prior remained well-calibrated (Supplementary Figure 6a-b) but suffered substantially reduced power (Supplementary Figure 6c-d), highlighting the benefit of tissue-specific priors informed by genome-wide data. Second, we ran TGFM with a uniform prior and a single cis-predicted expression model (based on posterior mean causal cis-eQTL effect sizes) instead of averaging results across 100 sampled cis-predicted expression models. TGFM without sampling cis-predicted expression models suffered poor calibration, particularly at smaller eQTL sample sizes (Supplementary Figure 6a-b), highlighting the advantages of the sampling approach to account for uncertainty in cis-predicted expression. However, the calibration of this method was still better than the calibration of FOCUS-TG (Figure 1a-b), perhaps because FOCUS-TG does not account for non-mediated genetic variants. Third, we performed an alternative calibration analysis of TGFM (Variant) where causal eQTL variants for causal gene-tissue pairs were considered false positives for variant-level calibration. TGFM (Variant) was miscalibrated at small eQTL sample sizes and high PIP thresholds in this alternative calibration analysis (Supplementary Figure 7). The calibration worsened at small eQTL sample sizes, likely due to decreased power to detect causal gene-tissue pairs at small eQTL sample sizes, forcing the unmodeled gene-tissue pair effects to be captured by non-mediated variants. Fourth, we modified our calibration analyses to assess whether the empirical FDR is less than or equal to (1 – average PIP), a less conservative choice than (1 – PIP threshold) (ref. 36,41; see Methods). We determined that TGFM (Gene-Tissue) was slightly miscalibrated only at small eQTL sample sizes and low PIP thresholds, TGFM (Gene) was well-calibrated across all eQTL sample sizes and PIP thresholds analyzed, TGFM (Variant) was slightly mis-calibrated at high PIP thresholds regardless of eQTL sample size, and coloc, FOCUS, and FOCUS-TG were severely miscalibrated across all eQTL sample sizes and PIP thresholds analyzed (Supplementary Figures 1, 4). The slight miscalibration of TGFM (Gene-Tissue) and TGFM (Variant) when using (1 – average PIP) is consistent with previous simulations of variant-level fine-mapping methods using polygenic trait architectures36,41. Fifth, we ran TGFM at different simulated GWAS sample sizes ranging from 50,000 to 200,000 (instead of the default sample size of 100,000). TGFM remained well-calibrated regardless of GWAS sample size but attained increased power at larger GWAS sample sizes (Supplementary Figure 8). TGFM fine-mapping power at PIP > 0.5 increased 2.0-fold when doubling the eQTL sample size vs. 1.2-fold when doubling the GWAS sample size (relative to an eQTL sample size of 500 and GWAS sample size of 100,000), suggesting that TGFM attains a greater benefit from increasing the eQTL sample size under our default parameter settings. Sixth, we ran TGFM at different values of heritability of gene expression, ranging from 0.05 to 0.1 (instead of the default value of 0.075). TGFM remained approximately well-calibrated regardless of gene expression heritability but attained increased power at larger values of gene expression heritability (Supplementary Figure 9), analogous to the impact of varying eQTL sample size (Figure 2). Seventh, we investigated TGFM’s ability to provide unbiased estimates of the proportion of causal genetic elements (gene-tissue pairs or non-mediated variants) that are gene-tissue pairs, a parameter closely related to the proportion of disease heritability mediated by gene expression (estimated in ref. 28). We first calculated the proportion of fine-mapped genetic elements at various PIP thresholds that are gene-tissue pairs. This approach yielded either upward or downward biased estimates of the true proportion depending on the eQTL sample size and PIP threshold (Supplementary Figure 10a), reflecting differential discovery power of non-mediated variants and gene-tissue pairs as a function of both eQTL sample size and PIP threshold (Figure 2c-d). We next calculated the *expected* proportion of fine-mapped genetic elements that are gene-tissue pairs by summing PIPs across genetic elements. This approach yielded conservative estimates of the true proportion, becoming less conservative at larger eQTL sample sizes (Supplementary Figure 10b), suggesting that this statistic can provide a conservative lower bound on the true proportion of causal genetic elements that are gene-tissue pairs. Eighth, we investigated the calibration and power of inference of disease-critical tissues via the TGFM tissue-specific prior, using genomic bootstrap to assess significance (see Methods); although inference of disease-critical tissues is not a primary goal of TGFM (and there exist previous methods for inferring disease-critical tissues using eQTL data5,11), assessing the TGFM tissue-specific prior is of interest. We determined that inference of disease-critical tissues via the TGFM tissue-specific prior was well-calibrated and well-powered (Supplementary Figure 11). Ninth, we compared TGFM (Variant) PIPs with variant-level PIPs inferred by SuSiE. The variant-level PIPs were strongly correlated and consistent in magnitude, particularly after excluding from the analysis any variant that was correlated with a TGFM fine-mapped gene-tissue pair (Supplementary Figure 12). ### Tissue-gene fine-mapping of 45 diseases and complex traits We applied TGFM to fine-map tissues and genes for 45 diseases and complex traits from the UK Biobank (average *N* = 316K; previously analyzed with functionally informed variant-level fine-mapping36; Methods and Supplementary Table 5) using gene expression data from 47 GTEx tissues25, which were aggregated into 38 *meta-tissues*11 (average N=259; Supplementary Table 6) to minimize eQTL sample size differences across tissues; below, we refer to these as “tissues” for simplicity. For each disease/trait we applied TGFM to 2,682 overlapping 3-Mb loci36 spanning 119,270 (protein-coding) gene-tissue pairs with cis-predicted expression models (3,139 genes per tissue on average, Supplementary Table 6; 13,700 unique genes) and 10,545,304 genetic variants with MAF ≥ 0.005. We assigned a PIP to each gene-tissue pair, gene, and non-mediated genetic variant using the locus in which the genetic element was most central36; the position of a gene was determined by its TSS. TGFM running times are reported in Supplementary Table 7. We have publicly released PIPs for all gene-tissue pairs, genes, and non-mediated variants for each disease/trait (see Data Availability). Results are summarized in Figure 3 (16 independent traits36), Supplementary Figure 13 (all 45 traits), and Supplementary Table 8. Across all 45 traits, TGFM identified 711 gene-tissue-trait triplets, 2,800 gene-trait pairs (aggregating gene-tissue PIPs across tissues; see above), and 5,893 non-mediated genetic variant-trait pairs at PIP > 0.5 (43 gene-tissue-trait triplets, 382 gene-trait pairs, and 2,675 non-mediated genetic variant-trait pairs at PIP > 0.9). The number of gene-tissue pairs with PIP > 0.5 ranged from 0 (Number of children) to 56 (White blood cell count) across traits, and ranged from 0 (coronary artery) to 197 (whole blood) across tissues. Of the 711 gene-tissue-trait triplets with PIP > 0.5, 180 (25%) had TWAS p-value > 0.05/119,270 = 4.2 × 10−7 (the Bonferroni significance threshold based on 119,270 gene-tissue pairs with cis-predicted expression models42) and 136 (19%) had no nearby variants in the same fine-mapping region with GWAS p-value ≤ 5 × 10−8. Of the 110,828 gene-tissue-trait triplets with TWAS p-value ≤ 4.2 × 10−7, only 531 (0.5%) had TGFM PIP > 0.5. The proportion of causal genetic elements (variants or gene-tissue pairs) that were gene-tissue pairs was equal to 8.1% when counting PIP > 0.5 genetic elements across 16 independent traits (271 gene-tissue pairs and 3,074 non-mediated genetic variants), or 10.1% when summing PIPs across 16 independent traits (Methods), consistent with previous estimates of the proportion of trait heritability mediated by gene expression28. ![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F3.medium.gif) [Figure 3:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F3) Figure 3: Summary results of fine-mapping genetic elements with TGFM for 16 independent UK Biobank diseases and traits. We report the number of **(a)** Gene-tissue pairs, **(b)** Genes, and **(c)** (non-mediated) Variants fine-mapped using TGFM (y-axis; square root scale) across 16 independent UK Biobank traits (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bars). Horizontal black lines denote the number of genetic elements fine-mapped at PIP=0.5. FEV1:FVC, ratio of forced expiratory volume in 1 second to forced vital capacity; Platelet volume, Mean platelet volume; Diastolic BP, Diastolic blood pressure; Reticulocyte count, High-light scatter reticulocyte count; Corp. hemoglobin, Mean corpuscular hemoglobin; FVC, Forced vital capacity. Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 13, and numerical results are reported in Supplementary Table. For each trait, we identified the most frequently implicated tissues, computing the proportion of fine-mapped gene-tissue pairs in each tissue by counting gene-tissue pairs with PIP > 0.5 (Methods). Results are reported in Figure 4a (14 representative traits), Supplementary Figure 14 (all 45 traits), and Supplementary Table 9. Implicated tissue-trait pairs were consistently concentrated in expected trait-critical tissues, e.g., 50% in spleen and 30% in lymphocytes for All autoimmune disease, 71% in skin (sun-exposed) for Vitamin D level, 60% in liver and 25% in whole blood for Total cholesterol, and 36% in artery tibial and 32% in artery aorta for Diastolic blood pressure. Results were similar at other PIP thresholds (Supplementary Figure 15). Separately, we assessed the statistical significance of implicated tissue-trait pairs by applying genomic bootstrapping to the TGFM tissue-specific prior (Methods). Results are reported in Supplementary Figure 14 and Supplementary Table 9. This approach identified 23 tissue-trait pairs with FDR ≤ 0.05 (64 tissue-trait pairs with FDR ≤ 0.2). Despite limited power, the TGFM tissue-specific prior identified 6 traits with more than one significantly associated tissue (FDR ≤ 0.05; 17 traits at FDR ≤ 0.2); this result motivates the use of TGFM over a two-step approach of separately identifying the causal gene using a gene-level fine-mapping method24 and identifying the causal tissue using a method for identifying trait-critical tissues5,7,11. Significant tissue-trait pairs were consistently concentrated in expected trait-critical tissues, analogous to Figure 4a. Although inference of trait-critical tissues is not a primary goal of TGFM (and there exist previous methods for inferring trait-critical tissues using eQTL data5,11), these results validate the TGFM tissue-specific prior. ![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F4.medium.gif) [Figure 4:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F4) Figure 4: Validation of fine-mapped tissues and genes. **(a)** Proportion of fine-mapped gene-tissue pairs in each tissue (x-axis) for 14 representative traits (y-axis). Proportions for each trait were calculated by counting the number of gene-tissue pairs with TGFM PIP > 0.5 in each tissue and normalizing the counts across tissues. Tissues are only displayed if their proportion is > 0.2 for at least one of the 14 representative traits. Results for all remaining traits and tissues are reported in Supplementary Figure 14, and numerical results are reported in Supplementary Table 9. The 14 representative traits were selected by including 12 of the 16 independent traits (Figure 3) with many high PIP gene-tissue pairs and two additional, interesting traits (All autoimmune and Vitamin D levels). **(b)** Average PoPS score (y-axis) of genes stratified by TGFM (Gene) PIP (x-axis). Averages were computed across genes for the 16 independent traits listed in Figure 3, as both PoPS score and TGFM gene PIPs are trait-specific. Error bars denote 95% confidence intervals. Numerical results are reported in Supplementary Table 11. We observed instances where TGFM was unable to distinguish the causal tissue within a small set of highly correlated tissues. For example, for waist-hip-ratio adjusted for BMI (WHRadjBMI), TGFM fine-mapped only 1 gene-tissue pair in adipose visceral and 0 gene-tissue pairs in adipose subcutaneous tissue at PIP > 0.5, despite strong prior evidence of the role of adipose tissue in WHRadjBMI7,43; however, TGFM fine-mapped 10 gene-tissue pairs in adipose (defined as adipose subcutaneous U adipose visceral) at PIP > 0.5 when summing PIPs of gene-tissue pairs across the two tissues (Methods). Unsurprisingly, the average correlation in cis-predicted gene expression of adipose subcutaneous vs. adipose visceral across all genes included in TGFM was very large (0.92). Average correlations for all pairs of tissues are reported in Supplementary Figure 16 and Supplementary Table 10; the average correlation ranged from 0.48 (whole blood and brain cerebellum) to 0.96 (brain substantia nigra and brain spinal cord), and the correlation patterns reflected known relationships between tissues. We sought to validate genes prioritized by TGFM by assessing their overlap with genes prioritized by independent gene sets/scores. First, we assessed overlap with PoPS44, a similarity-based gene score that prioritizes trait-relevant genes from gene-level features such as cell-type-specific expression. Results are reported in Figure 4b and Supplementary Table 11. The average PoPS score increased as a function of TGFM (Gene) PIPs in different bins, from −0.0046 (s.e. 0.0031) for trait-gene pairs with PIP < 0.01 to 0.34 (s.e. 0.047) for trait-gene pairs with PIP ≥ 0.9; this provides an external validation of genes prioritized by TGFM. Second, we assessed overlap with 10 non-disease-specific gene sets (e.g. High-pLI genes45) that are known to be enriched/depleted for disease heritability (from Figure 4 of ref. 26). Results are reported in Supplementary Figure 17a and Supplementary Table 12. Genes with TGFM (Gene) PIP > 0.5 were significantly (FDR < 0.05) overrepresented in Epigenetic modifier genes46 (odds ratio: 1.78), High-pLI genes45 (odds ratio: 1.31) and Mouse Genome Informatics (MGI) essential genes47 (odds ratio: 1.31) and underrepresented in genes with the most SNPs within 100kb (odds ratio: 0.48), consistent with previous findings26. Results were similar at other PIP thresholds (Supplementary Figure 17b). ### TGFM pinpoints disease genes and tissues of action We highlight 6 examples of fine-mapped (PIP > 0.5) gene-tissue-trait triplets that recapitulate known biology or nominate biologically plausible mechanisms. First, TGFM fine-mapped *TPO* (*Thyroid Peroxidase*) in thyroid for Hypothyroidism (Figure 5a; gene-tissue PIP: 0.88; gene PIP: 0.88). *TPO* is an enzyme involved in thyroid hormone biosynthesis and its involvement in Hypothyroidism has been well-studied48,49, and TPO has been linked to Hypothyroidism in genetic association studies50. Thyroid was also identified as a Hypothyroidism-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.21, bootstrap p = 0.04 for tissue-specific prior; Supplementary Figure 14). ![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F5.medium.gif) [Figure 5:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F5) Figure 5: Examples of fine-mapped gene-tissue-disease triplets identified by TGFM. We report 6 example loci for which TGFM fine-mapped a gene-tissue pair (PIP > 0.5). In each example we report the marginal GWAS and TWAS association -log10 p-values (y-axis) of non-mediated variants (blue circles) and gene-tissue pairs (red triangles). Marginal TWAS association -log10 p-values were calculated by taking the median -log10 TWAS p-value across the 100 sets of sampled cis-predicted expression models for each gene-tissue pair. The genomic position of each gene-tissue pair (x-axis) was based on the gene’s TSS. The color shading of each variant and gene-tissue pair was determined by its TGFM PIP. Any genetic element with TGFM PIP > 0.5 was made larger in size. Dashed horizontal blue and red lines represent GWAS significance (5 × 10−8) and TWAS significance (4.2 × 10−7) thresholds, respectively. Numerical results are reported in Supplementary Table 13. Second, TGFM fine-mapped *OVOL1* in lymphocytes for Eczema (Figure 5b, gene-tissue PIP: 0.75, gene PIP: 0.76). Recent work demonstrated that loss of *OVOL1* results in skin inflammation and Eczema via an immune-mediated mechanism in T cells (a lymphocyte cell type)51–53, and *OVOL1* has previously been linked to Eczema in genetic association studies54. Lymphocyte was suggestively implicated as an Eczema-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.36, bootstrap p = 0.08 for tissue-specific prior; Supplementary Figure 14). There exist 28 other gene-tissue pairs within 1 Mb of the TSS of *OVOL1* (4 of which correspond to *OVOL1* in a tissue other than lymphocytes) that had significant TWAS p-values (p ≤ 0.05 / 119,270 = 4.2 × 10−7) but were not fine-mapped by TGFM (all with PIP ≤ 0.01), underscoring the benefit of joint fine-mapping of gene-tissue pairs. TGFM also fine-mapped one non-mediated variant (rs56225074; PIP: 0.55) within 1 Mb of the TSS of *OVOL1*, perhaps due to finite eQTL sample size and/or absence of the causal cell-type or context in GTEx expression data28,55 (see Discussion). Third, TGFM fine-mapped *PADI1* in skin (sun exposed) for Vitamin D level (Figure 5c; gene-tissue PIP: 0.64, gene PIP: 0.65). TGFM assigned *PADI1* a PIP of 0.64 in skin (sun exposed) tissue (TWAS p = 1.7 × 10−20) and a PIP of 0.01 in skin (not sun exposed) tissue (TWAS p = 3.8 × 10−19), demonstrating TGFM’s ability to distinguish trait-critical tissues from closely related tissues. *PADI1* is known to interact with keratins during epidermal differentation56,57, and has previously been linked to Vitamin D level in genetic association studies58,59. Skin (sun exposed) was also identified as a Vitamin D level-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.72, bootstrap p = 0.008 for tissue-specific prior; Supplementary Figure 14). Fourth, TGFM fine-mapped *MC4R* in brain cerebellum for Systolic blood pressure (Figure 5d, gene-tissue PIP: 0.66, gene PIP: 0.66), despite a non-significant TWAS p-value (p = 3.3 × 10−5 > 0.05/119,270 = 4.2 × 10−7). Previous work has shown that activation of *MC4R* in the central nervous system increases sympathetic nervous system activity and blood pressure60–62, and *MC4R* has previously been linked to hypertension in genetic association studies63. Brain cerebellum was identified as a Systolic blood pressure-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.16, bootstrap p = 0.002 for tissue-specific prior; Supplementary Figure 14), consistent with previous studies61,64,65. Strong support of brain cerebellum genes from the tissue-specific prior enabled TGFM to prioritize *MC4R*-brain cerebellum instead of more significant nearby GWAS associations of non-mediated genetic variants. Fifth, TGFM fine-mapped *SLC20A2* in artery aorta for Systolic blood pressure (Figure 5e, gene-tissue PIP: 0.91, gene PIP: 0.91). Previous work has shown that loss of *SLC20A2* results in Human idiopathic basal ganglia calcification66,67 and well as arteriolar calcification68, but *SLC20A2* has not previously been linked to Systolic blood pressure to our knowledge. Artery aorta was also identified as a Systolic blood pressure-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.36, bootstrap p = 0.0008 for tissue-specific prior; Supplementary Figure 14). We note that we have highlighted two fine-mapped gene-tissue pairs for Systolic blood pressure involving two different tissues (artery aorta and brain cerebellum); this demonstrates the advantages of TGFM over a two-step approach of separately identifying the causal gene using a gene-level fine-mapping method24 and identifying the causal tissue using a method for identifying trait-critical tissues5,7,11. Sixth, TGFM fine-mapped *NMT1* in brain cerebellum for Menarche age (Figure 5f, gene-tissue PIP: 0.53, gene PIP: 0.86); TGFM also assigned *NMT1* in brain limbic a PIP of 0.10. Recent work demonstrated that *NMT1* cis-predicted expression in brain tissues was marginally associated with image-derived brain phenotypes69,70, but *NMT1* has not previously been linked to Menarche age to our knowledge. Brain cerebellum was also identified as a Menarche age-critical tissue genome-wide (proportion of fine-mapped gene-tissue pairs = 0.25, bootstrap p = 0.05 for tissue-specific prior; Supplementary Figure 14), consistent with previous studies3,71. There exist 20 other gene-tissue pairs within 1 Mb of the TSS of *NMT1* (11 of which correspond to *NMT1* in a tissue other than brain cerebellum) that had significant TWAS p-values (p ≤ 0.05 / 119,270 = 4.2 × 10−7) but were not fine-mapped by TGFM (all with PIP ≤ 0.01), underscoring the benefit of joint fine-mapping of gene-tissue pairs. Additional examples are discussed in the Supplementary Note and Supplementary Figure 18 (including an additional implicated tissue for Vitamin D level). ### TGFM pinpoints disease genes and fine-grained cell-types in single-cell eQTL data It is widely hypothesized that eQTL in fine-grained cell types/contexts may help resolve the limited proportion of disease heritability explained by eQTL in bulk tissues28,55. Accordingly, we applied TGFM to fine-map disease genes and fine-grained cell types in 45 disease and complex traits from the UK Biobank (same diseases/traits as above) using a recently generated single-cell eQTL data set spanning 9 fine-grained peripheral blood mononuclear cell (PBMC) cell types33 (average *N*=112; Methods and Supplementary Table 14). We converted single-cell expression measurements in each cell type to pseudobulk expression (see Methods), and also included the 38 GTEx tissues analyzed above. For each disease/trait, we applied TGFM to 2,682 overlapping 3-Mb loci spanning 1,851 (protein-coding) gene-PBMC cell type pairs with cis-predicted expression models (206 genes per cell type on average, Supplementary Table 14; 1,066 unique genes), 119,270 (protein-coding) gene-tissue pairs with cis-predicted expression models (GTEx tissues; 3,139 genes per tissue on average, Supplementary Table 6; 13,700 unique genes) and 10,545,304 genetic variants with MAF ≥ 0.005. We assigned a PIP to each gene-PBMC cell type pair, gene-tissue pair, gene, and non-mediated genetic variant, analogous to above. We have publicly released PIPs for all gene-PBMC cell type pairs, gene-tissue pairs, genes, and non-mediated variants for each disease/trait (see Data Availability). Results are reported in Figure 6a-b (18 representative traits), Supplementary Figure 19 (all 45 traits), and Supplementary Table 15. Across all 45 traits, TGFM identified 30 gene-PBMC cell type-trait triplets at PIP > 0.5; TGFM was not sufficiently powered to detect any gene-PBMC cell type-trait triplets at PIP > 0.9, likely due to the limited single-cell eQTL sample size. Of the 30 gene-PBMC cell type-trait triplets with PIP > 0.5, 25 involved a trait locus that had no confidently fine-mapped gene-tissue pair (no TGFM PIP > 0.5 in GTEx-only analysis corresponding to Figures 3-5). For the 18 representative traits, TGFM identified 12 gene-PBMC cell type pairs at PIP > 0.5 for autoimmune diseases and blood cell traits (Figure 6a) vs. 5 gene-PBMC cell type pairs at PIP > 0.5 for non-blood-related traits (Figure 6b; includes 2 gene-PBMC cell type pairs for Menarche age, which we conservatively labeled as non-blood-related even though it has been reported to be partially immune-mediated72,73), increasing to 23 vs. 7 for all 45 traits; this validates the importance of gene expression in blood cell types for blood-related traits. ![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F6.medium.gif) [Figure 6:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F6) Figure 6: Summary results of fine-mapping gene-PBMC cell type pairs with TGFM for 18 representative UK Biobank diseases and traits. **(a-b)** Number of gene-PBMC cell type pairs fine-mapped using TGFM (y-axis; square root scale) across 18 representative UK Biobank traits (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bar), distinguishing between **(a)** autoimmune diseases and blood cell traits and **(b)** non-blood-related traits. Horizontal black lines denote the number of gene-PBMC cell type pairs fine-mapped at PIP=0.5. The 18 representative traits consist of the 16 independent traits (Figure 3) and two additional, interesting traits (All autoimmune and Vitamin D levels). Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 19. **(c-e)** Number of gene-PBMC cell type pairs fine-mapped using TGFM (y-axis; square root scale) in each of the 9 PBMC cell types (x-axis) at various PIP thresholds ranging from 0.2 to 1.0 (color-bar) for **(c)** Monocyte count, **(d)** Lymphocyte count, and **(e)** All autoimmune disease. Horizontal black lines denote the number of gene-PBMC cell type pairs fine-mapped at PIP=0.5. Results for all 45 UK Biobank diseases and traits are reported in Supplementary Figure 20. Numerical results are reported in Supplementary Table 15. For each trait, we identified the most frequently implicated PBMC cell types, computing the proportion of fine-mapped gene-PBMC cell type pairs in each PBMC cell type by counting gene-PBMC cell type pairs with PIP > 0.5. Results are reported in Figure 6c-e (3 representative blood-related traits), Supplementary Figure 20 (all 45 traits), and Supplementary Table 15. Despite low power, the fine-mapped gene-PBMC cell type pairs at PIP > 0.5 were concentrated in expected trait-critical PBMC cell types, e.g. 62.5% in non-classical monocyte (ncM) cells and 37.5% in classical monocyte (cM) cells for Monocyte count, 100% in CD4+ T (CD4) cells for Lymphocyte count, and 100% in CD8+ T (CD8) cells for All autoimmune disease7,74,75. Results were similar at other PIP thresholds (Supplementary Figure 20). We highlight 4 examples of fine-mapped (PIP > 0.5) gene-PBMC cell type-trait triplets that recapitulate known biology or nominate biologically plausible mechanisms. First, TGFM fine-mapped *CTLA4* (*Cytotoxic T-lymphocyte associated protein 4*) in CD8+ T (CD8) cells for All autoimmune disease (Figure 7a; gene-PBMC cell type PIP: 0.84; gene PIP: 0.85). *CTLA4* is a well-studied regulator of immune responses in T cells76,77 and has previously been linked to autoimmune diseases (rheumatoid arthritis, systemic lupus erythematosus, and type 1 diabetes) in genetic association studies78,79. CD8 cells were also identified as an All autoimmune disease-critical cell type genome-wide (bootstrap p = 0.05 for tissue-specific prior; Supplementary Table 17). *CTLA4* did not meet the criteria for having a cis-predicted expression model in any GTEx tissue, underscoring the advantages of modeling gene expression in fine-grained PBMC cell types. ![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2023/11/08/2023.11.01.23297909/F7.medium.gif) [Figure 7:](http://medrxiv.org/content/early/2023/11/08/2023.11.01.23297909/F7) Figure 7: Examples of fine-mapped gene-PBMC cell type-disease triplets identified by TGFM. We report 4 example loci for which TGFM fine-maps a gene-PBMC cell type pair (PIP > 0.5). In each example we report the marginal GWAS and TWAS association -log10 p-values (y-axis) of non-mediated variants (blue circles) and gene-tissue (or gene-PBMC cell type) pairs (red triangles). Marginal TWAS association - log10 p-values were calculated by taking the median -log10 TWAS p-value across the 100 sets of sampled cis-predicted expression models for each gene-tissue (or gene-PBMC cell type) pair. The genomic position of each gene-tissue (or gene-PBMC cell type) pair (x-axis) was based on the gene’s TSS. The color shading of each variant and gene-tissue (or gene-PBMC cell type) pair was determined by its TGFM PIP. Any genetic element with TGFM PIP > 0.5 was made larger in size. Dashed horizontal blue and red lines represent GWAS significance (5 × 10−8) and TWAS significance (4.1 × 10−7) thresholds, respectively. Numerical results are reported in Supplementary Table 16. Second, TGFM fine-mapped *CD52* in classical monocyte (cM) cells for Monocyte count (Figure 7b; gene-PBMC cell type PIP: 0.55; gene PIP: 0.88; gene-tissue PIP ≤ 0.05 in all GTEx tissues); TGFM also assigned non-negligible PIPs to *CD52* in multiple correlated PBMC cell types and GTEx tissues including 0.29 in non-classical monocyte (ncM) cells and 0.05 in GTEx whole blood. Previous work demonstrated that *CD52* regulates immune homeostasis in monocytes and T cells by inhibiting NF-κB signalling80–82, but *CD52* has not previously been linked to Monocyte count to our knowledge. cM cells were also identified as a Monocyte count-critical cell type genome-wide (bootstrap p = 0.04 for tissue-specific prior; Supplementary Table 17). There exist 56 other gene-tissue pairs and 3 other gene-PBMC cell type pairs within 1 Mb of the TSS of *CD52* (14 and 2 of which correspond to *CD52* in a cell type or tissue other than cM, respectively) with significant TWAS p-values (p ≤ 0.05 / 121,121 = 4.1 × 10−7, where 121,121 = 119,270 + 1,851) but not fine-mapped by TGFM (all with PIP ≤ 0.01), underscoring the benefit of joint fine-mapping of gene-tissue and gene-PBMC cell type pairs. *CD52* was not prioritized in any tissue in the GTEx-only analysis of Figure 3 (highest gene-tissue PIP = 0.07 (whole blood)), underscoring the advantages of modeling gene expression in fine-grained PBMC cell types. Third, TGFM fine-mapped *KLF13* in CD4+ T (CD4) cells for Lymphocyte count (Figure 7c; gene PBMC cell type PIP: 0.75; gene PIP: 0.75; gene-tissue PIP ≤ 0.01 in all GTEx tissues) despite a non-significant TWAS p-value (p = 3.5 × 10−6 > 0.05 / 121,121 = 4.1 × 10−7). *KLF13* has been shown to regulate lymphocyte development and survival83,84, but *KLF13* has not previously been linked to Lymphocyte count in genetic association studies to our knowledge. CD4 cells were suggestively implicated as a Lymphocyte count-critical cell type genome-wide (bootstrap p = 0.12 for tissue-specific prior; Supplementary Table 17). *KLF13* was not prioritized in any tissue in the GTEx-only analysis of Figure 3 (highest gene-tissue PIP = 0.002 (brain cerebellum)), again underscoring the advantages of modeling gene expression in fine-grained PBMC cell types. Fourth, TGFM fine-mapped *HMGB1* in B cells for Mean corpuscular hemoglobin (Figure 7d; gene-tissue PIP: 0.75; gene PIP: 0.75). *HMGB1* has previously been shown to mediate anemia of inflammation (i.e. anemia resulting from a prolonged immune response) in mice85,86, but *HMGB1* has not previously been linked to Mean corpuscular hemoglobin to our knowledge. B cells were also identified as a Mean corpuscular hemoglobin-critical cell type genome-wide (bootstrap p = 0.05 for tissue-specific prior; Supplementary Table 17). *HMGB1* did not meet the criteria for having a cis-predicted expression model in any GTEx tissue, again underscoring the advantages of modeling gene expression in fine-grained PBMC cell types. TGFM also fine-mapped one non-mediated variant (rs149180914; PIP: 0.59) within 1 Mb of the TSS of *HMGB1* (see Discussion). ## Discussion We developed a new method, TGFM, that jointly fine-maps causal gene-tissue pairs and non-mediated genetic variants at disease-associated loci. We applied TGFM to 45 UK Biobank diseases and traits using 38 GTEx tissues and identified many causal gene-tissue pairs (PIP > 0.5), which were concentrated in known disease-critical tissues2–11 and strongly enriched in known disease-relevant genes44,45. Causal gene-tissue pairs identified by TGFM recapitulated known biology, but also included biologically plausible novel findings. We further applied TGFM to single-cell eQTL data from 9 cell types in PBMC (analyzed jointly with GTEx tissues) and identified additional causal gene-PBMC cell type pairs (PIP > 0.5), primarily for autoimmune disease and blood cell traits. TGFM is distinct from previous methods for fine-mapping causal genes in three ways. First, TGFM identifies causal gene-tissue pairs, not just causal genes. Second, TGFM jointly models the disease contribution of each gene-tissue pair and non-mediated variant, disentangling causal gene-tissue pairs from both tagging gene-tissue pairs and tagging non-mediated genetic variants. Third, TGFM employs a sampling procedure that accounts for uncertainty in cis-predicted expression models. Our simulations show that TGFM attains accurate calibration, in contrast to previous methods such as coloc17 and FOCUS24 (Figure 1). We attribute the superior calibration to both our joint modeling of gene-tissue pairs and non-mediated variants, and our sampling procedure that accounts for uncertainty cis-predicted expression models (Supplementary Figure 6a-b). We note that a recent preprint developed a method, causal-TWAS27, to jointly fine-map genes (from a single tissue) and non-mediated genetic variants; however causal-TWAS only fine-maps genes, not fine-map gene-tissue pairs, and does not account for uncertainty in cis-predicted expression models. Another method, CAFEH19, was recently developed to identify causal variants underlying GWAS and eQTL in multiple tissues; however, CAFEH does not model co-regulation between gene-tissue pairs and is thus unable to distinguish causal from tagging gene-tissue pairs. Other recent studies have made valuable contributions in nominating causal gene-tissue pairs for disease by linking fine-mapped causal variants to causal genes using tissue-specific SNP-to-gene linking strategies such as Activity-By-Contact or EpiMap enhancer-gene linking12–14. However, this approach, unlike TGFM, is based on heuristic prioritization and does not provide direct evidence that the regulatory variant’s effect is mediated by the nominated gene-tissue pair. We note several limitations of our work. First, TGFM leverages in-sample LD from the GWAS sample35,87 (analogous to other fine-mapping methods36), but in-sample LD may be unavailable in some applications (e.g. disease consortium meta-analyses). Following ref. 36, our recommendation when in-sample LD is unavailable is as follows: if there exists a LD reference panel from a population closely matching the GWAS sample population spanning at least 10% of the GWAS sample size, run the default version of TGFM with the LD reference panel; otherwise, run TGFM assuming a single causal genetic element per locus (in the latter case, no LD reference panel is needed). Second, TGFM may be susceptible to false positives in the case of unassayed causal genetic elements (analogous to other fine-mapping methods23,30,36,41). Specifically, if the causal gene-tissue pair is not assayed, TGFM may prioritize a correlated assayed gene-tissue pair or a correlated non-mediated genetic variant. We anticipate that this limitation will be mitigated over time as emerging eQTL data sets increasingly capture diverse tissues, cell types, and cellular contexts88. Third, TGFM is only moderately well-powered to detect causal gene-tissue pairs, particularly at lower eQTL sample sizes (Figure 1). In addition, at low eQTL sample sizes, undiscovered causal gene-tissue pairs may be falsely prioritized as non-mediated genetic variants (Supplementary Figure 7). We anticipate this limitation will be mitigated over time as eQTL data sets increase in size89. Fourth, a recent study showed that GWAS and eQTLs studies are well-powered to detect different types of genetic variants, limiting the number of GWAS associations that can be explained by eQTL data at current sample sizes90 Nevertheless, TGFM identified causal gene-tissue pairs at hundreds of GWAS loci using current eQTL data, and we anticipate that TGFM will identify a larger number of causal gene-tissue pairs over time as eQTL data sets increase in size89 and capture increasingly diverse tissues, cell types, and cellular contexts88. Fifth, it is theoretically possible that TGFM could prioritize gene-tissue pairs due to reverse causality, whereby the disease/trait causally impacts the gene-tissue pair. However, we believe that this is very unlikely in practice, as cis-eQTL variants with a detectable effect on gene expression at current eQTL sample sizes explain a substantial proportion of gene expression variation25,91,92 whereas diseases/traits are highly polygenic with each causal variant (outside the HLA locus, which we exclude from our analyses) explaining a small proportion of disease/trait variation37,93–95. Sixth, we have focused here on single-ancestry fine-mapping, but an important future direction is to extend this work to multi-ancestry fine-mapping (incorporating multi-ancestry eQTL analysis96,97), which is likely to further increase fine-mapping power98–100. Seventh, an important future direction is to extend TGFM to incorporate variant-level functional annotations that are enriched for disease heritability3,38–40, which is likely to further increase fine-mapping power36,101. Eighth, another important future direction is to extend TGFM to incorporate gene sets that are enriched for disease heritability explained by cis-predicted expression26, which may also further increase fine-mapping power. Finally, we have focused here on cis-genetic components of gene expression, but TGFM could be extended to genetic components of other molecular traits102–107. Despite these limitations, TGFM is a robust and powerful method for fine-mapping causal tissues and genes at disease-associated loci. ## Methods ### TGFM model overview TGFM estimates the posterior inclusion probability (PIP) for each *genetic element* (gene-tissue pair or genetic variant) to have a non-zero causal effect on disease using a model that includes *mediated* causal effects of each gene-tissue pair (via the cis-genetic component of expression of a given gene in a given tissue) and *non-mediated* causal effects of each genetic variant: ![Formula][2] where *Y* denotes the phenotype vector across GWAS individuals, *g* indexes genes, *t* indexes tissues, *X* is the matrix of genotypes, *W**gt* is the vector of the cis-genetic component of gene expression across GWAS individuals in gene *g* and tissue *t, α**gt* denotes the (scalar) effect of cis-genetic expression in gene *g* and tissue *t* on the disease or trait, *β* denotes the vector of non-mediated causal effects of each genetic variant on the disease or trait, and *ϵ* denotes environmental noise. We assume that the trait *Y*, the cis-genetic component of gene expression *W**gt* in each gene *g* and tissue *t*, and the genotype vector of each variant (each column of *X*) are standardized to have mean 0 and variance 1. We model the cis-genetic component of gene expression as a linear combination of variant-level effects: ![Formula][3] where *δ**gt* denotes the vector of causal cis-eQTL effect sizes of each variant on gene expression in gene *g* and tissue *t*. We emphasize that we model the phenotype Y as a linear combination of the unobserved true cis-genetic component of gene expression *W**gt* (a deterministic function of the unobserved true causal eQTL effect sizes *Δ**gt*) in each gene and tissue. The *predicted* cis-genetic component of gene expression ![Graphic][4] can be estimated (according to predicted causal eQTL effect sizes ![Graphic][5]), with uncertainty, from finite sample-size eQTL data sets in the specified tissue *t*, and provide noisy estimates of the true unobserved cis-genetic component of gene expression *W**gt*. Later (see below), we explain how TGFM models uncertainty in predicted cis-genetic expression. TGFM places the Sum of Single Effects (SuSiE)30,31 fine-mapping prior distribution on the vector of causal disease (mediated and non-mediated) effect sizes: ![Formula][6] ![Formula][7] ![Formula][8] where *α* denotes the vector of expression-mediated causal effects of genetic gene expression of each gene-tissue pair on the disease or trait, *β* denotes the vector of non-mediated causal effects of each genetic variant on the disease or trait, [*α, β*] denotes the concatenated vector of mediated and non-mediated genetic effects, *l* indexes fine-mapping components where a single component represents the disease signal from a single genetic element, *γ**l* denotes a Categorical random variable indicating which one of the genetic elements disease component *l* comes from, *π* denotes the simplex vector of prior probabilities on each genetic element being causal, *d**l* denotes a Gaussian random variable specifying the causal effect size of component *l*, and ![Graphic][9] denotes the prior variance on *d**l*. This approach assumes the true causal disease effect sizes originate in a small number (*l*) of genetic elements with non-zero effects. TGFM will automatically infer posterior distributions on the random variables defining equations 5 and 6 (*γ**l*, *d**l*, ![Graphic][10]; inference details provided below). Posterior inclusion probabilities (PIPs), or the probability that a genetic element has a non-zero effect on disease, can be calculated for each genetic element from these inferred posterior distributions as follows: ![Formula][11] where *j* indexes genetic elements, *l* indexes components, *PIP**j* denotes the PIP for genetic element *j*, and ![Graphic][12] denotes the expected value of the posterior distribution on *γ**l* for genetic element *j*. ### Sum of Single Effects (SuSiE) fine-mapping prior distribution and inference The SuSiE prior distribution was developed in refs. 30,31 for the purpose of fine-mapping trait-causal variants. We briefly summarize the SuSiE fine-mapping prior distribution here: ![Formula][13] ![Formula][14] ![Formula][15] where *β* denotes the vector of causal effects of each genetic variant on the disease or trait, *l* indexes fine-mapping components where a single component represents the disease signal from a single genetic variant, *γ**l* denotes a Categorical random variable indicating which one of the genetic variants disease component *l* comes from, *π* denotes the simplex vector of prior probabilities on each genetic variant being causal, *d**l* denotes a Gaussian random variable specifying the causal effect size of component *l*, and ![Graphic][16] denotes the prior variance on *d**l*. Briefly, the SuSiE prior distribution assumes that only a subset of variants (*L* total) have non-zero effect (i.e., are causal) on the trait, the effect sizes of each causal variant are independent, and the trait causal effect sizes can be calculated by summing the causal effect size of each of the *L* causal variants. A “single effect” refers to the effect of one of the *L* causal variants; hence why the model is called Sum of Single Effects. Ref. 30 proposed a simple model-fitting approach (to infer posterior distributions on random variables in equations 9 and 10), which the authors referred to as Iterative Bayesian Stepwise Selection (IBSS). Briefly, IBSS iteratively updates each of the *L* single effects while keeping all other single effects fixed. It is computationally simple to update a single effect given fixed values of all other single effects (details of updating a single effect provided in the supplement of Ref. 30). ### Overview of TGFM inference TGFM inference consists of four steps. In step 1, we apply SuSiE to perform eQTL fine-mapping of each gene-tissue pair in the external gene expression data set (estimating a posterior distribution of the causal cis-eQTL effect sizes for each gene-tissue pair). In step 2, we randomly sample 100 cis-predicted expression models for each gene-tissue pair from the posterior distributions of causal cis-eQTL effect sizes estimated in step 1 (Methods). In step 3, we apply SuSiE to perform disease fine-mapping in the target data set (estimating the PIP of each genetic element) 100 times, iterating over the sampled cis-predicted expression models for each gene-tissue pair from step 2. In step 4, we average the results of step 3 across the 100 disease fine-mapping runs. TGFM utilizes a custom implementation of the SuSiE algorithm that provides efficient estimation of PIPs across 100 parallel SuSiE runs that differ only in their cis-predicted expression models. ### TGFM inference Step 1: Estimating causal eQTL effect size distributions from external eQTL data TGFM inference relies on probability distributions defining the causal eQTL effects for each gene-tissue pair. These causal eQTL effect size distributions are estimated by applying SuSiE30 to eQTL data; SuSiE infers the following posterior distribution on the causal eQTL effect sizes for a given gene-tissue pair from eQTL data: ![Formula][17] ![Formula][18] ![Formula][19] where *δ**gt* denotes the vector of causal eQTL effect sizes corresponding to the effects of standardized cis-variants on gene *g* in tissue *t, l* indexes fine-mapping components where a single component represents the eQTL signal from a single cis-genetic variant, *λ**l* denotes a Categorical random variable indicating which one of the genetic variants component *l* comes from, ![Graphic][20] denotes the simplex vector of inferred posterior probabilities on each genetic variant being causal for component *l, d**lk*|*λ**lk* = 1 denotes a Gaussian random variable specifying the causal effect size of component *l* for variant *k* conditioned on variant *k* being causal for component *l*, and ![Graphic][21]and ![Graphic][22] define the inferred posterior mean and variance on *d**lk*|*λ**lk*. The posterior mean causal eQTL effect sizes of this distribution are ![Graphic][23]. We restrict TGFM to gene-tissue pairs that are well-predicted by genetic variants, using the SuSiE “purity filter”30. Specifically, this filter removes any gene-tissue pair such that all components defining the gene-tissue pair have a minimum absolute correlation between all variants in the component’s 95% credible set less than 0.5. The 95% credible set for a given component is calculated by selecting the minimum set of variants that contain the causal variant with 95% confidence (according to the inferred posterior distribution ![Graphic][24]). Although in the present study we utilized SuSiE for generating distributions of causal eQTL effect sizes due to its computational efficiency, the TGFM inference procedure is generalizable to a variety of methods that generate probabilistic cis-predicted expression models potentially including probabilistic/Bayesian multivariable regression methods (for example, BSLMM108, SBayesR109, or LDpred2110) or bootstrapping111. ### TGFM inference Step 2: Randomly sample 100 cis-predicted expression models for each gene-tissue pair Instantiations of the causal eQTL effect sizes (which determine instantiations of cis-predicted expression models) for each gene-tissue pair can be randomly sampled from that gene-tissue pair’s SuSiE-inferred posterior distribution (equations 11-13). To draw a single random sample of the causal eQTL effects for a given gene-tissue pair: (1) randomly sample the causal variant for each of the *l* components from ![Graphic][25] (2) randomly sample the causal eQTL effect size of the sampled causal variant *k* (identified in Step 1) for each of the *l* components from ![Graphic][26] (3) sum the sampled causal eQTL effects across the *l* components. ### TGFM inference step 3: Fine-mapping inference conditioned on a sampled cis-predicted expression models We describe here tissue-gene fine-mapping inference conditioned on setting the cis genetic component of gene expression for each gene-tissue pair equal to a sampled cis-predicted expression model (see TGFM inference step 2) for that gene-tissue pair. In this setting: ![Formula][27] where *g* indexes genes, *t* indexes tissues, ![Graphic][28] denotes the *predicted* cis-genetic component of gene expression in gene *g* and tissue *t* (as determined by *sampled* cis-predicted expression model’s causal eQTL effect sizes ![Graphic][29]), *Y* denotes the phenotype vector across GWAS individuals, *X* is the matrix of genotypes, *α**gt* denotes the (scalar) effect of cis-genetic expression in gene *g* and tissue *t* on the disease or trait, *β* denotes the vector of non-mediated causal effects of each genetic variant on the disease or trait, and *ϵ* denotes environmental noise. The trait *Y*, the predicted cis-genetic component of gene expression ![Graphic][30] in each gene *g* and tissue *t*, and the genotype vector of each variant (each column of *X*) are standardized to have mean 0 and variance 1. We place SuSiE fine-mapping prior distributions on the disease/trait causal effect sizes (equations 4-6). In this setting, we use the existing SuSiE software for inference of posterior distributions on the fine-mapping random variables (*γ**l*, *d**l*, and ![Graphic][31] in equations 5-6) and generation of corresponding PIPs for each genetic element (equation 7). We refer to these PIPs as conditional PIPs as they are conditional upon the predicted causal eQTL effect sizes. SuSiE inference applied to this task requires the following input: GWAS summary statistic z-scores for each non-mediated variant, transcriptome-wide association study (TWAS) summary statistic z-scores for each gene-tissue pair corresponding to the marginal association between predicted genetic gene expression and the trait, in-sample correlations between all pairs of genetic elements (variants and predicted genetic gene-tissue pairs) and specified prior probabilities *π*. We assume the user provides GWAS summary statistic z-scores for each variant, in-sample LD (ie. correlations between all pairs of genetic variants based on the GWAS samples), predicted causal eQTL effect sizes for each gene-tissue pair, and the prior causal probabilities *π* (we discuss below how *π* can be inferred). TWAS summary statistic z-scores and in-sample correlations between all genetic elements can be computed from the user-provided input as follows: TWAS summary statistic z-scores for a particular gene-tissue pair can be easily calculated from GWAS summary statistic z-scores, in-sample variant LD, and predicted causal cis-eQTL effect sizes defining the genetic component of gene expression for the gene-tissue pair following ref. 21: ![Formula][32] where *z**gt* denotes the TWAS summary statistic z-score for gene *g* in tissue *t, Z**gwas* denotes the vector GWAS summary statistic z-scores across variants, ![Graphic][33] denotes the vector of predicted causal eQTL effect sizes for gene *g* in tissue *t*, and Σ denotes the in-sample variant LD. We note that ref. 21 only requires LD from a reference panel (instead of in-sample LD) for TWAS; TGFM instead requires in-sample LD (see Discussion). The correlation between the predicted genetic gene expression of two genes can be computed from in-sample variant LD and the predicted causal eQTL effect sizes of both genes: ![Formula][34] where *gt* indexes one gene-tissue pair and *g*′*t*′ indexes the other gene-tissue pair, ![Graphic][35] denotes the predicted genetic component of gene expression in gene *g* in tissue *t* across in-sample GWAS individuals, ![Graphic][36] denotes the vector of predicted causal eQTL effect sizes for gene *g* in tissue *t*, and Σ denotes the in-sample variant LD. Similarly, the in-sample correlation between a non-mediated variant and the predicted genetic component of gene-tissue pair is: ![Formula][37] where *k* indexes the non-mediated variant, *gt* indexes the gene tissue pair, *X**k* is the genotype of variant *k* across in-sample GWAS individuals, ![Graphic][38] denotes the cis-predicted genetic component of gene expression in gene *g* and tissue *t* across in-sample GWAS individuals, Σ*k* denotes row *k* of the in-sample variant LD matrix, and ![Graphic][39] denotes the vector of causal eQTL effect sizes for gene *g* in tissue *t*. ### TGFM inference step 4: Marginalize uncertainty in cis-predicted expression on fine-mapping PIPs by averaging fine-mapping results across 100 runs TGFM PIPs for a given locus are calculated by marginalizing out the uncertainty in the cis-predicted causal eQTL effect sizes on the conditional PIPs: ![Formula][40] where ![Graphic][41] is the TGFM PIP for genetic element *j, δ* is the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, ![Graphic][42] is a specific instantiation of the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, ![Graphic][43] is the conditional PIP of genetic element *j* conditioned on genetic gene expression determined by ![Graphic][44], and ![Graphic][45] is the probability of estimating causal eQTL effect sizes ![Graphic][46] given the eQTL data. The conditional PIPs, ![Graphic][47], can be inferred using SuSiE fine-mapping of both non-mediated variants and gene-tissue pairs where cis-predicted genetic gene expression is determined by ![Graphic][48] (described in previous Methods subsection, TGFM inference step 3). The causal eQTL effect size distribution, *p*(*δ*), is approximated by the posterior distribution on causal eQTL effect sizes for each gene-tissue pair estimated by applying SuSiE to eQTL data (described in previous Methods subsection, TGFM inference step 1). In practice, we approximate ![Graphic][49] in equation 18 by drawing 100 random samples of causal eQTL effect sizes across gene-tissue pairs from *p*(*Δ*) (described in previous Methods subsection, TGFM inference step 2), calculating the conditional PIP ![Graphic][50] for each of the 100 random samples, and taking the average across 100 conditional PIPs. In the next methods subsection, we discuss how this approach can be extended to integrate the inferred tissue-specific prior. In practice, we do not run the existing SuSiE package30,31 100 times in series for each fine-mapping region. TGFM utilizes a custom implementation of the SuSiE algorithm that provides efficient estimation of PIPs across 100 parallel runs where each of 100 parallel SuSiE runs differ only in their cis-predicted expression models. The rate-limiting step of each iteration of the SuSiE algorithm involves multiplying the LD matrix with that iteration’s estimate of the trait-causal effect sizes. The custom TGFM implementation exploits the fact that the variant LD matrix (which constitutes the majority of the full correlation matrix of all pairs of genetic elements) is identical across all 100 runs; TGFM uses matrix multiplication to multiply the variant LD matrix with the current causal non-mediated variant effects for each of the 100 runs (corresponding to a single (*K*X*K*)X(*K*X100) matrix multiplication instead of 100 (*K*X*K*)X(*K*X1) multiplications in series where *K* is the number of variants in the region). In addition, the custom TGFM implementation of SuSiE does not compute the ELBO at each iteration for each of the 100 runs; computing the ELBO is computationally intensive and primarily utilized to assess convergence. Instead, we run each of the 100 runs for a pre-specified number of iterations; we use a default of 5 iterations which performed well in simulations (Figure 2). We set the default number of components *l* underlying each of the 100 runs to 10. The iterative algorithm underlying SuSiE is not guaranteed to reach a global optimum and can get stuck in local optima30. We found TGFM was prone to reaching local optima in which a non-causal gene-tissue pair was confidently fine-mapped (i.e., a false positive) when the gene-tissue pair was moderately correlated with multiple independent causal non-mediated variants. This is likely due to the greedy nature of the SuSiE iterative algorithm. We use the following initialization strategy to mitigate convergence on local optima: (1) Run TGFM where the causal effects are initialized to be zero (this is the default initialization used by SuSiE) (2) If the TGFM PIP for any gene-tissue pair in the fine-mapping region is greater than 0.2: (2a) Run SuSiE fine-mapping on only the non-mediated variants (2b) Run TGFM where the non-mediated variant effects are initialized to the converged values from step 2a and the gene-tissue pair effects are initialized to zero (2c) For each of the 100 TGFM runs, select the fitted TGFM model (either (1) or (2b)) with the larger ELBO. ### Inference of tissue-specific TGFM prior causal probabilities TGFM increases fine-mapping power by specifying tissue-specific prior probabilities for each genetic element in a locus that are informed by genome-wide data, similar to PolyFun36. For each trait separately, TGFM assigns one prior causal probability *π**t* for each gene-tissue pair from tissue *t* and one prior causal probability *π**nm* for each non-mediated genetic variant where *π**t* reflects the prior probability that an arbitrary gene from tissue *t* at a disease-associated locus has a non-zero causal effect on the disease/trait and *π**nm* reflects the prior probability that an arbitrary non-mediated variant at a disease-associated locus has a non-zero causal effect on the disease/trait. We note that *π**t* is a genome-wide parameter reflecting the probability that an arbitrary gene from tissue *t* has non-zero effect on disease, which is related but distinct from genome-wide expression-mediated disease heritability parameters previously estimated in refs. 11,26,28. We infer *π**t* and *π**nm* separately for each disease/trait using an iterative algorithm, starting with flat priors, and at each iteration: (1) updating the PIP of each genetic element using a computationally efficient approximation of TGFM (see next paragraph for details) given the current prior causal probabilities, which are normalized to sum to one across genetic elements at each locus, analogous to PolyFun36. (2) updating the prior causal probabilities according to: ![Formula][51] where *c* is the genetic element class (for example a specific tissue or non-mediated variant), *π**c* is the prior causal probability corresponding to genetic element class *c, k* indexes genetic elements from genetic element class *c, PIP**k* is the current PIP of genetic element *k*, and |*c*| is the number of genetic elements belonging to genetic element class *c*. It is computationally prohibitive to run TGFM genome-wide tens to hundreds of times for each trait while updating the prior probabilities at each iteration. We make two approximations to TGFM to allow for efficient computation of genome-wide PIPs at each iteration (and we emphasize that these approximations are only used for inference of the prior). First, we run TGFM with a single cis-predicted expression model for each gene-tissue pair (based on SuSiE posterior mean causal cis-eQTL effect sizes) instead of averaging results across 100 sampled cis-predicted expression models. We refer to this approach as TGFM (no sampling). Thus, PIPs at each iteration can be inferred by applying SuSiE to fine-map both non-mediated variants and gene-tissue pairs where genetic gene expression of each gene-tissue pair is determined by the posterior mean causal eQTL effect sizes. While we show TGFM (no sampling) generates poorly calibrated PIPs (Supplementary Figure 6), using TGFM (no sampling) to infer prior causal probabilities results in well-calibrated causal prior probabilities in simulations (Supplementary Figure 11a), perhaps because the causal prior probabilities integrate information across the genome (equation 19) and fine-mapping errors resulting from uncertainty in the genetic component of gene expression will be averaged out across all genes in the genome. Second, we only run TGFM (no sampling) inference once, during the first iteration. After running TGFM (no sampling) in the first iteration, we save the resulting Bayes Factors30 for each component-genetic element pair; a Bayes Factor reflects the relative support for including that genetic element in that component of the model, irrespective of the prior probabilities. PIPs can be easily calculated based on the Bayes Factors and the current prior: ![Formula][52] ![Formula][53] where *l* indexes fine-mapping components, *j* and *k* index genetic elements in the fine-mapping region, ![Graphic][54] denotes the expected value of the posterior distribution on *γ**l* for genetic element *j, BF**lj* denotes the Bayes Factor for genetic element *j* on component *l, θ**j* denotes the normalized prior causal probability of genetic element *j*, and *PIP**j* is the TGFM (no sampling) PIP for genetic element *j*. We do not recalculate the Bayes Factors after the first iteration, and simply used the saved Bayes Factors from the first iteration in all subsequent iterations. This approximation is reasonable as while we expect PIPs to change with the evolving prior, we do not expect posterior mean causal effect sizes of each fine-mapping component to drastically change with the evolving prior, ultimately leaving the Bayes Factors stable. During inference of the causal prior probabilities, TGFM (no sampling) is run genome-wide on overlapping 3Mb windows36. The prior probability updates at each iteration (equation 16) are calculated from PIPs corresponding to genetic elements located in the middle Mb of these 3 Mb windows. To limit to windows with at least one disease causal signal, we remove 3Mb windows from the prior probability inference procedure that do not pass the SuSiE “purity filter” (see above) after running TGFM (no sampling) with a uniform prior. In addition, TGFM inference can also rely on probability distributions defining the uncertainty in our estimated prior causal probabilities (see below). Empirical distributions, as well as significance testing, on the causal prior probabilities can be calculated using 100 iterations of bootstrapping111 across the genome (we refer to this as “genomic bootstrapping”). Specifically, for each of the 100 bootstraps, we randomly sample *T* fine-mapping 3Mb windows with replacement (assuming *T* total 3Mb fine-mapping windows for the disease/trait) and run the iterative algorithm on the bootstrapped regions. This procedure results in 100 empirical samples of the causal prior probabilities which reflect their estimation uncertainty. These 100 empirical samples are input directly into the sampling procedure underlying TGFM inference (see below). Significance testing of whether the prior causal probability is greater than zero for a particular genetic element class can be generated using a z-score computed from the mean and standard error of the bootstrapped distribution. For a single analyzed trait, we assess significance using Benjamini-Hochberg112 FDR correction across all tissues and/or cell types included in the analysis. Inference of TGFM (no sampling) is performed using the function ‘susie_rss’ from the SuSiE package30,31 with default parameters. We used the same initialization strategy that was used by TGFM (described above). The iterative algorithm for inference of prior causal probabilities was run for 400 iterations. ### TGFM inference including uncertainty in tissue-specific prior causal probabilities The previous subsection described inference of probability distributions defining the tissue-specific prior causal probabilities. Here we described an extension of TGFM inference step 4 that integrates out uncertainty in *both* cis-predicted causal eQL effect sizes and the prior causal probabilities on the conditional PIPs: ![Formula][55] where ![Graphic][56] is the TGFM PIP for genetic element *j, δ* is the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, ![Graphic][57] is a specific instantiation of the set of causal eQTL effect sizes for all gene-tissue pairs in the fine-mapping region, *π* are the causal prior probabilities, ![Graphic][58] are a specific instantiation of the causal prior probabilities, ![Graphic][59] is the conditional PIP of genetic element *j* conditioned on genetic gene expression determined by ![Graphic][60] and causal prior probabilities equal to ![Graphic][61]![Graphic][62] is the probability of estimating causal eQTL effect sizes ![Graphic][63] given the eQTL data, and ![Graphic][64] is the probability of estimating prior causal probabilities ![Graphic][65]. In practice, we approximate ![Graphic][66] by drawing 100 random samples of causal eQTL effect sizes across gene-tissue pairs from *p*(*δ*) and prior causal probabilities from *p*(*π*), calculating the conditional PIP ![Graphic][67] for each of the 100 random samples, and taking the average across 100 conditional PIPs. ### Calculating gene-level PIPs with TGFM We define a gene as causal for a disease/trait if there exists at least one tissue where the gene-tissue pair is causal for the trait. Gene-level PIPs can be computed by aggregating gene-tissue pair fine-mapping results across all gene-tissue pairs corresponding to the gene of interest: ![Formula][68] ![Formula][69] where *l* indexes fine-mapping components, *g* indexes genes, *k* ∈ *g* indexes all gene-tissue pairs corresponding to gene *g*, ![Graphic][70] denotes the expected value of the posterior distribution on *γ**l* for gene *g*, ![Graphic][71] denotes the expected value of the posterior distribution on *γ**l* for gene-tissue pair *k*, and ![Graphic][72] is the gene-level PIP corresponding to gene *g*. Similar to TGFM PIPs for variants and gene-tissue pairs, we are describing here the calculation of conditional gene PIPs (conditional upon a given instantiation of predicted causal cis-eQTL effect sizes and prior causal probabilities). These conditional gene PIPs will be averaged across 100 samples of cis-predicted genetic gene expression and predicted causal prior probabilities (equation 22). This approach can also be used to calculate PIPs for causal genes in a specified subset of tissues, or gene-tissue subset PIPs. For example, identifying gene-tissue subset pairs for the subset of adipose tissues (defined as adipose subcutaneous U adipose visceral). Gene-tissue subset PIPs can be computed by aggregating (as done in equations 23-24) gene-tissue pair fine-mapping results across all gene-tissue pairs corresponding to the gene of interest from tissues in the tissue subset of interest. ### Simulation framework We used real genotypes from unrelated UK Biobank British (UKBB) samples32 to simulate both gene expression phenotypes (for each gene-tissue pair) and quantitative trait phenotypes. Default simulation parameters were specified as follows: the gene expression sample size ranged from 300 to 1000; the quantitative trait sample size was set to 100,000 (disjoint from gene expression samples); we analyzed 426,593 SNPs and 1,976 genes on chromosome 1; the number of tissues was set to 10, of which 2 were causal for the quantitative trait; the quantitative trait architecture was simulated to have average polygenicity37, consisting of 2,700 causal non-mediated variants and 300 causal gene-tissue pairs (150 for each causal tissue allowing for the option of two causal gene-tissue pairs from the same gene) with the expected heritability per causal genetic element (non-mediated variant or gene-tissue pair) set to 0.0001 (expected quantitative trait heritability of 0.3, 10% of which was mediated through gene expression, consistent with genome-wide estimates from MESC28); causal non-mediated variants were randomly selected with probability proportional to their expected per-variant heritability based on baseline-LD model annotations3,38,39 (estimated using S-LDSC3 applied to the UKBB trait White blood cell count) in order to make the simulations as realistic as possible. We simulated the genetic architecture of gene expression across related tissues using an approach similar to the approach implemented in ref. 11: we simulated all 1,976 protein-coding genes on chromosome 1 to be expressed in all tissues, with 50% of these gene-tissue pairs being cis-genetically heritable (only cis-genetically heritable gene-tissue pairs could have a simulated causal effect on the trait, though we considered all expressed gene-tissue pairs for fine-mapping, not just those that were heritable); each heritable gene-tissue pair was randomly assigned 5 causal *cis*-eQTLs within 100Kb of the gene’s TSS, 3 of the 5 causal *cis*-eQTLs were shared across tissues and 2 of the 5 causal *cis*-eQTLs were specific to each tissue; each causal *cis*-eQTL explains 1.5% of the variance of each gene-tissue pair (resulting in an average gene heritability of .075); effect sizes of shared causal cis-eQTLs covaried across tissues as follows (based on ref. 11): the tissues were split into 3 categories to mimic biological tissue modules in GTEx25 (tissues 1-3, tissues 4-6, and tissues 7-10) and the correlation of shared cis-eQTL effect sizes across tissues was set to 0.8 and 0.74 for tissues in the same tissue category and across tissues in different tissue categories, respectively. Our simulation indicates that PIPs reported by TGFM are slightly anti-conservative with respect to (1 – average PIP) (Supplementary Figures 1, 4), consistent with previous simulations of variant-level fine-mapping methods using polygenic trait architectures36,41. Following ref. 36, we circumvent this problem by using an alternative FDR estimator given by (1 – PIP threshold); setting all PIPs greater than the specified PIP threshold equal to the PIP threshold. For example, at a PIP threshold of 0.9, we treat all genetic elements with PIP ≥ 0.9 as if they had PIP = 0.9. ### UK Biobank GWAS summary statistics and in-sample LD We applied to TGFM to 45 of the 49 UK Biobank traits analyzed via functionally informed fine-mapping in ref. 36 (average *N*=316K; Supplementary Table 5); the four excluded traits were Dermatology, Diabetes (any), Endocrine Diabetes, and Childless; these four diseases and traits were excluded due to redundancy and low heritability. We considered the set of 10,545,304 UK Biobank imputed variants with MAF ≥ 0.5% and INFO score ≥ 0.6, similar to previous work36,113. We used GWAS summary statistics that were generated and described in ref. 36. Briefly, the summary statistics were computed in ref. 36 from n=337,426 unrelated British-ancestry individuals in UK Biobank using BOLT-LMM114 adjusting for sex, age and age squared, assessment center, genotyping platform, the top 20 genotyping principal components, and dilution factor for biochemical traits (see ref. 36 for complete details). We used Liftover115 to convert variant positions from hg19 to hg38. Z-scores used by TGFM were computed from the Bolt-LMM output as follows: as the noninfinitesimal version of BOLT-LMM does not calculate effect sizes, we calculated z-scores by taking the square root of the BOLT-LMM chi-squared statistics and multiplying them by the sign of the effect size estimate from the infinitesimal version of BOLT-LMM. We computed in-sample variant LD matrices using 337,426 unrelated British-ancestry individuals in UK Biobank (same individuals as summary statistics); missing values were imputed by the mean of a variant across individuals. ### Overlapping 3Mb loci used for fine-mapping We applied TGFM to fine-map each of the 2,682 overlapping 3Mb loci spanning the entire genome. Analogous to ref. 36, the overlapping 3Mb loci had 1 Mb spacing between the start points of consecutive loci, were limited to autosomal chromosomes, and did not include 3 long-range LD regions including the MHC region (chromosome 6 positions 25499772-33532223, chromosome 8 positions 8142478-12142491, and chromosome 11 positions 45978449-57232526 in hg38; lifted over from long-range LD regions ignored in ref. 36). Distinct from the windows generated in ref. 36, for each disease/trait, we limited TGFM fine-mapping to loci with at least 50 genetic variants and at least one genetic variant with marginal GWAS p-value less than 1e-5. ### GTEx cis-predicted expression models We analyzed GTEx25 data from 47 GTEx tissues, which were aggregated into 38 meta tissues of similar sample size consisting of European-ancestry individuals (average *N*=259, range: *N*=101-320 individuals, 23 meta-tissues with *N*=320; Supplementary Table 6) to reduce heterogeneity in eQTL sample sizes across tissues. Testis tissue was removed from analysis as it has outlier (cis- and trans-) eQTL discovery power after controlling for sample size (see ref. 25 Figure 2C). Meta-tissues were constructed using the same individuals and tissue aggregation strategy as described in ref. 11. Normalized expression matrices and covariates used in GTEx consortium’s single-tissue cis-eQTL analysis25 were downloaded from the GTEx portal ([https://gtexportal.org/home/datasets](https://gtexportal.org/home/datasets)) for each of the 47 analyzed tissues. In each tissue, we subset individuals to those composing the corresponding meta-tissue, and then re-standardize gene expression of each gene to have mean zero and variance one in each subsetted tissue. Cis-eQTLs were called in each tissue independently while controlling for covariates and limiting to variant-gene pairs such that the variant is within 500Kb of the gene’s TSS. For each meta-tissue, we removed variants with MAF < .05 across samples in the meta-tissue, were strand-ambiguous, or did not overlap the 10,545,304 analyzed UK Biobank variants. Cis-predicted expression models were generated using SuSiE30,31 applied to eQTL summary statistics and eQTL in-sample LD matrices (using the function ‘susie_rss’ from the SuSiE package30,31 with default parameters) for each (gene, meta-tissue) pair, independently. If a meta-tissue is composed of more than 1 constituent tissues, meta-analyzed eQTL summary statistics were generated using a fixed-effect meta-analysis across constituent tissues and meta-analyzed in-sample LD was generated by computing variant-variant correlations across all samples composing the meta-analyzed tissue. After removing gene-tissue pairs that did not pass the SuSiE “purity filter” (see above), we identified 119,270 gene-tissue pairs with a cis-predicted expression model; all 119,270 cis-predicted expression models are publicly available (see Data availability). ### PBMC cis-predicted expression models PBMC single-cell eQTL (described and generated in ref. 33) expression data was downloaded from the Human Cell Atlas Data Coordination Platform and genotype data downloaded from dbGaP (accession number phs002812.v1.p1). We removed individuals that were not European ancestry or had fewer than 2,500 detected cells. We generated pseudobulk expression in the 9 most abundant cell types (cell type assignment determined by ref. 33) for all individuals in each cell type with greater than 5 cells. In each of the 9 cell types separately, we removed genes that were expressed in less than 80% of that cell type’s pseudobulk samples. Pseudobulk expression in each cell type was transformed using EdgeR’s logCPM function116 followed by normalizing each gene to have mean 0 and variance 1. The top 10 expression PCs for each cell type were calculated based on the normalized expression matrix. The number of pseudobulk samples per cell type is described in Supplementary Table 14 (average *N*=112). Cis-eQTLs were called in each cell type independently while controlling for covariates (the top 10 expression PCs) and limiting to variant-gene pairs such that the variant is within 500Kb of the gene’s TSS. For each cell type, we removed variants with MAF < 0.05 across samples in the cell type, that were strand-ambiguous, or did not overlap the 10,545,304 analyzed UK Biobank variants. Cis-predicted expression models were generated using SuSiE30,31 applied to eQTL summary statistics and eQTL in-sample LD matrices (using the function ‘susie_rss’ from the SuSiE package30,31 with default parameters) for each (gene, cell type) pair, independently. After removing gene-PBMC cell type pairs that did not pass the SuSiE “purity filter” (see above), we identified 1,851 gene-PBMC cell type pairs with a cis-predicted expression model; all 1,851 cis-predicted expression models are publicly available (see Data availability). ## Supporting information Supplementary Material [[supplements/297909_file02.pdf]](pending:yes) Supplementary Tables [[supplements/297909_file03.zip]](pending:yes) ## Data Availability We have made TGFM PIPs for gene tissue pairs, gene PBMC cell type pairs, genes, and non mediated variants across 45 diseases/traits (for both analyses of 38 GTEx tissues + analyses of 38 GTEx tissues and 9 PBMC cell types) publicly available at [https://doi.org/10.7910/DVN/S26PFI](https://doi.org/10.7910/DVN/S26PFI), GTEx cis predicted expression models for all gene tissue pairs publicly available at [https://doi.org/10.7910/DVN/8IPOPK](https://doi.org/10.7910/DVN/8IPOPK), PBMC cis predicted expression models for all gene PBMC cell type pairs publicly available at [https://doi.org/10.7910/DVN/A6K9QW](https://doi.org/10.7910/DVN/A6K9QW), GWAS summary statistics for all 45 diseases/traits publicly available at [https://doi.org/10.7910/DVN/GTEGPE](https://doi.org/10.7910/DVN/GTEGPE). To limit the use of computational resources, we refer the reader to UK Biobank in sample LD (337K unrelated British ancestry samples) from ref. 36, which is publicly available at [https://registry.opendata.aws/ukbb-ld/](https://registry.opendata.aws/ukbb-ld/). The UK Biobank resource is publicly available via application ([http://www.ukbiobank.ac.uk/](http://www.ukbiobank.ac.uk/)). ## Data availability We have made TGFM PIPs for gene-tissue pairs, gene-PBMC cell type pairs, genes, and non-mediated variants across 45 diseases/traits (for both analyses of 38 GTEx tissues + analyses of 38 GTEx tissues and 9 PBMC cell types) publicly available at [https://doi.org/10.7910/DVN/S26PFI](https://doi.org/10.7910/DVN/S26PFI), GTEx cis-predicted expression models for all gene-tissue pairs publicly available at [https://doi.org/10.7910/DVN/8IPOPK](https://doi.org/10.7910/DVN/8IPOPK), PBMC cis-predicted expression models for all gene-PBMC cell type pairs publicly available at [https://doi.org/10.7910/DVN/A6K9QW](https://doi.org/10.7910/DVN/A6K9QW), GWAS summary statistics for all 45 diseases/traits publicly available at [https://doi.org/10.7910/DVN/GTEGPE](https://doi.org/10.7910/DVN/GTEGPE). To limit the use of computational resources, we refer the reader to UK Biobank in-sample LD (337K unrelated British-ancestry samples) from ref. 36, which is publicly available at [https://registry.opendata.aws/ukbb-ld/](https://registry.opendata.aws/ukbb-ld/). The UK Biobank resource is publicly available via application ([http://www.ukbiobank.ac.uk/](http://www.ukbiobank.ac.uk/)). ## Code availability Software implementing TGFM will be made publicly available at [https://github.com/BennyStrobes](https://github.com/BennyStrobes) prior to publication. ## Acknowledgements We are grateful to Arun Durvasula and Xilin Jiang for helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and funded by National Institutes of Health (NIH) grants R01 MH101244, R37 MH107649, R01 HG006399, R01 MH115676, U01 HG012009, and F32 HG012889. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. ## Footnotes * New supplementary figure (Supplementary Figure 18) and Supplementary Note now added to describe 6 additional examples of fine-mapped disease loci * Received November 1, 2023. * Revision received November 8, 2023. * Accepted November 8, 2023. * © 2023, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/) ## References 1. 1.Hekselman, I. & Yeger-Lotem, E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet. 21, 137–150 (2020). [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31913361&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 2. 2.Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.2504&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23263488&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 3. 3.Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3404&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26414678&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 4. 4.Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature14248&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25693563&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 5. 5.Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3981&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29058715&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 6. 6.Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell gene expression. Am. J. Hum. Genet. 101, 686–699 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2017.09.009&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29106824&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 7. 7.Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0081-4&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29632380&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 8. 8.Shang, L., Smith, J. A. & Zhou, X. Leveraging gene co-expression patterns to infer trait-relevant tissues in genome-wide association studies. PLoS Genet. 16, e1008734 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1008734&link_type=DOI) 9. 9.Zhang, M. J. et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nat. Genet. 54, 1572–1580 (2022). 10. 10.Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat. Genet. 54, 1479–1492 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-022-01187-9&link_type=DOI) 11. 11.Amariuta, T., Siewert-Rocks, K. & Price, A. L. Modeling tissue co-regulation estimates tissue-specific contributions to disease. Nat. Genet. 55, 1503–1511 (2023). 12. 12.Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0538-0&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31784727&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 13. 13.Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021). 14. 14.Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021). 15. 15.Downes, D. J. et al. Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus. Nat. Genet. 53, 1606–1615 (2021). 16. 16.Kosoy, R. et al. Genetics of the human microglia regulome refines Alzheimer’s disease risk loci. Nat. Genet. 54, 1145–1154 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-022-01149-1&link_type=DOI) 17. 17.Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1004383&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24830394&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 18. 18.Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2016.10.003&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27866706&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 19. 19.Arvanitis, M., Tayeb, K., Strober, B. J. & Battle, A. Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity. Am. J. Hum. Genet. 109, 223–239 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2022.01.002&link_type=DOI) 20. 20.Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3367&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26258848&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 21. 21.Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3506&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26854917&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 22. 22.Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0385-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 23. 23.Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41576-018-0016-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29844615&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 24. 24.Mancuso, N. et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet. 51, 675–682 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0367-1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30926970&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 25. 25.The GTEx Consortium et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEzOiIzNjkvNjUwOS8xMzE4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMTEvMDgvMjAyMy4xMS4wMS4yMzI5NzkwOS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 26. 26.Siewert-Rocks, K. M., Kim, S. S., Yao, D. W., Shi, H. & Price, A. L. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am. J. Hum. Genet. 109, 393–404 (2022). 27. 27.Zhao, S. et al. Adjusting for genetic confounders in transcriptome-wide association studies leads to reliable detection of causal genes. bioRxiv 2022.09.27.509700 (2022) doi:10.1101/2022.09.27.509700. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMi4wOS4yNy41MDk3MDB2MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIzLzExLzA4LzIwMjMuMTEuMDEuMjMyOTc5MDkuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 28. 28.Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020). 29. 29.Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btw018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26773131&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 30. 30.Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020). 31. 31.Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 18, e1010299 (2022). 32. 32.Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-018-0579-z&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30305743&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 33. 33.Perez, R. K. et al. Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science 376, eabf1970 (2022). 34. 34.Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: Realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/J.AJHG.2022.12.011&link_type=DOI) 35. 35.Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nrg.2016.142&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 36. 36.Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020). 37. 37.O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/J.AJHG.2019.07.003&link_type=DOI) 38. 38.Gazal, S. et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3954&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28892061&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 39. 39.Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607 (2018). 40. 40.Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-019-0464-1&link_type=DOI) 41. 41.Cui, R. et al. Improving fine-mapping by modeling infinitesimal effects. bioRxiv (2022) doi:10.1101/2022.10.21.513123. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czoxOToiMjAyMi4xMC4yMS41MTMxMjN2MiI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIzLzExLzA4LzIwMjMuMTEuMDEuMjMyOTc5MDkuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 42. 42.Mancuso, N. et al. Large-scale transcriptome-wide association study identifies new prostate cancer risk regions. Nat. Commun. 9, 1–11 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-018-02974-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29317637&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 43. 43.Emdin, C. A. et al. Genetic association of waist-to-hip ratio with cardiometabolic traits, type 2 diabetes, and coronary heart disease. JAMA 317, 626 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2016.21042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28196256&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 44. 44.Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Nat. Genet. 55, 1267–1276 (2023). 45. 45.Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature19057&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27535533&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000381804900026&link_type=ISI) 46. 46.Boukas, L. et al. Coexpression patterns define epigenetic regulators associated with neurological dysfunction. Genome Res. 29, 532–542 (2019). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjg6IjI5LzQvNTMyIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMTEvMDgvMjAyMy4xMS4wMS4yMzI5NzkwOS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 47. 47.Georgi, B., Voight, B. F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1003484&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23675308&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 48. 48.Bikker, H. et al. A 20-basepair duplication in the human thyroid peroxidase gene results in a total iodide organification defect and congenital hypothyroidism. J. Clin. Endocrinol. Metab. 79, 248–252 (1994). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1210/jc.79.1.248&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=8027236&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1994NW50900045&link_type=ISI) 49. 49.Bakker, B. et al. Two decades of screening for congenital hypothyroidism in The Netherlands: TPO gene mutations in total iodide organification defects (an update). J. Clin. Endocrinol. Metab. 85, 3708–3712 (2000). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1210/jc.85.10.3708&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11061528&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000089820800042&link_type=ISI) 50. 50.Hwangbo, Y. & Park, Y. J. Genome-wide association studies of autoimmune thyroid diseases, thyroid function, and thyroid cancer. Endocrinol. Metab. (Seoul) 33, 175 (2018). 51. 51.Furue, K. et al. The IL -13– OVOL 1– FLG axis in atopic dermatitis. Immunology 158, 281–286 (2019). 52. 52.Sun, P. et al. OVOL1 regulates psoriasis-like skin inflammation and epidermal hyperplasia. J. Invest. Dermatol. 141, 1542–1552 (2021). 53. 53.Dragan, M. et al. Ovol1/2 loss-induced epidermal defects elicit skin immune activation and alter global metabolism. EMBO Rep. 24, (2023). 54. 54.the EArly Genetics and Lifecourse Epidemiology (EAGLE) Eczema Consortium. Multi-ancestry genome-wide association study of 21,000 cases and 95,000 controls identifies new risk loci for atopic dermatitis. Nat. Genet. 47, 1449–1456 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3424&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26482879&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 55. 55.Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021). 56. 56.Nachat, R. et al. Peptidylarginine deiminase isoforms are differentially expressed in the anagen hair follicles and other human skin appendages. J. Invest. Dermatol. 125, 34–41 (2005). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.0022-202X.2005.23763.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15982300&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 57. 57.Zhang, X. et al. Peptidylarginine deiminase 1-catalyzed histone citrullination is essential for early embryo development. Sci. Rep. 6, 1–11 (2016). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-016-0001-8&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=27503664&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 58. 58.Revez, J. A. et al. Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration. Nat. Commun. 11, 1647 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-020-15421-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 59. 59.Manousaki, D. et al. Genome-wide association study for vitamin D levels reveals 69 independent loci. Am. J. Hum. Genet. 106, 327–337 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2020.01.017&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 60. 60.Maier, T. & Hoyer, J. Modulation of blood pressure by central melanocortinergic pathways. Nephrol. Dial. Transplant 25, 674–677 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ndt/gfp644&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20061320&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 61. 61.da Silva, A. A., do Carmo, J. M., Wang, Z. & Hall, J. E. The brain melanocortin system, sympathetic control, and obesity hypertension. Physiology (Bethesda) 29, 196–202 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1152/physiol.00061.2013&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24789984&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 62. 62.do Carmo, J. M. et al. Role of the brain melanocortins in blood pressure regulation. Biochim. Biophys. Acta Mol. Basis Dis. 1863, 2508–2514 (2017). 63. 63.Wang, Y. & Wang, J.-G. Genome-wide association studies of hypertension and several other cardiovascular diseases. Pulse (Basel) 6, 169–186 (2019). 64. 64.Jacob, J., Chopra, S. & Baby, C. Neuro-endocrine regulation of blood pressure. Indian J. Endocrinol. Metab. 15, 281 (2011). 65. 65.Figueira, L. & Israel, A. Role of cerebellar adrenomedullin in blood pressure regulation. Neuropeptides 54, 59–66 (2015). 66. 66.de Oliveira, D. F., de Lemos, R. R. & de Oliveira, J. R. M. Mutations at the SLC20A2 gene and brain resilience in families with idiopathic basal ganglia calcification (“Fahr’s disease”). Front. Hum. Neurosci. 7, 420 (2013). 67. 67.Inden, M., Kurita, H. & Hozumi, I. Characteristics and therapeutic potential of sodium-dependent phosphate cotransporters in relation to idiopathic basal ganglia calcification. J. Pharmacol. Sci. 148, 152–155 (2022). 68. 68.Wallingford, M. C. et al. SLC20A2 deficiency in mice leads to elevated phosphate levels in cerbrospinal fluid and glymphatic pathway-associated arteriolar calcification, and recapitulates human idiopathic basal ganglia calcification. Brain Pathol. 27, 64–76 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/bpa.12362&link_type=DOI) 69. 69.Sargurupremraj, M. et al. Cerebral small vessel disease genomics and its implications across the lifespan. Nat. Commun. 11, 1–18 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-019-13889-6&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31911652&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 70. 70.Zhao, B. et al. Transcriptome-wide association analysis of brain structures yields insights into pleiotropy with complex neuropsychiatric traits. Nat. Commun. 12, (2021). 71. 71.Herting, M. M. & Sowell, E. R. Puberty and structural brain development in humans. Front. Neuroendocrinol. 44, 122–137 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.yfrne.2016.12.003&link_type=DOI) 72. 72.Kane, L. & Ismail, N. Puberty as a vulnerable period to the effects of immune challenges: Focus on sex differences. Behav. Brain Res. 320, 374–382 (2017). 73. 73.Resztak, J. A. et al. Analysis of transcriptional changes in the immune system associated with pubertal development in a longitudinal cohort of children with asthma. Nat. Commun. 14, 1–14 (2023). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-022-35677-5&link_type=DOI) 74. 74.Alberts, B. et al. Lymphocytes and the cellular basis of adaptive immunity. (Garland Science, 2002). 75. 75.Lloyd, C. M. & Hessel, E. M. Functions of T cells in asthma: more than just TH2 cells. Nat. Rev. Immunol. 10, 838–848 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nri2870&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21060320&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000284601100010&link_type=ISI) 76. 76.Walker, L. S. K. & Sansom, D. M. The emerging role of CTLA4 as a cell-extrinsic regulator of T cell responses. Nat. Rev. Immunol. 11, 852–863 (2011). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nri3108&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22116087&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 77. 77.Oyewole-Said, D. et al. Beyond T-cells: Functional characterization of CTLA-4 expression in immune and non-immune cell types. Front. Immunol. 11, (2020). 78. 78.Ueda, H. et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 423, 506–511 (2003). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nature01621&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12724780&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000183162900035&link_type=ISI) 79. 79.Wang, K. et al. CTLA-4 +49 G/A polymorphism confers autoimmune disease risk: An updated meta-analysis. Genet. Test. Mol. Biomarkers 21, 222–227 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1089/gtmb.2016.0335&link_type=DOI) 80. 80.Bandala-Sanchez, E. et al. T cell regulation mediated by interaction of soluble CD52 with the inhibitory receptor Siglec-10. Nat. Immunol. 14, 741–748 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ni.2610&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23685786&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 81. 81.Rashidi, M. et al. CD52 inhibits Toll-like receptor activation of NF-κB and triggers apoptosis to suppress inflammation. Cell Death Differ. 25, 392–405 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/cdd.2017.173&link_type=DOI) 82. 82.Bhamidipati, K. et al. CD52 is elevated on B cells of SLE patients and regulates B cell function. Front. Immunol. 11, (2021). 83. 83.Zhou, M. et al. Krüppel-like transcription factor 13 regulates T lymphocyte survival in vivo. J. Immunol. 178, 5496–5504 (2007). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiamltbXVub2wiO3M6NToicmVzaWQiO3M6MTA6IjE3OC85LzU0OTYiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMy8xMS8wOC8yMDIzLjExLjAxLjIzMjk3OTA5LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 84. 84.Outram, S. V. et al. KLF13 influences multiple stages of both B and T cell development. Cell Cycle 7, 2047–2055 (2008). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.4161/cc.7.13.6234&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18604172&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000257457200023&link_type=ISI) 85. 85.Valdés-Ferrer, S. I. et al. HMGB1 mediates anemia of inflammation in Murine sepsis survivors. Mol. Med. 21, 951–958 (2015). 86. 86.Dulmovits, B. M. et al. HMGB1-mediated restriction of EPO signaling contributes to anemia of inflammation. Blood 139, 3181–3193 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1182/blood.2021012048&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=35040907&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 87. 87.Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2017.08.012&link_type=DOI) 88. 88.Cuomo, A. S. E., Nathan, A., Raychaudhuri, S., MacArthur, D. G. & Powell, J. E. Single-cell genomics meets human genetics. Nat. Rev. Genet. 24, 535–549 (2023). 89. 89.Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/0022146515594631.Marriage&link_type=DOI) 90. 90.Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. (2023) doi:10.1038/s41588-023-01529-1. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-023-01529-1&link_type=DOI) 91. 91.Jansen, R. et al. Conditional eQTL analysis reveals allelic heterogeneity of gene expression. Hum. Mol. Genet. 26, 1444–1451 (2017). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/hmg/ddx043&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 92. 92.Liu, C. et al. Whole genome DNA and RNA sequencing of whole blood elucidates the genetic architecture of gene expression underlying a wide range of diseases. Sci. Rep. 12, 1–11 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-021-04080-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 93. 93.Zhang, Y., Qi, G., Park, J.-H. & Chatterjee, N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326 (2018). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-018-0193-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30104760&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 94. 94.O’Connor, L. J. The distribution of common-variant effect sizes. Nat. Genet. 53, 1243–1249 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-021-00901-3&link_type=DOI) 95. 95.Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022). 96. 96.Bhattacharya, A. et al. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: Lessons from the Global Biobank Metaanalysis Initiative. Cell Genom. 2, 100180 (2022). 97. 97.Kachuri, L. et al. Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. Nat. Genet. 55, 952–963 (2023). 98. 98.Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2009.11.016&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20085711&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 99. 99.Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2015.06.007&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26189819&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 100.100.Lu, Z. et al. Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. Am. J. Hum. Genet. 109, 1388–1404 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.ajhg.2022.07.002&link_type=DOI) 101.101.Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1004722&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25357204&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 102.102.McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science 342, 747–749 (2013). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNDIvNjE1OS83NDciO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMy8xMS8wOC8yMDIzLjExLjAxLjIzMjk3OTA5LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 103.103.Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNTIvNjI4NS82MDAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMy8xMS8wOC8yMDIzLjExLjAxLjIzMjk3OTA5LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 104.104.Xu, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616, 123–131 (2023). 105.105.Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023). 106.106.Hou, L. et al. Multitissue H3K27ac profiling of GTEx samples links epigenomic variation to disease. Nat. Genet. 55, 1665–1676 (2023). 107.107.Oliva, M. et al. DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat. Genet. 55, 112–122 (2023). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41588-022-01248-z&link_type=DOI) 108.108.Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pgen.1003264&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23408905&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 109.109.Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 1–11 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41467-019-09078-0&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=30602773&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 110.110.Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021). 111.111.Marriott, P., Efron, B. & Tibshirani, R. J. An introduction to the bootstrap. J. R. Stat. Soc. Ser. A Stat. Soc. 158, 347 (1995). 112.112.Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.2517-6161.1995.tb02031.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24443148&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 113.113.Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. bioRxiv 2023.01.07.23284293 (2023) doi:10.1101/2023.01.07.23284293. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1101/2023.01.07.23284293&link_type=DOI) 114.114.Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/ng.3190&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25642633&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) 115.115.Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NjoiZ2Vub21lIjtzOjU6InJlc2lkIjtzOjg6IjEyLzYvOTk2IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjMvMTEvMDgvMjAyMy4xMS4wMS4yMzI5NzkwOS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 116.116.Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btp616&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19910308&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F11%2F08%2F2023.11.01.23297909.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000273116100025&link_type=ISI) [1]: /embed/graphic-1.gif [2]: /embed/graphic-9.gif [3]: /embed/graphic-10.gif [4]: /embed/inline-graphic-1.gif [5]: /embed/inline-graphic-2.gif [6]: /embed/graphic-11.gif [7]: /embed/graphic-12.gif [8]: /embed/graphic-13.gif [9]: /embed/inline-graphic-3.gif [10]: /embed/inline-graphic-4.gif [11]: /embed/graphic-14.gif [12]: /embed/inline-graphic-5.gif [13]: /embed/graphic-15.gif [14]: /embed/graphic-16.gif [15]: /embed/graphic-17.gif [16]: /embed/inline-graphic-6.gif [17]: /embed/graphic-18.gif [18]: /embed/graphic-19.gif [19]: /embed/graphic-20.gif [20]: /embed/inline-graphic-7.gif [21]: /embed/inline-graphic-8.gif [22]: /embed/inline-graphic-9.gif [23]: /embed/inline-graphic-10.gif [24]: /embed/inline-graphic-11.gif [25]: /embed/inline-graphic-12.gif [26]: /embed/inline-graphic-13.gif [27]: /embed/graphic-21.gif [28]: /embed/inline-graphic-14.gif [29]: /embed/inline-graphic-15.gif [30]: /embed/inline-graphic-16.gif [31]: /embed/inline-graphic-17.gif [32]: /embed/graphic-22.gif [33]: /embed/inline-graphic-18.gif [34]: /embed/graphic-23.gif [35]: /embed/inline-graphic-19.gif [36]: /embed/inline-graphic-20.gif [37]: /embed/graphic-24.gif [38]: /embed/inline-graphic-21.gif [39]: /embed/inline-graphic-22.gif [40]: /embed/graphic-25.gif [41]: /embed/inline-graphic-23.gif [42]: /embed/inline-graphic-24.gif [43]: /embed/inline-graphic-25.gif [44]: /embed/inline-graphic-26.gif [45]: /embed/inline-graphic-27.gif [46]: /embed/inline-graphic-28.gif [47]: /embed/inline-graphic-29.gif [48]: /embed/inline-graphic-30.gif [49]: /embed/inline-graphic-31.gif [50]: /embed/inline-graphic-32.gif [51]: /embed/graphic-26.gif [52]: /embed/graphic-27.gif [53]: /embed/graphic-28.gif [54]: /embed/inline-graphic-33.gif [55]: /embed/graphic-29.gif [56]: /embed/inline-graphic-34.gif [57]: /embed/inline-graphic-35.gif [58]: /embed/inline-graphic-36.gif [59]: /embed/inline-graphic-37.gif [60]: /embed/inline-graphic-38.gif [61]: /embed/inline-graphic-39.gif [62]: /embed/inline-graphic-40.gif [63]: /embed/inline-graphic-41.gif [64]: /embed/inline-graphic-42.gif [65]: /embed/inline-graphic-43.gif [66]: /embed/inline-graphic-44.gif [67]: /embed/inline-graphic-45.gif [68]: /embed/graphic-30.gif [69]: /embed/graphic-31.gif [70]: /embed/inline-graphic-46.gif [71]: /embed/inline-graphic-47.gif [72]: /embed/inline-graphic-48.gif