Abstract
Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual's genetic profile and correlates 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple-testing burden and a principled approach to the design of follow-up experiments. Our results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.
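The two steps described in the abstract, estimating the genetically determined component of expression and then correlating it with the phenotype, can be illustrated with a minimal sketch. It assumes that per-SNP prediction weights for a gene have already been trained on a reference transcriptome data set. The SNP identifiers, weights, dosages and phenotype values below are hypothetical, and the functions are illustrative only, not the PrediXcan software interface.

```python
import math

def predict_expression(dosages, weights):
    """Genetically predicted expression of one gene for one individual:
    a weighted sum of allele dosages (0, 1 or 2) over the SNPs in the model."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weights for a single gene (assumed to come from a trained model).
weights = {"rs_a": 0.5, "rs_b": -0.25}

# Hypothetical genotype dosages for four individuals.
individuals = [
    {"rs_a": 0, "rs_b": 2},
    {"rs_a": 1, "rs_b": 1},
    {"rs_a": 2, "rs_b": 0},
    {"rs_a": 1, "rs_b": 0},
]

# Hypothetical quantitative phenotype for the same four individuals.
phenotype = [0.1, 0.9, 2.1, 1.2]

predicted = [predict_expression(d, weights) for d in individuals]
r = pearson(predicted, phenotype)
print(predicted, round(r, 3))
```

Because the test is performed once per gene rather than once per SNP, the number of tests is far smaller than in a SNP-level scan, which is the reduced multiple-testing burden mentioned above.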
Acknowledgements
We thank A. Konkashbaev and C. Fuchsberger for outstanding technical support and N. Knoblauch for assistance in performing the quality control pipeline. We acknowledge the following US National Institutes of Health grants: K12 CA139160 (H.K.I.), T32 MH020065 (K.P.S.), F32 CA165823 (H.E.W.), R01 MH101820 and R01 MH090937 (GTEx), P30 DK20595 and P60 DK20595 (Diabetes Research and Training Center), P50 DA037844 (Rat Genomics), U01 GM61393 (Pharmacogenomics of Anticancer Agents Research), P50 MH094267 (Conte), U01 GM092691 (J.C.D.) and U19 HL065962 (PGRN Statistical Analysis Resource). Additional acknowledgments can be found in the Supplementary Note.
Author information
Contributions
H.K.I., H.E.W., E.R.G., K.P.S., S.V.M. and K.A.-M. performed the analyses. J.C.D., R.J.C. and A.E.E. provided replication data. E.R.G., H.E.W., K.P.S. and H.K.I. wrote the manuscript. D.L.N., N.J.C. and H.K.I. provided intellectual input and supervised the study. H.K.I. designed the study. All authors reviewed and contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
A full list of members and affiliations appears in the Supplementary Note.
Integrated supplementary information
Supplementary Figure 1 Comparison of tenfold cross-validated predictive performance between all tested methods (LASSO, elastic net with α = 0.5, top SNP, polygenic score at several P-value thresholds) in the DGN whole-blood cohort.
Predictive performance was measured by the R² value between predicted (GReX) and observed expression.
Supplementary Figure 2 Comparison of tenfold cross-validated predictive performance of elastic net in different starting SNP sets (4.6 million 1000 Genomes Project (TGP) SNPs, 1.9 million HapMap Phase 2 SNPs, 331,800 WTCCC genotyped SNPs) in the DGN whole-blood cohort.
Predictive performance was measured by the R² value between predicted (GReX) and observed expression.
Supplementary Figure 3 Comparison of predicted levels of expression with observed levels from nine tissues of the GTEx pilot project.
The observed squared correlation between predicted and observed gene expression levels, R², is plotted against the null distribution of R².
Supplementary Figure 4 Comparison of prediction performance between local- and distal-based prediction models.
Using whole-blood prediction models trained in DGN, we compared predicted levels of expression with observed levels in GTEx whole blood. Local predictors were generated using elastic net on SNPs within 1 Mb of each gene, and distal predictors included any trans eQTLs outside this region with linear regression P < 1 × 10⁻⁵. The observed (y axis) squared correlation between predicted and observed gene expression levels, R², is plotted against the null distribution of R² (x axis).
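As an illustration of how the local SNP set for a gene might be assembled, the sketch below keeps SNPs on the gene's chromosome that fall within 1 Mb of the gene. Measuring the window from the gene's start and end coordinates is an assumption (the caption does not say whether the window is anchored at the gene boundaries or at the transcription start site), and all identifiers and coordinates are hypothetical.

```python
WINDOW_BP = 1_000_000  # 1 Mb, as stated in the caption

def local_snps(snps, chrom, gene_start, gene_end, window=WINDOW_BP):
    """Return sorted IDs of SNPs on `chrom` within `window` bp of the gene.

    `snps` maps a SNP ID to a (chromosome, position) tuple.
    Assumption: the window extends from the gene's start and end coordinates.
    """
    lo = max(0, gene_start - window)
    hi = gene_end + window
    return sorted(
        snp_id
        for snp_id, (snp_chrom, pos) in snps.items()
        if snp_chrom == chrom and lo <= pos <= hi
    )

# Hypothetical SNP coordinates.
snps = {
    "snp_near": ("1", 1_500_000),
    "snp_inside": ("1", 2_050_000),
    "snp_far": ("1", 3_200_001),
    "snp_other_chrom": ("2", 2_050_000),
}

# Hypothetical gene spanning 2,000,000 to 2,100,000 on chromosome 1.
selected = local_snps(snps, "1", 2_000_000, 2_100_000)
print(selected)
```

SNPs outside this window would be candidates for the distal set only if they met the trans-eQTL P-value criterion given in the caption.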
Supplementary Figure 5 Quantile-quantile plot of the association P values from the PrediXcan analysis of the six remaining WTCCC diseases using expression levels imputed from DGN whole blood.
The red line in each panel shows the null expected distribution of P values, and the blue line represents the Bonferroni-corrected genome-wide significance threshold. For each disease, the top three genes that exceed the Bonferroni significance threshold are labeled. The diseases shown are (a) rheumatoid arthritis, (b) Crohn's disease, (c) bipolar disorder, (d) coronary artery disease, (e) hypertension and (f) type 2 diabetes.
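For reference, a Bonferroni-corrected threshold divides the nominal significance level by the number of tests, here the number of genes tested. The sketch below assumes a nominal level of 0.05, which the caption does not state, and uses hypothetical gene names and P values.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction.
    alpha=0.05 is an assumed nominal level, not taken from the caption."""
    return alpha / n_tests

def top_significant(pvalues, alpha=0.05, top=3):
    """Genes whose P value is below the Bonferroni threshold,
    ordered from most to least significant, limited to `top` entries."""
    threshold = bonferroni_threshold(len(pvalues), alpha)
    hits = [(gene, p) for gene, p in pvalues.items() if p < threshold]
    return sorted(hits, key=lambda gp: gp[1])[:top]

# Hypothetical gene-level association P values.
pvalues = {"GENE_A": 1e-6, "GENE_B": 0.004, "GENE_C": 0.02, "GENE_D": 0.5}

threshold = bonferroni_threshold(len(pvalues))
labelled = top_significant(pvalues)
print(threshold, labelled)
```

With only four hypothetical genes the threshold is 0.05 / 4; with several thousand genes tested, as in a genome-wide gene-based analysis, it would be correspondingly smaller.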
Supplementary Figure 6 Plot of the association P values based on genomic position from the PrediXcan analysis of six remaining WTCCC diseases using expression levels imputed from DGN whole blood.
The blue line in each panel represents the Bonferroni-corrected genome-wide significance threshold. For each disease, the top three genes that exceed the Bonferroni significance threshold are labeled. The diseases shown are (a) rheumatoid arthritis, (b) Crohn's disease, (c) bipolar disorder, (d) coronary artery disease, (e) hypertension and (f) type 2 diabetes.
Supplementary Figure 7 Enrichment of known disease genes.
Each plot shows the null expected distribution for the number of genes expected to fall below a P-value threshold of 0.01. The null distribution was derived via 10,000 random permutations. The large point on the horizontal axis of each plot shows the observed number of previously known disease genes that fall below the P-value threshold. The diseases shown are (a) rheumatoid arthritis, (b) Crohn's disease, (c) bipolar disorder, (d) coronary artery disease, (e) hypertension and (f) type 2 diabetes.
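One plausible reading of this permutation procedure can be sketched as follows: repeatedly draw a random gene set of the same size as the known disease gene set and count how many of its genes fall below the P-value threshold. The sampling scheme, the empirical P-value formula and all data here are assumptions for illustration; the caption specifies only the threshold (0.01) and the number of permutations (10,000).

```python
import random

def enrichment_null(pvalues, n_known, threshold=0.01, n_perm=10_000, seed=0):
    """Null distribution of the number of genes below `threshold`
    in random gene sets of size `n_known` (assumed sampling scheme)."""
    rng = random.Random(seed)
    genes = list(pvalues)
    counts = []
    for _ in range(n_perm):
        sample = rng.sample(genes, n_known)
        counts.append(sum(pvalues[g] < threshold for g in sample))
    return counts

def empirical_p(observed, null_counts):
    """One-sided empirical P value with the usual +1 correction."""
    exceed = sum(c >= observed for c in null_counts)
    return (1 + exceed) / (1 + len(null_counts))

# Hypothetical data: 100 genes, of which 5 have P < 0.01.
pvalues = {f"gene_{i}": (0.001 if i < 5 else 0.5) for i in range(100)}
known_genes = [f"gene_{i}" for i in range(5)]  # hypothetical known disease genes

observed = sum(pvalues[g] < 0.01 for g in known_genes)
null_counts = enrichment_null(pvalues, len(known_genes), n_perm=1000)
p_enrich = empirical_p(observed, null_counts)
print(observed, p_enrich)
```

The observed count corresponds to the large point on the horizontal axis of each panel, and the list of permuted counts corresponds to the plotted null distribution.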
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 and Supplementary Note. (PDF 1472 kb)
Supplementary Table 1
Supplementary Table 1. (XLSX 69 kb)
Cite this article
Gamazon, E., Wheeler, H., Shah, K. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091–1098 (2015). https://doi.org/10.1038/ng.3367
DOI: https://doi.org/10.1038/ng.3367