Abstract
A typical human exome harbors dozens of loss-of-function (LOF) variants [1], which can lower disease risk factor levels and affect drug efficacy [2]. We hypothesized that LOF variants are enriched in genes influencing risk factor levels and the onset of common chronic diseases, such as cardiovascular disease and diabetes. To test this hypothesis, we sequenced the exomes of 8,554 individuals and analyzed the effects of predicted LOF variants on 20 chronic disease risk factor phenotypes. Analyzing this sample as discovery and replication strata of equal size, we verified two known relationships in well-studied genes (PCSK9 and APOC3) and identified eight new loci. Previously unknown relationships included elevated fasting glucose in heterozygous carriers of LOF variants in TXNDC5, which encodes a biomarker for type 1 diabetes progression, and apparent recessive effects of C1QTNF8 on serum magnesium levels. These data demonstrate the utility of annotating functional variants in a large sample of deeply phenotyped individuals for gene discovery.
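The abstract does not spell out how the two strata were formed or what counted as replication. The Python sketch below shows one way to randomly split the 8,554 individuals into two equal halves and to flag an association as replicated. The helper names (`split_strata`, `replicates`), the fixed seed, and the replication rule (P < 0.05 in both strata with a consistent direction of effect) are illustrative assumptions, not the authors' documented procedure.

```python
import random


def split_strata(sample_ids, seed=0):
    """Randomly partition sample IDs into discovery and replication strata.

    Hypothetical helper. With an odd number of samples, the extra sample
    ends up in the replication stratum.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # reproducible random order
    half = len(ids) // 2
    return ids[:half], ids[half:]


def replicates(p_discovery, p_replication, beta_discovery, beta_replication, alpha=0.05):
    """Assumed replication rule (not taken from the paper): nominal
    significance in both strata and the same direction of effect."""
    same_direction = (beta_discovery > 0) == (beta_replication > 0)
    return p_discovery < alpha and p_replication < alpha and same_direction
```

For 8,554 individuals, `split_strata(range(8554))` returns two disjoint lists of 4,277 IDs each.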
References
MacArthur, D.G. et al. A systematic survey of loss-of-function variants in human protein coding genes. Science 335, 823–828 (2012).
Phillips, I.R. & Shephard, E.A. Flavin-containing monooxygenases: mutations, disease and drug response. Trends Pharmacol. Sci. 29, 294–301 (2008).
Margaritte, P., Bonaiti-Pellie, C., King, M.C. & Clerget-Darpoux, F. Linkage of familial breast cancer to chromosome 17q21 may not be restricted to early-onset disease. Am. J. Hum. Genet. 50, 1231–1234 (1992).
Liu, X., Jian, X. & Boerwinkle, E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum. Mutat. 32, 894–899 (2011).
Liu, X., Jian, X. & Boerwinkle, E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum. Mutat. 34, E2393–E2402 (2013).
Dang, V.T., Kassahn, K.S., Marcos, A.E. & Ragan, M.A. Identification of human haploinsufficient genes and their genomic proximity to segmental duplications. Eur. J. Hum. Genet. 16, 1350–1357 (2008).
Georgi, B., Voight, B.F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013).
Li, B. & Leal, S.M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).
Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165 (2005).
Crosby, J. et al. Loss-of-function mutations in APOC3, triglycerides, and coronary disease. N. Engl. J. Med. 371, 22–31 (2014).
Jin, Y. et al. Risk of type 1 diabetes progression in islet autoantibody-positive children can be further stratified using expression patterns of multiple genes implicated in peripheral blood lymphocyte activation and function. Diabetes 63, 2506–2515 (2014).
Gizer, I.R. et al. Linkage scan of nicotine dependence in the University of California, San Francisco (UCSF) Family Alcoholism Study. Psychol. Med. 41, 799–808 (2011).
Barbaric, I., Miller, G. & Dear, T.N. Appearances can be deceiving: phenotypes of knockout mice. Brief. Funct. Genomic. Proteomic. 6, 91–103 (2007).
Schäffler, A. & Buechler, C. CTRP family: linking immunity to metabolism. Trends Endocrinol. Metab. 23, 194–204 (2012).
Sheridan, C. Phase 3 data for PCSK9 inhibitor wows. Nat. Biotechnol. 31, 1057–1058 (2013).
Morrison, A.C. et al. Whole-genome sequence-based analysis of high-density lipoprotein cholesterol. Nat. Genet. 45, 899–901 (2013).
ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am. J. Epidemiol. 129, 687–702 (1989).
Reid, J.G. et al. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics 15, 30 (2014).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Challis, D. et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 13, 8 (2012).
Grove, M.L. et al. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One 8, e68095 (2013).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Kelso, J. et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res. 13, 1222–1230 (2003).
Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).
Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Acknowledgements
The Atherosclerosis Risk in Communities (ARIC) study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C and HHSN268201100012C). We thank the staff and participants of the ARIC study for their important contributions. Funding support for “Building on GWAS for NHLBI-diseases: the U.S. CHARGE Consortium” was provided by the National Institutes of Health through the American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419). Sequencing was carried out at the Baylor College of Medicine Human Genome Sequencing Center (U54 HG003273).
Author information
Contributions
A.H.L. carried out variant quality control, annotation and data analysis. X.L. developed the OP ratio and analyses of this metric. A.H.L., A.C.M., L.M.P. and B.Y. did statistical analyses of quantitative traits. C.K., G.M., D.M. and N.V. ensured that high-quality sequence variants were delivered for analyses. L.A.C., J.A.B. and T.L. were involved with study design. T.H.M. coordinated clinical data collection and recruitment. R.A.G. and E.B. provided materials and project oversight. A.H.L., E.B., A.C.M., X.L., B.Y. and L.M.P. prepared the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Site-frequency spectrum of four categories of exome variation.
The relative proportions of these functional categories are shown binned by allele frequency. LCP, low-confidence or partial LOF criteria (Online Methods); Non-syn, nonsynonymous; Syn, synonymous.
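Supplementary Figure 1 reports, for each allele-frequency bin, the share of variants that fall into each functional category. A minimal Python sketch of that tabulation follows. The bin edges and labels are hypothetical, since the caption does not list the bins used, and the input is assumed to be a list of (category, allele frequency) pairs.

```python
from collections import Counter, defaultdict

# Hypothetical bin edges (upper bounds) and labels; not taken from the paper.
BIN_EDGES = [0.001, 0.01, 0.05, 0.5]
BIN_LABELS = ["<0.1%", "0.1-1%", "1-5%", "5-50%"]


def af_bin(freq):
    """Map an allele frequency to its bin label."""
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if freq < edge:
            return label
    return BIN_LABELS[-1]  # frequencies at or above the last edge


def category_proportions(variants):
    """variants: iterable of (category, freq) pairs, with category such as
    "LOF", "LCP", "Non-syn" or "Syn".

    Returns {bin_label: {category: fraction of that bin's variants}}.
    """
    counts = defaultdict(Counter)
    for category, freq in variants:
        counts[af_bin(freq)][category] += 1
    return {
        label: {cat: n / sum(c.values()) for cat, n in c.items()}
        for label, c in counts.items()
    }
```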
Supplementary Figure 2 OP ratio in stably expressed genes.
We used data from the eGenetics database to select the top 5% of genes expressed in the most tissues (“Universally-Expressed”).
Supplementary Figure 3 OP ratio of highly interacting genes.
We used data from the ConsensusPathDB database to select the top 5% of genes with the most protein interactions (“Highly-Interacting”).
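Supplementary Figures 2 and 3 both restrict the analysis to the top 5% of genes ranked by a count: the number of tissues with expression (eGenetics) or the number of protein interactions (ConsensusPathDB). The sketch below assumes those counts are already available as a gene-to-count mapping. Rounding the 5% cutoff to the nearest whole gene and breaking ties by input order are our assumptions.

```python
def top_fraction(scores, fraction=0.05):
    """Return the set of genes in the top `fraction` by score.

    scores: dict mapping gene -> count (e.g. tissues expressing the gene,
    or number of protein interaction partners).
    Ties at the cutoff are broken by input order, because sorted() is stable.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(scores) * fraction))  # keep at least one gene
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])
```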
Supplementary Figure 4 Number of LOF variants and genes carrying LOF variants with increasing sample size.
For each sample size n, we randomly chose n ARIC individuals and recorded the number of LOF variants and the number of genes harboring them. This process was repeated 1,000 times to calculate the average numbers of LOF variants and of genes carrying LOF variants for sample size n.
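The subsampling procedure behind Supplementary Figure 4 can be sketched in Python as follows. The input format is hypothetical: a mapping from each individual to the set of (gene, variant) LOF alleles that individual carries. Individuals are assumed to be drawn without replacement.

```python
import random
from statistics import mean


def lof_discovery_curve(carriers, sizes, n_reps=1000, seed=0):
    """Average numbers of LOF variants and LOF-carrying genes in random subsamples.

    carriers: dict sample_id -> set of (gene, variant_id) LOF alleles carried
    sizes: iterable of subsample sizes n
    Returns {n: (mean number of variants, mean number of genes)}.
    """
    rng = random.Random(seed)
    samples = list(carriers)
    curve = {}
    for n in sizes:
        n_variants, n_genes = [], []
        for _ in range(n_reps):
            chosen = rng.sample(samples, n)  # n individuals, without replacement
            seen = set().union(*(carriers[s] for s in chosen))
            n_variants.append(len(seen))
            n_genes.append(len({gene for gene, _ in seen}))
        curve[n] = (mean(n_variants), mean(n_genes))
    return curve
```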
Supplementary Figure 5 Quantile-quantile plots of P values from T5 associations with 20 phenotypes.
The 95% confidence intervals are depicted, and each circle represents one gene. Phenotype symbols are defined in Supplementary Table 2.
Supplementary Figure 6 Quantile-quantile plots of P values from T5 homozygous associations with 20 phenotypes.
The 95% confidence intervals are depicted, and each circle represents one gene. Phenotype symbols are defined in Supplementary Table 2.
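For the quantile-quantile plots in Supplementary Figures 5 and 6, the sketch below computes expected and observed -log10 P values together with a 95% band. The captions do not state how the confidence intervals were derived. Here the band is a Monte Carlo envelope built from repeatedly sorting n uniform null P values, which is one common choice, and the expected value for the i-th smallest P value is i/(n + 1).

```python
import math
import random


def qq_points(pvalues, n_sims=1000, level=0.95, seed=0):
    """QQ-plot coordinates on the -log10 scale with a simulated confidence band.

    pvalues: one P value per gene (all must be > 0).
    Returns (expected, observed, lower, upper) tuples, starting from the
    smallest observed P value.
    """
    n = len(pvalues)
    observed = sorted(pvalues)
    rng = random.Random(seed)
    # Simulate n null P values, uniform on (0, 1], n_sims times; sort each draw.
    sims = [sorted(1.0 - rng.random() for _ in range(n)) for _ in range(n_sims)]
    lo_idx = round((1 - level) / 2 * n_sims)
    hi_idx = min(n_sims - 1, round((1 + level) / 2 * n_sims))
    points = []
    for i in range(n):
        # Null distribution of the (i+1)-th smallest P value across simulations.
        rank_draws = sorted(sim[i] for sim in sims)
        expected = (i + 1) / (n + 1)  # mean of Beta(i+1, n-i)
        points.append((
            -math.log10(expected),
            -math.log10(observed[i]),
            -math.log10(rank_draws[hi_idx]),  # larger P gives the lower -log10 bound
            -math.log10(rank_draws[lo_idx]),  # smaller P gives the upper -log10 bound
        ))
    return points
```

Memory grows with `n_sims` times the number of genes, so for about 15,000 genes one might lower `n_sims` or use exact Beta(i, n - i + 1) quantiles from a statistics library (for example `scipy.stats.beta.ppf`) instead of simulation.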
Supplementary Figure 7 Relationship between OP ratio and RVIS for 15,053 genes.
The y-axis depicts the OP ratio, and the x-axis shows RVIS scores as described by Petrovski et al. [25]. Each circle represents one gene.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7 and Supplementary Tables 1, 2 and 5 (PDF 1147 kb)
Supplementary Table 3
Genotype and phenotype associations: T5 burden. This downloadable table presents all associations in which we detected a T5 association reaching nominal statistical significance (P < 0.05) in either the discovery or the replication stratum. (XLSX 9939 kb)
Supplementary Table 4
Genotype and phenotype associations: T5 burden homozygotes only. This downloadable table presents all associations in which we detected a T5 association reaching nominal statistical significance (P < 0.05) in either the discovery or the replication stratum. For phenotype comparison, LOF homozygotes were contrasted with LOF heterozygotes and non-LOF samples combined. (XLSX 656 kb)
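Supplementary Tables 3 and 4 differ in how LOF genotypes are coded: any LOF carrier versus non-carriers (T5 burden), or LOF homozygotes versus everyone else. The Python sketch below illustrates both codings for one gene and an unadjusted comparison of phenotype means. It assumes that T5 means collapsing LOF sites with minor allele frequency below 5% and that "homozygous" means two copies at a single qualifying site. It also uses a pooled-variance t statistic with a normal-approximation P value; covariate adjustment, if any was used, is omitted.

```python
import math
from statistics import NormalDist, mean, variance


def carrier_indicator(genotypes, mafs, mode="burden", threshold=0.05):
    """Collapse one gene's LOF genotypes into a 0/1 indicator per individual.

    genotypes: per-individual lists of LOF allele counts (0, 1 or 2) per site
    mafs: minor allele frequency of each site; only sites with maf < threshold
          qualify (assumed T5 rule)
    mode: "burden" -> 1 if any qualifying LOF allele is carried;
          "homozygous" -> 1 if two copies are carried at any qualifying site
    """
    if mode not in ("burden", "homozygous"):
        raise ValueError(f"unknown mode: {mode}")
    keep = [j for j, f in enumerate(mafs) if f < threshold]
    need = 1 if mode == "burden" else 2
    return [int(any(g[j] >= need for j in keep)) for g in genotypes]


def mean_difference_test(phenotype, indicator):
    """Unadjusted coded-1 vs coded-0 comparison of phenotype means.

    Requires at least two individuals in each group. Returns the mean
    difference and a two-sided normal-approximation P value.
    """
    x1 = [y for y, c in zip(phenotype, indicator) if c]
    x0 = [y for y, c in zip(phenotype, indicator) if not c]
    n1, n0 = len(x1), len(x0)
    pooled = ((n1 - 1) * variance(x1) + (n0 - 1) * variance(x0)) / (n1 + n0 - 2)
    diff = mean(x1) - mean(x0)
    t = diff / math.sqrt(pooled * (1 / n1 + 1 / n0))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, p
```

Under this sketch, a gene-phenotype pair would appear in Supplementary Table 3 or 4 when the returned P value is below 0.05 in either stratum.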
Supplementary Table 6
List of all loss-of-function sites in ARIC. This table includes hg19 coordinates of LOF sites, functional annotation based on RefSeq and allele frequency in 8,554 ARIC samples, including the numbers of homozygous and heterozygous individuals. These data are available as a downloadable table. (CSV 2204 kb)
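Supplementary Table 6 reports the numbers of heterozygous and homozygous individuals alongside each site's allele frequency. For diploid samples with complete genotype calls, these quantities are related as in the sketch below. The function name is hypothetical, and missing genotypes, which would shrink the denominator, are not handled.

```python
def allele_frequency(n_het, n_hom, n_samples):
    """LOF allele frequency at one biallelic site from genotype counts.

    Each heterozygote carries one LOF allele and each homozygote two,
    out of 2 * n_samples alleles (assumes diploid, fully genotyped samples).
    """
    if n_het + n_hom > n_samples:
        raise ValueError("more carriers than samples")
    return (n_het + 2 * n_hom) / (2 * n_samples)
```

For example, 10 heterozygotes and 1 homozygote among 8,554 individuals give 12/17,108, or about 0.07%.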
Cite this article
Li, A., Morrison, A., Kovar, C. et al. Analysis of loss-of-function variants and 20 risk factor phenotypes in 8,554 individuals identifies loci influencing chronic disease. Nat Genet 47, 640–642 (2015). https://doi.org/10.1038/ng.3270
DOI: https://doi.org/10.1038/ng.3270