A general framework for estimating the relative pathogenicity of human genetic variants

Kircher, Martin; Witten, Daniela M; Jain, Preti; O'Roak, Brian J; Cooper, Gregory M; Shendure, Jay

doi:10.1038/ng.2892

Technical Report
Published: 02 February 2014

Nature Genetics volume 46, pages 310–315 (2014)

Abstract

Current methods for annotating and interpreting human genetic variation tend to exploit a single information type (for example, conservation) and/or are restricted in scope (for example, to missense changes). Here we describe Combined Annotation–Dependent Depletion (CADD), a method for objectively integrating many diverse annotations into a single measure (C score) for each variant. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human-derived alleles from 14.7 million simulated variants. We precompute C scores for all 8.6 billion possible human single-nucleotide variants and enable scoring of short insertions-deletions. C scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects and complex trait associations, and they highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current single-annotation method.
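
The training setup described above can be illustrated with a toy example. This is a minimal sketch, not the CADD implementation. It assumes a linear SVM and swaps in Pegasos-style stochastic subgradient descent on the hinge loss for the optimized cutting-plane solver used in the original work. The two feature columns, the Gaussian toy data and the helper names (`make_toy_data`, `train_linear_svm`, `raw_score`) are hypothetical. Simulated variants are labeled +1 (proxy-deleterious) and observed human-derived alleles -1 (proxy-neutral), so a larger decision value plays the role of a raw C score.

```python
import random

def make_toy_data(n_per_class=200, seed=1):
    """Hypothetical 2-feature 'annotations': simulated variants (+1) centred
    at (2, 2), observed derived alleles (-1) centred at (-2, -2)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((+1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

def augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss (a stand-in,
    not the cutting-plane solver used for CADD)."""
    rng = random.Random(seed)
    data = [(augment(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def raw_score(w, x):
    """SVM decision value w . (x, 1); higher means more 'simulated-like'."""
    return sum(wj * xj for wj, xj in zip(w, augment(x)))

if __name__ == "__main__":
    X, y = make_toy_data()
    w = train_linear_svm(X, y)
    acc = sum((raw_score(w, x) > 0) == (yi > 0) for x, yi in zip(X, y)) / len(y)
    print(f"weights={w}, training accuracy={acc:.3f}")
```

The toy classes are deliberately well separated; real annotation data are far noisier, so the point of the sketch is only the labeling scheme and the use of the decision value as a score.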

**Figure 1: Relationship of scaled C scores and categorical variant consequences.**

**Figure 2: Relationship between scaled C scores and genetic variation.**

**Figure 3: Sensitivity of methods in distinguishing pathogenic and benign variants.**

**Figure 4: Ranking of pathogenic ClinVar variants among the variants identified by whole-genome sequencing in 11 human individuals from diverse populations.**

**Figure 5: C scores for GWAS SNPs are higher than for nearby control SNPs and are dependent on study sample size.**
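
The figures report scaled C scores. As commonly described for CADD, raw C scores are converted to a PHRED-like scale by ranking each variant among all possible SNVs (three alternative alleles at every reference position) and taking -10 log10(rank/N), so a scaled score of 10 marks the top 10%, 20 the top 1% and 30 the top 0.1%. The sketch below applies that transform to a small hypothetical list of raw scores; the function names are illustrative, and ties are broken by input order, which may differ from how the precomputed tables handle them. A second helper ranks one variant among the variants of a single genome, the kind of within-genome ranking summarized in Figure 4.

```python
import math

def phred_scale(raw_scores):
    """Map raw scores to -10*log10(rank/N), with rank 1 = highest raw score.
    Ties keep input order (Python's sort is stable even with reverse=True)."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scaled[i] = -10.0 * math.log10(rank / n)
    return scaled

def rank_in_genome(scores, index):
    """1-based rank of variant `index` among one genome's variants (1 = top)."""
    return 1 + sum(1 for s in scores if s > scores[index])

if __name__ == "__main__":
    raw = [i / 100 for i in range(1000)]  # hypothetical raw C scores
    scaled = phred_scale(raw)
    # Ranks 1, 10 and 100 out of 1,000 variants.
    print(round(scaled[999], 6), round(scaled[990], 6), round(scaled[900], 6))  # 30.0 20.0 10.0
    print(rank_in_genome([1.2, 25.0, 3.4, 18.7], 3))  # 2
```

Because the scale depends only on rank, it is comparable across model versions in a way that raw SVM outputs are not.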

Acknowledgements

We thank P. Green and members of the Shendure laboratory for helpful discussions and suggestions. Our work was supported by US NIH grants U54HG006493 (to J.S. and G.M.C.), DP5OD009145 (to D.M.W.) and DP1HG007811 (to J.S.).

Author information

Preti Jain & Brian J O'Roak
Present address: Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, Oregon, USA.
Martin Kircher and Daniela M Witten: These authors contributed equally to this work.

Authors and Affiliations

Department of Genome Sciences, University of Washington, Seattle, Washington, USA
Martin Kircher, Brian J O'Roak & Jay Shendure
Department of Biostatistics, University of Washington, Seattle, Washington, USA
Daniela M Witten
HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
Preti Jain & Gregory M Cooper

Contributions

G.M.C. and J.S. designed the study. M.K. processed the annotation data and scores and developed and implemented the simulator and scripts required for scoring. P.J. and B.J.O. prepared and provided data sets and annotations. D.M.W. and M.K. developed the model and performed model training. D.M.W. performed the analysis of individual features and interactions. M.K., D.M.W., G.M.C. and J.S. analyzed the model's performance on different data sets. G.M.C. analyzed the GWAS data. J.S., G.M.C., M.K. and D.M.W. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Gregory M Cooper or Jay Shendure.

Ethics declarations

Competing interests

The authors (M.K., D.M.W., G.M.C. and J.S.) have filed a provisional patent application with the US Patent and Trademark Office on the basis of CADD.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–18, Supplementary Tables 1–12 and Supplementary Note (PDF 4022 kb)

About this article

Cite this article

Kircher, M., Witten, D., Jain, P. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310–315 (2014). https://doi.org/10.1038/ng.2892

Received: 13 July 2013
Accepted: 13 January 2014
Published: 02 February 2014
Issue Date: March 2014
DOI: https://doi.org/10.1038/ng.2892

Source data

Source data to Fig. 1

Source data to Fig. 2

Source data to Fig. 3

Source data to Fig. 4

Source data to Fig. 5
