Abstract
Thousands of genomic segments appear to be present in widely varying copy numbers in different human genomes. We developed ways to use increasingly abundant whole-genome sequence data to identify the copy numbers, alleles and haplotypes present at most large multiallelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5-kb) mCNVs, including 3,878 duplications, of which 1,356 appear to have 3 or more segregating alleles. We find that mCNVs give rise to most human variation in gene dosage—seven times the combined contribution of deletions and biallelic duplications—and that this variation in gene dosage generates abundant variation in gene expression. We describe 'runaway duplication haplotypes' in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We also describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.
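As a rough illustration of how sequence data can inform copy number, the sketch below converts read depth over a segment into an integer diploid copy number. It is a generic read-depth heuristic and not the authors' method; the function name, the `ploidy` parameter and the simple rounding rule are assumptions made for this example only.

```python
from statistics import median


def estimate_copy_number(segment_depth, reference_depths, ploidy=2):
    """Estimate integer copy number for one segment in one genome.

    segment_depth: mean read depth over the candidate CNV segment.
    reference_depths: read depths over regions assumed to be at normal
        copy number in the same genome (hypothetical input).
    ploidy: copy number assumed for the reference regions.
    """
    baseline = median(reference_depths)
    if baseline <= 0:
        raise ValueError("reference depth must be positive")
    # Depth scales roughly linearly with copy number, so normalize by the
    # baseline and round to the nearest integer.
    return round(ploidy * segment_depth / baseline)


if __name__ == "__main__":
    reference = [29, 30, 31, 30]
    for depth in (14, 45, 76):
        print(depth, estimate_copy_number(depth, reference))
```

Real data are noisier than this: depth varies with GC content and mappability, so simple rounding becomes unreliable at high copy numbers, where adjacent integer states are hard to separate.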
Acknowledgements
We thank D. Skvortsov, M. Thornton, N. Klitgord and B. Zhang for contributions to ddPCR assay design and validation. We also thank members of the 1000 Genomes Project for helpful conversations about analysis methods. This work was supported by a grant from the National Human Genome Research Institute (NHGRI; R01 HG006855). An additional grant from NHGRI (U01 HG006510) is supporting follow-on work to develop these methods into production-ready software that can be used by any research laboratory.
Author information
Contributions
R.E.H. and S.A.M. designed the study. R.E.H. devised the computational approaches, performed the analysis and wrote the Genome STRiP software. V.V.D. performed the ddPCR experiments and initial data analyses. J.R.B. designed assays for the ddPCR experiments and provided technical guidance and materials. G.G. contributed to the statistical analyses of gene dosage and dispersed duplications. S.K. helped automate and refine the algorithms and produce the public software release. L.M.B. contributed to the analysis of the HPR locus. S.A.M. and R.E.H. interpreted the data and wrote the manuscript.
Ethics declarations
Competing interests
J.R.B. is an employee of Bio-Rad Laboratories, Inc.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–11, Supplementary Tables 1–13 and Supplementary Note. (PDF 14119 kb)
About this article
Cite this article
Handsaker, R., Van Doren, V., Berman, J. et al. Large multiallelic copy number variations in humans. Nat Genet 47, 296–303 (2015). https://doi.org/10.1038/ng.3200
DOI: https://doi.org/10.1038/ng.3200