Abstract
The autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in reciprocal social interaction and communication, and the presence of restricted and repetitive behaviours1. Individuals with an ASD vary greatly in cognitive development, which can range from above average to intellectual disability2. Although ASDs are known to be highly heritable (∼90%)3, the underlying genetic determinants are still largely unknown. Here we analysed the genome-wide characteristics of rare (<1% frequency) copy number variation in ASD using dense genotyping arrays. When we compared 996 ASD individuals of European ancestry to 1,287 matched controls, cases carried a higher global burden of rare, genic copy number variants (CNVs) (1.19 fold, P = 0.012), especially for loci previously implicated in ASD and/or intellectual disability (1.69 fold, P = 3.4 × 10⁻⁴). Among the CNVs there were numerous de novo and inherited events, sometimes in combination in a given family, implicating many novel ASD genes such as SHANK2, SYNGAP1, DLGAP2 and the X-linked DDX53–PTCHD1 locus. We also discovered an enrichment of CNVs disrupting functional gene sets involved in cellular proliferation, projection and motility, and GTPase/Ras signalling. Our results reveal many new genetic and functional targets in ASD that may lead to final connected pathways.
Main
Twin and family studies indicate a predominantly genetic basis for ASD susceptibility and provide support for considering these disorders as a clinical spectrum. Some 5–15% of individuals with an ASD have an identifiable genetic aetiology corresponding to known rare single-gene disorders (for example, fragile X syndrome) and chromosomal rearrangements (for example, maternal duplication of 15q11-q13). Rare mutations have been identified in synaptic genes, including NLGN3, NLGN4X (ref. 4) and SHANK3 (ref. 5), and microarray studies have revealed copy number variation (CNV) as risk factors6. CNV examples include de novo events observed in 5–10% of ASD cases7,8,9, de novo or inherited hemizygous deletions and duplications of 16p11.2 (refs 9–11) and NRXN1 (ref. 7), and exceptionally rare homozygous deletions in consanguineous families12. Genome-wide association studies using single nucleotide polymorphisms (SNPs) have highlighted two potential ASD risk loci at 5p14.1 (ref. 13) and 5p15.2 (ref. 14), but these data indicate that common variation will account for only a small proportion of the heritability in ASD.
To delineate further the contribution of rare genomic variants to autism we genotyped 1,275 ASD cases and their parents using the Illumina Infinium 1M single SNP microarray (Fig. 1). A set of 1,981 controls used for comparison studies was genotyped on the same platform15 and both data sets were subjected to the same quality control procedures. Ultimately, we analysed 996 ASD cases (876 trios) and 1,287 controls of European ancestry to minimize confounds due to population differences (Supplementary Figs 1 and 2 and Supplementary Table 1)16.
Figure 1: Comprehensive procedures were used to identify the rare CNV data set (boxed). Dashed arrows indicate CNVs not included in downstream analyses. Labels a–f are as follows: a, SNP and intensity quality control (QC) with ancestry estimation; b, QC for CNV calls; c, pilot validation experiments using quantitative PCR were used to evaluate the false discovery rate; d, rare CNVs in samples of European ancestry were defined as ≥30 kb in size and present in the total sample set at a frequency <1%. A total of 170 out of 996 (17%) of ASD cases were analysed on different lower-resolution arrays in previous studies9,10,28. Label e indicates that all CNVs were computationally verified and at least 40% of case CNVs were also experimentally validated by qPCR and/or independent Agilent or other SNP microarrays; f, 3,677 additional European ancestry controls were used to test specific loci from the primary burden analyses. Additional details are in the Methods and Supplementary Information. ID, intellectual disability.
Two CNV prediction algorithms (QuantiSNP17 and iPattern (unpublished data)) and additional extensive quality control procedures were used to establish a stringent data set of non-redundant CNVs called by both algorithms in an individual (Fig. 1, Supplementary Tables 1–3 and Supplementary Fig. 3). This stringent data set of 5,478 rare CNVs in 996 cases and 1,287 controls of European ancestry (Supplementary Table 4) had the following characteristics: (1) CNV present at <1% frequency in the total sample (cases and controls); (2) CNV ≥30 kb in size (because >95% of these could be confirmed); and (3) all CNVs further verified using combined evidence from the PennCNV algorithm18 and child–parent intensity fold changes, genotype proportions (to verify deletions) and visual inspection (for chromosome X).
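The first two inclusion criteria, together with the requirement that both algorithms report the call, can be sketched as a simple filter. This is a minimal illustration only, not the pipeline used in the study: the `CNV` record, the `is_stringent_rare` function and the way `carriers_at_locus` is counted are all hypothetical, and criterion (3) (PennCNV and intensity-based verification) is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CNV:
    """Hypothetical record for one CNV call in one individual."""
    sample_id: str
    chrom: str
    start: int            # base-pair coordinates, inclusive
    end: int
    callers: frozenset    # names of the algorithms that reported this call

MIN_SIZE_BP = 30_000      # criterion (2): CNV >= 30 kb
MAX_FREQUENCY = 0.01      # criterion (1): present in < 1% of all samples
REQUIRED_CALLERS = frozenset({"QuantiSNP", "iPattern"})

def is_stringent_rare(cnv, carriers_at_locus, total_samples):
    """Return True when a call meets the size, caller-agreement and rarity criteria.

    carriers_at_locus is the number of samples (cases plus controls) carrying a
    CNV at the same locus; how "same locus" is decided is an assumption left
    to the caller.
    """
    large_enough = (cnv.end - cnv.start + 1) >= MIN_SIZE_BP
    called_by_both = REQUIRED_CALLERS <= cnv.callers
    rare = carriers_at_locus / total_samples < MAX_FREQUENCY
    return large_enough and called_by_both and rare

# Usage with invented coordinates: a 45 kb call seen in 3 of 2,283 samples.
total_samples = 996 + 1287
example = CNV("sample_1", "chr2", 1_000_000, 1_044_999, REQUIRED_CALLERS)
print(is_stringent_rare(example, carriers_at_locus=3, total_samples=total_samples))
```

With 2,283 samples in total, the <1% threshold corresponds to at most 22 carriers at a locus.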
We assessed the impact of rare CNV in cases compared to controls using three primary measures of CNV burden: the number of CNVs per individual, the estimated CNV size, and the number of genes affected by CNVs (Table 1). No significant difference was found in the first two measures (Supplementary Tables 4a and 5), even after controlling for fine-level ancestry differences by pair-matching cases and controls (Supplementary Information)16. In contrast, we discovered a significant increase in the number of genes intersected by rare CNV in cases when focusing on gene-containing segments (1.19-fold increase, empirical P = 0.012). This ASD association with genic CNV was stronger for deletions (1.26-fold increase, empirical P = 8.0 × 10⁻³). These differences remained after we further controlled for potential case–control differences that could be present due to biological differences or technical biases. Restricting our analysis to autosomal CNVs (that is, after removing CNVs located on chromosome X) also resulted in a consistent enriched gene count in ASD cases compared to controls. Single-occurrence CNV deletions had increased rates in ASD cases over controls, indicating that some could be pathogenic.
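An empirical P-value for a burden measure such as the per-individual gene count can be obtained by shuffling case and control labels. The sketch below shows one generic way to do this; it is an assumption about the general approach, not a reproduction of the PLINK procedure used in the study, and the function name and toy counts are hypothetical.

```python
import random

def burden_permutation_test(case_counts, control_counts, n_perm=10_000, seed=0):
    """One-sided label-permutation test on the mean number of genes hit per person.

    Returns (fold_change, empirical_p). The test statistic is the difference in
    means; the fold change is reported for convenience and is None when the
    control mean is zero.
    """
    rng = random.Random(seed)
    n_cases, n_controls = len(case_counts), len(control_counts)
    case_mean = sum(case_counts) / n_cases
    control_mean = sum(control_counts) / n_controls
    observed = case_mean - control_mean
    fold_change = case_mean / control_mean if control_mean else None

    pooled = list(case_counts) + list(control_counts)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        perm_diff = (sum(pooled[:n_cases]) / n_cases
                     - sum(pooled[n_cases:]) / n_controls)
        if perm_diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    empirical_p = (at_least_as_extreme + 1) / (n_perm + 1)
    return fold_change, empirical_p

# Toy usage with invented gene counts per individual.
fold, p = burden_permutation_test([3, 4, 5, 6], [1, 2, 1, 2], n_perm=2000)
print(fold, p)
```

The add-one correction keeps the estimate conservative when few permutations are run; real analyses would also need to handle covariates such as ancestry, which this sketch ignores.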
We then examined parent–child transmission and confirmed that 5.7% (50 out of 876) of ASD cases had at least one de novo CNV with >0.6% carrying two or more de novo events (Supplementary Tables 4a, 6 and 7). The de novo CNV rate in our simplex and multiplex families was 5.6% (22 out of 393) and 5.5% (19 out of 348), respectively, in contrast with previous studies showing a higher rate in simplex families8,9. A total of 226 validated de novo (7) and inherited (219) CNVs not observed in controls and affecting single genes were found (Supplementary Table 8).
Numerous novel candidate ASD loci such as SHANK2, SYNGAP1 and DLGAP2 were identified on the basis of the observation that de novo CNV affects these genes in cases but not controls (Supplementary Table 6). The relatedness of SHANK2 to the causal ASD gene SHANK3 (ref. 5), involvement of SYNGAP1 in intellectual disability19, and interaction of DLGAP family proteins with SHANK proteins20 further support their role in ASDs. Maternally inherited X-linked deletions at DDX53–PTCHD1 (7 cases) implicate this locus in ASD. We tested an additional 3,677 European ancestry controls (Fig. 1) and again found no CNV at these genes, and DDX53–PTCHD1 emerged as a significant ASD risk factor (P = 3.1 × 10⁻³ with the initial 1,287 controls; P = 3.6 × 10⁻⁶ with combined controls; Supplementary Fig. 4).
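The effect of adding controls on a single-locus test can be illustrated with a one-sided Fisher's exact test on a 2 × 2 table of carriers and non-carriers. This sketch assumes that the 7 carriers are counted among all 996 cases and that no control carries the deletion; the exact test configuration behind the reported P-values is not stated in this section, and the function name is hypothetical.

```python
from math import comb

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """Probability of seeing at least `case_carriers` carriers among cases
    under the hypergeometric null of no case-control difference."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, carriers)
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator

# Assumed counts: 7 carriers in 996 cases, none in controls.
p_initial = fisher_one_sided(7, 996, 0, 1287)
p_combined = fisher_one_sided(7, 996, 0, 1287 + 3677)
print(f"initial controls:  {p_initial:.2e}")
print(f"combined controls: {p_combined:.2e}")
```

Under these assumptions the two values fall in the same range as the reported P-values, and the comparison shows how a larger control set with no carriers sharpens the evidence for a rare locus.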
Association studies of individual rare CNV often have insufficient power to discriminate benign from disease-causing variants. Here, we assessed whether genes and CNVs previously associated with ASD and/or intellectual disability were enriched in cases compared with controls, in order to help identify pathogenic events. We defined three gene lists based on evidence from previous studies of their involvement in ASDs (Supplementary Table 9): (1) ‘ASD implicated’ list consisting of 36 disease genes and 10 loci strongly implicated in ASD and identified in subjects with ASD or ASD and intellectual disability; (2) ‘intellectual disability’ consisting of 110 disease genes and 17 loci implicated in intellectual disability but not yet in ASD; and (3) ‘ASD candidates’ including 103 genes from previous studies of common and rare variants.
We observed a higher proportion of cases with rare CNVs overlapping ‘ASD implicated’ disease genes compared to controls (4.3% versus 2.3%, Fisher exact test P = 5.4 × 10⁻³; Fig. 2a), corresponding to a significant enrichment for genes in this set (odds ratio (OR) = 1.8; 95% confidence interval (CI) 1.3–2.6, empirical P = 2.6 × 10⁻³; Fig. 2b, see also Supplementary Information). This effect was stronger for duplications, which may also disrupt genes (OR = 2.3; 95% CI 1.4–3.8, empirical P = 9.4 × 10⁻⁴). Enrichment was also found for rare CNVs overlapping intellectual disability genes, more notably for deletions (OR = 2.1; 95% CI 1.1–4.2, empirical P = 0.053). In contrast, there was no evidence of enrichment among case CNVs compared to control CNVs for genes in the ASD candidates set (empirical P > 0.3). When the two disease gene sets ‘ASD implicated’ and ‘intellectual disability’ were combined, we observed 7.6% of cases with rare CNVs preferentially affecting ASD/intellectual disability genes compared to 4.5% in controls (Fisher exact test P = 1.2 × 10⁻³; Fig. 2a). The observed enrichments did not change when potential case–control genome-wide differences for CNV rate and size were considered.
Figure 2: a, Proportion of samples with CNVs overlapping genes and loci known to be associated with ASD with or without intellectual disability (ID) or intellectual disability only, as well as published candidate genes and loci for ASD (Supplementary Table 9). To select for CNVs with maximal impact, they needed to intersect genes and overlap the target loci by ≥50% of their length. Fisher’s exact test P-values for significant differences (P ≤ 0.05, one tailed) are shown. NS, not significant. b, Enrichment analysis for genes overlapped by rare CNVs in cases compared to controls for the three gene sets in a, relative to the whole genome. Odds ratio and 95% confidence intervals are given for each gene set. Empirical P-values for gene-set enrichment are indicated above each odds ratio. All P-values <0.1 are listed.
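The ≥50% overlap rule in the legend can be sketched as an interval calculation. The sketch assumes that "their length" refers to the length of the target locus and that coordinates are half-open base-pair intervals on the same chromosome; both are assumptions, and the function names are hypothetical. The separate requirement that the CNV intersect a gene is not modelled.

```python
def overlap_fraction(cnv_start, cnv_end, locus_start, locus_end):
    """Fraction of the target locus covered by the CNV.

    Intervals are half-open [start, end) and assumed to lie on the same
    chromosome.
    """
    locus_length = locus_end - locus_start
    if locus_length <= 0:
        raise ValueError("locus must have positive length")
    overlap = min(cnv_end, locus_end) - max(cnv_start, locus_start)
    return max(overlap, 0) / locus_length

def meets_overlap_rule(cnv_start, cnv_end, locus_start, locus_end, minimum=0.5):
    """True when the CNV covers at least `minimum` of the target locus."""
    return overlap_fraction(cnv_start, cnv_end, locus_start, locus_end) >= minimum

# Invented coordinates: the CNV covers exactly half of the locus.
print(overlap_fraction(100, 200, 150, 250))
print(meets_overlap_rule(100, 200, 150, 250))
```

Measuring overlap relative to the locus rather than the CNV means that a large CNV engulfing a small locus always qualifies, whereas a small CNV clipping the edge of a large locus does not.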
Our global analyses of these putative pathogenic loci use subjective boundaries for CNV overlap. Manual inspection of the data yields more accurate results. After eliminating CNVs that are less likely to have an aetiological role (heterozygous CNVs that disrupt autosomal recessive loci, events outside the critical region of overlap of genomic disorders, X-linked genes in females inherited from non-ASD fathers, duplications inherited from non-ASD parents, and intronic CNVs in NRXN1), 25 CNVs remained in the ASD group, compared to only four in the controls (P = 3.6 × 10⁻⁶; Supplementary Table 10). Moreover, the latter four CNVs were all duplications at 1q21.1, 16p11.2 or 22q11.2, loci known to exhibit incomplete penetrance and variable expressivity6. The population attributable risk provided by the combination of all ASD CNVs that overlap ASD and/or intellectual disability genes is estimated to be 3.3% (Supplementary Table 11). We also identified rare de novo chromosomal abnormalities and large CNVs likely to be aetiological (Supplementary Table 10).
We then tested for functional enrichment of gene sets among those genes affected by CNVs to identify biological processes involved in ASD (Fig. 3). Here, the term gene set refers to groups of genes that share a common function or operate in the same pathway. Such a functional enrichment mapping approach can combine single-gene effects into biologically meaningful groups21.
Figure 3: Enrichment results were mapped as a network of gene sets (nodes) related by mutual overlap (edges), where the colour (red, blue or yellow) indicates the class of gene set. Node size is proportional to the total number of genes in each set and edge thickness represents the number of overlapping genes between sets. a, Gene sets enriched for deletions are shown (red) with enrichment significance (FDR q-value) represented as a node colour gradient. Groups of functionally related gene sets are circled and labelled (groups, filled green circles; subgroups, dashed line). b, An expanded enrichment map shows the relationship between gene sets enriched in deletions (a) and sets of known ASD/intellectual disability genes. Node colour hue represents the class of gene set (that is, enriched in deletions, red; known disease genes (ASD and/or intellectual disability (ID) genes), blue; enriched only in disease genes, yellow). Edge colour represents the overlap between gene sets enriched in deletions (green), from disease genes to enriched sets (blue), and between sets enriched in deletions and in disease genes or between disease gene-sets only (orange). The major functional groups are highlighted by filled circles (enriched in deletions, green; enriched in ASD/intellectual disability, blue).
We compiled comprehensive collections of gene sets (Supplementary Table 12) and used Fisher’s exact test to assess which gene sets were more frequently affected by rare CNV events in ASD cases compared to controls. An estimate of the false-discovery rate (FDR) at each gene set was obtained by random permutation of case and control labels (Supplementary Information). To visualize enriched gene sets, overlap scores were used to organize these sets graphically into a functional enrichment map (or network) using Cytoscape22. We identified the ‘seed’ gene sets for the network at an FDR q-value of 5% and further relaxed the threshold to 12.5% to better capture the network topology23.
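The combination of a per-gene-set Fisher's exact test with a label-permutation estimate of the FDR can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: the data layout, the function names and the particular FDR estimator (mean number of null sets passing a threshold divided by the number observed) are all assumptions.

```python
import random
from math import comb

def fisher_greater(a, n_cases, c, n_controls):
    """One-sided Fisher's exact P for a case carriers versus c control carriers."""
    carriers = a + c
    denominator = comb(n_cases + n_controls, carriers)
    numerator = sum(comb(n_cases, k) * comb(n_controls, carriers - k)
                    for k in range(a, min(carriers, n_cases) + 1))
    return numerator / denominator

def gene_set_pvalues(hit_genes, is_case, gene_sets):
    """hit_genes: one set of CNV-affected genes per sample.
    is_case: one bool per sample. gene_sets: dict of name -> set of genes."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    pvalues = {}
    for name, genes in gene_sets.items():
        # A sample counts as a carrier if any of its hit genes is in the set.
        a = sum(1 for g, case in zip(hit_genes, is_case) if case and g & genes)
        c = sum(1 for g, case in zip(hit_genes, is_case) if not case and g & genes)
        pvalues[name] = fisher_greater(a, n_cases, c, n_controls)
    return pvalues

def permutation_fdr(hit_genes, is_case, gene_sets, threshold, n_perm=200, seed=0):
    """Estimate the FDR among gene sets with observed P <= threshold.

    Returns (observed_pvalues, fdr); fdr is None when no set passes.
    """
    rng = random.Random(seed)
    observed = gene_set_pvalues(hit_genes, is_case, gene_sets)
    n_observed = sum(p <= threshold for p in observed.values())
    if n_observed == 0:
        return observed, None
    labels = list(is_case)
    null_passing = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between labels and CNV content
        null = gene_set_pvalues(hit_genes, labels, gene_sets)
        null_passing += sum(p <= threshold for p in null.values())
    fdr = min(1.0, (null_passing / n_perm) / n_observed)
    return observed, fdr

# Toy usage with invented genes: six cases and six controls.
samples = [{"A"}, {"B"}, {"A", "X"}, {"A"}, {"B"}, set(),
           {"X"}, set(), set(), set(), set(), set()]
labels = [True] * 6 + [False] * 6
sets = {"toy_ras": {"A", "B"}, "toy_other": {"X"}}
pvals, fdr = permutation_fdr(samples, labels, sets, threshold=0.05)
print(pvals, fdr)
```

Permuting labels rather than genes preserves the correlation between overlapping gene sets, which matters when many of the sets tested share members.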
Using these criteria only deletions were found to be significantly enriched in gene sets in cases over controls (Supplementary Fig. 5), consistent with the global burden results (Table 1). Specifically, 76 gene sets affected by deletions (2.18% of sets tested) were found to be enriched and used to construct a functional map (Fig. 3a and Supplementary Figs 6 and 7). We tested for possible bias, including measures of CNV size and number for cases versus controls per gene set, as well as genome proximity, but no differences were found that might explain the observed enrichments (Supplementary Figs 8 and 9).
We identified enrichments in gene sets known to be involved in ASDs and also discovered new candidate ASD pathways (Fig. 3a and Supplementary Table 13). For example, gene sets involved in cell and neuronal development and function (including projection, motility and proliferation) previously reported in ASD-associated phenotypes were identified24. Novel observations included gene sets involved in GTPase/Ras signalling, with component Rho GTPases known to be involved in regulating dendrite and spine plasticity and associated with intellectual disability. We also found a tentative link to sets in the kinase activity/regulation functional group, where only a minority of these sets meet a stringent 5% FDR q-value threshold (Supplementary Fig. 10).
We further assessed the relationship of our functional enrichment map with known ASD/intellectual disability genes (Fig. 3b and Supplementary Fig. 11) and found genes enriched in sets linked to microtubule cytoskeleton, glycosylation and CNS development/adhesion25. The two groups of genes found to be enriched in deletions (Fig. 3a) also displayed connectivity to the ASD/intellectual disability disease gene sets, either directly or through intermediates (Fig. 3b and Supplementary Fig. 12). Although ASD genes seem to be enriched in different subsets of genes compared to intellectual-disability-only genes, we cannot discount the possibility that this is the result of selection bias, and we expect that more intellectual disability genes may yet be linked to ASD.
Our findings provide strong support for the involvement of multiple rare genic CNVs, both genome-wide and at specific loci, in ASD. These findings, similar to those recently described in schizophrenia26, suggest that at least some of these ASD CNVs (and the genes that they affect) are under purifying selection27. Genes previously implicated in ASD by rare variant findings have pointed to functional themes in ASD pathophysiology6,28. Molecules such as NRXN1, NLGN3/4X and SHANK3, localized presynaptically or at the post-synaptic density (PSD), highlight maturation and function of glutamatergic synapses. Our data reveal that SHANK2, SYNGAP1 and DLGAP2 are new ASD loci that also encode proteins in the PSD. We also found intellectual disability genes to be important in ASD29. Furthermore, our functional enrichment map identifies new groups such as GTPase/Ras, effectively expanding both the number and connectivity of modules that may be involved in ASD. The next step will be to relate defects or patterns of alterations in these groups to ASD endophenotypes. The combined identification of higher-penetrance rare variants and new biological pathways, including those identified in this study, may broaden the targets amenable to genetic testing and therapeutic intervention.
Methods Summary
Raw data from ASD family (accession phs000267.v1.p1) and SAGE control (accession phs000092.v1.p1) genotyping are available at NCBI dbGaP. CNVs were analysed using PLINK v1.07 (ref. 30), R stats and custom scripts. See Supplementary Information for details. A list of all CNVs passing quality control is available in Supplementary Table 8.
References
Veenstra-Vanderweele, J., Christian, S. L. & Cook, E. H. Jr. Autism as a paradigmatic complex genetic disorder. Annu. Rev. Genomics Hum. Genet. 5, 379–405 (2004)
Chakrabarti, S. & Fombonne, E. Pervasive developmental disorders in preschool children: confirmation of high prevalence. Am. J. Psychiatry 162, 1133–1141 (2005)
Bailey, A. et al. Autism as a strongly genetic disorder: evidence from a British twin study. Psychol. Med. 25, 63–77 (1995)
Jamain, S. et al. Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4 are associated with autism. Nature Genet. 34, 27–29 (2003)
Durand, C. M. et al. Mutations in the gene encoding the synaptic scaffolding protein SHANK3 are associated with autism spectrum disorders. Nature Genet. 39, 25–27 (2007)
Cook, E. H. & Scherer, S. W. Copy-number variations associated with neuropsychiatric conditions. Nature 455, 919–923 (2008)
Szatmari, P. et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nature Genet. 39, 319–328 (2007)
Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007)
Marshall, C. R. et al. Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet. 82, 477–488 (2008)
Weiss, L. A. et al. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675 (2008)
Kumar, R. A. et al. Recurrent 16p11.2 microdeletions in autism. Hum. Mol. Genet. 17, 628–638 (2008)
Morrow, E. M. et al. Identifying autism loci and genes by tracing recent shared ancestry. Science 321, 218–223 (2008)
Wang, K. et al. Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature 459, 528–533 (2009)
Weiss, L. A., Arking, D. E., Daly, M. J. & Chakravarti, A. A genome-wide linkage and association scan reveals novel loci for autism. Nature 461, 802–808 (2009)
Bierut, L. J. et al. A genome-wide association study of alcohol dependence. Proc. Natl Acad. Sci. USA 107, 5082–5087 (2010)
Lee, A. B., Luca, D., Klei, L., Devlin, B. & Roeder, K. Discovering genetic ancestry using spectral graph theory. Genet. Epidemiol. 34, 51–59 (2010)
Colella, S. et al. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007)
Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007)
Hamdan, F. F. et al. Mutations in SYNGAP1 in autosomal nonsyndromic mental retardation. N. Engl. J. Med. 360, 599–605 (2009)
Romorini, S. et al. A functional role of postsynaptic density-95-guanylate kinase-associated protein complex in regulating Shank assembly and stability to synapses. J. Neurosci. 24, 9391–9404 (2004)
O’Dushlaine, C. et al. Molecular pathways involved in neuronal cell adhesion and membrane scaffolding contribute to schizophrenia and bipolar disorder susceptibility. Mol. Psychiatry 10.1038/mp.2010.7 (16 February 2010)
Cline, M. S. et al. Integration of biological networks and gene expression data using Cytoscape. Nature Protocols 2, 2366–2382 (2007)
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)
Meechan, D. W., Tucker, E. S., Maynard, T. M. & LaMantia, A. S. Diminished dosage of 22q11 genes disrupts neurogenesis and cortical development in a mouse model of 22q11 deletion/DiGeorge syndrome. Proc. Natl Acad. Sci. USA 106, 16434–16445 (2009)
Wegiel, J. et al. The neuropathology of autism: defects of neurogenesis and neuronal migration, and dysplastic changes. Acta Neuropathol. 10.1007/s00401-010-0655-4 (3 March 2010)
International Schizophrenia Consortium. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 455, 237–241 (2008)
Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010)
Glessner, J. T. et al. Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459, 569–573 (2009)
Skuse, D. H. Rethinking the nature of genetic vulnerability to autistic spectrum disorders. Trends Genet. 23, 387–395 (2007)
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007)
Acknowledgements
The authors acknowledge the families participating in the study and the main funders of the Autism Genome Project Consortium (AGP): Autism Speaks (USA), the Health Research Board (HRB; Ireland), The Medical Research Council (MRC; UK), Genome Canada/Ontario Genomics Institute, and the Hilibrand Foundation (USA). Additional support for individual groups was provided by the US National Institutes of Health (NIH grants HD055751, HD055782, HD055784, HD35465, MH52708, MH55284, MH57881, MH061009, MH06359, MH066673, MH080647, MH081754, MH66766, NS026630, NS042165, NS049261), the Canadian Institute for Advanced Research (CIFAR), the Canadian Institutes for Health Research (CIHR), Assistance Publique–Hôpitaux de Paris (France), Autistica, Canada Foundation for Innovation/Ontario Innovation Trust, Deutsche Forschungsgemeinschaft (grant Po 255/17-4) (Germany), EC Sixth FP AUTISM MOLGEN, Fundação Calouste Gulbenkian (Portugal), Fondation de France, Fondation FondaMental (France), Fondation Orange (France), Fondation pour la Recherche Médicale (France), Fundação para a Ciência e Tecnologia (Portugal), the Hospital for Sick Children Foundation and University of Toronto (Canada), INSERM (France), Institut Pasteur (France), the Italian Ministry of Health (convention 181 of 19.10.2001), the John P Hussman Foundation (USA), McLaughlin Centre (Canada), Ontario Ministry of Research and Innovation (Canada), the Seaver Foundation (USA), the Swedish Science Council, The Centre for Applied Genomics (Canada), the Utah Autism Foundation (USA) and the Wellcome Trust core award 075491/Z/04 (UK). D.P. is supported by fellowships from the Royal Netherlands Academy of Arts and Sciences (TMF/DA/5801) and the Netherlands Organization for Scientific Research (Rubicon 825.06.031). S.W.S. holds the GlaxoSmithKline-CIHR Pathfinder Chair in Genetics and Genomics at the University of Toronto and the Hospital for Sick Children (Canada).
Author information
Contributions
D.P., J.D.B., R.M.C., E.H.C., H.C., M.C., B.D., S.E., L.G., D.H.G., M.G., J.L.H., J.H., J.M., A.P.M., J.I.N., A.D.P., M.A.P.-V., G.D.S., P.S., A.M.V., V.J.V., E.M.W., J.S.S., C.B. and S.W.S. were leading contributors in the design, analysis and writing of this study. A.J.B., A.B., G.D., C.M.F., H.H., S.M.K., E.M., S.F.N., G.O., J.P., T.H.W., J.D.B., R.M.C., E.H.C., H.C., B.D., S.E., L.G., D.H.G., M.G., J.L.H., J.H., A.P.M., J.I.N., A.D.P., M.A.P.-V., G.D.S., P.S., A.M.V., V.J.V., E.M.W., S.W.S., J.S.S. and C.B. are Lead Autism Genome Project Consortium (AGP) investigators who contributed equally to this project. All other authors were either involved in phenotype and clinical assessments or have participated in experiments and analysis.
Ethics declarations
Competing interests
L.J. Bierut and J.P. Rice are inventors on the patent “Markers for Addiction” (US 20070258898) covering the use of certain SNPs in determining the diagnosis, prognosis and treatment of addiction. L.J. Bierut served as a consultant for Pfizer Inc. in 2008.
Supplementary information
Supplementary Information
This file contains Supplementary Information comprising: Autism spectrum disorder (ASD) sample and control collections; Genotyping and data cleaning; CNV detection and quality control evaluation; CNV verification; Rare CNV burden analysis; Gene-set enrichment and functional map; Supplementary Figures 1-12 with legends, Supplementary Tables 1-13 (see separate files for tables 8 and 13), Acknowledgements and References. (PDF 3971 kb)
Supplementary Table 8
This table contains rare CNVs in 996 ASD cases. (XLS 699 kb)
Supplementary Table 13
This table contains a list of gene-sets enriched for deletions. (XLS 94 kb)
About this article
Cite this article
Pinto, D., Pagnamenta, A., Klei, L. et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466, 368–372 (2010). https://doi.org/10.1038/nature09146
DOI: https://doi.org/10.1038/nature09146