Abstract
Autism Spectrum Disorder (ASD) is genetically complex, but specific copy number variants (CNVs; e.g., 1q21.1, 16p11.2) and genes (e.g., NRXN1, NLGN4) have been identified as penetrant susceptibility factors, and all of these demonstrate pleiotropy. Many ASD-associated CNVs are, in fact, genomic disorder loci where flanking segmental duplications lead to recurrent deletion and duplication events of the same region in unrelated individuals, but these lesions are large and involve multiple genes. To identify opportunities to establish a more specific genotype-phenotype correlation in ASD, we searched genomic data and the literature for recurrent, predicted-damaging sequence-level variants affecting single genes. We identified 17 individuals from 15 unrelated families carrying a heterozygous guanine duplication (rs797044936; NM_033517.1; c.3679dup; p.Ala1227Glyfs*69) occurring within a string of 8 guanines (at genomic location [hg38] g.50,721,512dup) affecting SHANK3, a prototypical ASD gene (6/7,521 or 0.08% of ASD-affected individuals studied by whole genome sequencing carried the p.Ala1227Glyfs*69 variant). This variant, which is predicted to cause a frameshift leading to a premature stop codon truncating the C-terminal region of the corresponding protein, was not reproducibly found in any of the control groups we analyzed. All probands identified carried de novo mutations, with the exception of five individuals in three families who inherited the variant through somatic mosaicism. This same heterozygous variant in published mouse models leads to an ASD-like phenotype. We scrutinized the phenotype of p.Ala1227Glyfs*69 carriers, and while everyone formally tested for ASD (16/16) carried a diagnosis, there was variable expression of core ASD features both within and between families, underscoring the impact of as yet unknown modifiable factors affecting expressivity in autism.
Competing Interest Statement
S.W.S. is on the Scientific Advisory Committees of Deep Genomics and Population Bio, and is an Academic Consultant for King Abdulaziz University.
Funding Statement
This work was funded by Autism Speaks, Autism Speaks Canada, the University of Toronto McLaughlin Centre, the Canada Foundation for Innovation, the Canadian Institutes of Health Research (CIHR), Genome Canada/Ontario Genomics Institute, the Government of Ontario, Brain Canada, Ontario Brain Institute Province of Ontario Neurodevelopmental Disorders (POND), and The Hospital for Sick Children Foundation. L.O.L. holds the Lap-Chee Tsui Postdoctoral Fellowship from The Hospital for Sick Children. S.W.S. holds the Endowed Chair in Genome Sciences at The Hospital for Sick Children and University of Toronto.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
SickKids Research Ethics Board
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Access to the MSSNG and SSC data can be obtained by completing data access agreements (https://research.mss.ng and https://www.sfari.org/resource/sfari-base, respectively), as was done for this study. These two well-established and stable whole genome sequence and phenotype resources are utilized by approved investigators worldwide. The 1000G genome-sequencing data are publicly available via Amazon Web Services (https://docs.opendata.aws/1000genomes/readme.html). Access to other data through other publications or resources is described in the main text.