Abstract
Purpose Spinal muscular atrophy (SMA), caused by loss of the functional SMN1 gene, is a leading genetic cause of early childhood death. Because the sequences of SMN1 and its paralog SMN2 are nearly identical, analysis of this region is challenging. Population-wide SMA screening to quantify the SMN1 copy number (CN) is recommended by the American College of Medical Genetics.
Methods We developed an informatics method that accurately identifies the CN of SMN1 and SMN2 using whole-genome sequencing (WGS) data. This algorithm calculates the CNs of SMN1 and SMN2 using read depth and eight informative reference genome differences between SMN1/2.
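To make the idea concrete, the following is a minimal Python sketch of how read depth and base-specific read counts could be combined into SMN1 and SMN2 CN calls. It is a simplified illustration, not the published algorithm: the function names, the simple rounding of normalized depth, and the majority vote across sites are all assumptions made for this example, and a real caller would use a statistical model rather than rounding.

```python
from collections import Counter


def estimate_total_cn(smn_depth, diploid_depth):
    """Estimate the combined SMN1+SMN2 copy number.

    smn_depth: depth of reads aligned to either SMN1 or SMN2, pooled
               (hypothetical input for this sketch).
    diploid_depth: typical depth of a two-copy region in the same sample.
    """
    return round(2 * smn_depth / diploid_depth)


def call_smn_cn(total_cn, site_counts):
    """Split the total CN into SMN1 and SMN2 copies.

    site_counts: list of (smn1_reads, smn2_reads) tuples, one per
                 differentiating site (eight sites in the text above).
    Returns (smn1_cn, smn2_cn), or None if no site has coverage.
    """
    votes = []
    for smn1_reads, smn2_reads in site_counts:
        n = smn1_reads + smn2_reads
        if n == 0:
            continue  # skip sites with no supporting reads
        # Fraction of reads carrying the SMN1 base, scaled to the total CN.
        votes.append(round(total_cn * smn1_reads / n))
    if not votes:
        return None
    # Assumed consensus rule for this sketch: majority vote across sites.
    smn1_cn, _ = Counter(votes).most_common(1)[0]
    return smn1_cn, total_cn - smn1_cn


# Illustrative (made-up) counts for a sample with one SMN1 and two SMN2 copies.
carrier_sites = [(15, 30), (14, 31), (16, 29), (15, 30),
                 (13, 32), (15, 30), (16, 29), (14, 31)]
total = estimate_total_cn(smn_depth=45, diploid_depth=30)
print(total, call_smn_cn(total, carrier_sites))  # prints: 3 (1, 2)
```

In this sketch a sample called with one SMN1 copy would be flagged as a putative carrier and a sample with zero copies as putatively affected; the counts shown are invented for illustration only.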
Results We characterized SMN1/2 in 12,747 genomes across five ethnic populations and identified 251 samples with SMN1 losses and 1317 with SMN1 gains, as well as 6241 samples with SMN2 losses and 374 with SMN2 gains. We calculated a pan-ethnic carrier frequency of 2%, consistent with previous studies. In validation against digital PCR, all SMN1 CN calls (48/48) and 98% (47/48) of SMN2 CN calls agreed.
Conclusion This WGS-based SMN copy number caller can be used to identify both carrier and affected status of SMA, enabling SMA testing to be offered as a comprehensive test in neonatal care and as an accurate carrier screening tool in large-scale WGS projects.
Competing Interest Statement
Xiao Chen, Aditi Chawla, Aaron L Halpern, Ryan J Taft, David R Bentley, and Michael A Eberle are all employed by Illumina, a maker of genome sequencing instruments.
Funding Statement
This work was supported by the Cambridge Biomedical Research Centre and the National Institute for Health Research (NIHR) for the NIHR BioResource (grant number RG65966), the National Institute of General Medical Sciences of the National Institutes of Health (P30GM114736 and P20GM103446; to MERB) and the Nemours Foundation (to MERB). We thank the New York Genome Center (supported by NHGRI Grant 3UM1HG008901-03S1), and the Coriell Institute for Medical Research for generating and releasing the 1kGP WGS data.
Author Declarations
All relevant ethical guidelines have been followed and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
Any clinical trials involved have been registered with an ICMJE-approved registry such as ClinicalTrials.gov and the trial ID is included in the manuscript.
Not Applicable
I have followed all appropriate research reporting guidelines and uploaded the relevant Equator, ICMJE or other checklist(s) as supplementary files, if applicable.
Not Applicable
Data Availability
The 1kGP data can be downloaded from https://www.ncbi.nlm.nih.gov/bioproject/PRJEB31736/. Data from the NIHR BioResource participants have been deposited in the European Genome-phenome Archive (EGA) at the EMBL European Bioinformatics Institute. Data from those NIHR BioResource participants who enrolled in the 100,000 Genomes Project-Rare Diseases Pilot can be requested from Genomics England Limited following the procedure outlined at: https://www.genomicsengland.co.uk/about-gecip/joining-research-community. The BAM files from the NGC individuals have been deposited in EGA under accession number EGAD00001004357.