Structured Abstract
Purpose: Over 30 international studies are exploring newborn sequencing (NBSeq) to expand the range of genetic disorders included in newborn screening. Gene selection varies substantially across these programs, highlighting the need for a systematic approach to prioritizing genes.
Methods: We assembled a dataset of 25 characteristics for each of the 4,390 genes included in 27 NBSeq programs. We used regression analysis to identify predictors of gene inclusion and developed a machine learning model to rank genes for public health consideration.
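A gene's inclusion across programs can be summarized as the fraction of program gene lists that contain it, which is also how a threshold such as "included by over 80% of programs" is applied. The sketch below is illustrative only: the program names and panels are hypothetical toy data, not the study's dataset, and the helper names `inclusion_fractions` and `genes_above` are invented for this example.

```python
from collections import Counter


def inclusion_fractions(programs: dict[str, set[str]]) -> dict[str, float]:
    """Return, for each gene, the fraction of programs whose panel contains it."""
    counts = Counter(gene for genes in programs.values() for gene in genes)
    n_programs = len(programs)
    return {gene: count / n_programs for gene, count in counts.items()}


def genes_above(fractions: dict[str, float], threshold: float) -> list[str]:
    """Genes included by strictly more than `threshold` of programs, sorted by name."""
    return sorted(gene for gene, frac in fractions.items() if frac > threshold)


# Hypothetical toy panels for illustration; NOT data from the study.
programs = {
    "program_A": {"PAH", "CFTR", "SMN1", "GAA"},
    "program_B": {"PAH", "CFTR", "GAA"},
    "program_C": {"PAH", "SMN1", "BTD"},
    "program_D": {"PAH", "CFTR", "BTD", "GAA"},
    "program_E": {"PAH", "CFTR"},
}

fractions = inclusion_fractions(programs)
print(genes_above(fractions, 0.8))  # prints ['PAH']
```

Because "over 80%" is a strict inequality, CFTR (in 4 of 5 toy programs, exactly 0.8) falls just below the cutoff in this example.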
Results: Among 27 NBSeq programs, the number of genes analyzed ranged from 134 to 4,299, and only 74 (1.7%) genes were included by over 80% of programs. The strongest associations with gene inclusion across programs were presence on the US Recommended Uniform Screening Panel (inclusion increase of 74.7%, CI: 71.0%-78.4%) and robust evidence on the natural history (29.5%, CI: 24.6%-34.4%) and treatment efficacy (17.0%, CI: 12.3%-21.7%) of the associated genetic disease. A boosted trees machine learning model using 13 predictors achieved high accuracy in predicting gene inclusion across programs (AUC = 0.915, R² = 84%).
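An AUC such as the 0.915 reported here can be read as the probability that a randomly chosen included gene receives a higher model score than a randomly chosen excluded one. The sketch below computes AUC in that pairwise (Mann-Whitney) form and then ranks genes by score. It assumes binary inclusion labels and per-gene predicted scores; the gene names, labels, and scores are hypothetical, and this is not the authors' boosted trees model or evaluation code.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Labels are 1 (included) or 0 (not included).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical genes, inclusion labels, and model scores; NOT results from the study.
genes = ["GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5", "GENE_6", "GENE_7"]
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1]

print(round(roc_auc(labels, scores), 3))  # prints 0.917 (11 of 12 pairs ordered correctly)

# A ranked list for review: highest predicted score first.
ranked = [g for g, _ in sorted(zip(genes, scores), key=lambda t: t[1], reverse=True)]
print(ranked)
```

The pairwise loop is quadratic in the number of genes, which is fine for illustration; for thousands of genes, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.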
Conclusion: The machine learning model developed here provides a ranked list of genes that can adapt to emerging evidence and regional needs, enabling more consistent and informed gene selection in NBSeq initiatives.
Competing Interest Statement
A.J.C. and R.J.T. are employees and shareholders at Illumina Inc. N.G. is co-founder and equity owner of Datavisyn. N.B.G. provides occasional consulting services to RCG Consulting and receives honoraria from Ambry Genetics. R.C.G. has received compensation for advising the following companies: Allelica, Atria, Fabric, Genomic Life and Juniper Genomics; and is co-founder of Genome Medical and Nurture Genomics. B.E.R. and K.L.S. are consultants at Nurture Genomics. L.S. received personal compensation from Zentech and Illumina Inc. P.T. is a co-founder of PlumCare RWE, LLC.
Funding Statement
This work was supported by the following grants: T32GM007748 (S.B.), R01HG011773 (N.G.), K08HG012811-01 (N.B.G.), TR003201 (N.B.G.), HD077671 (R.C.G.), TR003201 (R.C.G.), and EU-IMI H2020 GRANT 101034427 (A.F., J.K.).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes