ABSTRACT
Congenital heart disease (CHD) is the most common congenital anomaly. Non-canonical splice-disrupting variants are not routinely evaluated by clinical tests. Algorithms including SpliceAI predict such variants, but are not specific to cardiac-expressed genes. Whole genome sequencing (WGS) (n=1083) and myocardial RNA-Sequencing (RNA-Seq) (n=114) of CHD cases were used to identify splice-disrupting variants. Using features of variants confirmed to affect splicing in myocardial RNA, we trained a machine learning model that outperformed SpliceAI for predicting cardiac-specific splice-disrupting variants (AUC 0.92 vs 0.66), and was independently validated in 43 cardiomyopathy probands (AUC 0.88 vs 0.64). Application of this model to 971 CHD WGS samples identified splice-disrupting variants in CHD genes in 9% of patients. Forty-one percent of predicted splice-disrupting variants were deeply intronic. The burden of variants in CHD genes was higher in cases than in 2,570 controls. Our model improved genetic yield by identifying splice-disrupting variants that are not evaluated by routine tests.
Competing Interest Statement
Seema Mital is on the Advisory Boards of Bristol Myers Squibb and Tenaya Therapeutics.
Funding Statement
This project was supported by the Canadian Institutes of Health Research (ENP 161429) under the frame of ERA PerMed (RL, MH, CB, SM), the Ted Rogers Centre for Heart Research (SM), and the Data Sciences Institute at the University of Toronto (SM). SM holds the Heart and Stroke Foundation of Canada & Robert M Freedom Chair in Cardiovascular Science. CRB and AVP are supported by the CVON project 2014-18 CONCOR-genes. EO held the Bitove Family Professorship of Adult Congenital Heart Disease until March 2021. GB is supported by an NSW CVRN Career Advancement Grant. JB is supported by a senior clinical investigator fellowship of FWO Flanders and by the Frans Van de Werf fund for clinical cardiovascular research.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Research Ethics Boards of The Hospital for Sick Children, Amsterdam Medical Center, The Children's Hospital at Westmead, and Kompetenznetz Angeborene Herzfehler gave ethical approval for the collection and use of biospecimens through their respective registries: The Heart Centre Biobank (Ontario, Canada), CONCOR (Amsterdam, Netherlands), Kids Heart BioBank (Sydney, Australia), and the German Heart Registry (Berlin, Germany). Written informed consent was obtained from all patients and/or their parents/legal guardians, and study protocols adhered to the Declaration of Helsinki.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Sequencing data for the Discovery and Extension cohorts will be deposited in the European Genome-Phenome Archive (EGA) and will be available for download upon approval by the Data Access Committee. Sequencing data for the cardiomyopathy Validation cohort are available in EGA under accession EGAS00001004929 and can be downloaded upon approval by the Data Access Committee. Control cohort MGRB data are available by controlled access in EGA under accession EGAS00001003511. Additional data generated or analyzed during this study are included in the supplementary information files, and additional raw data used for figures and results are available from the corresponding author on reasonable request.