Abstract
Bicuspid aortic valve (BAV), the most common congenital heart defect, is a major cause of aortic valve disease requiring valve interventions and thoracic aortic aneurysms predisposing to acute aortic dissections. The spectrum of BAV ranges from early onset valve and aortic complications (EBAV) to sporadic late onset disease. Rare genomic copy number variants (CNVs) have previously been implicated in the development of BAV and thoracic aortic aneurysms. We determined the frequency and gene content of rare CNVs in EBAV probands (n = 272) using genome-wide SNP microarray analysis and three complementary CNV detection algorithms (cnvPartition, PennCNV, and QuantiSNP). Unselected control genotypes from the Database of Genotypes and Phenotypes were analyzed using identical methods. We filtered the data to select large genic CNVs that were detected by multiple algorithms. Findings were replicated in cohorts with late onset sporadic disease (n = 5040). We identified 34 large and rare (< 1:1000 in controls) CNVs in EBAV probands. The burden of CNVs intersecting with genes known to cause BAV when mutated was increased in case-control analysis. CNVs intersecting with GATA4 and DSCAM were enriched in cases, recurrent in other datasets, and segregated with disease in families. In total, we identified potentially pathogenic CNVs in 8% of EBAV cases, implicating alterations of candidate genes at these loci in the pathogenesis of BAV.
Author Summary Bicuspid aortic valve (BAV) is the most common form of congenital heart disease and can lead to long-term complications such as aortic stenosis, aortic regurgitation, or thoracic aortic aneurysms. Most BAV-related complications arise in late adulthood, but 10-15% of individuals with BAV develop early onset complications before age 30. Copy number variants (CNVs) are genomic structural variations that have been previously implicated in some types of congenital heart disease, including BAV. Here we demonstrate that individuals with early onset complications of BAV are enriched for specific rare CNVs compared to individuals with late-onset BAV disease. We also describe novel CNVs involving DSCAM, a gene on chromosome 21 that has not previously been associated with the development of BAV. These results may lead to improved risk stratification and targeted therapies for BAV patients.
Introduction
Copy number variants (CNVs) have been implicated as causes or modifiers of many human diseases [1]. Specifically, large genomic CNVs are significantly enriched in cohorts with developmental delay or congenital abnormalities, and the severity of phenotypes has been correlated with the burden of rare CNVs [2]. These observations show that large, rare, de novo CNVs are likely to be pathogenic and can exert clinically relevant effects on disease pathogenesis [3-4].
Congenital heart disease (CHD) has a worldwide prevalence of 8.2 per 1000 live births [5]. CNVs have been implicated in both syndromic and non-syndromic forms of CHD [6-10]. The pathogenicity and penetrance of CNVs was initially established for clinical syndromes such as velocardiofacial syndrome, Turner syndrome, or Williams–Beuren syndrome, which involve chromosomal or megabase scale duplications or deletions, but has since been expanded to include additional CHD subtypes [10]. CNVs contribute to 10% of all CHD cases and up to 25% of cases with extracardiac anomalies or other syndromic features [11]. The role of pathogenic CNVs affecting genes that are known to cause CHD when mutated, such as GATA4 and TBX1, has been established [12]. Furthermore, population-level analysis has consistently demonstrated an increased burden of CHD in carriers of CNVs at specific genomic hotspots compared to controls, displaying the pathogenic potential of rare or de novo CNVs [12-14].
Bicuspid Aortic Valve (BAV) is the most common congenital heart malformation with a population prevalence of 0.5 – 2% [15]. BAV predisposes to aortic valve stenosis and thoracic aortic aneurysms and is associated with other left ventricular outflow tract lesions such as mitral valve disease and coarctation [16]. The high heritability of BAV was demonstrated in first- and second-degree relatives, who are more than ten times more likely to be diagnosed with BAV compared to matched controls [17]. BAV can occur as an isolated congenital lesion or as part of a clinical syndrome. For example, the prevalence of BAV is increased in Velocardiofacial, Loeys-Dietz, Kabuki, and Turner syndromes. Pathogenic variants of several genes are implicated in familial non-syndromic BAV, which is typically inherited as an autosomal dominant trait with reduced penetrance and variable expressivity. There is strong cumulative evidence that GATA4, GATA6, NOTCH1, ROBO4, SMAD4, MUC4, and SMAD6 each contribute to a small percentage of non-syndromic BAV cases. Phenotypic expression of BAV disease ranges from incidental discovery in late adulthood to neonatal or childhood onset of complications. In comparison to patients with later disease onset, younger BAV cohorts tend to present with syndromic features or complex congenital malformations that are more likely to have a genetic cause, thereby increasing the power of association studies to discover clinically relevant CNVs [18]. Recently, we identified recurrent rare CNVs that were enriched for cardiac developmental genes in a young cohort with early-onset thoracic aortic aneurysms or acute aortic dissections [19].
We hypothesize that large rare genomic CNVs contribute to early onset complications of BAV. Consistent with previous observations, we predict that the burden and penetrance of rare CNVs will be increased in individuals with early onset disease when compared to elderly sporadic BAV cases and population controls. Identification of novel pathogenic CNVs can provide new insights into the genetic complexity of BAV and may be useful for personalized risk stratification or clinical guidance based on the specific recurrent CNV [20]. Therefore, we set out to describe the burden and penetrance of rare CNVs in a young cohort with early onset complications of BAV disease (EBAV).
Materials and Methods
The study protocol was approved by the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston (HSC-MS-11-0185). After written informed consent, we enrolled 272 probands of European ancestry with early onset BAV disease (EBAV), which we defined as individuals with BAV who were under the age of 30 at the time of first clinical event. Clinical events were defined as aortic replacement, aortic valve surgery, aortic dissection, moderate or severe aortic stenosis or aortic regurgitation, large aneurysm (Z > 4.5), or intervention for BAV-related conditions. Those with hypoplastic left heart, known genetic mutations, genetic syndromes, or complex congenital heart disease were excluded. Affected and unaffected family members of probands were included in this cohort for a total of 544 individuals in 293 families (26 trios and 16 multiplex families). Samples were collected and genotyped similar to our previous study [21]. For comparison, we analyzed a cohort of older individuals of European ancestry with sporadic BAV disease selected from the International BAV Consortium (Table 1) [22].
Phenotypes were derived from record review with confirmation of image data whenever possible [23-24]. The computational pipeline for CNV analysis of Illumina single nucleotide polymorphism (SNP) array data included three independent CNV detection algorithms (Fig 1).
GenomeStudio was used to exclude samples with indeterminate sex or more than 5% missing genotypes, and single nucleotide polymorphisms (SNPs) with GenTrain = 0. Principal component analysis was used to remove outliers that did not cluster with European ancestry. Only SNPs common to all microarray platforms were included.
Three independent algorithms (PennCNV, cnvPartition, and QuantiSNP) were used to generate CNV calls and sample-level quality statistics from SNP intensity data. PennCNV and QuantiSNP were run on Unix clusters and cnvPartition data were exported from GenomeStudio. The analysis was run using default configurations.
PennCNV was used to generate QC data and remove CNV calls that intersect with polymorphic genomic regions. Samples that met any of the following criteria were excluded: standard deviation of the LogR ratio (obtained from PennCNV) > 0.35 or number of CNVs > 2 standard deviations above the mean for each data set. CNV calls less than 20 kilobase pairs and/or spanned by less than 6 SNP probes were excluded. The overlap function for rare CNVs in PLINK was used to construct CNV regions (CNVRs) and adjacent regions were merged using PennCNV.
LogR ratio (LRR) and B allele frequency (BAF) data at CNVRs and calls of interest were visualized in GenomeStudio for validation. For segregation analysis, GenomeStudio was used to determine the presence of CNVs in relatives.
A total of 22,014 unselected control Illumina Genotypes obtained from the Database of Genotypes and Phenotypes were analyzed using identical methods (Table in S1Table). Cohorts were paired as follows for case-control analysis based on the concordance of sample-level quality control statistics (mean number of CNV calls and mean standard deviation of the LogR Ratio): EBAV and WLS, BAVGWAS and HRS.
PLINK was used to catalog CNV calls and perform burden and enrichment studies. Case - control burden tests were restricted to large (250 - 5000 kilobase pairs), rare (occurring in less than 1 in 1000 samples; total of cases and controls), and validated CNV calls in EBAV probands. Genome Reference Consortium Human Build 37 [28] was used for CNV annotation.
Results
Compared to BAVGWAS probands, EBAV probands were significantly younger at diagnosis, had more frequent co-existing congenital heart and vascular lesions, and underwent more frequent valve or aortic operations. A phenotype summary of the EBAV and BAVGWAS Cohorts is provided in Table 2.
CNV analysis is summarized in Table 3. The percentages of individuals with large and rare CNV regions were relatively consistent throughout datasets. The prevalence of large and rare CNVs, specifically large genomic deletions, was increased in EBAV cases compared to controls (Table S2).
There were 34 large (>250 Kb), rare (<1:1000 in dbGAP controls) CNV regions that involved protein-coding genes in EBAV cases (Table S3). Seven of these genic CNVs were enriched in EBAV cases compared to WLS controls with a genome-wide adjusted empiric P < 0.05. These CNVs included the genes PCP4, DSCAM, MIR4760, and DSCAM-AS1 in 21q22 and GATA4, C8orf49, NEIL2, FDFT1, and CTSB in 8p23. Large duplications involving the Velocardiofacial (VCFS) region in 22q11.2 and 1q21.1 microduplications were also enriched in EBAV cases (Table S4). The overall burden of large, rare, genic CNVs was not different between EBAV cases and WLS controls. However, the burden of large, rare genic CNVs intersecting with genes known to cause BAV when mutated or implicated in syndromic BAV was significantly increased in EBAV cases (Table 4).
We also scrutinized genomic regions that are implicated in CHD by careful analysis of data from individual CNV algorithms to detect subtle copy number alterations. We identified additional rare EBAV CNVs that intersect with CHD candidate genes CELSR1, GJA5, RAF1, LTBP1, KIF1A, MYH11, MAPK3, TTN, and the VCFS region in 22q11.2. We detected additional GATA4 and DSCAM CNVs in multiplex families. These CNVs were enriched in EBAV cases compared to WLS controls (Table 5).
Next, we attempted to replicate our observations by identifying CNVs in the BAVGWAS dataset that overlapped with rare EBAV CNVs. We found that large duplications involving SOX7 and GATA4 in 8p23 and the VCFS region in 22q11.2 were also significantly enriched in BAVGWAS cases compared to HRS controls (Table 6, Table S6 and S7).
CNVs intersecting with GATA4 and DSCAM significantly overlapped between EBAV and BAVGWAS datasets (Fig 2). On average, the GATA4 CNVs were larger in the BAVGWAS dataset while the DSCAM CNVs were larger in the EBAV dataset.
We identified 7 additional CNV regions that are enriched in BAVGWAS cases but not in EBAV and are rare or absent in controls (Table S5). NANOG and NIBPL are essential for early heart development, and mutation of NIBPL causes Cornelia-de Lange syndrome with a spectrum of congenital heart malformations including BAV.
We also identified 21 very large genomic CNVs more than 5 Mb in length in the BAVGWAS dataset. Analysis of GenomeStudio data showed that most of these were mosaic loss of heterozygosity regions or duplications. Nine were large germline chromosome-scale aberrations, including two cases of trisomy 21 (Table S8). We did not identify any large X chromosome copy variants that may be consistent with Turner syndrome. There were no megabase-scale copy number variants in the EBAV dataset.
Pedigree analysis showed that several CNVs involving CELSR1, LTBP1, KIF1A, GATA4, and DSCAM segregate with BAV in EBAV families (Table S9). CNV carriers tended to present due to moderate or severe aortic regurgitation requiring valvular surgery. One proband had aortic coarctation. The youngest age at presentation was 13 years. There were no sex differences in presentation between CNV carriers.
Discussion
We identified large, rare, and likely pathogenic CNVs in almost 10% of EBAV probands that are enriched in genes that cause BAV when mutated. The percentage of EBAV cases with likely pathogenic CNVs is similar to our previous observations in a cohort with early onset TAD [30]. Enrichment of CNVs involving GATA4 and DSCAM in EBAV cases replicated in two additional BAV datasets and thousands of unselected control genotypes. This analysis provides compelling evidence that rare CNVs collectively cause more BAV cases than any single mutated gene.
GATA-Binding Protein 4 is a transcription factor that is required for cardiac and neuronal differentiation during embryogenesis [31]. Mutations of GATA4 and its homologs GATA5 and GATA6 cause congenital heart lesions [32]. Mutations in the GATA4 gene have been linked to a range of congenital heart diseases in humans, such as cardiac septal defects, tetralogy of Fallot, amd patent ductus arteriosus [33]. Patients with BAV who have rare functional variants in the GATA family exhibit varying degrees of aortopathy expression, including aortic aneurysm, dissection, and/or aortic stenosis. Alonso-Montes et al. described 4 predicted deleterious GATA4 mutations in 122 non-syndromic BAV probands who did not have affected relatives [34]. Rare GATA4 deletions and putative loss of function mutations are also implicated in CHD with distinctive features, underlining the importance of GATA4 dosage to cardiac development [35-36]. Glessner et al. discovered large de novo (∼4Mb) duplications involving GATA4 in CHD trios with conotruncal defects or left ventricular outflow tract obstructive lesions [37]. Some duplications were inherited from apparently unaffected parents. Zogopoulos and Yu described similar genomic duplications in unaffected individuals and in unselected control genotypes [38-39].
These observations are consistent with low-penetrance CHD in GATA4 duplication carriers. Similar to other complex and multifactorial disorders, CHD pathogenesis is likely caused by the cumulative impact of multiple CNVs or mutations, each exerting small to moderate effects to collectively disrupt cardiac development. For example, the frequency of congenital heart lesions is increased in individuals with velocardiofacial syndrome who have 22q1.2 deletions and a common 12p13.31 duplication involving the SLC2A3 gene. The SLC2A3 CNV likely functions as a modifier of the cardiac phenotype associated with 22q11 deletion syndrome, exemplifying a “two-hit” model [40].
More than half of patients with Down syndrome have congenital heart malformations due to the interaction of multiple dosage-sensitive CHD genes on chromosome 21 [41-43]. Down syndrome cell adhesion molecule, previously shown to play a critical role in neurogenesis, has also been implicated in the pathophysiology of CHD [44]. Analysis of rare segmental trisomies of chromosome 21 suggested that duplication of DSCAM and the contiguous COL6A1 and COL6A2 genes may cause septal abnormalities and other Down Syndrome-related CHD lesions, including BAV. Overexpression of DSCAM and COL6A2 causes cardiac malformations in mice [45]. Our findings suggest that rare CNVs involving DSCAM may contribute to some non-syndromic BAV cases.
Consistent with previous observations, GATA4 and DSCAM CNVs segregated with disease in multiple families, but are not fully penetrant and were detected in some unaffected relatives. Intriguingly, large 22q11.2, GATA4 and DSCAM CNVs were more highly enriched in EBAV than in BAVGWAS cases, suggesting that these CNVs may drive early onset BAV disease. These results are consistent with our observation that pathogenic CNVs involving candidate BAV genes are also enriched in EBAV compared to BAVGWAS cases. Our data suggests that pathogenic CNVs at these loci may predict accelerated disease onset or more severe complications.
We also identified recurrent rare CNVs of specific dosage-sensitive regions that affect cardiac developmental genes and are implicated in non-syndromic CHD. Recurrent 1q21.1 distal deletions encompassing GJA5, the gene encoding Connexin-40, are associated with CHD lesions including BAV. A study of 807 TOF cases showed significant enrichment of small duplications spanning the GJA5 gene, providing compelling evidence that it acted as the primary candidate gene, supporting the association of GJA5 and CHD [31]. Additionally, cardiac abnormalities have been documented in mice with a targeted GJA5 deletion, implying that haploinsufficiency of GJA5 might contribute to cardiac defects in individuals affected by 1q21.1 deletions [46]. CELSR1, a cadherin superfamily member, is mutated in families with BAV and hypoplastic left heart syndrome [47]. LTBP1 encodes an extracellular matrix protein that regulates TGF-beta and fibrillin and has been implicated in congenital heart lesions [48]. KIF1A, encoding a kinesin microtubule transporter, was implicated in a dominant multisystem syndromic disorder with valvular and cardiac defects [49]. Mutation of MYH11 causes familial thoracic aortic aneurysms and dissections with an increased prevalence of BAV [50]. TTN mutations cause dilated cardiomyopathy and are associated with other left-sided congenital lesions [51]. Mutations or copy number changes involving these genes all cause a wide spectrum of penetrance and phenotypic severity, consistent with sensitivity to genetic or clinical modifiers.
Our combinatorial analysis method eliminated many CNVs that were detected by single algorithms or did not meet quality control benchmarks. Therefore, our analysis likely underestimated the contribution of rare pathogenic CNVs to BAV. We also recognize that cardiac development involves the complex interaction of many genes. We selectively validated individual CNVs at loci of interest but may have underrepresented CNVs that had no a priori relationship with CHD. The apparent penetrance of some CNVs may be less than expected due to missing phenotypic information. The available clinical data was not sufficiently detailed to permit genotype-phenotype correlations with specific CHD clinical features.
In conclusion, we identified large rare CNVs in a significant proportion of BAV cases, including a subset of CNVs that may predict early onset complications of BAV disease. These observations add to the evidence that rare CNVs may eventually have clinical utility for risk stratification and personalized disease management.
Data Availability
All data produced in the present work are contained in the manuscript.
Supplemental Data
Acknowledgments
We thank Joana Castillo and Jacqueline Jennings for sample preparation, and Gladys Zapata, Nitesh Mehta, and the Laboratory for Translational Genomics at Baylor College of Medicine for microarray genotyping. This study was supported in part by R01HL137028 (SP). Fig 1 created with BioRender.com.
The EBAV Investigators are: Shaine A. Morris, Rita Milewski, Giuseppe Limongelli, Allesandro Della Corte, Laura Perrone, Yuli Y. Kim, Hector Michelena, Maria G. Andreassi, Arturo Evangelista, Denver Sallee, Anji Yetman, Kim McBride, Eduardo Bossone, Rodolfo Citro, Dawn S. Hui, Malenka M. Bissell, Andrea Ballotti, Ilenia Foffa, Margot De Marco, Anthony Caffarelli, Rita Weise, Julie DeBacker, Laura Muino Mosquera, Robbin Cohen, Laura Dos Subira, Justin T. Tretter, Anna Sabe Rotes, Martina Caiazza, Lamia Ait Ali, Francesca Pluchinotta, and Simon C. Body.
The BAVCon Investigators are: Simon Body, Alessandro Della Corte, Alessandro Frigiola, Andrea Ballotta, Arturo Evangelista, Betti Giusti, Bo Yang, Carlo de Vincentiis, Dan Gilon, David Messika Zeitoun, Dianna M. Milewicz, Eduardo Bossone, Eric Eisselbacher, Francesca R. Pluchinotta, Giuseppe Limongelli, Gordon S. Huggins, Hector I. Michelena, J. Daniel Muehlschelgel, Kim Eagle, Lasse Folkerson, Malenka M. Bissell, María Brion, Patrick Mathieu, Per Eriksson, Peter Lichtner, Rodolfo Citro, Ronen Durst, Sébastien Thériault, Siddharth K. Prakash, Thoralf M. Sundt, Vicenza Stefano Nistri, Yahn Bossé.
Footnotes
↵& Complete lists of EBAV and BAVCon Investigators are provided in the Acknowledgments.