ABSTRACT
Increased left ventricular (LV) mass (LVM) and LV hypertrophy (LVH) are risk markers for adverse cardiovascular events, and may indicate an underlying cardiomyopathy. Cardiac magnetic resonance (CMR) is the gold standard for LVM estimation, but is challenging to obtain at scale, which has limited the power of prior genetic analyses. In the current study, we performed a genome-wide association study (GWAS) of CMR-derived LVM indexed to body surface area (LVMI) estimated using a deep learning algorithm within nearly 50,000 participants from the UK Biobank. We identified 12 independent associations (1 known at TTN and 11 novel) meeting genome-wide significance, implicating several candidate genes previously associated with cardiac contractility and cardiomyopathy. Greater CMR-derived LVMI was associated with higher risk of incident dilated (hazard ratio [HR] 2.58 per 1-SD increase, 95% CI 2.10-3.17) and hypertrophic (HR 2.62, 95% CI 2.09-3.30) cardiomyopathies. A polygenic risk score (PRS) for LVMI was also associated with incident hypertrophic cardiomyopathy within a separate set of UK Biobank participants (HR 1.12, 95% CI 1.01-1.12) and among individuals in an external Mass General Brigham dataset (HR 1.18, 95% CI 1.01-1.37). In summary, using CMR-derived LVM available at scale, we have identified 12 common variants associated with LVMI (11 novel) and demonstrated that both CMR-derived and genetically determined LVMI are associated with risk of incident cardiomyopathy.
Journal Subject Terms machine learning, left ventricular hypertrophy, genetics
Introduction
Left ventricular hypertrophy (LVH) is defined as pathologically increased left ventricular mass (LVM)1 and is associated with increased risk of cardiovascular events including heart failure (HF),1–3 stroke,1 atrial fibrillation (AF),4 and sudden cardiac death.5 Increased LVM is also a hallmark of certain primary cardiomyopathies such as hypertrophic cardiomyopathy (HCM) and some dilated cardiomyopathies (DCM). Although LVM can be estimated using 12-lead electrocardiograms or echocardiography, cardiac magnetic resonance (CMR) offers more accurate and reproducible quantification of LVM, and has therefore emerged as the gold standard for diagnosing LVH.6
Imaging-based estimation of LVM typically requires LV segmentation, which is usually performed manually and requires substantial time and expertise. As a result, genetic analyses of imaging-based LVM have been limited by modest sample sizes. Genome-wide association studies (GWAS) of echocardiography-based LVM identified a single susceptibility locus downstream of SPCS3.7–9 More recently, a genome-wide association study within 19,000 individuals10 identified significant variants in the gene TTN associated with CMR-based LVM.
We developed a validated deep learning approach to automate estimation of LVM using CMR images (Machine Learning for Health – Segmentation [ML4Hseg]), which may increase power to detect genetic associations underlying CMR-derived LVM.11 In the current study we utilized ML4Hseg to estimate LVM using CMRs from nearly 50,000 participants in the UK Biobank. Given that body size is a major determinant of LV size and mass,12 we analyzed LVMI (i.e., LVM indexed by body surface area) in our primary analyses, and assessed unindexed LVM in secondary analyses. Our GWAS of LVMI identified 12 independent variants meeting genome-wide significance, including 11 novel associations. Using expression quantitative trait loci (eQTLs), transcriptome-wide association testing (TWAS), and tissue-specific expression level data, we propose several candidate genes, many of which have been previously associated with cardiac contractility and cardiomyopathy. We additionally develop a polygenic risk score (PRS) for LVMI, and demonstrate that both phenotypic and genetic LVMI are associated with incident cardiovascular diseases including cardiomyopathy.
Methods
Data availability
UK Biobank data are publicly available by application (www.ukbiobank.ac.uk). Mass General Brigham (MGB) data contain protected health information and cannot be shared publicly. Data processing scripts used to perform the analyses described herein are available at https://github.com/shaankhurshid/lvmass_gwas.
Study populations
The discovery sample comprised the UK Biobank, a population-based prospective cohort of 502,629 participants recruited between 2006-2010 in the United Kingdom to investigate the genetic and lifestyle determinants of disease. The design of the cohort has been described previously.13,14 Briefly, approximately 9.2 million individuals aged 40-69 years living within 25 miles of the 22 assessment centers in England, Wales, and Scotland were invited, and 5.4% participated in the baseline assessment. Extensive questionnaire data, physical measures, and biological data were collected at recruitment, with ongoing data collection in large subsets of the cohort, including repeated assessments and multimodal imaging. At the time of the current analysis, over 450,000 individuals have genome-wide genotyping data available. All participants are followed up for health outcomes through linkage to national health-related datasets.
We utilized the MGB Biobank to replicate a LVMI PRS that we derived in the UK Biobank. The MGB Biobank is a biorepository comprising patients from a multi-institutional healthcare network spanning seven hospitals in the New England region of the United States. MGB Biobank participants are followed for health outcomes through linkage to electronic health record (EHR) data.
UK Biobank and MGB Biobank participants provided written informed consent. The UK Biobank was approved by the UK Biobank Research Ethics Committee (reference number 11/NW/0382) and the MGB Biobank by the MGB Institutional Review Board. Use of UK Biobank (application #17488) and MGB Biobank data were approved by the local MGB Institutional Review Board.
Cardiac magnetic resonance acquisition
For all analyses, we included individuals who underwent CMR during a UK Biobank imaging assessment and whose bulk CMR data were available for download as of 04-01/2020 (Figure 1). The full CMR protocol of the UK Biobank has been described in detail previously.15 Briefly, all CMR examinations were performed in the United Kingdom on a clinical wide-bore 1.5 Tesla scanner (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthineers, Erlangen, Germany). All acquisitions used balanced steady-state free precession with typical parameters.
Left ventricular mass estimation
We obtained CMR-derived LVM from all individuals with available CMR imaging using ML4Hseg.11 Briefly, ML4Hseg identifies pixels corresponding to LV myocardium, which are then summed to estimate LV area and multiplied by slice thickness to estimate LV myocardial volume. LV myocardial volume is then multiplied by myocardial density (1.05 g/cm3) to yield LVM. As performed previously,11 LVM estimates were calibrated to the sex-specific sample means using manually labeled LVM measurements which were available within a subset of the UK Biobank sample (n=4,910). LVM estimates obtained using the described method have been shown to have very good correlation (Pearson r 0.86) and agreement (mean absolute error 10g) against manually labeled LVM in the UK Biobank.11 LVM estimates were indexed for body surface area using the DuBois formula to yield LVMI.16 A total of 53 (0.1%) individuals with estimated LVM values ≤0g were removed prior to analyses (Figure 1). The distribution of CMR-derived LVM is shown in Supplemental Figure 1.
Genome-wide association study
To identify common genetic variation associated with CMR-derived LVM, we performed a GWAS of indexed LVM using BOLT-LMM.17 The model was adjusted for age at CMR acquisition, sex, array platform, and first five principal components of genetic ancestry. We excluded variants with minor allele frequency <1%. Associations were considered statistically significant at the standard genome-wide significance level (p=5×10−8). Lead single nucleotide polymorphisms (SNPs) were grouped into independent loci based on distance (±500kb), with conditional analyses performed to assess for independent signals within windows. Variants having suggestive (i.e., p<1×10−6) but not genome-wide significant associations were similarly tabulated. Genetic inflation was assessed by calculating the genomic control factor λ, inspecting quantile-quantile plots, and calculating the linkage disequilibrium score (LDSC) regression intercept.18 Observed scale heritability (h2) was estimated using the slope of LDSC regression. We assessed for independent signals within genome-wide significant loci by a) performing GWAS while conditioning on the imputed allele dosage of each lead SNP found in the primary GWAS (excluding insertion-deletion variants), and b) performing GWAS while conditioning on the top variant on chromosome 17 alone (rs6503451), to assess whether the additional variant located 914kb apart on chromosome 17 (rs199502, r2=0.37), was independent. The primary GWAS was performed among individuals of all genetic ancestries. In secondary analyses, we performed analogous GWAS restricted to individuals of European genetic ancestry, and also analyzed unindexed LV mass as the dependent variable. Further details of GWAS methods are provided in the Supplemental Methods.
Bioinformatics and in silico functional analyses
We assessed whether genes within 500kb of lead SNPs were related to cardiac gene expression using GTEx19 version 8 cis-eQTL tissue data. To maximize power to detect potential candidate genes, as performed previously, we considered eQTLs for both atrial appendage (AA) and LV tissue data.20,21 We included lead variants as well as strong proxy variants (r2 ≥ 0.8). We also quantified tissue-specific expression levels from bulk RNA sequencing data from GTEx19 version 8. We evaluated the effects of predicted gene expression levels on LVMI by performing a transcriptome-wide association study (TWAS) using S-PrediXcan.22 GTEx genotypes and normalized expression data in AA and LV tissues provided in the software were used as training sets to develop the prediction models. Prediction models between each gene-tissue pair were developed using elastic net regression. In total, we tested 6,636 and 6,008 associations in AA and LV, respectively. The significance threshold for S-PrediXcan was therefore set at p=0.05/(6636+6008), or 3.95×10−6.
We prioritized likely candidate genes on the basis of proximity to the lead variant, eQTLs, TWAS, tissue-specific expression levels, and biologic plausibility based on previously reported data.
Polygenic risk score development
To develop a PRS as a genetic instrument for CMR-derived LVMI, we applied a pruning and thresholding approach to our LVMI GWAS results. After removing insertion-deletion variants and strand ambiguous (i.e., A/T and C/G) variants to facilitate replication, we developed and tested four separate candidate PRS utilizing each combination of two thresholds used to define index SNPs (p=1×10−6 and p=1×10−4) and two thresholds used to prune proxy SNPs (r2=0.3 and r2=0.5). We then selected the PRS explaining the greatest variance in LVMI within the derivation set, which ultimately comprised a set of 475 variants (variance of LVMI explained=0.086; +3.60 g/m2 increase in LVMI per 1-standard deviation [1-SD] increase in PRS, p<0.01).
Outcomes association testing
We assessed for associations between CMR-derived LVMI and incident AF, myocardial infarction, HF, ventricular arrhythmias, DCM, HCM, and implantable cardioverter-defibrillator (ICD) within participants with follow-up clinical data available after the imaging visit. We assessed for analogous associations using LVH, which was defined as LVMI >72g/m2 in men and >55 g/m2 in women,23 and alternatively as the sex-specific 90th percentile of LVM.1 Diseases were defined using combinations of self-report and inpatient International Classification of Diseases, 9th and 10th revision codes (Supplementary Table 1). Start of follow-up was defined at the time of CMR acquisition and spanned until the earliest of an incident event, death, or last follow-up. The date of last follow-up was dependent upon the availability of linked hospital data, and was therefore defined as March 31, 2021 for participants enrolled in England (93.6%) and Scotland (6.1%), and February 28, 2018 for participants enrolled in Wales (0.3%).
We performed analogous association testing between the LVMI PRS and the same set of incident cardiovascular events among individuals in the UK Biobank that did not undergo CMR (n=443,330). Outcome and person-time definitions were similar, although start of follow-up was defined as the date of UK Biobank enrollment and blood sample collection. We also repeated association testing between the LVMI PRS and incident events in the independent MGB Biobank sample, using analogous models with person-time beginning at the date of blood sample collection and ending at an event, death, or last encounter in the EHR.
Mendelian randomization analyses of blood pressure
As a form of validation of our LVM estimation, we sought to identify evidence of known causal associations between elevated blood pressure and increased LVM.24 We therefore conducted two-sample Mendelian randomization (MR) within individuals of genetic European ancestry in the UK Biobank sample. Genetic instruments were derived from a previous GWAS for systolic blood pressure (SBP) and diastolic blood pressure (DBP).25 Utilizing a 24 SNP instrument for SBP and a 26 SNP instrument for DBP, we prioritized inverse-variance weighted (IVW) meta-analyses of the effect of each SNP on CMR-derived LVMI (and LVM) divided by the effect of the same SNP on SBP or DBP. Weighted median and MR-Egger analyses were performed secondarily to address potential invalid instruments and directional pleiotropy. Further details of the MR analysis are provided in the Supplemental Methods.
Statistical analysis
We tested associations between CMR-derived LVM and incident AF, myocardial infarction, HF, ventricular arrhythmias, DCM, HCM, and ICD using Cox proportional hazards regression with adjustment for sex and age at CMR acquisition. We fit analogous models using LVH (defined using the thresholds described above) and the LVMI PRS as the primary exposures. Models including the PRS were additionally adjusted for the first five principal components of genetic ancestry. Validity of the proportionality assumption was assessed using the Grambsch-Therneau test of correlation26 as well as visual inspection of smoothed fits to Schoenfeld residuals versus time. Where present, substantial deviations from proportional hazards (observed only for age, sex, and certain principal components of ancestry), were modeled by including interaction terms with strata of person-time.
Statistical analyses were performed using R v4.0 (packages ‘data.table’, ‘ggplot2’, ‘survival’, ‘prodlim’, ‘MendelianRandomization’).27,28 Except where otherwise noted, all two-tailed p-values <0.05 were considered statistically significant.
Results
Genome-wide association study of CMR-derived LVM
We conducted a multi-ancestry GWAS including 43,271 individuals (91% European ancestry) (Table 1). The analysis included a total of 9.9 million common variants imputed at an INFO score ≥0.30 and having minor allele frequency (MAF) ≥1%. The genomic control factor was 1.15 with a linkage disequilibrium score regression intercept of 1.00, consistent with polygenicity of the LVMI trait as opposed to inflation (Supplementary Figure 2). Observed scale h2 for LVMI was 0.26 (standard error [SE] 0.02).
The GWAS initially revealed 12 candidate SNPs associated with CMR-derived LVMI at genome-wide significance (Table 2, Figure 2). Conditional analyses identified an additional variant on chromosome 2, and that the two variants on chromosome 17 located 914kb apart (r2=0.37) were not independent, ultimately resulting in 12 lead SNPs for LVMI. The SNP most strongly associated with LVMI (rs2255167, p=1.7×10−26) was located at the TTN locus on chromosome 2 and has been previously associated with LVM. TTN is highly expressed in LV tissue (Supplementary Table 2).10 The remaining loci (n=11) were novel, with many located at or proximate to genes implicated in arrhythmias, cardiomyopathy and cardiomyocyte function, including FLNC, MYOZ1, MAPT, WNT, CLCN6, and SYNPO2L. Regional association plots for each genome-wide significant SNP are shown in Supplementary Figure 3. Results for 16 additional variants having suggestive but not genome-wide significant associations, including rs3729989 near the MYBPC3 gene (p=5.9×10−8), are shown in Supplementary Table 3. A secondary GWAS of unindexed LVM revealed 12 genome-wide significant SNPs, of which 6 overlapped with the primary LVMI GWAS, and a 7th was a strong proxy (r2=0.87). Loci unique to analyses of unindexed LVM appeared primarily enriched for genes associated with body size (e.g., FTO, HMGA2, GDF5), although FTO has also been implicated in HF29 and CDKN1A has been associated with DCM in a recent multi-trait analysis30 (Supplementary Table 4 and Supplementary Figure 4).
In a GWAS restricted to individuals of European ancestry, 11 loci met genome-wide significance, of which 10 were shared with the primary GWAS (Supplementary Table 5 and Supplementary Figures 5-6). The single locus unique to the European ancestry analysis was rs143973349, an insertion-deletion variant located near FLNC, a gene highly expressed in LV tissue and previously associated with familial hypertrophic, restrictive, and arrhythmogenic cardiomyopathies.31–33 This locus had a suggestive association with LVMI in the primary multi-ancestry GWAS (p=2.7×10−7).
Bioinformatics and in silico functional analyses to determine candidate genes
In total, of the 12 independent lead SNPs, eight (or their proxies at r2 ≥ 0.8) were significant eQTLs in LV and/or AA tissue samples (Figure 3). The locus unique to the European ancestry subset also included eQTLs for LV and AA tissue. For a significant proportion of candidate genes, expression was identified in both LV and AA tissue samples. We then performed TWAS and identified 6 genes across 5 loci where predicted expression was associated with LVMI. Each of the genes implicated by TWAS was also an eQTL for either LV or AA (Figure 3). Detailed results of eQTL and TWAS analyses are shown in Supplementary Table 2.
Probable candidate genes at each locus of interest are summarized in Figure 3. Specific genes implicated by multiple lines of evidence included PDXDC1 near the rs56252725 locus, IGF1R near rs6598541, SYNPO2L and MYOZ1 near rs34163229, LRRC37A near rs6503451, WNT3 near rs199502, and TNNT3 near rs149188822. For the locus detected only in the European ancestry subset, KCP near rs143973349 was also prioritized by multiple analyses. Selected genes prioritized solely based on strong biologic plausibility or previous associations with LVM included TTN near rs255167, ORAI1 near rs28552516, and FLNC near rs143973349 (EUR only subset). Both FLNC and TTN have substantial expression in LV tissue (Supplementary Table 2).
Associations between LVMI and cardiomyopathy
We assessed for associations between CMR-derived LVMI and incident cardiovascular disease. At a median follow-up of 2.7 years (Q1: 1.9, Q3: 4.1), greater LVMI was consistently associated with greater risk of multiple conditions, including AF, HF, DCM, and HCM (Table 3). CMR-derived LVH was strongly associated with incident DCM (HR 10.9, 95% CI 4.67-20.2), HCM (HR 9.26, 95% CI 3.20-26.8), and ICD insertion (HR 8.42, 95% CI 3.82-18.6). Cumulative risk of events stratified by presence versus absence of CMR-derived LVH is depicted in Figure 4.
We next evaluated associations between LVMI genetic risk and incident outcomes. In a set of UK Biobank participants separate from the GWAS sample (n= 443,330), a greater LVMI PRS was associated with higher risk of multiple incident conditions including AF, HF, DCM, HCM, and ICD (Table 4). In the independent MGB sample (n=30,040), the LVMI PRS was again associated with incident HCM and ICD (Table 4). In models of incident HCM risk, the relative hazard of HCM increased log-linearly with greater CMR-derived LVMI as well as LVMI PRS, with similar effect sizes in both the UK Biobank and MGB (Figure 5). Disease association results were similar in analyses restricted to individuals of European ancestry (Supplementary Table 6).
Mendelian randomization
To assess for potential causal associations between blood pressure and CMR-derived LVMI, we performed MR analyses using genetic instruments for SBP and DBP. In an inverse variance weighted two-sample MR, a 1-SD increase in genetically mediated SBP was associated with a 0.203g increase in CMR-derived LVMI (95% CI, 0.076-0.33, p=0.002), and a 1-SD increase in genetically mediated DBP was associated with a 0.303g increase in CMR-derived LVMI (95% CI, 0.109-0.497, p=0.002). Weighted median two-sample MR analyses confirmed associations between both SBP and DBP and increased LVMI (p<5.34×10−4). MR-Egger analyses did not detect significant pleiotropy in any of the tested associations (intercept p>0.547). Associations were directionally similar but did not meet statistical significance for unindexed LVM (inverse variance weighted estimate: 0.265g per 1-SD increase in genetically mediated SBP, 95% CI -0.008-0.538, p=0.057 and 0.390g per 1-SD increase in genetically mediated DBP, 95% CI -0.041-0.821, p=0.08).
Discussion
In the current study, we utilized a deep learning segmentation algorithm to perform GWAS of CMR-derived LVMI in nearly 50,000 individuals. Leveraging favorable statistical power and a rich imaging-based phenotype, we identified 12 independent loci associated with LVMI at genome-wide significance, including 11 associations that have not been previously reported. Downstream analyses prioritize several candidate genes, including multiple genes previously associated with cardiac structure and function, as well as cardiomyopathy. Importantly, both CMR-derived and genetically determined LVMI was associated with greater risk of incident cardiovascular events, and in particular incident HCM and DCM.
Our analyses suggest that common variants in cardiac structural and functional genes appear to be important determinants of LVM. Consistent with prior results,10 CMR-derived LVMI was strongly associated with variation at SNP rs2255167, located on the large sarcomeric protein TTN. MYOZ1, which was prioritized by both eQTL and TWAS and encodes a sarcomeric protein involved in the calcineurin signaling pathway, has been previously associated with both HF29 and AF.34 A mouse knockout of MYOZ1 resulted in increased exercise capacity through activation of the nuclear factor of activated T-cells.35 Another gene prioritized by both eQTL and TWAS, TNNT3, encodes a troponin T isoform which is highly expressed in LV tissue. The TNNT3 R63H variant has been shown to result in increased contractility in mouse skeletal muscle and is a known cause of the human disease Arthrogryposis (Type 2B2),36 which is characterized by limb contractures (i.e., excessive muscular contraction). SYNPO2L, an actin-related protein expressed in LV myocardium, has been previously associated with AF,37 HF,38 HCM,30 and voltage-duration product (i.e., a clinical indicator of LVH).39
Several of the candidate genes we identified prioritize neurohormonal regulation and response to physiologic stress as potential genetic determinants of LVMI. Specifically, lead variant rs143800963 is located on chromosome 1 within 20kb of NPPA and NPPB, genes that encode the natriuretic peptides Nppa and Nppb, respectively, with both proteins playing important roles in blood pressure regulation and salt homeostasis.40 Both Nppa and Nppb are constitutively expressed in ventricular myocardium and are also upregulated in response to stress.41 Knockout of NPPB in mice results in increased cardiac fibrosis in response to pressure overload.42 Conversely, cardiomyocyte-specific deletion of ORAI1, the product of which regulates calcium-induced calcium release, results in improved response to pressure overload, with preservation of normal calcium handling and LV systolic function as well as protection against angiotensin II-induced cardiac remodeling in adult myocardium.43 IGFR1, an eQTL for LV tissue and in which predicted expression in LV was associated with LVMI, encodes the insulin-like growth factor receptor 1, which has been implicated in organ growth and insulin resistance.44
Consistent with expectations, several LVMI candidate genes have previous links to cardiomyopathy and HF. The strongest association we observed with LVMI was at rs2255167, a variant located in TTN, in which mutations have been previously associated with familial cardiomyopathy45 and early-onset AF.46 One of the loci detected in the European ancestry sub-analysis (and suggestive in the primary analysis), FLNC, encodes filamin C, an actin-related protein associated with familial HCM,32 restrictive cardiomyopathy,33 and arrhythmogenic cardiomyopathy.31 A mouse knock-in of filamin C results in myofibrillar degeneration.47 PPP3CB, which encodes the signaling protein calcineurin, has been implicated in pathologic (as opposed to physiologic) cardiac hypertrophy.48 FTO, an obesity gene previously associated with HF,29 was associated with unindexed LV mass, but not for LVMI. Although not genome-wide significant (p=5.9×10−8), we also observed a suggestive association between LVMI and rs3729989 near the MYBPC3 gene, which encodes a cardiac myosin-binding protein in which mutations are well-recognized causes of DCM and HCM.49,50
Importantly, we observed that both phenotypic and genetically determined LVMI were associated with substantially increased risks of incident cardiovascular events. Increased LVMI and LVH are consistently associated with an increased risk of HF.2 Indeed, we observed that greater LVMI was associated not only with greater risk of HF, but also incident DCM, HCM, and insertion of an ICD (a surrogate for cardiomyopathy or ventricular arrhythmias). Consistent with the notion that LVMI may be an endophenotype for certain cardiomyopathies, we observed that genetically determined LVMI – estimated using a 475-variant PRS – was consistently associated with greater risk of incident HCM within a separate set of UK Biobank participants as well as an external sample from the MGB healthcare system. Although recent work suggests that variants associated with risk of HCM and DCM tend to have opposite directions of effect,30 our findings support the notion that certain variants may increase risk of both conditions via pleiotropic effects on contractility and LV hypertrophy. Indeed, rs255167, the most significant association in our LVMI analysis, is within 100 base pairs of rs2042995, a variant associated with increased risk of both HCM and DCM in a prior multi-trait analysis.30 Overall, our findings provide evidence that the genetic variation underlying increased LVM may be clinically relevant, and may improve identification of individuals at greater risk of incident cardiomyopathy.
Our study has limitations. First, although our primary analysis was a mixed-ancestry GWAS, our sample is predominantly of European descent. As a result, our results may not generalize to individuals of other ancestries. Second, we used a deep learning model to estimate CMR-derived LVM, and therefore imperfect accuracy of estimations may have lead to reduced functional power to detect genetic associations. Nevertheless, we note that the model we utilized (ML4Hseg) is very accurate, having a correlation of r=0.86 with hand-labeled CMR-derived LVM in the UK Biobank,51 and using MR analyses we were able to demonstrate evidence of known causal relationships between elevated blood pressure and increased indexed LVM.24 Third, our ability to assess for associations between CMR-derived LVMI and incident outcomes was limited by the amount of follow-up currently available after the imaging analysis. Fourth, generalizability of our findings may be affected by bias introduced by methods of enrollment, as the UK Biobank appears somewhat enriched for health and socioeconomic status as compared to the general population.52
In summary, we performed a GWAS of deep-learned CMR-derived LVM including nearly 50,000 individuals. We discovered 12 independent loci meeting genome-wide significance, including 11 that are novel. Using complementary downstream analyses, we identified multiple candidate genes, many of which are involved in cardiac structure and function, and several that have been previously implicated in cardiomyopathy. Both CMR-derived and genetically determined LVM were associated with incident HCM in independent datasets. Our findings add to our understanding of the common genetic variation underlying LVM and demonstrate the potential to use deep learning to define rich phenotypes at scale to empower clinically relevant biological discovery.
Data Availability
UK Biobank data are publicly available by application (www.ukbiobank.ac.uk). MGB data contain protected health information and cannot be shared publicly. Data processing scripts used to perform the analyses described herein are available at https://github.com/shaankhurshid/lvmass_gwas.
Disclosures
Dr. Pirruccello has consulted for Maze Therapeutics. Dr. Friedman receives research support from Bayer AG and IBM. Dr. Weng receives sponsored research support from IBM to the Broad Institute. Dr. Anderson receives research support from Bayer AG and has consulted for ApoPharma, Inc. Dr. Batra receives research support from Bayer AG and IBM, and consults for Novartis. Dr. Ho receives research support from Bayer AG and Gilead Sciences, and has received research supplies from EcoNugenics. Dr. Philippakis receives research support from Bayer AG, IBM, Intel, and Verily, and has consulted for Novartis and Rakuten. Dr. Ellinor receives research support from Bayer AG, and has consulted for Bayer AG, Novartis, MyoKardia and Quest Diagnostics. Dr. Lubitz receives research support from Bristol Myers Squibb/Pfizer, Bayer AG, Boehringer Ingelheim, and Fitbit, and has consulted for Bristol Myers Squibb/Pfizer and Bayer AG, and participates in a research collaboration with IBM.
Funding
Dr. Pirruccello is supported by a John S. LaDue Memorial Fellowship. Dr. Weng is supported by NIH 1R01HL139731. Dr. Choi is supported by the NIH NHLBI BioData Catalyst Fellows program. Dr. Ho is supported by NIH R01HL134893, R01HL140224, and K24HL153669. Dr. Lubitz is supported by NIH 1R01HL139731 and American Heart Association 18SFRN34250007. Dr. Ellinor is supported by NIH 1R01HL092577, R01HL128914, and K24HL105780, American Heart Association 18SFRN34110082, and the Foundation Leducq 14CVD01. Dr. Nauffal is supported by NIH T32HL007604.