Abstract
Advanced data-driven methods can outperform conventional features in electrocardiogram (ECG) analysis, but often lack interpretability. The variational autoencoder (VAE), a form of unsupervised machine learning, can address this shortcoming by extracting comprehensive and interpretable new ECG features. Our novel VAE model, trained on a dataset comprising over one million secondary care median beat ECGs, and validated using the UK Biobank, reveals 20 independent features that capture ECG information content with high reconstruction accuracy. Through phenome- and genome-wide association studies, we illustrate the increased power of the VAE approach for gene discovery, compared with conventional ECG traits, and identify previously unrecognised common and rare variant determinants of ECG morphology. Additionally, to highlight the interpretability of the model, we provide detailed visualisation of the associated ECG alterations. Our study shows that the VAE provides a valuable tool for advancing our understanding of cardiac function and its genetic underpinnings.
Competing Interest Statement
JSW has received research support from Bristol Myers Squibb, and has acted as a consultant for MyoKardia, Pfizer, Foresite Labs, Health Lumen, and Tenaya Therapeutics. JWW has received research support from Anumama.
Funding Statement
This work was supported by Sir Jules Thorn Charitable Trust [21JTA], Medical Research Council (UK), British Heart Foundation [RE/18/4/34215; FS/CRTF/21/24183; RG/F/22/110078, FS/IPBSRF/22/27059], NIHR Imperial College Biomedical Research Centre, and an EJP RD Research Mobility Fellowship (European Reference Networks) to ES. AS is funded by a British Heart Foundation (BHF) clinical research training fellowship (FS/CRTF/21/24183). FSN and NSP are supported by the BHF (RG/F/22/110078 and RE/18/4/34215) and the National Institute for Health Research Imperial Biomedical Research Centre. LP is funded by a Medical Research Council (MRC) clinical research training fellowship (MR/Y000803/1). For the purpose of open access, the authors have applied a creative commons attribution (CC BY) licence to any author accepted manuscript version arising.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All studies were approved by the relevant regional research ethics committees and adhered to the principles set out in the Declaration of Helsinki. The UK Biobank study was reviewed by the National Research Ethics Service (11/NW/0382, 21/NW/0157). This study was conducted under the terms of access approval numbers 47602 and 48666. Ethics review and approval for the BIDMC cohort was provided by the Beth Israel Deaconess Medical Center Committee on Clinical Investigations, IRB protocol # 2023P000042. Access to the BIDMC dataset is restricted due to ethical limitations.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
* Joint first authors
** Joint senior/corresponding authors
Data availability and ethics
The summary statistics supporting the GWAS findings will be made publicly available through the GWAS Catalog upon publication following peer review. The code used to perform the analyses and generate the plots for this study is accessible in the supplement. All UKB data used in this study are available to registered researchers (https://www.ukbiobank.ac.uk/). The latent factors (LF) generated from the UKB ECGs will be made available as a Returned Dataset in the UKB.