PT - JOURNAL ARTICLE
AU - Ometto, Sara
AU - Chatterjee, Soumick
AU - Vergani, Andrea Mario
AU - Landini, Arianna
AU - Sharapov, Sodbo
AU - Giacopuzzi, Edoardo
AU - Visconti, Alessia
AU - Bianchi, Emanuele
AU - Santonastaso, Federica
AU - Soda, Emanuel M.
AU - Cisternino, Francesco
AU - Ieva, Francesca
AU - Di Angelantonio, Emanuele
AU - Pirastu, Nicola
AU - Glastonbury, Craig A.
TI - Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights
AID - 10.1101/2024.11.04.24316700
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.11.04.24316700
4099 - http://medrxiv.org/content/early/2024/11/05/2024.11.04.24316700.short
4100 - http://medrxiv.org/content/early/2024/11/05/2024.11.04.24316700.full
AB - Biobank-scale imaging provides a unique opportunity to characterise structural and functional cardiac phenotypes and how they relate to disease outcomes. However, deriving specific phenotypes from Magnetic Resonance Imaging (MRI) data requires time-consuming expert annotation, which limits scalability and fails to exploit the information density of such image acquisitions. In this study, we applied a 3D diffusion autoencoder to temporally resolved cardiac MRI data from 71,021 UK Biobank participants to derive latent phenotypes representing the human heart in motion. These phenotypes were reproducible, heritable (h^2 = 4-18%), and significantly associated with cardiometabolic traits and outcomes, including atrial fibrillation (P = 8.5 × 10^-29) and myocardial infarction (P = 3.7 × 10^-12). Using latent space manipulation techniques, we directly interpreted and visualised what specific latent phenotypes were capturing in a given MRI. To establish the genetic basis of these traits, we performed a genome-wide association study, identifying 89 significant common variants (P < 2.3 × 10^-9) across 42 loci, including seven novel loci.
Extensive multi-trait colocalisation analyses (PP.H4 > 0.8) linked these variants to various cardiac traits and diseases, revealing a shared genetic architecture spanning phenotypic scales. Polygenic Risk Scores (PRS) derived from latent phenotypes were predictive of a range of cardiometabolic diseases, and high-risk individuals had substantially increased cumulative hazard rates across multiple diseases. This study showcases diffusion autoencoding methods as powerful tools for unsupervised phenotyping, genetic discovery and disease risk prediction using cardiac MRI data.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study did not receive any funding.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This research has been conducted using the UK Biobank Resource under application number 82779.