PT - JOURNAL ARTICLE
AU - Coppola, Edoardo
AU - Savardi, Mattia
AU - Massussi, Mauro
AU - Adamo, Marianna
AU - Metra, Marco
AU - Signoroni, Alberto
TI - HuBERT-ECG: a self-supervised foundation model for broad and scalable cardiac applications
AID - 10.1101/2024.11.14.24317328
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.11.14.24317328
4099 - http://medrxiv.org/content/early/2024/11/18/2024.11.14.24317328.short
4100 - http://medrxiv.org/content/early/2024/11/18/2024.11.14.24317328.full
AB - Deep learning models have shown remarkable performance in electrocardiogram (ECG) analysis, but their success has been constrained by the limited availability and size of ECG datasets, resulting in systems that are more task specialists than versatile generalists. In this work, we introduce HuBERT-ECG, a foundation ECG model pre-trained in a self-supervised manner on a large and diverse dataset of 9.1 million 12-lead ECGs encompassing 164 cardiovascular conditions. By simply adding an output layer, HuBERT-ECG can be fine-tuned for a wide array of downstream tasks, from diagnosing diseases to predicting future cardiovascular events. Across diverse real-world scenarios, HuBERT-ECG achieves AUROCs from 84.3% in low-data settings to 99% in large-scale setups. When trained to detect 164 overlapping conditions simultaneously, our model delivers AUROCs above 90% and 95% for 140 and 94 diseases, respectively. HuBERT-ECG also predicts death events within a 2-year follow-up with an AUROC of 93.4%. We release models and code.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work has been partly funded by 1) Regione Lombardia, Italy, through the initiative Programme of measures for economic recovery: development of new cooperation agreements with universities for research, innovation and technology transfer - DGR n. XI/4445/2021; 2) European Union - Next Generation EU, through the Italian Ministry of Research PRIN 2022, project n. 2022A49KR3 QT-SEED Quality-of-life Technological and Societal Exploitation of ECG Diagnostics.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The CODE Study (Ribeiro dataset) was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais, protocol 49368496317.7.0000.5149.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
All datasets supporting the findings described in this manuscript are public, except for Ribeiro. This dataset, the test set of which is publicly available, is accessible for scientific research upon request to the respective owner.