PT - JOURNAL ARTICLE AU - Detrois, Kira E. AU - Hartonen, Tuomo AU - Teder-Laving, Maris AU - Jermy, Bradley AU - Läll, Kristi AU - Yang, Zhiyu AU - Estonian Biobank research team, FinnGen AU - Mägi, Reedik AU - Ripatti, Samuli AU - Ganna, Andrea TI - Transferability and accuracy of electronic health record-based predictors compared to polygenic scores AID - 10.1101/2024.10.08.24315073 DP - 2024 Jan 01 TA - medRxiv PG - 2024.10.08.24315073 4099 - http://medrxiv.org/content/early/2024/10/08/2024.10.08.24315073.short 4100 - http://medrxiv.org/content/early/2024/10/08/2024.10.08.24315073.full AB - Electronic health record (EHR)-based phenotype risk scores (PheRS) leverage individuals’ health trajectories to infer disease risk. Similarly, polygenic scores (PGS) use genetic information to estimate disease risk. While PGS generalizability has been previously studied, less is known about PheRS transferability across healthcare systems and whether PheRS provide complementary risk information to PGS.We trained PheRS to predict the onset of 13 common diseases with high health burden in a total of 845,929 individuals (age 32-70) from 3 biobank-based studies from Finland (FinnGen), the UK (UKB) and Estonia (EstB). The PheRS were based on elastic-net models, incorporating up to 242 diagnoses captured in the EHR up to 10 years before baseline. Individuals were followed up for a maximum of 8 years, during which disease incidence was observed. PGS were calculated for each disease using recent publicly available results from genome-wide association studies.All 13 PheRS were significantly associated with the diseases of interest. The PheRS trained in different biobanks utilized partially distinct diagnoses, reflecting differences in medical code usage across the countries. Even with the large variability in the prevalence of various diagnoses, most PheRS trained in the UKB or EstB transferred well to FinnGen without re-training. PheRS and PGS were only moderately correlated (Pearson’s r ranging from 0.00 to 0.08), and models including both PheRS and PGS improved onset prediction compared to PGS alone for 8/13 diseases. PheRS was able to identify a subset of individuals at high-risk better than PGS for 8/13 disease.Our results indicate that EHR-based risk scores and PGS capture largely independent information and provide additive benefits for disease risk prediction. Furthermore, for many diseases the PheRS models transfer well between different EHRs. Given the large availability of EHR, PheRS can provide a complementary tool to PGS for risk stratification.Competing Interest StatementAndrea Ganna is the founder of Real World Genetics Oy. Bradley Jermy became an employee of BioMarin after his part of this work was completed. Kristi Läll has participated as an analyst in a collaboration research project at the Institute of Genomics, University of Tartu, which was funded by Geneto OÜ. No other authors have competing interests to declare.Funding StatementThis study has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 101016775. This Estonian Biobank study was funded by the European Union through the European Regional Development Fund Project No. 2014-2020.4.01.15-0012 GENTRANSMED. A.G. has received funding from the European Union's Horizon 2020 research and innovation programme under grant no. 101016775, the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant number 945733) and from Academy of Finland fellowship grant no. 323116.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:Patients and control subjects in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected prior to the Finnish Biobank Act came into effect (in September 2013) and the start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish Biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is Nr HUS/990/2017. The FinnGen study is approved by Finnish Institute for Health and Welfare (permit numbers: THL/2031/6.02.00/2017, THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019, THL/1721/5.05.00/2019 and THL/1524/5.05.00/2020), digital and population data service agency (permit numbers: VRK43431/2017-3, VRK/6909/2018-3, VRK/4415/2019-3), the Social Insurance Institution (permit numbers: KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019, KELA 98/522/2019, KELA 134/522/2019, KELA 138/522/2019, KELA 2/522/2020, KELA 16/522/2020), Findata permit numbers THL/2364/14.02/2020, THL/4055/14.06.00/2020, THL/3433/14.06.00/2020, THL/4432/14.06/2020, THL/5189/14.06/2020, THL/5894/14.06.00/2020, THL/6619/14.06.00/2020, THL/209/14.06.00/2021, THL/688/14.06.00/2021, THL/1284/14.06.00/2021, THL/1965/14.06.00/2021, THL/5546/14.02.00/2020, THL/2658/14.06.00/2021, THL/4235/14.06.00/2021, Statistics Finland (permit numbers: TK-53-1041-17 and TK/143/07.03.00/2020 (earlier TK-53-90- 20) TK/1735/07.03.00/2021, TK/3112/07.03.00/2021) and Finnish Registry for Kidney Diseases permission/extract from the meeting minutes on 4th July 2019. The Biobank Access Decisions for FinnGen samples and data utilized in FinnGen Data Freeze 10 include: THL Biobank BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1, BB2021_65, Finnish Red Cross Blood Service Biobank 7.12.2017, Helsinki Biobank HUS/359/2017, HUS/248/2020, HUS/150/2022 12, 13, 14, 15, 16, 17, 18, and 23, Auria Biobank AB17-5154 and amendment #1 (August 17 2020) and amendments BB_2021-0140, BB_2021-0156 (August 26 2021, Feb 2 2022), BB_2021-0169, BB_2021-0179, BB_2021-0161, AB20-5926 and amendment #1 (April 23 2020)and it's modification (Sep 22 2021), Biobank Borealis of Northern Finland_2017_1013, 2021_5010, 2021_5018, 2021_5015, 2021_5023, 2021_5017, 2022_6001, Biobank of Eastern Finland 1186/2018 and amendment 22/2020, 53/2021, 13/2022, 14/2022, 15/2022, Finnish Clinical Biobank Tampere MH0004 and amendments (21.02.2020 and 06.10.2020), 8/2021, 9/2022, 10/2022, 12/2022, 20/2022, 21/2022, 22/2022, 23/2022, Central Finland Biobank 1-2017, and Terveystalo Biobank STB 2018001 and amendment 25th Aug 2020, Finnish Hematological Registry and Clinical Biobank decision 18th June 2021, Arctic Biobank P0844: ARC_2021_1001. Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382). UK Biobank data used in this study were obtained under approved application 78537. The activities of the EstBB are regulated by the Human Genes Research Act, which was adopted in 2000 specifically for the operations of the EstBB. Individual level data analysis in the EstBB was carried out under ethical approval 1.1-12/624 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs), using data according to release application S22, document number 6-7/GI/16259 from the EstBB.I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesThe individual-level data in these studies is protected for data privacy, access is regulated through the biobanks. The Finnish biobank data can be accessed through the Fingenious services (https://site.fingenious.fi/en/) managed by FINBB. Researchers interested in EstBB can request access at https://www.geenivaramu.ee/en/access-biobank and the UKB data are available through a procedure described at http://www.ukbiobank.ac. The GWAS data used in this study are available in the GWAS catalog database under accession codes listed in Supplement Table 10. The PGS scores generated in this study are available in the PGS Catalog under publication ID: PGP000618 and score IDs: PGS004869-PGS004886.https://github.com/intervene-EU-H2020/INTERVENE_PheRShttps://github.com/intervene-EU-H2020/onset_prediction