Abstract
The predictive performance of polygenic scores (PGS) depends largely on the number of samples available to train them. Increasing the sample size for a specific phenotype is expensive and slow, but the effective sample size can be increased by leveraging genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) without the need to select the most relevant ones individually. In this study, the multi-PGS framework increased prediction accuracy over a single PGS for all included psychiatric disorders and other available outcomes, with up to 9-fold increases in prediction R² for attention-deficit/hyperactivity disorder (ADHD) compared with a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions, with up to 15-fold increases in prediction accuracy. We benchmark the multi-PGS framework against other methods and highlight its potential application to newly emerging biobanks.
Competing Interest Statement
CM Bulik reports: Shire (grant recipient, Scientific Advisory Board member); Lundbeckfonden (grant recipient); Pearson (author, royalty recipient); Equip Health Inc. (Clinical Advisory Board). All other authors declare no conflicts.
Funding Statement
C.A., B.J.V. and F.P. were supported by the Danish National Research Foundation (Niels Bohr Professorship to Prof. John McGrath), the Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH (R102-A9118, R155-2014-1724 and R248-2017-2003), and a Lundbeck Foundation Fellowship (R335-2019-2339). C.A. was also supported by a William Demant Fonden travel grant. I.B. was also supported by the Swedish Brain Foundation and Fredrik och Ingrid Thurings Stiftelse. Anorexia nervosa (AN) data are from the Anorexia Nervosa Genetics Initiative (ANGI), an initiative of the Klarman Family Foundation, extended with support from the Lundbeck Foundation (R276-2017-4581).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was approved by the Danish Data Protection Agency, and data access was approved by Statistics Denmark and the Danish Health Data Authority. Approval by the Ethics Committee and written informed consent were not required for register-based projects [Act no. 1338 of 1 September 2020, section 10 on research ethics for administration of health scientific research projects and health data scientific research projects]. All data were de-identified and not recognizable at an individual level.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Figure 3 updated. Supplemental files updated.
Data Availability
All relevant iPSYCH and Danish ANGI data are available from the authors after approval by the iPSYCH Data Access Committee. Because the data are protected by Danish legislation, they can be accessed only on the secure Danish server. More information about obtaining access to the Danish data is available at http://ipsych.au.dk/about-ipsych/. We used the list of GWAS Catalog (https://www.ebi.ac.uk/gwas/) summary statistics downloaded on 09/09/2020, GWAS summary statistics from the PGC (https://www.med.unc.edu/pgc/download-results/), and the GWAS Atlas UKB2 data freeze v20191115 (https://atlas.ctglab.nl/).