ABSTRACT
Risk prediction models provide empirical recommendations that ultimately aim to deliver optimal patient outcomes. Genetic information, in the form of a polygenic risk score (PRS), may be included in these models to significantly increase their accuracy. Several analyses of PRS accuracy have been completed, but nearly all focus on only a few diseases and report limited statistics. This narrow approach has limited our ability to assess, as a whole, whether PRSs can provide actionable disease predictions. This investigation aims to address this uncertainty by comprehensively analyzing 23 diseases within the UK Biobank. Our results show that adding the PRS to a base model containing age, sex and the top ten genetic principal components significantly improves prediction accuracy, as measured by ROC curves, in 21 of 23 diseases and reclassifies on average 68% of the individuals in the top 5% risk group. For heart failure, breast cancer, prostate cancer and gout, decision curve analyses using the 5% risk threshold determined that including the PRS in the base model would correctly identify at least 60 more individuals who develop the disease for every 1000 individuals screened, without making any incorrect predictions. Analyses that included disease-specific risk factors, such as body mass index, and considered time of disease onset found similar PRS benefits. The improved prediction accuracy translated into 10 instances in which medications or supplements, and 94 instances in which lifestyle modifications, led to a significantly greater reduction in disease risk for individuals in the top PRS quintile compared to the bottom PRS quintile. Finally, we provide guidance for tailored, future PRS generation by comprehensively ranking methods that generate PRS weights and identifying genome-wide association study characteristics that influence PRS predictions.
The unification of significantly enhanced disease predictions, novel risk mitigation opportunities and improved methodological clarity indicates that PRSs carry far greater clinical impact than previously known.
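To illustrate how a decision curve analysis can yield a figure such as "60 more individuals per 1000 screened", the following is a minimal sketch of the standard net benefit calculation, not the authors' own code. It assumes that the 5% risk threshold is a predicted probability of 0.05 (the abstract does not specify this), and the function names and toy data are hypothetical.

```python
import numpy as np


def net_benefit(y_true, risk, threshold):
    """Standard decision-curve net benefit at one risk threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    y_true = np.asarray(y_true, dtype=bool)
    # Individuals whose predicted risk meets the threshold are flagged as high risk.
    flagged = np.asarray(risk, dtype=float) >= threshold
    n = y_true.size
    tp = np.sum(flagged & y_true)    # flagged and developed the disease
    fp = np.sum(flagged & ~y_true)   # flagged but did not develop the disease
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def extra_true_positives_per_1000(y_true, risk_base, risk_with_prs, threshold=0.05):
    """Difference in net benefit between two models, scaled to 1000 screened.

    The threshold default of 0.05 is an assumption made for illustration.
    """
    delta = (net_benefit(y_true, risk_with_prs, threshold)
             - net_benefit(y_true, risk_base, threshold))
    return 1000.0 * delta


# Hypothetical toy cohort of 10 individuals, 2 of whom develop the disease.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
base_risk = [0.10, 0.01, 0.10, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
prs_risk = [0.10, 0.10, 0.10, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
print(extra_true_positives_per_1000(y, base_risk, prs_risk))
```

Because net benefit is expressed in units of true positives, a difference between two models can be read as additional correctly identified cases per individual screened at no increase in false positives, which is the interpretation the abstract uses.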
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The authors declare no competing interests.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
The motivation of this work is unchanged, but the analyses and results have been entirely redone.
Data Availability
All resources used are thoroughly described at https://kulmsc.github.io/pgs_book/index.html. In short, data originated from: UK Biobank (Application #47137) (https://www.ukbiobank.ac.uk/), 1000 Genomes Project (https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3), GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics), PGS Catalog (https://www.pgscatalog.org/browse/all/), Functional Annotations (https://alkesgroup.broadinstitute.org/LDSCORE/), Deleterious Scores (https://pcingola.github.io/SnpEff/ss_dbnsfp/). All intermediate data that do not contain individual-level UK Biobank records are available at: https://wcm.box.com/s/oe3oayaoi3mxszf38c0bqpaa8r8ftmif. All custom-written scripts are available at https://github.com/kulmsc/pgs_scripts. Additional software for the generative methods is described in Supplementary II. The programming languages R, version 3.6, and Python, version 3.6.5, were used.