Abstract
We present shaPRS, a novel method that leverages widespread pleiotropy between traits, or shared genetic effects across ancestries, to improve the accuracy of polygenic risk scores (PRS). The method uses genome-wide summary statistics from two diseases or ancestries to improve the genetic effect estimate and standard error at SNPs where there is homogeneity of effect between the two datasets. When there is significant evidence of heterogeneity, the genetic effect from the disease or population closest to the target population is maintained. We show via simulation and a series of real-world examples that shaPRS substantially enhances the accuracy of PRS for complex diseases and greatly improves PRS performance across ancestries. shaPRS is a PRS pre-processing method that is agnostic to the actual PRS generation method. As a result, it can be integrated into existing PRS generation pipelines and can continue to be applied as more performant PRS methods are developed over time.
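The per-SNP idea described above can be illustrated with a minimal sketch. This is not the shaPRS implementation: it assumes independent samples, uses a hard significance threshold on a simple two-study heterogeneity test, and pools with a fixed-effect inverse-variance meta-analysis. The function name `combine_snp` and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
import math


def combine_snp(beta_a, se_a, beta_b, se_b, alpha=0.05):
    """Return (beta, se) to use for dataset A at a single SNP.

    Dataset A is the one closest to the target population; dataset B is
    the adjunct dataset. Assumes the two samples are independent
    (an assumption of this sketch, not a statement about shaPRS).
    """
    # Two-study heterogeneity test: difference in effects over its standard error.
    z_het = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value under a standard normal null.
    p_het = math.erfc(abs(z_het) / math.sqrt(2))

    if p_het < alpha:
        # Significant heterogeneity: keep dataset A's own estimate.
        return beta_a, se_a

    # Homogeneous effects: fixed-effect inverse-variance weighted pooling.
    w_a = 1.0 / se_a ** 2
    w_b = 1.0 / se_b ** 2
    beta = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    se = math.sqrt(1.0 / (w_a + w_b))
    return beta, se


# Similar effects are pooled, which shrinks the standard error.
print(combine_snp(0.10, 0.02, 0.12, 0.02))
# Opposite effects are heterogeneous, so dataset A's estimate is kept.
print(combine_snp(0.10, 0.02, -0.10, 0.02))
```

In this sketch the pooled standard error is smaller than either input whenever the effects agree, which is the sense in which a homogeneous SNP gains precision. A hard threshold is the simplest possible rule and is used here only to keep the example short.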
Competing Interest Statement
C.A.A. has received consultancy fees from Genomics plc and BridgeBio Inc. C.W. receives funding from GSK and MSD.
Funding Statement
This work was funded by the Wellcome Trust (203950/Z/16/A, WT220788, WT107881, 206194, 108413/A/15/D) and the MRC (MC_UU_00002/4) and supported by the NIHR Cambridge BRC (BRC-1215-20014). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. This research was conducted using the UK Biobank Resource under Application Number 30931.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The shaPRS R package is available from https://github.com/mkelcb/shaprs. Code to perform all analyses reported in this manuscript is available at https://github.com/mkelcb/shaprs-paper. The final PRS files and diagnostic data are available in the Supplementary data. The Crohn's disease and ulcerative colitis genotype data used here can be obtained via managed access at: https://ega-archive.org/studies/EGAS00001000924, https://ega-archive.org/studies/EGAS00000000084 and https://ega-archive.org/datasets/EGAD00000000005.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
We have enacted the most important suggestions of the reviewers, which have further demonstrated the power of shaPRS for building highly performant polygenic risk scores across traits and ancestries. These changes include, but are not limited to:
1. Formally evaluating the difference in performance between shaPRS and other methods.
2. Demonstrating that adding shaPRS into a baseline model contributes non-overlapping information.
3. Adding confidence intervals for the continuous trait PRS.