Abstract
Polygenic risk scores (PRS) based on European training data suffer reduced accuracy in non-European target populations, exacerbating health disparities. This loss of accuracy predominantly stems from linkage disequilibrium (LD) differences, minor allele frequency (MAF) differences (including population-specific SNPs), and/or causal effect size differences. PRS based on training data from the non-European target population do not suffer from these limitations, but are currently limited by much smaller training sample sizes. Here, we propose PolyPred, a method that improves cross-population polygenic prediction by combining two complementary predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing LD differences; and BOLT-LMM, a published predictor. In the special case where a large training sample is available in the non-European target population (or a closely related population), we propose PolyPred+, which further incorporates the non-European training data, addressing MAF differences and causal effect size differences. PolyPred and PolyPred+ require individual-level training data (for their BOLT-LMM component), but we also propose analogous methods that replace the BOLT-LMM component with summary statistic-based components if only summary statistics are available. We applied PolyPred to 49 diseases and complex traits in 4 UK Biobank populations using UK Biobank British training data (average N=325K), and observed statistically significant average relative improvements in prediction accuracy vs. BOLT-LMM ranging from +7% in South Asians to +32% in Africans (and vs. LD-pruning + P-value thresholding (P+T) ranging from +77% to +164%), consistent with simulations.
We applied PolyPred+ to 23 diseases and complex traits in UK Biobank East Asians using both UK Biobank British (average N=325K) and Biobank Japan (average N=124K) training data, and observed statistically significant average relative improvements in prediction accuracy of +24% vs. BOLT-LMM and +12% vs. PolyPred. The summary statistic-based analogues of PolyPred and PolyPred+ attained similar improvements. In conclusion, PolyPred and PolyPred+ improve cross-population polygenic prediction accuracy, ameliorating health disparities.
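As a rough illustration of two ideas in the abstract (combining complementary predictors, and reporting a relative improvement in prediction accuracy), the following is a minimal sketch on synthetic data. The abstract does not say how the predictors are combined; the sketch assumes a linear combination whose mixing weights are fitted by ordinary least squares on a small tuning sample, and it assumes accuracy is measured as squared correlation. All function names, sample sizes, and simulated values are hypothetical and are not taken from the PolyFun package.

```python
import numpy as np


def fit_mixing_weights(prs_matrix, phenotype):
    """Least-squares intercept and mixing weights for the PRS columns (assumed scheme)."""
    design = np.column_stack([np.ones(len(phenotype)), prs_matrix])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the mixing weights


def combine(prs_matrix, coef):
    """Apply fitted weights to produce a single combined predictor."""
    return coef[0] + prs_matrix @ coef[1:]


def r_squared(prediction, phenotype):
    """Squared Pearson correlation between prediction and phenotype."""
    return np.corrcoef(prediction, phenotype)[0, 1] ** 2


def relative_improvement(r2_new, r2_base):
    """Relative change in accuracy, e.g. 0.24 corresponds to +24%."""
    return (r2_new - r2_base) / r2_base


# Synthetic demo: two noisy predictors that capture different parts of the signal.
rng = np.random.default_rng(0)
n_tune, n_test = 500, 5000
n = n_tune + n_test
signal_a = rng.normal(size=n)
signal_b = rng.normal(size=n)
phenotype = 0.3 * signal_a + 0.3 * signal_b + rng.normal(size=n)
prs = np.column_stack([
    signal_a + rng.normal(scale=0.5, size=n),  # hypothetical predictor 1
    signal_b + rng.normal(scale=0.5, size=n),  # hypothetical predictor 2
])

tune, test = slice(0, n_tune), slice(n_tune, n)
coef = fit_mixing_weights(prs[tune], phenotype[tune])  # weights from the small tuning sample
combined = combine(prs[test], coef)                    # evaluated on held-out individuals

r2_single = r_squared(prs[test, 0], phenotype[test])
r2_combined = r_squared(combined, phenotype[test])
gain = relative_improvement(r2_combined, r2_single)
print(f"single R2={r2_single:.3f}, combined R2={r2_combined:.3f}, relative gain={gain:+.0%}")
```

In this toy setting the combined predictor outperforms either input because the two inputs carry different information; the size of the gain is a property of the simulation only and says nothing about the results reported above.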
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was funded by NIH grants U01 HG009379, R37 MH107649, R01 MH101244 and R01 HG006399. MK is supported by a Nakajima Foundation Fellowship and the Masason Foundation. WJP is supported by an NWO Veni grant (91619152). ARM is supported by NIMH K99/R00MH117229. HKF is supported by Eric and Wendy Schmidt. AVK is supported by grants 1K08HG010155 and 1U01HG011719 from the National Human Genome Research Institute and a sponsored research agreement from IBM Research.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
UK Biobank: Collection of the UK Biobank (UKBB) data was approved by the UKBB's Research Ethics Committee. Approval to use UKBB individual-level data in this work was obtained under application #16549. Biobank Japan: All the participants provided written informed consent approved by the ethics committees of RIKEN Center for Integrative Medical Sciences and the Institute of Medical Sciences at the University of Tokyo. Uganda-APCDR: As described previously in Asiki et al. 2013, before all survey procedures, including interviews, blood tests and sample storage for future use, written consent (or assent in conjunction with parental/guardian consent for those less than 18 years of age) is obtained following Uganda National Council of Science and Technology (UNCST) guidelines. Written consent/assent is also obtained from participants on the use of their clinical records for research purposes. All study procedures, including material transfer agreements, are approved annually by the Uganda Virus Research Institute Science and Ethics Committee and the UNCST. A request to use these deidentified data for this work (genetic data from EGAD00010000965 and phenotype data via sftp with reference: DD_PK_050716 gwas_phenotypes_28Oct14.txt), submitted via a Data Access Application for External Investigators, was approved by the Data Access Committee for APCDR, and the data were accessed through the European Genome-Phenome Archive.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
* co-first authors
Added new figures, tables, secondary analyses, and an analysis of meta-analyzed summary statistics from the ENGAGE consortium. Also added PolyPred-S, PRS-CS, and PolyPred-P as new main methods in all of the experiments, figures, and tables.
Data Availability
PolyPred and PolyPred+ are provided as part of the open-source software package PolyFun, which is freely available at https://github.com/omerwe/polyfun. Access to the UK Biobank resource is available via application (http://www.ukbiobank.ac.uk/). PRS coefficients generated in this study are available for public download at http://data.broadinstitute.org/alkesgroup/polypred_results.