1.1 Abstract
Polygenic risk scores (PRS) summarize an individual’s genetic risk for a trait or disease. However, PRS often predict phenotypes poorly when the ancestry of the target population does not match that of the population in which GWAS effect sizes were estimated. For many populations, this can be addressed by performing GWAS in the target population. Admixed individuals, whose genomes can be traced to multiple ancestral populations, instead lie on an ancestry continuum and are not easily represented as a discrete population.
Here, we propose slaPRS (stacking local ancestry PRS), which incorporates GWAS from multiple ancestries to alleviate the ancestry dependence of PRS in admixed samples. slaPRS uses ensemble learning (stacking) to combine local, population-specific PRS computed in regions across the genome. We compare slaPRS to single-population PRS and to a method that combines single-population PRS globally. In simulations, slaPRS outperformed existing approaches and reduced the ancestry dependence of PRS in African Americans. For lipid traits in African British individuals (UK Biobank), slaPRS again improved on single-population PRS while performing comparably to the globally combined PRS. slaPRS provides a data-driven and flexible framework for incorporating multiple population-specific GWAS and local ancestry in samples of admixed ancestry.
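The stacking idea described above can be illustrated with a small, self-contained sketch. This is not the slaPRS implementation and does not use the slaPRS R package. It assumes a ridge-regression meta-learner (the abstract says only that stacking is used), two hypothetical populations labelled "EUR" and "AFR", two genomic regions with two SNPs each, made-up GWAS weights, and a noise-free toy phenotype. All function and variable names are hypothetical. Local ancestry calls are not modelled here; a fuller version might add them as extra features.

```python
# Toy sketch of stacking region-level, population-specific PRS.
# Every name and number here is hypothetical; this is NOT the slaPRS package.

def region_prs(dosages, weights):
    """PRS for one region: weighted sum of allele dosages."""
    return sum(d * w for d, w in zip(dosages, weights))

def local_prs_features(dosages_by_region, weights_by_pop):
    """One feature per (region, population) pair; populations in sorted order."""
    feats = []
    for r, dosages in enumerate(dosages_by_region):
        for pop in sorted(weights_by_pop):
            feats.append(region_prs(dosages, weights_by_pop[pop][r]))
    return feats

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_stacking(X, y, lam=0.0):
    """Ridge meta-learner without intercept: solve (X'X + lam*I) w = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)

# Hypothetical per-region SNP weights from two population-specific GWAS.
GWAS_WEIGHTS = {
    "EUR": [[0.5, 0.1], [0.3, -0.2]],
    "AFR": [[0.2, 0.4], [0.1, 0.3]],
}

# Allele dosages for six toy individuals: [region 1 SNPs, region 2 SNPs].
DOSAGES = [
    [[0, 1], [2, 0]],
    [[1, 0], [0, 1]],
    [[2, 1], [1, 1]],
    [[0, 2], [1, 0]],
    [[1, 1], [0, 2]],
    [[2, 0], [2, 1]],
]

# Feature order per individual: region1-AFR, region1-EUR, region2-AFR, region2-EUR.
X = [local_prs_features(d, GWAS_WEIGHTS) for d in DOSAGES]

# Noise-free toy phenotype built from known stacking weights.
TRUE_W = [0.7, 0.3, 0.2, 0.8]
y = [sum(w * x for w, x in zip(TRUE_W, row)) for row in X]

# Fit the meta-learner, then form the stacked PRS for each individual.
stack_w = fit_stacking(X, y, lam=0.0)
stacked_prs = [sum(w * x for w, x in zip(stack_w, row)) for row in X]
```

With `lam=0.0` and a noise-free phenotype, the fitted weights match `TRUE_W` up to floating-point error. With real data, a positive penalty and held-out evaluation would be the natural choices, since one feature per region and population quickly yields many correlated predictors.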
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Funding for this project was provided by National Institutes of Health grants R01 HG011031 and R01 HG005855 (S.Z.) and the NIH/National Human Genome Research Institute Genome Science Training Program (T32HG00040). We thank UK Biobank participants and study teams for providing high-quality genetic and phenotypic data.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used openly available data from the UK Biobank (https://www.ukbiobank.ac.uk/) and the Global Lipids Genetics Consortium (http://csg.sph.umich.edu/willer/public/glgc-lipids2021/).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Data Availability This research used genetic and phenotypic data from the UK Biobank Resource under Application Number 24460. Data is available for download for approved researchers of the UK Biobank. High-powered, ancestry-specific GWAS from the Global Lipids Genetics Consortium are publicly available: http://csg.sph.umich.edu/willer/public/glgc-lipids2021/.
Code Availability slaPRS and its supporting functions have been implemented as an R package, which can be installed by running devtools::install_github('kliao12/slaPRS') with the devtools library in R. An example workflow is available at https://github.com/kliao12/slaPRS.
Acknowledgements We appreciate the helpful insights and feedback given by Dr. Jean Morrison on general method development and analysis.
Conflict of interest disclosure The authors report no conflicts of interest regarding commercial or financial interests involved with the study.