Abstract
Background There is growing interest in the clinical application of polygenic scores as their predictive utility increases for a range of health-related phenotypes. Presenting polygenic score predictions on the absolute scale is an important step towards their safe interpretation. Currently, polygenic scores can only be converted to the absolute scale when a validation sample is available, which is a major limitation on their interpretability and clinical utility.
Methods We have developed a method to convert polygenic scores to the absolute scale for binary and normally distributed phenotypes. This method uses summary statistics only: the area under the receiver operating characteristic curve (AUC) or the variance explained (R2) by the polygenic score, together with the prevalence of binary phenotypes or the mean and standard deviation of normally distributed phenotypes. Polygenic scores are converted using normal distribution theory. Because the AUC or R2 of a polygenic score may be unknown, we also evaluate two methods (AVENGEME, lassosum) for estimating these values from genome-wide association study (GWAS) summary statistics alone. We validate the absolute-scale conversion and the AUC/R2 estimation using data for eight binary and three continuous phenotypes in the UK Biobank sample.
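The conversion from AUC or R2 to the absolute scale can be illustrated with a minimal sketch. It assumes an equal-variance binormal model for binary phenotypes (the standardized polygenic score is normally distributed within cases and within controls, so that AUC = Φ(d/√2), where d is the case/control mean difference in within-group SD units) and a bivariate normal model for continuous phenotypes (score and phenotype correlated with r = √R2). The function names `binary_risk` and `continuous_conditional` are hypothetical, and this is one standard application of normal distribution theory, not necessarily the exact procedure implemented in the GenoPred tools.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def binary_risk(z: float, auc: float, prevalence: float) -> float:
    """Estimated probability of being a case given a standardized polygenic score z.

    Assumption: the score is N(mu0, s^2) in controls and N(mu1, s^2) in cases,
    so AUC = Phi(d / sqrt(2)) with d = (mu1 - mu0) / s. The means and the
    within-group SD s are chosen so the population score has mean 0, variance 1.
    """
    if not (0 < auc < 1 and 0 < prevalence < 1):
        raise ValueError("auc and prevalence must lie strictly between 0 and 1")
    d = sqrt(2) * STD_NORMAL.inv_cdf(auc)  # mean difference in within-group SD units
    # Total variance = within-group s^2 + between-group k(1-k)(d*s)^2 = 1
    s = 1 / sqrt(1 + prevalence * (1 - prevalence) * d ** 2)
    mu0 = -prevalence * d * s  # control mean, so the population mean is 0
    mu1 = mu0 + d * s          # case mean
    weighted_case = prevalence * NormalDist(mu1, s).pdf(z)
    weighted_control = (1 - prevalence) * NormalDist(mu0, s).pdf(z)
    # Bayes' rule: P(case | z)
    return weighted_case / (weighted_case + weighted_control)


def continuous_conditional(z: float, r2: float, mean: float, sd: float) -> tuple[float, float]:
    """Conditional mean and SD of a normal phenotype given a standardized score z.

    Assumption: score and phenotype are bivariate normal with correlation sqrt(r2).
    """
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    cond_mean = mean + sd * sqrt(r2) * z  # regression of phenotype on score
    cond_sd = sd * sqrt(1 - r2)           # residual SD not explained by the score
    return cond_mean, cond_sd


if __name__ == "__main__":
    # Hypothetical inputs: score at roughly the 90th percentile of N(0, 1)
    z90 = STD_NORMAL.inv_cdf(0.9)
    print(binary_risk(z90, auc=0.65, prevalence=0.05))
    print(continuous_conditional(z90, r2=0.1, mean=170.0, sd=10.0))
```

The within-group SD is rescaled so the overall score has unit variance, matching a score standardized in the population. For binary phenotypes the population score is a two-component mixture rather than exactly normal, so treating `z` as a standard normal percentile is an approximation that is close for modest AUC values.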
Results When the AUC/R2 of the polygenic score was known, the observed and estimated absolute values were highly concordant. Across binary phenotypes, the mean absolute difference between the observed and estimated proportion of cases was 5%. For continuous phenotypes, the mean absolute difference between observed and estimated means was <0.3%. Estimates of AUC/R2 from the lassosum pseudovalidation method were most similar to the observed AUC/R2 values, though estimated values deviated substantially from the observed values for autoimmune disorders.
Conclusion This study enables accurate interpretation of polygenic scores using only summary statistics, providing a useful tool for educational and clinical purposes. Furthermore, we have created interactive webtools implementing the conversion to the absolute scale for binary and normally distributed phenotypes (https://opain.github.io/GenoPred/PRS_to_Abs_tool.html). Several further barriers must be addressed before clinical implementation of polygenic scores, such as ensuring target individuals are well represented by the GWAS sample.
Competing Interest Statement
Cathryn Lewis sits on the Myriad Neuroscience Scientific Advisory Board. The other authors declare no competing interests.
Funding Statement
This paper represents independent research funded by the UK Medical Research Council (MR/N015746/1), and the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The UK Biobank received ethical approval from the North West - Haydock Research Ethics Committee (reference 16/NW/0274). This study was conducted under UK Biobank application number 18177.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Individual-level data for UK Biobank must be applied for via the Access Management System (https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access). All code used in this study is publicly available (https://opain.github.io/GenoPred/).