Selection, optimization, and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse populations

Niall J Lennon; Leah C Kottyan; Christopher Kachulis; Noura Abul-Husn; Josh Arias; Gillian Belbin; Jennifer E Below; Sonja Berndt; Wendy Chung; James J. Cimino; Ellen Wright Clayton; John J. Connolly; David Crosslin; Ozan Dikilitas; Digna R. Velez Edwards; QiPing Feng; Marissa Fisher; Robert Freimuth; Tian Ge; The GIANT Consortium; The All of Us Research Program; Joseph T. Glessner; Adam Gordon; Candace Guiducci; Hakon Hakonarson; Maegan Harden; Margaret Harr; Joel Hirschhorn; Clive Hoggart; Li Hsu; Ryan Irvin; Gail P. Jarvik; Elizabeth W. Karlson; Atlas Khan; Amit Khera; Krzysztof Kiryluk; Iftikhar Kullo; Katie Larkin; Nita Limdi; Jodell E. Linder; Ruth Loos; Yuan Luo; Edyta Malolepsza; Teri Manolio; Lisa J. Martin; Li McCarthy; James B Meigs; Tesfaye B. Mersha; Jonathan Mosley; Bahram Namjou; Nihal Pai; Lorenzo L. Pesce; Ulrike Peters; Josh Peterson; Cynthia A. Prows; Megan J. Puckelwartz; Heidi Rehm; Dan Roden; Elisabeth A. Rosenthal; Robb Rowley; Konrad Teodor Sawicki; Dan Schaid; Tara Schmidlen; Roelof Smit; Johanna Smith; Jordan W. Smoller; Minta Thomas; Hemant Tiwari; Diana Toledo; Nataraja Sarma Vaitinadin; David Veenstra; Theresa Walunas; Zhe Wang; Wei-Qi Wei; Chunhua Weng; Georgia Wiesner; Yin Xianyong; Eimear Kenny

doi:10.1101/2023.05.25.23290535

Abstract

Polygenic risk scores (PRS) have improved in predictive performance supporting their use in clinical practice. Reduced predictive performance of PRS in diverse populations can exacerbate existing health disparities. The NHGRI-funded eMERGE Network is returning a PRS-based genome-informed risk assessment to 25,000 diverse adults and children. We assessed PRS performance, medical actionability, and potential clinical utility for 23 conditions. Standardized metrics were considered in the selection process with additional consideration given to strength of evidence in African and Hispanic populations. Ten conditions were selected with a range of high-risk thresholds: atrial fibrillation, breast cancer, chronic kidney disease, coronary heart disease, hypercholesterolemia, prostate cancer, asthma, type 1 diabetes, obesity, and type 2 diabetes. We developed a pipeline for clinical PRS implementation, used genetic ancestry to calibrate PRS mean and variance, created a framework for regulatory compliance, and developed a PRS clinical report. eMERGE’s experience informs the infrastructure needed to implement PRS-based implementation in diverse clinical settings.

Introduction

Polygenic risk scores (PRS) are being calculated and disseminated at a prodigious rate[1], but their development and application to clinical care, particularly among ancestrally diverse individuals, present substantial challenges[2–4]. Incorporation of genomic risk information has the potential to improve risk estimation and management [3,5], particularly at younger ages [6]. Clinical use of PRS may ultimately prevent disease or enable its detection at earlier, more treatable stages [6–8]. Improved estimation of risk may also enable targeting of preventive or therapeutic interventions to those most likely to benefit from them while avoiding unnecessary testing or over-treatment [9,10].

PRS for individual conditions are typically generated from summary statistics derived from genome-wide association studies (GWAS), which are themselves derived from populations that are heavily over-represented by individuals of European ancestry [11]. Such scores have been shown to have limited prediction accuracy with increasing genetic distance from European populations [11,12]. PRS can be improved if developed and validated using multi-ancestry cohorts [13]. Clinical and environmental data combined with genomically-derived risk measurements can improve risk prediction [14]. Approaches for combining genomic and non-genomic information, optimizing models for genomically diverse populations and across age groups, and conveying this information to clinicians and patients have yet to be developed and applied in clinical care.

The Electronic Medical Records and Genomics (eMERGE) Network is a multicenter consortium established in 2007 to conduct genomic research in biobanks with electronic medical records [15,16]. In 2020, eMERGE embarked on a study of genomic risk assessment and management in 5,000 children and 20,000 adults of diverse ancestry, beginning with efforts to identify and validate published PRS across multiple race-ethnic groups (and inferred genetic ancestries) in 10 common diseases with complex genetic etiologies. This paper describes identification, selection, and optimization of these PRS; calibration of ancestry for PRS estimation using a novel method developed for eMERGE; and development and launch of clinical reporting tools.

Results

PRS Auditing and Evaluation

To select the PRS for clinical implementation, the network conducted a multi-stage process to evaluate proposed scores (Figure 1). An initial set of 23 conditions was selected based on considerations including relevance to population health (condition prevalence and heritability), strength of evidence for PRS performance, clinical expertise in the eMERGE Network, and data availability that would facilitate validation of the PRS in diverse populations. Network sites completed comprehensive literature reviews on 23 proposed conditions and the corresponding PRS. A summary of the features of the PRS for each of the final conditions chosen is shown in Supplemental Table 1. The collated information included analytic viability - a description of covariates, the age, and ancestry effects of the original PRS model; feasibility - access to sufficiently diverse validation data sets (race/ethnicity and age) as well as condition prevalence and relevance to preventative care; potential clinical actionability - existing screening or treatment strategies, and magnitude (odds ratio) of risk in the high risk group; and translatability - expected public health impact across diverse populations. Candidate PRS were restricted to those that were either previously validated and published (journal or pre-print) or for which there was sufficient access to information to develop and/or optimize new PRS, which could then be validated.

Figure 1

Timeline and process for selection, evaluation, optimization, transfer, validation, and implementation of the clinical PRS test pipeline. Dotted lines represent pivotal moments in the progression of the project.

Figure 2

Overview of the eMERGE PRS process. Participant DNA is genotyped using the Illumina Global Diversity Array which assesses 1.8M sites. Genotyping data is phased and imputed with a reference panel derived from the 1000 Genomes Project. Raw PRS scores are calculated for each condition. For each condition an ancestry calibration model is applied based on model parameters derived from the All of Us Research Program. Participants whose adjusted scores cross the pre-defined threshold for high PRS are identified and a pdf report is generated. The report is electronically signed after data review by a clinical laboratory director and delivered to the study portal for return to the clinical sites.

In auditing and evaluating evidence of PRS performance, the eMERGE Steering Committee (SC) considered PRS for conditions that could be implemented in pediatric and/or adult populations, and for diseases with a range of age of onset (0 to >65 years of age). We considered published SNP-based heritability estimates available for 10 of the 23 conditions, ranging from 3% to 58%. The majority of PRS under consideration aimed to identify individuals at high risk for disease; however, PRS to predict disease severity and drug response were also considered. Two of the conditions, breast cancer and prostate cancer, were only considered for implementation in individuals whose biological sex was female or male, respectively. As the eMERGE network plans to enroll >50% participants from underrepresented groups (including racial and ethnic minority groups; people with lower socioeconomic status (SES); underserved rural communities; sexual and gender minority (SGM) groups) [17] emphasis was placed on the PRS that were already available for, or could be developed and validated in, diverse population groups.

To define population groups, study-level population descriptors were first extracted from published literature, pre-prints or information shared directly by collaborators on data used to develop and/or optimize and/or validate PRS. Methods for using population labels across studies ranged from self-reporting, extraction from health system data, and/or analysis of genetic ancestry. We designated four population groups; European ancestry (EA) (i.e. study population descriptors included European, European-American, or other European descent diaspora groups), African (African, African-American (AA), or other African descent diaspora groups), Hispanic (HL) (i.e. Hispanic, Latina/o/x, or those who have origins in countries in the Caribbean and Latin America), and Asian (Asn) (i.e. South Asian, East Asian, South-East Asian, Asian-American or other diaspora Asian groups).

Of the 23 conditions initially selected, six were excluded at the outset (August 2020) due to lack of diversity in the PRS training or optimization data, lack of access to diverse datasets for validation, or lack of available clinical expertise in the network (Figure 1). A further four conditions were dropped in March 2021 due to insufficient confidence in PRS performance or lack of validation datasets. Conditions not prioritized for implementation continued on a ‘developmental’ pathway for further refinement. Each of the 12 conditions that were selected to move forward from the March 2021 review were assigned a ‘lead’ and ‘co-lead’ site which worked together to develop, validate, and transfer the score to the clinical laboratory for instantiation and CLIA validation. Assignment of leads was based on site preference, expertise, and distribution of workload.

Selection, optimization, and validation

A systematic framework was developed to evaluate the performance for the remaining 12 PRS, in accordance with best practices outlined in Wand et al [18]. An in depth evaluation matrix of the 12 chosen conditions can be found in Supplemental Table 2. Clinical use of Eurocentric PRS in diverse patient samples risks exacerbating existing health disparities[11][19,20]. The Network carefully considered a variety of strategies to optimize PRS generalizability and portability. The Network prioritized validation across four ancestries with an emphasis on African and Hispanic ancestry due to their underrepresentation in genetic research and projected representation within the study cohort. We determined that a PRS was validated if the genomic predictor was significantly discriminative and the odds ratios were statistically significant in a minimum of two and up to four ancestral populations: African/African-American, Asian, European Ancestry; and Hispanic/Latino. The PRS Working Group members conducted an extensive scoping exercise to identify suitable datasets of multiple ancestries for disease-specific PRS validation. These included datasets from early phases of eMERGE (2007-2019) as well as external datasets such as the UK Biobank and Million Veteran Program (MVP). A standardized set of questions were addressed by the disease leads that included the source of discovery and validation datasets, the availability of multi-ancestry validation datasets, availability of cross-ancestry PRS, proposed percentile thresholds for identifying high risk status, model discrimination (AUC), and effect sizes (odds ratios) associated with high risk vs. not-high risk status (Supplemental Table 2). For 7 out of the 12 candidate scores, no further optimization of the original model was performed. For 5 scores, an additional optimization effort was undertaken to further refine the score performance in multiple ancestries. Details of the optimization can be found in Supplemental Table 3. A specific score optimization was applied for CKD. This optimization consisted of adding the effect of APOL1 risk genotypes to a polygenic component, which has been found to improve risk predictions in African ancestry cohorts[21].

For final selection, the steering committee considered the score performance summaries (presented by condition leads) in addition to the actionable and measurable recommendations relevant for return, for each condition, in the prospective cohort. Two conditions (colorectal cancer (CRC) and abdominal aortic aneurysm (AAA)) were moved to the developmental pathway (Figure 1). While the PRS for CRC was not included in the prospective cohort, as several Mendelian genes for Lynch syndrome were included in the custom panel generated by Invitae, the network decided to include CRC in the list of conditions returned with the overall risk report (GIRA) and only report on monogenic and family history risk.

Population-based z-score calibration

In this study, the focus is on integration and implementation of validated PRS in clinical practice rather than novel PRS development. Ultimately, the Network opted to balance generalizability and feasibility by validating and returning cross-ancestry PRS. However, even with cross-ancestry scores, differences remain in the distribution of z-scores across genetic ancestries that can result in inconsistent categorization of individuals into ‘high’ or ‘not high’ polygenic risk categories for a given condition [22]. To that end, the Network chose to develop methods to determine each participant’s ancestry and calibrate the distribution of resulting z-scores through a population-based calibration model[22][23] (see below). An alternative would have been to apply existing PRS in available samples of different ancestries and derive ancestry-specific effect estimates. However, returning ancestry-specific risk estimates is challenging in real world implementations as it would require self-reporting of ancestry by patients (who may not be able to provide this with accuracy) and developing multiple ancestry-specific reports for each health condition. In addition, such PRS would be problematic to return to patients of mixed ancestry.

Polygenic risk scores often have different mean and standard deviation for individuals from different genetic ancestries. While some of these differences could be due to true biological differences in risk, they also result from allele frequency and linkage disequilibrium (LD) structure differences between populations [24]. This problem is more acute when a PRS is calculated for an individual whose ancestry does not match the ancestries used to develop the PRS. A clinically implemented PRS test to return disease risk estimates, therefore, must be adjusted to account for these differences due to ancestral background. A calibration method based on principal component analysis (PCA) which was initially described by Khera et al. [22] was modified to model both the variance and means of scores as ancestry dependent, as compared to the previous method, which modeled only the means as dependent on ancestry. This modification was found to be necessary because some conditions were found to exhibit highly ancestry-dependent variance (see for example, Figure 3 in Online Methods), which would have led to many more or fewer participants of certain ancestries receiving a ‘high PRS risk’ determination than was intended. The model was fit to a portion of the All of Us Research Program (https://www.researchallofus.org/) cohort genotyping data, which allowed for continuous return of results to participants without needing to wait for the entire study dataset to be available. More details can be found in Online Methods.

Figure 3.

Summary of the ten conditions that were implemented. “High-PRS Threshold” represents the percentile that is deemed to be the cut-off for a specific condition above which a high PRS result is reported for that condition. The Odds Ratios are the OR of the implemented scores, 95% confidence interval shown in the whiskers (with the exception of Obesity for which the OR will be published by the GIANT consortium). “Number of SNPs” represents the range of numbers or sites included in each score. “Age ranges for return” indicates the participant ages at which a PRS is calculated for a given condition. AFIB= Atrial fibrillation; BC = Breast Cancer; CKD = Chronic Kidney Disease; CHD = Coronary Heart Disease; HC = Hypercholesterolemia; PC = Prostate Cancer; T2D = Type 2 Diabetes; T1D = Type 1 Diabetes.

Transfer and Implementation

Once the final 10 conditions had been selected, condition-leads worked with computational scientists at the clinical laboratory (Clinical Research Sequencing Platform, LLC at the Broad Institute) to transfer the PRS models. Condition-specific models were run with outputs from the lab’s genotyping (Illumina Global Diversity Array), Phasing (Eagle2 [25] https://github.com/poruloh/Eagle), and imputation (Minimac4 [26] https://genome.sph.umich.edu/wiki/Minimac4) pipelines to assess genomic site representation (see Online Methods for more information on the architecture and components of the pipeline). Several rounds of iteration between the clinical laboratory and condition-leads followed in which any issues with the pipeline were resolved and the effect of genomic site missingness was assessed (Table 1). The final version of the implemented models was returned to the condition leads to recalculate effect sizes in the validation cohorts.

View this table:

Table 1.

Performance measures from the PRS pipeline validation study at the clinical laboratory.

Finally, as part of the implementation of the PRS pipelines as a clinical test in a CLIA laboratory, a validation study was performed (See Online Methods for a detailed description, Table 1 summarizes some of the results). Briefly, this study leveraged 70 reference cell lines from diverse ancestry groups (Coriell) where 30X whole genome sequencing data was generated to form a variant truth set from which the technical accuracy and reproducibility of imputation and PRS calling was assessed. A second sample set of 20 matched donor blood and saliva specimens was procured to assess the performance of the pipeline with different input materials. A set of three samples, each with 6 replicates, was run end-to-end through the wet lab and analytical pipelines as an assessment of reproducibility. As a verification of the clinical validity of the scores, cohorts of cases for 8 of the 10 conditions were created using the eMERGE phase III imputed dataset (available on https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_eMERGE_GWAS/data (registration required)). PRS performance measures were calculated to confirm associations between scores and conditions. Due to limitations in the eMERGE phase III imputation (no chromosome X, different imputation pipeline) the ORs from this analysis were not included in the final reports, rather the ORs calculated in the condition-specific validation cohorts (using the final clinical lab pipeline) were used (Figure 3 and Table 1). A validation report was created for each condition. This report was reviewed and approved by the Laboratory Director in compliance with CLIA regulations for the development of a laboratory developed test. Personnel were trained on laboratory and analytical procedures, and standard operating procedures were implemented. Data review metrics were established, sample pass/fail criteria were defined, and order and report data transfer pipelines were built as described in Linder et al. [27]

Creation of report and pipeline for report creation, review, sign-out, release

A software pipeline was built to facilitate data review and clinical report generation in both document (pdf) and structured data formats (sample report included in Supplementary Material). Logic was built into the PRS and reporting pipeline to account for differences in return based on age and sex at birth for certain conditions. For instance, the PRS for breast cancer is only calculated for participants who report sex at birth as female; similarly prostate cancer scores are only generated for participants who report sex at birth as male. Age-related restrictions were similarly coded into the pipeline to account for study policies on return. Data review by an appropriately qualified, trained individual is required for high complexity clinical testing. In the PRS clinical pipeline this review takes the form of a set of metrics that are exposed by the pipeline to the reviewer. These include a z-score range for each condition (passing samples will have a score -5 < z < +5), a PCA plot per batch against a reference sample set (visual representation of outlier samples), monitoring the z-score range for each control per condition (one control on each plate; NA12878), and flagging any samples with multiple ‘High Risk’ results for further review.

Each participant’s sample is also run on an orthogonal fingerprinting assay (Fluidigm biomark) that creates a genotype-based fingerprint for that DNA aliquot. Infinium genotyping data is compared to this fingerprint as a primary check of sample chain-of-custody fidelity and to preclude sample or plate swaps during lab processing.

Reviewed and approved data for a participant is processed into a clinical report. The text and format of this report were created during an iterative review process by consortium work groups. For this pragmatic clinical implementation study, two results are returned to participants: “High Risk” or “Not High Risk” based on the PRS [27]. In the clinical report a qualitative framework has been developed to indicate for which condition(s) a participant has been determined to have a high PRS (if any). Quantitative values (Z-scores) are not included for any condition in the main results panel. For breast cancer and CHD, the z-score is presented in another section of the report for inclusion in integrated score models for those conditions. For breast cancer specifically, the provided z-score is used with the BOADICEA[28] model to generate an integrated risk that is included in the genome-informed risk assessment (GIRA) as described in Linder et al.[27]

Overview of first 2500 clinical samples processed

Between launch in July 2022 and May 2023, 2500 participants have been processed through the clinical PRS pipeline (representing ∼10% of the proposed cohort). Of the first 2500 participants processed, 64.5% (1612) indicated sex at birth as female, while 35.5% (886) indicated male. Median age at sample collection was 51 years (range 3 years to 75 years). Participants self-reported race/ancestry, with 32.8% (820) identifying as “White (e.g. English, European, French, German, Irish, Italian, Polish, etc)”; 32.8% (820) identified as “Black, African American, or African (e.g. African American, Ethiopian, Haitian, Jamaican, Nigerian, Somali, etc.)”; 25.4% (636) identified as “Hispanic, Latino, or Spanish (e.g. Colombian, Cuban, Dominican, Mexican or Mexican American, Puerto Rican, Salvadoran, etc.)”; 5% (124) identified as “Asian (e.g. Asian, Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, etc.)”; 1.5% (38) identified as American Indian or Alaska Native (e.g. Aztec, Blackfeet Tribe, Mayan, Navajo Nation, Native Village of Barrow (Utqiagvik) Inupiat Traditional Government, Nome Eskimo Community, etc.); 0.9% (22) identified as Middle Eastern or North African (e.g. Algerian, Egyptian, Iranian, Lebanese, Moroccan, Syrian, etc.); 0.8% (21) selected “None of these fully describe [me_or_my_child]”; 0.7% (17) selected “Prefer not to answer”; 0.1% (2) participants had incomplete data. A summary of the performance of the first 2500 samples and resulting high PRS metrics are shown in Figure 4. In the first 2500 participants, we identified 515 participants (20.6%) with a high PRS risk for one of the 10 conditions, 64 participants (2.6%) had high PRS risk for two conditions, and two participants (0.08%) had a high risk for three conditions. The remaining 1919 participants had no high PRS found. High PRS participants spanned the spectrum of genetic ancestry when projected onto principal component space (Figure 4).

Figure 4.

Summary of first 2500 clinical samples

Discussion

While the predictive performance of PRS has improved significantly in recent years, challenges remain in ensuring that PRS are applicable and effective in diverse populations. In particular, the vast majority of GWAS have focused on individuals of European ancestry, and the predictive accuracy of PRS declines with increasing genetic distance from the discovery population[29][24][4]. This risks exacerbating existing health disparities, as clinical use of Eurocentric PRS in diverse patient samples may not accurately reflect disease risk in non-European populations. To address these challenges, the eMERGE Network has conducted a multi-stage process to evaluate and optimize PRS selection, development, and validation. The network has prioritized conditions with high prevalence and heritability, existing literature, clinical actionability, and the potential for health disparities, and has developed strategies to optimize PRS generalizability and portability across diverse populations. In particular, the network has emphasized performance across four major ancestry groups (African, Asian,

European, Hispanic, as reflected by self-identified race/ethnicity) and has developed a pipeline for clinical PRS implementation, a framework for regulatory compliance, and a PRS clinical report. The potential impact of PRS-based risk assessment in clinical practice is significant. By enabling targeted interventions and preventative measures, PRS-based risk assessment has the potential to reduce the burden of a range of conditions [27]. Moreover, the development of PRS-based risk assessment in diverse populations has the potential to reduce health disparities by ensuring that clinical use of PRS accurately reflects disease risk in diverse populations.

However, challenges remain in the successful implementation of PRS-based risk assessment in clinical practice. These include concerns about genetic determinism, the potential for stigmatization, and the need for robust regulatory frameworks to ensure that PRS-based risk assessment is deployed safely and effectively. Additionally, one of the biggest challenges is the implementation of effective disease prevention strategies after the return of the results. Return of the results won’t result in a benefit without effective disease prevention or early detection strategies. The eMERGE Network’s work provides a promising blueprint for addressing these challenges, but ongoing research and evaluation will be necessary to ensure that PRS-based risk assessment is implemented in a responsible and effective manner.

In conclusion, the eMERGE Network’s work in PRS development represents a significant step forward in the implementation of PRS-based risk assessment (in combination with other risk estimates from monogenic testing and family history) in clinical practice. By leveraging the power of genetics to predict disease risk and enable targeted interventions, genetically-informed risk assessment has the potential to revolutionize personalized medicine and usher in a new era of precision health. While challenges remain in ensuring that PRS are applicable and effective in diverse populations, the eMERGE Network’s work provides a promising foundation for the continued development and evaluation of PRS-based risk assessment in clinical practice.

Funding

This phase of the eMERGE Network was initiated and funded by the NHGRI through the following grants: U01HG011172 (Cincinnati Children’s Hospital Medical Center); U01HG011175 (Children’s Hospital of Philadelphia); U01HG008680 (Columbia University); U01HG011176 (Icahn School of Medicine at Mount Sinai); U01HG008685 (Mass General Brigham); U01HG006379 (Mayo Clinic); U01HG011169 (Northwestern University); U01HG011167 (University of Alabama at Birmingham); U01HG008657 (University of Washington); U01HG011181 (Vanderbilt University Medical Center); U01HG011166 (Vanderbilt University Medical Center serving as the Coordinating Center). The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA#: AOD 16037; Federally Qualified Health Centers: 75N98019F01202.; Data and Research Center: 1 OT2 OD35404; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 OT2 OD030043; Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276.

Data Availability

Data produced are available online for registered users at https://anvilproject.org/

Author Information

These Authors contributed equally Niall Lennon, Leah C.Kottyan

Authors and Affiliations

Broad Institute of MIT and Harvard

Marissa Fisher, Candace Guiducci, Maegan Harden, Chris Kachulis, Amit Khera,Katie Larkin, Niall J. Lennon, Edyta Malolepsza, Li McCarthy, Nihal Pai, Heidi Rehm, Diana Toledo

Children’s Hospital Boston

Joel Hirschhorn

Children’s Hospital of Philadelphia

John J. Connolly, Joseph T. Glessner, Hakon Hakonarson, Margaret Harr

Cincinnati Children’s Hospital Medical Center

Leah C. Kottyan, Lisa J. Martin, Tesfaye B. Mersha, Bahram Namjou, Cynthia A. Prows

Columbia University

Wendy Chung, Atlas Khan, Krzysztof Kiryluk, Chunhua Weng

Fred Hutchinson Cancer Center and University of Washington

Li Hsu, Ulrike Peters

Icahn School Mount Sinai

Gillian Belbin, Clive Hoggart, Roelof Smit, Zhe Wang

Invitae

Tara Schmidlen

Mass General Brigham

Tian Ge, Elizabeth W. Karlson, Elizabeth W. Karlson, James B Meigs, Jordan W. Smoller

Mayo Clinic

Ozan Dikilitas, Robert Freimuth, Iftikhar Kullo, Dan Schaid, Johanna Smith

National Institutes of Health

Josh Arias, Sonja Berndt, Teri Manolio, Robb Rowley

Northwestern University

Adam Gordon, Yuan Luo, Lorenzo L. Pesce, Megan J. Puckelwartz, Konrad Teodor Sawicki, Theresa Walunas

University of Alabama at Birmingham

James J. Cimino, Ryan Irvin, Nita Limdi, Hemant Tiwari

University of Copenhagen

Ruth Loos

University of Michigan

Yin Xianyong

University of Washington

David Crosslin, Minta Thomas, David Veenstra

University of Washington Medical Center

Gail P. Jarvik, Elisabeth A. Rosenthal

Vanderbilt University Medical Center

Jennifer E Below, Ellen Wright Clayton, Digna R. Velez Edwards, QiPing Feng, Jodell E. Linder, Jonathan Mosley, Josh Peterson, Dan Roden, Nataraja Sarma Vaitinadin, Wei-Qi Wei, Georgia Wiesner

The GIANT Consortium

The All of Us Research Program

Conflict of Interest

The authors have no conflicts of interest to declare.

Online Methods/Supplemental Information Clinical PRS Paper

A. Analytical and Technical Validation Studies

Broad Imputation Pipeline Overview

An imputation pipeline that takes as an input a variant call format (VCF) file generated from a genotyping microarray and imputes the genotypes at additional sites across the genome was developed. The pipeline architecture and composition was based on the widely used University of Michigan Imputation Server which uses a software called Eagle (https://github.com/poruloh/Eagle) for phasing and Minimac4 (https://genome.sph.umich.edu/wiki/Minimac4) for the imputation. The pipeline uses a curated version of the 1000 Genomes Project (1KG, www.internationalgenome.org) as the reference panel. Additional details on the imputation pipeline can be found at https://broadinstitute.github.io/warp/docs/Pipelines/Imputation_Pipeline/README.

Broad Curated 1KG Reference Panel

During the validation process we determined that some sites in the 1KG reference panel were incorrectly genotyped compared to the sites in matching whole-genome sequencing data. In order to increase accuracy of the imputation and PRS scoring, we curated the original panel by removing sites that were likely incorrectly genotyped based on comparing allele frequencies to those reported in gnomAD v2 (https://gnomad.broadinstitute.org/). Documentation of this curation can be found at: https://broadinstitute.github.io/warp/docs/Pipelines/Imputation_Pipeline/references_overview and a publicly available version of the panel at: gs://broad-gotc-test-storage/imputation/1000G_reference_panel/

Selection of a reference panel for imputation as an input to PRS is an important consideration. Some reference panels (e.g. TOPMed) have more samples than the default used in our pipeline (i.e. 1KG). This leads to more variants being imputed. The question is whether this would materially change the PRS calculated from samples imputed with the TOPMED panel. Access to this panel computationally is restricted (and local download prohibited) so it was deemed infeasible to implement in our clinical production environment. The performance of a non-eMERGE PRS (for CHD, Khera et al.) using the two different reference panels was determined for 20 GDA saliva specimens and for 42 AoU v1 specimens. The cohort was imputed both by the Broad imputation pipeline with curated 1KG as the reference panel as well as on the TOPMed Imputation Server with TOPMed as the reference panel. Imputed arrays were scored by the PRS pipeline.

The PRS percentiles computed with each method are highly concordant for both cohorts. The Pearson correlation coefficient is 0.996 for both cohorts, the p-value of the Welch two sample t-test is equal to 0.93 and 0.85 (indicating no statistical difference between the methods) for GDA and AoU v1 cohorts, respectively.

Performance verification of the Imputation Pipeline

Imputation accuracy was determined for 42 specimens that were processed through a genotyping microarray (AoU v1 array - the precursor to the commercial Global Diversity Array) and imputed with curated 1KG as the reference panel where corresponding deep-coverage (>30X) PCR-free whole genome sequencing data were used as a truth call set to calculate sensitivity and specificity. The arrays were also imputed on the Michigan Imputation Server with 1KG as the reference panel.

Within the cohort, four different ancestries were represented - Non-Finnish Europeans (NFE), East Asians (EAS), South Asian (SAS), African (AFR), together with the results for three arrays of undetermined ethnicity (NA).

View this table:

Table 1. Sensitivity and specificity of SNP and INDEL imputation from the Broad Imputation Pipeline (Broad) and Michigan Imputation Pipeline (Michigan) with the curated 1000Genomes reference panel when compared to matched whole genome data for 42 AoU v1 samples.

Broad imputation pipeline sensitivity for SNPs is >97% and INDELs >95% for all ethnicities. Similarly, specificity for SNPs from the Broad imputation pipeline are above 99% and the specificity for INDELS is >98%. Results were highly concordant with those returned by the remote server at Michigan.

Performance evaluation of different input material types

To assess the performance of specimens derived from both saliva and whole blood a set of 20 matched blood saliva pairs were run through the GDA genotyping process and the resulting VCFs were imputed using the Broad pipeline to be compared against results for matched blood derived whole genome data. The Pearson correlation between sensitivity and specificity of blood and saliva derived samples are equal to 100% and 100%, respectively. For the same pairs, the Welch two-sample t-test statistic is 0.997 and 0.987, respectively. There is no significant difference between the different input sample types.

Imputation repeatability and reproducibility

Imputation pipeline repeatability was assessed by repeating imputation of a cohort of 1000 GSA arrays ten times over the course of two weeks and was found to be 100% concordant. Imputation pipeline precision (reproducibility) was also tested on technical replicates. Three individual samples derived from saliva were each genotyped six times, followed by an imputation in a cohort of all saliva derived samples. In each set of technical replicates all pairs and variants in each pair were compared (making a total of 45 pairs for which genotypes were compared). Reproducibility is measured using Jaccard scores. “Reproducibility over variants” was calculated only over sites where at least one of the two replicates in a pair calls a non hom-ref genotype and was found to be 99.91% (95CI: 99.89-99.93) for SNPs and 99.87% (95CI: 99.85-99.90) for InDels. “Reproducibility over all sites” was calculated over all genotyped sites, including sites genotyped as hom-ref in both replicates and was found to be 100% (95CI: 100-100) for both SNPs and InDels.

Imputation performance as a function of variant frequency

Because we expect accuracy to be impacted by the frequency of a variant in the population (rare variants are less likely to be in the reference panel and therefore less accurately imputed) we further subdivided the performance assessment by allele frequencies on two cohorts: 42 AoU v1 arrays and 20 blood-saliva pairs of GDA arrays. Accuracy of imputation of variants as a function of population allele frequency performed as expected with rare variants in the population not being as accurately represented. Imputation is more accurate for variants that are more frequently observed in the population (≥0.1 allele frequency). This is predicted to have a low impact on the accuracy of PRS calculations from imputed variants as PRS scores are also typically derived from common variants.

Impact of genotyping array call rate on imputation performance

The impact of call rate on the imputation was assessed by generating a downsampled series of 42 arrays, each with call rates of 90%, 95%, 97% and 98%. Pearson correlation values for SNPs and INDELs were calculated across bins of allele frequencies, assessed against gnomAD common variants (AF >0.1), for the cohorts with downsampled call rates. Call rates below 95% were found to produce suboptimal results. At this rate the mean R² dosage score for sites with AF ≥0.1 was found to be 0.98% (95CI: 0.98-0.98) for both SNPs and InDels compared to 0.99% for call rates of 97% and 98%.

Impact of imputation batch size on performance

Batch size effect of the imputation pipeline was assessed by imputing and analyzing arrays in a cohort of size 1000 (randomly chosen), ten cohorts of size 100 (non-overlapping subsets of the 1000 cohort), and ten cohorts of size 10 (non-overlapping subsets of one of the 100 cohorts). Pearson correlations of dosage scores were calculated across bins for allele frequencies (assessed against gnomAD) for smaller cohorts versus larger cohorts. The data show that imputation is highly correlated across batch sizes with batches down to as few as 10 samples producing acceptable performance. The mean R² correlation of dosage scores for sites with allele frequency greater or equal to 0.1 is above 0.97 in all cases both for SNPs and INDELs and increases to 0.98 for the larger studied cohorts. Increasing batch sizes produces very slight improvements in imputation but these are not significant and the choice of imputation batch size (above or equal to 10 samples) can be made on practical and operational grounds

Broad PRS Pipeline Overview

The polygenic risk score (PRS) pipeline begins by calculating a raw score using plink2 (https://www.cog-genomics.org/plink/2.0/). For each condition, effect alleles and weights are defined for a set of genomic sites stored in a weights file. At each site, the effect allele dosage observed in the imputed vcf is multiplied by the effect weight in the weights file. The raw score is the sum of these products over all the specified sites.

Validation of technical and analytical performance of the PRS pipeline

For each of the 10 conditions chosen by the consortium for clinical return, a validation study was performed to assess the technical and analytical performance as well as to verify the association between score and disease risk.

PRS Pipeline Accuracy

Accuracy of the pipeline was determined by calculating the Pearson correlation between PRS scores calculated from 70 specimens imputed from GDA array data and PRS scores of corresponding deep-coverage PCR-free whole genome sequencing data (used as a truth call set).

Input Material Performance

Accuracy of PRS scoring when different sample types (blood or saliva) are used as inputs was determined by comparing the PRS scores from matched blood and saliva pairs collected from 20 individuals.

PRS pipeline repeatability

PRS pipeline repeatability was assessed by running the pipeline on the same dataset of 70 imputed GDA arrays ten times over the course of two weeks (without call caching). Scores generated from the different processing runs were compared to determine if there are any differences observed for a given PRS when the pipeline is run at different times.

PRS pipeline reproducibility

PRS pipeline precision (reproducibility) was assessed using three samples each run 6 times end-to-end and then compared in a pairwise manner. The z-score standard deviation is used as a measure of variability.

PRS site representation

The SNP weight sites that are not called during genotyping or imputation were determined. These are sites not present in the intersection of an imputed GDA array and the reference panel. Ideally, all sites required for PRS calculation are present either as genotyped or imputed sites; however, in practice a small number of sites are not present due to differences in the data used to create the score and the specific array and imputation reference panel used in this study.

View this table:

Table 2. Validation measures summary.

Performance verification using eMERGE I-III cohort

A cohort of samples with known phenotypic information was used to verify the relationship between polygenic risk score as determined by our pipeline and disease risk. For conditions where cases and controls could be identified in the eMERGE I-III cohort we determined performance using metrics outlined in the ClinGen working group recommendations (Wand et al.). Specifically, we determined the PRS distributions for cases and controls, we examined the impact of ancestry adjustment on the distributions (Fig), and we examined the relationship between observed and predicted risk. There are some limitations to this analysis: i) The eMERGE I-III dataset being used for this analysis was generated from different array platforms and was imputed with a different pipeline including a different version of 1KG reference panel than the one currently implemented; ii) The eMERGE I-II-III imputed dataset does not include variants from Chromosomes X or Y. For these reasons, the PRS disease association analysis represents a verification of the clinical validation performed by eMERGE IV condition leads rather than the quantitative measure of the impact of the score on risk. The clinical associations (odd ratios) that are reported on the clinical report for each condition were independently determined by eMERGE IV disease-specific expert teams.

Validation of pipeline and ancestry adjustment in original case control cohorts

The final pipeline was made available to computational scientists at each of the eMERGE IV disease-specific expert teams who had access to appropriate case control cohorts. These groups confirmed the performance of the final pipeline on their cohorts. The odds ratios for each condition that are reported on the clinical reports come from these cohorts rather than the eMERGE cohort for the reasons described above.

B. PRS Ancestry Calibration Overview

PCA method description

For a polygenic risk score which is a sum of SNP effects (linear weights), the central limit theorem states that the distribution of scores in a homogenous population will tend towards a normal distribution as the number of SNPs becomes large. When two different homogenous populations are randomly mixed, the additive property of prs leads the resulting distribution to be similarly normally distributed, with mean and variance depending on the mean and variance of the original homogenous populations. We can therefore model the distribution of prs scores as being normally distributed, with mean and variance being functions of genetic ancestry.

Practically, we implement this as

with genetic ancestry being represented by projection into principal component space. The α and β parameters are found by jointly fitting them to a cohort of training data. This fit is performed by minimizing the negative log likelihood:

where ι runs over the individuals in the training cohort, prs_j is the i’th individuals raw prs score, and µ and σ are calculated using Eq X by projecting the i’th individual into PC space. Note that, due to the simplicity of the model, overfitting is unlikely to be a problem, and so no regularization or other overfitting avoidance technique is implemented. An individual’s PRS z-score can then be calculated as

where µ and σ have again been calculated based on the specific individuals projection into PC space. In this way, once the model has been trained, the z-score calculation is fully defined by the fitted model parameters, and z-scores can be calculated without needing additional access to the original training cohort.

Generating trained models from All of Us data

Generating the trained models consisted of three steps: 1. Selecting the training cohort. 2. Imputation of the training cohort. 3. Training the models on the training cohort. A test cohort was also generated in order to test the performance of the training.

Ancestry balanced training and test cohorts were generated by subsampling from an initial cohort of around 100,000 All of Us samples. For the purposes of balancing the cohort, each sample was assigned to one of the five 1KG Super Populations. Principal component analysis was first performed on a random subset of 20,000 samples. 1KG samples were projected onto these principal components, and a support vector machine (SVM) was trained on 1KG to predict ancestry. The SVM was then used to assign 108,000 AoU samples to one of the five 1KG Super Populations. A balanced training cohort was selected based on these predicted ancestries, and principal components were recalculated using this balanced training cohort. A similarly balanced test cohort was selected based on ancestries estimated from projection on the training set PCs. The resulting breakdown of the cohorts by estimated ancestry is shown in Table 3.

View this table:

Table 3

Both the training and testing cohorts include a number of individuals with highly admixed ancestry. Admixture was quantified using the tool Admixture (Alexander et al. PMID: 19648217) with 5 ancestral populations. The resulting admixtures of each cohort are shown in Figure 1, and the most common admixed ancestries in each cohort are summarized in Table 4.

Each cohort was imputed using the imputation pipeline described above, with 1KG as the reference panel. By keeping the imputation pipeline identical to the pipeline used for the eMERGE dataset, and because the AoU dataset uses the same GDA array as the eMERGE dataset, any potential biases resulting from differing data production and processing methods were removed. The training cohort was scored for each of the ten conditions, and model parameters were fit by minimizing the negative log likelihood as described. The test cohort was then used to evaluate the generalizability of these model parameters.

Figure 1

View this table:

Table 4.

Admixed ancestries are defined as ancestries for which an individual’s admixture fraction is greater than 20%. For example, an individual who is indicated by admixture to be

45% AFR, 37% EUR, 12 % AMR, 5% EAS, 1% SAS would be included in the AFR-EUR row of this table.

Performance on Test Cohort

Figure 2 the distribution of calibrated z-scores in the test cohort using the parameters fit in the training cohort. As can be seen, all ancestries show the intended standard normal distribution of calibrated scores.

Figure 2

One of the main improvements of this method over previous methods is the inclusion of an ancestry dependent variance in addition to the ancestry dependent mean. The importance of this is shown for the Hypercholesterolemia PRS in Figure X below. As can be seen, the variance of this score differs significantly across ancestries, so that a method which only fits the mean of the distribution as ancestry dependent can result in z-score distributions which have been attenuated towards zero or expanded away from zero for some ancesties. By also treating variance as ancestry dependent, this method results in z-score distributions which are more standardized across ancestries.

Figure 3

Hypercholesterolemia PRS calibrated z-scores of training cohort. Note the improvement when an ancestry dependent variance is used over a constant variance method.

In addition to improving calibration across ancestries, this method can improve calibration within ancestries, particularly for highly admixed individuals. An example of this can be seen in Figure 4. As can be seen, because no ancestry group is homogenous, when individuals are compared directly to other individuals in their assigned population group, a dependence between admixture fraction and PRS score can result. This dependence is removed by the described PCA calibration method, and the resulting calibrated PRS scores are independent of admixture fraction.

Figure 4

PRS z-score as a function of African Admixture Fraction, for individuals of African ancestry. In the “Bucketing” method, a z-score is calculated by comparing to the mean and variance of all individuals of African ancestry in the cohort. The “PCA Calibrated” method is the method described above. Note the dependence on admixture fraction in the “Bucketing” method, which has been removed in the “PCA Calibrated” method

View this table:

Supplemental Table 1

View this table:

Supplemental Table 2

View this table:

Supplemental Table 3

Acknowledgements

The authors would like to thank the past and future participants of the eMERGE network projects. The authors would like to thank Mary O’Reilly for help with figure creation. We thank our All of Us Research Program colleagues, Andrea Ramirez, Sokny Lim, Brandy Mapes, Ashley Green and Anji Musick for providing their support and input throughout the ancestry calibration demonstration project lifecycle. We also thank the All of Us Science Committee and All of Us Steering Committee for their efforts evaluating and finalizing the approved demonstration projects. The All of Us Research Program would not be possible without the partnership of contributions made by its participants. To learn more about the All of Us Research Program’s research data repository, please visit https://www.researchallofus.org/.

References

1.↵
Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53: 420–425.
OpenUrl
2.↵
Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat Med. 2021;27: 1876–1884.
3.↵
Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19: 581–590.
OpenUrl CrossRef PubMed
4.↵
Duncan L, Shen H, Gelaye B, Meijsen J, Ressler K, Feldman M, et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun. 2019;10: 3328.
OpenUrl PubMed
5.↵
Hurson AN, Pal Choudhury P, Gao C, Hüsing A, Eriksson M, Shi M, et al. Prospective evaluation of a breast-cancer risk model integrating classical risk factors and polygenic risk in 15 cohorts from six countries. Int J Epidemiol. 2022;50: 1897–1911.
OpenUrl PubMed
6.↵
Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, et al. Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention. J Am Coll Cardiol. 2018;72: 1883–1893.
OpenUrl FREE Full Text
7.
Guo F, Edelmann D, Cardoso R, Chen X, Carr PR, Chang-Claude J, et al. Polygenic Risk Score for Defining Personalized Surveillance Intervals After Adenoma Detection and Removal at Colonoscopy. Clin Gastroenterol Hepatol. 2023;21: 210–219.e11.
OpenUrl
8.↵
Fantus RJ, Helfand BT. Germline Genetics of Prostate Cancer: Time to Incorporate Genetics into Early Detection Tools. Clin Chem. 2019;65: 74–79.
OpenUrl Abstract/FREE Full Text
9.↵
Pharoah PDP, Antoniou AC, Easton DF, Ponder BAJ. Polygenes, risk prediction, and targeted prevention of breast cancer. N Engl J Med. 2008;358: 2796–2803.
OpenUrl CrossRef PubMed Web of Science
10.↵
Willoughby A, Andreassen PR, Toland AE. Genetic Testing to Guide Risk-Stratified Screens for Breast Cancer. J Pers Med. 2019;9. doi:10.3390/jpm9010015
OpenUrl CrossRef
11.↵
Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51: 584–591.
OpenUrl CrossRef PubMed
12.↵
Ruan Y, Lin Y-F, Feng Y-CA, Chen C-Y, Lam M, Guo Z, et al. Improving polygenic prediction in ancestrally diverse populations. Nat Genet. 2022;54: 573–580.
OpenUrl
13.↵
Márquez-Luna C, Loh P-R, South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium, Price AL. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol. 2017;41: 811–823.
OpenUrl CrossRef PubMed
14.↵
Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, et al. Predictive Accuracy of a Polygenic Risk Score-Enhanced Prediction Model vs a Clinical Risk Score for Coronary Artery Disease. JAMA. 2020;323: 636–645.
OpenUrl CrossRef PubMed
15.↵
McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, et al. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4: 13.
16.↵
Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, et al. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med. 2013;15: 761–771.
OpenUrl CrossRef PubMed
17.↵
Overview. In: NIMHD [Internet]. [cited 20 Apr 2023]. Available: https://www.nimhd.nih.gov/about/overview/
18.↵
Wand H, Lambert SA, Tamburro C, Iacocca MA, O’Sullivan JW, Sillari C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591: 211–219.
OpenUrl CrossRef PubMed
19.↵
Mars N, Lindbohm JV, Della Briotta Parolo P, Widén E, Kaprio J, Palotie A, et al. Systematic comparison of family history and polygenic risk across 24 common diseases. Am J Hum Genet. 2022;109: 2152–2162.
OpenUrl
20.↵
Wang Z, Choi SW, Chami N, Boerwinkle E, Fornage M, Redline S, et al. The Value of Rare Genetic Variation in the Prediction of Common Obesity in European Ancestry Populations. Front Endocrinol. 2022;13: 863893.
OpenUrl
21.↵
Khan A, Turchin MC, Patki A, Srinivasasainagendra V, Shang N, Nadukuru R, et al. Genome-wide polygenic score to predict chronic kidney disease across ancestries. Nat Med. 2022;28: 1412–1420.
OpenUrl
22.↵
Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50: 1219–1224.
OpenUrl CrossRef PubMed
23.↵
Ge T, Irvin MR, Patki A, Srinivasasainagendra V, Lin Y-F, Tiwari HK, et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med. 2022;14: 70.
24.↵
Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am J Hum Genet. 2017;100: 635–649.
OpenUrl CrossRef PubMed
25.↵
Loh P-R, Danecek P, Palamara PF, Fuchsberger C, A Reshef Y, K Finucane H, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet. 2016;48: 1443–1448.
OpenUrl CrossRef PubMed
26.↵
Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44: 955–959.
OpenUrl CrossRef PubMed
27.↵
Linder JE, Allworth A, Bland ST, Caraballo PJ, Chisholm RL, Clayton EW, et al. Returning integrated genomic risk and clinical recommendations: The eMERGE study. Genet Med. 2023;25: 100006.
28.↵
Lee A, Mavaddat N, Wilcox AN, Cunningham AP, Carver T, Hartley S, et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet Med. 2019;21: 1708–1718.
OpenUrl CrossRef PubMed
29.↵
Ding Y, Hou K, Burch KS, Lapinska S, Privé F, Vilhjálmsson B, et al. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet. 2022;54: 30–39.
OpenUrl CrossRef

View the discussion thread.

Posted June 05, 2023.

Download PDF

Data/Code

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (386)
Allergy and Immunology (701)
Anesthesia (193)
Cardiovascular Medicine (2859)
Dentistry and Oral Medicine (326)
Dermatology (244)
Emergency Medicine (431)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1011)
Epidemiology (12569)
Forensic Medicine (10)
Gastroenterology (807)
Genetic and Genomic Medicine (4447)
Geriatric Medicine (402)
Health Economics (716)
Health Informatics (2856)
Health Policy (1050)
Health Systems and Quality Improvement (1050)
Hematology (376)
HIV/AIDS (893)
Infectious Diseases (except HIV/AIDS) (13986)
Intensive Care and Critical Care Medicine (831)
Medical Education (415)
Medical Ethics (114)
Nephrology (464)
Neurology (4201)
Nursing (223)
Nutrition (617)
Obstetrics and Gynecology (788)
Occupational and Environmental Health (723)
Oncology (2205)
Ophthalmology (626)
Orthopedics (254)
Otolaryngology (319)
Pain Medicine (269)
Palliative Medicine (83)
Pathology (488)
Pediatrics (1172)
Pharmacology and Therapeutics (489)
Primary Care Research (483)
Psychiatry and Clinical Psychology (3658)
Public and Global Health (6787)
Radiology and Imaging (1494)
Rehabilitation Medicine and Physical Therapy (869)
Respiratory Medicine (902)
Rheumatology (430)
Sexual and Reproductive Health (433)
Sports Medicine (369)
Surgery (473)
Toxicology (57)
Transplantation (202)
Urology (174)

[1] 1.↵
Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet. 2021;53: 420–425.
OpenUrl

[2] 2.↵
Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat Med. 2021;27: 1876–1884.

[3] 3.↵
Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018;19: 581–590.
OpenUrl CrossRef PubMed

[4] 4.↵
Duncan L, Shen H, Gelaye B, Meijsen J, Ressler K, Feldman M, et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat Commun. 2019;10: 3328.
OpenUrl PubMed

[5] 5.↵
Hurson AN, Pal Choudhury P, Gao C, Hüsing A, Eriksson M, Shi M, et al. Prospective evaluation of a breast-cancer risk model integrating classical risk factors and polygenic risk in 15 cohorts from six countries. Int J Epidemiol. 2022;50: 1897–1911.
OpenUrl PubMed

[6] 6.↵
Inouye M, Abraham G, Nelson CP, Wood AM, Sweeting MJ, Dudbridge F, et al. Genomic Risk Prediction of Coronary Artery Disease in 480,000 Adults: Implications for Primary Prevention. J Am Coll Cardiol. 2018;72: 1883–1893.
OpenUrl FREE Full Text

[7] 7.
Guo F, Edelmann D, Cardoso R, Chen X, Carr PR, Chang-Claude J, et al. Polygenic Risk Score for Defining Personalized Surveillance Intervals After Adenoma Detection and Removal at Colonoscopy. Clin Gastroenterol Hepatol. 2023;21: 210–219.e11.
OpenUrl

[8] 8.↵
Fantus RJ, Helfand BT. Germline Genetics of Prostate Cancer: Time to Incorporate Genetics into Early Detection Tools. Clin Chem. 2019;65: 74–79.
OpenUrl Abstract/FREE Full Text

[9] 9.↵
Pharoah PDP, Antoniou AC, Easton DF, Ponder BAJ. Polygenes, risk prediction, and targeted prevention of breast cancer. N Engl J Med. 2008;358: 2796–2803.
OpenUrl CrossRef PubMed Web of Science

[10] 10.↵
Willoughby A, Andreassen PR, Toland AE. Genetic Testing to Guide Risk-Stratified Screens for Breast Cancer. J Pers Med. 2019;9. doi:10.3390/jpm9010015
OpenUrl CrossRef

[11] 11.↵
Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51: 584–591.
OpenUrl CrossRef PubMed

[12] 12.↵
Ruan Y, Lin Y-F, Feng Y-CA, Chen C-Y, Lam M, Guo Z, et al. Improving polygenic prediction in ancestrally diverse populations. Nat Genet. 2022;54: 573–580.
OpenUrl

[13] 13.↵
Márquez-Luna C, Loh P-R, South Asian Type 2 Diabetes (SAT2D) Consortium, SIGMA Type 2 Diabetes Consortium, Price AL. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet Epidemiol. 2017;41: 811–823.
OpenUrl CrossRef PubMed

[14] 14.↵
Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, et al. Predictive Accuracy of a Polygenic Risk Score-Enhanced Prediction Model vs a Clinical Risk Score for Coronary Artery Disease. JAMA. 2020;323: 636–645.
OpenUrl CrossRef PubMed

[15] 15.↵
McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, et al. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4: 13.

[16] 16.↵
Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, et al. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med. 2013;15: 761–771.
OpenUrl CrossRef PubMed

[17] 17.↵
Overview. In: NIMHD [Internet]. [cited 20 Apr 2023]. Available: https://www.nimhd.nih.gov/about/overview/

[18] 18.↵
Wand H, Lambert SA, Tamburro C, Iacocca MA, O’Sullivan JW, Sillari C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591: 211–219.
OpenUrl CrossRef PubMed

[19] 19.↵
Mars N, Lindbohm JV, Della Briotta Parolo P, Widén E, Kaprio J, Palotie A, et al. Systematic comparison of family history and polygenic risk across 24 common diseases. Am J Hum Genet. 2022;109: 2152–2162.
OpenUrl

[20] 20.↵
Wang Z, Choi SW, Chami N, Boerwinkle E, Fornage M, Redline S, et al. The Value of Rare Genetic Variation in the Prediction of Common Obesity in European Ancestry Populations. Front Endocrinol. 2022;13: 863893.
OpenUrl

[21] 21.↵
Khan A, Turchin MC, Patki A, Srinivasasainagendra V, Shang N, Nadukuru R, et al. Genome-wide polygenic score to predict chronic kidney disease across ancestries. Nat Med. 2022;28: 1412–1420.
OpenUrl

[22] 22.↵
Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50: 1219–1224.
OpenUrl CrossRef PubMed

[23] 23.↵
Ge T, Irvin MR, Patki A, Srinivasasainagendra V, Lin Y-F, Tiwari HK, et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med. 2022;14: 70.

[24] 24.↵
Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, et al. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am J Hum Genet. 2017;100: 635–649.
OpenUrl CrossRef PubMed

[25] 25.↵
Loh P-R, Danecek P, Palamara PF, Fuchsberger C, A Reshef Y, K Finucane H, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet. 2016;48: 1443–1448.
OpenUrl CrossRef PubMed

[26] 26.↵
Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44: 955–959.
OpenUrl CrossRef PubMed

[27] 27.↵
Linder JE, Allworth A, Bland ST, Caraballo PJ, Chisholm RL, Clayton EW, et al. Returning integrated genomic risk and clinical recommendations: The eMERGE study. Genet Med. 2023;25: 100006.

[28] 28.↵
Lee A, Mavaddat N, Wilcox AN, Cunningham AP, Carver T, Hartley S, et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet Med. 2019;21: 1708–1718.
OpenUrl CrossRef PubMed

[29] 29.↵
Ding Y, Hou K, Burch KS, Lapinska S, Privé F, Vilhjálmsson B, et al. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet. 2022;54: 30–39.
OpenUrl CrossRef

Selection, optimization, and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse populations

Abstract

Introduction

Results

PRS Auditing and Evaluation

Selection, optimization, and validation

Population-based z-score calibration

Transfer and Implementation

Creation of report and pipeline for report creation, review, sign-out, release

Overview of first 2500 clinical samples processed

Discussion

Funding

Data Availability

Author Information

Authors and Affiliations

Conflict of Interest

Online Methods/Supplemental Information Clinical PRS Paper

A. Analytical and Technical Validation Studies

Broad Imputation Pipeline Overview

Broad Curated 1KG Reference Panel

Performance verification of the Imputation Pipeline

Performance evaluation of different input material types

Imputation repeatability and reproducibility

Imputation performance as a function of variant frequency

Impact of genotyping array call rate on imputation performance

Impact of imputation batch size on performance

Broad PRS Pipeline Overview

Validation of technical and analytical performance of the PRS pipeline

PRS Pipeline Accuracy

Input Material Performance

PRS pipeline repeatability

PRS pipeline reproducibility

PRS site representation

Performance verification using eMERGE I-III cohort

Validation of pipeline and ancestry adjustment in original case control cohorts

B. PRS Ancestry Calibration Overview

PCA method description

Generating trained models from All of Us data

Performance on Test Cohort

Acknowledgements

References

Citation Manager Formats

Subject Area