ABSTRACT
Prostate-specific antigen (PSA) screening for prostate cancer remains controversial because it increases overdiagnosis and overtreatment of clinically insignificant tumors. We evaluated the potential of genetic determinants of PSA levels to improve screening utility by accounting for variation in PSA not due to cancer. A multi-ancestry genome-wide meta-analysis of 95,768 men without prostate cancer discovered 128 PSA-associated index variants (P<5×10−8), including 82 novel signals. The resulting 128-variant polygenic score for PSA (PGSPSA) explained 7.3-8.8% of PSA variation in external validation cohorts. Adjusting PSA values using PGSPSA enabled more accurate diagnostic decisions, such as avoiding 17-20% of negative prostate biopsies. Genetically adjusted PSA was more predictive of aggressive prostate cancer (odds ratio (OR)=3.04, P=3.3×10−7; AUC=0.716) than unadjusted PSA (OR=2.80, P=3.2×10−6; AUC=0.684) and improved detection of aggressive disease when combined with a prostate cancer PGS (AUC: 0.732 vs. 0.645, P=3.3×10−4). We further showed that PSA-related selection bias distorts genetic associations with prostate cancer and hampers PGS performance. Our findings provide a roadmap towards personalizing cancer biomarkers and screening.
INTRODUCTION
Prostate-specific antigen (PSA) is a serine protease produced by the prostate gland and encoded by the kallikrein-3 (KLK3) gene1,2. Its primary function is to enable the release of motile sperm by degrading gel-forming seminal proteins1,3. PSA is secreted by normal prostate epithelial tissue, and when this basal layer becomes disrupted by a tumor, greater PSA concentrations are released into circulation1,2. PSA levels can also rise due to local prostatic inflammation or infection, benign prostatic hyperplasia, older age, and increased prostate volume2,4,5. There is an established inverse relationship between body mass index (BMI) and PSA levels, but it remains unclear whether it is due to decreased androgenic signaling in obese men or hemodilution6,7. Low PSA levels thus do not rule out prostate cancer and PSA elevation is not necessarily indicative of a tumor8.
PSA testing for prostate cancer detection has been used for over 20 years despite controversy surrounding its value. Some argue that PSA testing sufficiently reduces the burden of death from prostate cancer to warrant widespread implementation9. However, the long-term risk of lethal prostate cancer remains low, especially in men with PSA below the age-specific median10,11. This has led others to question whether the modest mortality benefits outweigh the costs of overdiagnosing and subsequently overtreating indolent disease12-14. Between 20 and 60% of prostate cancers detected using PSA testing are estimated to be overdiagnoses, although estimates vary by age group and definition of overdiagnosis15-17. As a result, non-lethal prostate cancers are often treated with therapies that can involve substantial side effects15,16. The two sides of the debate left the United States Preventive Services Task Force unable to give definitive advice regarding PSA screening for prostate cancer. Its Grade C recommendation indicates that the choice to undergo screening should be an individual one18. Clinical guidelines in Canada and the United Kingdom similarly advise against population-level screening19,20.
One avenue for refining the predictive value of PSA screening for prostate cancer detection is using a more personalized approach. Genetic factors account for over 40% of the variation in PSA levels21-23. Our group previously identified 40 independent loci in the largest genome-wide association study (GWAS) of PSA levels to date23. Accounting for variability in PSA due to underlying genetics would increase the cancer-related relative variation in PSA, thereby improving its predictive value for prostate cancer. An earlier study using just four PSA-associated variants showed that genetic correction of PSA reclassified 3% of participants to warranting biopsy and 3% to avoiding biopsy24. Incorporating additional genetic predictors of PSA variation therefore has potential to transform PSA testing toward reducing overdiagnosis-related morbidity and improving detection of lethal disease.
To maximize the utility of personalized PSA testing, it will be critical to distinguish variants associated with constitutive PSA levels from those that increase the risk of prostate cancer. Studies have identified many shared loci, including the KLK family of genes on chromosome 19q13.33 and pan-cancer susceptibility regions in 5p15.33, 8q24.21, and 10q26.123-27. Because individuals who are genetically predisposed to higher PSA levels are likely to be screened for prostate cancer more frequently, they are also more likely to receive a prostate cancer diagnosis. As a result, it is possible that GWAS of prostate cancer capture signals for both disease risk and benign PSA elevation.
We present findings from the Precision PSA study, an extensive exploration of the genetics of PSA levels. First, we develop a polygenic score for PSA variation based on a new large GWAS meta-analysis of PSA levels in men without prostate cancer. Following the external validation of this score, we demonstrate how genetic adjustment of PSA levels can improve clinical decision-making related to biopsy and detection of aggressive prostate cancer. In parallel, we provide evidence that PSA-related screening bias influences prostate cancer GWAS and that correction for this bias improves prediction of prostate cancer endpoints. Taken together, our work advances the understanding of the genetic architecture of PSA variation and provides a novel framework for the clinical translation of these findings.
RESULTS
The study design and analytic strategy of the Precision PSA study are illustrated in Figure 1. We meta-analyzed results from a previously published GWAS in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort23 with newly conducted GWAS in the UK Biobank (UKB), the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, BioVU from the Vanderbilt University Medical Center, and the Malmö Diet and Cancer Study (MDCS). All discovery analyses were conducted in individuals never having been diagnosed with prostate cancer who had PSA values ≤10 ng/mL (see Methods). A total of 95,768 individuals were included in the GWAS meta-analysis. Across all contributing studies, individuals of predominantly European ancestry comprised the largest subgroup (NEUR=85,824), followed by participants of African ancestry (NAFR=3,509), East Asian ancestry (NEAS=3,337), and Hispanic/Latino individuals (NHIS/LAT=3,098).
Genome-wide association analyses were first conducted separately in European ancestry (EUR), African ancestry (AFR), East Asian ancestry (EAS) and Hispanic/Latin American (HIS/LAT) men without prostate cancer. Each contributing study and corresponding sample size are listed. Results from the multi-ancestry meta-analysis of 95,768 men were used to develop a PSA genetic score, which was validated in two independent cohorts and used to genetically adjust PSA levels. Downstream analyses examined how genetically adjusted PSA levels influence prostate biopsy eligibility and evaluated associations with prostate cancer risk.
Genetic Architecture of PSA Variation
To assess sensitivity to underlying modeling assumptions, the heritability (h2) of PSA levels was investigated in multiple datasets using several methods (see Methods). In the UKB, h2 was estimated based on PSA values abstracted from clinical records for 26,491 men of predominantly European ancestry, 54.6% of whom had multiple PSA measurements. Median PSA across all available values was 2.35 ng/mL (Supplementary Figure 1). Longitudinal PSA measurements summarized by the median value for each individual had higher h2 than subject-specific random intercepts derived from linear mixed models (Supplementary Table 1). Using the median, estimated PSA heritability was h2=0.41 (95% CI: 0.36-0.46) based on GCTA28 and h2=0.30 (95% CI: 0.26-0.33) based on LDAK29,30 (using common (MAF≥0.01) variants with imputation INFO>0.80 for both). GCTA estimates were higher than LDAK estimates across all genetic relatedness matrix (GRM) configurations, but these differences attenuated when restricting to genotyped variants.
Distribution of PSA values and measurements in the European ancestry men from the UK Biobank cohort that were included in the GWAS.
Applying LDAK to GWAS summary statistics from the same 26,491 UKB subjects produced similar heritability estimates (h2=0.35, 0.28-0.43) to GRM results (Supplementary Table 1). Heritability estimates based on other GWAS-based methods were lower, ranging from h2=0.21 (0.15-0.26) using linkage disequilibrium (LD) score regression to h2=0.25 (0.21-0.30) using a related high-definition likelihood approach31. Since LDAK produced more consistent results in GRM and GWAS summary statistics-based analyses, it was applied to the European ancestry GWAS meta-analysis of PSA levels (NEUR=85,824), yielding h2=0.30 (95% CI: 0.29-0.31) (Figure 2). Sample sizes for non-European ancestry populations were too small to produce reliable heritability estimates.
Panel A) compares UK Biobank heritability (h2) estimates from GCTA and Linkage Disequilibrium Adjusted Kinships (LDAK) Thin models applied to a genetic relatedness matrix (GRM) of common (MAF≥0.01) LD-pruned (r2<0.80) variants with imputation quality INFO>0.80. Panel B) compares h2 estimated from UK Biobank GWAS summary statistics using the baseline linkage disequilibrium (BLD)-LDAK model and a high-definition likelihood (HDL) method by Ning et al 29. Panel C) presents h2 estimates using BLD-LDAK and HDL applied to the GWAS meta-analysis.
Discovery of Novel PSA-Associated Loci
Looking at individual studies, genome-wide analyses of PSA levels in the UKB identified 29 index variants with P<5×10−8 (using LD clumping (r2<0.01) within ±10 Mb windows). Six of these variants were independent of previously reported signals in GERA23: rs58235267 (OTX1) in 2p15; rs79625619 (THADA) in 2p21; rs9275602 in the HLA region; rs6506878 (SALL3) in 18q23; rs2150165 (CBFA2T2) in 20q11.21; and rs186347618 (TEX11) in Xq13.1 (Figure 3; Supplementary Table 2). Novel genome-wide significant associations with PSA were not detected in the other, smaller, contributing studies.
The genome-wide significance threshold of P<5×10−8 is indicated by the solid red line. Panel A) shows UK Biobank GWAS results where known PSA associations are labeled with the corresponding cytoband region and new loci labeled with the nearest gene. Panel B) shows results from the multi-ancestry meta-analysis, with the circular Manhattan plot providing a close-up view of the newly discovered loci. Peaks in dark blue include variants in linkage disequilibrium (r2≥0.01) with the lead novel variant. For parsimony, only one index variant, with lowest p-value, is labeled in each cytoband.
In the full Precision PSA study, the fixed effects multi-ancestry meta-analysis of 95,768 men from five studies identified 128 index variants (P<5.0×10−8, LD r2<0.01 within ±10 Mb windows) across 90 broadly defined regions corresponding to chromosomal cytobands (Figure 3; Supplementary Table 3). The strongest associations were observed in known PSA genes23,24,26,27, such as KLK3 in 19q13.33 (rs17632542, P=3.2×10−638), 10q26.12 (rs10886902, P=8.2×10−118), MSMB in 10q11.23 (rs10993994, P=7.3×10−87), NKX3-1 in 8p21.2 (rs1160267, P=6.3×10−83), CLPTM1L in 5p15.33 (rs401681, P=7.0×10−54), and HNF1B in 17q12 (rs10908278, P=2.1×10−46).
Of the 128 index variants, 82 were independent (LD r2<0.01) of previously reported23 PSA associations in GERA. They mapped to 56 cytobands where genome-wide significant signals for PSA have not previously been detected. Novel associations discovered in the UKB became stronger in the meta-analysis: TEX11 (rs62608084, P=1.7×10−24); THADA (rs11899863, P=1.7×10−13); OTX1 (rs58235267, P=4.9×10−13); SALL3 (rs71279357, P=1.8×10−12); and ST6GAL1 (rs12629450, P= 2.6×10−10) (Supplementary Table 3). Additional novel findings in the meta-analysis included CDK5RAP1 in 20q11.21 (rs291671, P=1.2×10−18), LDAH in 2p24.21 (rs10193919, P=1.5×10−15), ABCC4 in 13q32.1 (rs61965887, P=3.7×10−14), INKA2 in 1p13.2 (rs2076591, P=2.6×10−13), SUDS3 in 12q24.23 (rs1045542, P=1.2×10−13), FAF1 in 1p32.3 (rs12569177, P=3.2×10−13), JARID2 in 6p22.3 (rs926309, P=1.6×10−12), GPC3 in Xq26.2 (rs4829762, P=5.9×10−12), EDA in Xq13.1 (rs2520386, P=4.2×10−11), and ODF3 in 11p15.5 (rs7103852, P=1.2×10−9) (Supplementary Table 3).
In the European ancestry only meta-analysis (NEUR=85,824), 96 of the 128 PSA index variants reached genome-wide significance, compared with three in the East Asian ancestry analysis (KLK3: rs2735837, rs374546878; MSMB: rs10993994; NEAS=3,337), two in the Hispanic/Latino meta-analysis (KLK3: rs17632542, rs2735837; NHIS/LAT=3,098), and only one (FGFR2: rs10749415; NAFR=3,509) in the meta-analysis of results from African ancestry men (Supplementary Table 4). Effect sizes from the European ancestry meta-analysis were modestly correlated with effect sizes from the Hispanic/Latino meta-analysis (Spearman’s ρ=0.48, P=1.1×10−8) and African ancestry meta-analysis (ρ=0.27, P=2.0×10−3), but less so with estimates in East Asian individuals (ρ=0.16, P=0.068) (Supplementary Figure 2).
Ancestry-specific effect sizes and corresponding correlations for 128 PSA-associated variants identified in the multi-ancestry meta-analysis. Correlations were estimated using Spearman’s rho (ρ).
Cochran’s Q indicated evidence of heterogeneity (PQ<0.05) for 12 out of 128 index variants, four of which had effects on PSA in different directions across ancestry-specific meta-analyses: rs58235267 (OTX1), rs1054713 (KLK1), rs10250340 (EIF4HP1), and rs7020681 (SLC35D2) (Supplementary Table 5). To further explore ancestry-specific signals and effect heterogeneity due to ancestry, we applied MR-MEGA32. This meta-analysis identified 119 genome-wide significant index variants, 115 of which also attained P<5.0×10−8 using the standard fixed-effects approach (Supplementary Table 5). Only one variant detected by MR-MEGA mapped to a new PSA-associated region (rs291812 in 5q15). It exhibited the strongest association with PSA in men of East Asian ancestry (PEAS=1.2×10−6) (Supplementary Table 6). Ancestry-related allelic heterogeneity was observed for 18 variants (PHet-Anc<0.05), 8 of which were also detected by Cochran’s Q.
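The fixed-effects pooling and Cochran's Q heterogeneity statistic referred to here can be sketched as follows. This is a generic inverse-variance illustration rather than the study's analysis pipeline: the function name and the input values are hypothetical. Under the null hypothesis of homogeneous effects, Q is compared against a chi-square distribution with k−1 degrees of freedom, where k is the number of ancestry-specific estimates.

```python
def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooled effect, its SE, and Cochran's Q.

    betas: per-study (or per-ancestry) effect estimates for one variant
    ses:   the corresponding standard errors
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = (1.0 / total_w) ** 0.5
    # Q sums the weighted squared deviations of each estimate from the pooled one
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, pooled_se, q


# Placeholder estimates for two hypothetical strata with equal precision
pooled, pooled_se, q = fixed_effects_meta([0.10, 0.30], [0.10, 0.10])
```

With equal standard errors the pooled estimate is the simple average of the inputs, and Q grows as the stratum-specific estimates diverge.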
Since the fixed-effects meta-analysis discovered a larger number of PSA-associated variants than MR-MEGA, the 128 index variants detected using the former approach were used to construct a polygenic score for PSA (PGSPSA) and carried forward to other analyses. The predicted functional consequences of the 128 variants were explored using CADD scores33 and expression quantitative trait loci (eQTL) from prostate tissue in GTEx v8 and whole blood in eQTLGen34. A total of 16 out of 128 variants had CADD scores >13 (corresponding to the top 5% most deleterious substitutions), which included 10 new signals: rs10193919 (LDAH) in 2p24.21; rs7732515 in 5q14.3 (P=3.53×10−14); rs11899863 (THADA); rs58235267 (OTX1); rs926309 (JARID2); rs4829762 (GPC3); rs13268 (P=1.59×10−10), a missense variant in FBLN1; rs78378222 (P=2.79×10−10) in TP53; rs3760230 (P=2.77×10−8) in SMG6; and rs712329 (P=2.01×10−8) in SLC25A21 (Supplementary Table 7).
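As a rough illustration of how a score such as PGSPSA is formed from index variants, the sketch below computes a weighted sum of effect-allele dosages and then standardizes it across a cohort. The weights and dosages are made-up placeholder values, not estimates from this study, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (each between 0 and 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Center and scale scores to mean 0 and (population) SD 1."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Placeholder per-allele effects on log(PSA) for three hypothetical variants
weights = [0.10, -0.05, 0.20]
# Placeholder dosages for four hypothetical individuals
cohort = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 0],
]
raw_scores = [polygenic_score(d, weights) for d in cohort]
z_scores = standardize(raw_scores)
```

Standardizing makes effect estimates interpretable per SD increase in the score, which is how the PGSPSA associations are reported in the validation analyses.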
A total of 61 variants had significant (FDR<0.05) effects on gene expression, including 15 prostate tissue eQTLs for 17 eGenes, 55 blood eQTLs for 185 eGenes, and 9 eQTLs with effects in both tissues (Supplementary Table 7). Notable prostate eGenes among PSA loci included RUVBL1, a chromatin-remodeling factor that has diverse cellular functions, such as modulating transcription of MYC and β-catenin and pro-inflammatory responses via NF-κB35. Two eGenes were identified for rs10193919: LDAH in prostate and blood tissues and HS1BP3 in blood. LDAH promotes cholesterol mobilization in macrophages, which has been linked to prostate cancer and hearing loss36, and HS1BP3 plays a role in lymphocyte activation. The lead variant in 11p15.5 (rs7103852) had 7 target eGenes in blood, including ODF3, which maintains the elastic structures of the sperm tail37, as well as IFITM2 and IFITM3, interferon-induced antiviral proteins.
Impact of PSA-Related Selection Bias on Prostate Cancer GWAS
To characterize the overlap between genetic loci involved in regulation of PSA levels and prostate cancer susceptibility, we obtained GWAS summary statistics from the PRACTICAL consortium38. Of the 128 lead PSA variants, 58 (45%) were associated with prostate cancer risk at the Bonferroni-corrected threshold (p<0.05/128) in the PRACTICAL multi-ancestry GWAS (Supplementary Table 8). The PSA-increasing allele was the risk-increasing allele for 53 out of 58 Bonferroni-significant variants. Next, we investigated whether index event bias, a type of selection bias, could partly explain these shared genetic signals39,40 (see Methods; Figure 4). Since prostate cancer detection often hinges on PSA elevation, genetic factors resulting in higher constitutive PSA levels may appear to increase disease risk because of more frequent screening, resulting in biased signals for prostate cancer susceptibility.
Panel A) depicts how selection on PSA levels induces an association between genetic variant G and U (a composite confounder that captures polygenic and non-genetic factors). This leads to an association with prostate cancer (PrCa) via path G – U → PrCa (blue dotted line) in addition to the direct G → PrCa effects. Gray dotted lines show how PSA is not only a disease biomarker, but also influences the likelihood of PrCa detection via screening. Panel B) shows the impact of correction for PSA-related selection bias on associations with PrCa for 128 lead PSA-associated variants. Panel C) compares original and bias-corrected association with PrCa for 209 independent index variants selected from the PRACTICAL GWAS by Conti et al.36 All analyses restricted to European ancestry.
The method by Dudbridge et al.39 generated a positive estimate of the index event bias correction factor (b=1.144, 95% CI: 1.143-1.144) from a regression of prostate cancer log odds ratios (OR) on PSA coefficients for a set of LD-pruned variants in European ancestry subjects (Supplementary Table 9). Using multi-ancestry summary statistics for PSA and prostate cancer yielded similar estimates (b=1.104), although these should be interpreted cautiously due to LD differences across populations. Sensitivity analyses using SlopeHunter41, which attempts to cluster pleiotropic variants separately from those associated with the selection trait only, produced attenuated estimates (b=0.476, 95% CI: 0.213-0.740).
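To make the role of the correction factor b concrete, the sketch below applies a simplified version of the adjustment: the PSA effect of a variant, scaled by b, is subtracted from its prostate cancer log odds ratio. This is an assumption-laden illustration and not the authors' implementation. It takes b as given (the European ancestry estimate quoted above), ignores the uncertainty in b when computing the standard error, and uses placeholder per-variant values; the function name is hypothetical.

```python
import math


def bias_corrected_effect(log_or, se_log_or, beta_psa, se_beta_psa, b):
    """Simplified index event bias correction for one variant.

    adjusted log OR = log OR - b * (effect on PSA)
    The SE treats b as fixed, which understates the true uncertainty.
    """
    adj = log_or - b * beta_psa
    adj_se = math.sqrt(se_log_or ** 2 + (b * se_beta_psa) ** 2)
    return adj, adj_se


B_EUR = 1.144  # correction factor reported in the text for European ancestry

# Placeholder values for a hypothetical variant whose allele raises PSA
adj, adj_se = bias_corrected_effect(
    log_or=0.10, se_log_or=0.01, beta_psa=0.15, se_beta_psa=0.005, b=B_EUR
)
adj_or = math.exp(adj)
```

In this made-up case a variant with a strong effect on PSA and a modest unadjusted association with prostate cancer ends up with an adjusted odds ratio below 1, which is the kind of direction change a positive b can produce.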
After applying the Dudbridge estimate to recover unbiased associations with prostate cancer, the number of lead PSA variants associated with prostate cancer in European ancestry subjects decreased from 52 to 34 (Figure 4; Supplementary Table 8). For six of the PSA variants that remained associated with prostate cancer, the effect of the PSA-increasing allele changed from increasing to decreasing for prostate cancer risk: rs17632542 (KLK3: ORadj=0.89, Padj=2.66×10−12), rs2735837 (KLK3: ORadj=0.94, Padj=2.10×10−4), rs7065158 (EIF2S3: ORadj=0.97, Padj=1.01×10−5), rs9325569 (WDR11: ORadj=0.96, Padj=1.94×10−6), rs7206309 (16q23.1: ORadj=0.96, Padj=8.26×10−6), and rs10466455 (11p13: ORadj=0.97, Padj=9.66×10−5). We also evaluated how correction for PSA-related selection bias impacts associations between prostate cancer risk variants and disease risk. Of the 209 independent prostate cancer risk variants (P<5.0×10−8) selected from the PRACTICAL European ancestry meta-analysis using LD clumping (r2<0.01), 93 (45%) remained genome-wide significant after bias correction, one of which (rs76765083 in KLK3) reversed direction (Figure 4; Supplementary Table 10).
Impact of PSA-Related Bias on Genetic Risk Score Associations
We fit the most recent 269-variant prostate cancer polygenic score (PGS269)38 in the UKB and examined its association with a polygenic score for PSA (PGSPSA) comprised of the 128 lead variants from the multi-ancestry meta-analysis. Limiting analyses to male UKB participants without PSA data (who were excluded from the GWAS) showed a strong positive relationship between the two genetic scores in prostate cancer cases (β=0.245, P=4.16×10−163; n=11,568) and controls (β=0.283, P<10−700; n=152,884) (Figure 5, Supplementary Table 11). Re-fitting PGS269 using bias-corrected risk allele weights (PGS269adj) substantially attenuated this association in cases (βadj=0.040, P=1.18×10−5) and controls (βadj=0.047, P=6.08×10−73).
Association between PGS for PSA (PGSPSA) and PGS for prostate cancer (PGS269) fit using original weights is compared to PGS269 fit using weights that have been adjusted for index event bias (PGS269adj). Panel A) visualizes the regression line for the PGS association in cases overlaid on individual data points summarized as hexbins. Panel B) visualizes results of the same regression in controls. Analyses were restricted to European ancestry men in the UK Biobank (UKB) that were not included in the PSA GWAS or prostate cancer GWAS from PRACTICAL.
To further characterize the impact of PSA-related bias, we examined PGS269 associations with prostate cancer in 3673 cases and 2363 biopsy-confirmed cancer-free controls from the GERA cohort. Bias-corrected PGS269adj had a larger magnitude of association with prostate cancer (OR for top decile=3.63, P=4.87×10−42) than the standard PGS269 (OR for top decile=2.71, P=2.85×10−30) and yielded a higher area under the curve based on 10-fold cross-validation (AUC: 0.685 vs. 0.677, P=3.91×10−3) (Supplementary Table 12). The impact of bias correction was more pronounced for tumors with Gleason score ≥7 (PGS269adj AUC=0.692 vs. PGS269 AUC=0.678, P=1.91×10−3). However, we note that AUC estimates in GERA are optimistic because this study was included in the prostate cancer GWAS38 used to develop PGS269.
The apparent benefit of correcting for PSA-related bias was consistent with inverse PGSPSA and PGS269 associations with Gleason score observed in case-only analyses (Supplementary Table 13). Men with higher PGSPSA were less likely to have tumors with Gleason score 7 (OR per SD increase=0.80, P=3.97×10−10) or Gleason ≥8 (OR=0.77, P=4.45×10−6) than Gleason ≤6. Patients in the top decile of PGS269 were nearly 30% less likely to be diagnosed with Gleason ≥8 disease (OR=0.72, P=0.024) than Gleason ≤6 disease, but this inverse relationship attenuated after bias adjustment (PGS269adj: OR=0.93, P=0.61).
Validation of the PSA Genetic Score (PGSPSA)
Prior to exploring the potential clinical utility of genetically adjusted PSA values, we validated PGSPSA in the Prostate Cancer Prevention Trial (PCPT) and Selenium and Vitamin E Cancer Prevention Trial (SELECT), both of which were excluded from the PSA discovery GWAS. In PCPT participants of predominantly (≥0.80) European ancestry (n=5725), PGSPSA was robustly associated with baseline log(PSA) levels (effect per SD increase in PGS: βPGS=0.169, P=5.29×10−98) and accounted for 7.33% of trait variance (Figure 6; Supplementary Table 14). Sample sizes for other ancestry groups in PCPT were insufficient (n≤103) to evaluate associations. PGSPSA was associated with baseline log(PSA) in all age groups, although its effects attenuated in participants aged 70 or older (Figure 6).
Performance of PGSPSA was evaluated in the Prostate Cancer Prevention Trial (PCPT) and Selenium and Vitamin E Cancer Prevention Trial (SELECT). Panels A) and B) depict associations with baseline log(PSA) and quantiles of PGSPSA in each trial. Panels C) and D) show effect estimates per standard deviation increase in the standardized PGSPSA on baseline log(PSA) in each trial, overall and stratified by age group. Panels E) and F) show the distribution of genetically adjusted PSA values (PSAG) in each cohort, with the horizontal line at 4 ng/mL denoting a PSA threshold commonly used to indicate further diagnostic testing.
SELECT offered a larger validation cohort in which to assess PGSPSA performance across a wider ancestry spectrum (Figure 6; Supplementary Table 14). PGSPSA was predictive of baseline PSA in 22,253 European ancestry cancer-free subjects (βPGS=0.213, P=4.24×10−478), accounting for 8.78% of trait variation. PGSPSA was substantially less predictive in other ancestry groups. In 1173 individuals of predominantly African ancestry (≥0.80 AFR), PGSPSA accounted for 3.45% of phenotypic variation (βPGS=0.163, P=8.22×10−11), slightly exceeding the variance explained by age (2.84%) (Supplementary Table 14). Among individuals with intermediate African and European ancestry (0.2<AFR/EUR<0.80; n=1763), PGSPSA was associated with baseline PSA levels (βPGS=0.146, P=3.00×10−15), but explained less variation than age (3.32% vs. 4.23%). Predictive performance of PGSPSA was poorest in 257 subjects of predominantly East Asian ancestry (βPGS=0.136, P=0.012).
Given the most convincing validation of PGSPSA in European ancestry subjects, we applied it to calculate genetically adjusted PSA values (PSAG) in European ancestry individuals in PCPT and SELECT. For each subject, baseline or earliest pre-randomization PSA values were adjusted based on their PGSPSA relative to the PGSPSA population mean (see Methods for details). Genetically adjusted PSAG and unadjusted baseline PSA were strongly correlated in PCPT (Pearson’s r=0.851, 95% CI: 0.843 – 0.858) and SELECT (r=0.872, 0.869 – 0.875). The number of participants with PSAG>4 ng/mL, a commonly used threshold for further diagnostic testing, increased from 0 to 20 in PCPT and from 4 to 337 in SELECT (Figure 6), reflecting the preferential selection of subjects with low PSA into these trials.
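The exact adjustment is specified in the Methods, which are not reproduced in this section. The sketch below therefore shows only one plausible form, under two stated assumptions: that the score is expressed on the log(PSA) scale, and that the measured value is divided by a personal factor equal to the exponentiated difference between an individual's score and the population mean. The function name and all numeric values are hypothetical.

```python
import math
from statistics import mean


def genetically_adjusted_psa(psa, pgs, pgs_mean):
    """Assumed form: PSA_G = PSA / exp(PGS - mean PGS).

    A score above the population mean gives a factor > 1 and lowers the
    adjusted value; a score below the mean raises it.
    """
    factor = math.exp(pgs - pgs_mean)
    return psa / factor


# Placeholder scores (log-PSA scale) for three hypothetical men
scores = [0.9, 1.3, 1.7]
population_mean = mean(scores)
measured_psa = 3.0  # ng/mL, identical for all three to isolate the score effect

adjusted = [genetically_adjusted_psa(measured_psa, s, population_mean) for s in scores]
```

Under this form a man whose score equals the population mean keeps his measured value, while men at opposite ends of the score distribution move in opposite directions by the same multiplicative amount.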
Genetic Adjustment of PSA Values Affects Eligibility for Prostate Biopsy
Having demonstrated that adjustment of PSA values using PGSPSA results in appreciable shifts in the PSA distribution, we examined whether PSAG could improve decision-making related to prostate biopsy. We examined re-classification of eligibility for biopsy based on age-specific PSA thresholds used in Kaiser Permanente: 40-49 years old = 2.5 ng/ml; 50-59 years old = 3.5 ng/ml; 60-69 years old = 4.5 ng/ml; and 70-79 years old = 6.5 ng/ml. In a subset of GERA cancer cases (n=3673) and controls (n=2363) between ages 40 and 90 who had undergone a biopsy, we adjusted each person’s PSA value closest to the date of biopsy based on their PGSPSA. Mean PSA levels in controls who were biopsied (7.2 ng/mL) were substantially higher than in controls who did not have a prostate biopsy (1.5 ng/mL; n=24,811) (Supplementary Table 15).
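The age-specific cut points listed above lend themselves to a simple lookup. The sketch below encodes them and labels the direction of reclassification when a measured value is replaced by a genetically adjusted one. Two choices here are assumptions rather than details from the text: eligibility is taken to mean a value strictly above the cut point, and ages outside 40-79 raise an error because no threshold is stated for them. Function names are hypothetical.

```python
# (lower age, upper age, PSA cut point in ng/mL), as listed in the text
AGE_THRESHOLDS = [
    (40, 49, 2.5),
    (50, 59, 3.5),
    (60, 69, 4.5),
    (70, 79, 6.5),
]


def biopsy_threshold(age):
    """Return the age-specific PSA cut point, or None if no band applies."""
    years = int(age)  # use completed years of age
    for low, high, cut in AGE_THRESHOLDS:
        if low <= years <= high:
            return cut
    return None


def reclassification(age, psa, psa_adjusted):
    """Classify the change in biopsy eligibility as 'down', 'up', or 'unchanged'."""
    cut = biopsy_threshold(age)
    if cut is None:
        raise ValueError("no threshold listed for age %r" % (age,))
    before = psa > cut            # assumption: strictly above the cut point
    after = psa_adjusted > cut
    if before and not after:
        return "down"
    if after and not before:
        return "up"
    return "unchanged"
```

For example, `reclassification(62, 5.0, 4.0)` returns `"down"`, because 5.0 ng/mL is above the 4.5 ng/mL cut point for ages 60-69 and 4.0 ng/mL is not.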
There was noteworthy reclassification among controls, with 19.6% reclassified from above to below PSA cut points for biopsy eligibility, and 1.7% reclassified from below to above, for a biopsy net reclassification index (NRI) of 0.179 (Figure 7; Supplementary Table 15). There was also a higher proportion of cases reclassified downward (13.1%) than upward (2.5%), resulting in a negative case NRI (−10.6%). The subset of men who underwent a biopsy was enriched for genetic predisposition to PSA elevation. Mean values of the standardized PGSPSA were above zero in cases and controls; therefore, downward adjustment of PSA values was expected (Supplementary Table 15). Most of the reclassifications below the biopsy eligibility threshold in cases occurred in those with Gleason score <7 (71.4%). The prevalence of downward reclassification was also high among patients aged <65 years (15.3%). The overall NRI for biopsy was positive (0.073), which suggests that PSAG has potential clinical utility in this setting, assuming that changes in eligibility in either direction are valued equivalently.
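The NRI values reported here follow directly from the reclassification proportions. The short sketch below reproduces the arithmetic using the standard definition, in which upward moves count in favor for cases and downward moves count in favor for controls. The function name is hypothetical; the inputs are the proportions quoted in the text.

```python
def net_reclassification(cases_up, cases_down, controls_up, controls_down):
    """Return (case NRI, control NRI, overall NRI) from reclassified proportions."""
    nri_cases = cases_up - cases_down            # correct moves for cases are upward
    nri_controls = controls_down - controls_up   # correct moves for controls are downward
    return nri_cases, nri_controls, nri_cases + nri_controls


nri_cases, nri_controls, nri_overall = net_reclassification(
    cases_up=0.025, cases_down=0.131, controls_up=0.017, controls_down=0.196
)
# nri_cases is about -0.106, nri_controls about 0.179, nri_overall about 0.073
```

The overall value is positive because the net benefit among biopsied controls outweighs the net loss among cases, which is why the interpretation depends on valuing changes in either direction equally.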
Each Sankey diagram illustrates changes in PSA values after genetic adjustment and the resulting reclassification at PSA thresholds used to recommend prostate biopsy in Kaiser Permanente. Panel A) depicts prostate cancer cases stratified by Gleason score categories, where Gleason <7 represents potentially indolent disease. Panel B) illustrates reclassification in a subset of controls who underwent a prostate biopsy. Analyses were conducted in European ancestry subjects from the Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort.
Genetic Adjustment of PSA Levels Improves Prostate Cancer Detection
We evaluated the potential utility of genetically adjusted PSA, alone and in combination with prostate cancer PGS269, by examining associations with prostate cancer risk in PCPT (323 cases and 5414 controls). End-of-study biopsies were performed in all PCPT participants, which effectively eliminated potential for misclassification of case status due to undiagnosed, asymptomatic disease. PGSPSA was not associated with prostate cancer incidence (OR=0.98, P=0.71), which confirms that this score captures genetic determinants of non-cancer PSA variation. The association with prostate cancer was slightly larger in magnitude for genetically adjusted baseline log(PSAG) (OR per unit increase=1.93, P=3.58×10−11) than baseline log(PSA) (OR=1.88, P=1.82×10−10), although the difference in AUC was not statistically significant (AUC: 0.650 vs. 0.639, P=0.14) (Supplementary Table 16). The magnitude of association with prostate cancer was larger for bias-corrected PGS269adj (OR=1.57, P=1.92×10−14) than standard PGS269 (OR=1.53, P=4.59×10−13) (Supplementary Table 16). The model that included both PGS269 and PSAG achieved the highest AUC of 0.690. It also outperformed PGS269 alone (AUC: 0.690 vs. 0.655, P=2.63×10−4), as well as PGS269 combined with measured PSA (AUC: 0.690 vs. 0.682, P=0.032). The correlation between PGS269 and PSAG (β=0.029) was lower than the correlation between PGS269 and PSA (β=0.072), and PGS269adj was not correlated with PSA (β=-0.001) (Supplementary Figure 3). This suggests that genetic adjustment of PSA and bias correction of PGS269 make these predictors more orthogonal.
Associations with baseline PSA and genetically adjusted baseline PSA in PCPT with the standard prostate cancer polygenic score (PGS269) and using weights that have been adjusted for index event bias (PGS269adj). Each panel visualizes the regression line of the PGS association, overlaid on individual data points summarized as hexbins.
The added value of PSA genetic adjustment was more pronounced for aggressive prostate cancer (71 cases: Gleason score ≥7, PSA ≥ 10 ng/mL, T3-T4 stage, and/or distant or nodal metastases; 5415 controls). In PCPT, PSAG conferred an approximately 3-fold increase in risk (OR=3.04, P=3.34×10−7; AUC=0.716), compared to an approximately 1.5-fold increase observed for PGS269adj (OR=1.54, P=2.95×10−4; AUC=0.657) and PGS269 (OR=1.45, P=2.21×10−3; AUC=0.645) (Figure 8; Supplementary Table 17). In case-only analyses comparing risk of aggressive to non-aggressive disease, PSAG (OR=2.06, P=7.35×10−3) and baseline PSA (OR=1.81, P=0.026) remained associated with prostate cancer risk, whereas PGS269 did not (OR=0.90, P=0.50) (Supplementary Table 18). Associations with risk of aggressive prostate cancer were replicated in 85 cases and 21,795 controls from SELECT. PSAG (OR=3.38, P=3.35×10−11) improved discrimination relative to baseline PSA (AUC: 0.775 vs. 0.742, P=0.044) and relative to PGS269 (AUC: 0.775 vs. 0.726, P=0.057) (Figure 8; Supplementary Table 19). The best model for aggressive prostate cancer in SELECT included PSAG and PGS269adj and achieved an AUC of 0.803 (95% CI: 0.758 – 0.848).
Comparison of models for aggressive disease, defined as Gleason score ≥7, PSA ≥ 10 ng/mL, T3-T4 stage, and/or distant or nodal metastases in European ancestry participants from the Prostate Cancer Prevention Trial (PCPT) and Selenium and Vitamin E Cancer Prevention Trial (SELECT). Area under the curve (AUC) estimates are based on fully adjusted logistic regression models that include age, randomization arm, and the top 10 genetic ancestry principal components. Odds ratios were estimated per one unit increase in log(PSA), log(PSAG), and standardized prostate cancer genetic risk score (PGS269).
DISCUSSION
Serum PSA is the most widely used biomarker for prostate cancer detection, although concerns with specificity, and to a lesser degree sensitivity, have limited formal adoption of PSA testing for population-level screening. With the goal of improving its accuracy, we conducted a GWAS of PSA levels in 95,768 men without prostate cancer, which established the heritability of PSA variation to be between 30% and 40% and identified 128 PSA-associated index variants. In addition, our study provides new evidence that genetic determinants of PSA levels can be used to personalize and enhance the utility of PSA screening.
Leveraging genetic profiles to personalize clinical biomarkers enables the translation of GWAS discoveries into clinical practice. This concept has been referred to as “de-Mendelization,” since it is essentially Mendelian randomization in reverse – instead of relying on genetically-predicted biomarker values to investigate causal relationships, subtracting the component of variance attributed to genetic factors for non-causal predictive biomarkers can maximize the residual disease-related signal and yield appreciable improvement in disease prediction 42,43. While doing so has been alluded to in previous work on PSA24,44 and other biomarkers42,45,46, the value of this approach for detecting clinically meaningful disease and reducing unnecessary diagnostic testing has not been demonstrated prior to this study.
A personalized PSA adjustment factor was calculated using a PGSPSA comprised of the 128 GWAS-identified variants, which accounted for approximately 7% of total PSA variance in external validation cohorts. Each person’s PSA value was subsequently normalized relative to the PGSPSA population mean. For those with above average PGSPSA, reflecting an inherited predisposition to PSA elevation, their measured PSA values were adjusted downward, whereas those with lower PGSPSA received an upward correction. For instance, if we consider two 60-year-old men on opposite ends of the PGSPSA spectrum with PSA equal to 2.5 ng/mL, the genetically adjusted value for the man with the high PGSPSA (i.e., a high constitutive PSA) could decrease to 1.7 ng/mL, whereas the man with the low PGSPSA could shift to a potentially high-risk level of 4.5 ng/mL.
Our analyses of real-world data showed that genetic adjustment produces clinically meaningful shifts in PSA distribution. Relying on genetically adjusted PSA values would result in a 20% reduction of biopsies in men who are later found to be cancer-free. Among patients who had a tumor detected, PSAG reclassified 2.4% above the biopsy eligibility threshold and 10-15% below. Although loss of sensitivity in cases is undesirable, the majority of such reclassifications occurred in patients with non-aggressive disease characteristics (Gleason score <7), who are susceptible to overdiagnosis14. Furthermore, most patients already had PSA values at or above the biopsy referral cutoff, making it impossible to observe large increases in biopsy eligibility. Although the magnitude of reclassification may be specific to the GERA cohort and Kaiser Permanente testing guidelines, our findings still indicate that genetically adjusted PSA may reduce overdiagnosis and overtreatment.
In addition to informing decisions related to diagnostic procedures, we found that genetically adjusted PSA values have a larger magnitude of association with risk of prostate cancer and significantly improve classification of disease status compared to baseline PSA. Analyses in PCPT and SELECT showed that PSAG was a more robust predictor of aggressive prostate cancer, relative to both baseline PSA and an established 269-variant prostate cancer risk score. Furthermore, we show that GWAS-identified prostate cancer risk variants, including those in PGS269, are affected by a systematic PSA-related selection bias. Correcting for this bias represents an extension of the PSA de-Mendelization paradigm and resulted in improved PGS performance, while allowing for the same variants to contribute information.
Distinguishing variants that influence prostate cancer detection via PSA screening from genetic signals for prostate carcinogenesis has implications not only for deciphering susceptibility mechanisms, but also for the development of more effective genetic risk prediction models. However, differentiating between these classes of variants using standard methods, like conditional analysis, may not be feasible. Prostate cancer detection directly depends on PSA testing, while PSA screening activities are in part influenced by genetic factors affecting constitutive PSA variation. Methods that attempt to model the bias arising from this complex dependency on a genome-wide scale39,47 suggest that the magnitude of PSA-related selection bias may be substantial. Using summary statistics from the largest GWAS of prostate cancer from PRACTICAL38, we observed that less than 50% of the selected index variants remained genome-wide significant after calculating bias-corrected estimates.
This reduction in signal does not imply that half of prostate cancer GWAS associations are false, but rather suggests that bias-corrected effect sizes may be more accurate. This aligns with our findings that prediction of prostate cancer status improves proportionally to the extent that both PSA and PGS269 are de-noised of genetic signals for PSA elevations not attributable to prostate cancer. Adjusting risk allele weights may be an optimal strategy specifically for PGS269 because this score is comprised of fine-mapped variants that already have a high posterior probability of being causal. Therefore, tuning their weights is sufficient to increase accuracy. The improvement following correction for PSA-related bias was observed in all analyses, but was most pronounced in analyses of men with high PSA. The relative improvement in AUC achieved by PGS269adj compared to PGS269 was highest in GERA participants who underwent a biopsy, where controls had markedly higher PSA levels than in the remainder of the cohort. However, this trend was also observed in PCPT, a starkly different clinical trial population of men with baseline PSA ≤3 ng/mL8,48. Furthermore, the magnitude of association with aggressive prostate cancer was consistently larger for PGS269adj than the standard PGS269 in both PCPT and SELECT, which enrolled men with PSA ≤4 ng/mL49. These results become intuitive considering that bias correction decreases the correlation with PGSPSA, which is associated with a higher likelihood of low-grade disease.
A clear pattern emerging from our study is that refining risk stratification and personalizing screening for prostate cancer will require parallel efforts to elucidate the genetic architecture of PSA variation and prostate cancer in individuals without disease. Our multi-ancestry GWAS of PSA levels advances these efforts with the discovery of 82 PSA-associated variants that are novel based on conservative LD criteria. Many novel variants map to genes involved in embryonic development, epigenetic regulation, and chromatin organization, including DNMT3A, OTX1, CHD3, JARID2, HMGA1, HMGA2, and SUDS3. DNMT3A is a methyltransferase that regulates imprinting and X-chromosome inactivation. Its role in clonal hematopoiesis and hematologic cancers has been studied extensively50, and DNMT3A variants have also been consistently associated with height51. One of the highest CADD scores, indicative of TF-binding activity, was detected for rs58235267 in OTX1, which regulates the development of cortical, sensory, and mammary organs. CHD3 is also involved in chromatin remodeling during development and plays a role in suppressing herpes simplex virus infection52. In fact, there were several PSA-associated variants in genes related to infection and immunity, including HLA-A; ST6GAL1, involved in IgG N-glycosylation53; KLRG1, which regulates NK cell function and IFN-γ production54; and FUT2, which affects ABO precursor H antigen presentation in mucosal tissues and confers susceptibility to multiple viral and bacterial infections55,56.
Several new PSA signals mapped to genes involved in reproductive processes, which may reflect non-cancer function of PSA in liquefying seminal fluid. TEX11 on Xq13.1 is preferentially expressed in male germ cells and early spermatocytes. Mutations in TEX11 cause meiotic arrest and azoospermia, and this gene also regulates homologous chromosome synapsis and double-strand DNA break repair57. ODF3 encodes a component of sperm flagella fibers and has also been linked to regulation of platelet count and volume58. PLAC1 is involved in placenta development, although there is some evidence that it is differentially expressed among healthy, hyperplastic, and neoplastic prostate tissues59.
Although our GWAS was restricted to men without prostate cancer, several cancer susceptibility genes were among the newly identified PSA-associated loci, including a pan-cancer risk variant in TP53 (rs78378222)60,61, as well as signals in TP63, GPC3, and THADA. We cannot rule out the presence of undiagnosed prostate cancer, but the prevalence of undetected tumors is unlikely to be high enough to have an appreciable impact on GWAS results. This is supported by the observation that PGSPSA was not associated with prostate cancer in PCPT and SELECT. Pervasive pleiotropy and an omnigenic architecture62 may explain the diverse functions of PSA-associated genes. As GWAS sample sizes and power increase, many of the newly identified PSA loci are broadly implicated in disease susceptibility through effects on inflammation, epigenetic regulation, and growth factor signaling. There is also evidence that even established tumor suppressor genes have pleiotropic effects. For instance, TP53, GPC3, and THADA have been linked to anthropometric traits and obesity via dysregulation of cell growth and metabolism63-66. Distinct p63 isoforms play a crucial role in epithelial and craniofacial development, as well as apoptosis of male germ cells and spermatogenesis67,68. Mutations in GPC3 cause Simpson-Golabi-Behmel syndrome, which is characterized by overgrowth with visceral and skeletal abnormalities and excess risk of embryonic tumors69.
Our investigation of index event bias is not without limitations. A fundamental but unrealistic assumption of the Dudbridge method is that direct genetic effects on PSA levels and prostate cancer susceptibility are not correlated39. Violations of this assumption would over-attribute shared genetic signals to selection bias. SlopeHunter relaxes this assumption41, resulting in an attenuated bias estimate. This approach relies on clustering to distinguish PSA-specific from pleiotropic variants and uses the latter to estimate the bias correction factor41. Poorly separated or small clusters may result in unstable or imprecise estimates. Furthermore, both methods assume that a single bias correction factor applies to all variants, although signals at some loci may be more biased than others, depending on the magnitude of pleiotropy. Despite these limitations, all sensitivity analyses detected the presence of a non-zero PSA-related bias in the prostate cancer GWAS. Disentangling PSA and prostate cancer associations with a greater certainty will require experimental approaches, such as CRISPR screens and massively parallel reporter assays.
Biopsy reclassification analyses in GERA may be biased since GERA controls were part of the PSA discovery GWAS (30% of sample size). We attempted to mitigate this by using an out-of-sample mean PGSPSA value from the UKB to calculate the genetic adjustment factor. Nonetheless, doing so may have been insufficient to reduce inflation in the PSA adjustment, resulting in greater reduction in variance of PSAG. Since it is unlikely that men with low PSA would have been biopsied, there are also limited opportunities to increase biopsy eligibility in this dataset. Because controls who underwent a biopsy had higher PSA and PGSPSA values than other GERA controls, downward correction of PSA values was to be expected. The same constraint applies to cases, most of whom were already eligible for biopsy, although some procedures were performed before the current guidelines were implemented in Kaiser Permanente. In contrast, there was more of an upward trend in PSAG in PCPT and SELECT, since these trials selected for men with low PSA.
Our genetic score-based approach offers a contemporary update to the first application of genetic correction of PSA by Gudmundsson et al.24 The straightforward calculation of the PSA genetic correction factor would have a relatively low adoption barrier in clinical settings, although the accuracy of the genetic adjustment could be improved with more sophisticated PGS modeling approaches. Furthermore, the choice of reference population for calculating the correction factor is not trivial. For instance, using the PCPT mean of PGSPSA to obtain PSAG values yielded an upward correction of smaller magnitude than genetic adjustment relative to the mean PGSPSA in a population-based cohort like the UKB. Although predictive performance would remain unaffected, such choices will impact clinical decisions based on thresholds for absolute PSA values.
Studies in a wider range of populations and real-world settings will be required to further evaluate the performance of genetically corrected PSA values and inform their application. Despite many promising findings in our study, an important limitation is that all assessments of clinical utility were conducted in populations of predominantly European ancestry. Although our GWAS included available data from multiple ancestry groups, the resulting PGSPSA was dominated by association signals detected in European ancestry subjects, which made up over 90% of the analytic sample. As such, PGSPSA performance was substantially lower in men of predominantly African or East Asian ancestry, as well as populations with more complex admixture. Multi-ancestry GWAS efforts at a much larger scale are currently under way and will greatly augment the catalog of PSA-associated variants and their utility.
An important outstanding question is whether genetically adjusted PSA improves prediction of prostate cancer mortality. Future lines of inquiry should also investigate a range of PSA measures and related biomarkers. Our choice of the median PSA value reduces the influence of fluctuations in PSA due to infection or inflammation. However, there may be genetic signals specific to temporal PSA dynamics, such as PSA velocity or doubling time, which may improve the accuracy of genetic correction of PSA trajectories. Our study focused on total PSA, although serum PSA exists in multiple forms, and studies have suggested that other PSA derivatives, such as the ratio of free to total PSA and pro-PSA, may have higher specificity for prostate cancer detection70,71. Nonetheless, we believe our approach is broadly applicable and may improve the accuracy of any heritable PSA biomarker, with varying degrees of improvement. We also envision that genetically adjusted PSA levels may become useful in clinical research settings to enable more refined selection of subjects into trials of specific screening protocols.
In summary, by detecting many genetic variants associated with non-prostate cancer PSA variation, we developed a novel PGSPSA that measures the contribution of common genetic variants to a man’s inherent PSA level. Genetic determinants of PSA provided an avenue for refining prostate cancer GWAS signals by mitigating selection bias due to PSA screening, and for improving disease prediction. Moreover, we used the PGSPSA to calculate genetically adjusted, personalized PSA levels that provide clinically meaningful improvements in prostate cancer diagnostic outcomes. These results illustrate a roadmap for incorporating genetic factors into PSA screening for prostate cancer and expanding this potentially valuable approach to other diagnostic biomarkers.
METHODS
Study Populations and Phenotyping
Genome-wide association analyses of PSA levels were conducted in individuals never diagnosed with prostate cancer to avoid reverse causation. Men with a history of surgical resections of the prostate were also excluded in studies for which this information was available. All analyses were limited to PSA values ≤10 ng/mL, which corresponds to low-risk prostate cancer based on the D’Amico prostate cancer risk classification system72, and PSA>0.01 ng/mL, to ensure that subjects had a functional prostate not impacted by surgery or radiation.
The UK Biobank (UKB) is a population-based prospective cohort of over 500,000 individuals aged 40-69 years at enrollment in 2006-2010 with genetic and phenotypic data73. Health-related outcomes were ascertained via individual record linkage to national cancer and mortality registries and hospital in-patient encounters. For a subset of UKB participants, PSA values were abstracted from primary care records that were linked to genetic and phenotypic data. Field code mappings used to identify PSA values included any serum PSA measure except for free PSA or ratio of free to total PSA (Supplementary Table 20).
The Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort used in this analysis has been previously described in Hoffmann et al23. Briefly, prostate cancer status was ascertained from the Kaiser Permanente Northern California Cancer Registry, the Kaiser Permanente Southern California Cancer Registry, or through review of clinical electronic health records. PSA levels were abstracted from Kaiser Permanente electronic health records from 1981 through 2015.
The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial is a randomized trial that enrolled approximately 155,000 participants between November 1993 and July 2001. PLCO was designed to determine the effects of screening on cancer-related mortality and secondary endpoints in men and women aged 55 to 7474. Men randomized to the screening arm of the trial underwent annual screening with PSA for six years and digital rectal exam (DRE) for four years74. These analyses were limited to men of European, African, or East Asian ancestry with a baseline PSA measurement who were randomized to the screening arm of the trial (N= 29,524). PSA outliers were removed. Men taking finasteride at the time of PSA measurement and individuals who were outliers based on ancestry-specific principal components were excluded from analysis.
The Vanderbilt University Medical Center BioVU resource is a synthetic derivative biobank linked to deidentified electronic health records75. Analyses were based on PSA levels that were measured as part of routine clinical care. For men with multiple PSA measurements, the median PSA was used.
The Malmö Diet and Cancer Study (MDCS) is a population-based prospective cohort study that recruited men and women aged between 44 and 74 years old who were living in Malmö, Sweden between 1991 and 1996 to investigate the impact of diet on cancer risk and mortality76. These analyses included men from the MDCS who were not diagnosed with prostate cancer as of December 2014 and had available genotyping and baseline PSA measurements76.
The Prostate Cancer Prevention Trial (PCPT) was a phase III randomized, double-blind, placebo-controlled trial of finasteride for prostate cancer prevention that began in 19938. PCPT randomly assigned 18,880 men aged 55 years or older who had a normal DRE and PSA level ≤3 ng/mL to either finasteride or placebo. For subjects who had multiple pre-randomization PSA values, the earliest value was selected. Cases included all histologically confirmed prostate cancers detected during the 7-year treatment period and tumors that were detected by the end-of-study prostate biopsy. These analyses included the subset of PCPT participants that was genotyped on the Illumina Infinium Global Screening Array (GSAMD) 24v2-0 array.
The Selenium and Vitamin E Cancer Prevention Trial (SELECT) was a phase III randomized, placebo-controlled trial of selenium (200 µg/day from L-selenomethionine) and/or vitamin E (400 IU/day of all rac-α-tocopheryl acetate) supplementation for prostate cancer prevention49. Between 2001 and 2004, 34,888 eligible subjects were randomized. The minimum enrollment age was 50 years for African American men and 55 years for all other men49. Additional eligibility requirements included no prior prostate cancer diagnosis, ≤4 ng/mL of PSA in serum, and a DRE not suspicious for cancer. For subjects who had multiple pre-randomization PSA values, the earliest value was selected. These analyses included a subset of SELECT participants genotyped on the Illumina Infinium Global Screening Array (GSAMD) 24v2-0 array.
Study-Specific Quality Control and Association Analyses
Standard genotyping and quality control (QC) procedures were implemented in each participating study. Variant-level QC filters included low imputation quality (INFO<0.30), deviations from Hardy-Weinberg equilibrium in controls, and minor allele frequency (MAF)<0.005. Sample-level QC filtered based on discordant genetic and self-reported sex and low call rate and removed one sample from each pair of first-degree relatives. Detailed descriptions of the genotyping platforms, imputation methods, and QC for each study are provided in Supplementary Note 1.
Study-specific GWAS phenotypes and covariates are reported in Supplementary Table 21 and in Hoffmann et al.23 for previously published analyses in GERA. Genome-wide association analyses used linear regression with log(PSA) as the outcome and age and genetic ancestry principal components (PCs) as the minimum set of covariates. For most studies with longitudinal data, multiple PSA measures per individual were summarized by taking the median PSA value (Supplementary Table 21). Sensitivity analyses were conducted in the UKB comparing this approach to a GWAS of individual-specific random effects derived from fitting a linear mixed model to repeated log(PSA) values with the same covariates.
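The phenotype construction described above can be illustrated with a minimal sketch. The function name `log_median_psa` is hypothetical, and applying the PSA range filter (0.01 < PSA ≤ 10 ng/mL) to individual measurements before taking the median is an assumption; the studies may have applied the filter at a different step.

```python
import math
from statistics import median

def log_median_psa(psa_values, lower=0.01, upper=10.0):
    """Summarize repeated PSA measures (ng/mL) for one man as log(median PSA).

    Assumption: values outside (lower, upper] are dropped before the median
    is taken. Returns None when no measurement falls within the range.
    """
    in_range = [v for v in psa_values if lower < v <= upper]
    if not in_range:
        return None
    return math.log(median(in_range))

# Example: three longitudinal measurements for one participant.
phenotype = log_median_psa([1.0, 2.0, 4.0])
```

The resulting value would serve as the outcome in a linear regression on genotype dosage, age, and ancestry PCs.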
Heritability of PSA Levels Attributed to Common Variants
Heritability of PSA levels was estimated using individual-level data and GWAS summary statistics. UKB subjects with available PSA and genetic data were analyzed using Linkage Disequilibrium Adjusted Kinships (LDAK) v5.129 and GCTA v1.9328, following the approach previously implemented in the GERA cohort23. Genetic relationship matrices were filtered to ensure that no pairwise relationships with kinship estimates>0.05 remained. Heritability was estimated using common (MAF≥0.01) LD-pruned (r2<0.80) variants with imputation INFO>0.80. We implemented the LDAK-Thin model using the recommended GRM settings (INFO>0.95, LD r2<0.98 within 100 kb) and the same parameters as GCTA for comparison (LD r2<0.80, INFO>0.80). For both methods, sensitivity analyses were conducted using more stringent GRM settings (kinship=0.025, genotyped variants).
Summary statistics from GWAS results based on the same set of UKB participants (26,491 subjects) and from a European ancestry GWAS meta-analysis (85,824 subjects) were analyzed using LDAK, LD score regression (LDSR)77, and an extension of LDSR using a high-definition likelihood (HDL) approach31. For LDSR we used the default panel comprised of variants available in HapMap3 with weights computed in 1000 Genomes v3 EUR subjects and in-house LD scores computed in UKB European ancestry subjects61. The baseline linkage disequilibrium (BLD)-LDAK model was fit using precomputed tagging files calculated in UKB GBR (white British) individuals for HapMap3 variants from the LDSR default panel. HDL analyses were conducted using the UKB-derived panel restricted to high-quality imputed HapMap3 variants31. All GWAS summary statistics had sufficient overlap with the reference panels, not exceeding the 1% missingness threshold for HDL and 5% missingness threshold for LDAK and LDSR.
Genome-Wide Meta-Analysis
Each ancestral population was analyzed separately, and GWAS summary statistics were combined via meta-analysis (Figure 1). We first used METAL78 to conduct a fixed-effects inverse-variance-weighted meta-analysis in each ancestral group. We then meta-analyzed the ancestry-specific results. Meta-analysis results were processed using clumping to identify independent association signals by grouping variants based on linkage disequilibrium within specific windows. Clumps were formed around index variants with the lowest genome-wide significant (P<5×10−8) meta-analysis p-value. All other variants with LD r2 >0.01 within a ± 10Mb window were considered non-independent and assigned to that lead variant. Since over 90% of the meta-analysis consisted of predominantly European ancestry subjects, clumping was performed using 1000 Genomes (1000G) EUR and UKB reference panels, which yielded concordant results. We confirmed that the LD among the resulting lead variants did not exceed r2=0.05 using a merged 1000G ALL reference panel.
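For readers unfamiliar with the pooling step, the following is a minimal sketch of a fixed-effects inverse-variance-weighted estimate for a single variant. It illustrates the general technique only and does not reproduce METAL itself; the function name `ivw_fixed_effects` and the input values are hypothetical.

```python
import math

def ivw_fixed_effects(betas, ses):
    """Pool per-study effect estimates for one variant.

    betas: per-study effects on log(PSA); ses: their standard errors.
    Returns (pooled beta, pooled SE, two-sided normal p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]      # inverse-variance weights
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided p under N(0, 1)
    return beta, se, p

# Hypothetical inputs: the same variant in two ancestry-specific analyses.
beta, se, p = ivw_fixed_effects([0.10, 0.20], [0.05, 0.05])
```

Studies with smaller standard errors receive proportionally more weight, which is why the largest ancestry group dominates the pooled estimate.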
We first examined heterogeneity in the multi-ancestry fixed effects meta-analysis results using Cochran’s Q statistic. To assess heterogeneity specifically due to ancestry we applied MR-MEGA32, a meta-regression approach for aggregating GWAS results across diverse populations. Summary statistics from each GWAS were meta-analyzed using MR-MEGA without combining by ancestry first. The MR-MEGA analysis was performed across four axes of genetic variation derived from pairwise allele frequency differences, based on the recommendation for separating major global ancestry groups. Index variants from the MR-MEGA analysis were selected using the same clumping parameters as described above, based on the merged 1000G ALL reference panel. For each variant, we report two heterogeneity p-values: one that is correlated with ancestry and accounted for in the meta-regression (PHet-Anc) and the residual heterogeneity that is not due to population genetic differences (PHet-Res).
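Cochran's Q, used above as the first heterogeneity check, can be sketched as follows. This is an illustration of the statistic only, with a hypothetical function name; the p-value would come from a chi-square distribution with k − 1 degrees of freedom, which is omitted here to keep the example dependency-free.

```python
def cochran_q(betas, ses):
    """Return (Q, degrees of freedom) for k per-study estimates of one variant.

    Q is the weighted sum of squared deviations of each study's effect
    from the fixed-effects pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, len(betas) - 1

# Hypothetical inputs: two studies with equal precision.
q, df = cochran_q([0.10, 0.20], [0.05, 0.05])
```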
Index Event Bias Analysis
Index event bias occurs when subjects are selected based on the occurrence of an event or specific criterion. This is analogous to the direct dependence of one phenotype on another, as in the commonly used example of cancer survival40. Due to unmeasured confounding, this dependence can induce correlations between previously independent risk factors among those selected39,40. Genetic effects on prostate cancer can be viewed as conditional on PSA levels, since elevated PSA typically triggers diagnostic investigation. Genetic factors resulting in higher constitutive PSA levels may also increase the likelihood of prostate cancer detection due to more frequent testing (Figure 4). This selection mechanism could bias prostate cancer GWAS associations by capturing both direct genetic effects on disease risk and selection-induced PSA signals. In the GWAS setting, methods using summary statistics have been developed to estimate and correct for this bias39,41. Although typically derived assuming a binary selection trait, these methods are still applicable to selection or adjustment based on quantitative phenotypes39,79. In this study, we conceptualized PSA variation as the selection trait and prostate cancer incidence as the outcome trait (Figure 4).
We applied the method described in Dudbridge et al.39, which tests for index event bias and estimates the corresponding correction factor (b) by regressing genetic effects on the selection trait (PSA) against their effects on the subsequent trait (prostate cancer), with inverse variance weights: $w = 1/\mathrm{SE}_{\mathrm{PrCa}}^{2}$. Summary statistics for prostate cancer were obtained from the most recent prostate cancer GWAS from the PRACTICAL consortium38. Sensitivity analyses were performed using SlopeHunter47, an extension of the Dudbridge approach that allows for direct genetic effects on the index trait and subsequent trait to be correlated. For both methods, analyses were conducted using relevant summary statistics and 127,906 variants pruned at the recommended threshold39 (LD r2<0.10 in 250 kb windows) with MAF ≥0.05 in the 1000G EUR reference panel. After merging the pruned 1000G variants with each set of summary statistics, variants with large effects (|β|>0.20) on either log(PSA) or prostate cancer were excluded. Raw bias estimates (braw) were adjusted for regression dilution using a modified version of the SIMEX algorithm. The resulting estimate (b) was used as a correction factor to recover unbiased genetic effects for each variant: $\beta'_{\mathrm{PrCa}} = \beta_{\mathrm{PrCa}} - b \cdot \beta_{\mathrm{PSA}}$, where $\beta_{\mathrm{PSA}}$ is the per-allele effect on log(PSA), and $\beta_{\mathrm{PrCa}}$ is the log(OR) for prostate cancer.
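The core of the correction can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in the study: it assumes the weighted regression includes an intercept, that prostate cancer effects are the dependent variable (consistent with weights based on SEPrCa), and that each corrected effect is the prostate cancer log(OR) minus b times the PSA effect. It omits the SIMEX adjustment for regression dilution and the standard errors of corrected effects. All function names are hypothetical.

```python
def estimate_bias_slope(beta_psa, beta_prca, se_prca):
    """Weighted least-squares slope of prostate cancer effects on PSA effects.

    Weights are 1 / SE_PrCa^2. An intercept is included (an assumption).
    This corresponds to the raw slope before any SIMEX adjustment.
    """
    w = [1.0 / se ** 2 for se in se_prca]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, beta_psa)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, beta_prca)) / sw
    sxy = sum(wi * (x - mean_x) * (y - mean_y)
              for wi, x, y in zip(w, beta_psa, beta_prca))
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, beta_psa))
    return sxy / sxx

def correct_effect(beta_prca, beta_psa, b):
    """Bias-corrected log(OR): subtract the PSA-mediated selection component."""
    return beta_prca - b * beta_psa

# Hypothetical pruned variants whose prostate cancer effects track PSA effects.
beta_psa = [-0.10, 0.00, 0.10, 0.20]
beta_prca = [0.5 * x + 0.01 for x in beta_psa]
b = estimate_bias_slope(beta_psa, beta_prca, [0.02] * 4)
corrected = [correct_effect(y, x, b) for x, y in zip(beta_psa, beta_prca)]
```

In this toy input, all of the variation in prostate cancer effects is explained by PSA effects, so the corrected values collapse to a constant; in real data only part of each association would be removed.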
The impact of the bias correction was assessed in three ways. First, genome-wide significant prostate cancer index variants were selected from the European ancestry PRACTICAL GWAS meta-analysis (85,554 cases and 91,972 controls) using clumping (LD r2<0.01 within 10 Mb)38. We tabulated the number of variants that remained associated at P<5×10−8 after bias correction. Next, we fit genetic scores for PSA and prostate cancer in the UKB, limiting to an out-of-sample set of participants that was not included in the PSA or prostate cancer GWAS (11,568 cases and 152,884 controls). We compared the correlation between the PGS for PSA (PGSPSA), comprised of 128 lead variants, and the 269-variant prostate cancer risk score fit with original risk allele weights (PGS269) and with weights corrected for index event bias (PGS269adj). To allow adjustment for genetic ancestry PCs and genotyping array, associations between the two scores were estimated using linear regression models. Next, we examined associations for each genetic score (PGS269, PGS269adj) with prostate cancer in a subset of GERA participants who underwent a biopsy (3763 cases and 2363 controls). Since GERA controls were included in the PSA GWAS meta-analysis, AUC estimates and corresponding bootstrapped 95% confidence intervals were obtained using 10-fold cross-validation. We also examined PGS associations with Gleason score, a marker of disease aggressiveness, which was not available in the UK Biobank. Multinomial logistic regression models with Gleason score ≤6 (reference), 7, and ≥8 as the outcome were fit for each score in 4584 cases from the GERA cohort.
Validation and Clinical Application of the PSA Genetic Score
The predictive performance of PGSPSA was evaluated in two independent cancer prevention trials that were not included in the meta-analysis: PCPT and SELECT. In addition to evaluating PGSPSA directly, we examined genetically corrected PSA values calculated for individual i as follows: $\mathrm{PSA}^{G}_{i} = \mathrm{PSA}_{i} / a_{i}$, where $a_{i}$ is a personalized adjustment factor derived from PGSPSA23,24. Since genetic effects were estimated for log(PSA), $a_{i}$ for correcting PSA in ng/mL was derived as: $a_{i} = \exp(\mathrm{PGS}_{\mathrm{PSA},i}) / \exp(\overline{\mathrm{PGS}}_{\mathrm{PSA}})$, where $\overline{\mathrm{PGS}}_{\mathrm{PSA}}$ is estimated directly in controls without prostate cancer or obtained from an external control population23,24. We see that $a_{i} > 1$ when an individual has a higher multiplicative increase in PSA than the sample average due to their genetic profile, resulting in a lower genetically adjusted PSA compared to the observed value ($\mathrm{PSA}^{G}_{i} < \mathrm{PSA}_{i}$).
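A minimal sketch of the adjustment follows, assuming the adjustment factor takes the form a_i = exp(PGS_i) / exp(mean PGS) and the adjusted value is the observed PSA divided by a_i, as implied by the description above. The function names and numeric inputs are hypothetical.

```python
import math

def adjustment_factor(pgs_i, pgs_mean):
    """a_i: an individual's genetic PSA effect relative to the reference mean.

    The PGS is on the log(PSA) scale, so exponentiating gives a
    multiplicative factor on the ng/mL scale.
    """
    return math.exp(pgs_i) / math.exp(pgs_mean)

def genetically_adjusted_psa(psa, pgs_i, pgs_mean):
    """Observed PSA (ng/mL) divided by the personalized adjustment factor."""
    return psa / adjustment_factor(pgs_i, pgs_mean)

# Hypothetical example: a PGS above the reference mean lowers the adjusted value.
pgs_mean = 0.30
adjusted = genetically_adjusted_psa(2.5, pgs_mean + 0.4, pgs_mean)
```

Under this form, an adjustment factor of about 1.47 would map an observed 2.5 ng/mL to about 1.7 ng/mL, in line with the illustrative example given in the Discussion. Because a_i depends only on the difference between an individual's score and the reference mean, the choice of reference population shifts every adjusted value by the same multiplicative constant.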
We evaluated the potential utility of PGSPSA in two clinical contexts. First, we quantified the impact of using genetically adjusted PSA ($\mathrm{PSA}^{G}_{i}$) on biopsy referrals by examining reclassification at age-specific PSA thresholds used in the Kaiser Permanente health system. Analyses were conducted in GERA participants with information on biopsy date and outcome, comprising 3763 prostate cancer cases not included in the PSA GWAS and 2363 controls that were part of the PSA GWAS. In order to use the same normalization factor for both cases and controls while mitigating bias due to control overlap with the PSA discovery GWAS, $a_i$ for GERA subjects was calculated by substituting $\overline{\mathrm{PGS}}$, the mean PGSPSA, from out-of-sample UK Biobank controls (n=152,884). Upward classification occurred when $\mathrm{PSA}_{i} < \mathrm{ref}$ and $\mathrm{PSA}^{G}_{i} > \mathrm{ref}$, where ref was the biopsy referral threshold. Downward classification was defined as: $\mathrm{PSA}_{i} > \mathrm{ref}$ and $\mathrm{PSA}^{G}_{i} < \mathrm{ref}$. The net reclassification index (NRI) was used to summarize clinical utility: NRI = P(up|case) − P(down|case) + P(down|control) − P(up|control).
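The NRI formula can be illustrated with a small sketch. It assumes that upward reclassification means the observed PSA falls below the referral threshold while the adjusted PSA exceeds it, and that downward reclassification is the reverse. The record layout, function names, and all values are hypothetical.

```python
def classify(psa, psa_adjusted, ref):
    """Label one individual as reclassified 'up', 'down', or 'same'."""
    if psa < ref and psa_adjusted > ref:
        return "up"
    if psa > ref and psa_adjusted < ref:
        return "down"
    return "same"


def net_reclassification_index(records):
    """records: iterable of (psa, psa_adjusted, ref, is_case) tuples.

    NRI = P(up|case) - P(down|case) + P(down|control) - P(up|control)
    """
    counts = {True: {"up": 0, "down": 0, "same": 0},
              False: {"up": 0, "down": 0, "same": 0}}
    for psa, psa_adjusted, ref, is_case in records:
        counts[bool(is_case)][classify(psa, psa_adjusted, ref)] += 1
    n_case = sum(counts[True].values())
    n_control = sum(counts[False].values())
    if n_case == 0 or n_control == 0:
        raise ValueError("need at least one case and one control")
    return (counts[True]["up"] / n_case
            - counts[True]["down"] / n_case
            + counts[False]["down"] / n_control
            - counts[False]["up"] / n_control)


# Hypothetical records; ref is stored per person because thresholds are age-specific.
records = [
    (3.5, 4.2, 4.0, True),   # case, reclassified up
    (5.0, 4.5, 4.0, True),   # case, unchanged
    (4.5, 3.8, 4.0, True),   # case, reclassified down
    (2.0, 2.1, 4.0, True),   # case, unchanged
    (4.5, 3.5, 4.0, False),  # control, reclassified down
    (4.2, 3.9, 4.0, False),  # control, reclassified down
    (3.0, 3.1, 4.0, False),  # control, unchanged
    (3.8, 4.1, 4.0, False),  # control, reclassified up
]
nri = net_reclassification_index(records)
```

A positive NRI indicates that, on balance, adjustment moves cases toward referral and controls away from it.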
Finally, we evaluated the performance of risk prediction models for prostate cancer overall and aggressive prostate cancer in PCPT and SELECT. Since both studies were excluded from the PSA GWAS meta-analysis, $a_i$ and genetically adjusted PSA ($\mathrm{PSA}^{G}_{i}$) for subjects in PCPT and SELECT were calculated using the mean PGSPSA ($\overline{\mathrm{PGS}}$) observed in each respective dataset. Aggressive prostate cancer was defined as Gleason score ≥7, PSA ≥ 10 ng/mL, T3-T4 stage, and/or distant or nodal metastases. We compared AUC estimates for logistic regression models using the following predictors, alone and in combination: baseline PSA, genetically adjusted baseline PSA (PSAG), PGSPSA, prostate cancer risk score with original weights (PGS269)38 and weights corrected for index event bias (PGS269adj).
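To illustrate how AUC values for two predictors can be compared, the sketch below computes a rank-based AUC, that is, the probability that a randomly chosen case has a higher predictor value than a randomly chosen control, with ties counted as one half. For a single predictor with a positive coefficient, this equals the AUC of the fitted logistic regression model, because the fitted probabilities are a monotone transformation of the predictor. Multi-predictor models and covariate adjustment are not covered. The function name and all values are hypothetical and do not represent study results.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a case > score of a control), ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = aggressive prostate cancer, 0 = no cancer.
labels = [1, 1, 0, 0, 0]
psa_observed = [6.0, 3.0, 4.0, 2.0, 2.5]
psa_adjusted = [5.5, 3.8, 3.5, 2.2, 2.4]

auc_observed = auc(psa_observed, labels)
auc_adjusted = auc(psa_adjusted, labels)
```

The pairwise loop is quadratic in sample size; it is used here for transparency rather than efficiency.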
DATA AVAILABILITY
The research was conducted with approved access to UK Biobank data under application number 14105 (PI: Witte). UK Biobank data are publicly available by request from https://www.ukbiobank.ac.uk. To maintain individuals’ privacy, data on the GERA cohort are available by application to the Kaiser Permanente Research Bank (researchbank.kaiserpermanente.org).
Informed consent was obtained from all study participants. UK Biobank received ethics approval from the Research Ethics Committee (REC reference: 11/NW/0382) in accordance with the UK Biobank Ethics and Governance Framework. Approval for other studies contributing data to the Precision PSA consortium was obtained from each of the participating institutional research ethics review boards.
FUNDING ACKNOWLEDGEMENTS
The Precision PSA study is supported by funding from the National Institutes of Health (NIH) National Cancer Institute (NCI) under award number R01CA241410 (PI: JSW). Additionally, LK is supported by funding from the National Cancer Institute (K99CA246076) and REG is supported by a Young Investigator Award from the Prostate Cancer Foundation. This work was supported by research grants from the NIH National Institute of General Medical Sciences (NIGMS) under award number R01GM130791 (PI: JDM). HL is supported in part by an NIH/NCI Cancer Center Support Grant to Memorial Sloan Kettering Cancer Center [P30 CA008748], a prostate cancer SPORE grant [P50-CA92629], the Swedish Cancer Society (Cancerfonden 20 1354 PjF), and the General Hospital in Malmö Foundation for Combating Cancer. This work was supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award number S10OD026880 and NIH/NCI funding (R01CA175491, R01CA244948; PI: RJK).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
DISCLOSURES
JSW is a non-employee, cofounder of Avail Bio. HL is named on a patent for intact PSA assays and a patent for a statistical method to detect prostate cancer that is licensed to and commercialized by OPKO Health. HL receives royalties from sales of the test and has stock in OPKO Health.