Abstract
Background Genetic scores may provide an objective measure of a man’s risk of dying from prostate cancer and thus inform screening decisions, especially in men of African ancestry, who have a higher average risk of prostate cancer death but are often treated as a homogeneous group.
Objective Determine whether a polygenic hazard score based on 290 genetic variants (PHS290) is associated with risk of fatal or metastatic prostate cancer in a diverse population.
Design, Setting, and Participants Retrospective analysis of Million Veteran Program (MVP), a national, population-based cohort study of United States military veterans conducted 2011-2021.
Exposure(s) Genotype data to calculate the genetic score, PHS290. Family history of prostate cancer and ancestry group (European, African, Hispanic, or Asian) were also studied.
Outcome Measurements and Statistical Analysis Primary outcome: age at death from prostate cancer. Key secondary outcome: age at diagnosis of prostate cancer metastases. Statistical analysis: Cox proportional hazards.
Results 582,515 MVP participants. Median age at last follow-up: 69 years. PHS290 was associated with fatal prostate cancer in the full cohort and for each ancestry group (p<10⁻¹⁶). Comparing men in the highest 20% of PHS290 to those in the lowest 20%, the hazard ratio for death from prostate cancer was 4.41 [95% CI: 3.9-5.02]. Corresponding hazard ratios for European, African, Hispanic, and Asian subsets were 4.26 [3.66-4.9], 2.4 [1.77-3.23], 4.72 [2.68-8.87], and 10.46 [2.01-101.0]. When accounting for family history and ancestry group, PHS290 remained a strong independent predictor of fatal prostate cancer. PHS290 was also associated with metastasis.
Conclusions PHS290 stratified US veterans of diverse ancestry for lifetime risk of metastatic or fatal prostate cancer. Predicting genetic risk of lethal prostate cancer with PHS290 might inform individualized decisions about prostate cancer screening.
Patient Summary In a large, diverse population (including over 100,000 African Americans), we evaluated whether a genetic score (PHS290) was associated with a man’s risk of fatal or metastatic prostate cancer. We conclude that men with a high PHS290 genetic score, independent of ancestry or family history, are more likely to die of prostate cancer. This information might be useful to guide which men might benefit from screening for prostate cancer.
Introduction
Prostate cancer is the most diagnosed and second deadliest cancer in men1. Despite the enormous mortality from this disease, early detection of prostate cancer remains controversial. Screening all men via prostate-specific antigen (PSA) testing, regardless of underlying risk, has been shown to reduce prostate cancer deaths by 27% but also results in frequent overdiagnosis of indolent prostate cancer that may never have become symptomatic2–4. These overdiagnoses often lead to unnecessary treatment, with attendant side effects and societal costs. A better strategy is to target PSA screening to those men at higher risk of developing metastatic or fatal prostate cancer.
Because prostate cancer is one of the most heritable cancers5, genetic risk stratification is a promising approach for identifying individuals at higher risk of developing metastatic or fatal prostate cancer1,3,6. Measures of genetic risk have proven highly effective for predicting lifetime risk of being diagnosed with prostate cancer, outperforming family history or other clinical risk factors7–10. Rather than only predicting lifetime risk, however, an ideal genetic test would focus on clinically significant prostate cancer and estimate age-specific risk. Prostate cancer is highly age dependent, with very low incidence before 50 years of age and an exponential increase as men get older11,12. Absolute incidence of aggressive prostate cancer also increases with age11,12. Meanwhile, some men with high genetic risk develop aggressive prostate cancer at a younger age and are at particular risk of dying from this disease. Age-specific genetic risk could inform individualized decisions about PSA testing, in the context of a given man’s overall health and competing causes of mortality.
A major limitation of early studies of polygenic risk was an exclusive focus on men of European ancestry13,14. Such systematic bias may exacerbate existing health disparities in prostate cancer incidence and health outcomes15,16. This is particularly worrisome for men of African ancestry, who have a higher overall incidence of metastatic and fatal prostate cancer than men of European or Asian ancestry17,18.
Our group has developed a novel risk prediction tool called a polygenic hazard score (PHS) that identifies men who are likely to develop clinically significant prostate cancers at younger ages. This score, which can be calculated from a single saliva sample at any point in a man’s life, was strongly associated with age at diagnosis of clinically significant prostate cancer in large datasets10,19,20. The score also improved the accuracy of conventional screening with PSA7,10,12. We subsequently expanded the model to optimize performance in men of all ancestries, particularly men with African ancestry19–21. Here, we seek to validate the ability of the PHS to identify men at risk of metastatic or fatal prostate cancer within the Million Veteran Program (MVP), one of the largest and most racially and ethnically diverse populations studied to date22.
Methods
Participants
We retrospectively obtained data from the MVP, which comprises individuals aged 19 to over 100 years who were recruited from 63 Veterans Affairs Medical Centers across the United States (US). Recruitment for the MVP started in 2011, and all veterans were eligible for participation. Consent to participate and permission to re-contact were provided after counseling by research staff and mailing of informational materials. Study participation included consenting to access the participant’s electronic health records for research purposes. The MVP received ethical and study protocol approval from the VA Central Institutional Review Board in accordance with the principles outlined in the Declaration of Helsinki.
Only men were included in this prostate cancer study, comprising 582,515 individuals of European (73.3%), African (17.2%), Hispanic (8.2%), and Asian ancestry (1.3%) (Table 1). There were no inclusion or exclusion criteria for age. Median age at last follow-up was 69 years (interquartile range 59-74 years). Men not meeting the endpoint for each analysis were censored at age at last follow-up. Clinical information used for analyses was retrieved as described below in the Clinical Data Extraction section.
Genotype Data
All study participants provided blood samples for DNA extraction and genotyping. Researchers are provided data that is de-identified except for dates. Blood samples were collected by phlebotomists and banked at the VA Central Biorepository in Boston, MA, where DNA was extracted and shipped to two external centers for genotyping. DNA extracted from buffy coat was genotyped using a custom Affymetrix Axiom biobank array.
The MVP 1.0 genotyping array contains a total of 723,305 single nucleotide polymorphisms (SNPs), enriched for low frequency variants in African and Hispanic populations, and variants associated with diseases common to the VA population22.
Harmonized Ancestry and Race/Ethnicity (HARE)
The MVP has previously developed an approach called Harmonized Ancestry and Race/Ethnicity (HARE) to identify ancestry groups using a machine learning algorithm23. HARE utilizes a support vector machine to estimate probabilities of an individual belonging to one of four ancestry groups using both self-identified race/ethnicity and genetic ancestry23. Information on race and ethnicity was obtained based on self-report through centralized VA data collection methods using standardized survey forms or using information from the VA Corporate Data Warehouse or Observational Medical Outcomes Partnership data. All but 9,989 (1.52%) MVP participants were assigned to a HARE ancestry group. The support vector machine was trained with the top 30 principal components of population stratification analysis and self-identified ancestry. The regularization constant C and the inverse variance of the kernel were optimized through 2-dimensional grid searching and 5-fold cross-validation. Individuals were categorized as predominantly European, Hispanic, African, or Asian based on output probabilities.
Clinical Data Extraction
Each participant’s electronic health record is integrated into the MVP biorepository. These records include International Classification of Diseases (ICD) diagnosis codes (ICD-9-CM and ICD-10-CM), procedure codes (ICD, Current Procedural Terminology, and Healthcare Common Procedure Coding System (HCPCS)), laboratory values, medications, and clinical notes documenting VA care (inpatient and outpatient) and non-VA care paid for by the VA.
Prostate cancer diagnosis, age at diagnosis, and date of last follow-up were retrieved from the VA Corporate Data Warehouse based on ICD codes and VA Central Cancer Registry data. Age at diagnosis of metastasis was determined via a validated natural language processing tool and a search of each participant’s medical records in the Veterans Affairs system, as described previously24. This tool was developed using data from over 1 million VA patients with prostate cancer; compared to manual chart review, the natural language processing tool had 92% sensitivity and 98% specificity for diagnosis of metastatic prostate cancer. Cause and date of death were collected from the National Death Index. Participants with ICD-10 code “C61” as the underlying cause of death were considered to have died from prostate cancer. Age at death was determined from the difference between year of death and year of birth.
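The two rules at the end of this paragraph (age at death as a difference of years, and prostate cancer death defined by underlying-cause code C61) can be illustrated with a minimal sketch. This is not the MVP extraction pipeline; the record layout and function names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

PROSTATE_CANCER_ICD10 = "C61"  # underlying-cause code named in the text


@dataclass
class DeathRecord:
    # Hypothetical record layout, for illustration only.
    birth_year: int
    death_year: Optional[int]        # None if no death is recorded
    underlying_cause: Optional[str]  # ICD-10 code, e.g. "C61"


def age_at_death(rec: DeathRecord) -> Optional[int]:
    """Year of death minus year of birth, as described in the text."""
    if rec.death_year is None:
        return None
    return rec.death_year - rec.birth_year


def died_of_prostate_cancer(rec: DeathRecord) -> bool:
    """True only when the underlying cause of death is ICD-10 C61."""
    if rec.death_year is None or rec.underlying_cause is None:
        return False
    return rec.underlying_cause.strip().upper() == PROSTATE_CANCER_ICD10
```

Because only calendar years are differenced, the computed age can differ from exact age at death by up to one year.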
Polygenic Hazard Score (PHS290)
The most recent version of the PHS, called PHS290, was calculated as the vector product of participants’ genotype dosage (Xi) for 290 SNPs and the corresponding parameter estimates (βi) from Cox proportional hazards regression:
$$\mathrm{PHS290} = \sum_{i=1}^{290} X_i \beta_i$$
The development of this score has been described elsewhere21. Briefly, 299 previously identified SNPs associated with prostate cancer risk (in single-ancestry or all-ancestry analyses) were simultaneously evaluated using a machine-learning least absolute shrinkage and selection operator (LASSO) approach to generate an optimal combined model for association with age at prostate cancer diagnosis.
We calculated PHS290 for each MVP participant. Distributions were visualized using histograms for each ancestry group. Differences in mean PHS290 between ancestry groups were assessed via ANOVA. In all statistical analyses, significance was set at a two-tailed alpha of 0.01. As in prior studies, p-values less than 10⁻¹⁶ were truncated at this value, as comparison of minuscule values is not likely to be meaningful7,10,12,20.
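The score defined in this section is a weighted sum of genotype dosages. The sketch below shows that computation for a single participant, under the assumption that dosages and weights are supplied as aligned lists; the function name and the three toy SNP weights are made up for illustration and are not the PHS290 parameter estimates.

```python
from typing import Sequence


def polygenic_hazard_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Return sum_i X_i * beta_i for one participant."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(x * b for x, b in zip(dosages, betas))


# Toy example with 3 invented SNPs (PHS290 uses 290; these weights are not real).
toy_betas = [0.10, -0.05, 0.20]
toy_dosages = [2.0, 1.0, 0.0]  # allele dosages in the range 0 to 2
toy_score = polygenic_hazard_score(toy_dosages, toy_betas)
```

In practice the same dot product would be applied across all participants at once, for example as a matrix-vector product.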
Cox Proportional Hazards Analysis
We evaluated association of PHS290 with two important clinical endpoints extracted from clinical data: age at death from prostate cancer (i.e., lifetime prostate-cancer-specific mortality) and age at diagnosis of metastases from prostate cancer (i.e., lifetime distant-metastasis-free survival). Secondarily, we also tested for association with age at diagnosis of any prostate cancer. To visualize the association in the full dataset, we generated cause-specific cumulative incidence curves for each endpoint and each of several PHS290 risk groups. Cox proportional hazards models were used to assess these associations in the full dataset and in each ancestral group (European, African, Hispanic, Asian). Individuals not meeting the endpoint of interest were censored at age at last follow-up. Sample-weight corrections were applied to all Cox models to correct for potential bias due to a higher number of prostate cancer cases compared to a general population and permit direct comparisons to other studies7,9,19,25. Although underlying population incidence may vary (e.g., across ancestry groups), a constant correction factor was used for all analyses (in this case, based on previously reported population data from Sweden, for consistency with prior work), as effect size estimates have been shown to be robust to wide variation in population incidence19.
Effect sizes were estimated using hazard ratios (HRs) between risk strata, as described previously7,9,10,12,19,20,26. Percentiles of genetic risk were calculated using percentile thresholds defined in a prior study of men of European ancestry less than 70 years old and with no diagnosis of prostate cancer21. HRs for each ancestry group were calculated to make the following comparisons: HR80/20, men in the highest 20% vs. lowest 20%; HR95/50, men in the highest 5% of genetic risk vs. those with average risk (30–70th percentile); and HR20/50, men in the lowest 20% vs. those with average risk. These risk groups were chosen to mirror prior work and permit direct comparison of HRs7,19,20; the same risk groups were chosen as the strata for incidence curves for each endpoint. Reference thresholds for hazard ratio estimation were: 9.004659 (20th percentile), 9.123500 (30th percentile), 9.519703 (70th percentile), 9.639068 (80th percentile), 9.946332 (95th percentile)21.
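As a rough illustration of how the reference thresholds above define the risk strata, the sketch below assigns a PHS290 value to its strata. Treating boundary values as inclusive is an assumption, and the function and label names are hypothetical. The second function shows one common way to convert a Cox coefficient for a continuous score into a between-group hazard ratio; whether it matches the exact procedure used in the cited prior work is also an assumption.

```python
import math

# Reference thresholds quoted in the text (PHS290 units).
P20, P30, P70, P80, P95 = 9.004659, 9.123500, 9.519703, 9.639068, 9.946332


def risk_groups(phs: float) -> set:
    """Return the named risk strata a PHS290 value falls into.

    Strata can overlap (the top 5% lies inside the top 20%), and values
    between the 20th-30th or 70th-80th percentiles belong to no stratum.
    Inclusive boundaries are an assumption.
    """
    groups = set()
    if phs <= P20:
        groups.add("lowest_20")
    if P30 <= phs <= P70:
        groups.add("average_30_70")
    if phs >= P80:
        groups.add("highest_20")
    if phs >= P95:
        groups.add("highest_5")
    return groups


def hazard_ratio(beta: float, mean_phs_high: float, mean_phs_low: float) -> float:
    """exp(beta * difference in mean score) between two strata (assumed form)."""
    return math.exp(beta * (mean_phs_high - mean_phs_low))
```

For example, a value near the European-group mean of 9.37 falls only in the average-risk stratum, while a value of 10.0 falls in both the highest 20% and highest 5%.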
Ancestry, Family History, and PHS290
To assess the added value of PHS290 beyond commonly used clinical risk factors, we tested a multivariable Cox proportional hazards model7,9,19 with ancestry group, family history, and PHS290. This combined model was limited to the 374,455 participants who provided family history information in baseline survey data. Family history was recorded as either the presence or absence of (one or more) first-degree relatives with prostate cancer. Cox proportional hazards models tested associations with fatal, metastatic, or any prostate cancer. For PHS290, the effect size was illustrated via the hazard ratio for the highest 20% vs. lowest 20% of genetic risk. Hazard ratios for ancestry groups were estimated using European as the reference.
A univariable Cox proportional hazards model was applied to test for association of ancestry group with prostate cancer risk. Similarly, a univariable model tested for association of family history alone. The anova function from the R ‘survival’ package (version 3.2-13; Therneau 2021) was used to compare the nested Cox models (multivariable vs. univariable), based on the log partial likelihood of the model fits. Significance was set at a two-tailed alpha of 0.01 for the test of whether the multivariable model performed better than either univariable model alone.
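A comparison of nested models based on log partial likelihoods is typically a likelihood-ratio test. The sketch below shows that arithmetic in Python, assuming the test statistic is twice the difference in log partial likelihood and is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. It is not the R code used in the study; the function names are hypothetical, and the floor of 10⁻¹⁶ mirrors the p-value truncation described in the Methods.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points: Q(1/2, z) = erfc(sqrt(z)), Q(1, z) = exp(-z).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float,
                          extra_params: int, p_floor: float = 1e-16):
    """Return (statistic, p-value) for nested models, with p truncated at p_floor."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2_sf(stat, extra_params)
    return stat, max(p_value, p_floor)
```

With a log partial likelihood improvement of 100 from one added parameter, for instance, the statistic is 200 and the reported p-value is truncated to the floor.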
Results
PHS290 Score
The distribution of PHS290 in the European ancestry group was similar to that reported previously for men of European ancestry (mean=9.37, SD=0.37)21. Mean PHS290 did vary by ancestry group with statistically significant differences between all groups (ANOVA p < 10⁻¹⁶; all pair-wise t-tests also p < 10⁻¹⁶). The distribution for the Hispanic ancestry group overlapped closely with that of the European group (mean=9.35, SD=0.37), while PHS290 tended to be lower among men of Asian ancestry (mean=9.17, SD=0.36) and higher among men of African ancestry (mean=9.56, SD=0.34) (Figure 1, Supplementary Figure 1).
Association of PHS290 with Fatal Prostate Cancer
PHS290 was associated with age at death from prostate cancer in the full dataset and in each of the four ancestry groups (Table 2). Comparing 80th and 20th percentiles of genetic risk in the full dataset, men with higher PHS290 had an HR80/20 of 4.41 [95% CI: 3.9-5.02]. Cause-specific cumulative incidence curves for various PHS290 percentile groups demonstrated risk stratification (Figure 2). Hazard ratios quantified significant risk stratification using PHS290 in the full MVP dataset and in each ancestry group (though confidence intervals were large in the Asian ancestry group). For European, African, Hispanic, and Asian men, HRs80/20 were 4.26 [95% CI: 3.66-4.9], 2.4 [1.77-3.23], 4.72 [2.68-8.87] and 10.46 [2.01-101.0], respectively.
Association of PHS290 with Metastatic Prostate Cancer
PHS290 was associated with age at diagnosis of metastases from prostate cancer in the full dataset and in each of the four ancestry groups (Supplementary Table 1). Comparing 80th and 20th percentiles of genetic risk, men with higher PHS290 had an HR80/20 of 4.94 [95% CI: 4.58-5.28]. For European, African, Hispanic, and Asian men, HRs80/20 were 4.64 [4.28-5.03], 3.02 [2.6-3.57], 3.95 [2.87-5.37] and 6.85 [2.84-17.12], respectively.
Association of PHS290 with Prostate Cancer
PHS290 was associated with age at prostate cancer diagnosis in all 4 ancestry groups (Supplementary Table 2). Comparing 80th and 20th percentiles of genetic risk, men with higher PHS290 had an HR80/20 of 6.29 [95% CI: 6.14-6.46]. For European, African, Hispanic, and Asian men, HRs80/20 were 6.19 [6.01-6.38], 3.83 [3.61-4.08], 4.75 [4.22-5.32] and 5.52 [3.98-7.56], respectively.
Ancestry, Family History, and PHS290
Ancestry group, alone, was associated with differential risk of fatal prostate cancer (Supplementary Table 3). These associations were largely driven by an increased risk from African ancestry. Similar patterns were seen for age at diagnosis of prostate cancer and for age at diagnosis of prostate cancer metastasis (Supplementary Table 3). Compared to the European group, men in the African ancestry group had a HR of 2.65 [2.37-2.96] for dying of prostate cancer.
Family history, alone, was also associated with fatal prostate cancer, as well as with diagnosis of prostate cancer and with age at diagnosis of prostate cancer metastasis (Supplementary Table 4). Compared to men with no family history of prostate cancer, men with one or more first-degree relatives who had prostate cancer had a HR of 1.84 [1.54-2.17] for dying of prostate cancer.
PHS290 added significant value beyond ancestry group or family history in a multivariable model that included all three variables and tested for association with age at prostate cancer death (Table 3). The multivariable model improved performance over the common risk factors alone (ANOVA p < 10⁻¹⁶). Similarly, the combination proved optimal when evaluating age at prostate cancer diagnosis or age at diagnosis of prostate cancer metastasis (ANOVA p < 10⁻¹⁶). Independent of ancestry and family history, a high PHS290 (top 20%) approximately quadrupled a man’s risk of death from prostate cancer, compared to a low PHS290 (bottom 20%) (Table 3).
To ensure that genetic ancestry was accounted for, we repeated the multivariable analysis with the top 10 principal components from population stratification analysis (Supplementary Table 5). PHS290 was still significantly associated with all three prostate cancer clinical endpoints.
Discussion
PHS290 was associated with lifetime prostate-cancer-specific mortality in this large and diverse dataset. Even when accounting for family history and ancestry group, PHS290 remained a strong independent predictor of dying from prostate cancer. The genomic score was also associated with age at diagnosis of metastasis from prostate cancer and with age at diagnosis of any prostate cancer, consistent with previous reports that common genetic markers for overall prostate cancer risk overlap with those for aggressive prostate cancer risk27,28. Metastatic prostate cancer has a poor prognosis and is a major driver of pain, disability, and aggressive medical therapy29. To our knowledge, this study is the first to show the association of a genomic score with lifetime risk of metastatic prostate cancer. This study also represents the largest and most ancestry-diverse independent validation of polygenic association with lifetime risk of fatal prostate cancer.
Men of African ancestry in the US are substantially more likely to develop metastatic disease and to die from prostate cancer18. The causes of this disparity are likely a combination of genetic, environmental, and social factors, including systemic racism30–34. National guidelines recommend consideration of prostate screening in men of African ancestry at a younger age and that screening occur at more frequent intervals35. The results of the present study demonstrate that men of African ancestry have highly variable levels of lifetime risk and should not be treated as a homogeneous group. This finding is consistent with prior results and with the known admixture and genetic diversity of the African American population9,26. Moreover, PHS290 can identify those more likely to develop lethal prostate cancer and may facilitate personalized screening recommendations.
Intriguingly, typical PHS290 scores differed between ancestry groups, with the mean PHS290 slightly higher in the African ancestry group and slightly lower in the Asian ancestry group than in the European or Hispanic groups. These shifts in PHS290 distribution are consistent with reported differences in prostate cancer incidence across racial groups36–40. Higher overall PHS290 scores in the African ancestry group may point to true differences in prostate cancer risk but could also be inflated by minor allele frequency (MAF) differences between ancestry groups. Incorporating approaches for local ancestry and admixture can also boost genetic model performance and should be explored further to improve the predictive accuracy of polygenic scores41.
Along with race/ancestry, family history is another important clinical consideration in prostate cancer screening decisions35,42–45. Prior studies have found polygenic scores to be the most important risk factor for prostate cancer, with family history sometimes offering modest improvement in multivariable models, possibly by capturing yet unknown genetic factors and/or shared familial environmental factors7,9,19,46. Among the subset of MVP participants who provided family history information, family history of prostate cancer was independently associated with prostate cancer risk in a multivariable model that included PHS290 and ancestry group. The relationships among environmental exposures, family history, and prostate cancer risk are worth further investigation9, particularly in groups like veterans who may have been exposed to rare carcinogens47.
The present study builds on prior work that reported the performance of polygenic scores in non-Europeans19,21,26,27,48 and is consistent with those prior studies in showing a strong association of polygenic scores with prostate cancer risk, including death from prostate cancer9,19,46. Polygenic hazard scores designed to incorporate the strong age-dependence of prostate cancer have also been shown to increase the accuracy of conventional prostate cancer screening7,10,12. Population-level analyses of benefit, harm, and cost-effectiveness support incorporation of genomic risk into screening3,6. The present study adds to the literature an independent validation in a dataset of over half a million men with diverse ancestry. Current clinical guidelines try to achieve targeted, or risk-stratified, screening by recommending each man discuss his individual risk factors, emphasizing racial/ethnic background35,43–45. It is particularly important, therefore, that this study was able to combine ancestry and genetic risk to estimate the relative impact of each and to demonstrate that a polygenic score adds considerable information beyond ancestry alone, for a man’s individual risk of metastasis or death from prostate cancer.
While PHS290 performed well in the present study to stratify men by genetic prostate cancer risk, the effect sizes estimated here are lower than those reported in previous studies. For example, HR80/20 for fatal prostate cancer using PHS290 was 7.73 [95% CI: 6.45-9.27] in a population-based Swedish cohort21, compared to 4.41 [3.9-5.02] in all participants and 4.26 [3.66-4.9] in the European ancestry group in the present study. A similar pattern of smaller effect size was seen when comparing the strength of association with age at diagnosis of prostate cancer within MVP ancestry groups to the hazard ratios previously reported for European, Asian, and African genetic ancestry groups21. Most likely, the discrepancy arises from differences in the populations studied; for example, the MVP dataset comes exclusively from a population of US veterans, with many receiving care in a single US-based healthcare system, whereas the prior study used data from multiple countries and widely varying recruitment strategies. Patterns of screening, detection, and treatment of prostate cancer in the present dataset could be different from those in the clinical trial and case-control datasets used in previous work. Some of the difference in performance could also be explained by the fact that the testing datasets in the prior report for PHS290 had been included in the discovery of a majority of the 290 SNPs in the model; on the other hand, the testing datasets represented a very small proportion of the discovery datasets, and the model weights were estimated in an independent training dataset.
The present work shows that adding PHS290 to guideline-recommended risk factors improves risk stratification for meaningful clinical endpoints of death or metastases from prostate cancer. Men at highest risk of metastatic or fatal prostate cancer are potentially those most likely to benefit from screening. Prior studies have further suggested genetic scores could also add value even after results from screening or diagnostic tests are already available, but this needs further investigation49,50. For example, one early detection strategy with strong evidence is early baseline PSA (e.g., at age 45-49)51,52. This strategy has not yet been widely adopted in the U.S.53, but future studies should evaluate whether PHS290 adds value in men where early baseline PSA is known and whether PHS290, if known prior to PSA testing, should inform the decision of whether to obtain an early baseline PSA test. Another compelling avenue for future studies is whether high genetic risk can be mitigated by lifestyle or other preventive intervention54,55.
Limitations of this study include heterogeneity of phenotyping and smaller sample sizes for the Asian ancestry group. Heterogeneity of prostate cancer screening and diagnostic pathways by clinicians across VA and other hospitals in the US could potentially introduce noise, although this heterogeneity likely leads to underestimation of associations with prostate cancer. Large confidence intervals in the Asian ancestry group may be due to relatively smaller sample sizes, but they also suggest increased heterogeneity compared to prior datasets (including Asian ancestry) of similar sample size19,21,56. Finally, we acknowledge that while HARE ancestry groups may be a reasonable attempt to harmonize genetic ancestry, race, and ethnicity, these groups cannot account for, much less disentangle, the complex web of biological and social factors associated with these categories. Further work will attempt to incorporate agnostic genetic ancestry groups and address impacts of admixture and local/regional genetic ancestry on risk stratification with PHS19.
We show that PHS290 stratified US men for lifetime risk of metastatic or fatal prostate cancer. Critically, this genetic risk stratification was successful within each of four ancestry groups in this diverse dataset. PHS290 was higher, on average, among men with African ancestry, who were also at higher risk from prostate cancer.
The combination of ancestry, family history, and PHS290 performed better than any variable, alone, in identifying men at highest risk of prostate cancer metastasis and death. Predicting genetic risk of lethal prostate cancer with PHS290 might inform individualized decisions about screening and early cancer detection.
Data Availability
Requests regarding data access may be directed to MVPLOI{at}va.gov
Competing Interests
TMS reports honoraria from Varian Medical Systems and WebMD; he has an equity interest in CorTechs Labs, Inc. and serves on its Scientific Advisory Board; he has received in-kind research support from GE Healthcare via a research agreement with the University of California San Diego. These companies might potentially benefit from the research results. The terms of this arrangement have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies.
Take-Home Message
PHS290 stratified US veterans of diverse ancestry for lifetime risk of fatal or metastatic prostate cancer, even when accounting for family history and ancestry. Predicting genetic risk of lethal prostate cancer might inform individualized decisions about prostate cancer screening.
Acknowledgements
This research used data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration. This research was supported by the Million Veteran Program MVP022 award # I01 CX001727 (PI: Richard L. Hauger MD). This publication does not represent the views of the Department of Veterans Affairs or the United States Government. Dr. Hauger was additionally funded by the VISN-22 VA Center of Excellence for Stress and Mental Health (CESAMH) and National Institute on Aging R01 grant AG050595 (The VETSA Longitudinal Twin Study of Cognition and Aging VETSA 4). Meghana S. Pagadala was supported by the National Institutes of Health (#1F30CA247168, #T32CA067754). Tyler Seibert and Roshan Karunamuni were supported by the National Institutes of Health (NIH/NIBIB #K08EB026503), the Prostate Cancer Foundation, and the University of California (#C21CR2060).
Footnotes
Correction of a typo in number of participants. Addition of supplementary analysis re: top 10 principal components. Expansion of Discussion section.