Abstract
Background Prostate cancer causes substantial morbidity and mortality worldwide. Recently, a polygenic hazard score (PHS1)—the weighted sum of 54 single-nucleotide polymorphism (SNP) genotypes—was developed and validated to predict age of aggressive prostate cancer onset in Caucasians. We evaluated the performance of the PHS for prediction of aggressive prostate cancer and of death from prostate cancer across diverse ethnic populations.
Methods 80,491 men of various self-reported race/ethnicities were included (30,575 controls, 49,916 cases). Previously determined genetic ancestries for these men were: 71,856 European, 6,253 African, and 2,382 Asian. Where applicable, age of prostate cancer diagnosis, age at last follow-up, TNM stage, PSA, Gleason score, and cause of death were also determined. Median age at last follow-up was 70 years (IQR 63-76). 3,983 men died from prostate cancer, 5,806 died from non-prostate cancer causes, and 70,702 were still alive at the end of follow-up. Patient samples were previously genotyped on a cancer-specific array; PHS1 was adapted for compatibility with this array (PHS2) and tested in the multi-ethnic dataset via Cox proportional hazards models for age of aggressive prostate cancer onset and for age at prostate-cancer-specific death.
Results PHS2 had 46 SNPs: 24 directly genotyped and 22 acceptable proxies (r²≥0.94). PHS2 was predictive of age of aggressive prostate cancer onset in the independent, multi-ethnic dataset (z=48, p<10⁻¹⁶) and in each genetic ancestry: European (z=46, p<10⁻¹⁶), Asian (z=44, p<10⁻¹⁶), and African (z=24, p<10⁻¹⁶). PHS2 was also predictive of age of prostate cancer death in the multi-ethnic dataset (z=16, p<10⁻¹⁶). Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had a hazard ratio of 5.9 for aggressive prostate cancer and 5.7 for prostate cancer death. Within each genetic ancestry group, analogous hazard ratios for aggressive prostate cancer were 5.6, 5.2, and 2.4 for men of European, Asian, and African ancestry, respectively.
Conclusions PHS2 is highly predictive of both age of onset of aggressive prostate cancer and age of prostate-cancer-specific death in a multi-ethnic dataset. Genetic risk stratification via PHS could guide individualized screening and treatment strategies for fatal and potentially fatal prostate cancer for men of European, Asian, and African genetic ancestry.
Introduction
Prostate cancer is a prominent international public health challenge. It is the second most common cancer diagnosed in men worldwide, causing substantial morbidity and mortality1. Screening for prostate cancer may reduce morbidity and mortality2–5, but to avoid the harms of overdiagnosis and overtreatment of indolent disease6–9, it should be targeted to men at risk of developing aggressive or fatal forms of the disease. Predicting age of onset of prostate cancer is important for clinical decisions regarding if and when to initiate screening for an individual man10,11. Survival is another key endpoint in men diagnosed with prostate cancer and is the American Joint Committee on Cancer’s recommended endpoint for predictive models12.
Genetic risk stratification is promising for identifying patients with greater predisposition for developing cancer13–16, including aggressive prostate cancer17. While rare genetic variants predict increased cancer risk, their low prevalence limits utility in screening decisions18. Polygenic models use common variants—identified in genome-wide association studies—whose combined effects can predict overall risk of disease development18,19. Recently, a polygenic hazard score (PHS) was developed as a weighted sum of 54 single-nucleotide polymorphisms (SNPs) that predicts a man’s genetic predisposition for the development of aggressive prostate cancer13. Validation testing was done using data from the ProtecT trial2 and demonstrated the PHS as an independent predictor of age of onset of aggressive prostate cancer13. However, the development and validation datasets were limited to men of European ancestry alone. While genetic risk models may be important clinical tools to predict patient diagnoses and outcomes, using them may actually worsen health disparities20–24 because most models are constructed using Eurocentric data and may underrepresent genetic variants important in persons of non-European ancestry20–24. Indeed, this is of particular concern in prostate cancer, as race/ethnicity is an important risk factor for development of prostate cancer; diagnostic, treatment, and outcomes disparities continue to exist between men of different races/ethnicities25,26.
The aim of the present work is to evaluate the performance of the PHS in a multi-ethnic dataset that includes individuals of European, African, and Asian genetic ancestry. A subset of this multi-ethnic dataset also includes long-term follow-up and survival information, affording an opportunity to evaluate PHS as a genetic predictor of fatal prostate cancer.
Methods
Participants
We obtained data from the OncoArray project27; all data had undergone quality control steps described previously19. This dataset includes 91,480 men with genotype and phenotype data from 64 studies (Supplemental Methods). Individuals whose data were used in the prior development or validation of the original PHS model were excluded13, leaving 80,491 in the independent dataset used here. Table 1 describes available data in detail. Individuals not meeting the endpoint for each analysis were censored at age at last follow-up.
Polygenic Hazard Score (PHS)
The original PHS (PHS1) was developed to predict age of aggressive prostate cancer onset (age of onset defined here as age at clinical onset, i.e. diagnosis). It used a continuous survival analysis of data from men of European ancestry; details were published elsewhere13. Aggressive prostate cancer was defined as intermediate-risk disease or above. PHS is calculated as the vector product of a patient’s genotype ($X_i$) for $n$ selected SNPs and the corresponding parameter estimates ($\beta_i$) from a Cox proportional hazards regression:
$$\mathrm{PHS} = \sum_{i=1}^{n} X_i \beta_i$$
The 54 SNPs in the model were selected using data from the PRACTICAL consortium and genotyped with a custom array (iCOGS, Illumina, San Diego, CA)13.
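As a minimal sketch of the weighted sum described above, the following Python computes a PHS-style score from genotypes and per-SNP weights. The SNP identifiers and weights are invented for illustration and are not the published PHS1 coefficients; coding genotypes as 0, 1, or 2 allele counts is also an assumption.

```python
def polygenic_hazard_score(genotypes, weights):
    """Return the sum over SNPs of genotype (allele count) times its weight."""
    missing = set(weights) - set(genotypes)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(genotypes[snp] * beta for snp, beta in weights.items())


# Hypothetical per-allele weights (Cox parameter estimates); NOT published values.
example_weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}
# Hypothetical allele counts (0, 1, or 2) for one man.
example_genotypes = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_hazard_score(example_genotypes, example_weights)
```

Raising an error on a missing genotype, rather than silently treating it as zero, is a design choice of this sketch; it keeps a partially genotyped sample from producing a score that looks valid.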
Genetic Ancestry Determination
Self-reported race/ethnicities in this dataset included European, East Asian, African American, Hawaiian, Hispanic American, South Asian, Black African, Black Caribbean, and Other (Supplemental Methods). Genetic ancestry (European, African, or Asian) for all individuals in the dataset was determined previously27,28 and used for the present analyses because it is objective and may be more informative than self-reported race/ethnicity29. Briefly, genetic data from 2,318 ancestry informative markers were mapped into a two-dimensional space representing the first two principal components. The distance from the individual’s mapping to the three reference clusters, consisting of Europeans, Africans, and Asians, was then used to estimate the genetic ancestry of the individual (Table 1).
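One simple way to turn distances to reference clusters into an ancestry label is nearest-centroid assignment, sketched below. This is an illustration only: the centroid coordinates are hypothetical, and the previously published procedure may use the distances differently (for example, to estimate ancestry proportions before assigning a label).

```python
import math

# Hypothetical cluster centroids in (PC1, PC2) space; real values would be
# derived from reference samples of known ancestry.
REFERENCE_CENTROIDS = {
    "European": (0.0, 0.0),
    "African": (10.0, 0.0),
    "Asian": (0.0, 10.0),
}


def assign_ancestry(pc1, pc2, centroids=REFERENCE_CENTROIDS):
    """Return the label of the reference cluster closest to (pc1, pc2)."""
    point = (pc1, pc2)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))
```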
Adapting the PHS to OncoArray
Genotyping was performed using a commercially available, cancer-specific array (OncoArray, Illumina, San Diego, CA)19. Twenty-four of the 54 SNPs in PHS1 were directly genotyped on OncoArray; to adapt the PHS to OncoArray, we identified proxy SNPs for those not directly genotyped and re-calculated the SNP weights in the same dataset used for the original PHS1 development13 (Supplemental Methods).
The performance of this new, adapted PHS (PHS2) was compared to that of PHS1 in the ProtecT dataset originally used to validate PHS1 (n=6,411). PHS2 was calculated for all patients in the ProtecT validation set and was tested as the sole predictive variable in a Cox proportional hazards regression model (R version 3.5.1, “survival” package30) for age of aggressive prostate cancer diagnosis.
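The analysis itself was run with the R "survival" package. Purely to illustrate what a Cox model with a single predictive variable estimates, the sketch below maximizes the partial likelihood by Newton-Raphson in plain Python and returns the coefficient and its z-score. It assumes no tied event ages and applies no sample weights, and the toy data are invented.

```python
import math


def fit_cox_single(ages, events, scores, max_iter=50, tol=1e-10):
    """Fit a one-covariate Cox model; return (beta, z).

    ages: age at event or censoring; events: 1 if the endpoint occurred,
    0 if censored; scores: the single covariate (here, a PHS-like score).
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        gradient, info = 0.0, 0.0
        for age_i, event_i, x_i in zip(ages, events, scores):
            if not event_i:
                continue  # censored men contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for age_j, x_j in zip(ages, scores):
                if age_j >= age_i:  # still at risk at age_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            gradient += x_i - mean         # derivative of the log partial likelihood
            info += s2 / s0 - mean * mean  # observed information
        step = gradient / info
        beta += step
        if abs(step) < tol:
            break
    return beta, beta * math.sqrt(info)  # z = beta / standard error


# Invented toy data: higher scores tend to have earlier events.
toy_ages = [50, 55, 60, 65, 70, 75, 80, 85]
toy_events = [1, 1, 1, 1, 0, 1, 0, 0]
toy_scores = [1.0, 0.5, 1.2, -0.3, 0.2, -1.0, 0.1, -0.8]
toy_beta, toy_z = fit_cox_single(toy_ages, toy_events, toy_scores)
```

The double loop is quadratic in the number of men, which is fine for a toy example but not for a dataset of this size; an established survival library would be the practical choice.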
Performance was assessed by the metrics reported during PHS1 development13: z-score and hazard ratio (HR98/50) for aggressive prostate cancer between men in the highest 2% of genetic risk (≥98th percentile) vs. those with average risk (30th-70th percentile). 95% confidence intervals (CIs) for the HRs were determined by bootstrapping 1,000 random samples from the ProtecT dataset, while maintaining the same number of cases/controls.
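A bootstrap that maintains the number of cases and controls can be sketched as resampling with replacement within each group. The percentile interval, the seed, and the stand-in statistic (a difference in mean score rather than a hazard ratio) are assumptions made for this illustration.

```python
import random


def resample_keeping_counts(cases, controls, rng):
    """Resample with replacement within cases and within controls."""
    return (
        [rng.choice(cases) for _ in cases],
        [rng.choice(controls) for _ in controls],
    )


def bootstrap_ci(cases, controls, statistic, n_boot=1000, seed=0):
    """Return an approximate 95% percentile interval for the statistic."""
    rng = random.Random(seed)
    draws = sorted(
        statistic(*resample_keeping_counts(cases, controls, rng))
        for _ in range(n_boot)
    )
    return draws[int(0.025 * (n_boot - 1))], draws[int(0.975 * (n_boot - 1))]


def mean_difference(cases, controls):
    """Stand-in statistic: mean score in cases minus mean score in controls."""
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Invented scores for illustration.
case_scores = [1.2, 0.8, 1.5, 0.9, 1.1]
control_scores = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2]
ci_low, ci_high = bootstrap_ci(case_scores, control_scores, mean_difference)
```

In practice the statistic passed in would refit the Cox model on each resample and return the hazard ratio of interest.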
Prediction of Aggressive Prostate Cancer
We tested the performance of PHS2 for accurate prediction of age of onset of aggressive prostate cancer in the multi-ethnic dataset. For prediction of aggressive prostate cancer, we included prostate cancer cases that had known T stage, Gleason score, and PSA at diagnosis (n=60,617 cases, Table 1).
Aggressive prostate cancer cases were those that met any of the following previously defined criteria for aggressive disease6,13: Gleason score ≥7, PSA ≥10 ng/mL, T3-T4 stage, nodal metastases, or distant metastases. PHS2 was calculated for all patients in the multi-ethnic dataset and used as the sole predictive variable in Cox proportional hazards regressions for the endpoint of age at aggressive prostate cancer onset. Because Cox proportional hazards results can be biased by a higher proportion of cases in our dataset than in the general population, sample weight corrections were applied to all Cox models using previously described methods13,31 (Supplemental Methods). Significance was set at α=0.01, and p-values reported were truncated at <10⁻¹⁶, if applicable13.
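The "any of the following" definition of aggressive disease translates directly into a boolean check. In this sketch the argument names and the encoding of T stage as an integer from 1 to 4 are assumptions, and missing clinical values are not handled.

```python
def is_aggressive(gleason, psa_ng_ml, t_stage, nodal_mets, distant_mets):
    """Return True if a case meets any criterion for aggressive disease.

    t_stage is the numeric part of the clinical T stage (1-4);
    nodal_mets and distant_mets are booleans.
    """
    return bool(
        gleason >= 7
        or psa_ng_ml >= 10
        or t_stage >= 3
        or nodal_mets
        or distant_mets
    )
```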
These Cox proportional hazards regressions (with PHS2 as the sole predictive variable and age of onset of aggressive prostate cancer as the outcome) were then repeated for subsets of data, broken down by genetic ancestry: European, Asian, and African. Percentiles of genetic risk were calculated as done previously13, using data from the 9,728 men in the original (iCOGS) development set who were less than 70 years old and did not have prostate cancer. Hazard ratios for each genetic ancestry group were calculated to make the following comparisons for men in each genetic ancestry group: HR98/50, men in the highest 2% of genetic risk vs. those with average risk (30th-70th percentile); HR80/50, men in the highest 20% vs. those with average risk; HR20/50, men in the lowest 20% vs. those with average risk; and HR80/20, men in the highest 20% vs. those in the lowest 20%. 95% confidence intervals (CIs) for the HRs were determined by bootstrapping 1,000 random samples from each genetic ancestry group, while maintaining the same number of cases/controls. These hazard ratios and CIs were calculated for age of diagnosis of aggressive prostate cancer separately for each genetic ancestry group. Given that the overall prevalence of prostate cancer in different populations is unclear, we performed a sensitivity analysis of the population case/control numbers, allowing the population prevalence to vary from 25% to 400% of that reported in Sweden (Supplemental Methods).
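The percentile-band comparisons can be sketched as follows. The sketch assumes, and the text here does not confirm, that a hazard ratio between two bands is the exponential of the Cox coefficient times the difference in mean score between the bands, with band thresholds taken from a reference set such as the cancer-free men under 70 mentioned above. All numbers are invented.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    pos = p / 100 * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def threshold(reference, p):
    """Score cut-off at percentile p; 0 and 100 are treated as open-ended."""
    if p <= 0:
        return -math.inf
    if p >= 100:
        return math.inf
    return percentile(reference, p)


def mean_in_band(scores, reference, band):
    """Mean score among men whose score falls inside a percentile band."""
    low, high = threshold(reference, band[0]), threshold(reference, band[1])
    inside = [s for s in scores if low <= s <= high]
    return sum(inside) / len(inside)  # raises ZeroDivisionError if band is empty


def hazard_ratio(beta, scores, reference, upper_band, lower_band):
    """Hazard ratio between two percentile bands, e.g. (80, 100) vs (0, 20)."""
    delta = mean_in_band(scores, reference, upper_band) - mean_in_band(
        scores, reference, lower_band
    )
    return math.exp(beta * delta)


# Invented example: reference scores 0..100, five study scores, beta = 0.01.
reference_scores = list(range(101))
study_scores = [5, 10, 50, 90, 95]
hr_80_20 = hazard_ratio(0.01, study_scores, reference_scores, (80, 100), (0, 20))
```

With these definitions, HR98/50 would use the bands (98, 100) and (30, 70), HR80/50 the bands (80, 100) and (30, 70), and HR20/50 the bands (0, 20) and (30, 70).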
Prediction of Fatal Prostate Cancer
We then evaluated PHS2 as a possible predictor of age of prostate cancer death in the multi-ethnic dataset. For prediction of fatal prostate cancer, all cases were included (regardless of completeness of staging information), and the endpoint was age at death due to prostate cancer; death analysis was not stratified by genetic ancestry due to low numbers of prostate cancer deaths in the non-European datasets. Cause of death was determined by the investigators of each contributing study using cancer registries and/or medical records (Supplemental Methods). There were 3,983 men in the dataset who died from prostate cancer, 5,806 died from non-prostate cancer causes, and 70,702 were still alive at the end of follow-up. The median age at last follow-up was 70 years (IQR 63-76). As before, Cox proportional hazards models and sensitivity analysis were used to assess prediction.
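For the fatal prostate cancer endpoint, each man contributes an age and an event indicator to the survival model. The sketch below assumes that men who died of other causes, like men still alive, are censored at their age at last follow-up; the field names and cause-of-death codes are hypothetical.

```python
def death_endpoint(age_last_follow_up, cause_of_death):
    """Return (age, event) for the prostate-cancer-death endpoint.

    cause_of_death is "prostate_cancer", "other", or None if the man was
    alive at last follow-up. event is 1 only for prostate cancer death;
    every other man is censored (event 0) at age_last_follow_up.
    """
    event = 1 if cause_of_death == "prostate_cancer" else 0
    return age_last_follow_up, event


# Invented records: (age at last follow-up, cause of death).
cohort = [(72, "prostate_cancer"), (80, "other"), (68, None)]
survival_data = [death_endpoint(age, cause) for age, cause in cohort]
```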
Controlling for Family History
Family history was also tested as a predictor of aggressive or fatal prostate cancer. This was taken as presence or absence of a first-degree relative with a prostate cancer diagnosis. We also evaluated the number of affected first-degree relatives (0, 1, 2, 3, or ≥4) as a predictor. There were 46,135 men with data on prostate cancer family history (binary variable), and 17,619 men with data on number of first-degree relatives with disease.
As above, Cox proportional hazards models were used to assess family history or number of relatives as a predictor of each endpoint, among those in the dataset with available family history. To evaluate the relative importance of each, a combined, multivariable model using both family history and PHS was compared to prediction using family history alone, with a log-likelihood test and α=0.01.
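Comparing the combined model to family history alone is a comparison of nested models, commonly done with a likelihood-ratio statistic. The sketch below assumes the combined model adds exactly one parameter (the PHS), so the statistic is referred to a chi-square distribution with one degree of freedom; the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Return (statistic, p) for nested models differing by one parameter.

    For one degree of freedom, the chi-square upper tail probability equals
    erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Hypothetical log partial likelihoods: family history alone vs. with PHS.
lr_stat, lr_p = likelihood_ratio_test(-100.0, -98.0)
significant = lr_p < 0.01  # alpha used in the text
```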
Results
Adaptation of PHS for OncoArray
Of the 30 SNPs from PHS1 not directly genotyped on OncoArray, proxy SNPs were identified for 22 (linkage disequilibrium r²≥0.94). Therefore, PHS2 included a total of 46 SNPs. Prediction of age of onset of aggressive prostate cancer with PHS2 in ProtecT was similar to that previously reported for PHS1 (z=22 for PHS1 and z=21 for PHS2, p<10⁻¹⁶ for each). HR98/50 was 4.7 (95% CI: 3.6-6.1) for PHS2, compared to 4.6 (95% CI: 3.5-6.0) for PHS1, demonstrating that PHS2 had similar performance for prediction of age of onset of aggressive prostate cancer.
Prediction of Aggressive Prostate Cancer
PHS2 was predictive of age of onset of aggressive prostate cancer in all three genetic ancestry groups. Table 2 shows highly significant z-scores as well as corresponding hazard ratios. Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had a hazard ratio of 5.9 for aggressive prostate cancer; within each genetic ancestry group, the corresponding hazard ratios were 5.6, 5.2, and 2.4 for men of European, Asian, and African ancestry, respectively. Sensitivity analyses revealed that large changes in assumed population prevalence had minimal effect on the HRs (up to 1-2% change; Supplemental Results).
Table 2. Hazard ratios (HRs) are shown comparing men in the highest 2% of genetic risk (≥98th percentile of PHS), highest 20% of genetic risk (≥80th percentile), average risk (30th-70th percentile), and lowest 20% of genetic risk (≤20th percentile) across genetic ancestry.
Family history was also predictive of aggressive prostate cancer in the multi-ethnic dataset (z=32, p<10⁻¹⁶). Among those with known family history, the combination of family history and PHS performed better than family history alone (log-likelihood p<10⁻¹⁶). This pattern held true when analyses were repeated on each genetic ancestry and when using the number of affected first-degree relatives. Further details of family history analysis for aggressive prostate cancer onset are reported in the Supplemental Results.
Prediction of Fatal Prostate Cancer
PHS2 was predictive of age of death due to prostate cancer for all men in the multi-ethnic dataset (z=16, p<10⁻¹⁶). Table 3 shows highly significant z-scores and corresponding HRs for fatal prostate cancer. Comparing the 80th and 20th percentiles of genetic risk, men with high PHS had a hazard ratio of 5.7 for prostate cancer death. Sensitivity analyses, shown in the Supplement, again demonstrated that large changes in assumed population prevalence had minimal effect (2-3%) on the calculated HRs.
Table 3. Hazard ratios (HRs) are shown comparing men in the highest 2% of genetic risk (≥98th percentile of PHS), highest 20% of genetic risk (≥80th percentile), average risk (30th-70th percentile), and lowest 20% of genetic risk (≤20th percentile).
Family history was also predictive of fatal prostate cancer in the multi-ethnic dataset (z=16, p<10⁻¹⁶). Among those with known family history, the combination of family history and PHS performed better than family history alone (log-likelihood p<10⁻¹⁶). This pattern held true when analyses were repeated on each genetic ancestry and when using the number of affected first-degree relatives. Further details of family history analysis for fatal prostate cancer are reported in the Supplemental Results.
Discussion
The results presented here confirm the utility of a PHS for prediction of age of onset of aggressive prostate cancer in Europeans and show that this prediction generalizes to a multi-ethnic dataset, including men of European, Asian, and African genetic ancestry. Moreover, we demonstrate that PHS is also highly predictive of death from prostate cancer. Comparing the highest and lowest quintiles of genetic risk, men with high PHS had a hazard ratio of 5.9 for aggressive disease and a hazard ratio of 5.7 for prostate cancer death.
We found that PHS predicts aggressive prostate cancer in men of European, Asian, and African genetic ancestry (and an even wider range of self-reported race/ethnicities; Supplemental Methods). Current guidelines for prostate cancer screening suggest possible initiation at earlier ages for men of African ancestry, given higher incidence rates and worse mortality when compared to men of European ancestry26. Using the PHS to risk-stratify men may help patients and their physicians decide when to initiate prostate cancer screening: perhaps a man with African genetic ancestry in the lowest percentiles of genetic risk by PHS could safely delay or forgo screening to decrease the possible harms associated with overdetection and overtreatment9, while a man in the highest risk percentiles may consider screening at an earlier age. The same reasoning applies to men of all genetic ancestries.
In the current dataset, PHS performance is better in those with European and Asian genetic ancestry than in those with African ancestry. For example, comparing the highest and lowest quintiles of genetic risk, men of European and Asian genetic ancestry with high PHS had comparable hazard ratios for aggressive prostate cancer (5.6 and 5.2, respectively), while the hazard ratio for men of African genetic ancestry was 2.4. This suggests the PHS, in its current form, can differentiate men of higher and lower risk in each ancestral group, but the range of predicted risk levels may be narrower in those of African ancestry. Possible reasons for relatively diminished performance include the increased genetic diversity with less linkage disequilibrium in those of African genetic ancestry32–34. Known health disparities may also play a role25, as the availability and timing of PSA results may depend on healthcare access. Alarmingly, there has historically been poor representation of African populations in clinical/genomic research data20,21. This pattern is reflected in the present study, where most men of African genetic ancestry were missing clinical information from diagnosis used to determine disease aggressiveness. That such clinical information is less available for men of African ancestry in our dataset also leaves open the possibility of systematic differences in the diagnostic workup, and therefore age of diagnosis, in the different ancestry populations. Notwithstanding these caveats, the present PHS is already highly predictive in men of African ancestry for age of onset of aggressive prostate cancer, possibly paving the way for more personalized screening decisions for men of African descent.
The first validation study of PHS used data from ProtecT, a large trial of prostate cancer screening2,13. The screening design of that trial yielded biopsy results from both controls and cases with PSA ≥ 3 ng/mL, making it possible to demonstrate improved accuracy and efficiency of prostate cancer screening with PSA testing. Limitations of the ProtecT analysis, though, include the exclusion of advanced cancer, as well as a lack, as yet, of longer observation to reveal which cancers would be fatal2. The present study includes long-term observation, with both early and advanced disease19. This allowed for evaluation of PHS as a predictor of both aggressive (i.e., potentially fatal) and fatal prostate cancer, and we found PHS to be highly predictive of both outcomes.
Age is critical in clinical decisions of whether men should be offered prostate cancer screening35,36 and in how to treat men diagnosed with the disease35,36. Age may also inform the prognosis for men with prostate cancer36,37. Predicting age of cancer onset or cancer mortality is therefore of clinical interest. We have shown that PHS is highly predictive of age at both aggressive prostate cancer onset and prostate-cancer-specific death. Using survival analysis, rather than a binary outcome (e.g., with logistic regression), allowed for leveraging all of the available data, with censoring for unknown possible future outcomes in men still alive at time of last observation. Furthermore, prostate cancer death is a hard endpoint with less uncertainty than disease onset (which may vary with screening practices and delayed medical attention). Therefore, this genetic hazard score may help identify men with high (or low) genetic predisposition to develop lethal prostate cancer and could assist physicians deciding when to initiate screening.
Current guidelines suggest taking into account a man’s individual cancer risk factors, as well as overall life expectancy and medical comorbidities, when making the decision of whether to screen6. The most prominent among the risk factors used in clinic are family history and race/ethnicity6,38,39. The combination of PHS and family history improved prediction over either alone in the multi-ethnic dataset. This finding is consistent with the prior report that PHS adds considerable predictive power over family history alone13. Thus, PHS may help clinicians considering screening to identify men at highest risk of developing aggressive disease. The clinical relevance of the PHS prediction in prostate cancer is similar to or better than that of prediction tools routinely used in other fields of medicine, such as breast cancer, diabetes, and cardiovascular disease40–44.
Future development and optimization hold promise for improving upon the encouraging prediction already achieved here. The PHS here may not include all SNPs predictive of onset of aggressive prostate cancer; in fact, more SNPs associated with prostate cancer have been reported since the development of the original PHS19. Also, there have been reports of SNPs associated with prostate cancer specifically within non-European populations (including Asian, African, and Latino populations)45–47. We are currently studying how incorporation of additional SNPs, including ethnicity-specific SNPs, might improve predictive performance, either overall or in men of specific genetic ancestry, particularly African.
Limitations to our work, beyond those discussed above, include that our dataset is heterogeneous and comes from multiple, diverse studies with different study designs. This allowed for a large, multi-ethnic dataset that includes clinical and survival data, but comes with uncertainties that were not an issue in the ProtecT dataset used for original validation. However, the heterogeneity would more likely reduce the performance of PHS than systematically inflate the results. Secondly, while the genetic ancestry classifications used here may be more accurate than self-reported race/ethnicity alone29, they do not lend themselves to assessing possible nuances of admixed genetic ancestry within individuals; future development will consider local ancestry. As noted above, availability of clinical information was not uniform across contributing studies, and there was less clinical information available for men of African genetic ancestry. Finally, we do not have information on how prostate cancer was clinically managed after diagnosis for individual men in this study. Several disease-modifying treatments exist and may have influenced post-diagnosis survival to varying degrees. Despite this possible source of variability in survival among men with fatal prostate cancer, PHS was still predictive of age at death, an objective and meaningful endpoint.
Conclusion
In a multi-ethnic dataset, PHS was highly predictive both of aggressive prostate cancer onset and of death from prostate cancer. Polygenic scores might be useful to inform risk-stratified screening strategies seeking to reduce disparities in prostate cancer mortality in European, Asian and African men.
Footnotes
9 http://www.icr.ac.uk/our-research/research-divisions/division-of-genetics-and-epidemiology/oncogenetics/research-projects/ukgpcs/ukgpcs-collaborators
115 http://www.cancerresearchuk.org/about-cancer/find-a-clinical-trial/a-study-find-out-looking-gene-changes-would-be-useful-in-screening-for-prostate-cancer-profile-pilot
* Additional members from the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome consortium (PRACTICAL, http://practical.icr.ac.uk/) are provided in the Supplemental Material.