Non-coding genetic variants underlying higher prostate cancer risk in men of African ancestry

Shan Li; Kaniz Fatema; Sundarraj Nidharshan; Arashdeep Singh; Padma Sheila Rajagopal; Dimple Notani; David Y. Takeda; Sridhar Hannenhalli

doi:10.1101/2024.11.14.24317278

Abstract

Incidence and severity of prostate cancer (PrCa) substantially varies across ancestries. American men of African ancestry (AA) are more likely to be diagnosed with and die from PrCa than the those of European ancestry (EA). Published polygenic risk scores for developing prostate cancer, even those based on multi-ancestry genome-wide association studies, do not address population-specific genetic mechanisms underlying PrCa risk in men of African ancestry. Specifically, the role of non-coding regulatory polymorphisms in driving inter-ancestry variation in PrCa has not been sufficiently explored. Here, by employing a sequence-based deep learning model of prostate regulatory enhancers, we identified ∼2,000 SNPs with higher alternate allele frequency in AA men that potentially affect enhancer function associated with PrCa susceptibility, as supported by our experimental validation. The identified enhancer SNPs (eSNPs) may influence PrCa development through two complementary mechanisms: 1) the alternate allele that increase enhancer activity result in immune suppression and telomere elongation, and 2) the alternate alleles that decrease enhancer activity, lead to de-differentiation and inhibition of apoptosis. Notably, the eSNPs tend to disrupt the binding of known prostate transcription factors including FOX, AR and HOX families. Lastly, the identified eSNPs can be combined into a polygenic risk score that adds value to current GWAS-based risk variants in assessing PrCa risk in independent cohorts.

Introduction

PrCa exhibits considerable heritability and notable disparities in incidence rates among different ancestral populations, with men of African ancestry displaying a far greater risk than those of European (1.75-fold) and Asian (3.18-fold) descent^1,2. Additionally, AA men experience the highest mortality rate from PrCa³. Studies have demonstrated a role for population-specific genetic factors, even in conjunction with neighborhood deprivation and other social determinants of health in PrCa incidence^4–6. Large genome-wide association studies (GWAS) have identified PrCa risk variants, which are aggregated into a polygenic risk score (PRS)^7–10. In general, GWAS-derived risk variants do not generalize well to independent discovery cohorts^11–13. Specifically, PrCa PRSs, including those based on multi-ancestry cohorts, show variable accuracies across ancestries¹⁴. More importantly, variants discovered via GWAS do not directly point to underlying mechanisms.

Recent studies have shown that a rare, AA-specific variant of the non-coding SNP rs72725854 (A>G/T) at the 8q24 locus is associated with a ∼2-fold increased risk of PrCa^15,16. Follow-up work showed that the alternate allele ‘T’ introduces a binding site for the transcription factor (TF) SPDEF, which substantially increases the activity of the enhancer harboring the SNP and, in response to androgen stimulation, increases the expression of the enhancer’s target genes PCAT1, PVT1 and c-Myc in prostate tumors; enhancer with the reference ‘A’ allele remain non-responsive to androgen¹⁷. This finding underscores the potential roles of germline non-coding regulatory variants in mediating cancer risk, motivating this study.

Here, we aimed to comprehensively assess the role of genome-wide non-coding variants associated with higher PrCa risk in AA men, specifically via affecting regulatory enhancer function in prostate tissue. We propose a method that integrates a sequence-based deep learning enhancer model with a sliding window strategy, referred to as ProSNP-DL. The deep learning model of enhancer activity has single-nucleotide resolution¹⁸, i.e., the model can predict and quantify the potential change in the enhancer activity due to a single nucleotide change in the enhancer sequence. Applying ProSNP-DL genome-wide, we identified 2,407 non-coding SNPs (termed eSNPs) with greater frequency of the alternate allele in AA individuals, that exhibited a substantial difference in their predicted enhancer activities between the two alleles. Of the 2,407 eSNPs, for 1,296 (53.8%) eSNPs, the alternate allele was predicted to have a higher activity (we refer to these as “gained” eSNPs), and for the remaining 1,111 eSNPs the alternate allele was predicted to have a lower enhancer activity (we refer to these as “lost” eSNPs). We noted complementary functions affected by the gained and the lost eSNPs: putative targets of gained eSNPs are involved in oncogenic pathways, such as suppression of the adaptive immune system and telomere elongation. In contrast, targets of lost eSNPs are involved in tumor suppression pathways including differentiation. Gained and lost eSNPs respectively establish and inhibit the binding of several families of TFs, in particular those critical in prostate development and homeostasis, as well as PrCA development, including FOX, AR, and HOX. Finally, we assessed the ability of eSNPs to quantify PrCa risk. We found that a polygenic risk score (PRS) based on eSNPs can distinguish PrCa patients from healthy controls with high accuracy, far better than controls. Importantly, eSNPs, which are biologically motivated and derived agnostic to PrCa status, add value to the previously reported gold-standard PRS learnt from a large multi-ancestry cohort of PrCa patients and healthy controls ⁸. Overall, our biologically informed approach effectively identifies regulatory germline variants linked to elevated PrCa risk, while also suggesting potential mechanisms.

Results

Overview of the study

To identify the regulatory variants associated with divergent PrCa risk in AA and EA men, we first restricted our investigation to 10,171,946 non-coding SNPs that were common (Minor Allele Frequency; MAF >= 5%) in either the EA or the AA population, and further selected the 491K SNPs having the most disparate allele frequencies between AA and EA (top 5% F_ST score). We next prioritized these candidate SNPs in silico based on their effect on prostate enhancer activity as follows. Leveraging the H3K27ac ChIP-seq data (as a proxy for active enhancers) in a prostate- derived LNCaP cell line, we trained a deep learning model that learns the sequence encoding of prostate enhancers and is capable of quantifying regulatory impact of a SNP. Here, the major and minor alleles in EA are considered the reference and the alternate alleles, respectively. Our overall approach involves a deep learning model of enhancer activity and a sliding window strategy, to account for all potential position of a SNP within an enhancer, aiming to identify SNPs potentially affecting enhancer activity (Fig. 1, Methods); we dubbed our approach as ProSNP-DL (Prostate Cancer Risk SNPs prediction using Deep Learning). Applying ProSNP-DL to the aforementioned 491K SNPs, we identified two sets of AA-dominant SNPs (defined as those with alternate allele frequency higher in AA than EA) whose alternate alleles are predicted to lead to either an enhancer gain or loss, referred to as gained (1,296) and lost (1,111) eSNPs, respectively. Fig. 1 illustrates our approach. The rationale underlying this approach is that the SNPs with large potential to affect enhancer activity (either gain or loss) are more likely to have a phenotypic impact in the corresponding cellular context. Next, we investigated the biological pathways associated with the two sets of eSNPs and identified the potential TFs whose bindings are likely disrupted by the eSNPs. We further conducted a ChIP-seq experiment targeting FOXA1, a primary cognate TF of the eSNPs, in two prostate cancer cell lines (LNCaP¹⁹, derived from a donor of European ancestry and MDA PCa 2B²⁰, derived from a donor of African ancestry) to assess the functional impact of eSNPs on TF binding disruption. Finally, we derive a polygenic risk score (PRS) from the eSNPs and assess its potential to quantify PrCa risk, relative to the PRSs based on previously published risk variants derived from GWAS.

Fig. 1. The flow chart of our pipeline.

Non-coding SNPs in dbSNP database are retained if they are common in either EA or AA cohort, and have high allele frequency difference between the two populations. Our deep learning model was then applied to the retained 491K SNPs to select the ∼2000 AA-dominant SNPs (eSNPs) at which the model predicts substantially different enhancer activities for the two alleles. The eSNPs are categorized as either gained, where the alternate allele has greater predicted enhancer activity or lost, where the alternate allele has lower predicted enhancer activity. These ∼2000 eSNPs are used to assess polygenic risk score for PrCa for an individual.

ProSNP-DL - a deep learning model of PrCa enhancers with single-nucleotide resolution

To capture the sequence encryption of the PrCa enhancers, we first trained a deep convolution model (Fig. 2A, Methods) on 67,064 LNCaP H3K27ac peaks²¹ as a proxy for prostate enhancers, and 10 fold pan-cell type DNase Hypersensitive regions not overlapping LNCaP H3K27ac peaks as the negative control to ensure the tissue-specificity of the PrCa enhancer model. The enhancer model has the same architecture as the one in our previous work¹⁸. The model is able to accurately discriminate prostate enhancers from accessible regions devoid of prostate enhancers, with the area under the receiver operating characteristic curve (auROC) of 0.91 (Fig. 2B). We have shown that an enhancer model in HepG2 cell line with the same architecture and parameters is able to predict the allele-specific effects of previously profiled raQTLs²² on enhancer activity¹⁸. In short, the enhancer model score is not only able to identify active enhancers, is sensitive to single nucleotide changes in enhancer activity.

Fig. 2. Architecture and performance of enhancer model.

A) The architecture of the deep convolutional neural network. B) ROC curve of the model in predicting PrCa enhancers. C) DL scores of the reference and alternate alleles of the previously reported PrCa risk SNP rs72725854¹⁷ across the sliding 1KB windows overlapping the SNP rs72725854. The x-axis is the relative distance of the window center to the SNP. The essential positions are highlighted by a green dot.

However, the effects of a SNP on enhancer activity depends on the position of the enhancer (Fig. 2C), as enhancers overlapping the SNP at various locations have different sequence contexts. We therefore evaluate a SNP’s influence by aggregating its effects across all conceivable positions of a hypothetical enhancer within a 1 KB window. The number of windows in which the alternate allele substantially transforms a neutral sequence into an active enhancer or vice versa, as predicted by the model, is termed the ‘essential window number’ (EWN). We utilized the EWN to quantify the effect size of the SNP on enhancer activity (Methods). Specifically, if the alternate allele consistently exhibits positive delta scores (the difference in model scores caused by the alternate allele) and converts a neutral sequence to an active enhancer in at least 5 windows (EWN ≥ 5), the SNP is classified as a gained activity eSNP (gained eSNP for simplicity), suggesting an enhancer activity gain. Conversely, if the alternate allele consistently shows negative delta scores and alters an active enhancer to a neutral sequence in at least 5 windows (EWN ≥ 5), the SNP is considered as a lost activity eSNP (lost eSNP for simplicity; Fig. 1, Methods), suggesting an enhancer activity loss. Fig. 2C illustrates this logic for gained eSNPs using the aforementioned PrCa risk SNP rs72725854 that were previously identified¹⁷. The delta scores of alternate allele ‘T’ are larger than the reference allele ‘A’ in all the windows in the (- 200bp, 200bp) region centered by the SNP, and change an inactive enhancer to an active enhancer (with False Positive Rate FPR < 0.01) in 5 windows (EWN = 5) according to our model. In summary, our approach aims to identify SNPs with a significant impact on enhancer activity by integrating a deep learning model trained on context-specific enhancer sequences with a sliding- window strategy, termed ProSNP-DL.

ProSNP-DL identifies enhancer SNPs dominant in an African ancestral population and associated with PrCa risk and patient survival

Starting with 84,802,133 genome-wide SNPs, we first selected the 10,171,946 non-coding SNPs that are common (MAF>=5%) in EA or AA, to ensure the robustness of our analysis. Further, to specifically identify SNPs that may underly the higher PrCa incidence in AA men, we focused on the SNPs with the largest allele frequency difference between AA and EA (Methods), yielding 491K SNPs (Fig. S1A). Applying ProSNP-DL on these SNPs yielded 2,069 gained enhancer activity SNPs, and 2,175 lost enhancer activity SNPs (eSNPs). Overall, these included 2,407 eSNPs with higher alternate allele frequency in AA (AA-dominant), and 1,837 eSNPs with higher alternate allele frequency in EA (EA-dominant) (Fig. S1B). The remainder of the 491K SNPs are referred to as non-enhancer SNPs (non-eSNPs) and used as negative control for several following analyses. To validate the functional impact of eSNPs on enhancer activity, we assessed allelic imbalance of H3K27ac reads at the heterozygous eSNP sites. Compared with non-eSNPs, the eSNPs exhibited a significantly greater extent of allelic imbalance of H3K27ac read coverage; as expected, the alternate alleles at gained eSNPs showed larger H3K27ac read coverage (Fig. 3A), and those at lost eSNP exhibited lower H3K27ac read coverage (Fig. 3B). This result suggests a coherent functional impact of the identified gained and lost activity eSNPs on the experimentally determined gain and loss of enhancer activity, respectively.

Fig. 3. eSNPs with AA dominant alternate alleles are associated with PrCa risk.

A,B) Fraction of heterozygous eSNPs (A: Gained, B: Lost) that exhibit allelic imbalance C) Enrichment of both AA-dominant gained and lost activity eSNPs in PrCa GWAS traits compared to non-eSNPs. D) Enrichment of both EA-dominant gained and lost activity eSNPs in PrCa GWAS traits compared to non-eSNPs. In C) and D) for various PrCa GWAS signal z-score cutoffs (x-axis), the y-axis shows the ratio of the fraction of eSNPs with PrCa GWAS signal z-score greater than the cutoff to the fraction of non-eSNPs with PrCa GWAS signal z score greater than the cutoff. E) Fraction of genes with HR > 1.5 associated with AA-dominant gained activity eSNPs, lost activity eSNPs, and all LNCaP enhancers.

Next, we directly compared the AA-dominant and EA-dominant eSNPs in terms of their association with PrCa risk, using two complementary strategies. First, we assessed the association of the eSNPs with PrCa risk, using summary statistics from a PrCa GWAS of 140,306 males¹⁰. Notably, the eSNPs with a greater alternate allele frequency in AA are strongly associated with the PrCa GWAS traits (Fig. 3C). Contrastingly, this is not the case for the eSNPs with EA- dominant alternate allele frequency (Fig. 3D, Fig. S1C). As our second strategy, for the AA- dominant gained activity eSNPs, we assessed whether their putative target genes (proximal genes taken as the proxy) are associated with worse patient survival using Cox regression, controlling for age; conversely, for the lost activity eSNPs, we assessed whether their putative target genes are associated with better survival. Consistent with the first strategy, the target genes of the gained activity AA-dominant eSNPs associated with decreased survival as shown by the high hazard ratios (HR), and those of the lost AA-dominant eSNPs with improved survival (Fig. 3E). However, these associations were not observed for the EA-dominant gained and lost activity eSNPs (Fig. S1D). Taken together, our findings are indicative of a distinct association between eSNPs and PrCa risk and survival for AA-dominant eSNPs encoding either gained or lost enhancer actvity. In the following, we focus specifically on the AA-dominant gained (1,296) and lost (1,111) eSNPs and further investigate their role in prostate cancer risk.

The gained and lost activity eSNPs may facilitate carcinogenesis via complementary mechanisms

We next functionally characterize gained and lost activity eSNPs, using the annotations enrichment tool GREAT^23,24 (Method). The analysis showed that gained activity eSNPs are primarily associated with innate and adaptive immune system, lipid metabolism, and telomerase activity regulation (Fig. 4A). This suggests that the gained activity eSNPs may facilitate carcinogenesis by modulating the immune system, substantiated by prior clinical research²⁵. To further assess this possibility, we identified the subset of gained activity eSNPs, whose alternate alleles were present in at least ten ICGC PrCa samples. We then compared the T cell dysfunction scores from TIDE (Tumor Immune Dysfunction and Exclusion)²⁶ of the genes proximal to such eSNPs with genes near other gained activity eSNPs and non-eSNPs. We observed that the genes near the frequently gained activity eSNPs exhibit higher T cell dysfunction scores (Fig. 4C, Methods), suggesting potential functional impact of the gained activity eSNPs on immune system dysfunction. In addition, the gained activity eSNPs are associated with regulation of telomerase activity. Several lines of evidence indicate that longer telomere length contributes to cancer^27,28. In agreement with these findings, a previous study has shown that relative to cells derived from Europeans, the cells derived from Africans have longer telomere length across all tissues and, even more relevant, the difference was found to be the highest in the prostate²⁹, further supporting a potential causal role of the gained eSNPs in tumor progression.

Fig. 4. Gained and lost activity eSNPs with AA-dominant alternate alleles are associated with PrCa risk through two complementary mechanisms.

A) Biological processes associated with gained activity eSNPs using the GREAT tool. B) Biological processes associated with lost eSNPs using the GREAT tool. C) Averaged T cell dysfunction score across cancer types of the genes near the gained and lost activity eSNPs, as well as non-eSNPs as control. D) Coefficients for logistic regression of gene expression change (adult - fetal) against genes associated with various sets of SNPs. * refers to coefficient significance (p-value) <= 0.05. E) Fraction of different sets of genes with DapMap effect score < -0.5. Fisher’s exact test were applied to evaluate significance.

In contrast, the lost activity eSNPs are largely associated with regulation of differentiation/development (Fig. 4B). We hypothesized that genes regulated by lost activity eSNPs are involved in maintaining differentiated states, and the loss of their enhancers may cause cells to revert to a more undifferentiated state. In other words, the target genes of lost activity eSNPs are expected to be highly expressed in adult normal prostate tissue but repressed in the developmental stage. To validate this hypothesis, we performed a regression analysis to assess the association of different classes of SNPs to the adult-to-fetal prostate differential expression. As shown in Fig. 4D, a positive coefficient for the target genes of lost activity eSNP indicates their inactivation during development, and conversely, the target genes of gained activity eSNPs exhibit higher expression in the fetal prostate compared to the adult normal prostate (GTEx), consistent with our hypothesis (Fig. 4D). Notably, when we stratify the eSNPs based on the number of patients in which the risk allele occurs, the more frequent risk alleles show a greater magnitude of coefficient (Fig. 4D), further supporting their functional impact. Furthermore, the target gene of gained activity eSNPs have higher Cancer Dependency score (DepMap, Methods) compared to those of lost activity eSNPs (Fig. 4E). Taken together, these results strongly suggest that by activating oncogenic processes and disrupting tumor suppression mechanisms, the gained and lost activity eSNPs collaboratively promote carcinogenesis.

eSNPs disrupt binding of key prostate transcription factors including the FOX and the HOX families

The identified eSNPs, by design and supported by further validations, affect enhancer activity, which is likely to be mediated, in substantive part, via disruption (gain or loss) of transcription factor binding¹⁷. We therefore sought to identify the cognate TFs whose bindings are potentially disrupted by the eSNPs. Toward this, for both gained and lost activity eSNPs, we focused on the TFs whose binding sites are enriched at the eSNPs sites compared to the non-eSNP sites. This analysis was conducted using sequences containing both the reference and alternate alleles to eliminate allelic bias (Methods). Our findings revealed that for both gained and lost activity eSNPs, TFs belonging to the FOX and HOX families were among the most enriched (Table S1-4). The eSNPs affecting these TFs not only exhibited large effects on enhancer activity (average EWN ≥ 7) but also had the highest number of potential target genes (Fig. 5AB). This finding aligns with previous research indicating that FOXA1 acts as a pioneer factor in PrCa, capable of extensively reprogramming the androgen receptor (AR) cistrome in collaboration with HOXB13³⁰; interestingly, AR is also among the TFs majorly disrupted by gained activity eSNPs (Fig. 5AB, Table S1-2). This provides further mechanistic insights into how eSNPs might contribute to PrCa risk. Although gained and lost activity eSNPs appear to disrupt the binding of the same set of TFs, we speculated that their target genes are involved in complementary pathways. Indeed, the genes proximal to gained activity eSNPs associated with FOX family TFs were enriched in the pro-tumor cancer signatures such as hypoxia, epithelial-to-mesenchymal transition (EMT), metastasis, and stemness signatures in the CancerSEA database³¹. In contrast, the genes proximal to lost eSNPs associated with FOX were enriched in the anti-tumor cancer signatures, including differentiation and apoptosis (Fig. 5C, Methods).

Fig. 5. Gained and lost activity eSNPs disrupt binding of FOX TFs and are associated with complementary mechanisms.

A,B) Effect size (y-axis) of gained (A) and lost (B) activity eSNPs associated with enriched TFs against their putative target gene counts (x-axis). Only the TFs with number of gene targets ≥ 15 and average EWN ≥ 7 are labeled in the figure. C) Fold enrichment of different cancer signatures in FOXA1-associated gained activity eSNPs compared to FOXA1-associated lost activity eSNPs. D) FOXA1 binding signal change (MDA - LNCaP) at gained and lost activity eSNP positions where MDA PCa 2B (dubbed MDA) genotype has a greater number of alternate alleles. E) Read coverage of FOXA1 at rs10095018 in LNCaP and MDA cell lines, where the alternate allele C disrupts the binding of FOXA1 according to the consensus motif of FOXA1 (see Fig. S2B for larger motif logo). The graphic of the genomic coordinates with the track of FOXA1 reads coverage was generated using UCSC genome browser³² (http://genome.ucsc.edu/; UCSC hg19 assembly). F) Gene expression log₂(TPM+1) of NDRG1, a likely target of enhancer harboring rs10095018, in two prostate cancer cell lines: MDA and LNCaP.

To experimentally validate the impact of eSNPs on FOX binding, we performed ChIP-seq for FOXA1 in two PrCa cell lines of distinct ancestral origins: LNCaP cells, derived from a lymph node metastasis of a 50-year-old Caucasian male with PrCa¹⁹, and MDA PCa 2B cells, isolated from the prostate of a 63-year-old Black male patient with adenocarcinoma²⁰ (Methods). First, consistent with our expectation, MDA PCa 2B has a greater number of alternate alleles at eSNP sites overall (Table S5). Furthermore, we observed that relative to LNCaP, FOXA1 binding intensity was greater in MDA PCa 2B at gained activity eSNPs and lower at lost activity eSNPs (Fig. 5D, Methods).

One lost activity eSNP coincided with a common SNP rs10095018 (T/C) which is spatially proximal to the gene NDRG1 in the 3D genomic space (Fig. S2A), where the alternate allele ‘C’ substantially reduces enhancer activity according to our model (EWN=10). NDRG1 is recognized for its role in suppression of PrCa; decreased membrane expression of NDRG1 has been associated with significantly poorer survival outcomes in PrCa patients, and is correlated with higher Gleason scores, which predict the aggressiveness of PrCa^33,34. At rs10095018 SNP site, LNCaP cell line possesses two reference alleles (T/T), while MDA PCa 2B cell line has heterozygous alleles (T/C). Notably, LNCaP exhibited a stronger FOXA1 binding signal compared to MDA PCa 2B (Fig. 5E). Furthermore, the MDA PCa 2B cell line exhibits allelic imbalance at this position, with the C allele harboring far fewer ChIP-seq reads of FOXA1 (Fig. S2C). These observations together suggest that the alternate allele C can indeed disrupt FOXA1 binding. More importantly, NDRG1 expression level is higher in LNCaP compared to MDA PCa 2B (Fig. 5F), suggesting that the alternate allele ‘C’ at lost eSNP rs10095018 reduces the expression of tumor suppressor NDRG1 by disrupting FOXA1 binding in MDA PCa 2B cell line, thus adversely affecting PrCa patient outcome. In summary, these eSNPs may contribute to heightened PrCa risk by disturbing the binding of key TFs.

Polygenic risk score derived from eSNPs can be helpful to differentiate between cases and controls of PrCa

While by design our eSNPs have high allele frequency differential between AA and EA ancestry, given their potential functional role in prostate cancer, here we assessed their ability to broadly capture PrCa susceptibility across various ancestral populations. We derived a Polygenic Risk Score (PRS) based on the number of risk alleles of each eSNP in an individual, trained and tested (10-fold cross-validation) in a large EA cohort in a PrCa GWAS study (PEGASUS) (4,599 cases and 2,841 controls, see Methods). We compared the cross-validation accuracy of eSNP-based PRS with those based on (i) randomly select 5000 high Fst SNPs, (ii) 50,708 chromatin Quantitative Trait Loci (cQTLs) linked to chromatin accessibility in PrCa⁷ and (iii) 545 PrCa risk variants (PGS-545) identified through a multi-ancestry genome-wide study⁸. We followed the convention⁸ to measure the accuracy as follows: we partitioned the range of PRS scores for the controls in five quantiles and assessed the enrichment of PrCa PRS scores in the top and the bottom quantiles. As shown in Fig. 6A (and Supplementary Fig S3), eSNP-based PRS substantially outperformed those based on cQTL and random high F_ST SNPs. As expected, PGS- 545 has a superior performance, as reported previously⁸. However, integrating eSNPs with PGS- 545 SNPs resulted in a slightly improved performance (Fig. 6B and S3), suggesting that eSNPs provide complementary information to PGS-545 SNPs. These results observed in the cross- validation within the EA cohort were similar to those when a model trained on the EA cohort was applied to the independent AA cohort (474 cases and 458 controls; see Methods). These results suggest that eSNPs provide valuable insights into PrCa susceptibility across various ancestral populations. Interestingly, we also found that, among the AA PrCa patients, those with higher PRS based on eSNPs and PGS-545 tend to develop PrCa at younger ages compared to those with lower PRS (Fig. 6C), supporting the role of eSNPs and PGS-545 SNPs in PrCa onset (see Methods).

Fig. 6. PrCa risk assessment.

A) For different SNP sets, the plot shows the percentage of cases in top and bottom 20 percentile PRS among controls in the test set in EA cohort; error bars show the standard deviation across the 10-fold cross validation. B) For different SNP sets, the plot shows the percentage of cases in top and bottom 20 percentile PRS among controls of AA cohort; model was trained in EA cohort; error bars show the standard deviation of the 10-fold bootstrapping (90% of the AA cohorts). C) Age distribution of the AA PrCa patients with top 20%, middle 60%, and bottom 20% percentile of eSNP+PGS-545 SNPs-based PRSs. * refers to Wilcoxon test p-value ≤ 0.05, ** refers to Wilcoxon test p-value ≤ 5e-3. D) Fraction of heterozygous SNPs that exhibit allelic imbalance. P-values are based on Fisher’s exact test. E) Enrichment of prostate-related diseases/traits for different sets of SNPs, which was quantified by observed ratio of the fraction of SNPs overlapping prostate traits to the fraction of SNPs overlapping other traits, compared to the expected ratio. Error bars show the standard deviation of the 10-fold bootstrapping (90% of the SNPs). P-values are based on Wilcoxon test.

Given that eSNPs are identified agnostic of PrCa cohorts, it is not surprising that they do not perform as well as the gold-standard GWAS-based PGS-545 SNPs in discriminating PrCa from control. However, eSNPs are more likely to be functional in terms of affecting enhancer function, and they are likely to underlie prostate-related disorders other than PrCa. First, we found that compared to PGS-545 SNPs, eSNPs indeed exhibit a greater allelic imbalance for H3K27Ac reads (Fig 6D). Looking at the GWAS catalog, while there are four additional traits besides PrCa related to the prostate, they are all indirectly related to cancer – prostate antigen levels and various adverse responses to PrCa treatment. We assessed whether eSNPs better capture these additional traits than PGS-545 which is based specifically on PrCa. We found that eSNPs indeed show a greater enrichment for these additional traits (Fig 6E). Furthermore, the enrichment of these prostate traits in eSNPs is true only for AA-dominant eSNPs; there is no overlap between EA-dominant SNPs and the QTLs for these additional traits. These results suggest that by virtue of being mechanism-based and agnostic to PrCa, eSNPs may better capture other prostate- related traits.

Overall, these results suggest that our identified eSNPs, derived in mechanism-aware but PrCa- agnostic fashion, can contribute to prioritize PrCa susceptibility across different ancestries, as well as other prostate disorders.

Discussion

In this study, we explored non-coding regulatory polymorphisms potentially underlying the high prevalence of PrCa and associated mortality in AA men. Using a sequence-based deep learning model (ProSNP-DL) that learns the sequence encodings of prostate enhancers and is capable of quantifying regulatory effects of a single nucleotide change, we identified two sets of regulatory genetic variants: one included the SNPs where the alternate allele encoded an increased enhancer activity (gained activity eSNPs) and the other included the SNPs where the alternate allele encoded a reduced enhancer activity (lost activity eSNPs). We particularly focused on the eSNPs with greater alternate allele frequency in AA (AA-dominant SNPs) because: 1) the AA- dominant eSNPs tended to have stronger signal for PrCa GWAS traits than EA-dominant eSNPs, and 2) a higher expression of the target genes of AA-dominant eSNPs were commonly associated with patient survival, which was not observed for EA-dominant eSNPs. Overall, we identify ∼2,000 gained and lost activity eSNPs with AA-dominant alternate alleles that provide a list of regulatory variants potentially associated with the excessive PrCa risk manifest in AA men.

We discovered that the gained and lost activity eSNPs potentially affect PrCa risk via two complementary mechanisms. Specifically, the genes near the gained activity eSNPs exhibit higher T cell Dysfunction score, leading to T cell exhaustion and suppression of the adaptive immune system. Moreover, the gained activity eSNPs are linked to telomere elongation, a process implicated in oncogenesis ^27,28. On the other hand, the lost activity eSNPs are associated with de- differentiation (stem-like) and evasion of cell death processes. Thus, the gained and lost activity eSNPs collectively contributing to a PrCa predisposition. The identified eSNPs likely function by disrupting the binding of key transcription factors, predominantly belonging to the FOX family^30,35. Although both the gained and the lost activity eSNPs disrupt the binding of the same set of TFs, they regulate different target genes. The target genes of gained activity eSNPs disrupting FOX binding are involved in oncogenic processes, such as EMT³⁶, stemness³⁶, hypoxia³⁷, and metastasis. Conversely, the target genes of lost activity eSNPs disrupting FOX binding are involved in tumor suppression processes, including differentiation and apoptosis (Fig. 5C). Therefore, loss of enhancers caused by lost activity eSNPs would lead to inhibition of cell differentiation³⁸ and evasion of cell death³⁹. This aligns with the enrichment of the overall gained and lost eSNPs in the two complementary mechanisms that collectively contribute to PrCa risk (Fig. 4AB).

While our computational analysis points to FOX family of TFs, given the well-established role of FOXA1 in prostate development and prostate cancer^30,35, we experimentally validated the predicted effects of the two sets of SNPs on FOXA1 binding in two PrCa cell lines: LNCaP and MDA PCa 2B cells, respectively derived from individuals of European and African ancestry. The MDA PCa 2B cell line, with a higher number of alternate alleles of eSNPs, exhibited increased binding at the gained activity eSNPs and decreased binding at the lost activity eSNPs (Fig. 5D), thereby validating our predictions. The lost activity eSNPs appear to have a larger impact on disrupting FOXA1 binding compared to the gained activity eSNPs (Fig. 5D). This observation is consistent with the fact that TF binding is more easily lost than gained with single nucleotide changes⁴⁰.

Unlike traditional GWAS studies that identify risk variants associated with complex human traits or diseases through SNP-trait associations in large cohorts, our approach does not rely on large cohorts with clinical information, in particular, PrCa status, but rather directly identifies potentially functional regulatory SNPs in prostate tissue, which may be relevant to other prostate diseases beyond cancer. Despite not relying on PrCa cohorts, the identified eSNPs add value to the gold standard PGS-545 PrCa variants in discriminating PrCa patients from healthy controls, both within EA and within AA cohort (Fig 6AB), even though our eSNPs were identified based on differential allele frequency between AA and EA individuals. However, compared to PGS-545, eSNPs are more likely to have direct functional role beyond mere association with PrCa (Fig. 6D). Interestingly, the genes near PGS-545 SNPs are also associated with developmental processes (Fig. S4), similar to those associated with the lost eSNPs, further substantiating the connection between development and cancer progression. However, the genes near eSNPs are enriched for several additional biological processes likely to be relevant to PrCa, such as telomere maintenance, interferon signaling, and EMT.

Past successes in deep learning models of enhancers^18,41,42 have motivated the current study to focus on non-coding SNPs potentially affecting regulatory enhancer activity. However in principle, such an approach can be applied to other classes of non-coding SNPs, such as those affecting splicing, as long as accurate sequence-based models capturing those functions can be developed. This provides an avenue for future extension of our study. Likewise, this study can be extended to other cancer types with substantial hereditary basis, provided availability of the active enhancer data in appropriate cell or tissue type. Another point worth noting is that our deep learning model is trained on prostate enhancers, which can affect not just malignant transformation, i.e., PrCa, but other maladies of the prostate, which should be explored in a follow up study. Much like the QTLs identified by GWAS, the eSNPs identified in our study must be rigorously tested experimentally to establish their causality. As a strength our study, we have provided initial experimental validation of our predictions, vis a vis FOXA1 binding.

Methods

Data Availability

We downloaded the LNCaP H3K27ac peaks after DHT treatment from GEO dataset GSM1249448²¹. The DHT treated DHS data in the LNCaP cell line were obtained from GSM822388⁴³. For allelic imbalance analysis, we utilized the mapped bam files for H3K27ac data of 5 EA PrCa patients from⁷. The gene expression profile of fetal prostate was obtained from GSM614545⁴⁴. The list of cQTL SNPs were obtained from the authors of⁷. The PGS SNPs were obtained from⁸. The genotypes of PrCa cases and controls with European ancestry of the GWAS study (National Cancer Institute (NCI) Prostate Cancer Genome-Wide Association Study for Uncommon Susceptibility Loci (PEGASUS)) were downloaded from dbGap (phs000882.v1.p1, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000882.v1.p1), and the genotypes of PrCa cases and controls with African ancestry from the study (Ghana)⁹ were downloaded from dbGap (phs000838.v1.p1.c1, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000838.v1.p1). The Hi-C data of LNCaP cell line was obtained from⁴⁵. The RNA-seq data for LNCaP and MDA PCa 2b cell lines were obtained from DapMap⁴⁶ (DepMap, Broad (2023). DepMap 23Q4 Public. Figshare+. Dataset. https://depmap.org/portal). As there is no effects/dependency DapMap score for prostate cancer, we simply took the average DapMap scores across cancer types as an estimate of the DapMap score for each gene. The SNP array- based genotype of LNCaP cell line is obtained from GSM888346, and the SNP array-based genotype of MDA PCa 2b cell line is obtained from GSM888377. The FOXA1 ChIP-seq data was generated in house and is deposited at GSE276748. The human GRCh37 (hg19) reference genome was used throughout the study. Clinical data for TCGA PrCa were downloaded from PanCanAtlas (https://gdc.cancer.gov/about-data/publications/pancanatlas). Gene expression data for TCGA PrCa patients was downloaded from the toil-hub of UCSC-Xena browser.

Data Processing

Fastq files were downloaded from the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home). Reads were trimmed based on the overlap with the adapter sequence, and those with an average base quality of <20 were removed using Trimmomatic⁴⁷ version 0.39. Trimmed reads were aligned to hg19.p13.plusMT.no_alt_analysis_set genome using bowtie2⁴⁸ version 2.5.1 (https://bowtie-bio.sourceforge.net/bowtie2/index.shtml) with default parameters. Duplicate reads from the alignment files were removed, and alignments with a mapping quality score of <20 were discarded using samtools⁴⁹ version 1.16.1 (http://www.htslib.org/). Peaks were called using MACS2⁵⁰ version 2.2.9.1. For H3K27Ac data, broad peaks were called. For DHS data, narrow peaks were called without building a model, shifting the reads by 150 bp towards the 5’ end and extending by 300 bp towards the 3’ end.

FOXA1 ChIP-seq experiments in two PrCa cell lines

Approximately 10 million cells were fixed using 2 mM DSG (CovaChem) for 10 minutes followed by 1% formaldehyde for 10 minutes at 37°C and quenched with 2 M glycine for 5 minutes at room temperature. Cross-linked cells were resuspended in cold lysis buffer (50 mM Tris pH 8.0, 10 mM EDTA pH 8.0, 0.5% SDS, 1X protease inhibitor cocktail) and sheared using a Bioruptor Pico (Diagenode) device. Fragmented chromatin was incubated with 3 µg of FOXA1 (Abcam, cat#ab23738) antibody overnight at 4°C. A fraction of each sample prior to addition of antibody was used as an input control. Protein A/G beads (ThermoFisher) were added and incubated for 1 hour at 4°C, washed six times with RIPA buffer (50 mM HEPES pH 7.6, 1 mM EDTA pH 8.0, 500 mM LiCl, 0.7% sodium deoxycholate, 1% NP40), and eluted in elution buffer (100 mM NaHCO3, 1% SDS). Samples, including input DNA, were treated with RNase A (ThermoFisher) for 30 minutes at 37 °C followed by proteinase K (New England Biolabs) overnight at 65 °C to reverse crosslinking. ChIP DNA was purified with Monarch Genomic DNA Purification Kit (New England Biolabs) and concentrations were quantified by Qubit fluorometer (ThermoFisher) and TapeStation (Agilent). Sequencing libraries were prepared by NEBNext UltraII DNA Library Prep Kit (New England Biolabs) per manufacturer’s instructions and sequenced on the Illumina NextSeq2000 platform at the CCR Genomics Core at the National Cancer Institute, NIH, Bethesda, MD.

A deep convolutional neural network model of prostate enhancers

We built a deep convolutional neural network to predict tissue-specific enhancer activity directly from the enhancer DNA sequence. The deep learning model has the same architecture as the one in our previous study¹⁸, comprising five convolution layers with 320, 320, 240, 240, and 480 kernels, respectively. Higher-level convolution layers receive input from larger genomic ranges and are able to represent more complex patterns than the lower layers. The convolutional layers are followed by a fully connected layer with 180 neurons, integrating the information from the 1kb sequence. In total, the DLM has 3,631,401 trainable parameters. We used the Python library Keras version 2.1.5 (https://github.com/keras-team/keras) to implement our model. The positive sets contain the 1kb sequence (and their reverse complement) of LNCaP enhancers. We used the DHS peaks overlapping H3K27ac peaks to delimit enhancer regions. DHS peaks located within the same H3K27ac peak were merged into one. The enhancers were then these DHS peaks extended to 1 kb from its original center. Enhancers overlapping promoters (including all alternative promoters) and promoters (intervals [−1000 base pairs (bp), +1000 bp] surrounding the transcription start site) were removed from the enhancer set. Overall, we identified 58,266 LNCaP enhancers. The DHS profiles of non–prostate tissues from Roadmap Epigenomics projects⁵¹, which do not overlap the positive sets, were collected as the negative training set of the deep learning model. The reason we used DHS sites not overlapping prostate H3K27ac peaks as negative control regions is that we aim to distinguish prostate-specific enhancers from otherwise accessible and potential enhancers in other tissues. Seventy percent of the data was used for training, 15% is for parameter tuning (on validation set), and the remaining 15% is used as the testing set. Larger DL score of the genomic sequence corresponds to a higher propensity to be an active tissue-specific enhancer¹⁸. The genomic sequence with deep learning model score ≥ 0.58 (selected based on FPR ≤ 0.01) is predicted to be active enhancers.

Effect size of a SNP

We evaluate the effect of a SNP by combining its effects across all possible potential positions of a putative enhancer. For a given SNP, using a stride as 20bp, there are 21 2-KB windows overlapping the SNP within a (-200bp, +200bp) region centered on the SNP. The reason we restricted our sliding windows to cover the central (-200bp, +200bp) region is that the alternate alleles tend to have a larger effect on enhancer activity within this narrow region⁴² and tend to have more consistent directions of deltas (Fig. 2C). The number of windows in which exactly one of the alternate alleles converts a neutral sequence to an active enhancer and vice versa, according to the model, is termed the essential window number (EWN). The EWN, which ranges from 0 to 21, is utilized to assess the effect size of the SNP on enhancer activity. Further, we expect a consistent directional impact of the SNP on enhancer activity across the 21 windows. Specifically, if the alternate allele exhibits positive delta scores in over 85% of the windows and converts a neutral sequence to an active enhancer in at least 5 windows (EWN ≥ 5) according to our model, the SNP is classified as a gained eSNP, suggesting an enhancer gain. Similarly, if the alternate allele has negative delta scores in over 85% of the windows and changes an active enhancer to a neutral sequence in at least 5 windows (EWN ≥ 5), the SNP is considered as a lost eSNP, suggesting an enhancer loss.

Identification of allelic imbalance in H3K27ac data

We reasoned that if an eSNP locus happens to be heterozygous in a sample, we would expect the two alleles to have differential enhancer activity. Specifically, the alternate alleles of the gained eSNPs should have higher enhancer activity than the reference alleles, and the alternate alleles of the lost eSNPs should have weaker enhancer activity than the reference alleles. This should be reflected in the differential representation of the two alleles among the H3K27Ac reads of the locus. The H3K27ac reads were extracted using BaalChIP⁵². Allelic counts over heterozygous sites of the 5 EA PrCa patients H3K27ac data⁷ were merged, and variants that had at least six reads were further processed for allele-specific enhancer activity analysis with binomial test. We use the heterozygous sites of the gained and lost non-eSNPs as the background. The gained non-eSNPs are high F_ST SNPs with positive delta scores, and the lost non-eSNPs are high F_ST SNPs with negative delta scores. For a heterozygous site of gained eSNPs, if the ratio of read number of the alternate allele to that of the reference allele is over 1.5 and the binomial P value ≤ 0.01, the position is considered to have allelic imbalance. For a heterozygous site of lost eSNPs, if the ratio of read number of the reference allele to that of the alternate allele is over 1.5 and the binomial P value ≤ 0.01, the position is considered to have allelic imbalance. Notably, the essential mutations are identified according to the sequence-based deep learning model, suggesting that the alternate alleles at these positions are likely to create or deactivate enhancers. The alternate alleles at these eSNPs coincidently (and independently) exhibit significantly more or less H3K27ac reads than do the reference alleles (termed allelic imbalance), substantiating that the alternate alleles of gained eSNPs are associated with increased enhancer activity, and alternate alleles of lost eSNPs are associated with decreased enhancer activity.

Functional enrichment analysis using GREAT and cancerSEA

To probe the potential functional roles of gained and lost eSNPs, we tested for functional enrichment among genes near the eSNPs loci using the online Genomic Regions Enrichment of Annotations Tool (GREAT²³) version 3.0.0 using two-nearest-genes association rule. The GO terms will be considered as enriched if it has at least 10 gene hits with binomial p-value threshold set as 0.01. The whole genome region was used as the background. To characterize the cancer progression related functions of the gained and lost eSNPs associated with FOX family TFs, we tested the enrichment of the nearby genes of the eSNPs in the 14 functional states across cancer types (including stemness, invasion, metastasis, proliferation, EMT, angiogenesis, apoptosis, cell cycle, differentiation, DNA damage, DNA repair, hypoxia, inflammation and quiescence) portrayed by cancerSEA database³¹. The enrichment of the targets of gained eSNPs in the genes of a particular functional state relative to those of lost eSNPs was evaluated as log₂(fraction_gained/fraction_lost), and vice versa, where fraction_gained is the fraction of the target genes of gained eSNPs overlapping with the genes of a particular functional state.

Survival analysis

We used clinical data from TCGA to model the overall survival of PrCa patients using the gene expression as a predictor variable and age as a covariate in the cox regression to relate gene expression levels to survival outcomes. We used the R library “survival” for this analysis. The Hazard Ratio (HR) represents the change in hazard (risk of death) associated with a unit increase in gene expression. Only the HR with significant p-values (p <= 0.05) were considered in this study.

Logistic regression

We applied the “glm” method from the R package to apply logistic regression to predict gene expression change (adult prostate – fetal prostate) against categories of gained eSNPs, lost eSNPs, random SNPs, and gene expression in fetal prostate. where Δg is the gene expression change in adult prostate compared to fetal prostate (g_adult − g_fetal), which is binary (1 for positive change, 0 for negative change). We integrated the fetal prostate gene expression profile (GSM614545⁴⁴) to GTEx and applied quantile normalized adult and fetal embryonic prostate gene expression and to remove differences across experiments and batch effects. Three different glms were applied to the three sets of SNPs (gained eSNPs, lost eSNPs, and random SNPs), separately. SNP category is binary for each SNP set. g_fetal is the expression level of the gene in fetal prostate. Positive coefficient of each variable indicates positive correlation between the variable and Δg.

Identification of potential TFBSs associated with the gained and lost eSNPs

To identify potential TF binding sites, we used FIMO⁵³ to scan the profiles of binding sites for vertebrate TF motifs in JASPAR⁵⁴, CIS-BP⁵⁵, SwissRegulon⁵⁶, HOCOMOCO⁵⁷, and UniPROBE⁵⁸ databases, along the 100-bp sequences centered at the gained and lost eSNPs. We identified motif-specific thresholds to limit the FDR to no more than five false positives in 10 kb of sequence, by scanning each motif on random genomic sequences using FIMO⁵³. Enrichment of a motif associated with gained/lost eSNPs (foreground) relative to non-eSNPs (background) was ascertained using Fisher’s exact test. The occurrence of a particular Transcription Factor Binding Site (TFBS) in the set of 100 bp genomic sequences centered by the gained/conserved eSNPs was normalized by the total number of eSNPs. Notably, we included the sequences containing both the reference and alternate alleles to avoid allelic bias.

Polygenic Risk Score

Before estimating the PRS, we applied Minimac4 algorithm to impute genotype using Michigan Imputation Server⁵⁹. We applied multivariate linear regression using 10-fold cross validation in the EA cohort (PEGASUS) to test the predictive accuracy in PrCa predisposition of different sets of risk variants. To estimate the PRS in the independent AA cohort (Ghana) and test the model, we retrain the model using the entire EA cohort and applied it to score the AA individuals using bootstrapping, by randomly subsampling 90% the AA individuals 10 times. The PRS is the weighted sum of the alternative allele number of the risk variants in a given genotype, where the weights are the coefficients of the trained multivariate linear regression model.

Data Availability

All data produced in the present work are contained in the manuscript

Supplemental Information

Fig. S1. eSNPs with AA dominant alternate alleles are associated with PrCa risk.

A) Distribution of F_ST of SNPs that are common (MAF ≥ 0.05) in either EA or AA. The shaded part is the top 5% of SNPs with the highest F_ST. B) Number of gained and lost eSNPs that are dominant either in AA or EA. C) Enrichment of both gained and lost eSNPs with AA-dominant alleles in PrCa GWAS traits compared to gained and lost eSNPs with EA-dominant alleles, respectively. For gained and lost eSNPs the y-axis shows the ratio of the fraction of AA-dominant alleles with PrCa GWAS signal above the cutoff (x-axis) to the fraction of EA-dominant alleles with PrCa GWAS signal above the cutoff. D) Fraction of genes with HR > 1.5 associated with gained eSNPs with EA-dominant alternate alleles, lost eSNPs with EA-dominant alternate alleles, and all LNCaP enhancers.

Fig. S2. An example of lost eSNP rs10095018.

A) LNCaP Hi-C data near the rs10095018 SNP locus. B) An enlarged binding motif logo of FOXA1 (MA0148.1) from JASPAR⁵⁴, where the 6^th position corresponds to the eSNP. C). Percentage of FOXA1 ChIP-seq reads coverage of both T and C alleles at the rs10095018 SNP site in MDA PCa 2B cell line.

Fig. S3.

Odds ratio (y-axis) related to different SNP sets based on contrast PRS against PRS score threshold (x-axis) in EA cohort. The odds ratio is the ratio of the fraction of PrCa patients with PRS above a certain threshold to that of control individuals with PRS above the same threshold. All PRSs based on different SNP sets are z-score normalized for comparison. Error bars show the standard deviation of the 10-fold cross validation.

Fig. S4.

Biological processes associated with gained activity PGS-545 SNPs using the GREAT tool

Acknowledgement

This work used the computational resources of the NIH HPC Biowulf cluster. Next generation sequencing was done at the CCR Genomics Core at the National Cancer Institute, NIH, Bethesda, MD. We would like to thank Dr. Baca, Sylvan C, Dr. Fortunato, Brad J., Dr. Ziwei Zhang and Dr. Mathew Freedman for kindly sharing the supplementary materials of their works with us. We would like to thank Dr. Kun Wang, Dr. Vishaka Gopalan, Annan Timon, Arati Rajeevan for their feedback. This research was funded, in part, by U.S. National Cancer Institute grant 1-ZIA-BC011979-02 (to S.H.) and 1-ZIA-BC011973-04 (to D.T.)

References

1.↵
Saunders, E.J., Kote-Jarai, Z. & Eeles, R.A. Identification of Germline Genetic Variants that Increase Prostate Cancer Risk and Influence Development of Aggressive Disease. Cancers (Basel) 13(2021).
2.↵
Conti, D.V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat Genet 53, 65–75 (2021).
OpenUrl CrossRef PubMed
3.↵
Taitt, H.E. Global Trends and Prostate Cancer: A Review of Incidence, Detection, and Mortality as Influenced by Race, Ethnicity, and Geographic Location. Am J Mens Health 12, 1807–1823 (2018).
OpenUrl CrossRef PubMed
4.↵
Freedman, M.L. et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A 103, 14068–73 (2006).
OpenUrl Abstract/FREE Full Text
5.
Haiman, C.A. et al. Multiple regions within 8q24 independently a_ect risk for prostate cancer. Nat Genet 39, 638–44 (2007).
OpenUrl CrossRef PubMed
6.↵
Pichardo, C.M. et al. West African Genetic Ancestry, Neighborhood Deprivation, and Prostate Cancer. JAMA Netw Open 7, e2433546 (2024).
OpenUrl PubMed
7.↵
Baca, S.C. et al. Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Nat Genet 54, 1364–1375 (2022).
OpenUrl CrossRef PubMed
8.↵
Wang, A. et al. Characterizing prostate cancer risk through multi-ancestry genome-wide discovery of 187 novel risk variants. Nat Genet 55, 2065–2074 (2023).
OpenUrl CrossRef PubMed
9.↵
Cook, M.B. et al. A genome-wide association study of prostate cancer in West African men. Hum Genet 133, 509–21 (2014).
OpenUrl CrossRef PubMed
10.↵
Schumacher, F.R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat Genet 50, 928–936 (2018).
OpenUrl CrossRef PubMed
11.↵
Torkamani, A., Wineinger, N.E. & Topol, E.J. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 19, 581–590 (2018).
OpenUrl CrossRef PubMed
12.
Brown, B.C., Asian Genetic Epidemiology Network Type 2 Diabetes, C., Ye, C.J., Price, A.L. & Zaitlen, N. Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet 99, 76–88 (2016).
OpenUrl
13.↵
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Nat Genet 54, 573–580 (2022).
OpenUrl CrossRef PubMed
14.↵
Darst, B.F. et al. Evaluating approaches for constructing polygenic risk scores for prostate cancer in men of African and European ancestry. Am J Hum Genet 110, 1200–1206 (2023).
OpenUrl PubMed
15.↵
Conti, D.V. et al. Two Novel Susceptibility Loci for Prostate Cancer in Men of African Ancestry. J Natl Cancer Inst 109(2017).
16.↵
Darst, B.F. et al. A Germline Variant at 8q24 Contributes to Familial Clustering of Prostate Cancer in Men of African Ancestry. Eur Urol 78, 316–320 (2020).
OpenUrl PubMed
17.↵
Walavalkar, K. et al. A rare variant of African ancestry activates 8q24 lncRNA hub by modulating cancer associated enhancer. Nat Commun 11, 3598 (2020).
18.↵
Li, S., Hannenhalli, S. & Ovcharenko, I. De novo human brain enhancers created by single-nucleotide mutations. Sci Adv 9, eadd2911 (2023).
19.↵
Horoszewicz, J.S. et al. LNCaP model of human prostatic carcinoma. Cancer Res 43, 1809–18 (1983).
OpenUrl Abstract/FREE Full Text
20.↵
Navone, N.M. et al. Establishment of two human prostate cancer cell lines derived from a single bone metastasis. Clin Cancer Res 3, 2493–500 (1997).
OpenUrl Abstract/FREE Full Text
21.↵
Hazelett, D.J. et al. Comprehensive functional annotation of 77 prostate cancer risk loci. PLoS Genet 10, e1004102 (2014).
OpenUrl CrossRef PubMed
22.↵
van Arensbergen, J. et al. High-throughput identification of human SNPs a_ecting regulatory element activity. Nat Genet 51, 1160–1169 (2019).
OpenUrl CrossRef PubMed
23.↵
McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495–501 (2010).
OpenUrl CrossRef PubMed Web of Science
24.↵
Tanigawa, Y., Dyer, E.S. & Bejerano, G. WhichTF is functionally important in your open chromatin data? PLoS Comput Biol 18, e1010378 (2022).
OpenUrl CrossRef PubMed
25.↵
Awasthi, S. et al. Comparative Genomics Reveals Distinct Immune-oncologic Pathways in African American Men with Prostate Cancer. Clin Cancer Res 27, 320–329 (2021).
OpenUrl Abstract/FREE Full Text
26.↵
Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med 24, 1550–1558 (2018).
OpenUrl CrossRef PubMed
27.↵
Karimian, K. et al. Human telomere length is chromosome end-specific and conserved across individuals. Science 384, 533–539 (2024).
OpenUrl PubMed
28.↵
Telomeres Mendelian Randomization, C., et al. Association Between Telomere Length and Risk of Cancer and Non-Neoplastic Diseases: A Mendelian Randomization Study. JAMA Oncol 3, 636–651 (2017).
OpenUrl PubMed
29.↵
Demanelis, K. et al. Determinants of telomere length across human tissues. Science 369(2020).
30.↵
Teng, M., Zhou, S., Cai, C., Lupien, M. & He, H.H. Pioneer of prostate cancer: past, present and the future of FOXA1. Protein Cell 12, 29–38 (2021).
OpenUrl CrossRef PubMed
31.↵
Yuan, H. et al. CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res 47, D900–D908 (2019).
OpenUrl CrossRef PubMed
32.↵
Kuhn, R.M., Haussler, D. & Kent, W.J. The UCSC genome browser and associated tools. Brief Bioinform 14, 144–61 (2013).
OpenUrl CrossRef PubMed
33.↵
Song, Y. et al. N-myc downstream regulated gene-1/Cap43 may play an important role in malignant progression of prostate cancer, in its close association with E-cadherin. Hum Pathol 41, 214–22 (2010).
OpenUrl CrossRef PubMed
34.↵
Joshi, V., Lakhani, S.R. & McCart Reed, A.E. NDRG1 in Cancer: A Suppressor, Promoter, or Both? Cancers (Basel) 14(2022).
35.↵
Adams, E.J. et al. FOXA1 mutations alter pioneering activity, di_erentiation and prostate cancer phenotypes. Nature 571, 408–412 (2019).
OpenUrl CrossRef PubMed
36.↵
Ribatti, D., Tamma, R. & Annese, T. Epithelial-Mesenchymal Transition in Cancer: A Historical Overview. Transl Oncol 13, 100773 (2020).
37.↵
Muz, B., de la Puente, P., Azab, F. & Azab, A.K. The role of hypoxia in cancer progression, angiogenesis, metastasis, and resistance to therapy. Hypoxia (Auckl) 3, 83–92 (2015).
OpenUrl PubMed
38.↵
Enane, F.O., Saunthararajah, Y. & Korc, M. Di_erentiation therapy and the mechanisms that terminate cancer cell proliferation without harming normal cells. Cell Death Dis 9, 912 (2018).
39.↵
Campbell, K.J. & Leung, H.Y. Evasion of cell death: A contributory factor in prostate cancer development and treatment resistance. Cancer Lett 520, 213–221 (2021).
OpenUrl CrossRef PubMed
40.↵
Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat Commun 10, 3583 (2019).
41.↵
Zhou, J. & Troyanskaya, O.G. Predicting e_ects of noncoding variants with deep learning-based sequence model. Nat Methods 12, 931–4 (2015).
OpenUrl CrossRef PubMed
42.↵
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant e_ects on expression and disease risk. Nat Genet 50, 1171–1179 (2018).
OpenUrl CrossRef PubMed
43.↵
He, H.H. et al. Di_erential DNase I hypersensitivity reveals factor-dependent chromatin dynamics. Genome Res 22, 1015–25 (2012).
OpenUrl Abstract/FREE Full Text
44.↵
Orr, B. et al. Identification of stromally expressed molecules in the prostate by tag-profiling of cancer-associated fibroblasts, normal fibroblasts and fetal prostate. Oncogene 31, 1130–42 (2012).
OpenUrl CrossRef PubMed
45.↵
San Martin, R., et al. Chromosome compartmentalization alterations in prostate cancer cell lines model disease progression. J Cell Biol 221(2022).
46.↵
Tsherniak, A. et al. Defining a Cancer Dependency Map. Cell 170, 564–576 e16 (2017).
OpenUrl CrossRef PubMed
47.↵
Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–20 (2014).
OpenUrl CrossRef PubMed Web of Science
48.↵
Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–9 (2012).
OpenUrl CrossRef PubMed Web of Science
49.↵
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
OpenUrl CrossRef PubMed Web of Science
50.↵
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137 (2008).
51.↵
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–30 (2015).
OpenUrl CrossRef PubMed
52.↵
de Santiago, I. et al. BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes. Genome Biol 18, 39 (2017).
53.↵
Bailey, T.L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202–8 (2009).
OpenUrl CrossRef PubMed Web of Science
54.↵
Mathelier, A. et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42, D142–7 (2014).
OpenUrl CrossRef PubMed Web of Science
55.↵
Weirauch, M.T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–43 (2014).
OpenUrl CrossRef PubMed Web of Science
56.↵
Pachkov, M., Erb, I., Molina, N. & van Nimwegen, E. SwissRegulon: a database of genome-wide annotations of regulatory sites. Nucleic Acids Res 35, D127–31 (2007).
OpenUrl CrossRef PubMed Web of Science
57.↵
Kulakovskiy, I.V. et al. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res 44, D116–25 (2016).
OpenUrl CrossRef PubMed
58.↵
Hume, M.A., Barrera, L.A., Gisselbrecht, S.S. & Bulyk, M.L. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res 43, D117–22 (2015).
OpenUrl CrossRef PubMed
59.↵
Das, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016).
OpenUrl CrossRef PubMed

View the discussion thread.

Posted November 15, 2024.

Download PDF

Supplementary Material

Data/Code

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (399)
Allergy and Immunology (708)
Anesthesia (201)
Cardiovascular Medicine (2922)
Dentistry and Oral Medicine (333)
Dermatology (249)
Emergency Medicine (439)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1032)
Epidemiology (12718)
Forensic Medicine (12)
Gastroenterology (827)
Genetic and Genomic Medicine (4570)
Geriatric Medicine (416)
Health Economics (729)
Health Informatics (2913)
Health Policy (1069)
Health Systems and Quality Improvement (1075)
Hematology (387)
HIV/AIDS (923)
Infectious Diseases (except HIV/AIDS) (14090)
Intensive Care and Critical Care Medicine (845)
Medical Education (423)
Medical Ethics (115)
Nephrology (468)
Neurology (4341)
Nursing (235)
Nutrition (637)
Obstetrics and Gynecology (802)
Occupational and Environmental Health (734)
Oncology (2264)
Ophthalmology (643)
Orthopedics (258)
Otolaryngology (324)
Pain Medicine (278)
Palliative Medicine (83)
Pathology (500)
Pediatrics (1196)
Pharmacology and Therapeutics (503)
Primary Care Research (494)
Psychiatry and Clinical Psychology (3743)
Public and Global Health (6922)
Radiology and Imaging (1524)
Rehabilitation Medicine and Physical Therapy (898)
Respiratory Medicine (915)
Rheumatology (437)
Sexual and Reproductive Health (443)
Sports Medicine (383)
Surgery (486)
Toxicology (60)
Transplantation (210)
Urology (179)

[1] 1.↵
Saunders, E.J., Kote-Jarai, Z. & Eeles, R.A. Identification of Germline Genetic Variants that Increase Prostate Cancer Risk and Influence Development of Aggressive Disease. Cancers (Basel) 13(2021).

[2] 2.↵
Conti, D.V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat Genet 53, 65–75 (2021).
OpenUrl CrossRef PubMed

[3] 3.↵
Taitt, H.E. Global Trends and Prostate Cancer: A Review of Incidence, Detection, and Mortality as Influenced by Race, Ethnicity, and Geographic Location. Am J Mens Health 12, 1807–1823 (2018).
OpenUrl CrossRef PubMed

[4] 4.↵
Freedman, M.L. et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A 103, 14068–73 (2006).
OpenUrl Abstract/FREE Full Text

[5] 5.
Haiman, C.A. et al. Multiple regions within 8q24 independently a_ect risk for prostate cancer. Nat Genet 39, 638–44 (2007).
OpenUrl CrossRef PubMed

[6] 6.↵
Pichardo, C.M. et al. West African Genetic Ancestry, Neighborhood Deprivation, and Prostate Cancer. JAMA Netw Open 7, e2433546 (2024).
OpenUrl PubMed

[7] 7.↵
Baca, S.C. et al. Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Nat Genet 54, 1364–1375 (2022).
OpenUrl CrossRef PubMed

[8] 8.↵
Wang, A. et al. Characterizing prostate cancer risk through multi-ancestry genome-wide discovery of 187 novel risk variants. Nat Genet 55, 2065–2074 (2023).
OpenUrl CrossRef PubMed

[9] 9.↵
Cook, M.B. et al. A genome-wide association study of prostate cancer in West African men. Hum Genet 133, 509–21 (2014).
OpenUrl CrossRef PubMed

[10] 10.↵
Schumacher, F.R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat Genet 50, 928–936 (2018).
OpenUrl CrossRef PubMed

[11] 11.↵
Torkamani, A., Wineinger, N.E. & Topol, E.J. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 19, 581–590 (2018).
OpenUrl CrossRef PubMed

[12] 12.
Brown, B.C., Asian Genetic Epidemiology Network Type 2 Diabetes, C., Ye, C.J., Price, A.L. & Zaitlen, N. Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet 99, 76–88 (2016).
OpenUrl

[13] 13.↵
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Nat Genet 54, 573–580 (2022).
OpenUrl CrossRef PubMed

[14] 14.↵
Darst, B.F. et al. Evaluating approaches for constructing polygenic risk scores for prostate cancer in men of African and European ancestry. Am J Hum Genet 110, 1200–1206 (2023).
OpenUrl PubMed

[15] 15.↵
Conti, D.V. et al. Two Novel Susceptibility Loci for Prostate Cancer in Men of African Ancestry. J Natl Cancer Inst 109(2017).

[16] 16.↵
Darst, B.F. et al. A Germline Variant at 8q24 Contributes to Familial Clustering of Prostate Cancer in Men of African Ancestry. Eur Urol 78, 316–320 (2020).
OpenUrl PubMed

[17] 17.↵
Walavalkar, K. et al. A rare variant of African ancestry activates 8q24 lncRNA hub by modulating cancer associated enhancer. Nat Commun 11, 3598 (2020).

[18] 18.↵
Li, S., Hannenhalli, S. & Ovcharenko, I. De novo human brain enhancers created by single-nucleotide mutations. Sci Adv 9, eadd2911 (2023).

[19] 19.↵
Horoszewicz, J.S. et al. LNCaP model of human prostatic carcinoma. Cancer Res 43, 1809–18 (1983).
OpenUrl Abstract/FREE Full Text

[20] 20.↵
Navone, N.M. et al. Establishment of two human prostate cancer cell lines derived from a single bone metastasis. Clin Cancer Res 3, 2493–500 (1997).
OpenUrl Abstract/FREE Full Text

[21] 21.↵
Hazelett, D.J. et al. Comprehensive functional annotation of 77 prostate cancer risk loci. PLoS Genet 10, e1004102 (2014).
OpenUrl CrossRef PubMed

[22] 22.↵
van Arensbergen, J. et al. High-throughput identification of human SNPs a_ecting regulatory element activity. Nat Genet 51, 1160–1169 (2019).
OpenUrl CrossRef PubMed

[23] 23.↵
McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495–501 (2010).
OpenUrl CrossRef PubMed Web of Science

[24] 24.↵
Tanigawa, Y., Dyer, E.S. & Bejerano, G. WhichTF is functionally important in your open chromatin data? PLoS Comput Biol 18, e1010378 (2022).
OpenUrl CrossRef PubMed

[25] 25.↵
Awasthi, S. et al. Comparative Genomics Reveals Distinct Immune-oncologic Pathways in African American Men with Prostate Cancer. Clin Cancer Res 27, 320–329 (2021).
OpenUrl Abstract/FREE Full Text

[26] 26.↵
Jiang, P. et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med 24, 1550–1558 (2018).
OpenUrl CrossRef PubMed

[27] 27.↵
Karimian, K. et al. Human telomere length is chromosome end-specific and conserved across individuals. Science 384, 533–539 (2024).
OpenUrl PubMed

[28] 28.↵
Telomeres Mendelian Randomization, C., et al. Association Between Telomere Length and Risk of Cancer and Non-Neoplastic Diseases: A Mendelian Randomization Study. JAMA Oncol 3, 636–651 (2017).
OpenUrl PubMed

[29] 29.↵
Demanelis, K. et al. Determinants of telomere length across human tissues. Science 369(2020).

[30] 30.↵
Teng, M., Zhou, S., Cai, C., Lupien, M. & He, H.H. Pioneer of prostate cancer: past, present and the future of FOXA1. Protein Cell 12, 29–38 (2021).
OpenUrl CrossRef PubMed

[31] 31.↵
Yuan, H. et al. CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res 47, D900–D908 (2019).
OpenUrl CrossRef PubMed

[32] 32.↵
Kuhn, R.M., Haussler, D. & Kent, W.J. The UCSC genome browser and associated tools. Brief Bioinform 14, 144–61 (2013).
OpenUrl CrossRef PubMed

[33] 33.↵
Song, Y. et al. N-myc downstream regulated gene-1/Cap43 may play an important role in malignant progression of prostate cancer, in its close association with E-cadherin. Hum Pathol 41, 214–22 (2010).
OpenUrl CrossRef PubMed

[34] 34.↵
Joshi, V., Lakhani, S.R. & McCart Reed, A.E. NDRG1 in Cancer: A Suppressor, Promoter, or Both? Cancers (Basel) 14(2022).

[35] 35.↵
Adams, E.J. et al. FOXA1 mutations alter pioneering activity, di_erentiation and prostate cancer phenotypes. Nature 571, 408–412 (2019).
OpenUrl CrossRef PubMed

[36] 36.↵
Ribatti, D., Tamma, R. & Annese, T. Epithelial-Mesenchymal Transition in Cancer: A Historical Overview. Transl Oncol 13, 100773 (2020).

[37] 37.↵
Muz, B., de la Puente, P., Azab, F. & Azab, A.K. The role of hypoxia in cancer progression, angiogenesis, metastasis, and resistance to therapy. Hypoxia (Auckl) 3, 83–92 (2015).
OpenUrl PubMed

[38] 38.↵
Enane, F.O., Saunthararajah, Y. & Korc, M. Di_erentiation therapy and the mechanisms that terminate cancer cell proliferation without harming normal cells. Cell Death Dis 9, 912 (2018).

[39] 39.↵
Campbell, K.J. & Leung, H.Y. Evasion of cell death: A contributory factor in prostate cancer development and treatment resistance. Cancer Lett 520, 213–221 (2021).
OpenUrl CrossRef PubMed

[40] 40.↵
Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat Commun 10, 3583 (2019).

[41] 41.↵
Zhou, J. & Troyanskaya, O.G. Predicting e_ects of noncoding variants with deep learning-based sequence model. Nat Methods 12, 931–4 (2015).
OpenUrl CrossRef PubMed

[42] 42.↵
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant e_ects on expression and disease risk. Nat Genet 50, 1171–1179 (2018).
OpenUrl CrossRef PubMed

[43] 43.↵
He, H.H. et al. Di_erential DNase I hypersensitivity reveals factor-dependent chromatin dynamics. Genome Res 22, 1015–25 (2012).
OpenUrl Abstract/FREE Full Text

[44] 44.↵
Orr, B. et al. Identification of stromally expressed molecules in the prostate by tag-profiling of cancer-associated fibroblasts, normal fibroblasts and fetal prostate. Oncogene 31, 1130–42 (2012).
OpenUrl CrossRef PubMed

[45] 45.↵
San Martin, R., et al. Chromosome compartmentalization alterations in prostate cancer cell lines model disease progression. J Cell Biol 221(2022).

[46] 46.↵
Tsherniak, A. et al. Defining a Cancer Dependency Map. Cell 170, 564–576 e16 (2017).
OpenUrl CrossRef PubMed

[47] 47.↵
Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–20 (2014).
OpenUrl CrossRef PubMed Web of Science

[48] 48.↵
Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–9 (2012).
OpenUrl CrossRef PubMed Web of Science

[49] 49.↵
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
OpenUrl CrossRef PubMed Web of Science

[50] 50.↵
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137 (2008).

[51] 51.↵
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–30 (2015).
OpenUrl CrossRef PubMed

[52] 52.↵
de Santiago, I. et al. BaalChIP: Bayesian analysis of allele-specific transcription factor binding in cancer genomes. Genome Biol 18, 39 (2017).

[53] 53.↵
Bailey, T.L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202–8 (2009).
OpenUrl CrossRef PubMed Web of Science

[54] 54.↵
Mathelier, A. et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42, D142–7 (2014).
OpenUrl CrossRef PubMed Web of Science

[55] 55.↵
Weirauch, M.T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–43 (2014).
OpenUrl CrossRef PubMed Web of Science

[56] 56.↵
Pachkov, M., Erb, I., Molina, N. & van Nimwegen, E. SwissRegulon: a database of genome-wide annotations of regulatory sites. Nucleic Acids Res 35, D127–31 (2007).
OpenUrl CrossRef PubMed Web of Science

[57] 57.↵
Kulakovskiy, I.V. et al. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res 44, D116–25 (2016).
OpenUrl CrossRef PubMed

[58] 58.↵
Hume, M.A., Barrera, L.A., Gisselbrecht, S.S. & Bulyk, M.L. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res 43, D117–22 (2015).
OpenUrl CrossRef PubMed

[59] 59.↵
Das, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016).
OpenUrl CrossRef PubMed

Non-coding genetic variants underlying higher prostate cancer risk in men of African ancestry

Abstract

Introduction

Results

Overview of the study

ProSNP-DL - a deep learning model of PrCa enhancers with single-nucleotide resolution

ProSNP-DL identifies enhancer SNPs dominant in an African ancestral population and associated with PrCa risk and patient survival

The gained and lost activity eSNPs may facilitate carcinogenesis via complementary mechanisms

eSNPs disrupt binding of key prostate transcription factors including the FOX and the HOX families

Polygenic risk score derived from eSNPs can be helpful to differentiate between cases and controls of PrCa

Discussion

Methods

Data Availability

Data Processing

FOXA1 ChIP-seq experiments in two PrCa cell lines

A deep convolutional neural network model of prostate enhancers

Effect size of a SNP

Identification of allelic imbalance in H3K27ac data

Functional enrichment analysis using GREAT and cancerSEA

Survival analysis

Logistic regression

Identification of potential TFBSs associated with the gained and lost eSNPs

Polygenic Risk Score

Data Availability

Supplemental Information

Acknowledgement

References

Citation Manager Formats

Subject Area