Abstract
Background Because current diagnostic methods for prostate cancer (PCa) have limited specificity, more reliable biomarkers are needed to improve early detection. Plasma proteins represent a promising source of biomarkers; understanding the causal relationships between specific plasma proteins and PCa could therefore help identify novel biomarkers and therapeutic targets for PCa prevention and treatment.
Methods We performed a meta-analysis of two independent genome-wide association studies (GWASs) including 94,397 individuals with PCa and 192,372 controls. Mendelian randomization (MR) analysis, supplemented by colocalization analysis, was conducted using cis-acting variants for 4,907 plasma proteins from deCODE Genetics (N=35,559) and 2,940 plasma proteins from the UK Biobank Pharma Proteomics Project (UKB-PPP) (N=54,219). Biological pathway analysis and druggability evaluation of the risk proteins were then performed.
Results Five possible susceptibility loci (JAZF1, PDLIM5, WDPCP, EEFSEC, and TNS3) for PCa were identified through the meta-analysis of GWASs. Among 3,722 plasma proteins, 193 proteins were associated with PCa risk, of which 20 high-risk proteins, including KLK3, were validated in both the deCODE and UKB-PPP cohorts. Functional annotation of the genes encoding these proteins showed enrichment in immune response, inflammatory response and cell-cell interaction, among other processes. Genetic colocalization and druggable genome analyses also identified several potential drug targets for PCa, such as HSPB1, RRM2B and PSCA.
Conclusions We identified novel variants as well as several protein biomarkers linked to PCa risk and indicated pathways associated with PCa, which offered new insights into PCa etiology and contributed to development of novel biomarkers for early detection and potential therapeutic interventions.
Funding This work was supported by grants from Beijing Municipal Natural Science Foundation (grant No. JQ24059, No. L234038) and National Natural Science Foundation of China (grant No. 82274015).
Graphical abstract A comprehensive study design summarized as follows.
Introduction
Prostate cancer (PCa) is the second most common malignancy, with an estimated 1.5 million new cases in 2022, accounting for 14% of all cancers diagnosed in men worldwide(Ferlay J, 2024). Even more worrying, the number of new cases of PCa is projected to reach 2.9 million by 2040(James et al., 2024). Despite advancements in medical technology, early detection of PCa remains a significant challenge due to the limited specificity of current diagnostic methods, such as prostate-specific antigen (PSA) testing(Carter et al., 2013; Draisma et al., 2009; Oesterling, 1991). Therefore, there is an urgent need for more accurate and reliable biomarkers that can improve early detection and prognostic evaluation of PCa. Plasma proteins, such as interleukin (IL)-6(Deichaite et al., 2022), insulin-like growth factor (IGF)-1 and IGF-binding protein (IGFBP)-1(Cao et al., 2015), have key roles in the development and progression of PCa. However, observational studies exploring the association between plasma proteins and PCa risk are often limited by confounding factors and selection bias, making it challenging to establish a clear causal relationship.
To overcome these limitations, Mendelian randomization (MR) offers a robust methodological approach. This approach relies on the random assortment of genes from parents to offspring and uses genetic variation as an instrumental variable, which mimics the randomization process in a controlled trial, thereby minimizing the impact of reverse causality(Davies, Holmes, & Smith, 2018). With the recent development of proteomics technology, several large-scale proteomic studies have identified over 18,000 protein quantitative trait loci (pQTLs) covering more than 4800 proteins(Ferkingstad et al., 2021; Pietzner et al., 2021). By using MR, researchers can better determine whether particular plasma proteins are causally related to PCa, thereby potentially uncovering novel biomarkers and targets for the prevention and treatment of PCa.
In this study, we performed a meta-analysis of two genome-wide association studies (GWASs) on PCa (PRACTICAL and FinnGen) with a total sample size of 94,397 cases and 192,372 controls. Based on GWAS summary statistics from the deCODE and UKB-PPP cohorts, we further performed a proteome-wide MR (PW-MR) study, supplemented by colocalization analysis, to explore the causal relationship between plasma proteins and PCa risk. Moreover, we identified biological processes and pathways associated with PCa and evaluated the druggability of risk proteins. We aimed to identify novel plasma protein biomarkers for PCa, which could address the limitations of current diagnostic methods and offer new insights into the biological mechanisms of PCa and potential therapeutic targets for intervention.
Results
Meta-analysis of the Genome-Wide Association Studies for Prostate Cancer
We conducted a meta-analysis combining two GWASs with a collective sample size of 94,397 individuals with PCa and 192,372 controls, aiming to identify genetic variants linked to PCa. The associations with PCa and the heterogeneity assessments for SNPs that passed the genome-wide P-value threshold at these loci were presented in Supplementary Table 1. We found 5 genetic risk loci containing at least one SNP that passed the genome-wide significance threshold of P ≤ 5×10−8: JAZF1, PDLIM5, WDPCP, EEFSEC, and TNS3 (Figure 1). Among them, PDLIM5, WDPCP, EEFSEC and TNS3 were promising candidates as novel susceptibility loci associated with PCa. Supplementary Figure 1 displayed the associated quantile-quantile plot. The LocusZoom plots of the top SNPs at JAZF1, PDLIM5, WDPCP, EEFSEC and TNS3, along with their genomic location, GWAS P values and recombination rate with neighboring SNPs, were visualized in Supplementary Figure 2. In summary, this GWAS meta-analysis discovered genetic variations in one recognized PCa-associated locus and four potential novel loci, providing a reliable dataset for MR analyses.
Manhattan plot of PCa GWAS meta-analysis. The genetic regions containing top SNPs related to PCa are depicted. The red dashed line signifies the genome-wide significance threshold of 5.0×10−8.
Cross-phenotype Analysis of Prostate Cancer
We used iCPAGdb to conduct cross-phenotype genetic association analyses with the PCa genome-wide significant SNPs (P ≤ 5×10−8). iCPAGdb offered an improved algorithm for identifying cross-phenotype associations by using pre-computed ancestry-specific LD databases and integrating genetic data from 3793 traits in the NHGRI-EBI GWAS catalog(L. Wang et al., 2020). After Bonferroni correction, the cross-phenotype analysis indicated that 117 traits showed significant association with PCa (Supplementary Table 2). As shown in Figure 2, the top 10 enrichments were for prostate specific antigen measurement (P=7.16×10−43), body height (P=2.18×10−30), BMI-adjusted waist-hip ratio (P=1.32×10−26), high density lipoprotein cholesterol measurement (P=2.38×10−22), systolic blood pressure (P=2.93×10−22), balding measurement (P=3.01×10−22), leukocyte count (P=3.46×10−21), blood protein measurement (P=4.95×10−21), body mass index (P=1.71×10−20) and heel bone mineral density (P=2.02×10−19).
The top 10 significant cross-phenotype associations with PCa at a 5% FDR. The x-axis represents the P value of the correlation. BMI, body mass index.
Proteome-Wide Mendelian Randomization Studies of Prostate Cancer
The genetic association summary statistics of 35,559 Icelanders from deCODE Genetics and 54,219 Europeans from the UKB-PPP were utilized to investigate the relationship between PCa and plasma proteins. Our genetic instrument selection strategy enabled us to examine 1778 and 1944 proteins from deCODE and UKB-PPP, respectively (Supplementary Table 3A and B). Using the Wald ratio or IVW method, a total of 193 unique plasma proteins were significantly associated with PCa after multiple-testing correction at a 5% FDR. The analysis yielded more proteins in UKB-PPP (137) than in deCODE (76) (Figure 3A). The heterogeneity test based on Q statistics showed little evidence of heterogeneity. Furthermore, no significant intercept was detected, implying that no directional pleiotropy was observed.
Result of PW-MR on the associations between plasma proteins and the risk of PCa. (A) Volcano plot of PCa PW-MR study using deCODE (the left side) and UKB-PPP (the right side) cohorts. Annotated proteins passed the 5% FDR IVW P-value threshold. The blue and red colors represent a negative and positive effect, respectively. (B) Venn diagram depicting proteins associated with PCa in deCODE and UKB-PPP. (C) PhenoGram of PCa PW-MR study significant associations. The blue dots and the green dots represent the deCODE and UKB-PPP specific proteins, respectively. The red dot represents both simultaneously.
After FDR correction, 20 proteins were detected in both data sets and the identified associations were consistent (Figure 3B). Genetic prediction indicated that 11 of these 20 proteins were positively associated with the risk of PCa, whereas the remaining 9 proteins were negatively associated, suggesting that these 9 proteins may be protective factors against PCa (Supplementary Table 4). A PhenoGram depicted the chromosomal location of the 193 unique proteins identified in the deCODE and UKB-PPP studies (Figure 3C).
Colocalization analysis
We performed colocalization analyses with PCa for the proteins that were significant in the deCODE and UKB-PPP MR analyses. Four proteins in deCODE and 7 proteins in UKB-PPP colocalized with PCa associations with strong evidence (PPH4 ≥ 0.8) (Table 1), suggesting that these 10 unique plasma proteins might serve as potential targets for treating PCa. Among them, SERPINA3 showed strong colocalization evidence in both the deCODE (PPH4=0.952) and UKB-PPP (PPH4=0.951) studies. This analysis identified 1 causal variant in deCODE (rs61976125) and 1 causal variant in UKB-PPP (rs6575449) (Figure 4).
Colocalization plot of SERPINA3 variants associated with PCa in deCODE and UKB-PPP. Variants are color-coded based on their linkage disequilibrium (LD) with the lead SNP (the variant with the lowest p-value). The lead SNP is highlighted in red. Other variants are colored from blue to yellow, indicating decreasing LD with the lead SNP.
Gene-Based Association and Pathway Analyses
We used the GENE2FUNC tool available in FUMA to explore the biological significance, functional implications and tissue-specific expression of the genes encoding the proteins identified in our PW-MR analysis. These genes appeared to be significantly enriched in inflammatory and immune pathways (such as defense, immune and inflammatory response), as well as cell interaction and signaling pathways (such as interaction between organisms and cell adhesion) (Figure 5A). The KEGG pathway enrichment analysis revealed several significantly enriched pathways, such as cytokine-cytokine receptor interaction, p53 signaling pathway, JAK-STAT signaling pathway, pathways in cancer and apoptosis (Figure 5B). These analyses indicated potential biological processes and mechanisms associated with PCa. The 193 unique genes were predominantly expressed in lymphocytes, blood, liver and prostate (Supplementary Figure 3).
Functional annotation of the genetic architecture of PCa. (A) Biological processes and (B) KEGG pathway analysis of the 193 unique proteins identified in deCODE and UKB-PPP.
Druggability of identified proteins
Exploring new therapeutic opportunities for PCa based on genetic information is crucial for developing targeted treatments. In our study, we examined the druggability of the genes and proteins identified in the MR analyses using the OpenTargets database. We identified 128 unique drugs targeting 45 of the identified proteins (Supplementary Table 5). Among these, three drugs (APATORSEN, TRIAPINE and MK-4721) currently in clinical trials were intended for PCa treatment, targeting HSPB1, RRM2B and PSCA, respectively. Notably, the effects of these drugs on their respective protein targets align with the directions indicated by our MR results, suggesting consistency between genetic evidence and therapeutic potential. In addition, several other identified targets, such as RET, FGFR3, NCAM1, TYMP, TNFRSF10B, MMP3, TACSTD2 and NOTCH2, were implicated in various cancers and present potential therapeutic avenues.
Discussion
In this study, we conducted a comprehensive meta-analysis of two GWASs for PCa, identifying significant genetic loci associated with PCa risk. The combined sample of 94,397 cases and 192,372 controls revealed one known (JAZF1) and four potential novel (PDLIM5, WDPCP, EEFSEC and TNS3) susceptibility loci for PCa. JAZF1 (Juxtaposed with another zinc finger gene 1) is a transcriptional repressor of testicular nuclear receptor 4 (TR4) (Nakajima, Fujino, Nakanishi, Kim, & Jetten, 2004). Several GWASs have indicated that JAZF1 is strongly associated with type 2 diabetes and PCa risk(Frayling, Colhoun, & Florez, 2008; Machiela et al., 2012; Saxena, Voight, Zeggini, Scott, & Genetics, 2008; Thomas et al., 2008). JAZF1 may influence the risk of PCa through its role in metabolic regulation(Rosario et al., 2023) and cellular proliferation(Sung et al., 2018).
PDLIM5 (PDZ and LIM domain 5) is a member of the enigma subfamily and features an N-terminal PDZ domain and three C-terminal LIM domains(X. J. Wang & Su, 2010). One study indicated that downregulating the expression of PDLIM5 could ultimately impede the progression of PCa(Xie et al., 2020). WDPCP, a planar cell polarity (PCP) effector, can regulate PCP by directly modulating the actin cytoskeleton(Cui et al., 2013). Disruption of PCP signaling can cause abnormal cell behavior and tissue structure, leading to the development of cancer(Humphries & Mlodzik, 2018). EEFSEC (eukaryotic elongation factor, selenocysteine-tRNA specific) was identified as a genome-wide significant locus in recent chronic obstructive pulmonary disease GWASs(Hobbs et al., 2017; Wain et al., 2017). TNS3 (Tensin 3) is involved in cell migration, invasion and adhesion processes(Chen et al., 2017; Martuszewska et al., 2009; Zheng et al., 2021; Zuidema et al., 2022), which are critical in cancer metastasis. To ascertain whether these 4 genes are causative factors for PCa, genetic association studies with larger sample sizes will be needed.
In the cross-phenotype analysis of PCa, we found that prostate specific antigen (PSA) measurement, BMI-adjusted waist-hip ratio (WHR), high density lipoprotein cholesterol (HDL) measurement, systolic blood pressure, balding measurement and other traits were significantly associated with PCa. These findings highlighted the multifactorial nature of PCa and underscored the importance of considering a broad range of physiological and biochemical factors in understanding its etiology and progression.
PSA is a protein secreted by both normal and malignant prostate epithelial cells, and its measurement has long been a cornerstone in the early detection and monitoring of PCa(Oesterling, 1991). However, the specificity and sensitivity of PSA as a diagnostic tool have been questioned due to its elevation in benign conditions such as prostatitis and benign prostatic hyperplasia, leading to unnecessary biopsies(Carter et al., 2013) and substantial overdiagnosis(Draisma et al., 2009). Therefore, supplementary biomarkers are urgently needed to improve diagnostic accuracy.
Obesity has emerged as a significant risk factor in cancer development(Petrelli et al., 2021), such as aggressive PCa(Moyad, 2015). The significant association between BMI-adjusted WHR and PCa suggested that central obesity may play a critical role in the development of PCa(Perez-Cornago, Dunneram, Watts, Key, & Travis, 2022).
Central adiposity, primarily referring to visceral adipose tissue, can influence hormone levels and inflammatory processes, thereby contributing to carcinogenesis(Doyle, Donohoe, Lysaght, & Reynolds, 2012). HDL cholesterol is known for its atheroprotective effects(Rohatgi, Westerterp, von Eckardstein, Remaley, & Rye, 2021), mainly due to its ability to promote the reverse transport of cholesterol from peripheral cells to the liver for excretion(Cuchel & Rader, 2006) and to exert antioxidant and anti-inflammatory activities. Through similar mechanisms, HDL may influence the development and progression of tumors, by directly interacting with cancer cells or by modifying the tumor microenvironment(Ossoli, Wolska, Remaley, & Gomaraschi, 2022). Higher HDL levels might confer a protective effect against PCa by influencing cancer cell proliferation, possibly through mechanisms involving antioxidant(Ruscica et al., 2018) and anti-inflammatory properties.
Results from previous studies of the association between hypertension and PCa development have been inconsistent(Christakoudi et al., 2020; Liang et al., 2016; Seretis et al., 2019). One MR analysis suggested that elevated systolic blood pressure might be linked to an increased risk of PCa, potentially due to systemic inflammation(Stikbakke et al., 2022), which contributed to a pro-tumorigenic environment by promoting cellular proliferation, DNA damage, and resistance to apoptosis(Mantovani, Allavena, Sica, & Balkwill, 2008). However, the precise relationship between systolic blood pressure and PCa remains unclear. More research is needed to elucidate the exact biological pathways involved and to determine whether managing hypertension could serve as a preventive strategy for PCa.
The underlying mechanism for the link between androgenic alopecia and an increased risk of PCa may involve androgen metabolism, as both conditions are influenced by dihydrotestosterone (DHT) levels. Elevated DHT can promote the miniaturization of hair follicles, leading to balding(Kaufman, 1996), as well as stimulate prostate cell proliferation(Tong et al., 2022), contributing to the development of PCa. While these associations highlight a potential shared hormonal pathway, further studies are needed to confirm causality. Overall, our cross-phenotype analysis provided a comprehensive view of the diverse factors associated with PCa and enhanced our understanding of the disease’s multifaceted nature. Future research should focus on elucidating the underlying mechanisms of these associations and exploring their potential for integration into clinical practice.
We performed a PW-MR study supplemented by colocalization analysis to explore the causal association of 3,722 plasma proteins with the risk of PCa. MR analysis identified a total of 193 unique proteins significantly associated with PCa, of which 10 proteins had strong support from colocalization. Microseminoprotein-beta (MSMB), also known as PSP94 or beta-inhibin, an immunoglobulin superfamily protein and one of the most abundant proteins secreted by prostate epithelial cells, showed the strongest association(Byrne et al., 2019). We predicted that men with lower blood levels of MSMB have a higher risk of PCa, based on two European ancestry cancer GWASs. These results were consistent with another published prospective study(Haiman et al., 2013), supporting a potential protective role of MSMB in the development of PCa. One previous publication reported that MSMB expression was decreased both in the tumor (especially in more advanced tumors) and in adjacent benign prostate tissue(Bergström, Järemo, Nilsson, Adamo, & Bergh, 2018), which does not conflict with our findings, as it may be related to the aggressiveness of prostate tumors. Although the mechanism of action of MSMB in PCa is unclear, it has been shown to control prostate cell growth by regulating apoptosis(Garde et al., 1999). In addition to our study, several other studies also supported the potential of MSMB as a serum marker for the early detection of PCa(Nam et al., 2006; Reeves, Dulude, Panchal, Daigneault, & Ramnani, 2006). Consequently, further research is essential to explore the functional role and possible clinical utility of MSMB in PCa.
Serine protease inhibitor A3 (SERPINA3) is an inhibitor of serine proteases that promotes tumor development by regulating the transcription of some oncogenes, and its elevation was associated with a worse prognosis in some cancers(De Mezer et al., 2023). SERPINA3 was up-regulated in PCa cells and enhanced cell migration and invasion(Z. S. Xing, Li, Liu, Zhang, & Bai, 2021), which was consistent with our prediction of it as a risk factor in this MR study. One study found that enhanced expression of SERPINA3 stimulated the bone environment; it may therefore serve as a diagnostic biomarker to predict PCa with a bone metastasis phenotype and survival(Ito et al., 2023).
Protease serine 3 (PRSS3), also named mesotrypsin, is reported to be aberrantly expressed in various types of tumors and to participate in the development and progression of cancers. For example, PRSS3 was found to be downregulated in lung cancer(Zhou, Li, & Luo, 2023) but upregulated in pancreatic cancer(Jiang et al., 2010; C. J. Xing et al., 2019), suggesting it may have different roles depending on the cellular or disease context. A study using in vivo and in vitro experiments demonstrated that PRSS3 played an important role in PCa metastasis(Hockla et al., 2012). Despite these findings, further studies are necessary to fully understand the functions and mechanisms of PRSS3 in human PCa.
KLK3 (Kallikrein Related Peptidase 3), also known as PSA, is a serine protease predominantly expressed in the prostate gland. One potential mechanism through which KLK3 may influence PCa development involves its proteolytic activity, which can degrade extracellular matrix components and facilitate tumor invasion and metastasis(Avgeris, Mavridis, & Scorilas, 2012; Lawrence, Lai, & Clements, 2010; Mavridis, Avgeris, & Scorilas, 2014). Additionally, the androgen regulation of KLK3 expression correlates with the androgen receptor signaling pathway, a critical driver of PCa progression, particularly in castration-resistant prostate cancer (CRPC)(Lilja, Ulmert, & Vickers, 2008). Despite these insights, the precise biological pathways through which KLK3 contributes to PCa progression remain incompletely understood. Further research is needed to clarify the direct and indirect effects of KLK3 on PCa cells and the tumor microenvironment.
Lastly, the druggability evaluation identified 45 protein biomarkers that have been targeted by 128 drugs. Among these, three drugs targeting HSPB1, RRM2B and PSCA, respectively, are currently in clinical trials for potential use in PCa treatment. HSPB1, also known as human HSP27, is a small heat-shock protein that functions as a molecular chaperone, protecting cells against stress-induced damage and apoptosis(Okuno, Adachi, Kozawa, Shimizu, & Yasuda, 2016). It was overexpressed in various cancers(Sheng et al., 2017; Shi et al., 2019), including PCa(Shiota et al., 2013; Vasiljevic et al., 2013), and was associated with tumor progression, metastasis and resistance to therapy. Evidence from in vitro and in vivo studies suggested that inhibiting HSP27 suppresses tumor growth and enhances sensitivity to cytotoxic chemotherapy(Hadaschik et al., 2008; Kamada et al., 2007). APATORSEN is an antisense oligonucleotide designed to inhibit HSPB1 expression(Jansen & Zangemeister-Wittke, 2002), which is still in clinical trials.
RRM2B is a subunit of ribonucleotide reductase, an enzyme critical for DNA synthesis and repair(Okumura et al., 2005). Its expression was upregulated in response to DNA damage and was involved in maintaining genomic stability(Aye, Li, Long, & Weiss, 2015; Foskolou et al., 2017). Elevated levels and distinct mutation signatures of RRM2B have been linked to poor prognosis in several cancers(Iqbal et al., 2021). PSCA is a cell surface glycoprotein highly expressed in high-grade PCa(Gu et al., 2000; Zhigang & Wenlv, 2004) and other solid tumors(Teng et al., 2022), and was involved in cell proliferation, adhesion and survival(Li et al., 2017).
This study possessed several strengths, including the use of the largest collection of plasma protein data (covering more than 4800 proteins), large GWAS sample sizes, mutual validation across two independent outcome datasets and the use of colocalization analysis to support the MR results. Additionally, our assessment of the human blood proteome relied on two technologies (SOMAmer and Olink), which were valuable for identifying plasma proteins associated with disease traits such as PCa.
Some limitations of our analysis should be acknowledged. Firstly, this investigation was confined to Europeans, restricting the applicability of our findings to other populations. It is crucial for future studies to identify risk proteins in more diverse populations, especially men of African ancestry, who face a higher risk of PCa. Moreover, our study concentrated on the proteins available in the MR analysis, which likely led to the omission of other potential therapeutic targets. Lastly, we utilized proteomic data from Icelanders, whose genetic backgrounds may differ from those of other European populations, which might introduce bias. However, this potential bias might be minimal, as we found that 20 proteins showing significance (P<0.05) in both the deCODE and UKB-PPP studies had consistent associations with PCa.
In conclusion, we conducted a large-scale PW-MR study using MR and colocalization analyses to investigate the genetic associations of up to 3,722 unique proteins with PCa. We revealed the complex genetic architecture of PCa and identified many new plasma proteins with strong causal associations to PCa. These findings highlighted potential biomarkers for early detection and therapeutic targets, providing a foundation for future research and potential clinical applications.
Methods
Data sources for plasma proteins
We selected cis-SNPs associated with plasma proteins as instrumental variables from two large-scale GWASs in deCODE Genetics(Ferkingstad et al., 2021) and UKB-PPP(Sun et al., 2023). Cis-SNPs were defined as SNPs within ±1 Mb of the gene encoding the protein. From these SNPs, only those with a minor allele frequency of ≥ 1% that reached genome-wide significance (p < 5×10−8) and were considered independent (linkage disequilibrium r2 < 0.1 in 1000G) were retained. deCODE Genetics conducted proteomic profiling on blood plasma samples from 35,559 Icelanders using the SomaScan platform and collected data on 4907 aptamers(Ferkingstad et al., 2021). For the two-sample MR analysis, we selected cis-SNPs as instrumental variables for 1778 proteins. Likewise, cis-SNPs for 1944 plasma proteins were obtained from the UKB-PPP, where 2940 proteins were measured among 54,219 Europeans using the Olink platform(Sun et al., 2023). The proteins with positive MR results were included in the colocalization analysis.
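The instrument selection criteria above can be sketched as a two-step filter followed by a greedy LD pruning step. This is a minimal illustration, not the pipeline used in the study: the `Snp` record, the function names, the use of gene start and end coordinates to define the ±1 Mb window, and the pairwise r2 lookup table are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    chrom: str
    pos: int
    maf: float
    pval: float


def select_cis_candidates(snps, gene_chrom, gene_start, gene_end,
                          window=1_000_000, min_maf=0.01, p_threshold=5e-8):
    """Keep SNPs within +/- window of the gene that pass the MAF and P-value filters."""
    lo = max(0, gene_start - window)
    hi = gene_end + window
    kept = [s for s in snps
            if s.chrom == gene_chrom and lo <= s.pos <= hi
            and s.maf >= min_maf and s.pval < p_threshold]
    # Most significant SNP first, so pruning below keeps the strongest signal.
    return sorted(kept, key=lambda s: s.pval)


def clump(candidates, r2, max_r2=0.1):
    """Greedy pruning: keep a SNP only if its r2 with every kept SNP is below max_r2.

    `r2` maps frozenset({rsid_a, rsid_b}) to pairwise r2 (hypothetical lookup,
    e.g. derived from a reference panel); missing pairs are treated as r2 = 0.
    """
    kept = []
    for snp in candidates:
        if all(r2.get(frozenset((snp.rsid, k.rsid)), 0.0) < max_r2 for k in kept):
            kept.append(snp)
    return kept
```

In practice the pairwise r2 values would come from a reference panel such as 1000G, as stated in the text; here they are supplied by the caller.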
Data sources for Prostate Cancer
GWAS summary statistics for PCa were obtained from the PRACTICAL consortium(Schumacher et al., 2018) and the FinnGen study. The PRACTICAL (Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome) consortium included 79,198 cases and 61,106 controls. We used the data on PCa from the FinnGen study R10 in this analysis, which comprised 15,199 cases and 131,266 controls. All participants of these two cohorts were of European ancestry. In the MR analysis, we treated the PRACTICAL consortium as the discovery study and the FinnGen R10 study as the replication. To increase statistical power, we performed a fixed-effect GWAS meta-analysis of the two GWASs using the METAL package. The quantile-quantile plot was generated using the “qqman” package in R software. The R package “gassocplot” was used to plot regional association plots for the top SNP at each of the genome-wide significant loci identified.
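For a single SNP, a fixed-effect meta-analysis can be sketched as below. METAL offers both sample-size-based and inverse-variance-based weighting, and the text does not state which was used; this sketch assumes inverse-variance weighting of per-study effect sizes, and the function name is illustrative only.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect estimate for one SNP across studies."""
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal P value
    return beta, se, z, p
```

Effect alleles must be aligned across studies before the estimates are combined; that harmonization step is omitted here.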
Cross-phenotype analysis
Through the interactive Cross-Phenotype Analysis of GWAS database (iCPAGdb)(L. Wang et al., 2020), a new platform for cross-phenotype analysis, we explored the genetic correlation between PCa and other traits. We analyzed the genetic correlations between PCa and 3793 traits using data from the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS catalog(L. Wang et al., 2020). iCPAGdb performs cross-phenotype enrichment analyses using ancestry-specific LD association data. It reveals pairwise trait signals and shared signals by analyzing trait associations with LD proxy SNPs. The output reports results from Fisher’s exact test with adjustment at a 5% false discovery rate (FDR) and with Bonferroni correction, as well as the Jaccard, Sorensen, and Chao-Sorensen similarity indexes.
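The statistics reported for a pair of traits can be illustrated from a 2×2 table of LD-independent loci. The sketch below is not iCPAGdb code: the one-sided (over-representation) form of Fisher's exact test, the definition of the locus universe, and all function and argument names are assumptions made for the example.

```python
from math import comb


def fisher_enrichment_p(shared, only_a, only_b, neither):
    """One-sided Fisher's exact P value for observing at least `shared` common loci."""
    n = shared + only_a + only_b + neither   # all loci in the assumed universe
    row = shared + only_a                    # loci associated with trait A
    col = shared + only_b                    # loci associated with trait B
    upper = min(row, col)
    tail = sum(comb(row, k) * comb(n - row, col - k) for k in range(shared, upper + 1))
    return tail / comb(n, col)


def jaccard(shared, only_a, only_b):
    """Shared loci divided by the union of both traits' loci."""
    return shared / (shared + only_a + only_b)


def sorensen(shared, only_a, only_b):
    """Twice the shared loci divided by the sum of both traits' locus counts."""
    return 2 * shared / (2 * shared + only_a + only_b)
```

With 3793 traits tested, a Bonferroni-adjusted threshold would be 0.05 divided by the number of trait pairs evaluated.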
MR analysis
We conducted an MR analysis with plasma proteins as the exposure variable and PCa as the outcome variable. We utilized the R package “TwoSampleMR” in R software (4.3.3) for the MR analysis. When only one SNP was available for a particular protein, we applied the Wald ratio test. Inverse-variance weighted (IVW) was used as the main analysis method when two or more SNPs were available. To account for multiple testing, we employed the 5% False Discovery Rate (FDR) method for P-value correction. A corrected P<0.05 was considered statistically significant. MR-Egger and weighted median methods were used as supplementary analysis methods. Cochran’s Q statistic test using the MR-Egger method was used to determine the heterogeneity between the genetic variants. We also used the MR-Egger regression intercept to detect directional horizontal pleiotropy(Hemani, Bowden, & Smith, 2018); an intercept with P<0.05 was taken as evidence of directional pleiotropy, and the corresponding proteins were removed from further analyses.
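The two main estimators can be written out from summary statistics as follows. This is a simplified sketch rather than the TwoSampleMR implementation: it uses a first-order standard error for the Wald ratio, a fixed-effect IVW estimate, and Cochran's Q computed around the IVW estimate (the text reports Q from the MR-Egger method), and it assumes SNP effects are already harmonized to the same effect allele. All function names are illustrative.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP causal estimate: SNP-outcome effect over SNP-exposure effect."""
    est = beta_out / beta_exp
    se = se_out / abs(beta_exp)        # first-order approximation
    return est, se


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted estimate plus Cochran's Q."""
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    total = sum(weights)
    est = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (r - est) ** 2 for w, r in zip(weights, ratios))
    return est, se, q


def two_sided_p(est, se):
    """Two-sided P value from a normal approximation."""
    return math.erfc(abs(est / se) / math.sqrt(2.0))
```

Under the null of no heterogeneity, Q is compared with a chi-squared distribution with (number of SNPs − 1) degrees of freedom.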
Colocalization analysis
The “coloc” package was used to perform Bayesian colocalization analysis to investigate whether the associations between plasma proteins and PCa were driven by linkage disequilibrium (LD) (Giambartolomei et al., 2014). For proteins with positive MR results, the Bayesian method assessed the support for the following five mutually exclusive hypotheses: 1) no association with either trait; 2) association with trait 1 only; 3) association with trait 2 only; 4) both traits are associated, but with distinct causal variants; and 5) both traits are associated and share the same causal variant(Foley et al., 2021). A posterior probability is provided for each hypothesis (H0, H1, H2, H3, and H4, respectively). In this analysis, we set the prior probability of a SNP being associated with trait 1 only (p1) at 1 × 10−4, the prior probability of a SNP being associated with trait 2 only (p2) at 1 × 10−4, and the prior probability of a SNP being associated with both traits (p12) at 1 × 10−5. Two signals were considered to have strong evidence of colocalization if the posterior probability for a shared causal variant (PPH4) was ≥0.8. The analysis was performed in R software (4.3.3).
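How the five posterior probabilities follow from per-SNP evidence and the three priors can be sketched as below, following the approximate Bayes factor formulation of Giambartolomei et al. (2014). This is an illustrative re-derivation, not the “coloc” source code; the function names are assumptions, and the prior standard deviation passed to `wakefield_labf` must be chosen by the caller.

```python
import math


def log_sum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP (Wakefield's approximation)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] from per-SNP log ABFs."""
    sum1, sum2 = log_sum(labf1), log_sum(labf2)
    sum12 = log_sum([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                        # H0: no association
        math.log(p1) + sum1,                                        # H1: trait 1 only
        math.log(p2) + sum2,                                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(sum1 + sum2, sum12),  # H3: distinct variants
        math.log(p12) + sum12,                                      # H4: shared variant
    ]
    norm = log_sum(log_h)
    return [math.exp(x - norm) for x in log_h]
```

When both traits peak at the same SNP the mass concentrates on H4, whereas peaks at different SNPs shift it to H3, which is the distinction the PPH4 ≥ 0.8 threshold is meant to capture.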
Biological pathway analysis
The GENE2FUNC tool in FUMA(Watanabe, Taskesen, van Bochoven, & Posthuma, 2017) web application programming interface version 1.5.2 was used for functional annotation and enrichment analysis of the genes coding for the 193 unique proteins identified by the PW-MR approach. This analysis involved calculating the log2 fold-change expression for each gene across 54 tissue types from the GTEx (Genotype-Tissue Expression) database. We conducted gene enrichment analysis to identify overrepresented biological processes using the Gene Ontology (GO) database, which helped determine the biological processes most relevant to the genes identified. To investigate the involvement of the identified genes in functional and signaling pathways, we used the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database. Enrichment results were considered significant if they passed the 5% FDR P-value threshold. Additionally, only enrichments involving at least two genes overlapping with the gene set were considered, to ensure robustness and biological relevance.
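The two reporting rules for enrichment results, a 5% FDR threshold and a minimum overlap of two genes, can be sketched as follows. The sketch assumes the Benjamini-Hochberg procedure for the FDR adjustment and assumes that every tested gene set enters the correction before the overlap filter is applied; neither detail is specified in the text, and the function names and input layout are illustrative.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_gene_sets(results, alpha=0.05, min_overlap=2):
    """Filter (name, p_value, n_overlapping_genes) tuples by FDR and overlap size."""
    adjusted = benjamini_hochberg([p for _, p, _ in results])
    return [(name, q) for (name, _, k), q in zip(results, adjusted)
            if q < alpha and k >= min_overlap]
```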
Druggability evaluation
The analysis of drug targets for the significant proteins identified in the MR analysis was conducted using publicly accessible data from OpenTargets(Ochoa et al., 2023) (v.22.11). We selected all drugs for which there was evidence of an association with the protein of interest.
Acknowledgements
The authors would like to thank all the participants and investigators of the studies (PRACTICAL, FinnGen, deCODE, UKB-PPP) for providing the invaluable data used in this research. Their commitment and effort have been essential for the success of this study. Furthermore, special thanks to our colleagues and collaborators for their insightful discussions and feedback throughout the research process. This research was funded by Beijing Municipal Natural Science Foundation (grant No. JQ24059, No. L234038), and National Natural Science Foundation of China (grant No. 82274015).