Abstract
With advances in cancer screening and treatment, there is a growing population of cancer survivors who may develop subsequent primary cancers. While hereditary cancer syndromes account for only a portion of multiple cancer cases, we sought to explore the role of common genetic variation in susceptibility to multiple primary tumors. We conducted a cross-ancestry genome-wide association study (GWAS) and transcriptome-wide association study (TWAS) of 10,983 individuals with multiple primary cancers, 84,475 individuals with single cancer, and 420,944 cancer-free controls from two large-scale studies.
Our GWAS identified six lead variants across five genomic regions that were significantly associated (P<5×10-8) with the risk of developing multiple primary tumors (overall and invasive) relative to cancer-free controls (at 3q26, 8q24, 10q24, 11q13.3, and 17p13). We also found one variant significantly associated with multiple cancers when comparing to single cancer cases (at 22q13.1). Multi-tissue TWAS detected associations between the development of multiple cancers and genes involved in telomere maintenance in two of these regions (ACTRT3 in 3q26, and SLK and STN1 in 10q24). The TWAS also identified several novel genes associated with multiple cancers, including two immune-related genes, IRF4 and TNFRSF6B. Telomere maintenance and immune dysregulation emerge as central, common pathways influencing susceptibility to multiple cancers. These findings underscore the importance of exploring shared mechanisms in carcinogenesis, offering insights for targeted prevention and intervention strategies.
Introduction
Advances in cancer screening, diagnosis, and treatment have led to earlier detection of cancer and increased survival in cancer patients1,2. Given the substantial worldwide prevalence of cancer and the rising rates of survival, there has been a considerable increase in the number of cancer survivors who face an elevated risk of developing a second primary cancer during their lifetime3–5. Understanding risk factors for multiple primary cancers can have practical implications for managing patients with multiple cancers and may help prioritize screening strategies among cancer survivors. For patients with a history of cancer and prior anticancer treatments, distinguishing between a newly developed metastasis from the initial cancer and a second malignancy can be challenging and is under-studied6. Recognizing these scenarios and conducting appropriate investigations is essential for shaping subsequent therapeutic strategies. Furthermore, the presence of multiple primary cancers can impact a patient’s eligibility for enrollment in clinical research protocols, as individuals with a prior cancer history or concurrent secondary malignancies are typically excluded from clinical trials. Therefore, it is crucial to identify and understand risk factors associated with multiple primary cancers to manage and treat patients effectively.
Increased susceptibility to multiple cancers could be attributed to a variety of factors, such as lifestyle factors, environmental exposures, and genetic factors3,4. Furthermore, treatments such as radiation therapy or chemotherapy, which are often administered for the initial cancer diagnosis, introduce their own set of potential risks associated with the development of subsequent primary cancers3,4. Additionally, hereditary cancer syndromes, accounting for 1% to 2% of all people with cancer diagnoses, are associated with an elevated risk of multiple cancers6. Some studies have shown that individuals with multiple cancers have a higher frequency of deleterious variants in known cancer-risk genes7–11, and others have explored the cross-cancer effects of common germline genetic variants and reported associations with multiple variants in many specific genomic regions, such as 5p15.33, 6p21-22, 8q24, and 9q3412–17. Despite these known syndromes and genetic variants, much of the variability in the occurrence of multiple cancers remains unknown.
In this study, we examine the contribution of common germline genetic variants to the susceptibility of multiple primary cancers from a genome-wide perspective. Using data from two large studies, the UK Biobank (UKB)18 and Kaiser Permanente Northern California Genetic Epidemiology Research in Adult Health and Aging (GERA)19, we conducted a genome-wide association study (GWAS) and a transcriptome-wide association study (TWAS) to better understand the genetic basis and underlying biology associated with the development of multiple primary cancers.
Materials and Methods
Study populations and phenotyping
The study design and analyses are illustrated in Figure 1. The UKB is a population-based cohort comprising approximately 500,000 individuals aged 40-69 years recruited between 2006 and 2010 from various regions across the United Kingdom18. GERA is a prospective cohort consisting of 102,979 adults who are members of the Kaiser Permanente Northern California (KPNC) health plan and who participated in the Research Program on Genes, Environment, and Health (RPGEH). GERA included RPGEH participants who had provided samples for genotyping, had filled out a detailed health and lifestyle survey, and had detailed electronic health records available19.
Schematic figure showing the study design and analytical pipeline of the study. Genome-wide association studies (GWAS) were performed in UK Biobank (UKB) and Kaiser Permanente Northern California Genetic Epidemiology Research in Adult Health and Aging (GERA) within each population grouping defined using a combination of self-reported race/ethnicity and clustering based on genetic ancestry principal components: European (EUR), African (AFR), South Asian (SAS), East Asian (EAS), and Latino (LAT). GWAS results from UKB and GERA were first combined within populations, using inverse-variance-weighted fixed-effects meta-analysis, followed by a cross-ancestry meta-analysis. Next, using GTEx MASHR models from 49 tissues and cross-ancestry meta-analyzed GWAS summary results, we conducted multi-tissue transcriptome-wide association (TWAS) analyses. Primary analysis was performed for multiple cancer cases versus cancer-free controls (top left orange box). We conducted secondary analyses (top boxes with no background color): multiple cancer cases versus single cancer cases, multiple invasive cancer cases versus cancer-free controls, and multiple invasive cancer cases versus single invasive cancer cases. Sample size under the GWAS section corresponds to the number of multiple cancer cases by each population/ancestry group across UKB and GERA cohorts.
Cancer diagnoses and mortality data for UKB and GERA participants were obtained from national registries. Ascertainment of cancer diagnoses has been previously described12,20. Briefly, both studies included prevalent and/or incident diagnoses of invasive, borderline, and in situ primary tumors20. Non-melanoma skin cancer or metastatic cancer, as indicated by specific ICD codes, were not considered primary tumors. Most cancers were classified according to the SEER site recode paradigm21. We incorporated the morphological classifications outlined by the World Health Organization (WHO) for hematologic cancers, which categorized cancers into three major subtypes: lymphoid neoplasms, myeloid neoplasms, and NK- and T-cell neoplasms22.
Multiple cancer cases were defined as individuals with ICD-9 or ICD-10 codes indicating primary tumors at two or more distinct organ sites. Single cancer cases were defined as individuals with ICD-9 or ICD-10 codes indicating a primary tumor at only one organ site. Multiple and single invasive cancer cases were limited to participants with primary invasive cancers and excluded those with borderline and in-situ malignancies. In the UKB, controls were all individuals without a cancer diagnosis at the last follow-up. In the GERA cohort, cancer-free controls at the last follow-up were matched 1:1 to case individuals based on age at specimen collection, sex, genotyping array, and reagent kit.
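The case definitions above can be illustrated with a minimal sketch. It assumes that each participant's qualifying primary diagnoses have already been mapped from ICD-9/ICD-10 codes to organ-site labels with a tumor behavior; the function name, the labels, and the handling of participants who have only borderline or in-situ tumors in the invasive analysis are hypothetical choices for illustration, not the study's actual pipeline.

```python
def classify_participant(diagnoses, invasive_only=False):
    """Classify one participant as 'multiple', 'single', 'control', or 'excluded'.

    diagnoses: list of (site, behavior) tuples, one per primary tumor, where
    behavior is 'invasive', 'borderline', or 'in_situ' (hypothetical labels).
    Non-melanoma skin and metastatic codes are assumed to be removed upstream.
    """
    if not diagnoses:
        return "control"  # no primary cancer diagnosis at last follow-up
    if invasive_only:
        diagnoses = [(site, b) for site, b in diagnoses if b == "invasive"]
        if not diagnoses:
            # Assumption: participants with only borderline or in-situ tumors
            # are neither invasive cases nor cancer-free controls.
            return "excluded"
    # Two or more distinct organ sites define a multiple cancer case.
    distinct_sites = {site for site, _ in diagnoses}
    return "multiple" if len(distinct_sites) >= 2 else "single"


# Illustrative participant: one in-situ breast tumor and one invasive colorectal tumor.
example = [("breast", "in_situ"), ("colorectal", "invasive")]
overall_status = classify_participant(example)
invasive_status = classify_participant(example, invasive_only=True)
```

Under these assumptions the same participant counts as a multiple cancer case in the overall analysis and as a single cancer case in the invasive-only analysis.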
All participants in the study provided informed consent. The UKB obtained ethical approval from the Research Ethics Committee, adhering to the UKB Ethics and Governance Framework. Approval for the original GERA study was granted by the institutional review board of Kaiser Permanente Northern California and the Human Research Protection Program (Committee on Human Research) at the University of California, San Francisco.
Genotyping and quality control
Genotyping and imputation processes for the UKB cohort have been previously described12,20. In brief, participants were genotyped on the UKB Affymetrix Axiom array (89%) or the UK BiLEVE array (11%)18. Imputation was performed using the Haplotype Reference Consortium (HRC), along with merged reference panels from UK10K and 1000 Genomes phase 318. Genetic ancestry principal components (PCs) were computed using fastPCA, based on a set of 407,219 unrelated samples and 147,604 genetic markers18. Population groupings for GWAS in UKB were determined based on self-reported race/ethnicity and genetic ancestry PCs. Individuals for whom either of the first two ancestry PCs fell outside of five standard deviations from the population mean within each self-reported ethnicity group were excluded from the analyses12. Using KING, a robust relationship-inference method, we excluded samples with discordant self-reported and genetic sex and one individual from each pair of first-degree relatives12,23. Using a subset of genotyped autosomal variants with MAF ≥ 1% and call rate ≥ 97%, we excluded samples with heterozygosity >5 standard deviations from the mean18.
For the GERA cohort, genotyping was conducted using one of four Affymetrix Axiom arrays optimized for African, East Asian, European, and Latino racial/ethnic groups24,25, and has been previously described in detail24–26. Briefly, population groupings were determined based on self-reported race/ethnicity and PCs. Imputation was conducted using SHAPE-ITv2.56527 for pre-phasing genotypes, followed by IMPUTE2 v2.3.128 using the 1000 Genomes Project Phase I as a cosmopolitan reference panel28. Ancestry principal components (PCs) were estimated with Eigenstrat v4.2, as has been described29,30.
Analyses in both UKB and GERA were restricted to population groups with at least 50 individuals with multiple cancers. Additional quality control (QC) measures were applied at the variant level. Specifically, variants were excluded for low imputation quality (INFO < 0.30) or low minor allele frequency (MAF<0.01 in the European population group and MAF<0.05 in the African, East Asian, South Asian, and Latino population groups).
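The group-level and variant-level filters described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, and the example counts are hypothetical, and only the thresholds (50 cases, INFO of 0.30, MAF of 0.01 or 0.05) come from the text.

```python
# Minimum MAF per population group, using the Figure 1 abbreviations.
MAF_MIN = {"EUR": 0.01, "AFR": 0.05, "EAS": 0.05, "SAS": 0.05, "LAT": 0.05}
INFO_MIN = 0.30          # imputation quality threshold
MIN_MULTIPLE_CASES = 50  # minimum multiple cancer cases per population group


def eligible_groups(case_counts):
    """Return the population groups with enough multiple cancer cases to analyze."""
    return sorted(g for g, n in case_counts.items() if n >= MIN_MULTIPLE_CASES)


def passes_variant_qc(info, maf, population):
    """Keep a variant unless INFO < 0.30 or MAF is below the group's threshold."""
    return info >= INFO_MIN and maf >= MAF_MIN[population]


# Hypothetical counts and variants, for illustration only.
groups = eligible_groups({"EUR": 9000, "AFR": 120, "EAS": 35})
kept_in_eur = passes_variant_qc(info=0.95, maf=0.02, population="EUR")
kept_in_afr = passes_variant_qc(info=0.95, maf=0.02, population="AFR")
```

The same variant can therefore be retained in the European group and dropped in another group, because the MAF threshold is population-specific.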
Statistical Analysis
The study design and analytical strategy are illustrated in Figure 1. The primary GWAS analysis compared participants with multiple cancer diagnoses to those without cancer in UKB and GERA within each population group defined using a combination of self-reported race/ethnicity and clustering based on genetic ancestry PCs. In the UKB analysis, all models were adjusted for age, sex, the first ten global PCs, and the genotyping array. In the GERA study, all models were adjusted for age, sex, and genotyping array, and 10 PCs within the European group or 6 PCs within each of the other population groups. GWAS results from UKB and GERA were first combined within each population group by inverse-variance-weighted fixed-effects meta-analysis in METAL, followed by cross-ancestry meta-analysis31. Cross-ancestry meta-analysis results were clumped around index variants with the lowest genome-wide significant (P<5×10−8) meta-analysis p-value to identify independent association signals. Variants with linkage disequilibrium (LD) r2 > 0.01 within a ±10-Mb window of the lead variant were assigned to that lead variant.
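For reference, the inverse-variance-weighted fixed-effects estimator used to combine cohort-level results can be written in a few lines. This is a generic sketch of the standard formula, not METAL's code; the function name and the example effect sizes are hypothetical. Each cohort's log odds ratio is weighted by the inverse of its squared standard error.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis of per-cohort effects.

    betas: per-cohort log odds ratios for one variant (same effect allele)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical estimates for one variant in two cohorts within one population group.
beta, se, z, p = ivw_fixed_effects(betas=[0.10, 0.12], ses=[0.02, 0.04])
odds_ratio = math.exp(beta)
```

The same function could be applied a second time to the per-population results to sketch the cross-ancestry step, assuming that step also uses fixed-effects weighting; the text does not state which model was used there.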
Next, we performed a multi-tissue transcriptome-wide association study (TWAS) using the cross-ancestry meta-analyzed summary statistics and S-MultiXcan framework to estimate the association between imputed gene expression and multiple cancers32,33. We used the multivariate adaptive shrinkage prediction models from 49 tissues in GTEx version 834. An advantage of this approach is that it leverages the correlation in cis-eQTL effects across tissues. Genes with P<2.3×10-6 (0.05/22,244 genes Bonferroni correction) were considered statistically significant.
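The Bonferroni threshold quoted above follows from dividing the family-wise error rate by the number of genes tested. A minimal sketch, with hypothetical gene labels and p-values: 0.05/22,244 evaluates to roughly 2.25×10-6, which the text reports as 2.3×10-6.

```python
ALPHA = 0.05
N_GENES_TESTED = 22_244


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def significant_genes(p_values, threshold):
    """Return gene labels whose p-value falls below the threshold, sorted by p-value."""
    hits = [(p, gene) for gene, p in p_values.items() if p < threshold]
    return [gene for _, gene in sorted(hits)]


threshold = bonferroni_threshold(ALPHA, N_GENES_TESTED)

# Hypothetical TWAS p-values, for illustration only.
example_p = {"GENE_A": 6e-10, "GENE_B": 1.8e-6, "GENE_C": 4e-5}
hits = significant_genes(example_p, threshold)
```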
In addition to the primary analyses comparing cases with multiple cancers versus cancer-free controls, secondary GWAS and TWAS analyses were performed comparing: 1) cases with multiple invasive neoplasms (a more restrictive case definition that excluded in-situ tumors) versus cancer-free controls; 2) cases with multiple cancers versus cases with a single cancer (case-case); and 3) cases with multiple invasive cancers versus cases with a single invasive cancer (case-case). Analyses were conducted using Plink2 and R v4.2.2 (R Foundation for Statistical Computing).
Results
Characteristics of the study populations are shown in Table 1. In total, the studies included 10,983 individuals with multiple primary cancers (of which 8,673 had invasive cancers), 84,475 individuals with single cancers (72,686 with single invasive cancers), and 420,944 cancer-free controls. Among all multiple cancer cases, there were a total of 387 distinct cancer pairs. Of these, 47 cancer pairs had at least 50 individuals with the pair of cancers (Figure 2). The top pairs represented were cervix and breast (N=444), followed by prostate and melanoma (N=303), and bladder and prostate (N=302). Stratified by the study cohort, the top pairs in UKB were cervix and breast (N=387), bladder and prostate (N=249), and breast and colorectal (N=218) (Supplementary Figure 1). The top pairs in GERA were prostate and melanoma, with prostate cancer being the first cancer (N=177), melanoma and prostate where melanoma was the first cancer (N=125), and prostate and bladder (N=118) (Supplementary Figure 1).
Circos plot showing the pairs of cancer diagnoses with at least 50 individuals in the UK Biobank and GERA studies combined. Each connection reflects the number of people with both linked primary cancers, where the color of the line shows the first cancer site diagnosed.
Genetic variants associated with multiple cancers
We conducted a cross-ancestry genome-wide association study (GWAS) and meta-analysis of 10,983 individuals with multiple primary tumors and 420,944 cancer-free controls. We refer to this as our primary analysis. Additionally, we conducted a sensitivity analysis restricting to 8,673 individuals with multiple invasive tumors only, excluding in-situ diagnoses. To isolate genetic susceptibility specifically for multiple primary cancers, we carried out additional analyses comparing individuals with multiple cancers to those with any single primary cancer (N=84,475 cancer controls). Parallel analyses were conducted restricting the comparison group to individuals with single primary invasive cancers (N=72,686 single invasive cancer controls) (Figure 1).
Our primary GWAS identified four independent genome-wide significant variants across four genomic regions associated with the risk of developing multiple cancers (Table 2 and Figure 3). Of the four lead variants in Table 2, three were genome-wide significant in European ancestry individuals (rs2293607, rs201581170, and rs612611) and one was genome-wide significant in African ancestry individuals (rs72725854). None were significant in the other population groups (Table 2). In the analyses restricted to 8,673 participants with multiple invasive cancers and 420,944 cancer-free controls, we identified three SNPs (all three in the European population group) across three loci (rs283732, rs9419958, and rs35850753) associated with multiple invasive cancers at P<5×10-8 (Supplementary Table 2, Supplementary Table 3 and Supplementary Figure 2). To further characterize genetic susceptibility to multiple primary cancers and distinguish these signals from general cancer risk loci, we conducted case-case comparisons. None of the variants identified in our case-control analyses reached genome-wide significance when comparing multiple cancers to single-cancer controls. However, we found one genome-wide significant genetic variant in the 22q13.1 region (rs192493667, OR=1.39, P=3.6×10-8) in our analysis of multiple primary cancers versus single-cancer controls. The association for this variant was driven by the European population group. The parallel case-case GWAS restricted to invasive tumors did not identify any genome-wide significant associations.
Manhattan plot (A) highlighting the lead chromosomal regions (P<5×10-8) from the genome-wide association study of multiple cancers (10,983 multiple cancers, 420,944 cancer-free controls). The solid black line signifies the genome-wide significance threshold (P<5×10-8), whereas the dashed black line represents a suggestive significance threshold of P<1×10-6. (B) Locus zoom plots map the genomic location of significant SNPs on chr 3 and 10 to the ACTRT3 and OBFC1 loci, respectively, with the lead SNP annotated.
The lead variant in the 3q26.2 region, rs2293607 (TERC), was associated with an increased risk of multiple cancers (OR=1.11, P=5.8×10-10), and the magnitude of the association was comparable when restricting to invasive tumors (OR=1.10, P=1.5×10-7), but decreased considerably when cancer-free controls were replaced with single cancer controls (OR=1.06, P=0.001). Next, we identified two susceptibility signals in the 10q24.33 region. The first lead variant, rs201581170, was associated with an increased risk of multiple cancers (OR=1.16, P=3.1×10-12; invasive: OR=1.17, P=2.4×10-10) and remained directionally consistent in case-case analyses (OR=1.10, P=4.2×10-5; invasive: OR=1.10, P=3×10-4) (Supplementary Table 1). The second risk variant, rs9419958 (STN1), was detected when comparing multiple invasive cancers to cancer-free controls (OR=1.16, P=1.7×10-11). This SNP is in high LD with rs201581170 (r2=0.83) and also had an association with multiple cancers (OR=1.14, P=7.2×10-12). However, its effect on risk was weakened when cancer-free controls were replaced with individuals diagnosed with a single invasive cancer (OR=1.09, P=1.4×10-4) (Supplementary Table 3).
We found two SNPs in the 8q24.21 region. The first, rs72725854, showed a significant association with a decreased risk of multiple cancers (OR=0.28, P=4.7×10-8). The association remained consistent in analyses restricted to multiple invasive cancers (OR=0.26, P=1.3×10-7) and when compared to single cancer controls (OR=0.33, P=6.4×10-5). Another SNP, rs283732, demonstrated a significant association with a decreased risk of multiple invasive cancers (OR=0.90, P=1.1×10-8). This SNP is not in LD with rs72725854 (r2=0.003). We also found an inverse association between rs612611 (11q13.3) and the risk of multiple cancers (OR=0.90, P=2.3×10-8). The associations for rs283732 and rs612611 were stable across analyses but did not reach significance in all of them (Supplementary Table 1 and Supplementary Table 3).
Lastly, we found a risk signal for multiple invasive cancers associated with rs35850753, an intronic variant in TP53 (OR=1.35, P=1.2×10-8), which exhibited a slightly weaker association when in-situ tumors were also included (OR=1.27, P=4.7×10-7). The magnitude of effect for rs35850753 was slightly attenuated in case-case analyses (OR=1.16, P=0.003; invasive: OR=1.24, P=8.1×10-5). Case-case analysis with single cancer cases as controls showed a genome-wide significant association between rs192493667 in the 22q13.1 region and multiple cancers (OR=1.39, P=3.6×10-8). The association between rs192493667 and an increased risk of multiple cancers was consistent across all analyses.
TWAS identifies genes involved in telomere maintenance and immune dysregulation
We undertook a multi-tissue TWAS to assess the associations between genetically predicted gene expression and the risk of developing multiple cancers, following the same case-control and case-case comparison framework as the GWAS. Of the 22,244 genes tested, statistically significant (Bonferroni correction: P<2.3×10-6) associations were detected for 9 genes across 8 genomic loci (Table 3 and Figure 4). The TWAS recapitulated some of the loci identified in the GWAS, such as ACTRT3 in 3q26 and SLK and STN1 in 10q24 (Figure 3b and Figure 4a). Genetically predicted expression of LMAN2L (2q11, P=5.3×10-7), ACTRT3 (3q26, P=1.8×10-8), IRF4 (6p25, P=1.8×10-6), FAM208B (10p15, P=1.9×10-7), SLK (10q24, P=4.6×10-8), STN1 (10q24, P=1×10-6), SPIRE2 (16q24, P=6.5×10-7), and TNFRSF6B (20q13, P=6×10-10) were associated with risk of developing multiple cancers, and ARFGAP3 (22q13, P=1.9×10-6) was associated with multiple invasive cancers. Supplementary Table 4 reports the associations of these genes with multiple cancers in each tissue where the models were available. The case-case TWAS found four genes associated with multiple cancers: PSMA2, PLOD1, CDH5, and PLEKHM1 (Supplementary Table 5 and Supplementary Figure 3).
Multi-tissue transcriptome-wide association study (TWAS) results for (A) multiple cancers and cancer-free controls and (B) multiple invasive cancers and cancer-free controls. The top panel of the Miami plots depicts the associations for genes with mean Z scores > 0, and the bottom panel shows genes with mean Z scores ≤ 0. The threshold for statistical significance was determined based on the Bonferroni correction for 22,244 genes tested (P< 2.3×10-6, dashed black line), while the suggestive significance threshold was set at P<1×10-4 (dashed blue line).
Statistically significant genes (P<0.05/22,244, highlighted in bold) identified from the multi-tissue transcriptome-wide association study (TWAS) comparing multiple cancer cases to cancer-free controls, and multiple invasive cancer cases to cancer-free controls. The table also presents the associations of these significant genes in additional related analyses.
The signal in the 3q26 region involved a cluster of genes: ACTRT3/APRM1, MYNN, LRRC34, and LRRIQ4, but only ACTRT3 was associated at a Bonferroni P<2.3×10-6. Elevated expression of ACTRT3 was associated with the development of multiple cancers across 15 tissues, with three tissues (cerebellar hemisphere, cerebellum, tibial nerve) showing an inverse association. However, ACTRT3 expression was not significantly associated with risk of multiple cancers (overall and invasive) in either of the case-case analyses.
For IRF4, predicted expression models were found in 33 different tissues, and they showed significant associations with increased cancer susceptibility in 20 of those tissues. Interestingly, all 20 tissues exhibited a positive association, except for sun-exposed lower leg skin. The expression of IRF4 showed a similar trend toward the risk of developing multiple cancers (overall and invasive) across all secondary analyses; however, none were significant after accounting for multiple testing correction. TNFRSF6B expression showed a significant inverse association with susceptibility to multiple cancers across five tissues, including EBV-transformed lymphocyte cells and prostate. In other tissues, the association of increased TNFRSF6B expression with multiple cancers was heterogeneous and not significant. In our multiple invasive cancer and cancer-free controls TWAS analysis, we identified a significant association between increased expression of TNFRSF6B and multiple invasive cancers (P=1.6×10-9). None of the associations were significant when restricted to case-case analyses.
Two genes in the 10q24 region showed association with susceptibility to multiple cancers. We identified an inverse association between elevated expression of SLK and multiple cancers across 31 tissues out of 44 tissues with a model available for this gene. The remaining 13 tissues, including ovary, prostate, and whole blood, showed a positive association between SLK expression and multiple cancers. Results for STN1 were mostly consistent across the tissues, with decreased expression of STN1 associated with susceptibility to multiple cancers except in arterial tissues, neurological tissues, prostate, ovary, and kidney cortex. This inverse association between increased STN1 expression and multiple cancers was significant in cerebellar hemispheres, esophagus muscularis, and thyroid. Expression of SLK (P=2.2×10-7) and STN1 (P=1×10-6) was also significantly associated with multiple invasive cancers. However, the associations between multiple cancers (overall and invasive) and expression of SLK and STN1 were not significant in either of the case-case analyses.
Discussion
Advances in cancer screening, prevention, and treatment have improved survival rates, leading to an increasing number of cancer survivors who may face an elevated risk of developing a second primary cancer3,6. Understanding the potential causes and risk factors of second primary cancers is thus crucial to managing and treating patients effectively. In this study, we sought to investigate the role of germline genetics in the risk of multiple primary cancers through GWAS and TWAS. Our GWAS findings revealed associations of common germline variants in several genomic regions, including 3q26, 8q24, 10q24, and 17p13, with multiple cancers. Previous work has shown that these genomic regions and genetic variants are well-established as cancer susceptibility loci,35–42 and that some of the variants have previously been reported to have pleiotropic effects12–17,20.
Our multi-tissue TWAS recapitulated some of these regions: ACTRT3 (3q26), and SLK and STN1 (10q24). We found an immune-related gene, TNFRSF6B, associated with the risk of developing multiple cancers. Additionally, IRF4 has previously been implicated in individual cancers43–47. However, our findings suggest that IRF4 may have pleiotropic effects, influencing the susceptibility to multiple cancers. Overall, the findings of this study reiterate the complex role of telomere maintenance and immunological pathways in the development of multiple primary tumors. Understanding these genetic contributions could improve risk stratification and inform screening and prevention strategies for individuals at risk of developing second primary cancers.
The GWAS association at rs2293607 in 3q26.2, alongside the TWAS associations, suggests that this region may influence cancer risk through transcriptional regulation, which may lead to pleiotropic effects across different cancer types. Interestingly, rs2293607 has been previously implicated in diverse phenotypes, including longer leukocyte telomere length48,49 and cancer pleiotropy12,13,50. Further exploration of this region revealed that SNPs in strong LD with rs2293607 (r² ≥ 0.90) also demonstrated associations with longer telomere length49,51–54 and were implicated in susceptibility to multiple cancers19,54,55, including colorectal cancer42,53,56, bladder cancer57–59, multiple myeloma60,61, lung adenocarcinoma62, and thyroid cancer36,63. Associated variants within the 3q26 region map to the TERC gene cluster, which includes ACTRT3/APRM1, MYNN, LRRC34, and LRRIQ464. The genetic variants within the TERC gene cluster have been previously implicated in promoting longer telomere length through telomerase over-expression49,51–54,65–67. Genetic predisposition to telomere maintenance is believed to promote cancer development by fostering continuous cell proliferation and the accumulation of mutations. In addition to increasing risk of specific cancers, such as lung68,69, melanoma68,70, and glioma71, our results indicate that genetic mechanisms resulting in longer telomere length may also influence predisposition to multiple primary tumors. Our TWAS detected a Bonferroni-significant positive association between the genetically predicted expression of ACTRT3 within the TERC gene cluster and susceptibility to multiple cancers. While the associations of MYNN, LRRC34, and LRRIQ4 did not meet our significance threshold, we did observe a positive association between LRRC34 and the diagnosis of multiple cancers, along with an inverse association between LRRIQ4 and MYNN and multiple cancer diagnoses.
The associations between each gene and multiple cancer diagnoses varied across tissues, which is consistent with the fact that telomere length varies across tissue types72. Understanding these genetic underpinnings could contribute valuable insights into the molecular pathways involved in cancer development, offering new avenues for targeted interventions and therapeutic strategies.
The 10q24.33 region has shown a notable effect in the context of cancer susceptibility12,20, with multiple SNPs within the locus being associated with an increased risk of both multiple cancers and multiple invasive cancers. The SNP rs201581170 demonstrated a consistent direction of effect (although not reaching genome-wide significance in case-case analyses), suggesting a broader role of this SNP in the development of multiple cancer types, indicative of pleiotropic behavior across malignancies. Furthermore, rs9419958, which is in high LD with rs201581170, was associated with multiple invasive cancers. We believe that these two SNPs are capturing the same signal in relation to multiple cancer susceptibility. These findings are further supported by the TWAS analyses, with two genes, SLK and STN1, showing significant associations with multiple cancers across several tissues. The STN1 gene, also known as OBFC1, encodes a component of the CST (CTC1-STN1-TEN1) complex48,73. This complex is essential for the maintenance of telomere ends, ensuring genomic stability, which is a critical factor in preventing replicative senescence and the onset of malignancies73. The association between telomere length and cancer susceptibility, mediated by variants in OBFC1, is well known40,49,65,70,71. While telomere length may confer risk for cancer in some tissues, the impact may vary across different tissue types due to variations in cellular turnover rates, telomerase activity, and other tissue-specific factors72,74. We also found a SNP, rs35850753, associated with an increased risk of multiple invasive cancers. This intronic SNP is located in the 5’-UTR of TP53, which is known to increase the risk of many cancers, including breast, lung, leukemia, and neuroblastoma75–78.
Our TWAS also identified a positive association between IRF4 expression and multiple primary cancers. This association was consistent across most tissues. This gene has been previously associated with skin cancer43,79,80, lung cancer47, and hematological malignancies44,80,81. IRF4 is a member of the interferon regulatory factor (IRF) family of transcription factors, predominantly expressed in immune cells, where it transduces signals from various receptors to activate or repress gene expression82,83. IRF4 plays a crucial role in regulating various stages of lymphoid, myeloid, and dendritic cell differentiation84. Given the ubiquitous involvement of inflammation in carcinogenesis, IRF4 is an important cancer susceptibility gene because of its role as a regulator of immune response. Experimental studies of non-small cell lung cancer cell lines have demonstrated that overexpression of IRF4 exhibits tumor promoter activity, partially attributable to the activation of the Notch-Akt signaling pathway45. Furthermore, the implications of IRF4 extend beyond experimental settings, as its expression has been identified as a prognostic factor in various malignancies, including hematological malignancies46, lung cancer85, and breast cancer86.
Our cross-tissue TWAS analysis showed an inverse association between TNFRSF6B in 20q13.3 and the risk of multiple cancers. This association was heterogeneous across tissues with most tissues showing an inverse association. However, tissues from colon, breast, spleen, uterus, liver, and ovary showed a positive association. This correlates with previously published studies where elevated expression of the gene has been implicated in diverse tumors, such as colorectal87–90, pancreatic91–94, ovarian95–97, and liver cancers87,98,99. Overexpression of TNFRSF6B promotes the immune evasion of tumor cells and inhibits apoptosis100. Acting as a decoy, TNFRSF6B competitively binds with FasL ligands, expressed on the T cells, impeding their ability to induce apoptosis and eliminate tumor cells100. This contributes to the development of tumors. Additionally, TNFRSF6B influences the differentiation of immune cells such as T lymphocytes, macrophages, and dendritic cells, thereby causing immune dysregulation and promoting tumor angiogenesis100–102. Thus, the dysregulation of the immune system via TNFRSF6B across various tissues emphasizes its potential involvement in tumorigenesis and indicates its possible role in the development of multiple cancers.
We also observed a significant association between expression of the PSMA2 gene in the 7p14 region and the risk of developing multiple cancers in our case-case analyses. Although PSMA2 was not significantly associated with multiple cancers relative to cancer-free controls, the finding from the case-case analysis suggests that this gene may have a role in cancer development. PSMA2 is an essential component of the 20S proteasome and affects proteasomal activity, which is crucial to tumor growth and immune evasion103–107. Elevated PSMA2 expression has been implicated in cancer progression through enhanced cellular proliferation, migration, and invasion across several tumor types, including colorectal, breast, and lung cancers103,105,106.
This is the first study to investigate the role of germline genetics in susceptibility to multiple cancers from a genome-wide perspective. The study has several strengths that contribute to the robustness of our findings. First, the incorporation of two distinct comparison groups, namely cancer-free and single-cancer individuals, allows us to distinguish signals that reflect pleiotropic effects from signals for the independent development of multiple cancers. Second, UKB and GERA are both linked to cancer registries, which provides high-quality ascertainment of incident cancer diagnoses. Third, the inclusion of participants from diverse ancestry groups in the two large cohorts bolsters the generalizability of our results. However, our study also has certain limitations. First, combining multiple primary cancers into a single phenotype does not consider the timing and sequence of specific cancer diagnoses and therefore poses a challenge for inferring the mechanisms underlying the observed associations. Second, we were unable to account for screening behaviors (which may influence cancer prevention, detection, and stage at diagnosis) or for the type of cancer treatment administered after the initial cancer diagnosis, which may be an independent risk factor for developing a second primary tumor. Finally, our power to detect loci with small effect sizes in non-European populations may have been limited by the small sample sizes in these groups. Future research should prioritize larger studies in diverse populations to provide a more comprehensive understanding of the genetic underpinnings of multiple cancers.
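To illustrate why small sample sizes limit power at the genome-wide significance threshold, the following is a minimal sketch of an approximate power calculation for a single variant under a log-additive model. It is not the analysis performed in this study: the function name `gwas_power`, the minor allele frequency, the odds ratio, and the smaller group's sample size are hypothetical values chosen for illustration, and the standard error uses a common small-effect approximation that ignores covariates and meta-analysis across cohorts. Only the overall case and control counts are taken from the abstract.

```python
from math import log, sqrt
from statistics import NormalDist


def gwas_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided Wald test for one variant.

    Assumes a log-additive model and the small-effect approximation
    Var(beta_hat) ~ 1 / (2 * maf * (1 - maf) * N * phi * (1 - phi)),
    where phi is the fraction of cases among N individuals.
    """
    n = n_cases + n_controls
    phi = n_cases / n
    se = 1.0 / sqrt(2.0 * maf * (1.0 - maf) * n * phi * (1.0 - phi))
    ncp = abs(log(odds_ratio)) / se  # expected |z| under the alternative
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection region
    return std.cdf(ncp - z_crit) + std.cdf(-ncp - z_crit)


# Overall sample sizes from the abstract; MAF and odds ratio are hypothetical.
full = gwas_power(n_cases=10_983, n_controls=420_944, maf=0.20, odds_ratio=1.10)

# Hypothetical smaller ancestry group with the same MAF and odds ratio.
small = gwas_power(n_cases=500, n_controls=20_000, maf=0.20, odds_ratio=1.10)

print(f"approximate power, full sample:   {full:.3g}")
print(f"approximate power, smaller group: {small:.3g}")
```

Under these assumptions, the same modest effect that is detectable with moderate power in the full sample has very low power in a group with far fewer cases, which is the limitation described above.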
Overall, our study investigates the genetic etiology of multiple primary cancers, revealing pleiotropic signals that transcend traditional single-cancer associations. The convergence of genetic variants within known cancer pleiotropic regions suggests a shared molecular basis underlying diverse cancer types, including telomere maintenance. Additionally, we identified two immune-related genes, IRF4 and TNFRSF6B, associated with the diagnosis of multiple primary cancers, highlighting the potential contribution of immune dysregulation and pointing to common pathways that influence susceptibility to multiple cancers. These findings emphasize the need for research into the molecular mechanisms underlying the development of multiple cancers to inform potential interventions and targeted therapies.
Data Availability
The UK Biobank is an open-access data resource, and its data are available from the UK Biobank access portal at https://www.ukbiobank.ac.uk. This research was conducted with approved access to UK Biobank data under application number 14105. The Kaiser Permanente Research Bank data are available via dbGaP (phs002809.v1.p1). Results generated from this work are included in the published article or Supplementary Materials. All additional data corresponding to the findings of this study are available within the article and its supplementary information files and from the corresponding author upon reasonable request.
Funding
This work was supported by the National Institutes of Health (grant numbers K07CA188142, K24CA169004, R01CA088164, R01CA201358, R25CA112355, RC2AG036607, and U01CA127298). Support for participant enrollment, survey administration, and biospecimen collection for the Research Program on Genes, Environment, and Health was provided by the Robert Wood Johnson Foundation, the Wayne and Gladys Valley Foundation, the Ellison Medical Foundation, and Kaiser Permanente National and regional community benefit programs. Data from the UK Biobank resource were obtained under application number 14105. LK is supported by funding from the NCI (R00CA246076), and REG is supported by a Young Investigator Award from the Prostate Cancer Foundation.
Declaration of Interests
L.C.S. has received research grant funding from AstraZeneca, awarded directly to her institution, which is unrelated to this work. J.S.W. is a non-employee co-founder of Avail Bio and has served as an expert witness for legal matters unrelated to this work. No disclosures were reported for the other authors.