Abstract
Robust prognostic and predictive factors for hepatocellular carcinoma, a leading cause of cancer-related deaths worldwide, have not yet been identified. Previous studies have identified potential HCC determinants such as genetic mutations, epigenetic alterations, and pathway dysregulation. However the clinical significance of these molecular alterations remains elusive. MicroRNAs are major regulators of protein expression. MiRNA functions are frequently altered in cancer. In this study, we aimed to explore the prognostic value of differentially expressed miRNAs in HCC and elucidate their associated pathways. To this aim, bioinformatics techniques and clinical dataset analyses were employed to identify differentially expressed miRNAs in HCC compared to normal hepatic tissue. We validated known associations and identified novel miRNAs with potential prognostic significance and proposed new targeting pathways based on our comprehensive analysis.
1. Introduction
Primary liver cancer is the second most common cause of cancer mortality in the world (1). Hepatocellular carcinoma (HCC) is the most common histology identified in primary liver cancers. HCC prognosis is primarily determined by stage, liver function, patient conditions and, indirectly, by HCC aetiology (2). The five-year survival ranges from more than 50% in operable diseases to less than 5% in the most advanced stages (3). This is due to the lack of effective treatments, especially in the setting of high tumour burden, where systemic treatments can lead to a median survival of less than two years. Despite extensive molecular studies which have identified potential determinants among genetic mutations, epigenetic alterations and pathways dysregulation (4), HCC pathogenesis is not entirely elucidated. A better understanding of the molecular processes driving HCC could lead to more effective treatments.
MicroRNAs (miRNAs) are short non-coding RNAs that regulate mRNA translation and stability (5). The human genome encodes approximately 2000 miRNAs, but each of these transcripts can regulate the expression of hundreds of mRNAs. It has been estimated that miRNAs regulate the expression of approximately 60% of the protein-coding genes. The expression of miRNAs is deregulated in many cancers (6), making them attractive biomarkers and therapeutic targets. The role of specific miRNAs in HCC has been studied (7), but there are only limited experiences on comprehensive studies assessing the prognostic function of miRNAs in HCC and the downstream pathways in this malignancy (8), (9), leading to the identification of non-univocal signatures.
In this study, we used a combination of bio-informatic techniques and clinical dataset analyses to identify miRNAs that are differentially expressed in HCC versus normal hepatic tissue and whose expression is associated with HCC prognosis. This approach allowed us to confirm known associations, identify several uncharacterised miRNAs and propose new potential targeting pathways based on our analysis.
2. Material and Methods
2.1 Differential expression analysis
A data-led approach was chosen over a literature-based candidate gene-identifying approach for the sake of analysis comprehensiveness. OncoMir Cancer Database (OMCD) (10) was used to access data on miRNA expression from Liver hepatocellular carcinoma (referred to as LIHC) dataset. LIHC dataset comprises The Cancer Genome Atlas (TGCA) data from 426 samples, of which 375 HCC patients and 51 non-HCC liver tissue. Differential expression between HCC and non-HCC liver tissue was tested via t-test on OMCD portal (Bonferroni-adjusted p values of < 0.05) and miRNAs whose expression in HCC was at least 2 folds different (ratios > 2 or < 0.5) from non-HCC were retained for further analyses.
2.2 Survival analysis
The shortlisted differentially expressed miRNAs were cross-referenced with miRNAs RNA-seq TGCA dataset provided by Kaplan-Meier Plotter (KMP) tool (11). Only RNA-seq data from TGCA samples of patients who had not undergone any other treatment (either radio- or chemotherapy or systemic drugs) than surgery were comprised in the survival analysis. Patients were dichotomized according to each differentially expressed miRNA median values, into high- and low-expression cohorts. Overall survival (OS) was chosen as the prognostic end-point of interest and compared among high- and low-expression patients through log-rank test p-value. Each comparison is presented as median OS (mOS), Hazard Ratio (HR) with 95% Interval of Confidence (95%IC) and p-value.
2.3 Construction of a prognostic miRNA signature
MiRNAs identified in 2.2 bearing a significant prognostic value were used to generate a prognostic signature through miRpower tool in the KMP. “Multiple genes” and “mean gene expression” options were set; patients were dichotomized according to the median value of the signature and the OS of the two cohorts was compared via log-rank test.
2.4 Pathway analysis
Pathway analysis was conducted to explore the mechanisms underpinning the prognostic effects of these differentially expressed miRNAs. This involved the identification of target genes for up- and down- regulated miRNAs, and KEGG pathway analysis using the Over Representation Analysis (ORA) method.
2.4.1 Selection of mature miRNAs
The TCGA miRNA expression data used by the OMCD and KM Plotter databases do not differentiate between mature 3p and 5p miRNAs (12). Both miRNA forms are capable of binding to target mRNAs through a sequence-specific base pairing mechanism and regulate gene expression by interacting with distinct sets of target genes. However, only one mature strand is typically incorporated into the RNA- induced silencing complex (RISC) and involved in regulating mRNA transcription. Thus, for each miRNA, it was necessary to determine whether the 3p or 5p form should be used for the analysis of downstream target genes and pathways. The predominant mature form of each miRNA was selected based on the number of experimental reads in the miRBase database (13) (Table S1, supplementary). The hsa-miR-3653 identifier was subsequently excluded as the available experimental data no longer supports its annotation as a miRNA (13).
2.4.2 Target gene identification
The first step in conducting the pathway analysis was to choose a database that provides miRNA-target interaction data. MirTarBase is a manually curated database of experimentally validated miRNA-target interactions (MTI) collated from reporter assay, western blot and next generation sequencing (NGS) studies (14). The database includes more than 360,000 MTIs and represents the largest and most up to date source of experimentally validated MTI data. MirTarBase was initially selected due its size and rigorous inclusion criteria. An alternative MTI database, miRPathDB was found to include the latest version mirTarBase data and was selected instead. The prognostic mature miRNA identifiers were queried in the mirRPathDB database, retrieving a comprehensive list of target genes. This list was cross validated using the miRTargetLink 2.0 database which provided a second mirror of the most recent mirTarBase data. A Python script was written to remove duplicates and genes with no experimental evidence of miRNA- interaction. Two genes could not be mapped to current Ensembl/Entrez IDs and were excluded from further analysis. The remaining target genes were relevant and aligned with the objective of the analysis.
2.4.3 Functional annotation and enrichment analysis
Functional annotation and enrichment analysis was conducted using the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool. Alternative online functional enrichment tools WebGestalt and g:Profiler were evaluated, but DAVID was selected because it is highly cited, regularly updated, and well documented (15). The target gene lists for up-regulated miRNAs and down-regulated miRNAs were uploaded to DAVID which identified them as being for the Homo Sapiens species. Functional annotation enrichment was performed for targets of upregulated miRNAs and downregulated miRNAs separately, selecting only the KEGG pathway database option. Statistical analysis tested the hypothesis that the overlap between genes in the target gene list and a given pathway or functional category was greater than expected by chance. The statistical test used was Fisher’s exact test and Benjamini-Hochberg correction was applied to p-values to correct for multiple tests. Text-based result files for each gene list were downloaded from DAVID. A Python script was written to filter pathways with a Benjamini-Hochberg significance threshold of ≤ 0.05.
2.4.4 Analysis of enriched KEGG pathways
To focus on the most relevant pathways, KEGG analysis results were filtered to include only pathways that are putatively involved in HCC. Relevant pathways were defined as those listed as “related” in the KEGG HCC Pathway (hsa05225) (16). A Python script was used to filter the KEGG results according to a predefined set of keywords derived from these maps. The keyword list used to filter results was: Hepatocellular, Hepatitis, Alcohol, NAFLD, PI3K, AKT, P53, WNT, MAPK, Cell cycle, calcium signaling, TGF-beta.
2.4.5 Validation of the functional enrichment analysis
A validation of the DAVID functional enrichment analysis was performed using the WebGeSTALT tool (17). While the results from both analyses were largely concordant, WebGeSTALT identified a greater number of enriched pathways, suggesting that the algorithm employed by DAVID was more conservative.
2.4.6 Identifying direct miRNA interactions with cancer driver genes
A previously reported pan-cancer analysis of more than 9000 tumor exomes from 33 TGCA projects identified 299 oncogenic driver genes and 3473 associated driver missense mutations (18). Subsequent functional validation utilizing an independent dataset of experimentally validated mutations indicated that 60-85% of the predicted mutations were likely to be cancer drivers. A list of 299 HCC and pan-cancer associated genes was retrieved from TGCA and a Python script was used to return intersection between the prognostic miRNA target genes and the 299 putative cancer driver genes.
3.0 Results
3.1 Some differentially expressed miRNAs are prognostic in resected HCC
Using OMCD, levels of expression of miRNAs in HCC versus healthy liver tissue were compared. Out of the 106 significantly differentially expressed miRNAs, 59 had a significant, higher than 2-fold differential expression (HCC/healthy liver either >2 or <0.5), compared to normal liver: respectively, 23 were increased and 36 decreased in HCC (Table S2, supplementary). Forty of the differentially expressed miRNAs were also present in the KMP RNA-seq dataset which we have employed for prognostic stratification (21 of which up-regulated and 19 down-regulated in HCC versus healthy liver). Of the 40 shortlisted miRNAs, 10 showed a statistically significant correlation with OS (p for log-rank test <0.05); in particular, 5 among the HCC up-regulated (hsa-miR-501, hsa-miR-877, hsa-miR-1180, hsa-miR-3127, hsa-miR-3677) and 5 among the HCC down-regulated miRNAs (hsa-miR-3653, hsa-miR-99a, hsa-miR-145, hsa-miR-326, hsa-miR-139) (Table 1).
In keeping with our hypothesis, we found that for the majority of miRNAs, expression in HCC versus normal tissue and prognostic significance were highly correlated. All the miRNAs up-regulated in HCC were associated with negative prognosis, 4 out of 5 miRNAs that are down-regulated in HCC were associated with better prognosis. Hsa-miR-326 was the only transcript decreased in HCC showing an unexpected negative prognostic role (HR:1.97(1.37–2.83), p 0.0002).
To increase the robustness of our analysis, a 9-miRNA signature was generated from prognostically significant miRNAs (removing Hsa-miR-326). The signature yielded an HR for OS of 2.15 (1.52 – 3.05), with log-rank p of 0.00005 (Figure 1A), consistent across different clinical sub-groups, as shown in Figure 1B-F.
Taken together, these results suggest that the miRNAs up-regulated in HCC and associated with worse prognosis may be oncogenic, whilst the ones that are down-regulated in HCC and associated with better prognosis may act as tumour suppressors.
3.2 Functional pathways of miRNA-targeted genes
To understand which molecular pathways are modulated by differential miRNA expression in HCC, we used a dataset of validated miRNA/mRNA target pairs. A total of 982 unique target protein-coding genes for the 10 prognostic miRNAs were retrieved from MiRPathDB. Among these, 430 genes were targeted by up-regulated miRNAs, while 585 genes were targeted by down-regulated ones. We then run pathway analyses on the two separate lists of protein coding genes.
Target genes of up-regulated miRNAs were significantly enriched in 7 KEGG pathways (p<0.05), relating mainly to neurodegenerative diseases. After filtering for HCC-specific pathways, cell cycle was the only pathway associated with targets of up-regulated miRNAs (Table 2).
Targets of down-regulated miRNAs were significantly enriched in 93 KEGG pathways relating to numerous cancers, cancer related processes and associated signalling pathways. Ten HCC-specific pathways were significantly enriched (p<0.05), including HCC, Hepatitis B, MAPK signalling pathway, PI3K-Akt signalling pathway, Cell cycle, Hepatitis C, TGF-beta signalling and p53 signalling pathway (Table 3).
A graphical representation of the results of the above mentioned enrichment analyses is presented in Figure 2.
A Comprehensive overview of KEGG HCC-specific pathways can be found in Figure S1.
The process of intersecting the miRNA target genes with the HCC and pan cancer associated genes revealed that the list of 982 miRNA target genes contained 10 HCC driver genes and 11 pan-cancer driver genes (Table 4).
4.0 Discussion
In this study, we have identified a 9-miRNA prognostic signature for HCC. If confirmed in clinical datasets, this signature could enable patient stratification and treatment optimisation, especially for low- and intermediate-risk diseases. At these stages, several therapeutic options are available, but currently these treatments cannot be tailored based on individual molecular profiles. Interestingly, some miRNAs are well detectable in biological fluids, such as plasma (19). If the 9-miRNA signature proves to be detectable in plasma samples in the HCC population, it can be used as a minimally invasive biomarker for patient stratification and dynamic monitoring.
Several experimental studies have dissected the pathogenic role of specific miRNAs in HCC. Our results are in line with mechanistical evidence. Of the five up-regulated/negatively prognostic miRNAs, four have been shown to act as oncogenes in HCC cells: mir-501 by promoting epithelial to mesenchymal transition (20); mir-3127 and mir-1180 by promoting proliferation and tumorigenesis (21), (22); mir-3677 by promoting proliferation and drug resistance (23). Notably, our pathway analysis has identified cell cycle regulation as the main molecular pathway controlled by these miRNAs, further confirming the link between bioinformatic and experimental findings.
Our results on mir-877 are apparently in contrast with previous reports, showing that this transcript inhibits HCC proliferation (24). We found that this miRNA is up-regulated in HCC versus normal tissue, and that higher expression of this miRNA predicts poor prognosis. This implies an oncogenic function. Notably, another study showed that a genetic variant of this miRNA predicts HCC risk (25). It is not known if this genetic variant has a prognostic role in advanced disease settings. It is therefore conceivable that this miRNA could play different roles at different stages of disease progression (e.g. onco-suppressive for early stage, oncogenic for advanced stage), as demonstrated for other cancer-related genes (26). Further studies are therefore warranted to dissect the role of this miRNA at different stages of HCC progression, and to confirm its prognostic role in other clinical datasets.
In parallel with the results on putative oncogenic miRNAs, our results on putative tumour suppressors agree with pre-clinical results. Mir-3653, mir-326 and mir-139 have been shown to inhibit metastasis and progression of HCC cells (27), (28), (29). Similarly, mir-145 has been implicated in G2/M phase arrest by inhibiting cyclin B1 (30). Finally, a recent study has shown that mir-99a is a direct target of the epigenetic silencer EZH2 (31). EZH2 promotes cell proliferation, metastasis and drug resistance in several cancers by silencing the expression of tumour suppressors (32). We have shown that higher EZH2 activity (measured via a liquid biopsy approach) predicts porter prognosis in HCC patients treated with sorafenib (33). Hence our finding on mir-99a paves the way for the combined use of miRNA and epigenetic biomarkers for patient stratification and drug efficacy prediction in HCC.
In conclusion, we have identified for the first time a 9-miRNA signature that is differentially expressed in HCC and robustly predicts prognosis. This signature includes transcripts whose function has been independently validated in pre-clinical studies. Future studies on clinical samples (primary tumours and plasma) are warranted to confirm the clinical usefulness of this potential new biomarker panel.
Data Availability
Data are publicly available from the repositories referenced in the manuscript.
Supporting information
Table S1. Number of experimental reads of the predominant mature form of each miRNA.
Table S2. Fifty-nine differentially expressed miRNAs (2-fold) in HCC vs normal livers according to OMCD.
Footnotes
↵§ co-last authors