Abstract
Purpose Oral Squamous Cell Carcinoma (OSCC) is one of the most prevalent cancers in the world, with the largest number of cases reported from India. The poor survival rate associated with OSCC can be attributed in large part to late presentation, and the non-availability of a biomarker is one of the major reasons for this. Identification of an early diagnostic biomarker that can also be used as a screening tool would help reduce disease morbidity and mortality.
Experimental Design In this article we report Parallel Reaction Monitoring (PRM) based validation of 12 candidate proteins (AZGP1, AHSG, KRT6C, S100A7, S100A9, KLK1, BPIFB2, IGLL5, CORO1A, LACRT, LCN2 and PSAP) in the saliva of Oral Squamous Cell Carcinoma (OSCC) cases (N=50) and healthy controls (N=49). These candidates were initially identified by TMT-tag-based relative quantification of salivary proteins on LC-MS. Heavy-isotope-labelled reference peptides were used to generate calibration curves and to quantify the proteins absolutely, and the resulting data were analysed statistically using R.
Results Salivary AHSG (p=0.0041**) and KRT6C (p=0.002**) were significantly upregulated in OSCC cases, while AZGP1 (p<0.0001***), KLK1 (p=0.006**) and BPIFB2 (p=0.0061**) were significantly downregulated. Multivariate logistic regression modelling yielded a risk prediction model consisting of AZGP1, AHSG and KRT6C (p<0.0001***). The ROC curve for this model had an area under the curve of 82.2%, with a sensitivity of 78% and a specificity of 73.5%. The positive and negative predictive values of the model were 76% and 75%, respectively.
Conclusion We report a potential biomarker panel consisting of proteins AZGP1, AHSG and KRT6C for early diagnosis of OSCC.
1. Introduction
Oral cancer, around 90% of which is of the squamous cell type, is amongst the ten most prevalent cancers in males worldwide (∼0.6 million cases) [1,2], with approximately 32% of cases reported from India alone. In India, it is the most prevalent cancer in males (∼0.2 million cases) [1,2]. India records around 0.1 million new cases and around 0.07 million deaths from the disease per year. Despite advances in treatment strategies over the last two decades, survival remains very poor, which is often associated with late presentation of the disease. The non-availability of a suitable tumor marker could be one of the major contributors to this. Histopathological evaluation of a tumor tissue biopsy, together with radiological investigations, is the currently available diagnostic modality for oral cancer; biopsy is an invasive procedure and is advised only once visible symptoms start to appear. Because the current confirmatory procedure is multistep, prolonged and invasive, it is unsuitable as a screening tool. In this context, a biomarker would be extremely useful for screening and early detection of the disease [3].
One promising approach to identifying potential biomarkers is to analyse cancer-related proteins in bodily fluids. Saliva is a promising biofluid for monitoring general health and diagnosing disease, and its direct contact with the oral cavity makes it an ideal fluid in which to search for oral cancer biomarkers [4,5]. In addition, saliva can be collected non-invasively, which makes a salivary biomarker ideal as a screening tool for oral squamous cell carcinoma (OSCC).
In this article we present the results of validating 12 candidate proteins, identified through Tandem Mass Tag (TMT) based relative quantification of the salivary proteome in OSCC, using parallel reaction monitoring (PRM), a targeted proteomics approach. The candidate proteins were absolutely quantified in saliva, and analysis of the data yielded a potential biomarker panel in the form of a risk prediction model with high sensitivity and specificity. The REporting recommendations for tumour MARKer prognostic studies (REMARK) criteria were followed in reporting the study results [6].
2. Materials and methods
2.1. Subjects
The study was approved by the Institutional Ethics Committee (INT/IEC/2015/273). A prospective case-control study was designed. Patients attending the Department of Radiotherapy and the Department of Otolaryngology at the Post Graduate Institute of Medical Education and Research, Chandigarh (India), who were undergoing surgery and/or receiving standard radio/chemotherapy with curative intent, as decided by disease stage according to the institute's approved clinical protocol, were enrolled in the study. Fifty biopsy-proven OSCC cases and 49 age- and gender-matched healthy volunteers were recruited after obtaining informed consent and applying the inclusion and exclusion criteria (supplementary data). Unstimulated saliva (at least 5 mL) was collected directly into a 50 mL centrifuge tube after at least half an hour of abstinence from any food or fluid, including water. The collected sample was centrifuged at 5000 rpm for 20 minutes at 4 °C, and the supernatant was collected and stored at −80 °C for further analysis. Patients were followed up after completion of treatment until the end of the study or until the event (progressive disease) was recorded.
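A speed given in rpm only fixes the centrifugal force once the rotor radius is known, so the equivalent relative centrifugal force (RCF, in multiples of g) is the quantity that transfers between laboratories. A minimal sketch using the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm; the 10 cm radius below is a hypothetical value, since the rotor used is not reported.

```python
import math

# Standard constant in RCF = K * r_cm * rpm**2.
K = 1.118e-5

def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2

def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF on a rotor of the given radius."""
    return math.sqrt(rcf / (K * radius_cm))

# Hypothetical 10 cm rotor radius (not reported in the study).
print(round(rcf_from_rpm(5000, 10.0)))  # 2795
```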
2.2. Selection of dysregulated proteins as potential candidate for biomarkers
The candidate proteins were selected from preliminary shotgun proteomic data obtained by TMT-tag-based relative quantification of the salivary proteins of OSCC cases on LC-MS (data not presented), in which 135 dysregulated salivary proteins (supplementary table 1) were identified. These proteins were analysed for gene ontology, protein-protein interactions and fold change to select the candidates. With this strategy, 12 highly dysregulated proteins (table 1) that have also been reported to play significant roles in cancer biology were selected for Parallel reaction monitoring (PRM) based absolute quantification by mass spectrometry.
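The fold-change part of the candidate selection can be illustrated with a short sketch. It ranks proteins by the absolute log2 of their case/control TMT ratio, so that a twofold decrease counts as much as a twofold increase. The protein names, ratios and the cut-off `k` are hypothetical, and the gene ontology and protein-protein interaction criteria the authors also applied are not modelled.

```python
import math

def top_by_fold_change(ratios: dict[str, float], k: int) -> list[str]:
    """Return the k proteins with the largest |log2(case/control ratio)|.

    Using log2 makes up- and downregulation symmetric (2.0 and 0.5 both give 1).
    """
    ranked = sorted(ratios, key=lambda p: abs(math.log2(ratios[p])), reverse=True)
    return ranked[:k]

# Hypothetical case/control ratios, not data from the study.
example = {"P1": 2.0, "P2": 0.25, "P3": 1.1, "P4": 0.9}
print(top_by_fold_change(example, 2))  # ['P2', 'P1']
```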
2.3. Standard reference peptides for Parallel Reaction Monitoring (PRM)
Unique quantotypic peptides corresponding to the candidate proteins were chosen following the peptide selection criteria for PRM (supplementary data). Tryptic peptides were purchased in lyophilized form from JPT Peptide Technology (Berlin, Germany) in both light and heavy-labelled versions, in which the C-terminal amino acid (lysine or arginine) was heavy labelled (K* = Lys U-13C6; U-15N2, R* = Arg U-13C6; U-15N4). Peptides were reconstituted according to the manufacturer's instructions to a final concentration of 100 pmol/µL and serially diluted from 256 fmol/µL to 0.5 fmol/µL to obtain ten working standard concentrations.
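The stated endpoints imply a twofold series: halving 256 fmol/µL nine times gives 0.5 fmol/µL, which yields exactly ten standards. The twofold step is inferred from the range and the count rather than stated in the text. A minimal sketch that generates the series and the overall dilution needed to reach the top standard from the 100 pmol/µL stock:

```python
def twofold_series(top: float, n: int) -> list[float]:
    """Concentrations of an n-point twofold serial dilution starting at `top`."""
    return [top / 2 ** i for i in range(n)]

STOCK_FMOL_PER_UL = 100 * 1000  # 100 pmol/µL expressed in fmol/µL

standards = twofold_series(256.0, 10)  # fmol/µL
print(standards)  # [256.0, 128.0, 64.0, 32.0, 16.0, 8.0, 4.0, 2.0, 1.0, 0.5]
print(STOCK_FMOL_PER_UL / standards[0])  # 390.625 (fold dilution of the stock)
```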
2.4. Sample preparation for Parallel Reaction Monitoring
Total protein in the saliva samples was quantified using the Pierce® BCA Protein Assay Kit (#23227, Pierce Biotechnology, Rockford, USA) according to the manufacturer's protocol. For absolute quantification, 50 µg of total protein from each sample was reduced, alkylated and digested with trypsin. Digested samples were desalted using Sep-Pak C18 cartridges (Waters), dried, and reconstituted in 0.1% formic acid at the time of analysis, then spiked with the heavy-labelled peptides at concentrations above the limit of quantification determined from the standard curves. (Details are given in the supplementary data.)
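The text does not state how the limits of detection and quantification were derived from the standard curves. One common convention (ICH Q2) uses the residual standard deviation σ and the slope S of the calibration line: LOD = 3.3σ/S and LOQ = 10σ/S. The sketch below applies that convention to an unweighted least-squares fit and checks whether a planned heavy-peptide spike lies above the LOQ; the convention, the toy data and the spike level are all assumptions for illustration.

```python
import math

def calibration_limits(conc, response):
    """Unweighted least-squares calibration line with ICH Q2-style LOD/LOQ.

    Returns (slope, intercept, lod, loq); lod and loq are in the units of `conc`.
    sigma is the residual standard deviation of the fit (n - 2 degrees of freedom).
    """
    n = len(conc)
    mx = sum(conc) / n
    my = sum(response) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, response))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(conc, response))
    sigma = math.sqrt(ss_res / (n - 2))
    return slope, intercept, 3.3 * sigma / slope, 10 * sigma / slope

# Toy calibration points (hypothetical concentrations and light/heavy area ratios).
slope, intercept, lod, loq = calibration_limits([1, 2, 3, 4], [1, 3, 2, 4])
planned_spike = 20.0  # hypothetical spike level, same units as the concentrations
print(planned_spike > loq)  # True
```

Over a 512-fold range (256 to 0.5 fmol/µL), calibration data are often heteroscedastic and a 1/x-weighted fit is frequently preferred; the unweighted version is kept here for brevity.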
2.5. PRM method: sample acquisition and data analysis
The PRM method was developed using a pool of the reference peptides to achieve good resolution and ion abundance; method development and analysis are described in detail in the supplementary data. Briefly, a 40-minute liquid chromatography method was developed to resolve the peptides, and a two-step mass spectrometry method was set up to analyse the eluting peptides: a full MS scan to identify the precursor masses, followed by targeted MS/MS of the selected precursor ions, recorded on the Orbitrap analyser. The raw files were imported into Skyline to obtain the product-ion transition areas of each peptide precursor. A standard curve generated with the reference peptide pool was used to calculate the limits of detection and quantification, which served as the reference for the amount of heavy peptide spiked into the sample digests (supplementary data). Each peptide was quantified by multiplying the ratio of the light to heavy summed transition areas by the amount of heavy peptide spiked in. Each sample was analysed in triplicate, and the replicate values were averaged for final quantification.
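The quantification step reduces to a ratio: endogenous amount = (Σ light transition areas / Σ heavy transition areas) × amount of heavy peptide spiked, averaged over the triplicate runs. A minimal sketch with hypothetical transition areas (as would be exported from Skyline) and a hypothetical 50 fmol spike:

```python
from statistics import mean

def peptide_amount(light_areas, heavy_areas, heavy_spiked_fmol):
    """Endogenous peptide amount from one run.

    light_areas / heavy_areas: product-ion transition peak areas of the light
    (endogenous) and heavy (spiked) forms of the same precursor.
    """
    return sum(light_areas) / sum(heavy_areas) * heavy_spiked_fmol

def replicate_amount(runs, heavy_spiked_fmol):
    """Mean amount over technical replicates; runs is a list of (light, heavy) area lists."""
    return mean(peptide_amount(light, heavy, heavy_spiked_fmol) for light, heavy in runs)

# Hypothetical triplicate transition areas for one peptide (three transitions each).
runs = [
    ([100, 50, 50], [200, 100, 100]),  # light/heavy = 0.5
    ([130, 60, 50], [200, 100, 100]),  # light/heavy = 0.6
    ([80, 40, 40], [200, 100, 100]),   # light/heavy = 0.4
]
print(replicate_amount(runs, heavy_spiked_fmol=50.0))  # 25.0 (fmol)
```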
2.6. Statistical analysis
R was used for graphical presentation and statistical analysis of the data [7]. The Shapiro–Wilk test was used to check the distribution of the data, and the Wilcoxon rank-sum test was used to compare median protein levels between the two groups. Receiver operating characteristic (ROC) curves were generated to determine the optimum sensitivity, specificity and cut-off level of each protein. Multivariate logistic regression was performed to analyse the combined diagnostic potential of the proteins.
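The Wilcoxon rank-sum test compares the two groups through the ranks of the pooled values, so it does not rely on the normality that the Shapiro–Wilk test checks. The authors used R; the following is an illustrative plain-Python sketch of the large-sample (normal-approximation) form with tie and continuity corrections, not their code. Exact small-sample p-values are not implemented.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U, p), where U = W - n1(n1 + 1)/2 and W is the rank sum of x.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    u = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())  # tie correction term
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    # Continuity correction: shrink |U - E[U]| by 0.5, never below zero.
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / sigma
    p = 2 * (1 - NormalDist().cdf(z))
    return u, p
```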
3. Results
3.1. Patient demography
Of the recruited cases and controls, 80% were male and 20% female. The mean age was 54.6 years for cases and 54 years for controls. Of the cases, 78% (n=39) were diagnosed with late-stage disease (TNM stage III/IV) and 22% (n=11) with early-stage disease (TNM stage I/II). Post-treatment disease status was recorded for the cases: 44 cases could be followed up and 6 were lost to follow-up. Status was recorded as NED (no evidence of disease) for patients with no evidence of disease after treatment completion, and as progressive disease for those whose disease progressed. The median follow-up time was 6 months. Ten cases had no evidence of disease after treatment and 34 had progressive disease (figure 1).
3.2. Five proteins were significantly dysregulated in OSCC cases
The salivary levels of all 12 candidate proteins are listed in table 1. Of the twelve proteins validated, two (AHSG and KRT6C) were significantly upregulated and four (AZGP1, KLK1, BPIFB2 and LACRT) were significantly downregulated (figure 2). LACRT was not detected in all cases and controls and was therefore excluded from further analysis.
The levels of the five significant proteins (AHSG, KRT6C, AZGP1, KLK1 and BPIFB2) were further compared by disease stage (figure 3). AHSG and KRT6C levels were significantly changed in late-stage disease, whereas for BPIFB2 the difference was significant only in early-stage cases. AZGP1 and KLK1 levels differed significantly in both early- and late-stage disease (Supplementary table 2).
3.3. Sensitivity and specificity of the significant proteins
ROC curves were generated using the pROC package in R to assess the sensitivity and specificity of the significantly dysregulated proteins. The area under the curve (AUC) was highest for AZGP1 (72.8%), with a sensitivity of 74% and a specificity of 74.8%. For AHSG, the AUC was 66.8%, with a sensitivity of 60% and a specificity of 69.39%. For KRT6C, the AUC was 64%, with a sensitivity of 60% and a specificity of 63.26%. The AUC was 66% for both KLK1 and BPIFB2; the sensitivity and specificity were 58% and 69.39%, respectively, for KLK1, and 70% and 64.58%, respectively, for BPIFB2 (figure 4).
3.4. AHSG, KRT6C and AZGP1 together form a diagnostic biomarker panel
Univariate logistic regression was applied to assess the association of protein levels with disease status, with case/control status as the dependent variable and protein level as the independent variable. AHSG, KRT6C and AZGP1 were significant in univariate analysis, with beta coefficients of 0.53, 0.184 and -0.01 (p = 0.012*, 0.03* and 0.003**), respectively. The odds ratios (lower bound–upper bound) were 1.7 (1.21–3.21) for AHSG, 1.22 (1.01–1.52) for KRT6C and 0.98 (0.97–0.99) for AZGP1. These significant proteins were then entered into a multivariate logistic regression to obtain a risk prediction model for the diagnosis of OSCC, yielding a model with p < 0.0001***. The receiver operating characteristic (ROC) curve for this model had an area under the curve of 82.2%, with a sensitivity of 78% and a specificity of 73.5% (figure 5A). The model performed slightly better for early-stage OSCC, with an area under the curve of 91% for early-stage and 84% for late-stage disease (figure 5B).
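For a single marker, the empirical AUC is the probability that a randomly chosen case has a higher level than a randomly chosen control (ties counting one half), and a common choice of "optimal" cut-off maximises Youden's index J = sensitivity + specificity - 1. The plain-Python sketch below computes both. For downregulated proteins such as AZGP1, values are negated so that lower levels point towards OSCC. pROC picks the direction automatically by default, and its `coords(..., "best")` uses Youden's index by default; whether the reported cut-offs used these defaults is not stated, and the data below are hypothetical.

```python
def empirical_auc(cases, controls):
    """P(case value > control value), counting ties as 1/2."""
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))

def youden_cutoff(cases, controls):
    """Threshold t maximising sensitivity + specificity - 1 (call 'case' when value >= t).

    Returns (t, sensitivity, specificity); the first threshold reaching the maximum wins.
    """
    best = None
    for t in sorted(set(cases) | set(controls)):
        sens = sum(c >= t for c in cases) / len(cases)
        spec = sum(k < t for k in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]

def negate(values):
    """Flip direction for markers that are lower in cases (e.g. a downregulated protein)."""
    return [-v for v in values]

# Hypothetical levels of a downregulated marker.
cases, controls = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(empirical_auc(negate(cases), negate(controls)))  # 1.0
print(youden_cutoff(negate(cases), negate(controls)))  # (-3.0, 1.0, 1.0): level <= 3 calls a case
```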
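Each odds ratio above is exp(β) for a one-unit increase in the protein level; for example, exp(0.53) ≈ 1.70 for AHSG. The sketch below fits a logistic regression of case/control status on one or more protein levels by Newton-Raphson in plain Python, so it can represent either a univariate model or a multi-protein panel. It is an illustrative re-implementation assuming a standard maximum-likelihood fit, not the authors' R code, and it omits confidence intervals and p-values.

```python
import math

def solve(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    rows: predictor vectors (protein levels); an intercept column is added here.
    y: 1 for an OSCC case, 0 for a control. Returns [b0, b1, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for xi, yi in zip(X, y):
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1 - mu)
            for j in range(d):
                grad[j] += (yi - mu) * xi[j]
                for k in range(d):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def predict_prob(beta, row):
    """Predicted probability of being a case for one set of predictor values."""
    return 1 / (1 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))))

# Toy data: odds of being a case are 1/3 when x = 0 and 3 when x = 1 (odds ratio 9).
beta = fit_logistic([[0]] * 4 + [[1]] * 4, [1, 0, 0, 0, 1, 1, 1, 0])
print(math.exp(beta[1]))  # odds ratio, approximately 9
```

ROC analysis of a panel is then carried out on the fitted probabilities rather than on any single protein level.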
3.5. Survival analysis
Kaplan–Meier analysis was performed to assess the impact of AHSG, KRT6C and AZGP1 levels on the post-treatment disease status of the cases; however, the differences observed were not statistically significant.
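Kaplan–Meier curves estimate the probability of remaining free of the event (here, progressive disease) over follow-up while accounting for censored patients. A minimal plain-Python estimator is sketched below. How patients were grouped by AHSG, KRT6C or AZGP1 level (for example, above versus below a cut-off) is not stated, so any grouping and the toy data shown are assumptions; comparing two curves would additionally require a test such as the log-rank test, which is not implemented here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient (e.g. months).
    events: 1 if the event (progressive disease) was observed, 0 if censored.
    Returns (time, survival) steps at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        # Patients with an event or censored at t leave the risk set afterwards.
        at_risk -= deaths + censored
    return steps

# Hypothetical follow-up (months) and event indicators.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```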
4. Discussion
Successful identification of a biomarker for early diagnosis of OSCC would improve the currently dismal overall outcome of the disease and reduce the associated morbidity. We evaluated salivary proteins as potential biomarkers for early diagnosis of the disease that could also be used to screen high-risk populations.
We used parallel reaction monitoring (PRM), a targeted proteomics approach that is promising for protein quantification and holds great potential for clinical application. In PRM, all transitions are monitored simultaneously, in parallel, which provides enhanced selectivity, reflected in lower limits of detection and quantification [8]. Using this highly sensitive approach, five of the 12 candidate proteins were found to be significantly dysregulated in the saliva of our patients.
AZGP1, BPIFB2 and KLK1 were significantly downregulated in our data; KLK1 and AZGP1 are well studied in cancer, whereas BPIFB2 remains largely unexplored. AZGP1 is an important protein involved in insulin sensitivity and thus plays a role in metabolism, the cell cycle and cancer [9,10]. Given that metabolic reprogramming is an emerging hallmark of cancer, this metabolic role may help explain the expression and function of AZGP1 in cancer. Low AZGP1 mRNA/protein expression is correlated with disease progression and poor survival in pancreatic cancer [11,12], although different studies have reported contrasting expression of AZGP1 in different cancers [13–18]. Ibrahim et al. reported low AZGP1 RNA levels in OSCC tumor tissue from betel quid users [19], which is consistent with our observation at the protein level. The reported role of AZGP1 in suppressing cellular invasion and migration [20–22] suggests that its loss may be associated with poor disease response. However, because AZGP1 levels are reduced in cancer patients, a very sensitive detection method will be required for it to succeed as a tumor marker in clinical practice. BPIFB2 is a member of the lipid transfer/lipopolysaccharide-binding protein family, and its mRNA expression has been reported to be dysregulated in OSCC tumors compared with their normal counterparts [23], in concordance with our observations.
KLK1, a member of the serine protease family, is involved in a number of physiological functions, such as remodelling of the extracellular matrix, cellular proliferation and differentiation, angiogenesis and apoptosis. KLK1 expression has been found to be downregulated in a number of cancers, including head and neck cancers, of which oral cancer is a part [24].
On the other hand, AHSG and KRT6C were significantly upregulated in OSCC. KRT6C, a type II keratin, is expressed in a restricted set of epithelia, such as the filiform papillae of the tongue, the stratified epithelial lining of the oesophagus and oral mucosa, and glandular epithelia [25–27]; with the exception of only a few body sites, its expression is associated with abnormal differentiation or enhanced proliferation, as in wound healing or cancer [28,29]. The Cancer Genome Atlas (TCGA) also reports high KRT6C RNA expression in head and neck cancer.
AHSG is a member of the cystatin superfamily that is expressed in multiple organs during embryogenesis [30], whereas in adult humans its expression is largely limited to the liver and, in some cases, osteoblasts [31]. It is a multifunctional protein [32] that has been associated with various disease conditions [33–36], including cancer [37,38].
However, few studies have reported on the role of salivary AHSG in disease. To the best of our knowledge, this is the first report of salivary AHSG levels in oral cancer. In the current study we observed elevated AHSG levels in OSCC: levels in cases were almost twice those in controls, as measured by three different approaches in our laboratory (global proteomics, targeted proteomics and ELISA).
We observed a significant difference in AHSG and KRT6C levels between controls and late-stage OSCC cases, suggesting a role for these proteins in progression towards a more aggressive disease course. This is supported by reports that AHSG is required for the adhesion, proliferation, migration and invasion of cancer cells, and that KRT6C expression is associated with abnormal and enhanced proliferation.
Since cancer is a multifactorial disease, we further used multivariate logistic regression modelling to compare the combined effect of these proteins with their individual effects. As expected, AHSG, KRT6C and AZGP1 together formed a prediction model that performed better than the individual proteins on average: the average area under the curve, sensitivity and specificity of the individual proteins (66%, 64% and 67%) increased to 82%, 78% and 73%, respectively, for the combined model. This yields a potential biomarker panel for the diagnosis of oral cancer with good accuracy. Multivariable biomarker panels have been reported to perform better in terms of accuracy, sensitivity and specificity, not just for oral cancer but also for other pathological conditions [39,40]. The panel developed in our study showed better diagnostic accuracy for early-stage OSCC, although its diagnostic potential for late-stage OSCC is also considerable. As an outcome of this study, we therefore report a sensitive biomarker panel that could be developed into a multiparameter rapid testing kit to explore its potential in clinical settings.
In future, this prediction model should be validated for accuracy, precision, sensitivity, specificity and positive and negative predictive values in a separate, larger cohort of cases, disease controls and healthy controls, so that its potential value as a biomarker panel in clinical settings can be explored and a rapid detection kit can be developed to facilitate population screening.
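The validation metrics listed above all derive from the 2×2 table of predicted versus true status; precision is the same quantity as PPV. Unlike sensitivity and specificity, PPV and NPV depend on the proportion of cases, so values from a case-control set with roughly equal numbers of cases and controls will not carry over directly to population screening. The sketch below computes the metrics from hypothetical counts and shows the PPV expected at a hypothetical 1% prevalence using the reported sensitivity (78%) and specificity (73.5%).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Metrics from a 2x2 table (tp/fn among cases, tn/fp among controls)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # also called precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV expected in a population with the given disease prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, not the study's confusion matrix.
print(diagnostic_metrics(tp=40, fn=10, tn=35, fp=15))
# Reported sensitivity/specificity at a hypothetical 1% prevalence.
print(round(ppv_at_prevalence(0.78, 0.735, 0.01), 3))  # 0.029
```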
Data Availability
Data will be available upon request from the authors.
Conflict of interest
Declarations of interest
None
Acknowledgement
This work was supported by Department of Science and Technology-Science and Engineering Research Board (DST-SERB), New Delhi, (EMR/2016/003253) and Post Graduate Institute of Medical Education and Research (PGIMER), Chandigarh, India (71/2-Edu-15/128 & 71/2-Edu-16/4844-45). Indian Council of Medical Research (ICMR), New Delhi, India provided fellowship to Anu Jain [3/1/3/JRF-HRD-022 (10519)]. We acknowledge the logistical help of Prof JS Thakur and Dr. Sudhir Bhandari for collection of control samples and the help of Ms. Kriti, Ms. Rajandeep Kaur, Ms. Deeksha Sachdeva and Ms. Anshika Chauhan for collection and processing of control samples. The contribution of Mrs. Poornima Devadhar and Ms. Anagha Kanichery to sample processing is acknowledged.
Footnotes
jainanu1291{at}gmail.com, chinnu.kemmaai{at}gmail.com, rtsushmita{at}gmail.com, drjayabakshi{at}gmail.com, aditixchatterjee{at}gmail.com, keshav{at}yenepoya.edu.in, pal.arnab{at}pgimer.edu.in, Phone no: +91-9530801817, +91-172-2755177