Summary
Head and neck cancer is the sixth most common cancer worldwide and is significantly more prevalent in South Asian countries, including Pakistan. Predicting the pathological stage of a cancer can play a pivotal role in early diagnosis and personalized medicine. This project predicts the stages of head and neck squamous cell carcinoma (HNSCC) using prioritized DNA methylation patterns. DNA methylation profiles for each HNSCC stage (stages I to IV) were used to analyze 485,577 CpG methylation sites and to prioritize them by predictive power, using a wrapper-based feature selection method in combination with several classification models. We identified 68 high-power methylation sites that predicted the pathological stage of HNSCC samples with 90.62% accuracy using a Random Forest classifier. We set out to construct a protein-protein interaction network for the proteins encoded by the 67 genes associated with these sites in order to study its topology. We also carried out GO and KEGG pathway enrichment analysis of the nodes in their immediate neighborhood, which revealed roles in cancer-related pathways, cell differentiation, signal transduction, and metabolic and biosynthetic processes. Using the predictive power of each of the 67 genes in each HNSCC stage, we present a dynamic stage-course network for HNSCC. We also intend to study these genes further using functional datasets from CRISPR, RNAi, and drug screens to assess their putative role in HNSCC initiation and progression.
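To illustrate what wrapper-based feature selection means in this setting, the following is a minimal sketch and not the authors' pipeline (their R code is linked under Data Availability). It assumes synthetic beta values for 12 hypothetical CpG sites instead of the 485,577 TCGA sites, uses greedy forward selection as the wrapper, and swaps in a simple nearest-centroid classifier for the Random Forest so that the example needs only the Python standard library. All function names, the site indices in `INFORMATIVE`, and the parameter values are assumptions made for illustration.

```python
import random
from statistics import mean

INFORMATIVE = (0, 3, 7)  # hypothetical stage-dependent sites in the toy data


def make_data(n_per_stage=15, n_sites=12, seed=0):
    """Simulate beta values in [0, 1] for four stages (encoded 0-3)."""
    rng = random.Random(seed)
    X, y = [], []
    for stage in range(4):
        for _ in range(n_per_stage):
            row = [rng.random() for _ in range(n_sites)]  # uninformative noise
            for j in INFORMATIVE:
                value = 0.2 + 0.2 * stage + rng.gauss(0, 0.05)
                row[j] = min(1.0, max(0.0, value))  # keep within [0, 1]
            X.append(row)
            y.append(stage)
    return X, y


def fit_centroids(X, y, sites):
    """Per-stage mean beta value at each selected site."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in sites]
    return centroids


def predict(centroids, x, sites):
    """Assign the stage whose centroid is closest (squared Euclidean)."""
    def dist(label):
        return sum((c - x[j]) ** 2 for c, j in zip(centroids[label], sites))
    return min(centroids, key=dist)


def cv_accuracy(X, y, sites, k=5):
    """k-fold cross-validated accuracy using only the given sites."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        centroids = fit_centroids([X[i] for i in train],
                                  [y[i] for i in train], sites)
        correct += sum(predict(centroids, X[i], sites) == y[i] for i in test)
    return correct / len(X)


def forward_select(X, y, max_sites=5):
    """Greedy wrapper: add the site that most improves CV accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_sites:
        scores = [(cv_accuracy(X, y, selected + [j]), j) for j in remaining]
        # Highest accuracy wins; ties go to the lowest site index.
        score, site = max(scores, key=lambda s: (s[0], -s[1]))
        if score <= best_score:
            break  # no candidate improves the model, so stop
        selected.append(site)
        remaining.remove(site)
        best_score = score
    return selected, best_score


X, y = make_data()
selected, acc = forward_select(X, y)
print("selected sites:", selected, "CV accuracy:", round(acc, 3))
```

The defining property of a wrapper method is visible in `forward_select`: each candidate subset is scored by training and cross-validating the classifier itself, rather than by a model-independent statistic. At the scale of the study, an exhaustive greedy search of this kind would be expensive, so a practical pipeline would likely pre-filter sites or use a more economical search strategy.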
Competing Interest Statement
The authors have declared no competing interest.
Clinical Trial
No clinical trial was undertaken in this study.
Funding Statement
We are grateful to the Higher Education Commission (HEC) of Pakistan, which, through its National Center for Big Data and Cloud Computing (NCBC), provided funding and support for this project. We also thank CECOS University and Rehman Medical Institute (RMI) for being partners in this unique academia-industry complex at the Precision Medicine Lab (PML).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
No IRB approval was required for this study.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data are publicly available from The Cancer Genome Atlas (TCGA). Case IDs of the randomly selected patients are available in Supporting Information S3. The R code for building the machine learning models is available as Arsalan_Riaz/Rcode_patterns.zip at https://github.com/PML-research/Arsalan_Riaz