Abstract
Elucidating crucial driver genes is paramount for understanding cancer origins and mechanisms of progression, as well as for selecting targets for molecular therapy. Cancer genes are usually ranked by mutation frequency, which, however, does not necessarily reflect their driver strength. Here we hypothesize that driver strength is higher for genes that are preferentially mutated in patients with few driver mutations overall, because these few mutations should be strong enough to initiate cancer. We propose a formula to calculate the corresponding Driver Strength Index (DSI), as well as the Normalized Driver Strength Index (NDSI), the latter being completely independent of the overall gene mutation frequency. We validate these indices using the largest database of human cancer mutations, TCGA PanCanAtlas; multiple established algorithms for cancer driver prediction (2020plus, CHASMplus, CompositeDriver, dNdScv, HotMAPS, OncodriveCLUSTL, OncodriveFML); and four custom computational pipelines that integrate driver contributions from SNA, CNA and aneuploidy at patient-level resolution. We demonstrate that DSI, and especially NDSI, rank genes substantially differently from the frequency-based approach. For example, NDSI prioritized members of specific protein families, including the G proteins GNAQ, GNA11 and GNAS, the isocitrate dehydrogenases IDH1 and IDH2, and the fibroblast growth factor receptors FGFR2 and FGFR3. KEGG analysis shows that the top NDSI-ranked genes comprise the EGFR/FGFR2/GNAQ/GNA11 – NRAS/HRAS/KRAS – BRAF pathway, the AKT1 – MTOR pathway, and the TCEB1 – VHL – HIF1A pathway. NDSI does not appear to correlate with the number of protein-protein interactions. We share our software to enable calculation of DSI and NDSI from the output of any third-party driver prediction algorithm or combination of algorithms.
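The abstract does not reproduce the DSI or NDSI formula, so the following is only a hypothetical sketch of the underlying idea, not the authors' method. It assumes each patient's driver genes are weighted by the inverse of that patient's total number of driver genes, so genes mutated in patients with few drivers accumulate more weight. Dividing that weighted sum by the number of carriers gives a per-carrier mean that does not grow merely because a gene is mutated more often, loosely mirroring the stated frequency independence of NDSI. The function name `rank_genes`, the 1/n weighting, and the toy patients and gene labels are all invented for illustration.

```python
from collections import defaultdict


def rank_genes(patients):
    """Hypothetical illustration only; NOT the published DSI/NDSI formula.

    patients: dict mapping patient ID -> set of driver genes in that patient.
    Returns dict gene -> (frequency, weighted_score, normalized_score):
      frequency        number of patients carrying a driver event in the gene
      weighted_score   sum over carriers of 1 / (drivers in that patient)
      normalized_score weighted_score / frequency (mean weight per carrier)
    """
    freq = defaultdict(int)
    weighted = defaultdict(float)
    for genes in patients.values():
        if not genes:
            continue  # patients with no drivers contribute nothing
        weight = 1.0 / len(genes)  # assumption: fewer drivers -> larger share each
        for gene in genes:
            freq[gene] += 1
            weighted[gene] += weight
    return {g: (freq[g], weighted[g], weighted[g] / freq[g]) for g in freq}


# Toy, made-up cohort: GENE_C is the most frequent, but it always co-occurs
# with other drivers, whereas GENE_B is the sole driver in its only carrier.
patients = {
    "P1": {"GENE_A"},
    "P2": {"GENE_B"},
    "P3": {"GENE_C", "GENE_D", "GENE_E", "GENE_F"},
    "P4": {"GENE_C", "GENE_D", "GENE_G"},
    "P5": {"GENE_C", "GENE_A"},
}

scores = rank_genes(patients)
for label, idx in (("frequency", 0), ("weighted", 1), ("normalized", 2)):
    top = max(scores, key=lambda g: scores[g][idx])
    print(f"top gene by {label}: {top}")
```

In this toy cohort the three criteria pick three different genes, which is the kind of divergence from frequency-based ranking that the abstract describes; real analyses would use the authors' shared software rather than this sketch.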
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
AVB was supported by the MIPT 5-100 program.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB approval was not required because a publicly available database was analyzed and no new experiments involving humans were performed.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Added a crucial option to PALDRIC GENE for calculating the overlap of third-party algorithms' driver predictions; optimized the SNADRIF p-value calculation by removing noncoding genes; recalculated all results with improved pipelines and rewrote the manuscript; added the algorithms' packages as supplementary files.
Data Availability
All data are available as supplementary files