Abstract
With many rare tumor types, acquiring the correct diagnosis is a challenging but crucial process in pediatric oncology. Here, we present M&M, a pan-cancer ensemble-based machine learning algorithm tailored towards inclusion of rare tumor types. The RNA-seq based algorithm can classify 52 different tumor types (precision ∼99%, recall ∼80%), plus the underlying 96 tumor subtypes (precision ∼96%, recall ∼70%). For low-confidence classifications, a comparable precision is achieved when including the three highest-scoring labels. M&M’s pan-cancer setup allows for easy clinical implementation, requiring only one classifier for all incoming diagnostic samples, including samples from different tumor stages and treatment statuses. Simultaneously, its performance is comparable to existing tumor- and tissue-specific classifiers. The introduction of an extensive pan-cancer classifier in diagnostics has the potential to increase diagnostic accuracy for many pediatric cancer cases, thereby contributing towards optimal patient survival and quality of life.
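The reporting rule mentioned above (a single label for confident calls, the three highest-scoring labels otherwise) can be illustrated with a minimal sketch. This is not the M&M implementation: the function name `classify`, the use of the vote fraction as a confidence score, and the `min_fraction=0.8` cut-off are assumptions made for illustration only.

```python
from collections import Counter


def classify(votes, min_fraction=0.8, top_k=3):
    """Turn per-model votes from an ensemble into a reported call.

    votes: one predicted label per ensemble member.
    min_fraction: hypothetical confidence cut-off (not taken from the paper).
    top_k: number of labels reported for low-confidence calls.
    """
    if not votes:
        raise ValueError("votes must not be empty")

    counts = Counter(votes)
    total = len(votes)
    # Rank labels by vote count (descending); break ties alphabetically
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    scores = [(label, n / total) for label, n in ranked]

    top_score = scores[0][1]
    if top_score >= min_fraction:
        # Confident call: report only the best label.
        return {"confidence": "high", "labels": [scores[0][0]]}
    # Low-confidence call: report up to top_k highest-scoring labels.
    return {"confidence": "low", "labels": [label for label, _ in scores[:top_k]]}


high = classify(["A"] * 9 + ["B"])
low = classify(["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"])
```

In this sketch, `high` carries a single label because nine of ten models agree, while `low` carries three candidate labels because no label reaches the assumed cut-off.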
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
We gratefully acknowledge that financial support was provided by the Foundation Children Cancer Free (KiKa core funding) and Adessium Foundation.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Biobank and Data Access Committee (BDAC) of the Princess Máxima Center for Pediatric Oncology gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Author Jayne Y. Hehir-Kwa was erroneously omitted from the author list in the PDF version and has now been added. The final models were rerun so that they could be submitted to Zenodo (with no possible link to patient samples). This caused small changes in performance, and the figures were updated accordingly. A link to the Zenodo submission has been added to the manuscript. The word 'malignancy' has been replaced with 'tumors', because the cohort also includes benign tumors. The paragraph 'Obtaining patient material and RNA-seq data' was missing from the Methods section and has now been added.
Data Availability
All data produced are available, or will be made available, online on GitHub, Zenodo and ArrayExpress.