Abstract
Objective Brain metastases (BM) are associated with poor prognosis and increased mortality, making them a significant clinical challenge. Studying BMs can therefore aid in developing better diagnostic tools for their early detection and monitoring. However, systematic comparisons of the anatomical distributions of BMs from different primary cancers remain largely unavailable.
Methods To test the hypothesis that anatomical BM distributions differ by primary cancer type, we analyze the spatial coordinates of BMs for five primary cancer types along principal component (PC) axes, which capture the largest spread of the data along each of three orthogonal directions. The data are taken from the International Radiosurgery Research Foundation (IRRF); all patients underwent gamma-knife radiosurgery (GKRS) for the treatment of BMs, which are labeled by primary cancer type: Breast, Lung, Melanoma, Renal, and Colon. The dataset consists of six features (sex, age, target volume, and the stereotactic Cartesian coordinates X, Y, and Z) for a total of 3949 intracranial metastases. We use PC coordinates to reduce the dimensionality of the dataset and to highlight differences in the anatomical spread of BMs between cancer types. We apply several machine learning (ML) algorithms, namely Random Forest (RF), Support Vector Machine (SVM), and the TabNet deep learning (DL) model, to establish the relationship between primary cancer diagnosis, spatial coordinates of BMs, age, and target volume.
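As an illustration of the PC-axis analysis described above, the following minimal sketch projects three-dimensional coordinates onto their principal components and reports how strongly PC1 loads on each original axis. It is not the authors' code: the function name `pca_axes` is hypothetical, and the coordinates are synthetic stand-ins generated with the largest spread along Y, not the IRRF data.

```python
import numpy as np

def pca_axes(coords):
    """Return (scores, loadings, explained_ratio) for an (n, 3) coordinate array.

    Hypothetical helper for illustration only.
    """
    # Center the coordinates so the PCs describe spread about the mean.
    centered = coords - coords.mean(axis=0)
    # SVD of the centered data: rows of vt are the PC directions,
    # ordered from largest to smallest spread.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T              # coordinates in PC space
    explained = s**2 / np.sum(s**2)       # fraction of variance per PC
    return scores, vt, explained

rng = np.random.default_rng(0)
# Synthetic X, Y, Z values (assumption: spread is largest along Y, then Z, then X).
coords = rng.normal(loc=100.0, scale=[10.0, 30.0, 20.0], size=(500, 3))

scores, loadings, explained = pca_axes(coords)
print("PC1 loadings on X, Y, Z:", np.round(loadings[0], 2))
print("Explained variance ratio:", np.round(explained, 2))
```

The absolute values of the PC1 loadings indicate which original axis PC1 is most aligned with; the sign of each PC direction is arbitrary.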
Results Our findings show that the first principal component (PC1) aligns most strongly with the Y axis, followed by the Z axis, and has minimal correlation with the X axis. In the PC1 versus PC2 plots, the pairs Breast and Lung cancer and Breast and Renal cancer show the most notable differences in their anatomical spreading patterns, whereas the pairs Renal and Lung cancer and Lung and Melanoma are the most similar. Our ML and DL results indicate high accuracy in distinguishing the distribution of BMs for different primary cancers: the SVM algorithm achieves 97% accuracy with a polynomial kernel, and TabNet achieves 96% accuracy. The RF algorithm ranks PC1 as the most important discriminating feature.
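The kind of classification reported above can be sketched with scikit-learn as follows. This is a hedged illustration, not the study's pipeline: the feature matrix is synthetic, the feature names and all hyperparameters (polynomial degree, number of trees, train/test split) are assumptions, and TabNet is omitted. The accuracies it prints say nothing about the reported 97% and 96% figures.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed feature layout for illustration: three PC coordinates plus age and volume.
feature_names = ["PC1", "PC2", "PC3", "age", "target_volume"]
n_classes, n_per_class = 5, 200  # five primary cancer types; sample size is arbitrary

rng = np.random.default_rng(0)
# Synthetic data: each class is a Gaussian cloud around its own random center.
centers = rng.normal(scale=3.0, size=(n_classes, len(feature_names)))
X = np.vstack([rng.normal(loc=c, size=(n_per_class, len(feature_names)))
               for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Multiclass SVM with a polynomial kernel; features are standardized first.
svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
svm.fit(X_train, y_train)
svm_acc = svm.score(X_test, y_test)

# Random forest, used here to rank features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
importances = dict(zip(feature_names, rf.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)

print("SVM (poly kernel) test accuracy:", round(svm_acc, 3))
print("RF feature ranking:", ranking)
```

On real data, the ranking produced by `feature_importances_` is what would identify a feature such as PC1 as the most discriminating.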
Conclusions Taken together, the results demonstrate that multiclass machine learning models can accurately classify primary cancer type from the distribution of brain metastases.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Partial funding through the USC Norris Comprehensive Cancer Center's Multi-Level Cancer Risk Prediction Models Pilot Project Award, Molecular, Clinical and Neuro-imaging Determinants of Spatiotemporal Pathogenesis of Cancer-Specific Brain Metastases: Data Analysis and Longitudinal Modeling (12/01/2020-11/30/2021), is gratefully acknowledged.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
USC Biomedical IRB waived ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.