Pediatric brain tumor classification using deep learning on MR-images with age fusion ===================================================================================== * Iulian Emil Tampu * Tamara Bianchessi * Ida Blystad * Peter Lundberg * Per Nyman * Anders Eklund * Neda Haj-Hosseini ## ABSTRACT **Purpose** To implement and evaluate deep learning-based methods for the classification of pediatric brain tumors in MR data. **Materials and methods** A subset of the “Children’s Brain Tumor Network” dataset was retrospectively used (n=178 subjects, female=72, male=102, NA=4, age-range [0.01, 36.49] years) with tumor types being low-grade astrocytoma (n=84), ependymoma (n=32), and medulloblastoma (n=62). T1w post-contrast (n=94 subjects), T2w (n=160 subjects), and ADC (n=66 subjects) MR sequences were used separately. Two deep-learning models were trained on transversal slices showing tumor. Joint fusion was implemented to combine image and age data, and two pre-training paradigms were utilized. Model explainability was investigated using gradient-weighted class activation mapping (Grad-CAM), and the learned feature space was visualized using principal component analysis (PCA). **Results** The highest tumor-type classification performance was achieved when using a vision transformer model pre-trained on ImageNet and fine-tuned on ADC images with age fusion (MCC: 0.77 ± 0.14 Accuracy: 0.87 ± 0.08), followed by models trained on T2w (MCC: 0.58 ± 0.11, Accuracy: 0.73 ± 0.08) and T1w post-contrast (MCC: 0.41 ± 0.11, Accuracy: 0.62 ± 0.08) data. Age fusion marginally improved the model’s performance. Both model architectures performed similarly across the experiments, with no differences between the pre-training strategies. Grad-CAMs showed that the models’ attention focused on the brain region. PCA of the feature space showed greater separation of the tumor-type clusters when using contrastive pre-training. **Conclusion** Classification of pediatric brain tumors on MR-images could be accomplished using deep learning, with the top-performing model being trained on ADC data, which is used by radiologists for the clinical classification of these tumors. **Key points** * The vision transformer model pre-trained on ImageNet and fine-tuned on ADC data with age fusion achieved the highest performance, which was significantly better than models trained on T2w (second-best) and T1w-Gd data. * Fusion of age information with the image data marginally improved classification, and model architecture (ResNet50 -vs -ViT) and pre-training strategies (supervised -vs -self-supervised) did not show to significantly impact models’ performance. * Model explainability, by means of class activation mapping and principal component analysis of the learned feature space, show that the models use the tumor region information for classification and that the tumor type clusters are better separated when using age information. **Summary** Deep learning-based classification of pediatric brain tumors can be achieved using single-sequence pre-operative MR data, showing the potential of automated decision support tools that can aid radiologists in the primary diagnosis of these tumors. Keywords * MRI * pediatric brain tumor * data fusion * age * deep learning ## Introduction Tumors in the central nervous system are the second most common type of cancer in children and young adults up to the age of 19, with an estimated age-standardized rate (in 100,000 population) of 1.2 for incidence and 0.60 for mortality worldwide1, where brain tumors account for about 57% of the total causes of cancer deaths in this population2. Pediatric brain tumors (PBT) can be grouped concerning the location relative to the tentorium, as infratentorial or supratentorial. Tumors in the infratentorial brain region (posterior fossa) are more common in pediatric patients; however, the frequency varies depending on age3–5. Brain tumor treatment procedures are usually complicated where tumor detection and preliminary diagnosis are based on magnetic resonance images (MRI), and treatment planning also uses histopathological and molecular analysis of the tissue sample6. Diagnosis by radiologists, when comparing the first MRI diagnosis to the final histology diagnosis, varies greatly among tumor types and locations, with an overall sensitivity of 72% for broad tumor type classification (range 0-100%), which shows the need for computational methods to improve qualitative assessments7. Deep learning algorithms have been successfully applied to several medical image-related tasks and can be trained to assist radiologists in diagnosing brain tumors based on MR. Even though deep learning methods have led to reasonable advancements in adult brain tumor detection, classification, and segmentation8–10, their implementation in pediatric cases has been limited11,12 mainly due to the lack of large and standardized open access datasets13,14. Deep learning models trained on MR-images from adults will not perform well on images from children, since PBTs have different diagnostic properties. The “Children’s Brain Tumor Network” (CBTN)15,16 is one of the largest PBT datasets, and could potentially be used in the future similarly to the adult brain tumor segmentation challenge (BraTS)17–19, as a standard and reference dataset for development and comparison of deep learning methods. This study is, to the best of our knowledge, the first report on the implementation of deep learning on the MR dataset from CBTN for brain tumor classification, and also one of few hitherto published MR-based deep learning studies on any brain tumor pediatric dataset. This exploratory study aimed to investigate deep learning-based methods for the classification of pediatric brain tumors, considering different pre-operative MRI sequences, and fusing age and image information. A convolutional neural network (ResNet50) and a vision transformer (ViT) were implemented and evaluated, exploring three pre-training strategies, and investigating model explainability by visualization of activation maps and the learned feature space. ## Material and Methods ### Dataset cohort In this retrospective study, the dataset was obtained upon application to and approval from CBTN15 (accessed in 2021). The downloaded dataset contained 326 subjects, with tumor type information available for 273 subjects (females=153, males=116, not-available=4, age-range [0.01, 36.49] years). Patients older than 18 years (n=3) were included in the dataset given the pediatric tumor type diagnosis. The tumor types available were low-grade astrocytoma (ASTR) (n=132), medulloblastoma (MB) (n=67), ependymoma (EP) (n=45), atypical teratoid rhabdoid tumor (ATRT) (n=20), diffuse intrinsic pontine glioma (DIPG) (n=6), ganglioglioma (n=1), germinoma (n=1), and teratoma (n=1). Due to the limited number of subjects for DIPG, ganglioglioma, germinoma, and teratoma categories, these tumor types were excluded from the subsequent analysis. From the remaining tumor type groups, T1w-Gd, T2w MR sequences and diffusion-weighted (DW-MR) data were collected and used in the analysis. ### Data selection and exclusion An automated selection based on the image quality followed by a visual assessment was performed. Quality selection for T1w-Gd and T2w data was based on the voxel resolution, removing those with axial in-plane resolution larger then 1mm, and with less than 50 axial slices. These values were chosen to avoid artifacts due to a low image resolution. For the diffusion-weighted data, scans with at less than six diffusion-encoding directions were excluded. Visual assessment of all individual images was performed and data were excluded if: (i) images were acquired post-operatively, (ii) images showed the spine only, (iii) the tumor was not visible, (iv) the transversal plane had been clipped, and (v) image artifacts (motion, metal, induced by neurosurgical clips) were present. By visual assessment, the tumor location was saved as a boundary box. ### Pre-processing of image and age data The DW-MR data was processed using MRtrix3 software20 to obtain diffusion tensors from which the ADC map was calculated. Brain extraction was performed, followed by data harmonization, using a per-sequence voxel intensity normalization and interpolation down to 1 mm isotropic resolution. These steps were performed since the CBTN dataset was collected on a variety of MR scanners (manufacturer, field strength, gradient performance, etc.)16. The final volumes were reshaped to have 224 × 224 pixels in the transverse plane. Transversal 2D slices positioned within 20-80% of the tumor boundary box were extracted from the volumetric data to ensure that images showing only small portions of the tumor were not included. Transversal slices were used instead of the volumetric data due to the limited number of subjects available for training 3D deep learning models. A detailed description of the pre-processing steps and software used is available in the *Supplementary material*. The age in days of each subject from the earliest available scan was obtained from the CBTN portal, converted in years, and normalized using *z-score* normalization using the [0.5th, 99.5th] value range. The final composition of the dataset with age and sex information is summarized in Table 1. View this table: [Table 1.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/T1) Table 1. Per tumor type and MR-sequence summary of the dataset. The age information was obtained from the earliest scan available for each subject. The number of extracted slices reflects the slices within the 20-80% of the tumor boundary box. m: mean, std: standard deviation, M/F: male/female, NA: not available, ASTR: astrocytoma, EP: ependymoma, MED: medulloblastoma. ### Network architecture and training Two deep learning model architectures extensively used in literature were employed in this study, distinguished by their feature extraction approach: ResNet5021 and Vision Transformer22 (ViT), in its *base 16* version. ResNet5021 is a deep convolutional neural network that uses stacked 2D convolutional layers and residual connections to extract image features. ViT22 is a transformer-based model free from convolution operations that uses self-attention to learn local and global relations between non-overlapping patches in an image. Both methods serve as image feature extractors, producing a 1D representation of an input image suitable for classification, with ViT showing to perform better than ResNet-like models on natural image classification tasks as well as being more robust to image perturbations when trained on sufficient data23. Given the limited training data available, transfer learning was used with the image encoding models fine-tuned on the target CBTN dataset starting from pre-trained weights. Three distinct pre-training strategies were investigated: supervised pre-training on *out-of-domain* data (ImageNet1K24), self-supervised pre-training on *close-to-domain* data (BraTS17–19) and self-supervised pre-training on *in-domain* (CBTN) data. For the self-supervised pre-training, the SimCLR25 framework was employed (see *Pre-training* section in the *Supplementary materials* for details). We also investigated the integration of image and age through a joint fusion approach. In this case, ResNet50 and ViT models were used to encode the image data, while a tabular network encoded the age information. Figure 1 shows a schematic representation of the network architecture when trained on image and age information. For the details on the implementation, model pre-training and fine-tuning, and data augmentation, see the *Supplementary materials*. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F1) Figure 1. Schematic representation of image and age encoders whose 1D representations are concatenated for the final classification. ViT: visual transformer. ### Evaluation metrics and statistical methods A ten-times repeated five-fold stratified cross-validation scheme was employed in all experiments to account for the small size of the dataset. For each of the repetitions, subject-wise splitting was performed to obtain training, validation, and testing sets. Models’ performance was evaluated volume-wise in terms of Matthew’s correlation coefficient [-1, 1]26 since it is a more stable metric in case of class imbalance. Accuracy and area under the ROC curve (AUC) were also computed to allow comparison with previous studies. Class-wise F1-score, precision, and recall were additionally computed. Volume-wise predictions were obtained by soft voting aggregation of the models’ predicted probability for each of the slices in a volume. The Wilcoxon signed-rank test (two-sided) was used to investigate if there were difference in classification performance between models trained on image data alone or fused with age information, or when using different pre-training strategies. The Wilcoxon rank-sums (two-sided) test was instead used to compare models when trained on different MR sequences. A *p*-value< 0.05 was considered significant with Bonferroni correction applied when multiple comparisons were performed. ### Model explainability and learned feature space visualization In this work, Grad-CAM27,28 were computed for the last convolutional layer of the ResNet50 models, and for the last attention block in the ViT models, for the ground truth class. Grad-CAMs were employed to ensure that the models focused on relevant regions of the input image for classification, rather than to elucidate the specific reasons or features used by models for prediction. Additionally, principal component analysis (PCA) was performed on the image feature vectors obtained from the trained models to visualize the effect of pre-training and image-age fusion. ## Results ### Classification performance The highest classification performance was achieved by the ViT model pre-trained on ImageNet and fine-tuned on ADC data with age fusion (MCC: 0.77 ± 0.14, Accuracy: 0.87 ± 0.08). This was significantly higher than the best-performing models trained on either T2w (MCC: 0.58 ± 0.11) or T1w-Gd (MCC: 0.45 ± 0.16) data. Class-wise performance (see *Supplementary material* Table S1), showed that the classification of EP is the most challenging across settings, with an average F1-score over all the experiments of 0.37 ± 0.28, while ASTR and MED obtained 0.74 ± 0.18 and 0.76 ± 0.15, respectively. Looking at the overall effect of fusing image and age information, the models’ performance did not significantly change compared to models trained on image data only, except for the ResNet50 model pre-trained on ImageNet and fine-tuned on ADC data where the addition of age information significantly decreased classification performance. The effect of the three pre-training strategies was not consistent across MR sequences, model architectures, or input configuration. Moreover, there was no clear benefit between pre-training on *close-to-domain* or *in-domain* data. Of notice, the ViT models trained on ADC data had a significantly better performance when fine-tuned from ImageNet pre-trained weights compared to contrastive pre-training. Finally, looking at the different model architectures, ResNet50 and ViT models performed similarly when considering pre-training strategies and input configuration. A summary of the classification performance for all the experiments is presented in Figure 2 and in Table 2. View this table: [Table 2.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/T2) Table 2. Subject-wise classification performance for the best performing models on all the available MR sequences (with and without age fusion). The overall best performing model is highlighted in gray. Models fine-tuned on ADC data perform significantly better than models fine-tuned on either T2w data or T1w-Gd. The addition of the age information did not significantly improve models’ performance. SimCLR: self-supervised contrastive pre-training strategy, std: standard deviation, MCC: Matthew’s correlation coefficient, AUC: area under the Receiver operating characteristic curve (macro-average). ![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F2.medium.gif) [Figure 2.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F2) Figure 2. Subject-wise classification performance on the test set for all the available MR sequences (with and without age fusion) and investigated model architectures and pre-training strategies. Each box plot summarises the Matthew’s correlation coefficient for the 50 models trained thought a ten-times repeated five-fold cross-validation scheme. Outliers are shown as diamond (⧫). Statistical significance is shown for the best performing models on each MR sequence (\***| two-sided *p*-value < 0.0001 using Wilcoxon rank-sums test adjusted with *post-hoc* Bonferroni correction). See Table 2 for the performance details of the best models for each MR sequence. ### Qualitative analysis of the image feature space Scatter plots for features extracted using the ViT model from T2w and ADC images are shown in Figure 3. On the training set, the features of the different tumor types are grouped in distinct clusters for both T2w and ADC. The different pre-training strategies did not substantially impact the feature space, with the SimCLR pre-training on CBTN showing a marginally improved cluster separation. The fusion of the age information with the image data resulted in a larger separation between the tumor type clusters compared to when only image information was used. On the test set, clusters were less distinct, with EP features largely overlapping with ASTR and MED, reflecting the lower F1-score for this class. Scatter plots for the ResNet50 models and T1w-Gd sequence are available in the *Supplementary material* Figures S1 and S2. ![Figure S2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F6.medium.gif) [Figure S2.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F6) Figure S2. Principal component analysis (PCA) of image features extracted by ResNet50 and ViT models fine-tuned on T1W-Gd data, with and without age information, using ImageNet or SimCLR pre-trained weights. The first and second principal components are presented, for both training and testing sets. Classes are color-coded. ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F3) Figure 3. Principal component analysis (PCA) of image features extracted by ViT models fine-tuned on T2w or ADC data, with and without age information, using ImageNet or SimCLR pre-trained weights. The first and second principal components are presented, for both training and testing sets. Classes are color-coded. The addition of the age information stretches the feature space and helps, in the training set, in clustering the tumor types separately. On the test set, ependymoma samples (orange dots) are scattered and overlapping with the other two classes, confirming the low F1-score for this class. ADC: apparent diffusion coefficient, SimCLR: self-supervised contrastive pre-training. ### Grad-CAMs Representative Grad-CAMs for models trained on the available MR sequences, with and without age fusion, and for the three pre-training strategies are presented in Figure 4. Results are shown for a transversal slice of a test subject for which all MR sequences were available. For the ResNet50 models, the Grad-CAMs focused primarily on the brain region with those of models trained on T2w data showing a better localization of the tumor compared to T1w-Gd or ADC. The Grad-CAMs of the ViT models highlighted the whole brain region with no discrimination of the tumor region, which is a consequence of the short and long relations between the image regions that these models learn. There is no overall difference in the Grad-CAMs between pre-training strategies, and when using age and image information. This was true for both the models, except for the ViT, where examples of activation being around the brain region can be found. Additional Grad-CAMs are available in the *Supplementary materials* Figure S3. ![Figure S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F7.medium.gif) [Figure S3.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F7) Figure S3. Grad-CAMs for the models trained on different MR-sequences, with and without age fusion, and for the three pre-training strategies investigated. Grad-CAMs are computed with respect to the ground truth. The red square in the *tumor region* panel delineates the tumor. In the Grad-CAMs images, red color identify the parts of the input image used mostly contributing to the classification. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F4.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F4) Figure 4. Grad-CAMs for the models trained on different MR-sequences, with and without age fusion, and for the three pre-training strategies investigated. Grad-CAMs are computed with respect to the ground truth class and for the same subject (the transversal slice was taken to be as close as possible in all MR modalities). The red square in the *tumor region* panel delineates the tumor. In the Grad-CAMs images, the red color identifies the parts of the input image used mostly contributing to the classification. ADC: apparent diffusion coefficient, SimCLR: self-supervised contrastive pre-training. ViT: visual transformer. ## Discussion In this study, deep learning methods were implemented for the classification of PTBs based on pre-operative MR-images from the CBTN dataset. The effect of network architectures, pre-training, MR sequence, and fusion of patient age were investigated. ### Network architecture The ResNet50 model was chosen given previous reports in literature for similar tasks on both pediatric11,12 and adult brain tumor datasets29,30. The ViT model was selected as an alternative to convolution-based deep learning models given its success on natural image tasks31 and its increasing adoption in medical imaging-related tasks32. Both model architectures are available in most of the deep learning frameworks with and without supervised pre-trained weights on ImageNet, which is beneficial when training data is scarce. However, supervised pre-training does not always benefit the downstream task, with the pre-training dataset and objective having an impact on the final performance of the fine-tuned model33. For this reason, self-supervised contrastive pre-training25 was employed to bridge the gap between the pre-training and downstream dataset and objective. Overall, classification results did not benefit from the contrastive pre-training, with the best-performing model being fine-tuned from ImageNet weights. One reason the anticipated benefits of contrastive learning were not observed could be that models were trained to learn shared information from augmented views of the same transversal slice, discarding the fact that this information should be shared by all the slices in a subject. Nonetheless, PCA of the learned feature space showed a better distinction of the different tumor types when using contrastive pre-training, especially for ResNet50 models. ### MR sequences Among the MR sequences, ADC achieved the highest overall classification performance and class-wise F1-scores for both ASTR and MED tumor types, whereas models trained on T2w data achieved the highest F1-score for the EP class (see Table S1 in the *Supplementary material*). This can be attributed to the small number of EPs having ADC data (n=9) compared to those having T2w data (n=30). The results on ADC data were in agreement with those using deep-learning based-methods12, as well as intensity analysis34, and consistent with the information neuro-radiologists use when assessing tumor cellularity and possible tumor grade, during the primary diagnosis work-up. View this table: [Table S1.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/T3) Table S1. Subjects-wise classification performance across all the available MR sequences (alone or with age fusion), for the implemented deep learning models and pre-training strategy. Classe-wise metrics are also provided. std: standard deviation, MCC: Matthew’s correlation coefficient, AUC: area under the ROC curve (marco-average). Table continues on multiple pages. ### Age information A joint fusion approach was used to combine the image and age information, allowing the image and age encoder to be jointly trained. Results show that the addition of age information did not improve classification performance across the different MR sequences, model architectures, and pre-training strategies. Preliminary investigations also explored the number of encoding layers in the age encoder, with no variation in outcome. This can be attributed to the overlapping age distribution of the different classes as well as to the choice of data fusion approach. By contrast, in a similar experimental setting the combination of image and age information improved model classification performance12. Thus, this leaves open the question of whether the benefits of combining age and image information for pediatric brain tumor classification are restricted to specific subject populations or if a more general method for image and age fusion needs to be explored which can be broadly and successfully applied. ### Model explainability To qualitatively assess the regions used for predictions class-activation mapping was implemented to highlight the regions in the image used by the model for prediction. Given the depth and complexity of both ResNet50 and ViT models, activation maps do not have sufficient spatial resolution to target the tumor region only and provide a visualization of tumor regions and/or image features that are relevant for the classification. Nevertheless, the models’ activation maps showed that the information used for classification fell within the brain region. Interestingly, the effect of the pre-training strategy and the input configuration seen in the feature space visualizations does not reflect on the class activation maps, suggesting that the models use the same brain regions for classification but rely on a somewhat different set of features. ### Comparison with related work The findings align with the few previously reported studies on deep learning-based pediatric brain tumor type classification, with Quon *et al*.,11 reporting a 0.92 accuracy (F1-score of 0.80 on T2w data classifying 4 tumor types and controls) and Artzi *et al*.,12 of 0.87 (F1-score 0.82 on diffusion Trace data classifying 3 tumor types and controls). This study’s class-wise results for ASTR and MB are similar to those previously reported, with EP having the lowest score in all studies. It should be noted that the comparison of the performance metrics among the studies can only be considered in very general terms due to the differences in tumor types included in the analysis and the evaluation protocol. Additionally, previous studies did not provide statistical analysis to assess the impact of specific MR-sequence and/or age fusion on models’ performance. ### Limitations This study has some limitations, particularly concerning two major aspects: (1) the amount and quality of data and (2) the use of 2D models. The dataset, although one of the largest accessible, has a relatively small size and the distribution of tumor types is unbalanced. Not all tumor types available in the dataset could be included in the analysis, limiting the applicability of the trained models in a real-world clinical scenario. The image quality varied greatly between each scan and subject. This large variability in data quality is advantageous for the development of a robust classification method but also adds complexity in determining which factors to adjust to optimize model performance. Moreover, the information regarding the site where the scans were acquired was missing. If available, it could have been used to stratify the subjects in training and testing sets based on MR-site. Additionally, we have only explored the use of single MR sequences separately, while multiple MR sequences (*e*.*g*., T2w and ADC data) should be considered in the future. Finally, the choice of 2D deep-learning models instead of 3D ones resulted in the spatial relationship between the slice and the tridimensionality of the tumor structure being lost. If more data were available, using the MR volumes as model input could address this limitation. Alternatively, while still using 2D slides, aggregation methods that incorporate the slice position for computing volume-wise predictions could be developed. Considering the limitations, the results for this study are preliminary and highlight the need for further research to develop methods that can match the diagnostic performance of radiologists that, for the investigated brain tumors, is 91.7% (range [85.1, 96.7]%) when using T1-w with and without contrast, T2-w, FLAIR and ADC7. ## Conclusions In this proof-of-concept study, the classification of pediatric brain tumors based on MR-images was achieved using deep learning methods. The vision transformer model pre-trained on ImageNet and fine-tuned on ADC data obtained the highest classification performance, with models trained on T2w data also achieving reasonable performance. Image and age fusion did improve classification performance, but not significantly. In future studies, the combination of multiple MR sequences along with more detailed clinical information and further refinements of the network architectures, pre-training and data fusion are warranted o aid radiologists in the clinical assessment of these tumors. ## Data Availability All data used in this study is available upon reasonable request to The Children's Brain Tumor Tissue Consortium (CBTTC) / The children's brain tumor network (CBTN). [https://cbtn.org/](https://cbtn.org/) ## 1 Code availability Code linked to this manuscript is available at [https://github.com/IulianEmilTampu/PediatricBrainTumorClassification](https://github.com/IulianEmilTampu/PediatricBrainTumorClassification). ## Supplementary material ### Material and Methods #### ADC computation The DW-MR data was processed using MRtrix3 software1 to obtain a diffusion tensor from which the ADC map was calculated. Briefly, Gibbs-ringing artefact removal, denoising and motion correction were performed before the diffusion tensor fitting. The ADC map was then computed. #### Intensity normalization, volume re-sampling and slice selection Data harmonization was performed since the CBTN dataset was obtained from a variety of centres and MRI scanners2. The brain from each MRI volume was extracted using a deep learning-based brain extraction tool3, followed by a per-sequence voxel intensity clipping confining the values in the [0.2th, 99.8th] percentile range of the brain region only. Min-max intensity normalization was also performed bringing the voxel intensity values in the [0, 1] range. Lastly, each volume was isotropically interpolated to 1 mm isotropic resolution and reshaped to have 224 × 224 pixels in the transverse plane using an order five spline interpolation function (nibabel.processing.conform). #### Data splitting Scans were split subject-wise, into training and validation (80%) and testing (20%), ensuring that subjects with multiple scans did not end up in the same set. When using pre-trained weights from the *in-domain* data (CBTN), test subjects were not used for neither for pre-training nor during fine-tuning, resulting in each of the ten repetitions having a different set of pre-training weights. #### Pre-training Transfer learning has been widely investigated and used to address the data scarcity problem in medical image analysis4. Thus, in this study, the image feature extractors underwent fine-tuning using pre-trained weights obtained from three distinct pre-training strategies: (1) supervised pre-training on *out-of-domain* data (ImageNet1K), (2) self-supervised pre-training on *close-to-domain* data (BraTS) and (3) self-supervised pre-training on *in-domain* (CBTN) data. Specifically, supervised pre-training utilized ImageNet1K dataset which is a collection of 1.2 million images divided in 1000 classes and is broadly used in literature for model pre-training and with pre-trained model weights available for download from most of the deep learning frameworks (ResNet50 and ViT ImageNet1K model weights from Pytorch were used in this study). For the self-supervised pre-training, the SimCLR5 framework was employed, implementing contrastive learning which enables models to learn visual representation from the data without the need of labels. Two distinct datasets were used for self-supervised pre-training: a *close-to-domain* dataset comprising of transversal slices of adult brain tumor obtained from BraTS20206–8, and an *in-domain* dataset composed of transversal slices from the CBTN dataset including all brain images showing tumor. Models were pre-trained on T2-w (TCGA n=22811, CBTN n=5803), T1w-Gd (TCGA n=22811, CBTN n=2584) or ADC (CBTN n=1383) images separately, to match the MR-sequences available for the CBTN dataset. Since BraTS2020 dataset does not provide ADC data, pre-trained weights on T2-w images were used when fine-tuning on the ADC target dataset. #### Image and age data fusion Several data fusion approaches have been proposed9–11, such as early fusion, joint fusion and late fusion. In this work, joint fusion (feature fusion) was used to combine image and age information for the prediction of tumor type. The advantage of joint fusion is that a single model is trained on both image and age information, with the age not blended immediately with the risk of not being properly used11. The age was encoded using a one dense layer bringing the age information into a three-valued vector. The encoded age vector was then concatenated to the encoded image vector before being fed to the classifier part of the model. #### Implementation and training Models were implemented in Python (3.9.17) using the PyTorch12 framework (2.0.1+cu117), and were trained on a computer with 20-core CPU and 4 NVIDIA Tesla V100 GPUs (32GB memory each). SimCLR pre-training on both the *close-to-domain* and *in-domain* data run for 500 epochs, minimizing the contrastive loss between positive pairs of augmented images (see5 for details) using AdamW optimizer13 with CosineAnnealingLR learning rate scheduler14 (initial learning rate=1.0e-05). During fine-tuning on the target CBTN dataset, only the last 2 convolutional blocks of ResNet50 (22.1M parameters) and the last 6 attention layers for ViT (43.3M parameters) were trained to minimize the weighted categorical cross-entropy loss with balanced weights computed on the training set (sklearn.utils.class_weight function). Adam optimizer15 (*β*1 = 0.9, *β*2 = 0.999) with exponential learning rate decay (starting learning rate=1e-5, *γ* = 0.99) was also used. Fine-tuning run for 200 epochs with the training stopping if the validation loss did not decrease over 20 epochs. Data augmentation was applied16–19 during both pre-training and fine-tuning, using both geometric transformations (random rotations between ±45°, random horizontal and vertical flipping, and 10% random width and height shift) and random color jitter (brightness up to 50%). In addition, the TrivialWideAugment automatic data augmentation method20 was also used, given negligible computational overhead and performance improvement see in natural image classification tasks (num_magnitude_bins=15). Code linked to this manuscript is available at (removed for anonymity). ![Figure S1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/09/06/2024.09.05.24313109/F5.medium.gif) [Figure S1.](http://medrxiv.org/content/early/2024/09/06/2024.09.05.24313109/F5) Figure S1. Principal component analysis (PCA) of image features extracted by ResNet50 models fine-tuned on T2w or ADC data, with and without age information, using ImageNet or SimCLR pre-trained weights. The first and second principal components are presented, for both training and testing sets. Classes are color-coded. ## Results ## Acknowledgments The research was made possible in part due to The Children’s Brain Tumor Tissue Consortium (CBTTC) / The children’s brain tumor network (CBTN). The study was financed by Swedish Childhood Cancer Foundation (MT2021-0011, MT2022-0013), Joanna Cocozza’s Foundation (2022-2024), Linköping University’s Cancer Strength Area (2022) and ALF Grants, Region Östergötland (974566). * Received September 5, 2024. * Revision received September 5, 2024. * Accepted September 6, 2024. * © 2024, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1.Ferlay, J., Ervik, M., Lam, F. et al. Global cancer observatory: cancer today. [https://gco.iarc.fr/today/home](https://gco.iarc.fr/today/home) (2022). Accessed: 2023. 2. 2.Sharma, R. A systematic examination of burden of childhood cancers in 183 countries: estimates from GLOBOCAN 2018. Eur. J. Cancer Care 30, e13438 (2021). 3. 3.Ostrom, Q. T., Patil, N., Cioffi, G. et al. CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2013–2017. Neuro-oncology 22, iv1–iv96 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/neuonc/noaa200&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 4. 4.Corti, C., Urgesi, C., Massimino, M. et al. Effects of supratentorial and infratentorial tumor location on cognitive functioning of children with brain tumor. Child’s Nerv. Syst. 36, 513–524 (2020). 5. 5.Pollack, I. F. Brain tumors in children. New Engl. J. Medicine 331, 1500–1507 (1994). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJM199412013312207&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7969301&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1994PU86600007&link_type=ISI) 6. 6.Louis, D. N., Perry, A., Wesseling, P. et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro-oncology 23, 1231–1251 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/neuonc/noab106&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=34185076&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 7. 7.Dixon, L., Jandu, G. K., Sidpra, J. & Mankad, K. Diagnostic accuracy of qualitative MRI in 550 paediatric brain tumours: evaluating current practice in the computational era. Quant. Imaging Medicine Surg. 12, 131 (2022). 8. 8.Ali, S., Li, J., Pei, Y. et al. A comprehensive survey on brain tumor diagnosis using deep learning and emerging hybrid techniques with multi-modal MR image. Arch. Comput. Methods Eng. 29, 4871–4896 (2022). 9. 9.Amin, J., Sharif, M., Haldorai, A. et al. Brain tumor detection and classification using machine learning: a comprehensive survey. Complex & Intell. Syst. 1–23 (2021). 10. 10.Tandel, G. S., Biswas, M., Kakde, O. G. et al. A review on a deep learning perspective in brain cancer classification. Cancers 11, 111 (2019). 11. 11.Quon, J., Bala, W., Chen, L. et al. Deep learning for pediatric posterior fossa tumor detection and classification: a multi-institutional study. Am. J. Neuroradiol. 41, 1718–1725 (2020). [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiYWpuciI7czo1OiJyZXNpZCI7czo5OiI0MS85LzE3MTgiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyNC8wOS8wNi8yMDI0LjA5LjA1LjI0MzEzMTA5LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 12. 12.Artzi, M., Redmard, E., Tzemach, O. et al. Classification of pediatric posterior fossa tumors using convolutional neural network and tabular data. IEEE Access 9, 91966–91973 (2021). 13. 13.Shaari, H., Kevrić, J., Jukić, S. et al. Deep learning-based studies on pediatric brain tumors imaging: narrative review of techniques and challenges. Brain Sci. 11, 716 (2021). 14. 14.Huang, J., Shlobin, N. A., Lam, S. K. & DeCuypere, M. Artificial intelligence applications in pediatric brain tumor imaging: A systematic review. World neurosurgery 157, 99–105 (2022). 15. 15.The Children’s Brain Tumor Network. [https://cbtn.org/](https://cbtn.org/). Accessed: 2021. 16. 16.Lilly, J. V., Rokita, J. L., Mason, J. L. et al. The children’s brain tumor network (CBTN)-Accelerating research in pediatric central nervous system tumors through collaboration and open science. Neoplasia 35, 100846 (2023). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neo.2022.100846&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36335802&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 17. 17.Menze, B. H., Jakab, A., Bauer, S. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE transactions on medical imaging 34, 1993–2024 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TMI.2014.2377694&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25494501&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 18. 18.Bakas, S., Akbari, H., Sotiras, A. et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. scientific data 4 (1), 170117 (2017). 19. 19.Bakas, S., Reyes, M., Jakab, A. et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. preprint arxiv:1811.02629 (2018). 20. 20.Tournier, J.-D., Smith, R., Raffelt, D. et al. MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation. Neuroimage 202, 116137 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/J.NEUROIMAGE.2019.116137&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 21. 21.He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016). 22. 22.Dosovitskiy, A., Beyer, L., Kolesnikov, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arxiv:2010.11929 (2020). 23. 23.Bhojanapalli, S., Chakrabarti, A., Glasner, D. et al. Understanding robustness of transformers for image classification. In Proceedings of the IEEE/CVF international conference on computer vision, 10231–10241 (2021). 24. 24.Deng, J., Dong, W., Socher, R. et al. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248–255 (Ieee, 2009). 25. 25.Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International conference on machine learning, 1597–1607 (PMLR, 2020). 26. 26.Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC genomics 21, 1–13 (2020). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12864-020-6490-7&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32977771&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 27. 27.Zhou, B., Khosla, A., Lapedriza, A. et al. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2921–2929 (2016). 28. 28.Selvaraju, R. R., Cogswell, M., Das, A. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618–626 (2017). 29. 29.Chelghoum, R., Ikhlef, A., Hameurlaine, A. & Jacquir, S. Transfer learning using convolutional neural network architectures for brain tumor classification from MRI images. In Artificial Intelligence Applications and Innovations: 16th IFIP WG 12.5 International Conference, AIAI 2020, Neos Marmaras, Greece, June 5–7, 2020, Proceedings, Part I 16, 189–200 (Springer, 2020). 30. 30.Rehman, A., Naz, S., Razzak, M. I. et al. A deep learning-based framework for automatic brain tumors classification using transfer learning. Circuits, Syst. Signal Process. 39, 757–775 (2020). 31. 31.Khan, S., Naseer, M., Hayat, M. et al. Transformers in vision: A survey. ACM computing surveys (CSUR) 54, 1–41 (2022). 32. 32.Shamshad, F., Khan, S., Zamir, S. W. et al. Transformers in medical imaging: A survey. Med. Image Analysis 102802 (2023). 33. 33.Zoph, B., Ghiasi, G., Lin, T.-Y. et al. Rethinking pre-training and self-training. Adv. neural information processing systems 33, 3833–3845 (2020). 34. 34.Tanyel, T., Nadarajan, C., Duc, N. M. & Keserci, B. Deciphering machine learning decisions to distinguish between posterior fossa tumor types using MRI features: What do the data tell us? Cancers 15, 4015 (2023). ## References 1. 1.Tournier, J.-D., Smith, R., Raffelt, D. et al. MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation. Neuroimage 202, 116137 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/J.NEUROIMAGE.2019.116137&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 2. 2.Lilly, J. V., Rokita, J. L., Mason, J. L. et al. The children’s brain tumor network (CBTN)-Accelerating research in pediatric central nervous system tumors through collaboration and open science. Neoplasia 35, 100846 (2023). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.neo.2022.100846&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=36335802&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 3. 3.Isensee, F., Schell, M., Pflueger, I. et al. Automated brain extraction of multisequence MRI using artificial neural networks. Hum. brain mapping 40, 4952–4964 (2019). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/hbm.24750&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 4. 4.Kim, H. E., Cosa-Linan, A., Santhanam, N. et al. Transfer learning for medical image classification: a literature review. BMC medical imaging 22, 69 (2022). 5. 5.Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International conference on machine learning, 1597–1607 (PMLR, 2020). 6. 6.Menze, B. H., Jakab, A., Bauer, S. et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE transactions on medical imaging 34, 1993–2024 (2014). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TMI.2014.2377694&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25494501&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F09%2F06%2F2024.09.05.24313109.atom) 7. 7.Bakas, S., Akbari, H., Sotiras, A. et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. scientific data 4 (1), 170117 (2017). 8. 8.Bakas, S., Reyes, M., Jakab, A. et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. preprint arxiv:1811.02629 (2018). 9. 9.Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E.J. Multimodal biomedical AI. Nat. Medicine 28, 1773–1784 (2022). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41591-022-01981-2&link_type=DOI) 10. 10.Kline, A., Wang, H., Li, Y. et al. Multimodal machine learning in precision health: A scoping review. npj Digit. Medicine 5, 171 (2022). 11. 11.Huang, S.-C., Pareek, A., Seyyedi, S. et al. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ digital medicine 3, 136 (2020). 12. 12.Paszke, A., Gross, S., Massa, F. et al. Pytorch: An imperative style, high-performance deep learning library. Adv. neural information processing systems 32 (2019). 13. 13.Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. arXiv preprint arxiv:1711.05101 (2017). 14. 14.Loshchilov, I. & Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arxiv:1608.03983 (2016). 15. 15.Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. preprint arxiv:1412.6980 (2014). 16. 16.Paul, J. S., Plassard, A. J., Landman, B. A. & Fabbri, D. Deep learning for brain tumor classification. In Medical Imaging 2017: Biomedical Applications in Molecular, Structural, and Functional Imaging, vol. 10137, 253–268 (SPIE, 2017). 17. 17.Chlap, P., Min, H., Vandenberg, N. et al. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 65, 545–563 (2021). [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/1754-9485.13261&link_type=DOI) 18. 18.Kumar, R. L., Kakarla, J., Isunuri, B. V. & Singh, M. Multi-class brain tumor classification using residual network and global average pooling. Multimed. Tools Appl. 80, 13429–13438 (2021). 19. 19.Sajjad, M., Khan, S., Muhammad, K. et al. Multi-grade brain tumor classification using deep CNN with extensive data augmentation. J. computational science 30, 174–182 (2019). 20. 20.Müller, S. G. & Hutter, F. Trivialaugment: Tuning-free yet state-of-the-art data augmentation. In Proceedings of the IEEE/CVF international conference on computer vision, 774–782 (2021).