ABSTRACT
Gliomas are highly fatal and heterogeneous brain tumors. Molecular subtyping is critical for accurate diagnosis and prediction of patient outcomes, with isocitrate dehydrogenase (IDH) mutations being the most informative tumor feature. Molecular subtyping currently relies on surgically obtained tumor samples, highlighting the need for non-invasive, preoperative biomarkers. We investigated the integration of glioma polygenic risk scores (PRS) and radiomic features for prediction of IDH mutation status. An elastic net classifier was trained on 256 radiomic features from preoperative MRI scans, a germline PRS for IDH mutation and demographic information from 159 glioma cases in The Cancer Genome Atlas (TCGA). Combining radiomic features with the PRS increased the area under the receiver operating characteristic curve (AUC) for distinguishing IDH-wildtype from IDH-mutant glioma from 0.824 to 0.890 (PΔAUC=0.0016). Incorporating age at diagnosis and sex further improved the classifier (AUC=0.920). The classifier's predictions were also prognostic: patients predicted to have IDH-mutant vs. IDH-wildtype tumors had significantly lower mortality risk (hazard ratio (HR)=0.27, 95% CI: 0.14-0.51, P=6.3×10−5), comparable to the survival differences observed for biopsy-confirmed IDH mutation status. In conclusion, our study shows that augmenting imaging-based classifiers with genetic risk profiles may help delineate molecular subtypes and improve the timely, non-invasive clinical assessment of glioma patients.
INTRODUCTION
Gliomas are the most common primary malignant brain tumors in adults1. These neoplasms encompass multiple subtypes with distinct somatic mutations that delineate different clinical trajectories2. Although glioma classifications continue to evolve, several key features have been used to define molecular subtypes since 2016: isocitrate dehydrogenase 1 and 2 mutations (collectively referred to as IDH mutations), chromosome 1p and 19q co-deletion, and TERT promoter mutations2–4. The 2021 World Health Organization (WHO) glioma classification guidelines use these tumor molecular features to define three glioma subtypes5: tumors without IDH mutation (IDH-wildtype glioblastomas), tumors with IDH mutation and an unbalanced translocation between chromosomes 1 and 19 (IDH-mutant 1p19q co-deleted oligodendrogliomas), and IDH-mutant tumors without 1p19q co-deletion (IDH-mutant astrocytomas). IDH-wildtype glioblastomas (GBM) are more aggressive, have fewer treatment options and are associated with significantly shorter overall survival than IDH-mutant gliomas2,6. The early establishment of a molecular diagnosis for gliomas is important to predict tumor behavior and guide treatment of individual patients7–9.
Currently, gliomas are classified into prognostically significant subtypes based on histopathological and molecular assessment of tissue obtained by biopsy or resection. As a result, evaluation of treatment options and prognostication are often delayed until after surgery, which carries a risk of permanent complications and may not be readily accessible in low-resource settings. Recent efforts have aimed to use noninvasive procedures such as imaging10–13 and germline genotyping14,15 to infer the presence or absence of clinically relevant somatic mutations (e.g. IDH mutation) before surgery. By providing timely insight into the tumor molecular profile, these noninvasive tools may complement the standard histopathological assessment of surgical specimens, expediting treatment decisions and better informing patient management.
Major advances have been made in using radiographic features from preoperative imaging to classify gliomas into molecular subtypes. Earlier applications of machine learning (ML) to imaging data for tumor classification used textural analysis approaches or rule-based systems such as VASARI16,17. Because these approaches rely on manual feature selection, research has since focused on deep learning models such as convolutional neural networks (CNNs) that automatically extract features from complex images11–13. Recent CNN-based models for glioma classification have shown promising results, with model predictions recapitulating the prognostic outcomes expected for different molecular subtypes. However, previous imaging-based studies have not accounted for other known clinical and genetic indicators of subtype-specific glioma risk.
Genome-wide association studies (GWAS) for glioma have shown that inherited genetic variation influences disease risk and that different molecular subtypes are associated with distinct genetic risk loci18,19. Polygenic risk scores (PRS), which aggregate the effects of risk alleles across the genome to provide a personalized genetic susceptibility profile20, have been shown to predict subtype-specific glioma risk and accurately distinguish among molecular subtypes14,15. Since gliomas sometimes present with non-characteristic radiographic features (e.g. IDH-wildtype non-enhancing tumors)13, inherited genetic variation could offer an additional indicator of malignancy risk independent of radiomic features that might improve the performance of imaging-based classification models.
In this study, we integrated radiomic features extracted from preoperative multimodal magnetic resonance imaging (MRI) scans with germline PRS profiles to classify gliomas according to IDH mutation status. Using the resulting classification model, we also identified features predictive of IDH status and assessed the prognostic significance of the predicted subtypes and of the individual features.
MATERIALS AND METHODS
Study population
The analysis group consisted of 768 glioma cases (384 IDH-wildtype, 384 IDH-mutant) with available tumor molecular data from The Cancer Genome Atlas (TCGA). Cases were genotyped on the Affymetrix 6.0 array and imputed with the TOPMed reference panel, with standard quality control procedures as previously described14,21. Briefly, SNPs with a call rate <95% were excluded along with those at a low minor allele frequency (MAF<0.005) or showing significant deviation from Hardy-Weinberg equilibrium (P<10−6). Analyses were also restricted to individuals of predominantly European ancestry. Among the 768 glioma cases with genotyping data, 159 cases (82 IDH-wildtype, 77 IDH-mutant) also had radiomic data extracted from preoperative multimodal MRI scans provided by The Cancer Imaging Archive22–24. As described in Bakas et al22, T1-weighted pre-contrast (T1), T1-weighted post-contrast (T1-Gd), T2 and T2-FLAIR scans of each patient underwent standard pre-processing, including registration, resampling and skull stripping, followed by computer-aided assignment of segmentation labels to tumor sub-regions (e.g. peritumoral edema). Computer-aided segmentation labels were then manually revised by a neuroradiologist. Based on the assigned labels of each tumor sub-region, a panel of radiomic features was extracted, including intensity, volumetric, morphologic, histogram, textural, spatial and tumor diffusion parameters.
Genetic data preprocessing and feature extraction
An overview of the study design and analysis is provided in Figure 1. Using individual-level genotyping data, we calculated four previously developed subtype-specific PRS for each patient in TCGA, as described in Nakase et al14. Briefly, the GBM and non-GBM PRS were trained using summary statistics from a GWAS of 10,346 cases (5395 GBM, 4466 non-GBM) and 14,687 controls, while PRS for molecular subtypes were developed using GWAS results from 2632 cases (1115 IDH-wildtype, 699 IDH-mutant) and 2445 controls14,18,19.
For each individual, each subtype-specific PRS was converted to a standardized z-score based on the in-sample TCGA distribution. To adjust for population stratification, we regressed out the effects of the first 10 genetic ancestry principal components and used the residualized z-scores for each PRS in subsequent processing. Next, we fit a logistic regression model with IDH mutation status as the outcome and the four residualized PRS features as the explanatory variables on the subset of 609 patients without radiomic data. This model was then applied to the 159 patients with both radiomic and germline genetic data to extract a new composite PRS feature based on the weighted effect of each subtype-specific PRS.
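The composite PRS construction described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the helper names (`zscore`, `residualize`, `fit_logistic`), the simulated effect sizes, and the choice to use the fitted linear predictor (log-odds of IDH mutation) as the composite feature are all assumptions.

```python
import numpy as np

def zscore(x):
    """Standardize each column to zero mean and unit sample variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

def residualize(y, covariates):
    """Remove the least-squares linear effect of the covariates (plus an intercept) from each column of y."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def fit_logistic(x, y, n_iter=25):
    """Unpenalized logistic regression via Newton-Raphson; returns [intercept, coef_1, ..., coef_p]."""
    design = np.column_stack([np.ones(len(x)), x])
    w = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ w))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        w = w + np.linalg.solve(hess, grad)
    return w

# Hypothetical synthetic stand-in for TCGA: 768 cases, 4 subtype-specific PRS, 10 ancestry PCs
rng = np.random.default_rng(0)
n, n_pc = 768, 10
pcs = rng.normal(size=(n, n_pc))
idh_mut = rng.integers(0, 2, size=n)            # 1 = IDH-mutant (synthetic labels)
effect = np.array([1.0, -1.0, 0.5, 0.0])        # assumed per-PRS shift in IDH-mutant cases
prs_raw = rng.normal(size=(n, 4)) + 0.8 * idh_mut[:, None] * effect + 0.3 * pcs[:, :4]
has_radiomics = np.zeros(n, dtype=bool)
has_radiomics[rng.choice(n, size=159, replace=False)] = True

# 1) z-score on the full in-sample distribution, 2) regress out the 10 PCs,
# 3) learn PRS weights on cases WITHOUT radiomics, 4) score the cases WITH radiomics
prs_resid = residualize(zscore(prs_raw), pcs)
weights = fit_logistic(prs_resid[~has_radiomics], idh_mut[~has_radiomics])
composite_prs = weights[0] + prs_resid[has_radiomics] @ weights[1:]  # linear predictor (log-odds)
```

Fitting the weights only on the 609 cases without imaging keeps the 159 evaluation cases out of the PRS weighting step, so the composite feature is not tuned on the same cases used to train and test the radiomic classifier.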
Radiomic data preprocessing and feature extraction
Of the 723 radiomic features, 467 were not available for all cases and were excluded from the analysis. Each of the remaining 256 radiomic features was converted to a standardized z-score. We assessed potential confounding by age, sex and brain volume (excluding skeletal structures) by calculating the Pearson correlation between the z-score of each radiomic feature and each factor. Each feature that was significantly correlated with age, sex or brain volume (P<0.05) was residualized to remove the effect of these factors25.
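A minimal sketch of the confounder screen and residualization described above. It assumes numeric age, sex coded 0/1 and brain volume in liters, and assumes each flagged feature is regressed only on the confounders it correlates with (the text could also be read as regressing on all three). The function name `adjust_for_confounders` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

def adjust_for_confounders(features, confounders, alpha=0.05):
    """Z-score each radiomic feature, then residualize it on every confounder it correlates with at P < alpha.

    features:    (n_cases, n_features) raw radiomic values
    confounders: (n_cases, n_confounders), e.g. columns [age, sex (0/1), brain volume]
    Returns the adjusted z-scores and the indices of the features that were residualized.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
    adjusted = z.copy()
    residualized = []
    for j in range(z.shape[1]):
        # Pearson test against each confounder (with 0/1 sex this is a point-biserial correlation)
        hits = [k for k in range(confounders.shape[1])
                if stats.pearsonr(z[:, j], confounders[:, k])[1] < alpha]
        if hits:
            design = np.column_stack([np.ones(len(z)), confounders[:, hits]])
            beta, *_ = np.linalg.lstsq(design, z[:, j], rcond=None)
            adjusted[:, j] = z[:, j] - design @ beta
            residualized.append(j)
    return adjusted, residualized

# Hypothetical data: 159 cases, 5 features, feature 0 strongly driven by age
rng = np.random.default_rng(2)
n = 159
age = rng.normal(55, 14, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
brain_vol = rng.normal(1.2, 0.1, size=n)   # liters (assumed unit)
confounders = np.column_stack([age, sex, brain_vol])
features = rng.normal(size=(n, 5))
features[:, 0] += 0.1 * age
adjusted, residualized = adjust_for_confounders(features, confounders)
```

Because ordinary least-squares residuals are orthogonal to the regressors, a residualized feature has essentially zero sample correlation with each confounder it was adjusted for.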
Model development and evaluation
We used an elastic net to classify glioma cases according to IDH mutation status in the subset of the TCGA dataset with both radiomic and germline genetic data. In addition to the single composite PRS feature and the 256 radiomic features, age at diagnosis and sex were used as model inputs. We performed 5-fold cross-validation, using 80% of the data for training and the remaining 20% for held-out testing in each fold. We used standard logistic regression for the demographics-only (age at diagnosis and sex) and PRS-only models. Classification performance was quantified using accuracy, precision, recall, F1-score and area under the receiver operating characteristic curve (AUC). Differences in AUC between models were assessed using DeLong's test. Feature importance was quantified using the distribution of model weights across the 5 cross-validation folds. Features whose mean weights differed significantly from zero in a one-sample t-test (P<0.05) were regarded as predictive features.
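The cross-validated elastic net and the weight-based feature importance test could look roughly like the sketch below. It is not the authors' implementation: it assumes a fixed penalty (`lam`, `l1_ratio`) that in practice would be tuned, solves elastic-net logistic regression by proximal gradient descent rather than through any particular library, treats "non-zero cumulative weight" as non-zero in at least one fold, and uses synthetic data. All function names are hypothetical.

```python
import numpy as np
from scipy import stats

def fit_elastic_net_logistic(x, y, lam=0.05, l1_ratio=0.5, n_iter=2000):
    """Elastic-net logistic regression by proximal gradient descent (intercept unpenalized).

    Minimizes mean log-loss + lam * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2).
    """
    n, p = x.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, where L bounds the curvature of the smooth part (log-loss + ridge term)
    lipschitz = np.linalg.norm(np.column_stack([np.ones(n), x]), 2) ** 2 / (4 * n) + lam * (1 - l1_ratio)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad_w = x.T @ (prob - y) / n + lam * (1 - l1_ratio) * w
        grad_b = np.mean(prob - y)
        w = w - step * grad_w
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam * l1_ratio, 0.0)  # L1 proximal (soft-threshold) step
        b = b - step * grad_b
    return w, b

def auc(y, score):
    """Mann-Whitney estimate of the AUC (ties count one half)."""
    pos, neg = score[y == 1], score[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def cross_validate(x, y, n_folds=5, seed=0, **fit_kwargs):
    """Plain (unstratified) k-fold CV; returns per-fold test AUCs and per-fold weight vectors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    aucs, weights = [], []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w, b = fit_elastic_net_logistic(x[train], y[train], **fit_kwargs)
        aucs.append(auc(y[test], x[test] @ w + b))
        weights.append(w)
    return np.array(aucs), np.array(weights)

# Hypothetical data: 159 cases, 20 standardized features, only features 0 and 1 informative
rng = np.random.default_rng(1)
x = rng.normal(size=(159, 20))
y = (x[:, 0] - x[:, 1] + 0.5 * rng.normal(size=159) > 0).astype(float)

aucs, fold_weights = cross_validate(x, y)
# Features with a non-zero weight in at least one fold (assumed meaning of "non-zero cumulative weight")
nonzero = np.flatnonzero(np.any(fold_weights != 0, axis=0))
# One-sample t-test of each such feature's fold weights against zero
_, pvals = stats.ttest_1samp(fold_weights[:, nonzero], 0.0, axis=0)
predictive = nonzero[pvals < 0.05]
```

With only 5 folds, each t-test has 4 degrees of freedom, and the training sets overlap heavily, so the fold weights are not independent; the resulting P-values are best read as a heuristic ranking rather than formal inference.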
Survival analysis
We examined how well the integrated genetic and radiomic IDH classifier predicted overall survival. Follow-up time was calculated from date of diagnosis to death or end of follow-up. Kaplan-Meier curves were used to visually compare survival trajectories based on predicted IDH status. Differences in event time distributions were assessed using the log-rank test. To incorporate covariates, hazard ratios (HR) for predicted IDH status were estimated using Cox proportional hazards models with adjustment for age at diagnosis and sex, and stratification by disease grade. In addition to prognostic associations for predicted IDH status, we also evaluated the association of each feature included in the classifier with overall survival.
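As an illustration of the Kaplan-Meier and log-rank steps, here is a minimal NumPy/SciPy sketch. The synthetic survival times are drawn from exponential distributions whose medians are set to the 87.4 and 14.7 months reported in the Results purely for illustration; the covariate-adjusted, grade-stratified Cox model is not reproduced here, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival probability just after each one."""
    times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

def median_survival(times, surv):
    """Smallest event time at which the KM curve falls to 0.5 or below (nan if never reached)."""
    below = np.flatnonzero(surv <= 0.5)
    return times[below[0]] if below.size else np.nan

def logrank_test(time, event, group):
    """Two-group log-rank test (group coded 0/1); returns the chi-square statistic (1 df) and P-value."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        dead = (time == t) & (event == 1)
        d, d1 = dead.sum(), (dead & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths in group 1
        if n > 1:                              # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, stats.chi2.sf(chi2, df=1)

# Hypothetical cohort: 80 predicted IDH-mutant (group 1) and 80 predicted IDH-wildtype (group 0)
rng = np.random.default_rng(3)
group = np.repeat([1, 0], 80)
median_months = np.where(group == 1, 87.4, 14.7)
true_time = rng.exponential(median_months / np.log(2))   # exponential with the chosen medians
follow_up = np.minimum(true_time, 120.0)                 # administrative censoring at 10 years
died = (true_time <= 120.0).astype(int)
chi2_stat, p_value = logrank_test(follow_up, died, group)
km_wt = kaplan_meier(follow_up[group == 0], died[group == 0])
```

For a toy check, `kaplan_meier` on times `[1, 2, 3, 4, 5]` with the third observation censored gives survival `[0.8, 0.6, 0.3, 0.0]` at event times `[1, 2, 4, 5]`, so the median survival is 4.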
RESULTS
Characteristics of study population
Characteristics of study participants, overall and stratified by availability of genotyping and radiomic data, are provided in Table 1. The median age of included cases was 52 years, and there were more males (58.5%) than females (41.5%). Overall, we included 384 subjects diagnosed with IDH-wildtype GBM, 235 with IDH-mutant astrocytoma and 147 with IDH-mutant 1p19q-codeleted oligodendroglioma.
IDH classification
We examined the classification performance of different combinations of features for predicting IDH mutation status (Table 2). Among models limited to features of a single category (i.e. demographics, genetics or radiomics), the radiomics-based model was the most predictive of IDH status. The radiomics-based model yielded a higher AUC (0.824, 95% confidence interval (CI): 0.755-0.894) than the composite PRS model (0.702, 95% CI: 0.622-0.782, PΔAUC=0.039) and the demographics-only model (0.751, 95% CI: 0.673-0.827, PΔAUC=0.19), although only the difference relative to the PRS model was statistically significant. The radiomics-based model also achieved higher accuracy (0.774), recall (0.779) and F1-score (0.769) than the other single-category models.
Next, we assessed whether germline genetics, demographics and radiomics contribute complementary information to IDH status classification (Table 2). The elastic net model that included both PRS and radiomic features performed significantly better than the radiomics-based model, with an AUC of 0.890 (95% CI: 0.837-0.943, PΔAUC=0.0016). Adding age at diagnosis and sex to the radiomic features also improved performance (AUC=0.906, 95% CI: 0.856-0.955). Overall, the integrated model that included demographic, radiomic and PRS features achieved the highest AUC (0.920, 95% CI: 0.876-0.964), accuracy (0.849), recall (0.844) and F1-score (0.844). Repeating the 5-fold cross-validation procedure for 500 random training/testing splits showed that adding genetic and demographic features to the radiomics-based model increased classification performance (Figure 2). The full integrated model had a median AUC of 0.937 (interquartile range (IQR)=0.920-0.937), compared with 0.882 (IQR=0.871-0.893) for the combined radiomics and genetics model and 0.838 (IQR=0.824-0.851) for the radiomics-only model.
In sensitivity analyses, we assessed the relative performance of the various models using radiomic features that were not adjusted for confounding factors such as age, sex and brain volume (Supplementary Table 1). The radiomics model showed significantly higher AUC (0.887, 95% CI: 0.833-0.942) than both the composite PRS feature model (AUC=0.702, PΔAUC=5.7×10−4) and the demographics-only model (AUC=0.751, PΔAUC=0.0034). Adding germline genetics to radiomic features increased accuracy (0.843 vs. 0.830), F1-score (0.832 vs. 0.814) and AUC (0.920, 95% CI: 0.878-0.962, PΔAUC=0.018). The full integrated model that included age at diagnosis and sex had the best overall classification performance, with an accuracy of 0.868, an F1-score of 0.863 and an AUC of 0.920 (95% CI: 0.878-0.963). When we repeated the 5-fold cross-validation procedure for 500 iterations (Supplementary Figure 1), the full integrated model showed a higher median AUC (0.927, IQR=0.927-0.934) than the combined radiomics and genetics model (AUC=0.920, IQR=0.911-0.927) and the radiomics-only model (AUC=0.881, IQR=0.891-0.900).
Feature importance
Overall, 40 of the input features (the 256 radiomic features, the composite PRS, age at diagnosis and sex) had non-zero cumulative weights in the elastic net model that included all available features (Figure 3A). Of those 40 features, 8 were significantly predictive of IDH mutation status: age at diagnosis, germline genetic susceptibility, the ratio of the enhancing tumor (ET) volume to the whole tumor (WT) volume, heterogeneity of the non-enhancing tumor (NET), the percentage of the tumor core (TC) in the frontal lobe, the ratio of the NET volume to the WT volume, the ratio of the ET volume to the TC volume and the ratio of the NET volume to the TC volume (Figure 3B). Various texture-based radiomic features calculated from gray-level intensities also had non-zero cumulative weights in the elastic net model, although their mean weights were not significantly different from zero (Figure 3A). Compared to IDH-wildtype tumors, IDH-mutant tumors were generally diagnosed at an earlier age, had minimal areas of enhancement and more often developed in the frontal lobe (Figure 3B).
Survival analysis
Patients predicted to have IDH-mutant gliomas showed significantly improved survival compared to patients predicted to have IDH-wildtype gliomas (HR=0.27, 95% CI: 0.14-0.51, P=6.3×10−5; Table 3). The difference in median survival between predicted IDH-mutant and IDH-wildtype cases was 72.7 months (P=2.1×10−11; Figure 4B). Median survival for patients predicted to have IDH-mutant (87.4 months) vs. IDH-wildtype tumors (14.7 months) was similar to that of patients with biopsy-confirmed IDH-mutant (87.4 months) vs. IDH-wildtype tumors (14.3 months; Figure 4A). Among grade II or III gliomas, patients predicted to have IDH-mutant tumors had significantly lower mortality risk than patients predicted to have IDH-wildtype tumors (HR=0.28, 95% CI: 0.11-0.69, P=0.0062; Figure 4D), comparable to the survival difference observed for IDH mutation status determined by postoperative molecular profiling (HR=0.19, 95% CI: 0.07-0.53, P=0.0015; Figure 4C). For grade IV gliomas, the median survival difference between patients predicted to have IDH-mutant vs. IDH-wildtype tumors was attenuated to 7.4 months (P=0.30; Figure 4D). A similar difference in median survival was observed among patients with grade IV gliomas grouped by biopsy-confirmed IDH mutation status (10.5 months, P=0.087; Figure 4C).
Of the 40 features that had non-zero cumulative weights in the integrated IDH classifier, age at diagnosis and 23 radiomic features were associated with all-cause mortality at P<0.05 (Figure 5, Supplementary Table 2). Prognostic radiomic features were primarily related to characteristics of the ET and NET regions. For instance, higher relative enhancing tumor volume (i.e. ET/TC) was associated with a 51% increase in overall mortality risk (HR=1.51, 95% CI: 1.22-1.86, P=1.2×10−4), while tumors with greater solidity of the NET had lower mortality risk (HR=0.61, 95% CI: 0.49-0.77, P=3.1×10−5).
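Per-feature hazard ratios like those above come from Cox proportional hazards models. The sketch below fits a univariable Cox model by Newton-Raphson on the partial likelihood (Breslow handling of ties) and reports the HR, 95% Wald CI and P-value. It assumes an unadjusted, unstratified model and synthetic data, and does not reproduce the authors' covariate adjustments; the function names are hypothetical. On the log scale, an HR of 1.51 means exp(β) = 1.51, i.e. a 51% higher hazard per one-unit increase in the feature (one SD, assuming the feature is on the z-score scale).

```python
import numpy as np
from scipy import stats

def _score_and_information(beta, time, event, x):
    """Score and observed information of the Cox partial likelihood (Breslow handling of ties)."""
    score, info = 0.0, 0.0
    for i in np.flatnonzero(event == 1):
        xr = x[time >= time[i]]               # covariate values in the risk set at this event time
        w = np.exp(beta * xr)
        s0, s1, s2 = w.sum(), (w * xr).sum(), (w * xr ** 2).sum()
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def cox_univariable(time, event, x, n_iter=25):
    """Fit a one-covariate Cox model by Newton-Raphson; return HR, 95% Wald CI and Wald P-value."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = _score_and_information(beta, time, event, x)
        beta += score / info
    _, info = _score_and_information(beta, time, event, x)   # information at the final estimate
    se = 1.0 / np.sqrt(info)
    hr = np.exp(beta)
    ci = (np.exp(beta - 1.96 * se), np.exp(beta + 1.96 * se))
    p = 2.0 * stats.norm.sf(abs(beta / se))
    return hr, ci, p

# Hypothetical data: standardized feature with a true HR of exp(0.4), about 1.49 per SD
rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
event_time = rng.exponential(1.0 / np.exp(0.4 * x))
censor_time = rng.exponential(2.0, size=n)
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(int)
hr, (ci_low, ci_high), p_cox = cox_univariable(time, event, x)
```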
DISCUSSION
In this case study, we integrated radiomic features extracted from preoperative MRI scans, a composite PRS for IDH-mutant glioma and demographic features (age at diagnosis and sex) to classify gliomas according to IDH mutation status. Previous models for classifying gliomas into clinically relevant molecular subtypes have relied on either radiomic11–13 or germline genetic features14,15 and have not assessed whether inherited glioma susceptibility may assist in classifying gliomas that present with non-characteristic radiographic properties. With larger glioma GWAS yielding greater insight into subtype-specific glioma risk, we sought to leverage the distinct germline genetic signatures of glioma subtypes18,19 in integrative classification models to potentially improve preoperative prognostication and treatment algorithms. We found that including diverse features from multiple complementary sources may improve model performance. The elastic net model that included all available features achieved the best overall classification performance, with an AUC of 0.920 and accuracy of 0.849 on 5-fold cross-validation. These results are consistent with observations for other cancers such as thyroid cancer, where polygenic risk scores improved imaging-based classifiers of malignancy risk26. We also found that age at diagnosis and germline genetic susceptibility were among the most predictive features, along with the relative volumes of the ET and NET and the percentage of the TC in the frontal lobe. Importantly, the predicted subtype labels were prognostic: patients predicted to have IDH-wildtype glioma had significantly higher mortality risk than patients predicted to have IDH-mutant glioma.
Tumor molecular markers, especially IDH mutation status, have critical implications for prognosis7,8 and treatment response9. Currently, the classification of gliomas into clinically relevant subtypes relies on molecular profiling of surgical tumor specimens. This often delays the evaluation of non-surgical treatment options until after invasive surgery, thereby forgoing the opportunity for neoadjuvant therapy and postponing adjuvant chemoradiotherapy. While combined temozolomide and radiotherapy remains the standard of care for IDH-wildtype glioblastoma27, vorasidenib, an inhibitor of the IDH1 and IDH2 enzymes, was recently shown to improve progression-free survival in patients with grade 2 IDH-mutant glioma28. Although the optimal timing of vorasidenib treatment has not been studied, earlier administration in the disease course might delay subsequent interventions in patients with low-grade glioma and improve quality of life. Our study provides further support for using noninvasive tools such as genotyping and imaging to accurately predict IDH mutation status before surgical intervention, and thus potentially to identify patients suitable for neoadjuvant therapy with IDH inhibitors. However, the added value of these ML approaches in patient management will require careful evaluation of their potential risks and contextualization in different clinical settings.
This work has several limitations. First, statistical power for model comparisons and survival analyses was limited since only 21% of TCGA participants had both radiomic and genotyping data available. Second, while cross-validation provides some robustness by evaluating performance based on in-sample hold-out subsets, external populations from different clinical centers are required for an unbiased and more informative evaluation. Lastly, our analyses focused on IDH mutation status, and we did not consider more refined molecular subtypes based on additional features, such as 1p19q codeletion, TERT promoter mutations and EGFR amplification due to limited sample size. However, of the currently used somatic mutations, IDH status is the most prognostically significant2,4.
In this analysis, we used a predefined set of radiomic features from imaging data with manually revised segmentation labels22. Several recent studies have developed end-to-end CNN-based models that automatically learn imaging features and perform both tumor segmentation and classification within a single framework11–13. These multi-task deep learning approaches do not rely on manual delineation of tumor regions, reduce computational burden and require minimal input from health care providers once trained, thereby facilitating their integration into existing neuro-oncology workflows. However, CNN approaches are mostly image-intensity based and do not account for other types of features that might be informative for subtype discrimination, which may limit their performance, especially in patients with non-characteristic radiographic findings. As individual-level genotyping data from larger cohorts of glioma patients become available, CNN-based models that incorporate germline genetic and demographic features could be trained using a late-fusion strategy13,29.
This work has several important strengths. Our study applies genome-wide PRS for glioma that were previously developed using the largest available collection of glioma GWAS data and have been shown to accurately estimate subtype-specific glioma risk in multiple independent populations14. We also leverage genetic data without corresponding radiomic data in TCGA to generate a composite PRS feature that reflects the joint effects of multiple subtype-specific PRS on the risk of IDH-mutant glioma. In addition to evaluating classification of IDH mutation status, we also assessed the degree to which our predicted molecular subtypes delineated survival trajectories, which is informative for evaluating the potential utility of preoperative glioma classification models in clinical practice.
In summary, this is, to our knowledge, the first study to demonstrate that integrating genetic risk profiles with MRI-based radiomic features significantly improves IDH status classification of glioma cases in TCGA. Within the limits of the available data, our case study motivates future research on multimodal glioma classification models. Although this work underscores the potential added value of multiple complementary features in glioma subtype classification, assessing the clinical utility of this integrated approach will require further testing in larger cohorts using different imaging-based ML algorithms.
AUTHOR CONTRIBUTIONS
Conceptualization of project: TN and LK. Methodology: TN, QZ and LK. Main analyses: TN. Primary data collection and curation: TN, SSF, GG, and LK. Drafting of the manuscript: TN and LK. All authors contributed to, reviewed and approved the final manuscript.
COMPETING INTERESTS
All authors declare no financial or non-financial competing interests.
DATA AVAILABILITY
Genotype data of glioma cases from The Cancer Genome Atlas (TCGA) are available from the Database of Genotypes and Phenotypes (dbGaP) under accession phs000178. Radiomic data of glioma cases from TCGA can be obtained from The Cancer Imaging Archive (https://www.cancerimagingarchive.net)23,24. The data required for fitting polygenic risk scores for glioma are available at: https://zenodo.org/records/10790748.
ACKNOWLEDGEMENTS
This study received no funding.