ABSTRACT
Background In non-small cell lung cancer (NSCLC), alternative strategies to determine patient oncogene mutation status are essential to overcome some of the drawbacks associated with current methods. We aimed to review the use of radiomics alone or in combination with clinical data and to evaluate the performance of artificial intelligence (AI)-based models on the prediction of oncogene mutation status.
Methods A PRISMA-compliant literature review was conducted. The Medline (via Pubmed), Embase, and Cochrane Library databases were searched for studies published through June 30, 2023 predicting oncogene mutation status in patients with NSCLC using radiomics. Independent meta-analyses evaluating the performance of AI-based models developed with radiomics features or with a combination of radiomics features plus clinical data for the prediction of different oncogenic driver mutations were performed. A meta-regression to analyze the influence of methodological/clinical factors was also conducted.
Results Out of the 615 studies identified, 89 evaluating models for the prediction of epidermal growth factor-1 (EGFR), anaplastic lymphoma kinase (ALK), and Kirsten rat sarcoma virus (KRAS) mutations were included in the systematic review. A total of 38 met the inclusion criteria for the meta-analyses. The AI algorithms’ sensitivity/false positive rate (FPR) in predicting EGFR, ALK, and KRAS mutations using radiomics-based models was 0.753 (95% CI 0.721–0.783)/0.346 (95% CI 0.305–0.390), 0.754 (95% CI 0.639–0.841)/ 0.225 (95% CI 0.163–0.302), and 0.744 (95% CI 0.605–0.846)/0.376 (95% CI 0.274–0.491), respectively. A meta-analysis of combined models was only possible for EGFR mutation, revealing a sensitivity/FPR of 0.800 (95% CI 0.767–0.830)/0.335 (95% CI 0.279–0.396). No statistically significant results were obtained in the meta-regression.
Conclusions Radiomics-based models may represent valuable non-invasive tools for the determination of oncogene mutation status in NSCLC. Further investigation is required to analyze whether clinical data might boost their performance.
INTRODUCTION
Lung cancer represents the most often diagnosed cancer in both women and men worldwide, ranking first and third, respectively, and remaining the leading cause of cancer death1. Non-small cell lung cancer (NSCLC), the most frequent histological subtype, accounts for 80%–85% of cases, being adenocarcinoma the most common subtype (40%–50% of cases). Adenocarcinoma can be further subdivided into distinct molecular subtypes2. Indeed, molecular subtyping has become highly relevant in the disease context, as genotype-driven therapy (“targeted therapy”) is nowadays the standard of care for a significant subgroup of patients with advanced and metastatic NSCLC3. However, traditional methods for determining the molecular genotype, as well as the possible emergence of drug resistance mutations during patient’s follow-up, entail invasive biopsies and genetic sequence testing, procedures with multiple number of associated drawbacks including high costs, sampling bias, lack of enough sample, turnaround time, and medical complications4-6. Importantly, the overall accessibility of molecular diagnostics and liquid biopsy may be limited for many patients7, highlighting the need to investigate complementary methods to characterize the oncogene mutation status of lesions.
Radiological imaging represents a potent non-invasive tool for lung cancer, from the screening, diagnosis and staging of the disease to the management, therapeutic planification and follow-up of both early- and advanced-stage cases8. Specifically, computed tomography (CT) remains the standard of care for lung cancer visualization, providing excellent morphological and textural information. In recent years, radiomics, the process of extracting and analyzing quantitative features from medical images to investigate potential connections with biology and clinical outcomes, has gained increasing attention for its applicability in several oncological diseases including lung cancer8. The application of artificial intelligence (AI) to imaging analyses has enabled important clinical needs to be met. This includes the prognostication of outcomes or the prediction of response to treatment, disease progression, or the mutational and molecular profiling of tumors9. In particular, the use of radiomics coupled with AI methods has demonstrated to be a promising non-invasive alternative tool for the prediction of oncogene mutation status in NSCLC8.
In this systematic review and meta-analysis we aimed to: 1) review the available scientific evidence on the use of imaging-based models and radiomics for the prediction of the main targetable oncogenic driver alterations in NSCLC, including epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), and Kirsten rat sarcoma virus (KRAS); 2) analyze the overall performance of specifically, AI-based methods, for the prediction of oncogene mutation status; 3) evaluate whether the inclusion of clinical variables in the models improve their performance; 4) evaluate the impact of the available evidence from a clinical perspective.
MATERIAL AND METHODS
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines10. The review was registered on PROSPERO before initiation (registration no. CRD42022349809).
Search strategy
A systematic search for eligible publications published through 30 June 2023 was performed in Medline (via Pubmed), Cochrane Library and EMBASE databases using the keywords “Radiomics”, “NSCLC” and “Mutational status”. Further details on the search terms used in each database are provided in Supplementary Table S1. There were no limitations on the publishing year, participant age, or nationality. The search was exclusively limited to English-language publications.
Study selection
Literature search and study selection were independently performed by two reviewers (A.F.M. and A.L.S.). To find relevant publications, they reviewed the titles and abstracts. Studies that satisfied the inclusion criteria were then manually assessed for eligibility by full-text screening. Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org) was used as a screening and data extraction tool.
Inclusion criteria
Papers were included in the qualitative synthesis (systematic review) if meeting the following inclusion criteria based on Patient, Index test, Comparator, Reference test, Diagnosis of reference (PIRD) questions: 1) being focused on the ability of radiomics to predict oncogene mutation status in NSCLC; 2) radiomics features were extracted from CT or from F-18 fluoro-deoxy-glucose (FDG)/CT scans; 3) a full text was available; 4) were written in English.
Exclusion criteria
Papers describing studies conducted using MRI scans (not the standard of care for NSCLC patients) or performed in phantom or animal models, or published as case reports, editorials, reviews, poster presentations, letters, editorials, or meeting abstracts were excluded. Papers not on the field of interest were also excluded.
For the quantitative synthesis (meta-analysis), the following additional exclusion criteria were applied: 1) oncogene mutation status was not the primary objective of the paper; 2) were focused on specific mutation subtypes; 3) did not apply AI-based methodologies; 4) developed simultaneous detection models or discriminant models; 5) sensitivity or specificity metrics were not available and could not be calculated; 6) were not comparable with the other articles included (model was developed based on intra- and extra-tumor derived radiomics features); 7) only included models developed with a combination of quantitative features extracted from PET/CT or from PET images (strictly adhering to a clinical perspective, PET scanning equipment is not always available and CT remains the standard of care for NSCLC patients); 8) did not reach a sufficient quality score according to the quality assessment (described below).
Quality assessment
The methodological quality of each study for its possible inclusion in the quantitative assessment was evaluated by using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM)11. Classification, image reconstruction, text analysis, and workflow optimization are some of the applications of AI in medical imaging that are addressed by CLAIM, which is modeled after the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guideline12-15. CLAIM checklist consists of 42 items divided into the conventional sections included in peer-reviewed scientific articles: title or abstract (1 item), abstract (1 item), introduction (2 items), methods (28 items subdivided into study design [2 items], data [7 items], ground truth [5 items], data partitions [3 items], model [3 items], training [3 items] and evaluation [5 items]), results (5 items subdivided into data [2 items] and model performance [3 items]), discussion (2 items) and other information (3 items). The CLAIM guideline offers a roadmap for writers and reviewers with the intention of fostering clear, open, and verifiable scientific discourse on the use of AI in medical imaging11.
For our quality assessment, a score was calculated for each paper ([total score, 42 - number of “not applicable” fields in each case]). A cut-off value of at least half of the total score after removing the “not applicable” items was established for the inclusion in the quantitative analysis. Therefore, this cut-off value varied for each study depending on the number of items that were applicable from among the 42 total items included in the CLAIM checklist (e.g., a cut-off value of 19 was established for those studies in which only 38 items of the checklist were applicable). See Supplementary Table S2. The assessment of the rigor, quality, and generalizability of the work of all enrolled studies was performed by three reviewers (A.J.P., F.B.B. and A.P.P.).
Data extraction
Data extracted included the following: (1) study details: first author, publication year, research questions, study design; (2) patient details: the source of data acquisition (single-center/multicenter), sample size, smoking history, age, sex, TNM staging, treatment status (naïve or any treatment received prior image acquisition), histological subtype; (3) imaging details: imaging modality, plain or contrast CT; (4) oncogene mutation status-related information: type of mutation, specific subtype of mutation (if available), sequencing method; sequencing kit (5) radiomics details: segmentation software, type of segmentation (manual, automatic, or semi-automatic), radiomics feature extraction software, number of imaging features extracted, number and name of radiomics features included in final models, features selection methods, type of models constructed (machine learning [ML], deep learning [DL], classical statistical model), final classifier used in machine learning models, clinical variables included in the models (if applicable), and models performance. Two independent reviewers (A.F.M. and A.L.S.) completed the initial screening and extracted data from all included studies.
Data analysis
For studies including models based on features extracted from different imaging modalities, only those based on CT scans were included in the quantitative analysis. A bivariate analysis of sensitivity and specificity as proposed by Reitsma et al.16 was chosen to perform the meta-analyses. This method has the distinct advantage of preserving the two-dimensional nature of the underlying data. It can also produce summary estimates of sensitivity and specificity (false positive rate [FPR, 1-specificity]), recognizing any possible correlation between these two measures. The method uses a random effect approach in which the values of the sensitivity and FPR estimates are obtained with restricted maximum likelihood. As a complement to the bivariate approach, the summary receiver operating characteristic (sROC) was calculated by converting each pair of sensitivity and specificity into a single measure of accuracy, the diagnostic odds ratio (DOR).
The analyses were carried out by reproducing the confusion matrices of each model presented in the studies, the number of cases and the prevalence of oncogene mutant positive cases. All calculations were performed on the basis of validation cohorts for studies applying a training/validation split method, or on the basis of the total sample when cross-validation was the validation strategy. To ensure homogeneity, calculations were conducted based on internal validation cohort data when external validation was also performed (minority of the cases).
Finally, a meta-regression analysis was performed to measure the possible influence of the following predictors: (1) average age of the cases, (2) manual segmentation vs semi-automatic segmentation vs both procedures (no studies including automatic segmentation approaches met the inclusion criteria for the quantitative analysis), (3) whether the model included only radiomics features or was combined with clinical variables, and (4) whether the model was classified as ML or DL. The heterogeneity in the description of the clinical variables included in the models prevented the inclusion of additional predictors of greatest clinical interest. Only the best model from each study according to its DOR was selected. When the mean/median age was not available due to the heterogeneity among studies when presenting descriptive results, it was inferred from the information obtained. Thus, mean and median values were indistinctly considered; when both values were provided, an average of both was calculated. If mean values were absent, median values were considered and viceversa. If both values were absent from the validation cohort, mean/median age from the total cohort was considered. When this information was not available either, the study was not included in the meta-regression.
All the analyses were performed using R Statistical Software v4.2.2 and the packages mada and tidyverse.
RESULTS
In total 615 articles were obtained according to the search strategy (Figure 1). After de-duplication, 397 studies were obtained and screened. According to the inclusion and exclusion criteria, 89 studies were included in the qualitative analysis (systematic review), all of them developing models for the prediction of EGFR, ALK, and/or KRAS. Out of those, 38 were found eligible for the quantitative part of the study (meta-analyses). As detailed in Supplementary Table S2, all papers passed this quality check and were therefore included.
Qualitative analysis (systematic review)
Methodological characteristics of the studies
The methodological characteristics of the studies are summarized in Table 1. Most of the studies (n = 69/ 89) applied exclusively ML algorithms, while this methodology was also used to build comparator models in 10 articles in which DL techniques were the main methodological approach followed. Only three studies exclusively applied DL algorithms, while classical statistical models were used in seven publications. Among the 79 articles applying ML techniques, the most common classifier used was logistic regression (n = 38), followed by support vector machine (n = 35) and by random forest (n = 29). In terms of partitioning strategy, training-validation split was the most frequent technique (n = 71). External validation was only performed in a small set of studies (n = 9). Regarding imaging techniques, CT was the most frequently used modality (n = 61), followed by PET/CT (n = 22) and by PET alone (n = 4). Additionally, in one study17 PET/CT scans and contrast-enhanced CT images independently acquired were collected, while in another study18, PET/CT, CT, and contrast-enhanced diagnostic quality (CTD) images were used. Of the 61 studies conducted with CT scans, 39 included non-contrast-enhanced images, 18 contrast-enhanced images, in two contrast-related information was not specified and in two both contrast- and non-contrast-enhanced scans were included. Regarding tumor segmentation, a manual approach was followed in 48 studies, and automatic and semi-automatic segmentations were applied in two and 31 studies, respectively; three studies applied both methodologies (for verification or a different approach according to the imaging modality used) and five studies did not specify the method utilized for tumor segmentation.
Clinical characteristics of the studies
The 89 studies evaluated in the qualitative synthesis included a total of 32,084 patients with NSCLC. Although most of the studies included >200 patients, in 42 publications, the sample size did not reach this figure and in 11 studies sample size was even lower than 100. All studies were retrospective and mostly unicentric (n = 72); the number of participant centers was not specified in one study19. In general, basic clinical and demographic information collected included sex, age, smoking status, TNM stage, histology, and treatment status at the moment of image acquisition, although this information was not available in 13, 9, 22, 34, 22 and 24 studies out of the 89 assessed, respectively. The clinical characteristics of the patients included in the 89 studies are depicted in Table 2. The median [range]/ mean ± standard deviation (SD) age of patients was 61.78 [59–64.17] years and 61.71 ± 3.64 years, respectively. In terms of sex, the total population was balanced, with 13,574 females and 14,066 males. The smoking history was available for 23,200 patients, and many were non-smokers (n = 12,813); while smoking history was unknown for 1,146 patients. Out of the 55 studies detailing information about the TNM stage, the majority of them (n = 40) included information about the four stages (I-IV), either provided per group or grouped in stages I-II and stages III-IV. Among the 15 studies that did not include patients of all stages, two studies included only early stage patients (stages I and II)20, 21, two included patients stage II-IV22, 23, six included only patients of stages III and IV24-29 (three of them with a majority of stage IV patients24, 27, 29), and five included patients of stages I-III without including the most advanced stage19, 30-33. A total of 65 studies included patients with adenocarcinoma: 43 exclusively including this histology subtype and 22 including other NSCLC histology types as well. Finally, in most of the cases (n = 65), images were acquired before patients received any treatment, with two studies also including post-treatment images34, 35. In 24 studies, no information on treatment was detailed, although in some of them image acquisition before surgery31, 36-41, before polymerase chain reaction (PCR)42, or before pathological diagnosis43 was detailed as an inclusion criterion. In five studies19, 44-47, authors specify that patients had not received radiotherapy or chemotherapy, but no information on targeted therapy was provided. Finally, only one study48 out of the 89 included in the systematic review, which did not meet the inclusion criteria to be considered for the meta-analysis, included patients who had received treatment with tyrosine kinase inhibitors (TKIs).
Quantitative analysis (meta-analysis)
A total of 38 studies met the inclusion criteria for the quantitative assessment (n = 17,066 patients). Three main different meta-analyses including radiomics-based models were conducted: 1) a meta-analysis including studies focused on the detection of EGFR (n = 34 studies)17, 25-28, 32, 33, 36, 38, 39, 46, 49-71; 2) a meta-analysis including studies focused on the detection of ALK (n = 3 studies)72-74; 3) a meta-analysis including studies focused on the detection of KRAS (n = 4 studies)47, 50, 54, 62. In three studies, authors developed models for the detection of both EGFR and KRAS50, 54, 62. Furthermore, a separate meta-analysis was conducted for combined models (radiomics features + clinical variables) for the prediction of EGFR (not enough studies for ALK or KRAS mutations). Studies included in all the meta-analyses conducted are summarized in Supplementary Table S3. Details on the radiomics features included in the EGFR, ALK, and KRAS models are summarized in Supplementary Table S4, Supplementary Table S5 and Supplementary Table S6, respectively. In terms of radiomics variables, models grouped different combinations of first order, shape, gray level co-occurrence matrix (GLCM), gray level size zone matrix (GLSZM), gray level run length matrix (GLRLM), neighboring gray tone difference matrix (NGTDM), and gray level dependence matrix (GLDM) features. Clinical data included sex, smoking history, and/or histological type in the majority of studies.
EGFR
Results of the meta-analysis focused on models built with radiomics features are summarized in Figure 2. Note that this meta-analysis also included a study50 in which predictions were based on features extracted by a multi-channel and multi-task deep learning model with the ability to simultaneously detect EGFR and KRAS oncogene mutations; and consequently, did not include radiomics features (only single-task results for the independent prediction of EGFR and KRAS were considered for the quantitative analysis). A hierarchical sROC curve was plotted for the included 24 studies 17, 25, 27, 32, 36, 39, 46, 49-51, 54, 55, 57-60, 62, 64-66, 68-71 that evaluate the performance of AI algorithms in predicting EGFR mutation status in NSCLC (Supplementary Figure S1). Eight studies assessed more than one model32, 36, 39, 50, 59, 60, 65, 70. As observed, radiomics-based models exhibited high diagnostic performance in predicting EGFR mutation status with an overall AUC of 0.766. The AI algorithms’ sensitivity in determining the EGFR mutation status varied from 0.362 to 0.948, resulting in an estimate of 0.753 (95% CI 0.721–0.783). The FPR of these algorithms ranged from 0.022 to 0.761, with a estimate of 0.346 (95% CI 0.305–0.390). Detecting a positive case for EGFR mutation was almost six times more likely than not detecting it (DOR = 5.70 [95% CI 4.74–6.81]).
The effect of adding clinical variables to radiomics models or to models including both radiomics and deep features65 (models including clinical data and radiomic or deep features referred in this work as combined models) in the prediction of EGFR mutation was also analyzed. This meta-analysis included 23 studies25, 26, 28, 32, 33, 38, 39, 46, 51-53, 56-58, 61-69, of which four of them developed more than one model32, 39, 56, 67. Results are depicted in Figure 3 and sROC curve in Supplementary Figure S1. Overall, the performance of combined models slightly improved compared to radiomics models, with an AUC of 0.811 and a sensitivity of 0.800 (95% CI 0.767–0.830; model’s sensitivity ranging from 0.523 to 0.944). The FPR resulted similar with a value of 0.335 (95% CI 0.279–0.396; model’s FPR ranging from 0.167 to 0.760.). Detecting a positive case for EGFR mutation with combined models was more than eight times more likely than not detecting it (DOR = 8.35 [95% CI 6.77–10.20]).
ALK
The meta-analysis focused on radiomics-based models included three studies72-74, one of which developed two different models, one based on pre-contrast images and another one on post-contrast images73. An overall AUC of 0.831 was obtained for the prediction of ALK aberration, with a sensitivity ranging from 0.682 to 0.825, resulting in an estimate of 0.754 (95% CI 0.639–0.841). The FPR of these algorithms ranged from 0.167 to 0.277, with an estimate of 0.225 (95% CI 0.163–0.302). Detecting a positive case for ALK aberration was 11 times more likely than not detecting it (DOR = 5.70 [95% CI 5.83–19.10]) (Figure 4 and Supplementary Figure S2). Given the lack of enough studies developing combined models, a meta-analysis to assess the effects of adding clinical variables in the prediction of ALK aberration was not possible. The only study74 that developed a model including age, sex, smoking history, smoking index, clinical stage, distal metastasis and pathological invasiveness of the tumor in combination with conventional CT features and different first order, GLCM, GLSZM, and GLRL radiomics features demonstrated increased performance in predicting ALK aberration of the combined model vs the radiomics-based model, but only in the primary cohort (AUC, 0.83–0.88, p = 0.01), not in the testing cohort (AUC, 0.80–0.88, p = 0.29).
KRAS
Four studies met the inclusion criteria for the meta-analysis assessing models for KRAS mutation prediction47, 50, 54, 62, among which, three of them also developed models for EGFR mutation prediction50, 54, 62. KRAS/EGFR models were independently built except in one study, in which a multi-channel multi-task DL model for the prediction of both KRAS and EGFR mutations was developed50. However, and according to the inclusion criteria, only single-task metrics were considered for the quantitative analysis despite the multi-channel version displayed the highest performance for the simultaneous detection of both oncogenic driver mutations. Results of the meta-analysis evaluating radiomics-based models are shown in Figure 4 and Supplementary Figure S3. KRAS mutation was predicted with an overall AUC of 0.732 and a sensitivity of 0.744 (95% CI 0.605–0.846; model’s sensitivity ranging from 0.641 to 0.875. The FPR was 0.376 (95% CI 0.274–0.491; model’s FPR ranging from 0.259 to 0.468). Detecting a positive case for KRAS mutation with radiomics-based models was more than five times more likely than not detecting it (DOR = 8.35 [95% CI 1.98–11.70]). Like ALK, the lack of enough KRAS studies made it impossible to perform a meta-analysis analyzing combined models. Only Ríos Velázquez et al.62 built a model including age, sex, smoking status, race, and clinical stage together with radiomics features that performed similar to the radiomics model (AUC = 0.69 [95% CI: 0.63–0.75] vs AUC = 0.63 [95% CI: 0.57–0.69]) and worse than a model developed only with clinical data AUC = 0.75 [95% CI: 0.69–0.80].
Meta-regression and subgroup analysis
The possible effects of different predictors on the predictive performance of the models was evaluated for EGFR mutation (not enough studies were available for ALK or KRAS mutations). Neither age, nor the use of contrast, nor the type of segmentation (manual/semi-automatic/automatic), nor the model (radiomics/combined), nor the AI methodology (machine learning/deep learning), yielded statistically significant results (Supplementary Table S7).
DISCUSSION
At present, molecular testing performed on biopsied tissue remains the gold standard for diagnosis and genotyping in advanced NSCLC75, 76. However, given the associated limitations and inconveniences, such as the lack of enough tissue for successful testing77, 78, or the long turnaround times76, there is a need to validate and incorporate new procedures into routine clinical practice. In recent years, liquid biopsy has emerged as a promising alternative in NSCLC, especially in clinical scenarios78. Likewise, radiomics have shown encouraging results in prognosis and prediction in this setting79. In general, both methodologies possess great potential, since they are both simple, straightforward to do, and repeatable at patient follow-up visits, which makes it possible to gather important data about the type of tumor, its aggressiveness, its progression, and its response to therapy80. Radiomics has the additional advantage of only requiring medical images and capturing patient-level and tissue-level heterogeneity, such as CT scans in lung cancer, that are usually acquired as part of the patient’s standard journey, representing an affordable methodology both in terms of resources and costs. It is important that new techniques are properly validated to facilitate their standardization, prior to incorporation into the routine clinical workflow.
To our knowledge, this is the first systematic review and meta-analysis that analyzes the performance and applicability of different imaging-based models for the prediction of three of the most common oncogene mutations—EGFR, ALK and KRAS—in NSCLC from a clinical perspective and with a special focus on AI methodologies. So far, results were only available for EGFR studies and did not take clinical aspects into account81. Thus, the results of our different meta-analyses demonstrate that AI-based models developed with CT-derived radiomics features showed good performance in predicting EGFR, ALK, and KRAS mutations with a sensitivity of 0.753 [95% CI (0.721–0.783)], 0.754 [95% CI (0.639–0.841)] and 0.744 [95% CI (0.605–0.846)], respectively. Whether the inclusion of clinical variables increase models’ performance cannot be concluded from our results, although we believe that increasing the number of studies would probably confirm the trends observed in our quantitative analysis of EGFR mutation.
Our outcomes point to radiomics as a candidate screening tool for oncogene mutation status determination. We especially focused on CT-based models, aiming to obtain conclusions as applicable as possible to the standard clinical workflow since CT remains the most utilized imaging tool in NSCLC82. From our work, we conclude that in addition to additional validation of our findings that future studies should be conducted that consider the following important aspects. Firstly, a minimum sample size should be guaranteed to ensure the reliability of the results obtained with AI-based models83, 84. In both our systematic review and meta-analyses, more than half of the studies were conducted in >200 patients (n = 46/89 and n = 23/38 [n = 20/34 for EGFR, n = 2/3 for ALK and n = 3/4 for KRAS]), but still a sizable number had small sample sizes, which definitely limited the relevance of their conclusions. Multicentric designs would be also desirable to get more solid conclusions, an approach that few studies followed (n = 16/89 in the systematic review and n = 10/39 in the meta-analysis [n = 10/34 for EGFR, n = 0/3 for ALK and n = 3/4 for KRAS]). Secondly, including independent cohorts for external validations would reinforce the results, leading to more robust and reproducible models. Out of the 89 studies included in the qualitative analysis, only 9 used external cohorts for validation43, 46, 60, 63, 67, 68, 85-87, of which five were included in the EGFR meta-analysis46, 60, 63, 67, 68. Finally, it is important that patient populations reflect clinical practice. Thus, considering the potential applicability of the models for diagnostic purposes, studies should be conducted in treatment naïve populations to avoid possible therapy-related confounding effects, an inclusion criterion mostly applied in the studies evaluated in this work, but still missing in some of them. Additionally, studies should be carried out preferably in stage III-IV NSCLC patients (especially in those at stage IV, for whom clinical guidelines recommend molecular testing75, 76). As demonstrated in this work, most of the studies published so far do not provide information on TNM stage or include patients from all stages. Despite the heterogeneity of the studies evaluated, we believe that the evidence provided is enough as to demonstrate the potential of radiomics in oncogene mutation status determination. Thus, AI-based models using radiomics extracted from CT scans could be effective non-invasive screening tools to detect targetable driver mutations in NSCLC with good sensitivity and moderate specificity. These tools would not be intended to replace gold standard techniques, such as PCR or next-generation sequencing, but to allow for the potential earlier identification of ideal candidates to be genetically tested, saving time, costs, and samples. Consequently, a high sensitivity would ensure the identification of oncogene mutation positive patients for whom laboratory-based testing would be subsequently confirmed.
When the influence of different factors on the prediction of EGFR mutation was evaluated, no statistically significant results were obtained, probably due to the limited number of studies included and the presence of missing data. However, some of those factors might play an essential role and should be considered when developing accurate models to be potentially implemented into clinical practice. Indeed, some of the studies included in our qualitative analysis analyzed the impact of different methodological aspects on the performance of the models. For example, Huang et al.35 demonstrated that interobserver variability in tumor segmentation affects the use of radiomics to predict oncogene mutation status, which suggests that automatic or semi-automatic models might be more suitable. In the study by Shiri et al.88, the application to radiomics features of ComBat harmonization improved the performance of the models toward more successful prediction of EGFR and KRAS mutations. Likewise, other authors have pointed to the impact of the experimental settings on the robustness of radiomics features89, or the influence of CT slice thickness on the predictive performance of radiomics-based models31. It is also worth mentioning the relevance of using a particular AI methodology. Although we found no differences in the EGFR mutation predictive performance between ML and DL methods, most likely due to the limited number of available DL-based studies, the latter might offer some advantages over the former. Thus, while in radiomics analysis a process of lesion segmentation and subsequent feature extraction is required, which introduces certain degree of variability and can be a high time-consuming task, DL models only required a bounding box of the lesion, greatly reducing this effect. On the other hand, DL models, and in particular end-to-end convolutional neural network (CNN) models, such those developed in most of the DL studies included in our work37, 42, 43, 50, 58, 68, 85, 86, 90, 91, are generally more complex in terms of the number of parameters, allowing to solve more complicated problems than traditional ML models. Considering available evidence, it seems reasonable to think that methodologic approaches should be carefully revised when validation studies are designed and conducted.
Our study has also some limitations, mainly derived from the limitations of the publications included. Thus, it is based on retrospective studies displaying great heterogeneity in terms of methodology and patient clinical characteristics, which clearly hamper the impact of our conclusions. Additionally, the limited available evidence for ALK and KRAS mutations, makes it difficult to draw solid conclusions. Despite this, our work gathers the most up-to-date and complete evidence (all models developed in each of the studies were analyzed) on imaging-based models for the prediction of three of the most important oncogene mutations in NSCLC, following a clinical approach and a special focus on AI models. Our exhaustive review and meta-analyses are intended to provide solid evidence for future research in the field.
In conclusion, radiomics-based models offer a useful and non-invasive method for determining the status of EGFR mutations in NSCLC and seem to retain similar predictive value for ALK and KRAS mutations. Additionally, although the inclusion of clinical variables tends to increase the performance of the models, further validation is required.
Data Availability
All data produced in the present work are contained in the manuscript.
CONFLICTS OF INTEREST
GJW reports personal fees from Quibim related to this work. He is a former employee of SOTIO Biotech Inc., and reports personal fees from Imaging Endpoints II, MiRanostics Consulting, Gossamer Bio, International Genomics Consortium, Angiex, Genomic Health, Oncacare, Rafael Pharmaceuticals, Roche, Immunocore, Kymera, and SPARC-all outside this submitted work; has ownership interest in MiRanostics Consulting, Exact Sciences, Moderna, Agenus, Aurinia Pharmaceuticals, and Circulogene-outside the submitted work; and has issued patents-all outside the submitted work. The remaining authors declare no conflicts of interest.