Abstract
Immune checkpoint inhibitors, especially PD-1/PD-L1 blockade, have revolutionized cancer treatment and brought tremendous benefits to patients who would otherwise have had a poor prognosis. Nonetheless, only a small fraction of patients respond to immunotherapy, and the costs and side effects of immune checkpoint inhibitors cannot be ignored. With the advent of machine and deep learning, clinical and genetic data have been used to stratify patient responses to immunotherapy. Unfortunately, these approaches have typically been “black-box” methods that are unable to explain their predictions, thereby hindering their responsible clinical application. Herein, we developed a “white-box” Bayesian network model that achieves accurate and interpretable predictions of immunotherapy response in non-small cell lung cancer (NSCLC). This Tree-Augmented naïve Bayes (TAN) model accurately predicted durable clinical benefit and distinguished two clinically significant subgroups with distinct prognoses. Furthermore, our state-of-the-art white-box TAN approach achieved greater accuracy than previous methods. We hope our model will guide clinicians in selecting NSCLC patients who truly require immunotherapy, and we expect our approach to be easily applied to other types of cancer.
Background Immune checkpoint inhibitors have revolutionized cancer treatment. Given that only a small fraction of patients respond to immunotherapy, patient stratification is a pressing concern. Unfortunately, the “black-box” nature of most proposed stratification methods, together with their far from satisfactory accuracy, has hindered their clinical application.
Method We developed a “white-box” Bayesian network model with an interpretable architecture that can accurately predict immunotherapy response in non-small cell lung cancer (NSCLC). We collected clinical and genetic information from several independent studies and integrated it using the Tree-Augmented naïve Bayes (TAN) approach.
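To make the TAN idea concrete, the following is a minimal sketch of how a TAN structure is commonly learned from discrete data: every feature keeps the class as a parent, and features are additionally linked by a maximum spanning tree weighted by conditional mutual information given the class. This is an illustration under stated assumptions, not the implementation used in this study; the function names, feature names, and toy values are hypothetical and carry no clinical meaning.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three equal-length discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) p(y|c)) ], expressed with raw counts
        cmi += (count / n) * log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi


def learn_tan_structure(features, labels, root=None):
    """Return {feature: parent feature or None}.

    The class variable is an implicit additional parent of every feature.
    """
    names = list(features)
    root = names[0] if root is None else root
    weights = {
        frozenset((a, b)): conditional_mutual_information(features[a], features[b], labels)
        for a, b in combinations(names, 2)
    }
    # Prim's algorithm for a maximum spanning tree, grown from the root so that
    # every edge is directed away from the root as it is added.
    parent = {root: None}
    while len(parent) < len(names):
        u, v = max(
            ((u, v) for u in parent for v in names if v not in parent),
            key=lambda edge: weights[frozenset(edge)],
        )
        parent[v] = u
    return parent


# Hypothetical toy data (made-up binary values, for illustration only).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "pdl1_high": [0, 0, 1, 1, 0, 0, 1, 1],
    "tmb_high": [0, 0, 1, 1, 0, 0, 1, 1],
    "smoker": [0, 1, 0, 1, 0, 1, 0, 1],
}
structure = learn_tan_structure(toy_features, toy_labels, root="pdl1_high")
print(structure)
```

In this toy example the two features that co-vary within each class end up linked in the tree, which is the kind of data-driven interaction a TAN can expose. A full model would additionally estimate the conditional probability tables for each feature given its parents.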
Findings This TAN model accurately predicted durable clinical benefit and distinguished two clinically significant subgroups with distinct prognoses, outperforming previous methods. We also verified that TAN detected meaningful interactions between variables in a data-driven manner. Moreover, TAN successfully predicted patient prognosis even when the input data contained missing values.
Interpretation Our model will guide clinicians in selecting NSCLC patients who genuinely require immunotherapy. We expect this approach to be easily applied to other types of cancer. To accelerate the uptake of personalized medicine via access to accurate and interpretable models, we provide a web application (https://pred-nsclc-ici-bayesian.shinyapps.io/Bayesian-NSCLC/) for use by the research and clinical communities.
Funding A KAKENHI grant from the Japan Society for the Promotion of Science (JSPS) to H.S. (21K17856).
Prior evidence Many studies have advocated the use of biomarkers, such as Programmed Death Ligand-1 (PD-L1) and Tumor Mutational Burden (TMB), to estimate the therapeutic effect of immune checkpoint inhibitors in cancer and utilize them in personalized medicine. Because such single factors are insufficient, many artificial intelligence (AI)-based prediction models have been developed. However, most of these have been “black box” models, in that they lack interpretability. Furthermore, most of them are unable to handle missing data, which is problematic because it is challenging in clinical settings to acquire all of the necessary input information.
Added value To address this, we developed an interpretable graphical Tree-Augmented naïve Bayes (TAN) model and demonstrated its state-of-the-art “white box” performance. It achieved good predictive performance, even when some of the data were missing, and identified relationships between variables that were consistent with previous reports.
Implications We present the first evidence of a specialized graphical “white-box” model that achieves state-of-the-art performance in immunotherapy, providing strong support for the applicability of interpretable AI models in clinical decision-making. Research using larger datasets will further improve stratification and precision medicine.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by a KAKENHI grant from the Japan Society for the Promotion of Science (JSPS) to H.S. (21K17856).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All clinical and mutation information is available from the cBioPortal database (http://www.cbioportal.org), and a detailed description of each cohort can be found in the original papers.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present work are contained in the manuscript.
Abbreviations
- AUC: Area under the curve
- BN: Bayesian network
- DCB: Durable clinical benefit
- HC: Hill-climbing method
- ICIs: Immune checkpoint inhibitors
- ML: Machine learning
- MCMC: Markov Chain Monte Carlo
- NB: Naïve Bayes
- NN: Neural network
- NSCLC: Non-small cell lung cancer
- TAN: Tree-Augmented naïve Bayes
- TMB: Tumor mutational burden