Abstract
Background Involvement of many variables, uncertainty in treatment response, and inter-patient heterogeneity challenge objective decision-making in dynamic treatment regime (DTR) in oncology. Advanced machine learning analytics in conjunction with information-rich dense multi-omics data have the ability to overcome such challenges. We have developed a comprehensive artificial intelligence (AI)-based optimal decision-making framework for assisting oncologists in DTR. In this work, we demonstrate the proposed framework to Knowledge Based Response-Adaptive Radiotherapy (KBR-ART) applications by developing an interactive software tool entitled Adaptive Radiotherapy Clinical Decision Support (ARCliDS).
Methods ARCliDS is composed of two main components: Artificial RT Environment (ARTE) and Optimal Decision Maker (ODM). ARTE is designed as a Markov decision process and modeled via supervised learning. Given a patient’s pre- and during-treatment information, ARTE can estimate treatment outcomes for a selected daily dosage value (radiation fraction size). ODM is formulated using reinforcement learning and is trained on ARTE. ODM can recommend optimal daily dosage adjustments to maximize the tumor local control probability and minimize the side effects. Graph Neural Network (GNN) is applied to exploit the inter-feature relationships for improved modeling performance and a novel double GNN architecture is designed to avoid unphysical treatment response. Datasets of size 117 and 292 were available from two clinical trials on adaptive RT in non-small cell lung cancer (NSCLC) patients and adaptive stereotactic body RT (SBRT) in hepatocellular carcinoma (HCC) patients, respectively. For training and validation, dense data with 297 features were available for 67 NSCLC patients and 110 features for 71 HCC patients. To increase the sample size for ODM training, we applied Generative Adversarial Network to generate 10,000 synthetic patients. The ODM was trained on the synthetic patients and validated on the original dataset.
Results Double GNN architecture was able to correct the unphysical dose-response trend and improve ARCliDS recommendation. The average root mean squared difference (RMSD) between ARCliDS recommendation and reported clinical decisions using double GNNs were 0.61 ± 0.03 Gy/frac (mean±sem) for adaptive RT in NSCLC patients and 2.96 ± 0.42 Gy/frac for adaptive SBRT HCC compared to the single GNN’s RMSDs of 0.97 ± 0.12 Gy/frac and 4.75 ± 0.16 Gy/frac, respectively. Overall, For NSCLC and HCC, ARCliDS with double GNNs was able to reproduce 36% and 50% of the good clinical decisions (local control and no side effects) and improve 74% and 30% of the bad clinical decisions, respectively.
Conclusion ARCliDS is the first web-based software dedicated to assist KBR-ART with multi-omics data. ARCliDS can learn from the reported clinical decisions and facilitate AI-assisted clinical decision-making for improving the outcomes in DTR.
Introduction
Optimal decision-making in Knowledge Based Response-Adaptive Radiotherapy (KBR-ART) is a difficult task1. The difficulties arise from a slew of factors, such as, involvement of many variables, uncertainty in treatment response, and inter-patient heterogeneity 2. In the absence of a quantitative framework, clinical decisions are primarily influenced by physician’s professional experiences, which may result in inter-physician variability. Thus, there is a need for a robust and user-friendly clinical decision-support tool for objective decision-making in KBR-ART that is data-driven and consistent3.
Adaptive Radiotherapy Clinical Decision Support (ARCliDS) is a web-based software tool for AI-assisted optimal decision-making in KBR-ART4–6 and potentially other oncology applications involving dynamic treatment regime (DTR)7,8. ARCliDS provides a quantitative approach to overcome the decision-making difficulties via a set of data analytics algorithms, which include feature selection of important variables, statistical ensemble for representing uncertainties of treatment response, and, most importantly, integration of information-rich dense multi-omics datasets for capturing inter-patient heterogeneity9–11. ARCliDS combines all the above data analytics capabilities and presents a user-friendly interface for evaluating relevant clinical use cases. Moreover, it is complementary to the current treatment planning system; the integration may facilitate an introduction of multi-omics information into the treatment planning workflow.
DTR including adaptive RT (ART)12 are designed for treatment personalization. A popular ART paradigm and implementation is to adapt treatment plans to accommodate during-treatment anatomical changes due to weight loss, tumor regression and/or diminution of the volume of surrounding normal tissue and organ at risk (OAR). A complementary ART paradigm is KBR-ART which provides a response-based adaptive framework for personalizing RT as shown in Figure 1, where the response assessment is not limited to observing anatomical changes. It is divided into three phases: Pre-Treatment Assessment, Treatment Response Evaluation (evaluation phase) and Treatment Adaptation (adaptation phase). In the pre-treatment phase, a patient’s disease and condition is assessed and a treatment plan is tailored. In the evaluation phase, a patient’s treatment response is evaluated by comparing pre and mid treatment multi-omics information changes. Based on the treatment responses, the patient’s associated outcome probabilities are estimated. In the adaptation phase, treatment planning is adapted for a personalized and an optimal outcome. Two endpoints are considered: tumor control and normal tissue complication. The goal of KBR-ART is to maximize tumor control probability (TCP) and minimize normal tissue complication probability (NTCP).
To demonstrate the potential of ARCliDS, two clinical use cases are presented. In both studies, the evaluation time was around 1 month. The first use case is based on the UMCC (University of Michigan Cancer Center) 2007-123 phase II dose escalation clinical trial NCT01190527 13, where inoperable or unresectable non-small cell lung cancer (NSCLC) patients were administered with 30 daily dose fractions. The patients received roughly 50 Gy [Gray = J/Kg] equivalent dose in 2 Gy fractions (EQD2) in the evaluation phase and up to a total dose of 92 Gy EQD2 in the adaptation phase. The evaluation phase lasted for roughly two-thirds of the 6-week treatment period. In the second clinical use case, patients with hepatocellular carcinoma (HCC) received adaptive SBRT in clinical trials NCT01519219, NCT01522937, and NCT0246083514. In the evaluation phase, patients received 3 daily dose fractions followed by 1 month break, and in the adaptation phase, a suitable sub-population of the patients received 2 additional daily doses.
A large sample size that is representative of the true population is preferred for all data driven and statistical modeling. However, due to financial, feasibility, and ethical reasons, obtaining a large dataset in medical field is often impractical. In our case, a dataset of size 117 and 292 were available for NSCLC patients and HCC patients, respectively. Dense multi-omics data with 297 features were available for only 67 NSCLC patients and 110 features for 71 HCC patients. These datasets, albeit on the smaller size, are unique as KBR-ART is still in its clinical trial phase and hence the largest multi-omics datasets for KBR-ART. So, although the sample size looks small, the information-rich dense multi-omics dataset is the largest of its kind.
Under the current United States Food and Drug Administration (FDA) definition and guidelines, ARCliDS is categorized as a Software as a Medical Device (SaMD)6. SaMD is defined as software intended to be used for medical purposes independently in contrast to software intended to drive a hardware medical device (software in a medical device). This definition was recently adopted by FDA to include AI software15,16 which can automatically learn from user cases and continuously update after deployment, as opposed to traditional software, which stays fixed after deployment (excluding version update). Therefore, ARCliDS also has two modes of operation: Operation Mode and Learning Mode as shown in Figure 2. After the initial training, both modes can run simultaneously (online learning) in the clinic.
ARCliDS is composed of two main AI components. The first component is the Artificial Radiotherapy Environment (ARTE) for estimating the predicted outcome and the second component is the Optimal Decision-Maker (ODM) for decision-making. In Operation Mode, ARCliDS asks for a patient’s pre and mid treatment multi-omics information, and current treatment plan. It feeds that information into ARTE and ODM, and obtains outcome estimates, state dynamics, and the optimal dose adaptation value. All of the estimated results come with associated uncertainty. The results are presented in two main plots: outcome space spanned by TCP and NTCP, and population distribution plots as further explained in the Graphical User Interface (GUI). During the Learning Mode, ARTE is trained first on the available data, and then ODM is trained on the ARTE. The details of the training are presented in the Methods section and SM.
ARCliDS presents a significant improvement to Tseng et al.’s 17 and Niraula et al.’s 5 methods. The improvement comes from the graphical representation of patients’ features. Convolution neural networks (CNNs) are known to perform well because they exploit the feature locality of images18. In other words, pixels at neighboring areas of an image are correlated, and CNN architectures can capture those correlations. Graph Neural Networks (GNNs) are similar except they exploit the non-local relationship between feature values19. Computationally, GNNs use fewer network connections compared to fully connected NNs, which help in learning by reducing redundancies. From another perspective, information from one feature only goes to its neighboring features. In this work, we have borrowed the feature graph from Luo et al.’s work on multi-objective Bayesian Network20 which identified the most important features related to RT outcome of interest by finding the Markov Blanket of the outcomes. Details of feature selection procedure are presented in Supplementary Materials (SM) section S1. For NSCLC and HCC, we were able to (coincidentally) select 13 important features.
The materials in this manuscript are arranged as follows. We begin by introducing ARCliDS’ GUI. We then present the details of ARTE and ODM in the Methods section. ARTE is further divided into the descriptions of patient state, transition function (TF), and RT outcome estimator (RTOE). We have proposed a linear-quadratic-linear (LQL) type TF. This is an improvement from Niraula et al.’s5 work which employed a linear-quadratic (LQ) type function. LQL model is a generalization of LQ model that covers both RT and SBRT. Furthermore, for the RTOE, we applied GNN to general logistic function guided double NN architecture, which was developed by Niraula et al.5 To improve the robustness of ODM, we have replaced the online learning approach with the Planning and Learning approach21. We trained and validated ARCliDS in two different cancer sites and different treatment types to show its versatility. We conclude the manuscript with a discussion and future steps.
Methods
I. Graphical User Interface
We have designed ARCliDS as a Web Application (app) using R Shiny as shown in Figure 3. The app consists of 4 main panels: Data Input Panel, Outcome Space, Population Distribution Plot and Report Print. Beside these, there are accessibility tools such as help information, user guide, documentation, zooming, and printing option.
1. Data Input Panel
Patient Data can be input manually or via a data file. Multiple patients’ states can be input for visual comparison. The inputs can be saved or printed if necessary. There is a dedicated space for Physician notes.
2. Outcome Space
We present the AI recommendations in the Output Space. The Output Space is spanned by TCP in the x-axis and NTCP in the y-axis. We contoured and colored it with the Reward Function, providing additional insight on the AI’s Decision Making. Given a patient’s information, it shows treatment outcome for a range of daily dose fractions and marks the treatment outcome for the optimal dose recommendation. It provides uncertainty assessment for both the outcome estimate and AI recommendation.
3. Population Distribution Plot
Knowing the patient’s state value and its relative position to the population, provides information on patient’s “whereabouts”. To accommodate a comparison on the feature level, we have included histograms for each feature and patient’s state value atop.
4. Report Print
We have designed a report printing in html format. The interactive nature of the report, even outside the app, makes it much easier to communicate with other users.
II. Artificial RT Environment (ARTE)
Radiation damages both cancer cells and normal tissue cells. To quantify the relationship between the applied radiation and the treatment response, we consider the radiation absorbed by tumor and surrounding normal tissue, and the probabilities of tumor control (TCP) and normal tissue complication (NTCP).
The absorbed radiation is spatially non-uniform, so it is generally converted to a homogenous dose value by weighted-averaging of the treatment sites from treatment planning. Generalized equivalent uniform dose (gEUD) is one such metric22. It is expected that for a fixed radiation site, gEUD must increase with increasing applied radiation as shown in Figure 4. We have assumed a linear-quadratic-linear (LQL)23 type monotonic proportionality relationship (S1) as presented in the SM, which further results in the following two relationships for KBR-ART, i.e.,
And, where, g stands for gEUD, N for nth daily dose fractions, d for dose fractions, DT for threshold doses, and α/β ratio is a tissue-specific parameter. The subscript 0, eval, and adapt of N and g corresponds to pre-, mid-, and after-treatment, respectively while deval and dadapt corresponds to applied daily dose fractionations during the evaluation phase and adaptive phase, respectively. Dividing the relationships (1) and (2) yields four equations for gadapt as listed in SM Table S1.
With the assumption that the increment of radiation increases both TCP and NTCP, we have applied a sigmoid shape generalized logistic function to represent the outcome probability as follows, where the patient-specific parameters μ and T are functions of their multi-omics state. By applying patient’s pre and mid treatment multi-omics information, the above dose-response relationship captures inter-patient heterogeneity.
Applying the equations from Table S1 and Eq. (3), ARTE is built as a Markov Decision Process (MDP) as shown in Figure 5. ARTE takes in patient’s state (s, c) and daily dose fractionation (d) as the input and returns patient’s next state (s’) and outcome (tcp, ntcp) as the output. The state dynamics is modeled by the TF and the associated outcome by the RTOE.
A. Patient States
Patient State, S ⊂ ℝk, represents a patient’s information at a given time. It consists of patient’s features such as dosimetric, clinical, radiomics, genomics, and imaging information as listed in Tables S3 and S10. MDP assumes that patient’s state at time t+1 only depends on patient’s state and dose, D ⊂ ℝ, at time t. In KBR-ART, three time points are relevant as shown in Figure 4: (i) pretreatment (t=0), (ii) mid treatment (t = eval), and (iii) post treatment (t = adapt).
Previously, deep regression models5,17 were applied to learn state transitions for all the patient features. In this work, however, to reduce modeling error, we consider the non-dosimetric variables at any time points as predictors for the treatment outcome and directly use the clinically measured values. We only estimate the state transition for dosimetric variables since the treatment outcome largely depends on the radiation. Hence, we have divided the patients states into time-varying dosimetric variables, S ⊂ ℝm, and a set of other fixed multi-omics covariates, C ⊂ ℝk−m. A complete patient’s state is given by (s, c) ∈ S × C.
B. Transition Function (TF)
TF: S × D → S, predicts the next state, st+ 1 for patients in state, st, under given dose, dt+ 1. We have used two TFs for the dosimetric variables, tumor gEUD and normal tissue gEUD. We empirically found that deep model-based TF for gEUDs does not always maintain the causal monotonic relationship. Thus, we applied the well-known LQL based relationship from Eq. (S1) to guarantee an increasing monotone relationship between the dose applied and dose absorbed as presented in Eq. (1) and Eq. (2).
C. RT Outcome Estimator (RTOE)
RTOEs, TCP: S × C → [0,1] and NTCP: S × C → [0,1], estimate tcp and ntcp for the patient’s state, (st+ 1, c). In this work, we have applied GNN as the RTOE as shown in Figure 6. Each patient is assigned with a graph of features and then a binary classification is learned on the graph level. We first applied a single GNN for RTOE. While the performance is improved drastically compared to a fully connected classifier, the single GNN was found to not respect the monotonicity between the dose value and the outcome probability. To meet the monotonic relationship, we applied a double GNN architecture with a generalized logistic function named as generalized logistic function guided double GNN (GloGD-GNN).
D. Reward Function
Reward function, R: [0,1] × [0,1] → ℝ, assigns a value to the (tcp, ntcp) pair. The reward function is selected such that its optimization results in maximal tcp and minimal ntcp. We have adopted, R = tcp(1 − ntcp), reward function for ARCliDS. As seen from Figure 7, it is smallest at the negative outcomes, {(tcp, ntcp)} = {(0, 0), (0, 1), (1, 1)}, and largest at the positive outcome, (tcp, ntcp) = (1, 0).
Additionally, a goal is defined. By default, the goal can be defined as tcp > 50% and ntcp < 50%, which rounds to positive outcome. Furthermore, goal based on population endpoints can be added. For NSCLC, a goal of tcp > 70% and ntcp < 17.2 %, 17 and for HCC, tcp > 90% and ntcp < 25 % is added.
Combining the reward and goal, the reward scheme for NSCLC and HCC is defined as following,
Note that goals might be unattainable for some patients.
III. Optimal Decision Maker (ODM)
We utilized a deep reinforcement learning algorithm for training the ODM. ODM is composed of a Q (quality) function, DQN: S × C → ℝd, and a selector. Given a patient’s state st, deep Q-net generates a set of q-values, Q ⊂ ℝd, for a range of dose. Q-Net maps k-dimensional state space to d-dimensional action (dose-decision) space. During operation mode, it simply follows greedy policy and selects the dose having the maximum q-value. For training, we have adopted a model-based RL paradigm that is divided into two phases: Planning and Learning, as shown in Figure 8. During Planning, an exhaustive search is carried out where all patient’s states (st, c) and the range of adaptive dose dt are fed into the ARTE and the resulting states st+1, and rewards rt+1 are saved into the Memory. During Learning, the DQN is trained via double DQN algorithms24 using the Memory as a one-step optimization problem.
IV. Uncertainty Estimate via Statistical Ensemble
Model uncertainty is estimated using Statistical Ensembling. The statistical ensemble technique trains several identical models and finds averages and deviations of the prediction. This method estimates uncertainty purely based on the trained model. Additionally, this also helps with desensitizing ARCliDS to the noise associated with the stochastic optimization algorithm used by NNs. NNs utilize a large number of randomly initiated weights and as a result, learned weights are different from model to model25. ARCliDS presents the average prediction μ as an expected value, and the covariance cov as an uncertainty estimation as shown in Figure 9. For ODM, standard deviation σ is used as the uncertainty estimate.
V. Analysis
ARCliDS was trained and validated on two different use cases from two different types of RT treatments. The first use case is an adaptive RT clinical trial of NSCLC patients, and the second use case is an adaptive SBRT clinical trial. After feature selections, we first built five ARTEs for each disease using the dataset. We then generated 10,000 synthetic patients using a generative adversarial network (GAN)26. Five ODMs were trained using the five ARTEs and 4,000 randomly chosen patients from the pool of the 10,000 synthetic patients. The trained ODM models were then validated on the original dataset.
Since there is no ground truth of what an optimal radiation dosage for a certain outcome would be, our evaluations are based on two metrics. The first metric is root mean square difference (RMSD) value between the ODM recommendation and the retrospective clinical decision used in treatment planning. However, since RMSD is a symmetric metric, i.e., it cannot differentiate a higher dose from a lower dose recommendation compared to the clinical decision, we separated the positive and negative clinical outcomes for additional insight. For the positive clinical outcome, a lower RMSD indicates agreement with the good clinical decisions. For the bad clinical outcome, additional comparison is needed. For this purpose, we have adopted a second metric for self-evaluation as presented in Table 1. The self-evaluation scheme is also based on the assumption that increasing radiation results in a higher value for both TCP and NTCP. Using this assumption, we can further evaluate the recommendations for patients with negative clinical outcome.
Additionally, we present a comparison for two different ARCliDS models. The first is built with Single GNN as RTOE and fully connected double deep Q-network as ODM (Single GNN RTOE+ DDQN ODM) and the second with GloGD GNN as RTOE and fully connected double deep Q-network as ODM (GLoGD GNN RTOE + DDQN ODM).
Results
The main results are summarized and presented in Figures 10 and 11 and in SM sections S7.5 and S8.5. For the NSCLC patients, the overall RMSDs between the two ARCliDS models’ average recommendation and reported clinical decisions, ordered as GLoGD GNN RTOE + DDQN ODM and Single GNN ROTE +DDQN ODM, were 0.61 ± 0.03 Gray/fraction [Gy/frac] (mean±sem) vs 0.97 ± 0.12 Gy/frac, respectively. The overall Self-Evaluation results were Good: 55% vs 39%, Bad: 13% vs 21%, and Not Sure: 13% vs 40%, respectively. The RMSDs for patients with positive clinical outcomes were 0.66 ± 0.02 Gy/frac vs 0.96 ± 0.11 Gy/frac respectively, and the Self-Evaluation results were Good: 36% vs 18%, and Not Sure: 82% vs 64%, respectively. The RMSDs for patients with negative clinical outcomes were 0.55 ± 0.05 Gy/frac vs 0.97 ± 0.12 Gy/frac, respectively, and the Self-Evaluation results were Good: 74% vs 59%, and Bad: 26% vs 41%, respectively.
For the HCC patients, the overall RMSDs were 2.96 ± 0.42 Gy/frac vs 4.75 ± 0.16 Gy/Frac, respectively. The overall Self-Evaluation results were Good: 46% vs 23%, Bad: 11% vs 14%, and Not Sure: 42% vs 62%, respectively. The RMSD for patients with positive clinical outcomes were 2.79 ± 0.50 Gy/frac vs 4.25 ± 0.26 Gy/frac, respectively, and the Self-Evaluation results were Good: 50% vs 26%, and Not Sure: 50% vs 74%, respectively. The RMSD for patients with negative clinical outcomes were 4.02 ± 0.23 Gy/frac vs 6.78 ± 0.35 Gy/frac, respectively, and the Self-Evaluation results were Good: 30% vs 10%, and Bad: 70% vs 90%, respectively.
Discussion
To our knowledge, there are software for ART27,28 but ARCliDS is the first interactive software dedicated to KBR-ART that will be available through a web portal. In this work, we have shown its applicability to adaptive RT and SBRT. However, ARCliD’s underlying technology can be generalized to any other DTR to optimize sequential decision-making with multi-omics data for deciding the order of treatments, including multi-modality treatment, given that an artificial treatment environment can be sufficiently modeled.
We have implemented tools such as GAN and GNN and invented novel techniques such as GloGD-GNN to overcome data-related issues for developing ARCliDS. We applied GAN to learn the underlying patient’s feature distribution and generated 10,000 synthetic patients for training the ODM. We adopted GNN for modeling RTOE as exploiting the inter-relationship between the features can improve model prediction. Mathematically, the inter-relationship can be represented by a directed graph G(V, E) where the nodes V represent patient features and edges E represent the relationships. Analyzing the inter-feature relationships before feeding it to the NN reduces the number of connections and hence simplify the learning process. As a novel approach, we applied GNN on the feature space as opposed to the sample space. As shown in the SM, every patient is represented by a directed graph of features, set by the treatment and disease type. RTOE is then designed as a graph classification problem where the node value differs from patient to patient.
As seen from Figures 10 and 11, the models in descending order according to the RMSD and Self-Evaluation measures, for both NSCLC and HCC, are GLoGD GNN RTOE + DDQN ODM, and Single GNN RTOE +DDQN ODM. As expected, correction of RTOE with GLoGD architecture helps to maintain the monotonic relationship between the outcome probability and daily dose fractionation. An example is provided in Figures S7, S8, S20, and S21 and a detailed AUROCC analysis is presented in the SM.
Our framework has some limitations. Clinically, RT dose adaptation can be performed in different ways: (1) change dose per fractions, and (2) change the number of fractions. For SBRT, the former is suitable, however for some diseases and modalities the latter may be more appropriate. For instance, when RT is combined with chemotherapy, increasing the number of fractions is preferred. Our framework only covers the former. ARCliDS uses several biomarkers such as cytokines as predictors. Due to the lack of standardization, biomarker levels of the same blood sample measured in two labs can be quite different also known as batch effect. So, biomarker levels of external dataset must be carefully examined before applying ARCliDS. For dosimetric predictor, we have used gEUD, however, for lung and liver, mean dose could also be applied. Another limitation is the number of NTCP’s considered in ARCliDS. In practice there may be more than 1 normal tissues of interest. For NSCLC, heart and lungs are the dose-limiting organs at risk (OAR). For HCC, although liver is usually the main OAR, in some patient, who has tumors near the intestine, intestine is also considered during designing the treatment plan. Finally, beside data-related shortcomings, ARCliDS prediction and recommendation uncertainty, which is based on statistical ensembles, can be improved by training more models; however, this will require more computational power and time.
Although we have the largest dataset of its kind, a larger sample size and balanced dataset will improve ARCliDS performance. We dedicate subsequent paragraphs to discuss data-related limitations, methods we implemented to overcome those limitations, and other possible solutions.
The learning of an environment model is the bottle neck of ARCliDS. For learning a good ARTE, a sufficient sample size and a balanced dataset are necessary. In the adaptive HCC patient’s cohort, only 1 patient did not achieve local control. As a result, RTOE for TCP had an unusually high AUROCC uncertainty. Although we applied class-imbalance correction techniques such as SMOTE and weighted loss function, we witnessed that such techniques fall short in correcting a highly imbalanced dataset. In addition, the toxicity count was also low --there were only 7 patients that showed toxicity. While this is a clinically desirable result, it hinders the learning process and hampers model generalizability. To make the matter worse, patients with highest liver gEUD didn’t show toxicity as shown in Figure S19. This reflects inter-patient heterogeneity, where some patients have poor pre-treatment liver function, who at a higher risk of toxicity for lower dose. Nevertheless, we performed a hyperparameter search for maximizing the generalizability of ARTE.
High noise-to-signal ratio due to inter-patient heterogeneity becomes even more pronounced with a small sample size. The medical field is especially doomed with a small sample size primarily due to the privacy issue29. Such issues make it difficult to learn correct trends in purely data-driven learning. We found that Single-GNN RTOE predicted unphysical trends between daily dose fractionation and TCP/NTCP. For correction, we applied a GLoGD-GNN architecture to infuse prior knowledge into the data-driven technique. We found that it corrects the trend and can also increase the model predictability. Alternatively, distributive learning features such as federated learning can be added to ARCliDS to overcome the small sample size issue. In federated learning approach only the model parameters are shared and data stays within the firewall of individual institutions30.
Sample size issue in training ODM can also be overcome by using synthetic patients. Since ODM of ARCliDS learns via model-based reinforcement learning, computationally the task of ODM is to learn ARTE. This can be considered as an interpolation problem in a continuous feature space. This problem can be tackled using brute-force by exhaustively selecting patient’s state. However, this assumes a uniform distribution which is generally not true. Therefore, we applied generative adversarial network (GAN) to learn the underlying patient’s feature distribution, as shown in SM and generated synthetic patient states for training the ODM. In principle, a conditional GAN31 can be applied to generate patients states distribution along with the outcome, however, a low sample size coupled with severe class-imbalance makes it impossible to correctly learn the underlying conditional probability distribution.
We found that the RMSD values for adaptive SBRT in HCC was higher than adaptive RT in NSCLC. There are three reasons for the higher RMSD value: (1) A larger range of adaptive dose values was explored for SBRT, i.e., 1 to 15 Gy/frac compared to 1.5 to 4 Gy/frac; given that the sample sizes are comparable, the datapoints for HCC are much sparser resulting in higher interpolation error; (2) most of the patients with a clinically negative outcome for SBRT received a lower adaptive dose than the positive case; this can confuse the RTOE, which assumes higher doses results in higher TCP and NTCP; (3) due to class-imbalance, the corrected GLoGD-GNN RTOE yielded a flatter monotonic relation than expected that did not spanned the whole probability space; we observed that the AI agent failed to satisfy the population-based goal of TCP > 90% and NTCP < 25%. So, we could set the computation goal of TCP >50% and NTCP < 50%. A smaller RMSD value can be achieved with a large sample size and well-balanced dataset.
In conclusion, we have built a user-friendly software for AI-assisted clinical decision-making and demonstrated its performance in adaptive RT. The underlying technology behind the software is generalizable to other sequential decision-making tasks in oncology. We employed GNNs to exploit the inter-feature relationship. We trained and validated our software in two different treatment types for two different diseases. We repeated the training and validation for 2 different models to test our hypothesis of improved model performance. The results confirmed our hypothesis. Statistical Ensemble was adopted to assess the model uncertainty.
Data Availability
The training data analyzed in this study are obtained from the University of Michigan. Restrictions apply to the availability of these data, which were used under the data sharing protocol for this study. ARCliDS is publicly available at https://arclids.shinyapps.io/ARCliDS for a limited time.
Contributors
I.E.N. conceived the study. I.E.N., R.K.T.H., M.M.M., J.J., and I.D.D, supervised the project. T.S.L., S.J., M.M.M., R.K.T.H., and K.C supervised and took part in the original clinical trials. D.N. designed and developed the software, wrote the programming codes, and carried out model training and validation. K.C. collected the data, provided clinical insight, and evaluated the software. Y.L. conducted feature selection. D.N., W.S., and J.J. contributed to the algorithmic development of the underlying methodologies. All authors carefully analyzed the methods and results. D.N. drafted the manuscript and all authors critically read and contributed to the final version.
Data Sharing
The training data analyzed in this study are obtained from the University of Michigan. Restrictions apply to the availability of these data, which were used under the data sharing protocol for this study. ARCliDS is publicly available at https://arclids.shinyapps.io/ARCliDS for a limited time.
Declaration of Interest
IEN is on the scientific advisory of Endectra, LLC., act as deputy editor for the journal of Medical Physics and receives funding from NIH and DoD. IDD acknowledges NIH grants R01-MH126137 and T32-GM141746. TSL and RKTH acknowledge NIH grant P01 CA59827 for funding the clinical trials that produced the datasets. A provisional patent was filed, titled “Adaptive Radiotherapy Clinical Decision Support Tool and Related Methods”, Application Serial Number: 63/272,888, filed on 10/28/2021.
Supplementary Materials
S1. Feature Selection and Graph Building via Markov Blanket Approach
In this work, we adopted the feature graph from Yi et al.’s work on multi-objective Bayesian networks for radiotherapy outcome prediction1. The steps of building an appropriate multi-objective Bayesian network (MO-BN) for joint prediction of TCP and NTCP include large-scale feature selection and network structure learning. The first step intends to identify important features from a high-dimensional dataset by exploring extended Markov blankets (MBs) of TCP and NTCP. An MB is an inner family found by constraint-based algorithms such as incremental association Markov blanket (IAMB) and Hiton approaches. The MB contains all variables carrying information about TCP and NTCP that cannot be obtained from any other variables. For each member in the MB of TCP and NTCP, a next-of-kin MB for this member can also be derived, which is combinedly known as extended MB.
The second step is to combine the important features from the extended MBs and search for the best stable MO-BN for joint prediction. After accommodating radiobiologically plausible relationships based on reported literature, Tabu Search is employed to generate a stable MO-BN structure from 300 randomly generated bootstrap samples. Bayesian Dirichlet equivalent (BDe) that provides an inherent penalty for model complexity is used as a scoring function to obtain the stable MO-BN. However, an initial stable network may not be the best model for joint TCP and NTCP prediction due to unimportant or redundant features in it. So, a leaf node is found out by increasing the arc threshold in the MO-BN generation. After removing the leaf node, a stable MO-BN can be generated again from Tabu Search. The process continues until the resulting MO-BN reaches its maximal prediction performance based on cross-validation. The network structure found for NSCLC and HCC are presented in Figures S5 and S18.
S2. ARCliDS Training Sequence
Train Artificial Radiotherapy Environment (ARTE) via supervised learning using patient’s feature and dose plan as the input, and corresponding clinical outcomes as the label.
Model Transition Function (TF) to take in mid-treatment feature (seval) and predict after-treatment feature (sadapt) for the adaptation daily dose fractionation (dadapt) More details are presented in Section S4.
Train RT Outcome Estimators (RTOE) to estimate treatment outcome (tcp, ntcp) for the predicted sadapt and covariate c as a binary classifier as shown in Figure S1. More details are presented in Section S5.
Design Reward Function that yields reward r which increases with increasing tcpiand decreases with increasing ntcp. For instance:
Train Optimal Decision Maker (ODM) via deep reinforcement learning on a trained ARTE from step 1.
Inputs seval, c, and dadapt to the ARTE and store sadapt, tcp, ntcp and r for all GAN-generated synthetic patients More details on GAN are presented in Section S6.
Train ODM via Deep Q-learning as a one-step optimization process, i.e., one training episode for all patients representing the adaptive phase. More details are presented in Section S7.
S3. Transition Function for generalized equivalent uniform dose (gEUD)
GEUD absorbed by a patient must increase monotonically with increasing daily dose fractionation based on known principles of radiation biology3. To enforce this monotonic relation, we use a function of the form, where, i is the ith daily dose fractionation, g stands for gEUD, N stands for number of daily dose fractions, d stands for amount of dose fractions, DT stands for threshold dose related to the linear-quadratic-linear (LQL) model, and α/β ratio is a parameter that differentiates tissue type.
As shown in Figure S2, for our case, two relations arise from relation (S1), as follows:
And, where, g stands for gEUD, N for nth daily dose fractions, d for dose fractions, DT for threshold doses, and α/β ratio is a tissue-specific parameter. The subscript 0, eval, and adapt of N and gcorresponds to pre-, mid-, and after-treatment, respectively while deval and dadapt corresponds to applied daily dose fractionations during the evaluation phase and adaptive phase, respectively.
Furthermore, four scenarios can arise. The four different gEUD transition functions can be derived by taking the ratios of relations (S2) and (S3) as presented in Table S1.
S4. RT Outcome Estimator (RTOE)
S4.1 Generalized Logistic Function Guided Double GNN (GLoGD-GNN) for Monotonic TCP/NTCP
Likewise, both TCP and NTCP must increase monotonically with increasing radiation dose. Due to the patient heterogeneity related noise in the data, RTOE composed of a single GNN classifier could not adequately represent the monotonic relationship. Thus, we developed a guided dual GNN architecture named GLoGD-GNN, which comprises of two GNNs, μGNN and TGNN that is fed into a generalized logistic function along with the gadapt. GloGD-GNN can be summarized as follows, where, μGNN and TGNN are GNN’s with sigmoid output layer and takes in sadapt as input. GEUD gadapt is min-max normalized before feeding it to the logistic function. During training, the weights of the GNN’s are updated alternately; when one is training, the other is kept frozen.
S4.2 RTOE Hyper Parameter (HP) Tuning and Validation
We performed a grid search for HP tuning. We used Adam optimizer for the optimization and Area Under the Receiver Operating Characteristic Curve (AUROCC) as the performance metric. We applied 10-fold stratified shuffle 80-20 split on the imbalanced dataset, where the data sets were stratified according to the binary outcome so that each test sets contained the same ratios of the outcome class. In addition, we randomly oversampled the minority class of each training split. Both training and validation dataset were batched and randomly sampled during model training.
For the Single GNN architecture, we searched the following HP space:
Optimizer learning rate = [0.0001, 0.0005, 0.001, 0.005],
Number of nodes = [256, 128, 64], Training Epoch = [50, 100, 200, 300]
For the GloGD GNN architecture, the HP search space was as follows:
Optimizer learning rate for mu GNN = [0.0001, 0.0005, 0.001],
Optimizer learning rate for T GNN = [0.0001, 0.0005, 0.001],
Number of nodes = [256, 128, 64], Training Epoch = [100, 200, 300, 400]
After selecting the best HP that corresponded to the best average test AUROCC score, we reran validation using the best HP to test the reproducibility. The best HP and all of the original datasets were then used to train five RTOE models. The results from validation on the NSCLC and HCC datasets are summarized in Figure S3.
S5. Synthetic Patients via Generative Adversarial Network (GAN)
To extend the sample size of our dataset, we generated 10,000 synthetic patients via Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP)4,5. We compared the distribution of synthetic patient with the original patient population data using the Jensen Shannon Divergence (JSD) metric as shown in the subsequent sections. JSD value of 0 means complete overlap and 1 means complete separation. We did not perform any statistical hypothesis test on the learned distribution as it is not necessary for the training of ODM. In principle, a uniform distribution works just as fine, however, with increase in computational complexity. Nevertheless, we ascertained the similarity by visual inspection and JSD. A comparison between the original and generated datasets are presented in Figures S13 and S26.
We applied a four-layer deep neural network with 256 nodes for both the discriminator and generator. We designed the generator to take in a 64-dimension normally distributed random numbers. We applied ADAM optimizer for the training with the learning rate of 1E-4, β1 of 0.5, and β2 of 0.9 and trained the GAN for 500 epochs. To maintain the stability by keeping discriminator ahead, we trained the discriminator 5 times for generator’s every training epoch. As in the original work, we used a Gradient Penalty weight of 10 for our training.
S6. Double Deep Q-Learning for Optimal Decision Maker (ODM)
ODM was trained using Double deep Q-learning (DDQN) algorithm. DDQN uses two neural networks, namely policy Q-net and target Q-net. This approach helps in correcting the overestimation of the q-values. We used a 5 layer deep 256 node wide architecture. For training parameters, batch size was set to 128, gamma (discount factor) to 0.8, Polyak factor to 0.99 (used for updating the target net’s parameter) and Adam optimizer’s learning rate to 0.0005.
For robustness, a planning and learning scheme were applied. Since in clinical setting, the physicians only get to select one action, we treated the problem as a one-step optimization problem, i.e., all episodes were terminated after one step. So, we performed an exhaustive search (i.e., explored all actions), before training the q-network. Note that this results in a greedy algorithm.
In the planning phase, 4000 out of 10,000 synthetic patients were randomly selected and their next states for all possible actions were exhaustively found using ARTE and stored. Additionally, the reward values and binary information on where they met the goal outcome were also stored. Then the DDQN was trained for 300 epochs. Huber loss was used as the loss function for updating the policy Q-net and Polyak update us used for updating the target Q-net.
The update rule for double deep Q-net learning is as follows,
Here, Yt is the target value, γ is the discount factor, wt is the weights of policy net Qp, and is the weights of the target network Qt. The policy net is updated with the target value every step via stochastic gradient descent. However, the target net is kept almost fixed, only updating in small increment by copying parameters from target net via Polyak averaging, i.e. where α is the polyak factor.
For quantifying model uncertainty (or decision confidence), we trained 5 identical models.
Use Case 1: NSCLC
S7.1 Data Description
Dense multi-omics information on 117 patients were available. Out of that, 67 patients had complete information.
S7.3 RTOE Hyper Parameter Tuning
Tables S4-S7 lists the top 10 performing HP for the NSCLC dataset followed by receivers operating characteristics (ROC) for the best performing HP (marked in bold) shown in Figures S4-S12. To check for the reproducibility, ROC were generated by retraining the models with the best performing HP. The reported AUROC values are in the mean±stdev format, where the mean AUROCC value is the area under the mean true positive rate curve, while the standard deviation is calculated from the AUROCC of 10 individual model output.
S7.4 Synthetic Patient Generation via WGAN-GP
S7.5 ODM Decision Analysis
We trained the models on only the synthetic data and set aside the entire clinical dataset for testing. To further test the generalizability, each model was trained with 4000 out of 10,000 randomly chosen synthetic patients. After learning, the models were tested on the clinical data. We compared two model architectures as listed in Table S8 in terms of Root Mean Square Difference (RMSD) and Mean Absolute Difference (MAD) calculated with respect to the reported clinical decisions. For a level comparison, all architectures were trained under identical conditions.
S7.5.1 DDQN trained on Single GNN RTOE - NSCLC
S7.5.2 DDQN trained on GLoGD-GNN RTOE- NSCLC
Use Case 2: HCC
As shown in the Figure S17, decision making in the Adaptive Arm is a two-step process. First, the patient is given 3 daily dose fractions and evaluated if they are safe for more radiation. Second, for those that are safe, there is a question of what the optimal adaptive dose should be. The first decision can be decided by an RTOE, while the second decision by another RTOE and an ODM.
The Non-Adaptive RTOE is trained with input from non-adaptive patient’s pre and mid treatment information and label from their outcome. The Adaptive RTOE is trained on the adaptive patient’s dataset.
S8.1 Data Description
Information on 292 patients with 360 tumor sites were available. The greater number of tumor sites is due to multiple tumor or recurrence. Out of that, 81 patients with 104 tumor sites had dense multi-omics data available and 71 patients with 99 tumor sites had complete information.
For the training of Adaptive RTOE, we added the 3 non-adaptive patients to the list of adaptive patients to increase the LC=0 count. To train RTOE for LC, the data from patients with multiple tumor sites were considered as different datapoints. Due to small sample size of non-adaptive patients, we considered only the second decision-making problem for the adaptive patients.
S8.3 RTOE Hyper Parameter Tuning
Tables S12-S15 lists the top 10 performing HP for the HCC dataset followed by receivers operating curves (ROC) for the best performing HP (in bold) shown in Figures S22-S25. To check for the reproducibility, ROCs were generated by retraining the models with the best performing HP. The reported AUROCC values are in the mean±stdev format, where the mean AUROCC value is the area under the mean true positive rate curve, while the standard deviation is calculated from the AUROCC of 10 individual model output.
S8.4 Synthetic Patient Generation via WGAN-GP
S8.5 ODM Decision Analysis
S8.5.1 DDQN trained on Single GNN RTOE – HCC
S8.5.2 DDQN trained on GLoGD-GNN RTOE - HCC
S8.5.3 Experimentation with population-based reward goal for HCC
Due to data-related issues, as described in the Discussion Section S10, both TCP and NTCP response estimated by the GloGD-GNN ROTE were flatter than expected. Due to a flatter NTCP response, which did not span the whole probability space, the population-based reward goal severely limited the AI dose recommendation. In this section, we present plots for three different reward goals. As seen from the plots, for ntcp < 25%, the AI recommended lower dose values. The best recommendation corresponded to tcp > 90% and ntcp < 40%, however, the overall RMSD was lower than tcp > 50% and ntcp <50%.
Footnotes
Funding Statement This work was partly supported by National Institute of Health (NIH) grant R01-CA233487 and its supplement.