ABSTRACT
Background Late-stage cancer immunotherapy trials strive to demonstrate the clinical efficacy of novel immunotherapies, which is leading to exceptional responses and long-term survival in subsets of patients. To establish the clinical efficacy of an immunotherapy, it is critical to adjust the trial’s design to the expected immunotherapy-specific response patterns.
Methods In silico cancer immunotherapy trials are virtual clinical trials that simulate the kinetics and outcome of immunotherapy depending on the type and treatment schedule. We used an ordinary differential equation model to simulate (1) cellular interactions within the tumor microenvironment, (2) translates these into disease courses in patients, and (3) assemble populations of virtual patients to simulate in silico late-stage immunotherapy, chemotherapy, or combination trials. We predict trial outcomes and investigate how therapy-specific response patterns affect the probability of their success.
Results In silico cancer immunotherapy trials reveal that immunotherapy-derived survival kinetics – such as delayed curve separation and plateauing curve of the treatment arm – arise naturally due to biological interactions in the tumor microenvironment. In silico clinical trials are capable of translating these biological interactions into survival kinetics. Considering four aspects of clinical trial design – sample size calculations, endpoint and randomization rate selection, and interim analysis planning – we illustrate that failing to consider such distinctive response patterns can significantly reduce the power of novel immunotherapy trials.
Conclusion In silico trials have three significant implications for immuno-oncology. First, they provide an economical approach to verify the robustness of biological assumptions underlying an immunotherapy trial and help to scrutinize its design. Second, the biological basis of these trials facilitates and encourages communication between biomedical researchers, doctors, and trialists. Third, its application as an educational tool can illustrate design principles to scientists in training, contributing to improved designs and higher success rates of future immunotherapy trials.
INTRODUCTION
Immunotherapy is revolutionizing the treatment landscape for patients with advanced cancers. While the number of immuno-oncology drugs under investigation is rising rapidly – around 4700 agents are currently in the development pipeline - the need to further improve patient outcomes remains high1. Well-designed immunotherapy trials are crucial to establish advances in clinical outcomes robustly. Unfortunately, the odds for cancer treatments to successfully pass the development pipeline are unfavorable, and only a minority of the treatments (5-10%) will ultimately obtain market approval2, 3, 4. Even for cancer therapies that reach late-stage development, approval rates remain modest at around 27%5. The primary reason in most of these trials (i.e., 63.7%) is failure to demonstrate efficacy5, which can be partly attributed to suboptimal trial design choices based on overly optimistic assumptions of the treatment effect. Such assumptions may be used to erroneously justify low numbers of patients or inappropriate endpoints and lower the power of these trials5, 6.
Undoubtedly, design choices in immunotherapy trials are complex, and conventional design methods are not naturally well attuned to the unique characteristics of immunotherapies7. Their broad spectrum – ranging from immunomodulators to cell therapies, cancer vaccines, oncolytic viruses, and CD3-targeted bispecific antibodies – illustrates the variety in molecular mechanisms, leading to novel toxicity profiles, response patterns, and survival kinetics8, 9, 10. These observations render a ‘one-design-fits-all’ approach futile and stress the need for immunotherapy- or even combination-therapy-tailored designs.
Immunotherapies are known to induce a delayed clinical effect and long-term overall survival (OS) in only a subset of patients11. The survival curve reflects these phenomena by a delayed curve separation and a plateau of the treatment arm at later stages of the trial, respectively12. Many immunotherapies, thereby, violate a fundamental premise that underlies the design of many trials: the proportional hazard assumption (PHA) – essentially stating that the treatment effect should remain constant over time13. As a result, immunotherapy trials based on this principle can have an overestimated power12, 13 and require a longer follow-up to demonstrate efficacy than initially planned12, increasing the likelihood of a negative trial.
These issues led to the development of innovative methods such as novel radiological criteria to quantify tumor responses9, 14, 15, (surrogate) endpoints to capture unique survival kinetics10, 16, 17, 18, 19, biomarkers to enrich for potentially responding patients for treatment20, 21, 22, 23, and statistical methods to retain a trial’s power in the presence of non-conventional survival kinetics24, 25, 26. Despite the plethora of available methods, it still remains complicated to predict trial outcomes in advance and adjust the methodology accordingly. The stakes are high: accurate predictions could augment the likelihood of a positive trial, whereas a misjudgment could result in a negative trial, potentially compromising patient benefit, vast amounts of work, and (public) research funds.
In this study, we use late-stage in silico cancer immunotherapy trials to investigate how design decisions affect the trial outcome in the context of cancer immunotherapy, possibly combined with chemotherapy. The mechanism-based nature of these trials allows researchers to translate cellular processes in the tumor microenvironment and interventions thereon into immunotherapy-specific response patterns, survival kinetics, and outcomes of novel immunotherapy trials. An in silico immunotherapy trial is based on clear-cut biological assumptions and provides an intuitive means to predict risk profiles and treatment efficacy. Moreover, it equips researchers with a tool to verify trial designs and analysis strategies of upcoming trials before the trials’ execution. First, we show that our simulations replicate late-stage immunotherapy or combination trials realistically and capture their typical survival kinetics. Then, we demonstrate various applications of these trial simulations, including the ability to predict the trial outcomes and calculate sample sizes for specific treatment and control groups. Finally, we illustrate the consequences of (not) considering immunotherapy-specific response patterns in settings selected for educational purposes, such as selecting survival endpoints and randomization ratios of upcoming trials and planning interim analyses.
METHODS
Mechanism-based model of the tumor microenvironment
We extended our previously published ordinary differential equation (ODE) model that describes cancer development in the tumor microenvironment and the subsequently induced anti-tumor response in patients27. Briefly, the model describes cancer onset and progression, starting with the malignant degeneration of a single cell into a tumor cell. This tumor cell divides, leading to a proliferating mass (ρ : growth rate) of tumor cells (T; equation 1; Figure 1A). An anti-tumor immune response is induced within the tumor microenvironment, leading to the killing of tumor cells (at killing rate ξ; equation 1). The killing of tumor cells is implemented using a quasi-steady state approximation proposed by Borghans et al.28, 29. For details on the implementation, we refer to our previous work27. The immune cells within the tumor microenvironment originate from tumor-draining lymph nodes, where naive cytotoxic T cells (N) turn into activated T cells (S) at priming rate α. Since a larger tumor mass is generally more immunogenic, the priming rate is scaled by the tumor size (; equations 3 and 4). Primed T cells proliferate in the lymph nodes at rate ps and migrate into the tumor microenvironment at rate ms (equation 2). The natural life span of T cells in the tumor microenvironment is modeled by incorporating a natural death rate δ of T cells. The system of ODEs is presented below:
(A) Cellular interactions between a tumor and the immune system are captured in an ODE model. This model describes immunogenic tumor growth leading to a T cell response originating from lymph nodes. Disease courses in patients could be steered by immunotherapy, chemotherapy, or a combination of both. Parameters: α = naive T cell priming rate, δ = effector T cell death rate, ξ = effector T cell killing rate, ρ = tumor growth rate, ps = effector T cell proliferation rate, and ms = effector T cell migration rate. (B) After implementation, we used clinical trial-derived survival data to fit the model parameters (see Suppl. Figure 1). (C) Patients received either no treatment (placebo), chemotherapy, immunotherapy, or both. Disease trajectories based on tumor-immune dynamics were simulated for each patient, resulting in individual survival outcomes. (D) Subsequently, cohorts of patients were constructed based on the fitted parameters to simulate actual immunotherapy trials. (E) Applications of these trials include predicting survival outcomes of trials, estimating appropriate sample sizes, selecting endpoints and randomization ratios, and investigating the timing of interim analyses.
![Embedded Image](https://www.medrxiv.org/sites/default/files/highwire/medrxiv/early/2021/09/13/2021.09.09.21263319/embed/graphic-2.gif)
Since a constant time-independent tumor growth rate would unlikely be observed in a clinical setting, we have added two additional parameters to the model that influence the growth rate of tumors in a time-dependent manner, which are:
The tumor growth rate decline (Δρ): a parameter that describes to which extent the proliferation rate of tumor cells gradually declines over time.
The decay rate of the tumor growth rate decline (ρδ hereafter referred to as ‘decline decay rate’): a parameter that indicates at which pace the tumor growth rate decline decreases.
The model parameters and their values are listed in Table 1; their rationale is described previously27. Moreover, the values of the tumor growth rate decline and its decay rate were set to augment the interpatient variability in tumor development and allow for more extensive disease trajectories. At baseline, only one tumor cell and a pool of 106 naive T cells are presumed to be present, while activated T cells are absent, yielding the following initial conditions for the simulations: T(0) = 1, I(0) = 0, S(0) = 0, and N(0) = 106.
Overview of model parameters
Simulating untreated disease, chemotherapy, and immunotherapy in individual patients
Using this ODE model, we simulated cancer development and disease trajectories in patients. We varied the tumor properties (i.e., the tumor growth rate, the growth rate decline, and the decline decay rate) between patients extensively to guarantee interpatient variation in disease courses. Unless otherwise specified, the remaining model parameters are set to the same values for all patients (Table 1).
Each patient is simulated from cancer onset (i.e., malignant transformation of the first cell) for up to more than two years (i.e., 800 days). A simulated time step corresponds to one day. The diagnosis threshold of a tumor mass was set to 65 * 108 cells, corresponding to the size at which common malignancies are diagnosed. The lethal tumor burden is set to 1012 tumor cells (a tumor volume of approximately 10.6 dm3).
Disease trajectories of patients with cancer can be steered with therapy. Given their prominent roles in the oncological treatment landscape, we included immune checkpoint inhibitors (ICI) and chemotherapy in the model. Both treatments function through their primary modes of action. ICI are implemented as follows: once a cancer reaches a diagnosis threshold, immune checkpoint inhibitors increase the killing rate of cytotoxic T cells (multiplication factor: 0-7), enabling them to eradicate tumor cells. The duration and potency of the ICI treatment eventually determines patient outcome.
In patients treated with chemotherapy, the immune system is still present; however, it is not boosted (as is the case during ICI treatment), hence the T cells are not potent enough to curb the tumor growth. Once a patient is diagnosed with cancer, chemotherapy can reduce the tumor growth rate with its cytotoxic capacity (multiplication factor: 0-1). Again, the duration and potency determine patient outcome. By default, the treatment duration for ICI and chemotherapy are two years and six months, respectively.
Simulating untreated cohorts in an in silico trial: model fitting
To expand our modeling approach from a single patient into a trial cohort, we simulated multiple patients with individualized disease courses based on unique tumor properties. As an illustrative example, we took a publicly available dataset of patients with advanced lung cancer from the North Central Cancer Treatment Group (NCCTG) and regarded the survival times of these patients as if they were untreated30. We fitted our model to the NCCTG dataset to show that our trial model can reproduce authentic survival kinetics as observed in clinical trials. Specifically, we searched for parameter combinations reflecting realistic survival times for each patient in our model. Since a single-parameter fitting approach could not generate a sufficiently wide range of survival times, we used a multidimensional fitting approach. An overview of the fitting approach is depicted in Supplementary Figure 1.
Specifically, we fitted the tumor growth rate (ρ), the decline in tumor growth rate over time (Δρ), and the decline’s decay rate (ρδ) of patients to OS. The fitting approach comprised two processes: 1) sampling non-censored survival values and 2) translating these survival values to model parameters. We fitted a (parametric) Weibull distribution to the NCCTG lung cancer dataset (shape κ: 1.32, scale λ: 417.76; Supplementary figure 1A). The choice for a Weibull distribution is based on the Akaike information criterion and the fact that a Weibull model is a survival model from which the parameters (i.e., scale and shape) contain a mechanistic meaning and can, therefore, be interpreted. From the Weibull distribution, non-censored survival values were sampled (Supplementary figure 1B). To translate the survival times into model parameters, we generated a 3D grid containing survival times for ranges of parameters (ρ : 1.76 – 150, Δ: -0.6 -0, ρδ: -2 - 0; Supplementary figure 1C). From this grid, combinations of these three parameters that led to a similar OS (i.e., the isosurfaces) were extracted with a marching cubes algorithm (Supplementary figure 1D). The isosurfaces corresponding to the non-censored survival times from the Weibull distribution were selected, and parameter combinations were sampled from these isosurfaces randomly (Supplementary figure 1E). To illustrate that these parameter combinations led to realistic survival kinetics as seen in clinical trials, we compared the original NCCTG lung cancer dataset with our simulation result (Supplementary figure 1F; Figure 1).
Simulating late-stage immunotherapy trials
Late-stage (i.e., phase III) clinical trials traditionally contain two arms: a control arm and a treatment arm. The control arm can be a placebo (i.e., untreated) or a standard of care therapy. To construct phase III in silico immunotherapy trials, we extended the simulations with treatment cohorts (mono-chemotherapy, mono-immunotherapy, chemoimmunotherapy, or induction chemotherapy followed by immunotherapy). These cohorts facilitate the comparison between various treatment regimens. A treatment cohort uses the same baseline parameter distribution as a control cohort. It differs in one critical aspect, though: once patients in the treatment arm reach a tumor burden that corresponds to the diagnosis threshold, patients can be treated with chemotherapy, ICI, or combination therapy, as described above. The distribution of survival parameters is, unless otherwise specified, derived from the most mature, digitized data from the CA184-024 trial, as shown below31. At inclusion into the trial, patients are randomly assigned to a study arm (randomization).
The primary endpoint of the trials is the 2-year OS. Given the absence of accrual times in in silico trials, the trial duration equals two years, which provides each patient in the trial with 24 months follow-up at the time of analysis. If the OS endpoint is not reached for a patient, the patient is considered censored for the endpoint and regarded as such in subsequent analyses.
Power simulations
To illustrate how the analysis method can affect the outcome of immunotherapy trials, we use several simulation approaches to calculate the power of trials. Power simulations were performed as follows: per data point, 1000 clinical trials are simulated. The survival data from each trial is analyzed with a log-rank test (dependent on the proportional hazard assumption) or proportions test (Pearson’s chi-squared test; independent on the proportional hazard assumption), and we count the number of positive trials (defined as p < 0.05). The percentage of positive trials indicates the power of the trial.
Data digitization & reconstruction
For some survival curves, the raw data was not available. Therefore, we extracted data points from the Kaplan-Meier curves with WebPlotDigitizer (https://apps.automeris.io/wpd/), and individual patient data was reconstructed with the IPDfromKM package in R.
Analyses
Analyses and visualizations were performed in R. The complete list of R packages used throughout this manuscript is provided in Supplementary Table 1. The code is accessible via https://github.com/jeroencreemers/in-silico-clinical-trials.
RESULTS
Generating trial populations based on tumor-immune dynamics
We used in silico cancer immunotherapy trials – a mechanism-based simulation platform of cancer-immune dynamics – to investigate the consequences of immunotherapy-specific response patterns on trial design principles27. Patients within these trials are simulated with an ODE model, which describes cancer development in a patient by modeling the interaction between tumor cells and the immune system27. In short, the model describes the following tumor-immune dynamics in the tumor microenvironment: immunogenic tumor growth leading to priming and clonal expansion of T cells in the lymph nodes, migration of effector T cells from lymph node to the tumor microenvironment, and formation of tumor-immune complexes to enable tumor cell killing (see Methods; Figure 1A). The model allows the treatment of patients with ICI and chemotherapy. ICI increase the killing rate of T cells and have a direct effect on the tumor-immune dynamics. Chemotherapy has a cytotoxic effect on the tumor, slowing down its growth rate. A detailed description of the model, including the rationale for parameter selection, has been published previously27.
The in silico clinical trials describe cancer outcomes on three levels: (1) a cellular level, (2) a patient level, and (3) a trial population level. Cellular interactions in the tumor microenvironment are translated into clinical trial outcomes as follows: firstly, the ODE model is implemented, and model parameters are selected using a fitting approach (Figure 1B; see Methods). Next, individualized disease trajectories – either treated or untreated – of cancer patients are generated (Figure 1C). Eventually, patients are randomized into two cohorts to resemble conventional phase III trials: a control group (either placebo or chemotherapy) and a treatment group (immunotherapy, chemoimmunotherapy, or induction chemotherapy followed by immunotherapy; Figure 1D). Since the cellular dynamics (e.g., tumor burden over time or the efficacy of T cell killing) and survival outcomes of these patients are known and can be modified, in silico clinical trials are suited to answer questions like: “Assuming that a novel treatment X increases T cell killing by 5%, how does this translate to a survival benefit in patients? Moreover, how many patients are needed to establish this benefit in a clinical trial? When should one analyze the results?” (Figure 1E).
In silico late-stage immunotherapy trials yield realistic survival outcomes
To illustrate that this in silico clinical trial approach can generate realistic survival kinetics as observed in late-stage immunotherapy trials, we fitted the simulation to three different datasets: (1) the NCCTG lung cancer survival dataset30; (2) the CA184-024 trial [ipilimumab + dacarbazine vs. dacarbazine in previously untreated metastatic melanoma31]; and (3) the CheckMate 066 trial [nivolumab + placebo vs. dacarbazine + placebo in treatment-naive metastatic melanoma patients without BRAF mutation32]. The choice for these trials is based on the size of the trials and the maturity of the data. The follow-up of the CA184-024 trial and the CheckMate 066 trial were five and three years, respectively. As the last two datasets were not publicly available, we extracted the data using image digitization (see Methods). As a reference for the in silico trials, we visualized the Kaplan-Meier estimators of these datasets (Figure 2A). Both trials were digitized correctly, as reflected by the nearly identical risk tables compared to the original manuscripts31, 32. Next, we fitted our trial simulation model on the NCCTG dataset and the control arms of the CA184-024 and CheckMate 066 trials (Figure 2B; black lines). Given the limited response rates of dacarbazine for metastatic melanoma (15%), the patients in the control arm were regarded as untreated, and the model was fitted as such. For simplicity, we did not simulate dropout or censoring in the trials shown in this paper, although it could be added to the simulation. On average, the simulations capture the survival kinetics of the trials accurately, which is reflected by similar median overall survival values and reasonably corresponding risk tables. The final step to fully resemble late-stage immunotherapy trials in a simulation setting is replicating the treatment arms of the CA184-024 and CheckMate 066 trials. These simulated patients were treated with ICI upon diagnosis. ICI increased their T cell killing rate seven-fold and prolonged their survival, leading to OS benefit in the in silico trial that matched the original trial. Hence, these in silico trials couple the disease mechanism and mechanistic treatment effect to a realistic clinical trial outcome. Interestingly, although not incorporated explicitly in our model, the trials show survival kinetics arising as a consequence of the interaction between tumor and immune cells typically seen in immunotherapy trials: a delayed curve separation and a plateau of the survival curve of the treatment arm at later stages of the trial (Figure 2B (last two columns)).
(A) Kaplan-Meier estimators of the NCCTG, CA184-024, and CheckMate 066 trials. While the NCCTG dataset is publicly available 30, the others are carefully reconstructed survival curves based on digitized data from their respective manuscripts 31, 32. (B) These trial simulations can mimic representative survival kinetics as observed in actual immunotherapy trials. Specifically, typical immunotherapy-related survival kinetics – such as a delayed curve separation and a plateauing survival curve in the treatment arm – arise from these simulations as emergent behavior. Note: We matched the risk table intervals to the original manuscripts for comparative purposes.
In silico immunotherapy trials enable a priori prediction of trial outcomes and uncover immunotherapy-specific response patterns
The design and the success rate of any clinical trial depends, among others, on an accurate a priori prediction of the survival kinetics – i.e., the shape of the survival curves and the trial outcome. For late-stage immunotherapy trials, commonly observed immunotherapy-induced response patterns are a delayed curve separation and a plateauing tail of the survival curve of the treatment arm (Figure 2). These characteristic survival curve shapes reveal a violation of a vital premise at the basis of many clinical trials: the proportional hazard assumption (PHA). The PHA states that the ‘instantaneous death rate’ of a patient (i.e., the hazard rate) in both arms of the trial should be proportional, resulting in a constant hazard ratio. Many traditional design methods, ranging from sample size calculations to outcome analyses, depend on this theory. For late-stage immunotherapy trials, this induces two problems: (1) while a violation of the PHA needs to be addressed during trial planning, the hazard rates – and thereby the fact if the trial adheres to or violates the PHA – become available after analysis of the trial, and (2) if a trial does not adhere to a PHA, what will be the shape of the survival kinetics? Especially in an era where treatment and control arm regimens are becoming increasingly complex, adjusting the design and analysis methods to unknown survival kinetics is challenging.
In silico clinical trials can provide principled estimates of the shape of the survival curve, including the underlying hazard rates and hazard ratios before trial execution. The most traditional scenario would be a trial in which patients are randomized 1:1 to mono-chemotherapy or placebo. Given the direct chemotherapy effect, the PHA is generally assumed to hold for these trials. An in silico trial in which chemotherapy reduces the tumor growth rate to 70% for the duration of the trial indeed replicates these assumptions (Supplementary Figure 2): the survival curves separate from the start of the trial, and the hazard ratio is constant over time. However, what happens if the chemotherapy effect does not last for the entire trial but for – maybe more realistically – 6 months? Initial proportional separation of the survival curves is followed by a nearly parallel decay of both curves, leading to an early survival benefit for the chemotherapy arm (Figure 3A). Consequently, for any therapy with a non-constant treatment effect – even for chemotherapy trials – deviations from the PHA might be observed. When we switch to immunotherapy in the treatment arm, a violated PHA becomes immediately apparent. Recall that in our model, immunotherapy exerts its mechanistic effect indirectly on the tumor via an increase in the killing rate of T cells. Through approximately the first six months, the hazard rates remain constant over time, but after that, they start to decline in the immunotherapy group (red line), yielding a non-constant hazard ratio over time (Figure 3B).
(A-E) Examples of 1:1 randomized trials with various (treatment) regimens (n=600 per arm). (A) A traditional chemotherapy trial (vs. placebo) only shows a proportional hazard ratio when the biological treatment effect targets the tumor directly and remains constant over time (compare to Supplementary Figure 2). (B) An in silico immunotherapy trial elicits typical immunotherapy-induced survival kinetics (i.e., delayed curve separation) and violates the proportional hazard assumption. (C-E) More intricate treatment or control regimens – (C) chemoimmunotherapy vs. chemotherapy, (D) immunotherapy + chemotherapy-placebo vs. chemotherapy + immunotherapy-placebo, or (E) induction chemotherapy followed by immunotherapy vs. immunotherapy – induce more complex survival kinetics, including (D) crossing survival curves or (E) only a temporary separation of the survival curves. The horizontal bars in column 1 indicate the duration of the treatment effect (T = treatment, C = control). The red dot in column three indicates the overall hazard ratio. For an accurate prediction of the hazard rates and hazard ratios, saturated survival curves (n=100.000 patients per arm) were used, and the data was smoothed before plotting.
The flexibility of in silico trials lies in their ability to incorporate complex treatment regimens. For example, let us assume one would be interested in estimating the survival curves and underlying hazard ratio over time of a chemoimmunotherapy vs. chemotherapy trial (Figure 3C), an immunotherapy + chemotherapy-placebo vs. chemotherapy vs. immunotherapy-placebo trial (Figure 3D), or a trial with induction chemotherapy followed by immunotherapy vs. immunotherapy (Figure 3E). Mechanism-based immunotherapy trials provide the means to translate biological assumptions regarding the disease and treatment effects into survival kinetics (including its hazard rate/ratio estimates). These survival kinetics, such as crossing survival curves (Figure 3D) or a temporary curve separation (Figure 3E), may be hard to predict otherwise and can be detrimental for the trial outcome if not dealt with appropriately.
Ignoring immunotherapy-specific response patterns might cause an overestimation of an immunotherapy trial’s power
To investigate the consequences of violating the PHA on the power of a clinical trial, we compared the power calculated using a PHA-dependent method (the Log Rank test) with a non-PHA-dependent method (Pearson’s Chi-squared test) for different clinical scenarios. An essential difference between both methods is that the Log Rank test considers the entire survival curve, while Pearson’s Chi-squared test only compares the number of events in both arms at 24 months. Logically, in a scenario that approximates the PHA the closest (such as a chemotherapy vs. placebo trial), a PHA-dependent method is superior (Figure 4A). However, a traditional immunotherapy trial violates the PHA, leading to a vast underestimation of the power of the trial when PHA-dependent methods are used for its planning (Figure 3B/4B). Given similar survival characteristics, this also holds for chemoimmunotherapy vs. chemotherapy trials (Figure 3C/4C). When survival curves cross during the trial, as seen in the immunotherapy vs. chemotherapy trial (Figure 3D), the PHA does not hold, and there is only a small survival benefit at 24 months for one of the arms of the trial. In this case, both methods predict the need for a large sample size to reach adequate power (Figure 4D). This insight indicates that it may be more rational to lengthen the trial’s follow-up (i.e., a different endpoint) instead of increasing the number of participants; a finding that – on top of the absolute power – requires insight in the shape of the survival kinetics. As a general finding, ignoring immune-specific response patterns might induce an overestimation of a trial’s current power, while in reality, a larger sample size (or a different endpoint) is necessary to establish the desired outcome.
A power comparison using a PHA-dependent method (Log-rank test) or a PHA-independent method (Pearson’s Chi-squared test) was made in different clinical scenarios: (A) a chemotherapy vs. placebo trial, (B) an immunotherapy vs. placebo trial, (C) a chemoimmunotherapy vs. chemotherapy trial, and (D) an immunotherapy vs. chemotherapy trial. Neglecting immunotherapy-specific response patterns while setting up a novel trial leads to a significant underestimation of the number of patients needed to show the efficacy of a treatment, thereby overestimating the power of the trial and reducing the probability of success.
In silico trials can validate endpoints and randomization ratios before trial execution
Clearly, the success rate of novel immunotherapy trials depends on more than its sample size alone. To establish an OS benefit of the treatment arm, it is crucial to analyze the trial once the data have reached a certain maturity – i.e., the treatment needs to be granted sufficient time to induce a survival benefit. We assumed that a delayed curve separation in immunotherapy trials would prolong the follow-up needed to establish an OS benefit of immunotherapy and thereby defer reaching maturity of the trial data. If the therapy is effective, data maturity can be regarded as the time point when a treatment effect can be observed. Hence, an optimal trial endpoint would be the earliest time at which this treatment effect can be detected with sufficient power. Therefore, we analyzed the power of differently sized trials with respect to their OS endpoint. Herein, we distinguished trials that were subject or were not subject to a delayed curve separation (immunotherapy and chemotherapy, respectively). In a classic chemotherapy trial, the treatment effect translates directly to a survival benefit in the treatment arm – the survival curves separate from the start. Therefore, the highest power will be obtained after the total duration of the treatment effect (Figure 5A, panel 1). In this case, the treatment effect lasts for six months, leading to the 6-months OS as the endpoint with the highest power. The delayed curve separation in immunotherapy trials renders it futile to analyze OS data early on in the trial (Figure 5B, panel 1). A practical ramification is that in the presence of a delayed curve separation, the trial requires a sufficiently long follow-up and an adequate size to gain power and detect immunotherapy-specific treatment effects. Mechanism- and simulation-based power calculations with in silico trials can consider these specific kinetics when determining the sample size for upcoming trials.
(A-B) In silico trials can be used to find the optimal endpoint (panel 1) or randomization ratio (panel 2) of novel trials. (A) Since the survival curves in classical chemotherapy trials separate from the trial onset, the highest power – and most optimal endpoint – is obtained at the end of the treatment interval (i.e., after six months in this example; see Figure 3A). Although less influential, a similar observation can be made for randomization ratios (study size panel 2: 300 patients). (B) Delayed curve separation in immunotherapy trials emphasizes that a premature final analysis of the primary OS endpoint is detrimental to the trial outcome. These trials permit validating the pre-specified survival outcomes of novel trials a priori. Commonly selected randomization ratios do not seem to be heavily influenced by immunotherapy-specific response patterns (study size panel 2: 1200 patient). Trial characteristics are similar to Figure 3A/B.
Given the observation that both the size of an immunotherapy trial and its endpoint heavily influence the probability of finding the survival benefit of interest, we presumed that increasing the size of the treatment arm – i.e., an unequal randomization scheme – would similarly affect the power. Instead of varying the study size, we now varied the randomization ratio (second panel of Figure 5A/B). Interestingly, while the power logically depended on the OS endpoint, the randomization ratio did not greatly affect the power (Figure 5B). Considering that an unequal treatment allocation may provide ethical benefits, we confirm that the randomization ratio in immunotherapy trials is of secondary importance compared to its size or primary OS endpoint. In summary, our in silico immunotherapy trials replicate existing insights from trial design on how violation of the PHA affects power and analysis choices. Our ability to directly translate biological assumptions on treatment mechanisms into survival kinetics allows the researcher to reason in a principled manner about whether such violations of the PHA would or would not be expected in their specific trial design and how the problem could be addressed if it arises.
Validating interim analyses to balance patient benefit and trial resources
We have observed a clear tradeoff between the power of an immunotherapy trial on the one hand, and the primary OS endpoint, and correspondingly the data maturity, on the other. Luckily, the two are not entirely mutually exclusive: interim analyses have been developed for ethical purposes to establish positive or harmful treatment effects early. However, there is a catch: the necessity to control for multiple testing at each interim analysis lowers the significance threshold on the final analysis to maintain the same overall type I error rate. This begs the question: “How many interim analyses should you plan, and when should you plan them?” Again, principled answers to such questions can be obtained with the help of in silico immunotherapy trials. To illustrate this, we simulated 1000 immunotherapy trials with 1200 patients per trial, randomized 1:1 over immunotherapy with a strong treatment effect or a placebo (Figure 6A). In the absence of interim analyses, the vast majority of the trials are predicted to end up positive. Adding interim analyses (O’Brien-Flemming approach) to the equation induces a tradeoff. On the one hand, increasing the number of equally-spaced interim analyses increases the probability of early detecting a positive treatment effect (e.g., approximately 60% of the trials are positive after 18 months in the case of three interim analyses; Figure 6A). On the other hand, the overall probability of ending up with a negative trial due to more stringent analyses (i.e., less power) also increases, especially in the case of immunotherapies with a weaker treatment effect (±57% without an interim analysis vs. ±63% with three interim analyses; Figure 6B). In an actual trial, the latter needs to be corrected by including additional patients to maintain the pre-planned power. Furthermore, we observe that the timing of the interim analysis is crucial. Whereas an interim analysis at 18 months provides additional value to the trial, interim analyses before 16 months are predicted to be wasteful: both due to the presence of non-proportional hazards and less mature data. As a control, we simulated trials without any treatment effect. By design, approximately 95% of the trials should end up negative irrespective of the number of interim analyses, which seemed to be the case (Figure 6C). Logically, the weaker the treatment effect, the higher the probability of erroneously finding a harmful treatment effect – a characteristic that the simulation also exhibits (Figure 6B/C).
(A) In the case of immunotherapy with a potent effect, in silico trials help develop a rationale for the timing of the interim analyses. While an interim analysis at 12 months might not add value to the trial, analyses after 16 and 18 months, respectively, have a probability of approximately 28% and 60% to lead to early stopping with a positive result. (B) Multiple interim analyses can reduce the probability of confirming the desired treatment effect in case of a weak immunotherapy effect. (C) In the absence of any treatment effect (a control scenario), the number of interim analyses does not heavily influence the trial outcome. Each trial simulation contains 1200 patients (randomization ratio 1:1) to ensure adequate power of the trial. Trials are analyzed with a proportions test (Pearson’s chi-squared test). Treatment effect (amplification factor of T cell killing rate): strong = 7, weak = 3, no effect = 0 (see Methods).
DISCUSSION
In this study, we used mechanism-based in silico cancer immunotherapy trials to predict survival kinetics and response profiles of novel immunotherapy trials. Complementary to conventional design methods, in silico trials provide the ability to investigate the implications of a researcher’s biological (as opposed to statistical) hypotheses of a drug’s mechanism of action for the design, conduct, analysis, and outcome of clinical trials. When comparing the simulated outcomes to actual immunotherapy trial outcomes, we showed that in silico trials are suited to translate complex biological mechanisms (such as observed during the treatment of patients with ICI) into realistic trial outcomes. Crucially, the survival kinetics that arose from these mechanism-based simulations reflected two pivotal components often found in immunotherapy trials: a delayed curve separation and a plateauing tail of the survival curve at later stages of the trial. In line with genuine immunotherapy trials, we find that these immunotherapy-specific response patterns differ considerably from chemotherapy-based kinetics. Our findings confirm that diversity in survival kinetics profoundly impacts the outcomes of immunotherapy trials33. Consequently, these features need to be considered when deciding on the sample size, endpoint, randomization ratio, and the number and timing of interim analyses of a novel immunotherapy trial.
Over the past two decades, in silico clinical trials are gaining in popularity. These trials enable investigating, among others, how novel drugs, treatment schedules, dosing regimens, and interpatient heterogeneity affect the outcome of a clinical trial34. In silico clinical studies have a wide range of applicability from pediatric infectious35 and orphan diseases36 to diabetes37, inflammatory autoimmune diseases38, traumatic injury39, psychiatric illness40, and cancer. In oncology, several in silico clinical trials involving chemotherapy and tyrosine kinase inhibitors have been performed41, 42. Moreover, with the onset of checkpoint inhibitors, in silico immunotherapy trials have gained interest, leading to trials with anti-CTLA-4-antibodies and anti-PD-(L)1 antibodies43, 44, 45. The common denominator in these trials is that they primarily center around the therapies’ dosing regimens and treatment schedules. Herein lies the main difference with our simulation approach: although the ‘key ingredients’ of these approaches are similar – they are based on a mathematical abstraction of a disease mechanism – our trials do not aim to optimize treatment schedules. Instead, we complement traditional design methodology by adding the means to predict trial outcomes and elucidate trial kinetics a priori to steer design decisions of novel immunotherapy trials. These trials differ from traditional trial design research in that these, often statistically-grounded, approaches simulate clinical trials based on population-level assumptions (e.g., with particular distributions of survival times, study durations, or with a specific censoring mechanism). Examples of these high-level simulation approaches include, but are certainly not limited to, studies aiming to calculate the sample size and power of clinical trials46, 47, 48. Since these methods lack a direct link to the underlying biological disease mechanism, interpreting their parameters for individual trial participants is difficult or even impossible. In contrast, in silico trials are founded on biological assumptions but then translate these assumptions into statistical concepts such as hazard ratio kinetics. In this manner, simulated trials encourage an interdisciplinary discussion about the design of an upcoming trial.
In silico clinical trials are applicable in several settings. First, they provide the means to verify clinical trial and treatment assumptions before investing extensive amounts of work and funds into the development and execution of a clinical trial and can, thereby, function as a proof of principle of the soundness of the hypotheses for an upcoming trial. Scrutinizing each aspect of the trial supports optimal design decisions and might reduce unanticipated outcomes. Moreover, this mechanism-based approach does not necessitate a deep understanding of complex mathematical theorems; instead, it requires a biological understanding of a disease. This mechanistic basis is intuitive, which benefits the communication between clinical doctors and biomedical researchers on the one hand and statisticians and clinical trialists on the other. Additionally, in silico trials might serve as excellent educational tools. The ability to simulate a wide range – from basic to highly advanced – research questions can be exploited in teaching activities for entry-level clinicians to experienced trialists. A final implication, which holds for any trial simulation, is that they may provide insight when conventional clinical trials are unfeasible due to practical or ethical constraints (e.g., clinical trials in rare diseases, pediatrics, or critical care medicine).
Nonetheless, in silico clinical trials have to be considered in light of some limitations. The most critical limitation is universal to any – either in vitro, in vivo, or computational – scientific model: the immunotherapy trial outcomes depend heavily (if not entirely) on the biological assumptions of the model, meaning that incorrect interactions or erroneous parametrization of the model might induce inaccurate outcomes. The parameterization, in particular, might pose a problem: given the often novel treatment mechanisms, data to fine-tune the parameters of the model accurately might be scarce. In these cases, the simulation itself can be used as a sensitivity analysis to assess to what extent a certain parameter range influences the robustness of the predictions. In addition, while the model itself is intuitive to understand, translating biological principles into an ODE model and implementing it into a simulation requires thorough knowledge of computational methods, limiting its widespread applicability.
In summary, in silico cancer immunotherapy trials offer a versatile approach to simulate immunotherapy trials based on biological assumptions. Furthermore, as a simulation tool, they facilitate the verification of trial design decisions to optimize the probability of a successful immunotherapy trial and contribute to high-quality research for cancer patients.
Data Availability
The code is available at GitHub: https://github.com/jeroencreemers/in-silico-clinical-trials
DECLARATIONS
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
The code is available at GitHub: https://github.com/jeroencreemers/in-silico-clinical-trials
Competing interests
Not applicable
Funding
JC was funded by the Radboudumc. CF received an ERC Adv Grant ARTimmune (834618) and an NWO Spinoza grant. IV received an NWO-Vici grant (918.14.655). JT was supported by a Young Investigator Grant (10620) from the Dutch Cancer Society and an NWO grant (VI.Vidi.192.084).
Authors’ contribution
JHAC and JT conceived this study. JHAC performed the experiments and wrote the manuscript under the supervision of JT. All authors provided feedback on the manuscript and reviewed the manuscript prior to submission.
Acknowledgments
Not applicable.
LIST OF ABBREVIATIONS
- ICI
- immune checkpoint inhibitor
- NCCTG
- North Central Cancer Treatment Group
- ODE
- ordinary differential equation
- OS
- overall survival
- PHA
- proportional hazard assumption