Abstract
Colorectal Cancer (CRC) is a leading cause of cancer deaths in the United States. Despite significant overall declines in CRC incidence and mortality, there has been an alarming increase in CRC among people younger than 50. This study uses an established microsimulation model, CRC-SPIN, to perform a ‘stress test’ of colonoscopy screening strategies. First, we expand CRC-SPIN to include birth-cohort effects. Second, we estimate natural history model parameters via Incremental Mixture Approximate Bayesian Computation (IMABC) for two model versions to characterize uncertainty while accounting for increased early CRC onset. Third, we simulate 26 colonoscopy screening strategies across the posterior distribution of estimated model parameters, assuming four different colonoscopy sensitivities (104 total scenarios). We find that model projections of screening benefit are highly dependent on natural history and test sensitivity assumptions, but in this stress test, the policy recommendations are robust to the uncertainties considered.
1 Background
Colorectal Cancer (CRC) is the second-leading cause of cancer deaths in the United States. In 2021, 104,270 new CRC cases and 52,980 CRC cancer deaths were projected to occur in the US (Siegel et al. 2021). From 1991 through 2018, cancer death rates fell by 31%, a consistent decline attributed to smoking reduction, early detection, and improved treatment. For CRC specifically, the decrease in annual mortality from 1980 through 2018 was 53% among males (from 32.8 to 15.8 annual deaths per 100,000 people) and 55% among females (from 24.4 to 10.9 annual deaths per 100,000 people) (NCI 2021).
Despite significant overall decreases in CRC incidence, CRC incidence has increased among adults younger than 50. Adults born around 1990 have double the risk of colon cancer and quadruple the risk of rectal cancer compared to those born in 1950 (Siegel et al. 2017). This concerning development has raised questions about the potential causes of this phenomenon and the appropriate policy responses. While an Age-Period-Cohort model (Siegel et al. 2017) identified a birth cohort effect underlying the increase in incidence, there is no consensus on the specific causes. Current screening guidelines have reduced the recommended age to start screening from 50 to 45 years (Knudsen et al. 2021a).
To date, CRC microsimulation models do not incorporate birth cohort effects. The absence of an integrated approach to account for changing risk may be problematic for several reasons. First, the data used to calibrate microsimulation models spans several decades. If the natural history of the disease has changed over time, those changes cause confounding in parameter estimates and tensions between calibration targets that might be reconciled if models were extended to include birth cohort effects. Second, data from new studies may conflict with existing model predictions that do not address changes in risk. The absence of birth cohort effects may undermine the validation of existing models if those effects prove consequential. Finally, the hypothesis of no birth cohort effects is inconsistent with recent cancer incidence data (Siegel et al. 2017).
This analysis informs the debate around policy responses to the early onset of CRC by evaluating the robustness of currently recommended (Knudsen et al. 2021a) CRC screening policies to a set of uncertainties. First, we perform Bayesian calibration of two specifications of the CRC-SPIN model (Rutter and Savarino 2010) to characterize both structural and parametric uncertainties surrounding the natural history of CRC. Both model specifications include a non-parametric birth cohort effects model. Second, we simulate all colonoscopy screening strategies evaluated in the cost-effectiveness analysis that informed the most recent USTFPS screening recommendations (Knudsen et al. 2021a) to assess how screening cost-effectiveness depends on within and between those models. Finally, we investigate whether current colonoscopy screening recommendations are robust to uncertainties surrounding the natural history of CRC and the sensitivity of colonoscopy by investigating the stability of the cost-effectiveness frontier.
2 Methods
2.1 Natural History Model
2.1.1 Adenoma Risk with Birth Cohort Effects
CRC-SPIN simulates adenoma risk using a Non-homogenous Poisson Process, as shown in eq. (1). Individual log-risk Ψia of person i at age a depends on their sex, age, and birth year. This equation includes two modifications relative to previous CRC-SPIN versions. First, eq. (1) allows the specification of adenoma risk change points k0, k1, k2, k3 instead of fixed change points previously used in the baseline model (k0 = 20, k1 = 50, k2 = 60, k3 = 70). Second, the model allows for increased adenoma initiation risk based on the birth year of each individual. In eq. (1), δ(x) = 1 when x is true and 0 otherwise. α0i ∼ N(Aa, σa) allows for between-individual variation in adenoma risk. All risk parameters Aa, σa, α1, α2, αk0, αk1, αk2, and αk3 are unknown parameters that need estimation.
The birth cohort effects model defines adenoma risk birth-cohort effects αy for each birth-cohort year within the set of years I = [1880, 1881, …, 1975], allowing the population-level average risk to change. This range of years encompasses the range of birth cohorts represented by calibration targets used by the model and can be expanded or contracted as information from new birth cohorts become available. The indicator variable BCiy is one if individual i was born in year y and zero otherwise. This effect captures secular changes that resulted in higher adenoma risk while making no additional assumptions about the underlying causes and minimal assumptions about the functional form of those changes.
Because a non-parametric model of birth cohort effects with free year parameters substantially expands the parameter space, we use the following strategy to enable the parameter search while allowing a flexible functional form. First, we define a set of years in which the birth cohort effect will be estimated: 1980, 1910, 1940, 1955, 1970, and 1975. The year 1940 is a reference point at which the effect is zero. The minimum and maximum dates are chosen based on the available calibration targets. The intermediate knots were chosen to be evenly spaced, with a finer resolution after 1940, where risk changes were known to be pronounced. Second, we produce a smooth, monotonic, biologically-plausible interpolation between these points to define birth cohort effects between the years used as knots, using a method based on piecewise radial functions (Stineman 1980; Jhannesson, Bjornsson, and Grothendieck 2018)1. Finally, we enforce monotonic increases in risk after 1950, following the evidence found in Age-Period-Cohort analyses (Siegel et al. 2017). After initiation, adenomas are distributed in the large intestine following a multinomial distribution with probabilities P (l = Rectum) = 0.09, P (l = Sigmoid Colon) = 0.24, P (Descending Colon) = 0.12, P (l = TransverseColon) = 0.24, P (l = AscendingColon) = 0.23, and P (l = Cecum) = 0.08.
2.1.2 Model Specifications
This paper employs two alternative specifications of the CRC-SPIN model. In the base-line model specification (BC-20), we set the minimum age at adenoma initiation k0 R 20, consistent with prior CRC-SPIN analyses. In a second model specification (BC-10), we set k0 R 10, allowing adenomas to be initiated after age 10. The age at which adenomas are initiated can be considered a deep uncertainty because there is little information to estimate this parameter. Increasing cancer incidence at younger ages (Siegel et al. 2017) might suggest that a model with a younger minimum age at first adenoma initiation, at age 10, could be more plausible than a model with an older minimum age of 20. We investigate this question by calibrating and evaluating screening policies’ robustness across both model specifications. This approach can be readily extended to more than two model specifications. The only reason we do not present additional model specifications is the computational burden of adding a new model to this analysis.
2.2 Bayesian Inference of Natural History Parameters
Approximate Bayesian Computation (ABC) is an inference framework that requires the specification of i) calibration targets defined as summary statistics derived from data, ii) prior distributions for a set of unknown parameters, and iii) a model that maps parameter sets to model-predicted targets. We use the Incremental Mixture Approximate Bayesian Computation (IMABC) algorithm (Rutter, Ozik, DeYoreo, et al. 2019) to estimate the natural history model’s joint distribution of parameters. Based on these three elements, IMABC iteratively samples the parameter space and approximates the posterior distribution of model parameters. Table 1 presents the prior distribution for each parameter used across both model specifications.
Parameter Prior and Posterior Distributions
2.2.1 Calibration Targets
This analysis uses a set of 43 calibration targets derived from eight sources to calibrate CRC-SPIN. Following prior CRC-SPIN calibrations (Rutter, Ozik, Deyoreo, et al. 2019; DeYoreo et al. 2022), we use data from Corley et al. (2013) to inform adenoma prevalence by sex and age. Pickhardt et al. (2003) informs the distribution of adenoma size. Church (2004) and Lieberman et al. (2008) inform the distribution of CRC size. The UK Flexible Sigmoidoscopy Screening Trial (Atkin et al. 2010) informs the proportion of individuals who are screen-detected with CRC by sex. Cancer incidence data by sex, age, and location (rectal vs. colon) from SEER 1975-1979 (National Cancer Institute 2022) informs cancer incidence risk.
This analysis included two new targets relative to prior CRC-SPIN calibrations. Increased risk among new cohorts is informed by recent SEER data (National Cancer Institute 2022), using SEER 2009 and SEER 2014 incidence for young adults (aged 40-44). Table 2 lists each calibration target, tolerance intervals used in this calibration, and the mean and 95% credible intervals of the posterior distribution obtained from calibration.
Calibration Targets and Tolerance Intervals
2.2.2 Incorporating New Information via Sequential Calibration
This study used the following strategy to calibrate birth cohort effect parameters. For each candidate parameter set evaluated by the model, we first evaluate targets that are fast to simulate (e.g., Corley et al. (2013) adenoma prevalence) and only evaluate computationally-expensive targets (SEER cancer incidence) for parameters that result in adequate predictions of these easy-to-simulate targets. This approach leverages IMABC’s iterative tolerance interval updating strategy and saves computing time by not spending resources on parameter sets that could not have generated the calibration target data.
After calibrating the model to targets used in previous analyses (Rutter, Ozik, DeYoreo, et al. 2019), we incorporated recent information from CRC incidence using a sequential approach. As demonstrated by DeYoreo et al. (2022), one can add new information to the model by adding calibration targets after calibrating the model to an initial set of targets. First, we used IMABC to calibrate 41 calibration targets, except for the two SEER cancer incidence targets that contain new information about increased CRC risk in recent cohorts (SEER 2009 and SEER 2014). Because these two targets use approximately 1/3 of the computing time required to simulate all targets, avoiding evaluating them before all other targets are met allowed the algorithm to build a posterior distribution without the burden of new, expensive targets. After building a posterior distribution with an effective sample size (ESS) greater than 10,000 for each model specification, we evaluated the new targets across this distribution and selected the parameters within all tolerance intervals for all calibration targets. This second step is akin to a one-step rejection sampling ABC scheme, wherein we use the posterior of prior calibration to condition the parameter distribution on new data. This approach is very similar to the sequential approach used by DeYoreo et al. (2022), with the exception that we do not need to perform new IMABC iterations to increase the size of the new parameter distribution because the resulting distribution is already dense enough (sample size > 1000) to support the analysis of screening strategies.
2.3 Colonoscopy Screening Strategies
We used the two calibrated birth cohort models to project outcomes for the 26 colonoscopy screening strategies evaluated in existing CRC screening recommendations (Knudsen et al. 2021a). These strategies involve three policy levers: age to start screening (45, 50, 55), periodicity of screening colonoscopies (5, 10, 15), and age to end screening (70, 75, 80, 85). A grid experimental design over the three policy levers results in 36 strategies, but only 26 are simulated after removing combinations resulting in redundant strategies. Screening strategies are coded as [age to start]-[age to end],[interval]. For instance, strategy 45-75,10 refers to performing colonoscopy screening at ages 45, 55, 65, and 75. Following comparable studies (Knudsen et al. 2021a), perfect adherence to screening is assumed across all model runs.
2.3.1 Colonoscopy Sensitivity Scenarios
The sensitivity of colonoscopy exams is uncertain and very heterogeneous across gastroenterologists and gastroenterology clinics (Corley et al. 2014). Hence, assessing how sensitivity affects the expected benefit from different screening strategies is important. Table 3 presents the sensitivity assumptions made in four sensitivity scenarios considered in this analysis and corresponding sources. Section 8 provides an extended discussion around the rationale for considering those scenarios and evaluates the plausibility of the “Very Low” sensitivity scenario by re-analyzing data from a meta-analysis of tandem colonoscopy studies (Van Rijn et al. 2006). This analysis assumes time-independent, uncorrelated colonoscopy sensitivity conditional on lesion size and no heterogeneity in sensitivity across the population.
Colonoscopy Sensitivity Assumtpions
2.3.2 Cost-Effectiveness Outcomes
This study simulates a cohort of 2 million average-risk 40-year-old adults from the general US population followed through death. Following recent analyses used to guide policy (Knudsen et al. 2021a), we use Life-years Gained (LYG) from screening as the effectiveness measure by comparing a no-screening scenario to each scenario where individuals are screened. We perform this comparison for each parameter set and population block. The number of colonoscopies (ncol) performed is used as the burden measure. Incremental efficiency ratios (ICER)2 are computed by comparing two non-dominated screening strate-gies as where Δncol is the additional number of colonoscopies required by a more intensive screening regimen and ΔLYG is the incremental number of life-years gained by that regimen. We also remove extended-dominated strategies from the Pareto frontier so that efficiency ratios increase monotonically with effectiveness.
2.3.3 Experimental Design
Following the Robust Decision Making (RDM) decision-analytic approach (Lempert et al. 2006), we created a large-scale experimental design to stress-test the cost-effectiveness of alternative screening strategies. The experimental design of this study considered natural history uncertainty (represented by the posterior distribution of model parameters), structural assumptions (represented by two different model specifications), and uncertainty related to the efficacy of the interventions (represented by the four sensitivity scenarios). Hence, the experimental design of this study consisted of the combination of 2 model specifications, 500 natural history parameter sets for each model specification, 26 screening strategies and 1 “No-Screening” scenario used as the comparator, and 4 colonoscopy sensitivity scenarios, resulting in 105,000 unique model runs. The dimensions of the experimental design were defined such that the screening experiments would be performed using a computing budget of approximately 200,000 core hours. Each unique model run simulates 2 million individuals3. This experimental design results in 840 model runs in this experiment, which results in 210 billion life histories and over 0.63 trillion adenomas.
2.4 High-performance Computing Infrastructure
The scale of this experiment required the design and development of high-performance computing (HPC) tools and workflows. Experiments were run on computing resources at Argonne’s Laboratory Computing Resource Center and the Argonne Leadership Computing Facility. The model code for this experiment was developed in R, and HPC workflows were developed using Swift-T (Wozniak et al. 2013), EMEWS (Ozik et al. 2016), and the crcrdm R package (Nascimento de Lima 2022). These experiments also used a relational Postgres SQL database to allow for massively parallel execution of experiments and rapid retrieval of results. More information about the computational environment and software developed for these experiments can be found in sections 9 and 10.
3 Results
3.1 Natural History Parameter Estimates
Appendix table 1 presents model parameters, their prior distribution used for calibration, and their respective 95% posterior credible intervals for each model. While some parameter estimates are similar across models, these results demonstrate that model estimates are contingent on the assumed minimum age at adenoma initiation. For instance, model All estimates were presented with up to three significant digits of precision. BC-20 predicts lower adenoma prevalence at age 25 than model BC-10, and the posterior distributions of the models do not overlap. Moreover, the minimum age at which adenomas are allowed to start also affects the estimates of birth cohort effects. If adenomas are initiated earlier (as in model BC-10), then the birth cohort effects needed to explain increased incidence are also expected to be lower.
While table 1 presents a high-level summary of model estimates, the posterior distribution for both models exhibits a complex correlation structure that is not captured in these low-dimensional summaries. Nevertheless, inspecting the posterior distribution of model estimates computed from raw model parameters can reveal important insights. Figure 1 shows each model’s joint distribution of cumulative adenoma initiation risk at ages 25 and 80. Unsurprisingly, the model that allows earlier adenoma initiation (BC-10) implies a higher cumulative risk of adenoma initiation by age 25. Despite those differences, both models have approximately the same implied cumulative adenoma initiation risk at age 80. While clinical knowledge or future evidence might favor one model over the other, this study uses both models’ posterior distributions without making further assumptions about how likely each model is.
Posterior Distribution for Cumulative Adenoma Initiation Risk
Figure 2 presents results for the non-parametric birth cohort effects as incidence risk ratios. Shaded areas represent predictive credible intervals that are consistent with the data. The horizontal black line at one represents the hypothesis of no change in adenoma initiation risk by birth cohort, whereas the vertical grey line, set at 1940, represents the reference cohort against which all other cohorts are compared. This figure presents several noteworthy findings. First, it shows that the assumption of no birth cohort effects is implausible, particularly after 1940. Second, the implied relative risk ratios differ across both models (see the birth-cohort effects in table 1). This finding reinforces our suspicions that birth-cohort effects would interact with other model parameters, and estimates from one model may not translate directly to other models. Finally, these findings suggest that assuming no birth cohort effects before 1940 may be a plausible assumption.
Birth Cohort Incidence Risk Ratio Estimates by Model
3.2 Cost-Effectiveness Estimates
Figure 3 4 presents mean posterior estimates for three strategies across the eight scenarios considered in this analysis. Each scenario can be seen as one Probabilistic Sensitivity Analysis (PSA). Tables 4, 5 and 6 present the same posterior mean estimates for each combination of strategy and model, along with 95% credible intervals. Each row in figure 3 displays one model outcome measure. The top row presents Life-years gained per 1000 people. The second row shows the number of colonoscopies demanded by each screening strategy. The last column presents the incremental cost-effectiveness ratio (i.e., the number of colonoscopies one should be willing to do to gain one marginal life-year to choose that strategy). The columns represent different strategies with decreasing intensity levels from left to right. All three strategies are part of the Pareto-efficiency curve.
Estimates of Benefits, Burdens and Incremental Efficiency Ratios of Select CRC Colonoscopy Screening Strategies increases to 2690 (95% PI [2500, 2960]) for the 45-70,15 strategy and to 3960 (95% PI [3830, 4150]) for the 45-75,10 strategy. There are slight differences in the number of colonoscopies by sensitivity scenarios because higher sensitivity colonoscopy screening will result in a slight decrease in the required follow-up surveillance colonoscopies.
LYG estimates and 95 percent Credible Intervals
Number of Colonoscopies Estimates and 95 percent Credible Intervals
Efficiency Ratio Estimates and 95 percent Credible Intervals
Back-to-back Colonoscopy Studies
Figure 3 A shows that LYG estimates are highly contingent on sensitivity assumptions. Under baseline sensitivity and model assumptions (Sensitivity R Baseline and Model R BC-20), Strategy 45-75,10 results in 412 (95% PI [313, 559]) LYG / 1000 people. Under a high sensitivity scenario, the same strategy results in 422 (95% PI [319, 570]) LYG / 1000 people. The benefit from the same policy decreases to 375 (95% PI [285, 515]) if sensitivity is very low - a 12% difference in effectiveness.
Figure 3 B demonstrates that the number of colonoscopies performed is similar across sensitivity scenarios and natural history assumptions and highly contingent on the intensity of the screening strategy. Under baseline assumptions, strategy 55-70,15 is expected to result in 2330 (95% PI [2200, 2500]) colonoscopies per 1000 individuals. This estimate II. Similarly, the minimum age to start adenoma could also be expressed as a continuous variable and be regarded as one natural history parameter, but we chose two specific values to facilitate model calibration and comparisons. Therefore, the scenarios chosen in this figure represent the boundary cells of a hypothetical fine-grained design that one could run with an unlimited computing budget.
Figure 3 C shows how the cost-effectiveness ratios change across assumptions. As expected, lower sensitivity assumptions imply higher efficiency ratios. This result demonstrates the robustness of colonoscopy screening strategies. Even under the most pessimistic combination of scenario and model specification, the efficiency ratio estimate for strategy 45-75,10 is 11.3 (95% PI [8.64, 14.3]). Under baseline assumptions, the efficiency ratio estimates improves to 9.83 (95% PI [7.27, 12.4]) colonoscopies per life years gained (on the margin).
3.3 Cost-Effectiveness Efficiency Frontier
Figure 4 presents the set of non-dominated screening strategies at the cost-effectiveness efficiency plane, with the burden (number of colonoscopies per 1000 people) in the horizontal axis and benefits (life-years gained per 1000 people) in the vertical axis. 95 % prediction intervals for LYG are displayed as shaded areas. The figure demonstrates that estimates differ by model and sensitivity, but the efficiency frontier is reasonably stable across model specifications and scenarios. Most strategies considered efficient under one scenario are also efficient in other scenarios. Table 4 presents efficiency ratio estimates for each non-dominated policy. Figure 4 also shows that the differences between specifications BC-10 and BC-20 and relatively small when contrasted with the uncertainty resulting from the posterior distribution of natural history parameters. There is a wide range of possible LYG outcomes for each given policy.
Cost-Effectiveness of Screening Strategies by Sensitivity and Model
4 Discussion
To our knowledge, this is the first CRC microsimulation analysis to produce and use a posterior distribution of birth-cohort effects in adenoma initiation risk. The birth-cohort model used in this study offers one alternative to explain the increase in CRC incidence. This approach mirrors the prior approach used by CISNET models (Knudsen et al. 2021a) but produces a posterior distribution for birth cohort effect parameters. This study also estimated birth cohort effects for two alternative CRC-SPIN specifications and demonstrated that those estimates are contingent on other model assumptions, such as the minimum age at adenoma initiation.
This study’s estimates of CRC screening cost-effectiveness measures varied substantially across and within scenarios. These results are in line with previously published cost-effectiveness analyses. For instance, life-years gained estimates for strategy 45-75,10 was 412 (95% PI [313, 559]) LYG / 1000 people for the specification BC-20 and 389 (95% PI [297, 510]) for specification BC-10. The range of LYG for the same strategies presented by Knudsen et al. (2021a) across the three CISNET models was 291-361. The point estimate from our study falls within the range of outcomes presented in Knudsen et al. (2021a), and the range of outcomes in our study has a wide overlap with ranges presented previously. Our estimate’s upper bound is somewhat higher than the baseline scenarios. These results reflect uncertainty in the natural history of parameters and birth-cohort effects that are not incorporated in the baseline scenario used in Knudsen et al. (2021a). These results demonstrate that natural history parameter uncertainty can be as relevant as model specification differences. As computational resources become more widely available and more microsimulation models transition to performing PSAs, the uncertainty intervals reported in multi-model analyses will better characterize the full range of uncertainty implied by different models.
Our findings highlight the importance of access to high-sensitivity screening to ensure that the benefits of screening are equitably distributed in the population. If access to high-sensitivity screening is unequal, then the benefits of screening will not be equally distributed in the population. For instance, a person who receives decennial colonoscopy screening from 45 to 75 years with very low colonoscopy sensitivity is estimated to have a 12% decrease in LYG compared to another person who receives the same screening strategy but with high-sensitivity colonoscopy.
Despite the uncertainties associated with LYG estimates, this study projected that currently-recommended colonoscopy screening strategies would be cost-effective across many conditions. While the cost-effectiveness of screening strategies changed across scenarios, the cost-effectiveness frontier remained stable across scenarios.
This study presents a series of limitations and opportunities for further extensions. First, we only present one explanation for increased CRC risk through an overall increase in the risk of adenoma initiation based on birth cohorts. This model does not include alternative model specifications that are plausible. For instance, the increased risk of CRC could be concentrated in specific locations of the large intestine. Moreover, birth cohort effects might also be present in other model parameters, including those that control growth. These alternative hypotheses represent alternative biological explanations for increased CRC incidence, and those differences can be decision-relevant. Future investigations can expand the space of plausible models and further investigate the robustness of screening strategies across those additional models that have yet to be proposed.
Future work can also build upon the eight probabilistic sensitivity analyses performed in this analysis and further investigate the significance of uncertainty by computing the opportunity cost associated with sub-optimal screening strategies and performing Value of Information (Claxton and Sculpher 2006) analyses. Because the efficiency ratio intervals of currently-recommended strategies are relatively low, those analyses are not expected to change existing recommendations. Nevertheless, one can still calculate the expected opportunity loss of each strategy on willingness to pay thresholds to determine the optimal strategy given a specified willingness to pay threshold using a net monetary benefit (NMB) measure. We refrain from doing so in this analysis for a few reasons. First, existing cost-effectiveness analyses of CRC screening strategies do not use QALYs as the outcome measure and do not consider the costs of colonoscopies in dollars, nor do they estimate all burdens of colonoscopy screening using a dollar amount. Second, the purpose of this paper was to compare our results to those previously published in Knudsen et al. (2021a) directly, and changing the analytical framework and measures (i.e., changing the outcome from LYG to discounted QALYs) would work against this goal. Either way, this study removes barriers to adopting Value of Information analyses with microsimulation models, which can still be performed based on our results.
5 Conclusion
This paper calibrated two specifications of the CRCSPIN model, including birth-cohort effects on adenoma initiation, and presented a stress test of CRC colonoscopy screening strategies. This study reported prediction intervals for benefits, burden, and incremental efficiency ratio for all CRC screening strategies considered in US screening guidelines across four colonoscopy sensitivity scenarios and two CRC-SPIN model specifications. Our results demonstrate that current USPSTF recommendations are robust under a wide range of conditions. In particular, decennial colonoscopy screening from 45 to 75 years was projected to remain at the cost-effectiveness frontier regardless of model specification or sensitivity level.
7 Appendix I: Model Specification
7.1 Adenoma Growth
Adenoma growth is simulated as follows. For each lesion, CRC-SPIN simulates the lesion time to reach 10 mm (t10ij) using a Fréchet (inverse Weibull) distribution, with cumulative distribution function (2). This function is defined separately for lesions in the colon and in the rectum, which results in four parameters (β1t, β2t, β1r, β2r). CRC-SPIN allows variability in growth rates by sampling t10ij from the cdf given by eq. (2).
Eq. (3) describes the size (diameter) of each adenoma at any point in time dij(t). This equation is a Richard’s Growth model, following a W0-form (Tjørve and Tjørve 2010). d∞ R 50 mm is the maximum adenoma size, d0 = 1 mm is the size at adenoma initiation, rij is the maximum adenoma growth rate, t is time after adenoma initiation, and p determines the relative size at maximum growth rate.
Given this equation, eq. (2) can be rearranged to allow the calculation of the lesionspecific growth rate at the growth inflection point. Eq. (4) is used to obtain by rij by substituting dij(u) with 10, u10ij with the lesion-specific value sampled from the Fréchet distribution.
7.2 Transition from Adenoma to Cancer
The size at transition to preclinical cancer is modeled with a log-normal distribution, where the size of the jth adenoma of the ith person is defined as ln(Sij) ∼ N(μij, σγ). The expected log-diameter at transition to preclinical cancer is defined by eq. (5).
7.3 Time from Preclinical to Clinical Cancer (Soujourn Time)
Eq. (6) defines the soujourn time Tij (time from preclinical transition to symptomatic cancer) of adenomas. CRC-SPIN uses a Weibull distribution with scale parameter λ1, shape parameter λ2 and the λ3 parameter allows for different soujourn time for cancers in the colon and in the rectum.
8 Appendix II: Colonoscopy Sensitivity
Colonoscopy sensitivity to detect adenomas conditional on their size is crucial in model-based screening cost-effectiveness analyses. The assumptions made about sensitivity directly define the expected effectiveness of screening strategies. Colonoscopy sensitivity is informed by tandem colonoscopy studies wherein two colonoscopies are performed on the same patient. Researchers count the number of adenomas missed by the first colonoscopy n1 and found in the second exam n2. The adenoma miss rate (AMR) is computed by the number of adenomas found in the second exam divided by the total number of adenomas found.
Existing studies (Zauber et al. 2008; Knudsen et al. 2016, 2021b) conventionally use data from tandem colonoscopy reviews (i.e., Van Rijn et al. 2006) to inform colonoscopy sensitivity assumptions. Nevertheless, the analysis provided by the Van Rijn et al. (2006) review might not be directly used to determine the expected colonoscopy sensitivity without further consideration. Miss rates reported in tandem colonoscopy studies or other designs offer an upper bound for colonoscopy sensitivity but not a lower bound.
This section develops a Bayesian approach to evaluate the plausibility of colonoscopy sensitivity scenarios conditional on adenoma size using the same data reviewed by Van Rijn et al. (2006) and assuming sensitivity = 1 − AMR results in a demonstrably biased estimator. This estimate can only reflect an upper bound for sensitivity under the assumption that the second exam missed no adenomas. Because about 1/4 of adenomas are missed in the first exam, this assumption is problematic. As diagnostic tests are improved, the “real” miss rate of conventional colonoscopy will gradually decline, predictably demonstrating that previously-made colonoscopy sensitivity assumptions were overly optimistic.
8.1 Data
Table 7 presents the list of studies used by Van Rijn et al. (2006). Each row represents one combination of adenoma size category and study included in the review. n1 is the number of adenomas found in the first colonoscopy, and n2 is the number found in the second exam. Adenoma miss rates AMR = n2/(n1 + n2) reflect the proportion of adenomas missed by the first exam and found in the second exam.
8.2 Model
Let N be the total number of adenomas present in a study population. The examiner counts the number of adenomas found in the first n1 and in the second exam n2, within a particular size category t ∈ [d, m, l], where d are diminutive adenomas (<5mm), m are medium adenomas (5-10mm) and l are large adenomas (>10mm). N might be close or equal to n1 + n2 if sensitivity is high, but N is never observed.
Eq. (7) describes a model of the data generation process underlying tandem colonoscopy studies, assuming the same, uncorrelated sensitivity for both exams. This model is conditioned on the size of the adenoma and represents a single study. Adenoma size subscripts are omitted from these equations for clarity.
I specify uninformative priors for colonoscopy sensitivity S ranging from 0.05 to 1 and priors for N such that Nmin = n1 + n2 and Nmax = Nmin/(1 − pwmiss)where pwmiss 0.5 represents the maximum proportion of adenomas missed by both exams. Setting this proportion to 0 results in an illogical assumption if sensitivity is not 1, but is nevertheless an assumption made in prior studies. Setting it well above 0.5 would only be consistent with a pessimistic assumption that sensitivity is very low and adenoma prevalence is very high. Future analyses can plausibly use a more informative prior for N based on expected adenoma prevalence. However, we refrain from doing so in this analysis since the risk profile of the studies is heterogeneous, and known adenoma prevalence depends on the sensitivity of colonoscopy. Moreover, this appendix aims to provide plausible lower and upper bounds for sensitivity scenarios that are still consistent with data used by existing studies.
8.3 Plausibility of Sensitivity Assumptions
As stated, a known relationship exists between the unknown expected proportion of adenomas missed by two back-to-back colonoscopies E(pmiss) and adenoma sensitivity S. We know that E(pmiss) = (1 − S)2 = 1 − 2S + S2. If S = 1, then E(pmiss) = 0. If S = 0, then E(pmiss) = 1. If S = 0.75, then tandem colonoscopy studies will miss E(pmiss) = 1 − 2 ∗ 0.75 + 0.752 = 0.065 of adenomas in expectation. If sensitivity is 50%, then these studies are expected to miss about 25% of adenomas in expectation. While tandem colonoscopy studies offer some information about sensitivity, this information is not sufficient to identify sensitivity.
To define bounds for sensitivity, we use the model outlined in eq. (7) to simulate each study reviewed by Van Rijn et al. (2006). The purpose is to derive plausible sensitivity values for the scenarios and illustrate what those sensitivity assumptions would imply about the proportion of adenomas missed by both exams. Figure 5 shows the posterior distribution for colonoscopy sensitivity to diminutive adenomas for a set of tandem studies and overlays the assumption made by each of the four sensitivity scenarios considered in this analysis as dotted lines. This figure demonstrates that the very high and baseline sensitivity scenario assumptions might be compatible with the sensitivity of Hixon, 1991. However, they are unlikely to have generated the results observed in Rex, 1997. The Very Low sensitivity assumption seems aligned with the mode of the posterior distribution of Harrison, 2004, and is not ruled out by results from Rex, 1997 and Rex, 2003 I. These results demonstrate that all sensitivity scenarios included in this study are plausible with at least one tandem colonoscopy study reviewed by Van Rijn et al. (2006) and should not be deemed implausible. Future work might extend this analysis to generate a full sensitivity distribution representing today’s colonoscopy studies.
Posterior Distributions of Colonoscopy Sensitivity for Diminutive Adenomas (less than 5 mm) for Select Tandem Colonoscopy Studies
8.4 Continuous-size Sensitivity Functions
While table 3 presents average sensitivity conditional on categorical sizes, CRC-SPIN is a continuous-size, continuous-time model and requires sensitivity to be defined for any arbitrary adenoma size between 1 and 50 mm. In prior analyses, CRC-SPIN has used a quadratic functional form to approximate the baseline sensitivity. Instead, this analysis uses the Stineman interpolation method Jhannesson, Bjornsson, and Grothendieck (2018) to generate plausible sensitivity assumptions consistent with the categorical average sensitivities. The Stineman interpolation method offers a few advantages over the prior quadratic functional form, allowing direct control over the minimum and maximum sensitivity values, more flexibility over the functional form of the sensitivity curve, and ensuring that sensitivity is monotonic. Figure 6 shows the resulting sensitivity functions produced with the monotonic Stineman interpolation method. These sensitivity curves match the average categorical sensitivities in table 3.
Continuous Colonoscopy Sensitivity Functions by Scenario
9 Appendix III: Computing Environment
Model calibration and screening strategy experiments were performed on Bebop and Theta at Argonne National Laboratory. Bebop is a High-Performance Computing cluster managed by the Laboratory Computing Resource Center at Argonne National Laboratory. This study used approximately 200,000 core hours to calibrate both model specifications and around 200,000 core hours to run screening experiments. Four HPC workflows were developed to support this study. All workflows were based on Swift-T (Wozniak et al. 2013) and EMEWS (Ozik et al. 2016).
10 Appendix IV: The crcrdm R package
The crcrdm R package (Nascimento de Lima 2022) was designed to support this and future large-scale microsimulation studies. crcrdm is an R package created to support large-scale cancer screening analyses based on cancer microsimulation models. The package supports large-scale computational tasks historically deemed unfeasible for microsimulation models, such as defining and conducting Probabilistic Sensitivity Analyses (PSAs) or robustness analyses of large models and large combinations of parameter sets. The package helps the user perform large-scale experiments with microsimulation models by partitioning the memory usage of models to a manageable size and organizing the experimental design to run across different nodes (e.g., the same model run is parallelized across different computers). The package also supports multi-model experimental designs. This package implements the crcmodel and the crcexperiment classes and can be used to perform Robust Decision Making Analyses of multiple cancer screening models using high-performance computing resources.
Acknowledgements
This research was supported by a Rothenberg Dissertation Award provided by the Pardee RAND Graduate School and by the National Cancer Institute (NCI) as part of the Cancer Intervention and Surveillance Modeling Network (CISNET) through grant U01-CA253913. This research used resources of the Argonne Leadership Computing Facility, a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. I thank the Argonne Leadership Computing Facility staff for their timely and critical support. This research was completed with resources provided by the Laboratory Computing Resource Center at Argonne National Laboratory. The content is solely the author’s responsibility and does not necessarily represent the views of any sponsor.
Footnotes
↵1 The Stineman method (Stineman 1980; Jhannesson, Bjornsson, and Grothendieck 2018) provides a well-behaved smooth interpolation approach. We defined a set of year knots at which the birth cohort effect is estimated and used the Stineman interpolation method for the years between the knots. Unlike methods based on polynomials such as splines, the Stineman interpolation is guaranteed to be monotonic.
↵2 Following Knudsen, et al. (2021a), we do not use the term Incremental Cost-Effectiveness ratios because this analysis does not consider costs, but the number of colonoscopies as a proxy for the burden of screening.
↵3 We verified that doubling the sample size does not change any of the estimates presented in this paper.
↵4 This figure presents a low-dimensional version of what would otherwise be one of the outputs of RDM’s vulnerability analysis step. Note that colonoscopy sensitivity could be expressed by a continuous variable. We choose colonoscopy sensitivity scenarios that match the values used in existing analyses to allow comparisons to the existing literature. We add a new “Very Low” sensitivity scenario that we show is plausible in Appendix