Abstract
The lifetime radiation associated risk of cancer incidence has been of interest at least seventy years after atomic bombings of Hiroshima and Nagasaki in 1945. Since most of radiation epidemiological studies are retrospective, it is important to assess the potential magnitude of radiation-related cancer risk. The R-package so-call LARisk implements to easily calculate lifetime attributable risks (LAR) of radiation associated cancer incidence presenting bootstrap-based confidence intervals of the estimated risks. The main characteristics of the LARisk package are to flexibly compute the LAR-associated risks, effectively summarize large data, and promptly update domain-knowledge such as new regression coefficients for excess relative risk or excess absolute risk models, population information, etc. The manuscript provides a complete guide for usage of the LARisk package, which will enable to contribute the prospective studies in radiation epidemiology.
Introduction
The lifetime radiation related risk of cancer incidence remains elevated for at least seventy years after atomic bombings of Hiroshima and Nagasaki, Japan in 1945 [1]. Estimating radiation related cancer risk has made steady progress [1, 2], but most studies have been retrospective. Forward projection of radiation related cancer risk is essential to allow predictive studies of irradiated groups to ultimately control excess radiation exposure occurring, e.g. from medical radiography (computed tomography, CT) scans [3, 4].
There are several reports to discuss lifetime radiation related cancer risk projection models. The U.S. Nuclear Regulatory Commission developed the NUREG model to predict health effects from radiation exposure at nuclear power plants, considering the dose and dose-rate effectiveness factor (DDREF) and age-specific risk coefficients [5]. The U.S. National Research Council proposed the BEIR VII model [6] for radiation exposed groups, focusing on cancer risk projection from low linear energy transfer (LET) radiation exposure with weighted average estimates from excess relative risk (ERR) and excess absolute risk (EAR) models [6]. The BEIR VII risk models were based primarily on the Life Span Study (LSS) study from the Radiation Effects Research Foundation (RERF), which has advantages of its large population, including of both sexes and all ages, long follow-up, and high-quality mortality and cancer incidence data [6]. However, LSS is based on single acute exposure and still limited in its ability to precisely estimate site-specific cancer risks. Therefore, we acknowledge that the BEIR VII models need to be updated with other studies of chronic low-level radiation exposure with the longer duration follow-up.
The lifetime attributable risk (LAR) means the number of excess event incidences for an irradiated group compared with a non-radiation exposed group. The LAR risk projection model is defined as where L is the latency period for each cancer, M (D, e, a, s) is a risk model, and the ratio S(a, s)/S(e, s) is the conditional probability of a person alive and cancer-free at age-at exposure (e) to approach an attained age (t). We used cancer-free survival function after removing cancer incidence in the risk projection models [7]. In radiation epidemiology, LAR is the probability of premature cancer incidence attributable to radiation exposure in a representative member of the population [8–10] over their lifetime. World Health Organization (WHO) used LAR for Fukushima health risk assessment [11], and many other studies also have forecast radiation related cancer risk using LAR [3, 12–22].
The US National Cancer Institute ‘RadRAT’ program estimates lifetime risk of cancers induced by radiation exposure [23] based on BEIR VII [6], which estimated radiation related cancer risk transferred from Japanese (atomic bomb survivors) to US populations, and considered uncertainties related to risk model regression coefficients, e.g. ERR and EAR models (see [23] and [6] for more details), minimum latency, DDREF and radiation dose exposure distributions and population transfer weights. The ‘RadRAT’ program can be accessed for free at https://radiationcalculators.cancer.gov/radrat/ and is widely used to assess lifetime cancer risks attributable to radiation [20, 24, 25]. Although ‘RadRAT’ was designed to be easy and simple to use, it is difficult to flexibly use batch files for large data analyses or multiple cases, or ensure timely knowledge updates, e.g. baseline incidence rate in a specific population, life tables, etc. Thus, we developed an LARisk R package to calculate lifetime attributable risk for radiation related cancer built over the online RadRAT program. LARisk provides significantly enhanced flexibility by incorporating functions to project LAR for the batch file considering modified baseline incidence rate and/or population transfer weights, and DDREF application and a summary function based on average LAR for each case. In addition, LARisk uses the parametric bootstrap method to compute risk estimates and confidence intervals because the population distributions for uncertainties are known. The parametric bootstrap method is more powerful and provides small variances compared with the nonparametric bootstrap if the distribution assumption is known and the sample sizes are relatively small.
The remainder of this paper is organized as follows. Section LAR models describes the LAR risk models in detail, including the underlying ERR and EAR functions and discusses statistical uncertainties. Section Illustration of LARisk package discusses input information, output values, arguments, and result interpretation in the LARisk package; briefly describes LARisk options; Section An example with chronic exposure rate uses data examples to describe how to use LARisk. Finally, Section Concluding remarks summarizes the paper.
LAR models
Risk model
Risk models for the LARisk package were based on those developed for the RadRAT program to estimate lifetime risk following low-dose radiation exposure. ERR (Excess Relative Risk) and EAR (Excess Absolute Risk) models were used for exposed individuals compared with unexposed individuals, based on a dose response function, ρ(D) with radiation dose D in Gray (Gy), and effect modification, ε(X) = exp(αT X) where X is a vector of several covariates including age at exposure, attained age, sex, etc. and α is a vector of regression coefficients. ρ(D) can be linear, linear-quadratic, quadratic, linear-threshold, nonparametric, etc. For example, See [6], [1], [26], [27] and [28] for more details.
LARisk incorporates both ERR and EAR for a risk model (1) to compute LAR estimates of solid cancer and leukemia. The standard form of ERR or EAR models for solid cancer, except thyroid and breast, is defined as where the risk is assumed to be linearly related to the radiation dose, a is attained age, and e is age at exposure, which are considered to be effect modifications where The form for thyroid and breast cancer is slightly different from the standard model. Thyroid only depends on age at exposure effect modification. The breast cancer model is defined as where e** = (e − 25)/10, α2 is equal to 3.5 for a ≤ 50 and 1.0 elsewhere. Gallbladder, brain/CNS, and thyroid cancers only consider a risk model based on ERR to calculate LAR, and breast cancer utilizes a risk model based on only EAR. A standard form of a risk model for leukemia takes the model as which is linear-quadratic in dose and age at exposure, and time since exposure (t) is considered as the effect modification. For chronic exposure, the risk model is linear with dose, i.e., θ = 0. A generalized nonlinear model, in particular a Poisson nonlinear model, is required to directly estimate the parameters [6, 26, 28, 29].
It is possible that LAR can be obtained from the multiplicative or additive model controlling weights. The multiplicative model assumes that radiation induced excess cancer risk proportionally increases with baseline risk as where λ0(a, s) is the excess risk for non-exposed group. On the other hand, the additive model assumes that radiation induced excess cancer risk increases independently of baseline risk as Both LARERR and LAREAR, are LAR components based on ERR and EAR models, respectively, defined as where m(a) is population cancer incidence rate, and S(a, s)/S(e, s) is the probability of surviving to age a, conditional on survival to age e, given sex s. Thus, the LAR model in the LARisk package is computed as and where w is a population transfer weight.
Uncertainty assessment
Uncertainties including statistical uncertainties must be considered when calculating LAR, since we use probability distribution functions based on radiation epidemiology and Monte Carlo (MC) simulation. The uncertainties are as follows.
risk model coefficients,
dose and dose-rate effectiveness factor,
minimum latency,
risk population transfer between populations (e.g. Japan to US),
radiation doses, and
baseline incidence rate.
Uncertainties 1–4 including statistical uncertainties have the most critical impact to assess lifetime cancer risk attributable to radiation, whereas uncertainties 5 and 6 cause LAR variability regardless of the model. The baseline incidence rate in particular (uncertainty 6) differs with population and year, generating different incidence rate distributions by age.
To calculate radiation induced future cancer risk, we substitute the formula (2) into (5) for solid cancer as The regression coefficients, βERR,s, βEAR,s, αERR,1, αEAR,1, αERR,2, and αEAR,1, were determined from radiation epidemiologic research with atomic bomb (A-bomb) survivors at RERF, which has conducted health related research among A-bomb survivors in Hiroshima and Nagasaki, Japan, for more than 70 years. Tables 1 and 2 show the regression coefficients for solid cancers and leukemia, respectively. For leukemia, we replace the formula (2) by (4) to derive a similar expression to the formula (6).
The uncertainties of risk model coefficients for solid cancers focus on sex-specific main effect parameters, whereas other parameters are specific constants derived from A-bomb survivor epidemiology studies. Table 1 shows uncertainties for ERR and EAR model parameters. Dose coefficient, β, for stomach, colon, liver, lung, breast, ovary, bladder, and thyroid cancer is assumed to follow log-normal distribution, hence regression coefficient distributions in LAR models for prostate and uterus cancer are also considered to be normally distributed. Uncertainty in β is described by cumulative distribution functions (CDFs) for oral, oesophagus, gallbladder, pancreas, rectum, kidney, and brain/CNS cancers. α2 of breast cancer has two values. See the formula 3. In contrast to solid cancer, uncertainty for leukemia is associated with five parameters. Since the covariance matrix for regression coefficients in the leukemia LAR model should be considered, we calculate an approximate estimator for lifetime risk variance using the delta method following the BEIR VII committee [6].
The dose and dose-rate effectiveness factor (DDREF) combines two concepts of a low-dose effectiveness (LDEF) and a dose-rate effectiveness factor (DREF) [6, 30, 31]. LDEF means a ratio of the slope of a linear-quadratic dose-response function for acute exposure and the slope of the linear term of the same model, whereas DREF is computed when an acute exposure is divided and the number of fractions is large enough. ICRP suggested to utilize a DDREF of 2 to reduce solid cancer rates obtained from moderate-to-high acute dose studies for low-dose or low-dose rate exposures [31, 32]. The DDREF distribution is considered as a lognormal with geometric mean= 1.5 and standard deviation= 1.35 [23]. DDREF is always applied for chronic exposure without dose-dependence. In acute exposures, DDREF is applied by comparing dose limit, DL, and exposure dose. If dose < DL (generated 30 − 200 mGy using log-uniform distribution), DDREF is considered as used. DDREF is not applied for leukemia, because the uncertainty associated with low-dose risks is modeled using a linear term.
Latency uncertainty for radiation-induced cancers should be considered, because each person has cancer different development time after radiation exposure. Latency distribution is commonly represented by an S-shaped curve with the parameter in the center of the S-shaped function generated from the triangular distribution for solid cancer, thyroid and leukemia as T (5, 7.5, 10), T (3, 5, 7) and T (2, 2.25, 2.5), respectively. An uncertain latency adjustment is between 4 and 11 years after radiation exposure for solid cancer, 2.5 and 7.6 years for thyroid cancer, and 0.4 and 4.1 years for leukemia [33].
We must also consider effect of transferring the risk from Japanese population to population of interest. Because parameters of risk model coefficients are commonly obtained using the Japanese A-bomb survivors cohort, the effect is explained by the weighted mean of LAR values based on ERR and EAR model estimates. Initially we use the RadRAT weight to transfer the risk from Japanese to US populations. Thus, the weight w = 0.7 is assigned to the ERR model and the remaining (1 − w = 0.3) weight to the EAR model for most cancers. Lung cancer weight is reversed, and only the EAR model for breast is used. For thyroid, gallbladder, and brain/CNS, only ERR is used for calculation of LAR.
Radiation dose uncertainty can be applied to a fixed value or various probability distributions, including lognormal, normal, triangular, log-triangular, uniform, and log-uniform.
The baseline incidence rate depends on population information.
Parametric bootstrap
Computation of parametric bootstrap confidence intervals for risk estimates
Since uncertainties’ distributions for estimation of LAR are already known, we apply the parametric bootstrap to compute the confidence intervals for all risk estimates. The parametric bootstrap needs distribution assumption for generation and the other procedure is identical to the general nonparametric bootstrap. The parametric bootstrap is more powerful, in that the method provides small variances compared with the nonparametric bootstrap if the distribution is assumed to be right and the sample sizes are relatively small. The parametric bootstrap algorithm is provided in Algorithm 1.
Illustration of LARisk package
Main arguments
The LARisk package has 3 main functions for estimating lifetime attributable risks such as LAR, LAR_batch and LAR_group. The primary argument for the LAR function is as follows.
The argument ‘data’ should include some prerequisite information such as sex, and birth year(s), exposure year, distributions of exposure dose, radiation dose, cancer sites, and exposure rate, shown as follows.
For ‘site’, we provide the irradiated organ or cancer site. In ‘dosedist’, ‘dose1’, ‘dose2’, and ‘dose3’ arguments, we insert the distribution of the exposure dose such as ‘fixedvalue’, ‘lognormal’, ‘normal’, ‘triangular’, ‘logtriangular’, ‘uniform’ or ‘loguniform’ with appropriate parameters for each distribution. For instance, if the exposure dose has a normal distribution with mean of 2.3 and standard deviation of 0.8, we set ‘dose1=2.3’, ‘dose2=0.8’, and ‘dose3=NA’. If the dose has fixed value of 3.2, ‘dose1=3.2’, ‘dose2=NA’ and ‘dose3=NA’ should be employed. If new population information should be considered, lifetime and cancer incidence rate tables are employed in a ‘basedata’ argument as follows.
LARisk has default for these tables, which were made in 2010 and 2018, South Korea such as ‘life2010’, ‘incid2010’, ‘life2018’ and ‘incid2018’. Lifetime and cancer incidence rate tables should also have the specified format as follows.
Also, the cancer incidence rate table consists of ‘Site’, ‘Age’, ‘Rate_m’, and ‘Rate_f’ where ‘Rate_m’ and ‘Rate_f’ are incidence rates of each cancer site according to male and female, respectively.
The argument ‘weight’ is employed to compute LAR values through the weighted average of LAR values based on ERR and EAR models. For example, if the ‘weight’ for LAR based on between EAR and ERR models for stomach cancer is set to 0.5, run the below code.
LARisk sets the default weights to 0.7 for most cancers, 0.3 for lung cancer, 0.0 for breast and1.0 for thyroid for LAR based on EER (see Table 3).
Dose and dose-rate effectiveness factor (DDREF) is the logical option to select whether or not to consider ‘DDREF’ for computing LAR values. The value ‘DDREF’ is to modify the effect of exposure to low-dose. The value of ‘DDREF’ is considered differently as per exposure rate. However, if the site is leukemia, DDREF dose not apply even if DDREF = TRUE.
Estimating LAR values using LAR, LAR_batch and LAR_group
To describe how to use functions installed in the LARisk package, we use the toy data ‘nuclear’, also installed in LARisk, assuming scenario of the data in which everyone is exposed to radiation at the same time in 2011. The data includes 20 people for 10 males and 10 females, and the radiation exposed age has ranges from 3 to 81 years with acute exposed rate and fixed dose distribution.
LAR, LAR_batch, and LAR_group functions provide 3 main estimated values such as lifetime risk, future risk and lifetime baseline risk. Among them, ‘LAR’ and ‘F_LAR’ in Table 4 have mean values and confidence limits (lower and upper) for each cancer site, solid cancer and total (solid cancer+leukemia), respectively. See Table 4 for more details.
The LAR function prints the total LAR, future LAR, baseline future risk, and total future risk. If you want to obtain more detailed result, you can use the ‘summary’ function. The ‘summary’ function provides the person’s gender and year of birth, and risks by cancer type, confidence level, and current year. In the result, the ‘LAR’ tab includes site-specific LAR, LBR, and LFR, whereas the ‘Future LAR’ tab contains site-specific future LAR, BFR, and TFR.
If you compute LAR values for many individuals, you can use the LAR_batch function. Unlike LAR, LAR_batch calculates each person’ risk after reading multiple people’s data at once. For example, suppose that we want to calculate the LAR values in the toy data ‘nuclear’ with person “ID”. The sums of LAR values of all cancers for 3 observations are computed as follows. Here the argument ‘max.id’ means the number of individuals that should be printed.
Similarly, using the ‘summary’ function, you can get more detailed results for each cancer site as follows.
The LAR_group function averages estimated LAR values by group, which provides grouped LAR, grouped future LAR, and grouped baseline risk values for each group. The average LAR values in the toy data ‘nuclear’ can be computed by ‘distance’ as follows.
Saving outputs
LARisk includes a function write_LAR which writes outputs from LAR, LAR_batch, and LAR_group into a CSV file.
The format of the saved file is illustrated in Table 5.
An example for chronic exposure rate with a toy data ‘organ’
In this section, we consider a toy data ‘organ’ which consists of the 20 radiation exposed people over various periods with 14 males and 6 females having the job information ‘occup’. The data is shown as follows.
Suppose that we want to calculate LAR values with the current year=2021 using the Korean population in 2018. First, we compute LAR values with ‘ID01’ as follows.
The estimated total LAR values of the person ‘ID01’ is 1.6981 with the 90% confidence interval (1.1149, 2.5132) and the future risk is 1.6759 with the 90% confidence interval (1.1132, 2.4744). The person ‘ID01’ is a man born in 1985. ‘ID01’ was exposed to thyroid, oesophagus, ‘rectum’, and kidney except for leukemia. Thus, The LAR values of leukemia are all 0.
We also compute LAR values by group - male and female using LAR_group as follows.
The above results illustrate that the estimated average LAR values (90% confidence intervals) for female and male are 11.1856(9.5265, 13.5145) and 27.1674(23.8700, 28.7939), respectively. LAR_group can consider two group variables such as gender and occupation in this toy ‘organ’ data as follows.
Concluding remarks
We proposed the LARisk package built in R to calculate projected and/or lifetime radiation related cancer risks after exposure. We described and considered the various uncertainties in the lifetime risk calculations to construct this package. Three main flexible and useful functions were developed, and many arguments/options are available to promptly update new knowledge from radiation epidemiology and related fields. Of particular interest is the ability to input population information, replacing cancer incidence rates and survival functions for the population of interest and observed year to be used for lifetime risk calculation. This greatly simplifies to employ new baseline incidence rate and population survival rate data for a given year. Detailed description of the arguments/options and related data formats, etc. are provided in the LARisk vignette (https://cran.r-project.org/web/packages/LARisk/vignettes/LARisk-vignette.html).
LARisk is an effective and simple method to calculate excess lifetime risk from radiation exposure, and can be applied to any population of interest.
Data Availability
Toy data sets in LARisk R-package