ABSTRACT
Background Examining inflammation-related DNA methylation alterations in blood could help elucidate the role of inflammation in lung cancer etiology and aid discovery of factors that are key to lung cancer development and progression. In a nested case-control study, we estimated the neutrophil-to-lymphocyte ratio using a validated index, methylation-derived NLR (mdNLR), and quantified DNA methylation levels at loci previously linked with circulating concentrations of C-reactive protein (CRP). We examined associations between these measures and lung cancer risk, and among the cases, lung cancer survival, using pre-diagnostic blood samples of cases (median of 14 years before diagnosis) and controls in the CLUE I/II cohorts. Our analyses controlled for self-reported smoking and methylation-predicted cumulative smoking in order to better focus our examinations on the DNA methylation marks that are informative of the immune response profile.
Results Using conditional logistic regression and further adjusting for BMI, batch effects, and a smoking-based methylation score, we observed a 47% increased risk of non-small cell lung cancer (NSCLC) for one standard deviation increase in mdNLR (n = 150 pairs; OR: 1.47 [1.08, 2.02]) and found the estimated CRP Scores to be inversely associated with risk of NSCLC risk after additionally adjusting for methylation-predicted pack-years (n = 150 pairs; Score 1 OR: 0.57 [0.40, 0.81]; Score 2 OR: 0.62 [0.45, 0.84]; Score 3 OR: 0.65 [0.44, 0.95]). Using Cox proportional-hazards models and adjusting age, sex, smoking status, methylation-predicted pack-years, BMI, batch effect, and stage, we observed a 27% increased risk of dying from lung cancer for one standard deviation increase in mdNLR (n = 145 deaths in 205 cases; HR: 1.27 [1.08, 1.50]). A 50% increased risk of dying from lung cancer for one standard deviation increase in mdNLR was observed for NSCLC cases (n = 103 deaths in 149 cases; HR: 1.50 [1.19, 1.89]).
Conclusions A better understanding of inflammation-associated methylation-based biomarkers in lung cancer development could provide insight into critical pathways that may help identify new markers of early disease and survival.
Background
Lung cancer is the leading cause of cancer deaths in the US, projected to account for 21.7% of all cancer-deaths in 2021 (1). A large percentage of lung cancer patients are diagnosed at an advanced stage (2) and five-year relative survival rates for those patients are between 3-6% (3). Thus, early detection remains a key strategy to improve survival. However, the currently recommended strategy for lung cancer screening – low-dose computed tomography (LDCT) for persons 50 to 80 years old with a least a 20 pack-year smoking history and currently smoke or have quit within the past 15 years – is expensive and has a high false positive rate (4). Modifying current lung cancer screening strategy by performing risk stratification could help prioritize LDCT screening and optimize secondary prevention. We propose that immune system markers could be incorporated into such risk stratification tools to help identify persons at higher risk of lung cancer to target for screening.
While smoking is the most important risk factor for lung cancer in the population, there is growing evidence that the immune system, in response to or independent of smoking, plays an important role in lung cancer development, acting potentially through the genesis of chronic inflammation (5). Furthermore, it is plausible that inflammatory profiles prior to lung cancer diagnosis are associated with lung cancer-specific survival. Markers of systemic inflammation, including elevated levels of C-reactive protein (CRP) and the peripheral blood neutrophil to lymphocyte ratio (NLR), also have been identified as robust marker of cancer associated inflammation (6, 7). Elevated CRP levels (6), elevated serum levels of pro-inflammatory cytokines (8-10), increased neutrophil counts and decreased lymphocyte counts(11, 12),and polymorphisms in inflammation-related genes (13-16) have been associated with increased lung cancer risk. These inflammatory measures have also been associated with poor survival of lung cancer patients in several retrospective and a few prospective studies (17-19). In addition, both experimental and epidemiologic studies support a role for chronic inflammation as a hallmark of cancer development and progression (6, 20-23).
Understanding the role of inflammation in lung cancer etiology could aid discovery of factors that stimulate, suppress, or modulate cancer risk and cancer survival. We posit that such understanding could be gained by examining DNA methylation alterations in blood that are associated with the systemic immune response. In addition, measuring methylation-derived inflammatory responses using pre-diagnostic samples provides the opportunity to capture informative individual systemic inflammatory profiles years prior to diagnosis, potentially shedding light on risk factors key to lung cancer development and progression, e.g., underlying genetics, exposure to environmental risk factors, and behavior risk factors.
In the current study, we predicted peripheral blood leukocyte composition and a neutrophil to lymphocyte index using validated DNA methylation markers (mdNLR) and quantified DNA methylation levels at loci previously linked with circulating concentrations of CRP, and evaluated their associations with lung cancer risk and lung cancer-specific survival. To address this question, we used pre-diagnostic blood samples of cases and controls obtained from the CLUE I/II cohorts. Our analyses controlled for self-reported smoking and methylation-predicted cumulative smoking in order to better focus our examinations on the DNA methylation marks that are informative of the immune response profile (24).
Results
Population characteristics
Characteristics of the 208 lung cancer cases and their 208 matched controls included in this analysis are presented in Table 1. Over 99% of the majority of participants were white. The median time between blood draw and lung cancer diagnosis was 14 years. The median age at blood draw in 1989 was 59 and 57 years in cases and controls, respectively. Overall, 55% of cases and controls were women and 11% were never smokers (Table 1).
Methylation-derived mdNLR index, leukocyte proportions, and lung cancer risk
We observed a 47% increased risk of non-small cell lung cancer (NSCLC) for one standard deviation increase in mdNLR (n = 150 pairs; OR: 1.47 [1.08, 2.02]). However, higher mdNLR values were not statistically associated with overall risk of lung cancer in our study. This association was comparable for NSCLC cases diagnosed within 10 years and beyond 10 years after blood draw. mdNLR was not statistically significantly associated with risk of total lung cancer or small cell lung cancer (SCLC). In addition, immune cell ratios for CD4/CD8, NLR, B-cell/lymphocyte, T-cell/lymphocyte, and monocyte/lymphocyte did not exhibit statistically significant associations with risk of lung cancer, either overall or risk for lung cancer specific subtypes (Table 2).
Methylation-derived CRP scores and lung cancer risk
CRP Score 1 was built using 54 CpG sites that were previously associated with inflammatory markers, while CRP Score 2 and 3 were each built with a subset of these 54 CpGs that were putative cell-specific or cell type invariant respectively. Using data from a previously published pancreatic cancer dataset (25), all three scores were moderately correlated with log CRP and log IL-6 levels (Table 3). In this nested case-control study, CRP Scores 1, 2, and 3 were not associated with lung cancer risk when taking into account the matching factors and adjusting for BMI and four surrogate variables for batch effects (n = 208 pairs; Score 1 OR: 0.96 [0.77, 1.21]; Score 2 OR: 0.89 [0.71, 1.11]; Score 3 OR: 1.11 [0.89, 1.40]).
However, when additionally adjusting for methylation-predicted pack-years, inverse associations with total lung cancer risk were observed for Score 1 (OR: 0.76 [0.59, 0.99]) and Score 2 (OR: 0.77 [0.61, 0.98]). We also observed a 33% decreased risk of lung cancer for one standard deviation increase in CRP Score 1 (OR: 0.67 [0.47, 0.97]) among those with time to diagnosis over 10 years. All three CRP Scores were inversely associated with risk of NSCLC risk after additionally adjusting for methylation-predicted pack-years (n = 150 pairs; Score 1 OR: 0.57 [0.40, 0.81]; Score 2 OR: 0.62 [0.45, 0.84]; Score 3 OR: 0.65 [0.44, 0.95]). In addition, we found statistically significant inverse association between CRP Score 1 and risk of NSCLC among cases diagnosed within 10 years and beyond 10 years, and between Inflammation Score 2 for cases diagnosed within 10 years of blood draw (Table 2).
Survival analysis
We examined whether the immune cell ratios and CRP Scores were associated with risk of dying of lung cancer among lung cancer cases (Table 4, Figure 1). We observed a 27% increased risk of dying from lung cancer for one standard deviation increase in mdNLR (n = 205 cases deleted 3 cases with person-year = 0 or > 25 years; HR: 1.27 [1.08, 1.50]). A 50% increased risk of dying for one standard deviation of mdNLR was observed for NSCLC cases (n = 149 cases; HR: 1.50 [1.19, 1.89]). Among the NSCLC cases whose mdNLR was from <=10 years before their diagnosis, we found a 78% increased risk of dying for one standard deviation increase in mdNLR (HR: 1.78 [1.17, 2.73]). In comparison, the risk of dying for one standard deviation increase in mdNLR was slightly attenuated to 42% among the NSCLC cases whose mdNLR was from >10 years before their diagnosis (HR: 1.42 [1.03, 1.95]). Immune cell ratios for CD4/CD8, B-cell/lymphocyte, T-cell/lymphocyte, and monocyte/lymphocyte were not associated with lung-cancer specific death, with the exception of a 40% increased risk for one standard deviation increase in B-cell/lymphocyte ratio among the NSCLC cases whose methylation-predicted ratio was from >10 years before diagnosis (HR: 1.40 [1.01, 1.92]). Furthermore, the three CRP Scores were not associated with lung cancer-specific death, except for a 47% decreased risk of dying for one standard deviation increase in CRP Score 1 among the NSCLC cases with BMI ≥ 25 kg/m2 (HR: 0.53 [0.28, 1.99]).
Discussion
Our study prospectively assessed predicted immune profiles using DNA methylation markers and examined associations between previously identified DNA methylation markers of inflammation and lung cancer risk and survival. Using pre-diagnostic blood samples of lung cancer cases and controls who participated in the CLUE I/II cohorts [23], pre-diagnosis mdNLR was associated with increased risk of NSCLC, and among cases, with total lung cancer and NSCLC lung cancer specific death. In addition, we built a series of methylation-derived inflammation scores to capture individual systemic inflammatory profiles years before lung cancer diagnosis; these scores were inversely associated with risk of lung cancer, especially for NSCLC after adjusting for methylation-predicted packyears smoked.
Studies on NLR and lung cancer risk and survival typically measure pre-treatment NLR at diagnosis or up to 30 days prior to treatment (26-28). Unlike prior studies, we were able to assess individual systemic inflammation profiles many years prior to diagnosis by using methylation markers of inflammation. Our study is not directly comparable to prior studies since we measured mdNLR using blood samples from subjects with a median of 14 years prior to lung cancer diagnosis. To our knowledge, only one other cohort, the multicenter β-Carotene and Retinol Efficacy Trial (CARET), examined pre-diagnosis mdNLR and lung cancer risk and survival using blood drawn years prior to diagnosis (median 4.7 years) (29, 30). In this study of heavy smokers from CARET, researchers reported a 21% increased risk of lung cancer per one standard deviation increase in mdNLR (OR: 1.21 [1.01, 1.45]), a 30% increased risk of NSCLC for one standard deviation increase in mdNLR (OR: 1.30 [1.03, 1.63], and no association between higher pre-diagnosis mdNLR and risk of developing SCLC (OR: 1.06 [0.77, 1.47]) (29). In contrast, we found no statistically significant association between mdNLR and overall risk of lung cancer in CLUE, but did observe a 47% increased risk of NSCLC for one standard deviation increase in mdNLR (n = 150 pairs; OR: 1.47 [1.08, 2.02]).
CARET researchers recently reported that pre-diagnosis mdNLR is positively associated with increased mortality for SCLC cases, but not for other case types (30). In comparison, we observed a positive association between pre-diagnosis mdNLR and lung cancer-specific and NSCLC-specific mortality. In the case of SCLC, the number of cases was too limited for us to estimate stable associations (N = 29). Taken together, the CLUE and CARET results suggest that a systemic inflammatory profile marked by elevated NLR could indicate a lesser ability to mount a robust immune response to a developing lung cancer and/or a more favorable environment for cancer progression. Differences in findings between the two studies could stem from differences in study populations. The CARET cohort is exclusively heavy smokers, including a subgroup exposed to asbestos. In comparison, our analysis in the CLUE I/II cohorts included never, ever, and current smokers. Furthermore, our study population had a lower mdNLR in the lung cancer cases (mean 1.48 and SD 0.82) than in CARET (mdNLR mean 2.18 and SD 1.46).
We also investigated three CRP Scores that we built from 54 CpG sites that had been strongly associated with CRP in previous studies. We found these CRP Scores to be moderately correlated with log-CRP and log-IL6 in the controls of a previously published pancreatic cancer dataset (25). CRP is a systemic marker of chronic inflammation and has been reported as a risk factor for cancer development (31). Previous studies of pre-diagnostic circulating CRP concentration and lung cancer risk (7 cohorts (8, 9, 17, 32-34) and 3 nested case-control studies (6, 10, 35)) have consistently found a moderate positive association between pre-diagnostic CRP concentrations and lung cancer risk. In our study, CRP Scores were not associated with lung risk when taking into account the matching factors, BMI, and batch effects. However, we observed an inverse association when additionally adjusting for methylation-predicted pack-year. Our results suggest that when strict control of smoking is applied, our CRP Score is likely capturing the unique individual immune response that is not driven by smoking. Furthermore, these results provide preliminary evidence supporting the hypothesis that systemic inflammation not driven by smoking could have a protective effect on individuals. While smoking is by far the most important risk factor for lung cancer, our DNA methylation-based CRP Scores provide the opportunity to examine inflammatory measures not related to smoking that could play a role in modulating cancer risk years prior to diagnosis.
Like other observational studies, our study included a limited number of NSCLC and SCLC cases. The relatively small sample size of SCLC cases (N = 29) impacted our ability to observe associations for this subtype (SCLC comprises about 15% of lung cancer cases in the U.S.). Our study is also limited by a lack of replication dataset and reduced generalizability (study population is mainly Caucasian and with very few cases of never smokers). The CRP Scores we built should be investigated in other populations to ensure that what we observed did not arise due to chance.
Conclusions
Our study suggests that elevated pre-diagnosis mdNLR and a lower non-smoking-related systemic inflammatory profile before diagnosis are associated with higher cancer risk and poorer lung cancer-specific survival. These relationships were especially evident for NSCLC. As the most common subtype of lung cancer, most NSCLC cases are diagnosed with locally advanced or metastatic disease. Our prospective results support future evaluation of whether DNA methylation-based inflammatory measures could enhance lung cancer risk stratification to improve targeted lung cancer screening.
Methods
Study Population
This nested case-control study selected cases and controls from individuals who participated and provided blood in both CLUE I and CLUE II (24). The CLUE I cohort was developed to identify serologic precursors of cancer and was conducted in Washington County, Maryland, in the fall of 1974. A blood sample was collected from 25,620 volunteers at the time of participation (36, 37). The CLUE II cohort was conducted from May through October 1989. During this time, all participants donated a blood sample which was collected in tubes containing heparin from 32,894 individuals and kept chilled until centrifuged, aliquotted into plasma, erythrocytes, and buffy coat and frozen at 70° C (38). In CLUE II, the baseline for this study, health information was collected at the time of blood draw, including attained education, cigarette smoking status, cigarette smoking dose, cigar/pipe smoking status, and self-reported weight and height.
Incident lung cancer cases were ascertained from linkage to the Washington Co. cancer registry (before 1992 to the present) and the Maryland Cancer Registry (since 1992 when it began to the present). We ascertained 241 incident lung cancer cases who participated in CLUE I and were diagnosed after the day of blood draw in CLUE II through January 2018. Cases were characterized with respect to histology. We used incidence density sampling to select one control matched to each case on age, sex, and smoking intensity. The Institutional Review Board at the Johns Hopkins Bloomberg School of Public Health and the Tufts University Health Sciences Campus Institutional Review Board approved this study.
DNA methylation measurements
Extracted DNA was bisulfite-treated using the EZ DNA Methylation Kit (Zymo) and DNA methylation was measured with the 850K Illumina Infinium MethylationEPIC BeadChip Arrays (Illumina, Inc, CA, USA). All samples and all array experiments were performed blinded to case control status. Details on DNA methylation measurements, data preprocessing processing and quality control assessment/screening are provided in the Supplementary Methods. The 850K methylation microarray has been validated from a biological and technical standpoint. Reproducibility of results from 850K Illumina array has been previously shown to be very high (r=0.997) (39). DNA volume and quality was sufficient for 208 of the cases and 222 controls for a total of 208 matched pairs.
Estimation of peripheral blood leukocyte composition
Peripheral blood leukocyte subtypes proportions, including CD4+ T cells (CD4T), CD8+ T cells (CD8T), B cells (Bcell), neutrophils (Neu), natural killer cells (NK), and monocytes (Mono) were estimated using the “estimateCellCounts2” function in the FlowSorted.Blood.EPIC Bioconductor package (40), which is based on previously published reference-based cell mixture deconvolution algorithm with reference library selection conducted using the IDOL methodology (41).
Methylation-Derived Neutrophil Lymphocyte Ratio (mdNLR)
The peripheral blood neutrophil–to-lymphocyte ratio (NLR) is a cytological marker of both inflammation and poor outcomes in cancer patients (42-46). We used a DNA methylation-derived NLR (mdNLR) index to predict the common clinical NLR parameter using a previously described approach (7). This index is based on normal isolated leukocyte reference DNA methylation libraries and established reference-based cell-mixture deconvolution algorithms (7, 47).
Inflammation-associated CpG score
We selected 54 CpG sites that have been strongly associated with inflammatory measures to build a CRP Score (48, 49). Ligthart and colleagues identified 58 CpGs (48) that were associated with serum C-reactive protein (CRP) levels (listed in their Table 2) using 450K DNA methylation data. 45 of these 58 CpG sites were validated to have same direction of protein-methylation associations by Myte et al (49). We used 54 of the 58 CpG sites reported by Ligthart et al. in this study since the remain 4 CpG sites were not contained on the 850K array we used to measure methylation. To compute the CRP Score, we multiplied the beta value at each selected CpG sites with the effect size estimates reported by Ligthart et al. These estimated beta coefficients represented the change in DNA methylation per one unit increase in log CRP. In the CRP Score formula, we weighted the beta coefficients estimated by Ligthart et al. with their corresponding standard errors. We named this score CRP Score 1. Since most of the estimated beta coefficients are negative, CRP Score 1 ranged between -0.059 and -0.026. A score closer to zero indicated higher CRP levels. Based on CRP Score 1, we computed two additional CRP Scores, one cell (leukocyte)-type invariant (CRP Score 2) and one cell-specific (CRP Score 3). Among the 54 inflammation (CRP)-associated CpGs we identified putative cell-type invariant and cell-specific CpGs by conducting ANOVA analysis using the data set described in Salas & Koestler et al. (40) and publicly available on the Gene Expression Ombinbus (GSE110555). The data set used for this ANOVA analysis consisted of EPIC methylation data profiled in purified leukocyte cell population isolated from different healthy adults. Specifically, methylation signatures were available for CD4T, CD8T, NK, Bcell, Monocytes, and Neutrophils. One-way ANOVA models were fit independently to each of the 54 CRP-associated CpGs treating methylation as the dependent variable and cell type as the independent variable. We tested the null hypothesis that the mean methylation beta-value is the same across the cell types. The F-statistic, corresponding p-value, and maximum absolute pairwise difference in the mean methylation beta value across cell types were calculated for each of the 54 CpGs. We then selected subgroups of CpG sites that had the top 10 smallest or top 10 largest F-statistic value to build two additional CRP Scores. CRP Score 2 consists of putative cell-specific CpGs with high F-statistics, e.g., those exhibiting a difference in mean methylation beta-values between at least two of the six cell types. CRP Score 3 is made of cell type invariant CpGs with low F-statistics, e.g., CpGs for which there did not appear to be a substantial difference in mean methylation beta-values across the normal six leukocyte subtypes. Score 2 ranged between -0.0002 and 0.0046 while Score 3 ranged between -0.025 and -0.016. In the subsequent statistical analyses, we used a standardized version of CRP Score 1, 2, and 3 (mean = 0, sd = 1) in all regressions for easier interpretation of results.
Statistical analyses
All statistical analyses were performed in R (version 3.5.1). Immune cell ratios (e.g., CD4/CD8, neutrophil/lymphocyte, B cell/lymphocyte, T cell/lymphocyte) were calculated for each sample by taking the ratio of its predicted cell proportions described above. We used a pancreatic cancer dataset to estimate the Spearman’s rank correlation and Pearson correlation between estimated values of CRP Scores 1, 2, and 3 with the log CRP and log IL-6 levels (25).
To account for the 1:1 case/control matching structure in our data, a series of multivariable conditional multivariable logistic regression models were used to examine the association between DNA methylation-based inflammatory measures (CRP Scores 1-3 and continuous mdNLR) and lung cancer risk. Models were fit with age, sex, and smoking status (never, former, current) as matching factors, and were adjusted for potential confounding factors, including body mass index (BMI), batch effect, and previously described methylation-predicted pack-years smoked (50). We repeated these analyses by lung cancer histology (NSCLC, SCLC), length of time between blood draw and diagnosis (<=10, >10 years), and BMI (<25, ≥ 25 kg/m2). Among the lung cancer cases, we examined the association between these same pre-diagnostic DNA methylation-based inflammatory measures (CRP Scores 1-3 and continuous mdNLR) and risk of lung cancer specific death using a series of multivariable Cox proportional hazard regression adjusting for age, gender, smoking status, BMI, stage (three strata: stage 1 & 2, stage 3 & 4, and missing), cell proportion, batch effects, and methylation-predicted pack-years smoked.
Data Availability
The datasets generated during the current study are available from the corresponding author on reasonable request and will be deposited into dbGaP by publication.
Declarations
Ethics approval and consent to participate
The Institutional Review Board at the Johns Hopkins Bloomberg School of Public Health and the Tufts University Health Sciences Campus Institutional Review Board approved this study.
Consent for publication
NA
Competing interests
The authors report no conflicts of interest.
Funding
This work was supported by 2018 American Association for Cancer Research (AACR)-Johnson & Johnson Lung Cancer Innovation Science (18-90-52-MICH).
Note: The funders had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the manuscript for publication.
Authors’ contributions
DSM., KTK and EAP designed the study, obtained funding and acquisition of data. JL assisted with preparation of dataset. DSM supervised all research activities. MR and NZ conducted the statistical analyses. NZ drafted the manuscript. DSM., EAP, KTK, and DCK interpreted the data and provided critical revisions of the manuscript. All authors read and approved the final version of the manuscript.
Acknowledgements
Cancer incidence data were provided by the Maryland Cancer Registry, Center for Cancer Surveillance and Control, Maryland Department of Health, 201 W. Preston Street, Room 400, Baltimore, MD 21201. We acknowledge the State of Maryland, the Maryland Cigarette Restitution Fund, and the National Program of Cancer Registries of the Centers for Disease Control and Prevention for the funds that helped support the availability of the cancer registry data.