Abstract
Background The relative risk of smoking on lung cancer have been reported to be much higher in white population than that in East Asians. However, it’s unknown whether genetic background underlies this disparity between ethnic groups. To assess the role of ethnic differences in genetic factors associated with this phenomenon.
Methods We first constructed ethnic-specific polygenic risk scores (PRSs) to quantify individual genetic risk of lung cancer in Chinese and white populations. Then, we compared genetic risk and smoking as well as their interactions on lung cancer between two cohorts, including the China Kadoorie Biobank (CKB) and the UK Biobank (UKB). We also evaluated the absolute risk reduction over a 5-year period.
Results 19 SNPs and 23 SNPs were identified to construct the PRSs in Chinese and white populations, and smoking-related loci were only included in white populations. The PRSs were consistently associated with lung cancer risk respectively, but stronger associations were observed in smokers of the UKB (HR 1.26 versus 1.15, P=0.028). A significant interaction between genetic risk and smoking on lung cancer was observed in the UKB (RERI, 11.39 [95% CI, 7.01-17.94]), but not in the CKB. By comparing heavy smokers with nonsmokers, a greater absolute risk reduction was found in the UKB (10.95 versus 7.12 per 1000 person-years, P<0.001), especially for those at high genetic risk.
Conclusions In China, tobacco control alone is not enough to reduce the burden of lung cancer, and comprehensive policies should be made to lower its high incidence.
Background
Lung cancer continues to be the leading cause of cancer death worldwide. Although tobacco smoking is the main risk factor for lung cancer, the reported relative risks (RRs) and absolute risks for lung cancer morbidity associated with tobacco smoking are much lower in East Asians (RRs, 2.4-6.5) than in white population (RRs, 9.4-23.2) (1-3). Several theories have been proposed to explain this phenomenon, “the smoking paradox”, but few were based on solid data, especially with regard to ethnic differences in genetic factors (4). Tobacco control efforts implemented in high income countries since the 1960s have led to considerable reductions in lung cancer (5). However, the potential impact of tobacco control on the burden of lung cancer in Chinese population still needs further evaluation.
The development of lung cancer results from an intricate interplay between genetic and environmental factors, and the heritability of lung cancer has been estimated at 15%-18% (6, 7). In the past decade, genome-wide association studies (GWASs) of lung cancer have identified 51 risk loci, most of which were derived from populations of white and East Asian (8). Recent large-scale population studies have demonstrated that the combined effect of these genetic loci, polygenic risk score (PRS), can serve as an efficient tool to quantify individual inherent risk of lung cancer in populations of white and Chinese descents (9, 10). However, the two previously reported ethnic-specific PRSs for lung cancer are not directly comparable because of the differences in underlying genetic architectures, constructing strategy, and selecting susceptibility loci.
To comprehensively explore the genetic disparity underlying “the smoking paradox”, we generated Chinese-specific and White-specific PRSs for lung cancer by using the largest available GWAS datasets of lung cancer for populations of Chinese and white with unified standards and processes. Then, we used data from two nationwide prospective cohorts, the China Kadoorie Biobank (CKB) and the UK Biobank (UKB), to evaluate the effectiveness of the PRSs in predicting lung cancer risk and to dissect the complex relations between smoking and genetic risk in Chinese and white populations, respectively.
Methods
Study populations
The CKB is a population-based prospective cohort study in China. The study design for the CKB has been described previously (11). In brief, a total of 512,714 Chinese adults aged 30-79 years were recruited from ten geographically diverse regions across China between 2004 and 2008. All participants completed an interviewer-administered electronic questionnaire on smoking and other health-related information, underwent physical measurements, and provided blood samples at baseline. A total of 100,641 participants were selected for genotyping based on a clustered random selection method (12). For the present study, we included individuals with both genotypic and phenotypic data and excluded those with lung cancer diagnosed before baseline, leaving 100,615 eligible participants in the final analysis.
The UKB is also a population-based cohort study, with more than 500,000 participants aged 37-73 years who were recruited from 22 centers throughout the United Kingdom between 2006 and 2010. Details of the study have been described previously (13). Each participant provided information on smoking and other health-related information through extensive baseline questionnaires, interviews, physical measurements, and a blood sample collected for genotyping. Of the 502,527 available participants, we excluded those withdrawing from the UKB, of non-white decent, with lung cancer at baseline, with missing information on genotypes and smoking, or those failed in quality control of genotypes. Overall, 406,880 eligible participants were included in the present study.
Each participant in the CKB and the UKB completed a written informed consent form. The CKB has been approved by the Ethical Review Committee of the Oxford Tropical Research Ethics Committee, University of Oxford and the Chinese Center for Disease Control and Prevention. The UKB has been approved by the multicenter Research Ethics Committee, the National Information Governance Board for Health and Social Care in England and Wales, and the Community Health Index Advisory Group in Scotland.
Procedures
We used the largest available GWAS datasets of lung cancer in populations of Chinese (13,327 cases and 13,328 controls) (9) and of white descent (29,266 cases and 56,450 controls) (14) to evaluate the associations and corresponding effects of all the previously reported single-nucleotide polymorphisms (SNPs; e.g. 81 SNPs in 51 loci) associated with lung cancer risk. Meanwhile, a meta-analysis dataset of lung cancer with balanced sample sizes of white descent (13,793 cases and 14,027 controls from the INTEGRAL-ILCCO OncoArray Project) and Chinese descent (13,327 cases and 13,328 controls) was used to evaluate selective candidate variants to construct a trans-ancestry PRS for sensitivity analysis (9, 10). Details of these GWAS datasets have been reported elsewhere previously (9, 10, 14). The criteria to exclude the redundant variants and the process for genotyping and imputation used in the CKB and UKB are described in the Supplement.
The PRSs were created following an additive genetic model as previously described. In short, the dosage of each risk allele for each individual was summed after multiplication with its respective weight (e.g. the Ln of the odds ratio [OR]) derived from the datasets mentioned above. The processes of PRS construction were blinded to the endpoints in the CKB and the UKB.
Smoking measures were self-reported at initial assessment with an interviewer-administered questionnaire in the CKB or a touchscreen in the UKB (11, 13). Based on pack years smoked, participants were defined as nonsmokers (less than 100 cigarettes in lifetime), light smokers (pack-years<30), and heavy smokers (pack-years≥30). The assessment of covariates is provided in the online data supplement.
The primary outcome for analysis was the event of incident lung cancer, classified by the 10th Revision of International Classification of Diseases (ICD-10 codes C33-34). Participants in the CKB cohort were followed up through ongoing electronic linkage with the Chinese national health insurance claim database, established chronic disease registries, and local death registries semi-annually, supplemented by active confirmation of cancer diagnosis by trained staff. Complete follow-up for the CKB was available through December 31, 2016. For the UKB, incident lung cancer events were ascertained through linkage to national cancer registries in England, Wales and Scotland. The complete date of follow-up was March 31, 2016 for England and Wales, and October 31, 2015 for Scotland.
Statistical analyses
We assessed lung cancer risk in participants from the enrolment until the time of lung cancer diagnosis, death, or the end of follow-up, whichever occurred first. We assessed a potential nonlinear relationship between the PRS and lung cancer risk by use of restricted cubic spline analysis. Cox proportional hazard models were used to assess associations between genetic factors and smoking with lung cancer incidence and to estimate hazards ratios (HRs) and 95% confidence intervals (CIs). Schoenfeld residuals were used to test the proportional hazards assumption. The genetic risk was categorized into low (the bottom quintile), intermediate (quintiles 2-4) and high (the top quintile) based on distributions of PRSs, as described previously (12). Relative excess risk due to interaction (RERI) and the attributable proportion because of the interaction (AP) were calculated to measure the interaction on the additive scale (15). The mediation proportion by the mediator was calculated by comparing estimates from models with and without the hypothesized mediator (16). We then calculated cumulative risk as the incidence of lung cancer occurring in a given group during follow-up. We also calculated absolute risk reduction as the difference in lung cancer incidence between given groups over a 5-year period.
We performed several sensitivity analyses to examine the robustness of the results: (1) trans-ancestry PRSs with the same SNPs and effects were generated simultaneously to define genetic risk of the participants; (2) additional environmental exposures were included in the model, including passive smoking and ambient PM2.5 concentration; (3) genetic risk levels were reclassified by quartile or tertiles; (4) smoking status was reclassified as never, former, and current smokers; (5) participants who were diagnosed with lung cancer within the first year of follow-up were excluded; and (6) analysis were restricted to participants with complete covariates for comparison with the results of imputation. All P-values were two-sided, and P<0.05 was considered statistically significant. All statistical analyses were performed in R (version 3.5). Further details about statistical analyses are provided in the online data supplement.
Results
The study design is shown in Figure E1, and the baseline characteristics of participants are provided in Table 1. More smokers were observed in the UKB (45.16%) than in the CKB (34.38%). In the CKB, during a median follow-up of 10.42 years (IQR 9.34-11.30), 1,392 incident lung cancer cases were diagnosed; while there were 2,025 incident lung cancer cases in the UKB during a median follow-up of 7.17 years (IQR 6.48-7.75). In both the CKB and UKB, differences in the incidence of lung cancer were observed among nonsmokers, light smokers, and heavy smokers, while similar incidence was observed between male and female under the same smoking status (Table E1).
The associations of 81 reported susceptibility loci with lung cancer in GWAS datasets of Chinese population and white population are shown in Table E2. After systematic evaluation and filtering, 19 SNPs and 23 SNPs were retained for the calculation of Chinese-specific (PRS-19) and White-specific PRSs (PRS-23), respectively (Figure E2). As shown in Figure 1, the majority of the susceptibility loci were ethnic specific, and differences in the frequency and association effects were also observed in shared loci between two populations. Specifically, the loci of 8p21.2-CHRNA2 (rs11780471), 15q25.1-CHRNA5 (rs55781567), and 19q13.2-CYP2A6 (rs56113850) that were smoking-related (14), showed higher frequencies and greater effects in white population.
The PRS-19 and PRS-23 were consistently associated with lung cancer risk in a linear and dose-response relationship in the CKB and UKB, respectively (Figure E3). Compared with participants at low genetic risk, participants at intermediate and high genetic risk had significantly greater risks of lung cancer, with HRs of 1.25 (95% CI, 1.08-1.45) and 1.62 (95% CI, 1.37-1.92) in the CKB, and 1.44 (95% CI, 1.26-1.64) and 1.87 (95% CI, 1.62-2.17) in the UKB, respectively (Table 2). In nonsmokers, similar linear associations and effects were observed between two cohorts (Pheterogeneity=0.898); however, in smokers, greater associations were observed in the UKB than in the CKB (HR 1.26 [95% CI, 1.20-1.32] versus 1.15 [95% CI, 1.07-1.23] per SD of PRSs increase; I2=79.3%, Pheterogeneity=0.028) (Figure 2). In addition, the observed associations were attenuated, if the ethnic-specific PRSs were cross-used (Table E3). These results did not change significantly when additional environmental exposures were included in the model, or genetic risk was reclassified by quartile or tertiles of the PRSs, or the analysis were restricted to participants without missing covariates, or by excluding incident cases occurred during the first year of follow-up (Table E4-7).
We also observed “the smoking paradox” that the relative risks were 2.87 (95% CI, 2.40-3.44) in the CKB and 15.79 (95% CI, 13.77-18.10) in the UKB among heavy smokers compared with nonsmokers. The associations did not change after further adjustment for genetic risk (Table E8), and similar results were observed in current smokers compared with nonsmokers (Table E9). The PRS-19 was not associated with smoking amount (i.e., pack-year) in the CKB, but the PRS-23 was significantly associated with smoking amount in the UKB (P<0.001, Table E10), which mediated a proportion of 2.06% (95% CI, 1.41%-2.90%) for the association between PRS-23 and incident lung cancer (Figure E4). To rule out the influences of different PRS compositions, we also constructed a PRS based on 25 SNPs from a trans-ancestry GWAS meta-analysis (Figure E5), and observed a similar mediation effect only in the UKB but not in the CKB (Table E11).
We further evaluated the joint effect of genetic risk and smoking on lung cancer risk and found that the HRs of participants with a high genetic risk and heavy smoking were 4.95 (95% CI, 3.61-6.77) and 25.63 (95% CI, 18.58-35.36) in the CKB and the UKB, respectively, compared with those with a low genetic risk and never smoking (Figure 3). Furthermore, we observed significantly additive interactions between genetic risk and smoking on incident lung cancer in the UKB but not in the CKB (Table 3). Specifically, for heavy smokers with a high genetic risk, the RERI was 11.39 (95% CI, 7.01-17.94), accounting for 44% (95% CI, 32%-55%) of the risk in those participants who had both a high genetic risk and heavy smoking in the UKB. We repeated the analyses by using the trans-ancestry PRS, reclassifying genetic risk levels by quartile or tertiles of the PRS, or excluding incident lung cancer occurred during the first year of follow-up, and observed similar additive interactions in the UKB but not in the CKB (Figure E6 and Table E12-16).
A higher standardized 5-year absolute risk of lung cancer was observed in nonsmokers of the CKB than that in the UKB (3.07 versus 0.66 per 1000 person-years), whereas similar absolute risks were observed in heavy smokers between two cohorts (10.19 versus 11.60 per 1000 person-years) (Figure 4). As a result, a greater risk reduction (never smokers versus heavy smokers) was found in the UKB than that in the CKB (P<0.001), with an estimation of 10.95 (95% CI, 9.89-11.96) and 7.12 (95% CI, 5.87-8.37) per 1000 person-years, respectively. In participants of low genetic risk, the reductions were comparable between two cohorts (6.54 [95%CI, 4.69-8.15] versus 7.05 [95%CI, 4.38-9.43] per 1000 person-years in the UKB and CKB, respectively). However, in those of high genetic risk, the reductions were expanded in the UKB (14.57 [95%CI, 12.06-17.05] per 1000 person-years) compared with that in the CKB (8.01 [95%CI, 5.27-10.47] per 1000 person-years). Similar patterns were noted by reanalyzing with the trans-ancestry PRS (Figure E7) or reclassifying genetic risk levels by quartile or tertiles of the PRS (Table E17-18).
Discussion
In the present study, we integrated the largest GWASs of lung cancer and nationwide prospective cohorts in populations of Chinese and white descents to disaggregate the effects of genetic risk and smoking on lung cancer incidences. Our results indicated that even though the Chinese-specific and White-specific PRSs were consistently associated with risk of lung cancer respectively, ethnic differences were observed for the compositions, relative risks, mediation effects, and additive interactions with smoking of the two PRSs. Therefore, to the best of our knowledge, this study is the first to provide convincing evidences that ethnic differences in genetic background are involved in “the smoking paradox” between Chinese population and white population.
Studies have shown that PRS, as an indicator of genetic risk, can efficiently predict incidence of site-specific cancer and overall cancer (9, 12, 17, 18). For lung cancer, ethnic-specific PRSs have been shown to be effective in discriminating subpopulations at high risk of lung cancer and informing the optimal lung cancer screening strategy (9, 10). Although only 23 SNPs were included in the construction of white-specific PRS, a similar association was observed compared with that of previously used PRS with 114 SNPs in the UKB (10). In consistent with previous findings of ethnic heterogeneity in some lung cancer susceptibility loci (8), the present study provided a comprehensive panorama of genetic differences for lung cancer, and highlighted the role of smoking-related genetic loci in lung cancer susceptibility of white population. Furthermore, our findings also support the notion that an ethnic-specific PRS predicts individual risk more accurately (19).
Several possible explanations have been proposed for the “smoking paradox”, such as difference in epidemics of cigarettes exposure between developed and developing countries (3, 4) or in toxicity and filters changing over time in different countries (20). Our study further indicated that interactions between genetic risk and smoking may be another one of the important reasons for the “smoking paradox” in white population, in that a relative excess risk of up to 44% could be explained by the observed additive interaction. The interactions may be due to smoking addiction-related susceptibility loci (such as CHRNA5 in 15q25.1) have higher frequency and stronger effects in white population than in Chinese population. However, the interaction was unlikely to be simply mediated by the number of cigarettes smoked, because only 2.06% of the associations between PRS and lung cancer risk could be explained by smoking amount. These results suggested that high genetic risk was probably associated with multiple risk mechanisms for lung cancer in white population, including delayed smoking cessation, increased intensity of smoking exposure, and potential impact on treatment response (21). These findings reveal the genetic basis of the strong association between smoking and lung cancer in white population, and could help explain the huge reduced lung cancer burden in white population after tobacco control during the past decades (22).
Another possible explanation for the “smoking paradox” is a high incidence of lung cancer in nonsmokers of East Asian (23), which is also supported by our cohort study. For example, we found that the estimated age-standardized 5-year absolute risks in nonsmokers were more than 3 times higher in the CKB than that in the UKB across genetic risk groups. In addition, differences in genetic risk are probably not the reason for the observed high incidence of lung cancer in nonsmokers of Chinese, because we also observed a lower distribution of trans-ancestry PRS in the CKB than that in the UKB (Figure E8). These results indicated that environmental risk factors beyond smoking, especially those of high exposure levels in Chinese populations, need to be further explored in relation to this discordance. For example, recent studies have shown that exposure to high concentrations of PM2.5 in the ambient environment could increase lung cancer risk in both Chinese population and white population; however, the average exposure concentration was estimated to be 65 μg/m3 in China between 2000-2015, compared with 10 μg/m3 in the UK around 2010 (24-26). Therefore, our results indicate that further efforts are needed to clarify and control the causes for the high incidence of lung cancer in nonsmokers of Chinese population.
Here, we further showed that the absolute risks of lung cancer were reduced for nonsmokers compared with smokers across genetic risk groups in Chinese population and white population, consistently. Therefore, our findings support the notion that public efforts to promote smoking cessation will lead to an overall reduction of lung cancer risk across ethnic groups (27). However, we observed that the benefits of tobacco control would be greater in white population than that in Chinese population, especially for those at high genetic risk. This indicated that precision interventions for smoking cessation based on genetic risk are feasible for white population (28), but not for Chinese population. Taken together, our results showed that it was more complicated to control lung cancer epidemic in China, and comprehensive policies against smoking and nonsmoking risk factors should be made to lower its high incidence.
The present study has several strengths, including the large sample size from two well-ethnically defined lung cancer GWASs and two well established nationwide prospective cohorts in China and the UK; the standardized approaches to assess individual genetic risk of lung cancer simultaneously; and a series of sensitivity analyses to show the robustness of the findings. Nevertheless, we also acknowledge several limitations. First, the sample size of lung cancer GWAS is obviously larger in white population than that in Chinese, which may lead to the reported lung cancer susceptibility loci more relevant to white population. Second, information on smoking was mainly self-reported and only measured once; thus, misclassification was inevitable and behavioral changes during the follow-up may have an effect on risk estimates. Third, the calculation of pack-years assumed that cumulative number of cigarettes had the same health effects, which might not be true. Fourth, although personal characteristics and comorbidities were controlled in the present study, additional potential confounders, e.g. occupational exposure were not assessed in the cohorts, might result in potential residual confounding. Finally, both CKB and UKB were not designed to include a representative study population (11, 13); therefore, further investigations are warranted to evaluate to what degree these findings may be generalized to the general population.
In summary, our comprehensive analysis demonstrated that ethnic differences of genetic factors were involved in “the smoking paradox” observed between Chinese population and white population. Specifically, our results highlighted that white population were more susceptible to lung cancer caused by smoking and had a greater benefits of smoking cessation, especially in high genetic risk population. Moreover, nonsmokers of Chinese had consistently higher absolute risk of lung cancer than those of white population across genetic risk groups. These results collectively indicate that tobacco control alone is not enough to reduce the burden of lung cancer in China, and more comprehensive policies against smoking and nonsmoking risk factors should be made to lower the high incidence of lung cancer in China.
Data Availability
Researchers can apply to use the UK Biobank and the China Kadoorie Biobank resource and access the data used. No additional data are available.
Supplementary Information
The online version contains supplementary material.
Authors’ Contributions
HS, LL and ZH supervised the entire project and design the work. MZ, JL, YH, and HM contributed to the data analysis, data interpretation, and writing of the report. XW, MJ, ZM, CW, JD, NL, YG, RW, IM, RH, DC, CY, GJ, ZC, QW, and CA contributed to the discussion, data interpretation and revised the manuscript. All authors reviewed or revised the manuscript and approved the final draft for submission. The authors declare no conflicts of interest.
Funding
This work was supported by National Natural Science Foundation of China (81820108028, 91943301, 81922061, 81973123, and 81803306); Research Unit of Prospective Cohort of Cardiovascular Diseases and Cancers, Chinese Academy of Medical Sciences (2019RU038); National Science Foundation for Post-doctoral Scientists of China (Grant No.2018M640466).
Availability of data and materials
Researchers can apply to use the UK Biobank and the China Kadoorie Biobank resource and access the data used. No additional data are available.
Declarations
Ethics approval and consent to participate
The CKB has been approved by the Ethical Review Committee of the Oxford Tropical Research Ethics Committee, University of Oxford and the Chinese Center for Disease Control and Prevention. The UKB has been approved by the multicenter Research Ethics Committee, the National Information Governance Board for Health and Social Care in England and Wales, and the Community Health Index Advisory Group in Scotland.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Acknowledgements
This research was conducted using the UK Biobank (Application Number: 48700) and China Kadoorie Biobank. We thank the investigators and participants involved in the UK Biobank and China Kadoorie Biobank for their contributions to this study. We also thank to all principal investigators who participated in the INTEGRAL-ILCCO OncoArray Project.
Abbreviations
- AP
- the attributable proportion because of the interaction
- CI
- confidence interval
- CKB
- the China Kadoorie Biobank
- GWAS
- genome-wide association study
- HR
- hazards ratio
- ICD-10
- the 10th Revision of International Classification of Diseases
- OR
- odds ratio
- PRS
- polygenic risk score
- RERI
- relative excess risk due to interaction
- RR
- relative risk
- SNP
- single-nucleotide polymorphism
- UKB
- the UK Biobank