Abstract
Objectives To develop and validate the QCancer2 (10-year risk) lung model for estimation of future risk of lung cancer and to compare the model performance against other prediction models for lung cancer screening
Design open cohort study using linked electronic health records (EHRs) from the QResearch database (1 January 2005 – 31 March 2020)
Setting English primary care
Participants 12.99 million patients aged 25-84 years were in the derivation cohort to develop the models and 4.14 million patients were in the validation cohort. All patients were free of lung cancer at baseline.
Main outcome measure Incident lung cancer cases
Methods There were two stages in this study. First, Cox proportional hazards models were used in the derivation cohort to update the QCancer (10-year risk) lung model in men and women for a 10-year predictive horizon, including two new predictors (pneumonia and venous thromboembolism) and more recent data. Discrimination measures (Harrell’s C, D statistic, and ) and calibration plots were used to evaluate model performance in the validation cohort by sex. Secondly, seven prediction models for lung cancer screening (LLPv2, LLPv3, LCRAT, PLCOM2012, PLCOM2014, Pittsburgh, and Bach) were selected to compare the model performance with the QCancer2 (10-year risk) lung model in two subgroups: (1) smokers and non-smokers aged 40-84 years and (2) ever-smokers aged 55-74 years.
Results 73,380 incident lung cancer cases were identified in the derivation cohort and 22,838 in the validation cohort during follow-up. The updated models explained 65% of the variation in time to diagnosis of lung cancer in both sexes. Harrell’s C statistics were close to 0.9 (indicating excellent discrimination), and the D statistics were around 2.8. Compared with the original models, the discrimination measures in the updated models improved slightly in both sexes. Compared with other prediction models, the QCancer2 (10-year risk) lung model had the best model performance in discrimination, calibration, and net benefit across three predictive horizons (5, 6, and 10 years) in the two subgroups.
Conclusion Developed and validated using large-scale EHRs, the QCancer2 (10-year risk) lung model can estimate the risk of an individual patient aged 25-84 years for up to 10 years. It has the best model performance among other prediction models. It has potential utility for risk stratification of the English primary care population and selection of eligible people at high risk for the targeted lung health check programme or lung cancer screening.
What is already known on this topic
Using risk prediction models to stratify people at the population level and selecting those at the highest risks is an efficient and cost-effective strategy for screening programmes. It avoids waste of resources in screening patients at low risk.
An ideal prediction model should have excellent discrimination and calibration in the target population.
The Liverpool Lung Project (LLPv2) and the Prostate Lung Colorectal and Ovarian (PLCOM2012) models had only moderate discrimination and were not well-calibrated when externally validated using the Clinical Practice Research Datalink (CPRD) data for the English primary care population.
What this study adds
Developed and validated using robust statistical methodologies, the QCancer2 (10-year risk) lung model shows excellent discrimination and calibration in both sexes. It can estimate an individual adult patient’s risk for each year of follow-up, for up to 10 years.
The QCancer2 (10-year risk) lung model has the best model performance in discrimination and calibration when compared with the other eight models (QCancer (10-year risk), LLPv2, LLPv3, LCRAT, PLCOM2012, PLCOM2014, Pittsburgh, and Bach) in three predictive horizons (5/6/10 years) and two sub-populations (smokers and non-smokers aged 40-84 years and ever-smokers aged 55-74 years).
The QCancer2 (10-year risk) lung model can be applied to the English primary care population to select eligible patients for the Targeted Lung Health Check programme or lung cancer screening using low dose CT.
Introduction
Lung cancer is a research priority in the UK. It is the third most common cancer in incidence and the most common cause of cancer death in the UK1. Lung cancer survival in the UK is worse than in other European countries (the EuroCare studies)2 and international counterparts (the ICBP studies)3,4. Research evidence from randomised controlled trials (RCTs) has shown that using low dose computerised tomography (LDCT) for lung cancer screening reduces mortality5–7. The United States Preventive Services Task Force (USPSTF) recommended using LDCT for lung cancer screening in 20138, and relaxed the eligibility criteria for screening in 2021, by lowering the age threshold from 55 to 50 years, and smoking exposure from 30 to 20 pack-years9. However, lung cancer screening is still not a routine service in the UK currently. The UK National Screening Committee will meet in June 2022 to discuss this issue further.
NHS England launched a new service, the Targeted Lung Health Check (TLHC), for ever-smokers aged 55-74 years registered with a GP in autumn 201910. Although this programme is expanding in geographical regions in England, patients outside the participating Clinical Commissioning Groups (CCGs) cannot access this service, which could be a potential issue of health inequality. The TLHC programme uses the Liverpool Lung Project (LLP, version 2)11 and the Prostate, Lung, Colorectal and Ovarian (PLCOM2012)12 cancer screening models to calculate individual patients’ risk scores as part of the eligibility criteria. Patients with risk scores higher than the thresholds (≥2.5% in LLPv2 or ≥1.51% in PLCOM2012) are considered at high risk of developing lung cancer and eligible for the TLHC programme. However, O’Dowd et al13 externally validated these two models using English primary care datasets (Clinical Practice Research Datalink, CPRD) and concluded that the two models did not have satisfactory discrimination (C statistic ≤0.7) and calibration. This raises the question of whether it is suitable to use these two prediction tools to select eligible participants from the English primary care population for TLHC and lung cancer screening. Even though the LLPv2 and PLCOM2012 models have been validated, they were developed on highly selected study samples. The LLP model was developed based on patients in Northwest England, where the age-standardised incidence of lung cancer is much higher than other regions of England14. The PLCOM2012 model was developed for the US population. The demographic and clinical characteristics of these study samples may be different from the English primary care population. Therefore, these two models may not be directly applicable to the English primary care population to select eligible participants for lung cancer screening (external validity and limited generalisability). Our team has previously developed and validated the QCancer (10-year risk) algorithms for various cancer sites15 using electronic health records (EHRs) from the QResearch database. A web calculator is available for public use to calculate an individual’s risk of various cancer for up to 10 years, at https://www.qcancer.org/10yr/male/ (for men) and https://www.qcancer.org/10yr/female/ (for women).
In order to provide timely research evidence for the UK National Screening Committee to expedite the decision making for population-based lung cancer screening programmes in the UK, we conducted this study to develop and validate multivariable prognostic models for lung cancer in men and women and compare the model performance of the QCancer2 (10-year risk) lung models against the other models for lung cancer screening, including the LLPv2 and PLCOM2012 models currently used in the TLHC programme in England.
Methods
There were two stages in this study. We first updated the QCancer (10-year risk) lung model with more recent data, a larger sample size, and new predictors and validated the model. The second stage was evaluating the model performance of relevant models for lung cancer screening using the same validation dataset with the same criteria and in two patient subgroups (details below). We have published a comprehensive research protocol and statistical analysis plan for the DART-QResearch project16 before conducting the analyses.
Study design, study population, and data source
This is a population-based cohort study. We used EHRs from the QResearch® database (version 45) as the data source. The QResearch® database is one of the largest healthcare databases in England, and a trusted research environment accredited by Health Data Research UK. The inclusion and exclusion criteria were similar to those in the previous QCancer (10-year risk) algorithms15, which allowed us to compare the model performance between the original and the updated versions of models. We included adult patients aged 25-84 years registered with general practices contributing to the QResearch database between 01 January 2005 and 31 March 2020 and excluded patients with a diagnosis of lung cancer before cohort entry. The broad age range covers the majority of the adult primary care population. It also provides great flexibility to select patients in different age groups to evaluate the model performance (Stage 2) or to perform subgroup analysis. The included patients needed to have been registered in the general practices for at least 12 months, and the general practices needed to have contributed to the QResearch database for a minimum of 12 months before the cohort entry date. This was to ensure complete data before cohort entry.
Primary study outcome
The primary study outcome was an incident diagnosis of lung cancer recorded on one or more of the four linked data sources – primary and secondary (hospital episode statistics, HES) care databases, cancer registry (from Public Health England, PHE) and death registry (from the Office for National Statistics, ONS). We used the earliest date on any of the four records as the date of lung cancer diagnosis. We used Read/SNOMED-CT codes to identify events from the GP record, and ICD-10 codes to identify events from HES, cancer and death registries.
Updating the QCancer (10-year risk) lung model
The predictors for the original models include age, sex, ethnicity, Townsend scores (a proxy for socioeconomic status), smoking status and intensity, alcohol status, BMI, COPD, asthma, asbestos, prior cancer, and family history of lung cancer. Two additional predictors (pneumonia11 and venous thromboembolism17) were included in the updated model. We used well-established methodologies for developing and validating the risk prediction algorithms15,18. Three-quarters of general practices were randomly selected for the derivation dataset and the remaining quarter for the validation dataset. Multiple imputation with chained equations was used to replace missing values for ethnicity, body mass index (BMI), alcohol and smoking status, with five imputations. We assumed that the absence of clinical information in the database indicated that the patients did not have specific health conditions, personal or family histories. Therefore, we did not perform multiple imputation for the clinical predictors. We used the imputed values in our main analyses and Rubin’s rules to combine the results across the imputed datasets19.
Fractional polynomials20 were used to model non-linear relationships between the continuous variables (age, BMI, Townsend scores) and the outcome. Cox proportional hazards models were used to estimate the coefficients for each risk factor for men and women separately, using robust variance estimates to allow for patients clustering within general practices. We initially fitted a full model with all variables for men and women, and retained the variables with a hazard ratio (HR) of <0.91 or >1.10 (for non-continuous variables) for clinical significance and statistical significance at the 0.01 level.
How the QCancer2 (10-year risk) lung model works
We used the coefficients for each predictor from the final Cox regression as weights and combined them with the baseline survivor function evaluated for up to 10 years to derive the risk equations15. The baseline survivor function was estimated based on the zero values of the centred continuous variables and all binary predictors. The risk equations allow us to derive absolute risk estimates for each year of follow-up, but we had a specific focus on 5-, 6-, and 10-year risk estimates, as we intended to compare the QCancer2 (10-year risk) lung model against other models for lung cancer screening in Stage 2 of this study.
Other prediction models for lung cancer screening to be compared against the QCancer2 (10-year risk) lung model
Toumazis et al conducted a systematic review of risk prediction models for lung cancer screening21. There were also empirical studies comparing some mainstream prediction models published in recent years22,23. We referred to these studies and included models with predictive horizons ≥5 years, as the purpose of screening is to early detect cancer from asymptomatic populations. The sojourn time for lung cancer progressing from preclinical stage (detectable by screening tests) to clinical stages is 3-6 years, longer in women than in men24. Detecting lung cancer at the earliest possible stage is thought to be most effective in reducing mortality, provided over-diagnosis is avoided. Therefore, prediction models for a longer period are more suitable for screening than those designed for a shorter period (e.g. 1 or 2 years) for diagnostic purposes.
We included seven models to be compared against our models, including LLPv2, LLPv3, LCRAT, PLCOM2012, PLCOM2014, Pittsburgh, and Bach. The LLPv2 and PLCOM2012 are used in the TLHC programme to select eligible participants for lung health check. LLP was updated and the newest version (LLPv3) was published in 202114. The authors recalibrated the intercepts for each age group by sex, while the predictors and their coefficients remained unchanged between LLPv2 and LLPv314. PLCOM2014 was updated based on PLCOM2012 and could be used to calculate the risk for never smokers25, while PLCOM2012 was developed for ever-smokers only. The 5-year Lung Cancer Risk Assessment Tool (LCRAT)26 was selected because an empirical study concluded that this model performed the best in classifying future lung cancer cases using data from three UK cohorts23. The 6-year Pittsburgh Predictor27 was included for its simplicity – only 4 predictors in the model (age and three smoking variables), with reasonably predictive accuracy. The PLCO, LCRAT, and Pittsburgh models used either all or part of the PLCO/NLST data in their models. Hence, some study samples overlapped in these models. The Bach model28 can predict 10-year risk, which is the same predictive horizon as our models. Most models were developed based on the US population, except for LLP using regional data from Northwest England. A total of nine prediction models were included in this study. The general information and predictors for each model are summarised in Table 1.
Summary of predictors in the QCancer2 (10-year risk) lung model and other prediction models
Handling EHR data for the variables in other prediction models
Due to different study designs (RCT, case-control study, or survey) and different study samples, variables in other prediction models are not necessarily available in EHR. O’Dowd et al. externally validated the LLPv2 and PLCOM2012 models using CPRD in the English population13. We handled most variables in a similar way as reported in their paper, except for ethnicity, the categories of smoking intensity, and the age of starting smoking. Information on patients’ ethnicity is available in the QResearch database. Therefore, we need not assume all patients were white as O’Dowd did. There are five smoking categories in the QResearch database, non-smoker, ex-smoker, light smoker (1-9 cigarettes/day), moderate smoker (10-19 cigarettes/day), and heavy smoker (≥20 cigarettes/day). We used the median number of cigarettes smoked per day for each category, i.e. 5 for light smokers, 15 for moderate smokers, and 30 for heavy smokers for the PLCO and other models, while O’Dowd et al used the lower bound of the number for moderate and heavy smoking, which may underestimate the cumulative lifetime smoking exposure. We assumed people started smoking at 15 years, as people aged ≥50 years at cohort entry (2005) were born in the 1930s-1950s. People at that time started to smoke at a much younger age and the smoking prevalence was much higher than nowadays29. The current legal age of smoking at 18 was set in 2007. We wrote Stata codes for the QCancer (10-year risk), LLPv3, and PLCOM2014 models, while the risk scores for the LLPv2, LCRAT, PLCOM2012, Pittsburgh, and Bach models were calculated using the “lcmodels” R package.
Evaluation of model performance
We calculated the absolute predicted risks of individual patients by each algorithm in the validation datasets. We calculated the following measures for discrimination for each algorithm, including the (explained variation, where higher values indicate a greater proportion of variation in time to event is explained by the model)30, the D statistic (where higher values indicate better discrimination)31, and Harrell’s C statistic32 (similar to the area under the receiver operating characteristic curve but taking account of the censored nature of cohort data) at the prediction horizon of each algorithm (5, 6 or 10 years). These statistics were combined across the imputed datasets using Rubin’s rules. To assess calibration, we used the “pmcalplot” package in Stata33 to compare the mean predicted risks (x-axis) against the observed risk (obtained using the Kaplan-Meier estimates, y-axis) in twentieths by sex for each algorithm.
The three discrimination measures were calculated and calibration plots were made for the updated QCancer2 (10-year risk) lung model for the whole validation cohort (men and women aged 25-84 years). In addition, we evaluated the discrimination, calibration, and net benefits of the prediction models in two subgroups – people aged 40-84 years (including non-smokers) as subgroup 1, and ever-smokers aged 55-74 years (the criteria for the TLHC programme)10 as subgroup 2. There were two main reasons to set these two subgroups. Firstly, the included algorithms were developed with different inclusion and exclusion criteria (such as age range and smoking exposure), these two subgroups broadly cover the inclusion and exclusion criteria for the included algorithms so that they could be compared fairly. Secondly, we aim to provide research evidence regarding which algorithm will be the best to select eligible people for the TLHC programme if rolling out across England (through research evidence from subgroup 2). Decision curve analysis34 was used to evaluate the net benefit of the prediction models (clinical usefulness) for models in three predictive horizons by sex. We calculated the net benefits across a range of threshold probabilities. In general, the model with the highest net benefit at any given risk threshold is considered to have the most clinical value35,36.
All the analyses were conducted in the QResearch server using Stata 17.0 and R 4.1.0. We used the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement to guide the conduct and reporting of this study37,38. A TRIPOD checklist for this article is in supplementary table S1.
Ethical approval
This project was approved by the QResearch Scientific Committee on 8 March 2021. QResearch is a research ethics approved database, confirmed by the East Midlands – Derby Research Ethics Committee (Research ethics reference: 18/EM/0400, project reference: OX37 DART).
Patient and public involvement
Two lay representatives from the Roy Castle Lung Cancer Foundation were involved in the project set-up phase. They reviewed our lay summary and provided feedback as part of ethical approval for this project. The lay members were not involved in the research process, but we will engage patients and stakeholders in creating accessible materials in lay language for the public to understand prediction models and risk scores of lung cancer and the benefits and harms of the TLHC or lung cancer screening when we disseminate our study findings.
Results
Population characteristics
Table 2 shows the baseline demographic and clinical characteristics of the study population and the incident lung cancer cases in the derivation and the validation cohorts for the updated QCancer2 (10-year risk) lung model. There were 12,991,042 patients aged 25-84 years in the derivation cohort and 73,380 people developed incident lung cancer during the follow-up period. There were 4,137,199 patients in the validation cohort and 22,838 developed incident lung cancer. The ratio of men and women was almost 1:1 in the derivation and validation cohorts, but men had a higher proportion (56%) than women (44%) in the incident lung cancer cases. The mean age for incident lung cancer cases (66 years) was 21 years older than that of the primary care population (45 years) in this study. The Townsend quintile was equally distributed in general in the derivation and validation cohorts, but had a smaller proportion in the two bottom Townsend quintiles (more deprived) in incident lung cancer cases. The percentages of ethnicity, BMI, alcohol, and smoking status recorded in the EHRs in the whole cohort were around 72%, 83%, 82%, and 93%, respectively. The proportions of comorbidities, family history of lung cancer, and any prior cancer at baseline in incident lung cancer cases were much higher than those in the primary care population.
Baseline demographic and clinical characteristics of the derivation and validation cohort for the updated QCancer2 (10-year risk) model
Comparison of the predictors between the original and the updated QCancer (10-year risk) lung models
The hazard ratios (HRs) for the predictors in the original and updated QCancer (10-year risk) lung models are in Table 3. The coefficients for the predictors in the updated models by sex did not change much, compared with those in the original models. Two new predictors in the updated model, pneumonia and venous thromboembolism, were both significant and had higher HRs in women than in men. Other significant predictors included ethnicity, smoking status and intensity, family history of lung cancer, COPD, asthma, prior oral, blood, and renal cancers. Compared with white, other ethnicities had a smaller HR (HR<1), except that Bangladeshi and Chinese women had an HR>1. The HR increased with the current smoking status and smoking intensity (cigarette per day) and was generally in a dose-response relationship in both sexes. The HR for moderate smokers (10-19 cigarettes/day) was smaller than that in light smokers (1-9 cigarettes/day) in men. Two smoking-related conditions, COPD and prior oral cancer, had an HR>2 in both sexes. Asbestos exposure, prior colorectal and gastric-oesophageal cancers, and drinking status, were all significant in men, but not significant in women. Such differences showed the importance of developing models separately by sex, as men and women may have different significant predictors. Even for the same predictors, the coefficients and the association between predictors and the outcome could be different between the sexes.
Hazard ratios for the predictors in the original and updated QCancer (10-year risk) lung models
Supplementary Figure S1 shows the non-linear relationship (fractional polynomials) between the continuous variables (age, BMI, Townsend score) and the outcome (incident lung cancer cases) by sex. Supplementary Figure S2 shows the non-linear relationship between the interaction terms (age and smoking status) and the outcome by sex.
Evaluating the model performance for prediction models for lung cancer screening
The prediction models being compared and evaluated are summarised in supplementary table S2. The descriptive statistics of the predicted risks for each model in the two subgroup populations in three predictive horizons by sex are in supplementary tables S3 and S4.
Discrimination
Results for the three discrimination measures (Harrell’s C, D statistics, and ) for the original and updated QCancer (10-year risk) lung models in patients aged 25-84 years are shown in Table 4. The three discrimination statistics were slightly improved in the updated model. The updated QCancer2 (10-year risk) lung model explained 65% of the variation in time to diagnosis of lung cancer. The D statistic was around 2.80 and the C statistic was around 0.9 (indicating excellent discrimination) in men and women.
Discrimination statistics of the original and updated QCancer2 (10-year risk) lung models in patients aged 25-84 years by sex
The discrimination statistics for patient subgroup 1 (smokers and non-smokers aged 40-84 years) and subgroup 2 (ever-smokers aged 55-74 years) are in Tables 5 and 6, respectively, reported by sex and predictive horizons. In each patient subgroup, the QCancer2 (10-year risk) lung cancer model had the highest values for all three discrimination statistics across three predictive horizons (5/6/10 years) when compared with other prediction models. For other prediction models, in the 5-year horizon, LLPv3 had slightly higher Harrell’s C statistics than LLPv2 and LCRAT in patient subgroup 1, but comparable D statistics and in the two patient subgroups for both sexes. In the 6-year horizon, PLCOM2014 had the highest Harrell’s C, but the Pittsburgh predictor had the highest D statistic and
in patient subgroup 1. PLCOM2014 consistently had the highest statistics in patient subgroup 2. In the 10-year horizon, the QCancer2 (10-year risk) lung model out-performed the Bach model. Through a comprehensive model evaluation (by sex, different age groups, and smoking exposure), we found that the three discrimination measures were often different between sexes and there was no consistent pattern of higher statistics in men or women. This again demonstrates the importance of model evaluation in different patient subgroups and sexes, rather than pooling the data together for a single summarised statistic.
Discrimination statistics of models for lung cancer screening in three predictive horizons in patients aged 40-84 years (patient subgroup 1)
Discrimination statistics of models for lung cancer screening in three predictive horizons with the TLHC criteria: Ever-smoker patients aged 55-74 years (patient subgroup 2)
Calibration
Calibration plots are presented in Figures 1–3. Figure 1 is the updated QCancer2 (10-year risk) lung model in three predictive horizons by sex. Figures 2 and 3 are calibration plots for the two patient subgroups. The QCancer2 (10-year risk) lung model was generally well-calibrated across different risk levels in both sexes, in three predictive horizons, in the whole validation cohort and the two subgroups. For the other prediction models, in the 5-year horizon, a common problem for the LLPv2, LLPv3, and LCRAT models in patient subgroup 2 (ever-smokers aged 55-74 years) was overestimation (the predicted risks were much greater than the observed risks) in both sexes, especially at higher risk bands. Calibration of these three models was better in patient subgroup 1, but still overestimated the risk at high-risk levels. Compared with LLPv2, LLPv3 was better calibrated in both patient subgroups. In the 6-year horizon, the PLCOM2012 severely under-estimated at low-risk thresholds and over-estimated at high-risk thresholds, particularly in patient subgroup 2. PLCOM2014 under-estimated across all risk bands in both sexes and both patient subgroups, while the Pittsburgh predictor over-estimated the risk. The Bach model was poorly calibrated in both sexes, particularly in patient subgroup 2.
Calibration plots for the QCancer2 (10-year risk lung model) in the three predictive horizons by sex
Calibration plots for smokers and non-smokers aged 40-84 years old (patient subgroup 1) in the three predictive horizons and by sex
Calibration plots for ever smokers aged 55–74 years (patient subgroup 2, the targeted lung health check criteria) in the three predictive horizons and by sex
Decision curve analysis
Figures 4–5 show the net benefit curves for the prediction models in the two subgroups by sex in three predictive horizons. The QCancer2 (10-year risk) lung model had the highest net benefit, compared with other prediction models and strategies considering either no patients or all patients for intervention across a range of risk thresholds.
Decision curve analysis for patient subgroup 1 (smokers and non-smokers aged 40-84 years old, including never smokers)
Decision curve analysis for patient subgroup 2 (ever-smokers aged 55-74 years old, the TLHC criteria)
Discussion
Principal findings
In this study, we developed and validated the QCancer2 (10-year risk) lung model with more recent data and a larger sample size to quantify the future risk of men and women being diagnosed with lung cancer in the next 10 years. The predictors include sociodemographic characteristics, lifestyle, comorbidities, family history and personal history of other cancers. The updated model performed slightly better than the original version. We compared the QCancer2 (10-year risk) lung model against other prediction models and found that the QCancer2 (10-year risk) lung model had the best performance in discrimination, calibration, and net benefits in two subgroups across three predictive horizons.
Comparison with existing literature
The QCancer (10-year risk) models are prognostic models, which can estimate an individual patient’s future risk of cancer while the patient has not yet presented any symptoms. Prognostic models are more suitable for early detection of cancer through screening in asymptomatic populations, while diagnostic models (known as QCancer) aim to predict the risks of existing cancer in patients already present with symptoms that may indicate cancer.
Compared with the original model using a total of 6.6 million patients aged 25-84 years and 13,570 incident lung cancer cases from 753 practices from 1 January 1998 to 30 September 2013 in the deviation and validation cohorts, the sample size of the whole population and the incidence lung cancer cases in the updated model increased substantially, due to the expansion of the QResearch database in recent years. We have conducted a more thorough assessment of model performance in this study. ten Haaf et al. mentioned little attention has been given to sex-specific risk-stratification in current practice39. We developed and validated our models by sex and found that the predictors differed between men and women. Even for the same predictors, their coefficients may be different. The histological subtypes and preclinical duration of lung cancer may vary between sexes24,39. It may need to consider sex-specific risk thresholds and screening intervals for lung cancer screening.
When choosing risk prediction models for population-based screening programmes, we need to consider whether the models can be applied to the targeted population (external validity and generalisability), as models may be developed using a specific study sample, with limitations in study design, data, or internal validity of the study findings. Some models may be developed using highly selective study samples or populations, whose baseline risk was higher than the general population. The LLP model had limitations in study design and study sample (case-control study using regional data)11. The PLCOM2012 model was developed using data from the US population12. When externally validating these two models using CPRD, O’Dowd found that both models underestimated the risk in patients at low predicted risk (observed risk>predicted risk) but overestimated the risk in patients at higher risk (observed risk<predicted risk, cut-offs at around 1% for both models)13. The C statistics (equivalent to AUC in O’Dowd’s report) in this study were even lower. Possible explanations included study sample selection (two patient subgroups in this study vs ever-smokers aged 50-80 years), availability of information, population and geographical coverage in different EHR databases (QResearch vs CPRD), different ways of handling the variables unavailable from the EHR database (described in the methods section), sample size and study period (this study had a larger sample size and a longer study period). Despite these differences, both studies reached the same conclusion that the LLPv2 and PLCOM2012 models did not have satisfactory discrimination and were not well-calibrated when externally validated using different English primary care databases. They may not be directly applicable to the English primary care population.
Robbins et al. evaluated the model performance of lung cancer risk models (LLPv2, LLPv3, LCRAT, LCDRAT, PLCOm2012, and Bach) in current and former smokers aged 40-80 years using three UK cohorts (UK Biobank, EPIC-UK, and Generations Study) to define the eligibility for lung cancer screening23. We did not include LCDRAT in our study, as this model predicts lung cancer death. The AUC values for the models in the two patient subgroups (patients aged 40-74 years and 55-74 years, 0.73-0.82) were much higher in the three cohorts than those in our study. For calibration, all models overestimated risk in all cohorts. The ratio of expected vs observed (E:O) was between 1.20 (LLPv3) and 2.25 (LLPv2). We found both overestimation and underestimation appeared in our study samples. Robbins et al. emphasised the importance of validating prediction tools in specific countries. We agreed with this point.
Strengths and limitations of this study
Strengths
Informed by the existing research evidence and clinical expertise, we included as many relevant predictors as possible when we updated the QCancer2 (10-year risk) lung model. Using contemporaneous primary care EHRs to develop and validate risk prediction models is likely to have great face validity, generalisability, and applicability to the whole primary care population, compared with study designs like RCT or survey, or data from other countries (like the US). EHRs in the UK have a high level of accuracy and completeness of sociodemographic information and clinical diagnoses. Clinical and diagnostic information recorded by healthcare professionals should be more accurate than collecting such information from patients using surveys or interviews. Therefore, it can reduce information and recall bias. In addition, our study also benefits from a large sample size and a long duration of follow-up. The population in the QResearch database is representative of the whole English primary care population, in terms of age, sex, ethnicity, socioeconomic deprivation, smoking exposure, and geographical coverage. Therefore, it minimises selection and respondent bias. The QCancer2 (10-year risk) lung model could be applied to the whole English primary care population. We would like to consider our QCancer2 (10-year risk) lung models as inclusive algorithms. They were designed for the adult primary care population with a wide age range (25-84 years) and various smoking statuses (non-, current, or ex-smokers). They are “live” and flexible risk prediction models, which can be updated based on the previous version with more recent data, as the population changes over time. Physicians and the public can use our models to calculate and understand an individual’s risk of developing lung cancer for each year of follow-up, for up to 10 years, which is more flexible than models predicting a fixed predictive horizon such as 5-year for LLP and 6-year for PLCO. The QCancer2 (10-year risk) lung models were developed and validated in men and women separately, with some different predictors in the men and women models. Therefore, the models allow sex-specific risk stratification. Compared with other models, this is a unique strength. The fact that the QCancer2 (10-year risk) lung models outperform other prediction models in the ever-smokers aged 55-74 years (subgroup 2) demonstrates that our model is robust and suitable for selecting eligible patients for lung cancer screening. This bonus point makes our models more attractive for clinical application and improving population health.
Finally, we pre-registered the research protocol and statistical analysis plan in the public domain. We used robust and advanced statistical methods and followed the recommendations of the TRIPOD guidelines when we developed and validated our models and reported the study findings.
Limitations
Smoking is a strong predictor of lung cancer. When primary care providers ask the smoking status and intensity, some patients may feel ashamed and under-report their smoking intensity than their actual situation. This could affect the accurate estimation of the association between smoking exposure and lung cancer. But there is little that researchers could do about this. Family history of cancer may have been recorded opportunistically rather than systematically in the EHR. Patients with a family member diagnosed with cancer may be more likely to report this to their GPs and for this to be recorded, which would cause information bias. Not all relevant information in other prediction models is available from EHRs. Therefore, we needed to make reasonable assumptions and adapt the situation for the EHR in the English population when we calculated the risk scores using other prediction models. If we could not find values from reliable sources or references, we used the median values or categories, as O’Dowd et al13 did in their study. The unavailable information and the way we handled the variables may inaccurately estimate the risk scores for individual patients in other prediction models, which could either overestimate or underestimate the risk. This may consequently influence the evaluation of model performance for individual models. Another limitation is that we did not use external data to validate QCancer (10-year risk) lung models. It is possible to use another data source (e.g. CPRD) and the internal-external cross-validation approach40 to externally validate our model in future studies, subject to funding availability.
Implications for practice and health policy
As a universal service with a very high population coverage in the UK, primary care has great potential and opportunity for early detection and diagnosis of cancer, and GPs could be strong advocates for the TLHC programme and lung cancer screening. We have demonstrated the model performance in the QCancer2 (10-year risk) lung model. More accurate risk estimation has an impact on risk stratification at the population level and risk communication with the patients. The implementation of the QCancer2 (10-year risk) lung model in primary care computer systems could facilitate patient-GP discussion about the risks and benefits of preventive interventions for individual patients. This could improve patients’ awareness of their health status and risk level, which in turn might increase their willingness to participate in the TLHC programme or lead to behavioural changes such as considering smoking cessation.
The updated QCancer2 (10-year risk) lung model allows batch-mode processes that use existing information in EHRs at each practice to facilitate the selection of eligible patients at high risk for the TLHC programme or lung cancer screening. Compared with using questionnaires to collect information from individual patients and calculating risk scores to check their eligibility, the batch-mode process is more efficient. It can stratify the primary care population based on risk level for better target in prevention or screening programmes. It can not only substantially reduce human resources and costs, but also save time and streamline the administrative process for better patient experience and increased patient satisfaction.
We comprehensively compared nine prediction models using the same validation dataset in this study. We evaluated model performance in two patient subgroups, which broadly covered the inclusion and exclusion criteria of the study sample in each model and the eligibility criteria for the TLHC programme. The LLPv2 and PLCOM2012 may not be the best models to select eligible primary care patients for the TLHC programme, due to their unsatisfactory model performance in patient subgroup 2. Through such a thorough assessment in this study, we provide research evidence to assist policymakers in considering the age range and smoking exposure for lung cancer screening and increase equality in health services access. In the end, more patients can be diagnosed at earlier stages and through screening-detected route, which can lead to better survival outcomes and reduced lung cancer mortality in the UK. Considering the sizable incidence and mortality of lung cancer at the population level, the improvement in lung cancer could substantially contribute to the UK government’s ambition that 75% of people with cancer will be diagnosed at early stages by 202841. Meanwhile, patients may benefit from primary or secondary prevention of other cardiorespiratory diseases through participating in the TLHC programme. We do not recommend a risk threshold for our models for lung cancer screening, as this needs to consider the balance between benefits and harms, cost-efficiency, health resources, accessibility and health equality, and the potential impact of the screening program at the population level. Our colleagues in another strand of our work package will conduct a separate study to evaluate the cost-effectiveness and determine the risk thresholds for lung cancer screening in England.
Conclusion
Developed, updated, and validated using primary care EHRs, the QCancer2 (10-year risk) lung model is robust and flexible. It can estimate an individual adult patient’s risk of lung cancer diagnosis during each year of follow-up, for up to 10 years. It has the best performance among other prediction models for lung cancer screening in discrimination and calibration for both sexes across three predictive horizons (5/6/10 years). It also has the highest net benefit. The QCancer2 (10-year risk) lung model is more suitable for selecting patients at high risk from the English primary care population for lung cancer screening using LDCT than the currently used LLPv2 and PLCOM2012 models in the TLHC programme.
Declarations
WL and JH-C have full access to all data in this study and take the responsibility for the integrity of the data and the accuracy of data analysis. To guarantee the confidentiality of personal and health information of patients, only the named authors have had full access to the data during the study, in accordance with the relevant licence agreements. Information on access to the QResearch data is available on the QResearch website (www.qresearch.org).
JH-C is the guarantor for the study and affirms that the manuscript resulting from this work is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study are omitted; and that any discrepancies from the study protocol and statistical analysis plan16 have been explained.
Authors’ contributions and consent for publication
FG and JH-C secured the funding. FG is the chief investigator of the DART project, and JH-C is the joint package lead (WP6 – Primary care, population health, and health economics). JH-C and WL contributed to the study conceptualisation. WL specified the data and led on the ethical approval. WL designed the statistical analysis plan and drafted the whole research protocol and statistical analysis plan, with methodological input from JH-C and CC, and clinical and contextual input from JH-C, DB, FG, and JB. WL and JH-C wrote the codes and performed the analyses. CC had substantial contributions to the statistical methodology. WL drafted the whole paper. All authors contributed to the interpretation of the results and revision of the manuscript, and approved the final version of the manuscript for publication.
Funding
The DART project is funded by Innovate UK (UK Research and Innovation, grant reference: 40255). QResearch received funding from the NIHR Biomedical Research Centre, Oxford, grants from John Fell Oxford University Press Research Fund, grants from Cancer Research UK (Grant number C5255/A18085), through the Cancer Research UK Oxford Centre, grants from the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z), during the conduct of the study.
Study and authors’ information
Project website: www.dartlunghealth.co.uk
@dartlunghealth (the DART project),
@WLiao_Ox (Weiqi Liao),
@JudithBurchardt (Judith Burchardt),
@JuliaHCox (Julia Hippisley-Cox)
Declaration of interests
All authors have completed the ICMJE uniform disclosure form.
JH-C is an unpaid director of QResearch, a not-for-profit organisation in a partnership between the University of Oxford and EMIS Health, who supply the QResearch database for this work. JHC is a founder and shareholder of ClinRisk Ltd and was its medical director until 31 May 2019. ClinRisk Ltd produces open and closed source software to implement clinical risk algorithms into clinical computer systems including the original QCancer algorithms referred to above. Other authors have no interests to declare for this submitted work.
Acknowledgements
We thank the two lay members from the Roy Castle Lung Cancer Foundation who reviewed our lay summary of the DART-QResearch Project for ethical approval and provided very helpful feedback. We acknowledge the contribution of the patients and the general practices who contribute to QResearch® and EMIS (Egton Medical Information Systems) Health and the Chancellor Masters & Scholars of the University of Oxford for expertise in establishing, developing and supporting the QResearch database. The Hospital Episode Statistics data used in this analysis are Copyright © (2022) to the Health and Social Care Information Centre and re-used with the permission of the Health and Social Care Information Centre and the University of Oxford. All rights reserved. This project involves data derived from patient-level information collected by the NHS, as part of the care and support of cancer patients. The data is collated, maintained and quality assured by the National Cancer Registration and Analysis Service, which is part of Public Health England (PHE). Access to the data was facilitated by the PHE Office for Data Release. The death registration data are provided by the Office for National Statistics. NHS Digital, Public Health England, and the Office of National Statistics bear no responsibility for the analysis or interpretation of the data.
The views expressed in this manuscript are those of the authors. None of the acknowledged organisations or funding bodies has been involved in any research process, including study design, data specification, statistical analysis, interpretation of results, preparing manuscripts, or decision to publish.