Abstract
Objectives To develop and validate the QCancer2 (10-year risk) lung model for estimation of future risk of lung cancer and to compare the model performance against other prediction models for lung cancer screening
Design Open cohort study using linked electronic health records (EHRs) from the QResearch database (1 January 2005 – 31 March 2020)
Setting English primary care
Participants 12.99 million patients aged 25-84 years were in the derivation cohort to develop the models and 4.14 million patients were in the validation cohort. All patients were free of lung cancer at baseline.
Main outcome measure Incident lung cancer cases
Methods There were two stages in this study. First, Cox proportional hazards models were used in the derivation cohort to update the QCancer (10-year risk) lung model in men and women for a 10-year predictive horizon, including two new predictors (pneumonia and venous thromboembolism) and more recent data. Discrimination measures (Harrell’s C, D statistic, and R²_D) and calibration plots were used to evaluate model performance in the validation cohort by sex. Secondly, seven prediction models for lung cancer screening (LLPv2, LLPv3, LCRAT, PLCOM2012, PLCOM2014, Pittsburgh, and Bach) were selected for comparison with the QCancer2 (10-year risk) lung model in two subgroups: (1) smokers and non-smokers aged 40-84 years and (2) ever-smokers aged 55-74 years.
Results 73,380 incident lung cancer cases were identified in the derivation cohort and 22,838 in the validation cohort during follow-up. The updated models explained 65% of the variation in time to diagnosis of lung cancer in both sexes. Harrell’s C statistics were close to 0.9 (indicating excellent discrimination), and the D statistics were around 2.8. Compared with the original models, the discrimination measures in the updated models improved slightly in both sexes. Compared with other prediction models, the QCancer2 (10-year risk) lung model had the best model performance in discrimination, calibration, and net benefit across three predictive horizons (5, 6, and 10 years) in the two subgroups.
Conclusion Developed and validated using large-scale EHRs, the QCancer2 (10-year risk) lung model can estimate the risk of an individual patient aged 25-84 years for up to 10 years. It outperformed the other prediction models evaluated. It has potential utility for risk stratification of the English primary care population and selection of eligible people at high risk for the targeted lung health check programme or lung cancer screening.
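The reported D statistic (around 2.8) and explained variation (65%) can be cross-checked against each other. The sketch below assumes the explained-variation measure is the Royston and Sauerbrei R²_D, which is derived from D; the function name `r2_from_d` is illustrative and does not come from the study.

```python
import math

def r2_from_d(d: float) -> float:
    """Explained variation R^2_D computed from the D statistic.

    Assumes the Royston-Sauerbrei definition:
    R^2_D = (D^2 / kappa^2) / (sigma^2 + D^2 / kappa^2),
    with kappa^2 = 8 / pi and sigma^2 = pi^2 / 6 for a Cox model.
    """
    kappa_sq = 8.0 / math.pi        # scaling constant applied to D
    sigma_sq = math.pi ** 2 / 6.0   # error variance assumed for proportional hazards models
    scaled = d ** 2 / kappa_sq
    return scaled / (sigma_sq + scaled)

# D of about 2.8, as reported in the Results
print(round(r2_from_d(2.8), 2))  # → 0.65
```

Under this assumption, a D statistic of 2.8 corresponds to roughly 65% explained variation, which agrees with the figure given in the Results.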
What is already known on this topic
Using risk prediction models to stratify people at the population level and select those at the highest risk is an efficient and cost-effective strategy for screening programmes. It avoids wasting resources on screening patients at low risk.
An ideal prediction model should have excellent discrimination and calibration in the target population.
The Liverpool Lung Project (LLPv2) and the Prostate Lung Colorectal and Ovarian (PLCOM2012) models had only moderate discrimination and were not well-calibrated when externally validated using the Clinical Practice Research Datalink (CPRD) data for the English primary care population.
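Discrimination, as referred to in the points above, is often summarised with Harrell's C: the proportion of comparable patient pairs in which the patient diagnosed earlier was given the higher predicted risk. The following is a minimal sketch of that idea, not the study's implementation. The function name `harrell_c` and the toy data are hypothetical, and tied event times are ignored for simplicity.

```python
def harrell_c(times, events, risks):
    """Harrell's C for right-censored data (simplified: tied times are skipped).

    times  -- follow-up time for each patient
    events -- 1 if lung cancer was diagnosed at that time, 0 if censored
    risks  -- predicted risk for each patient (higher means earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): four patients, two observed diagnoses
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
print(harrell_c(times, events, [0.9, 0.4, 0.1, 0.5]))  # → 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. Calibration, the agreement between predicted and observed risk, is a separate property that this statistic does not capture.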
What this study adds
Developed and validated using robust statistical methodologies, the QCancer2 (10-year risk) lung model shows excellent discrimination and calibration in both sexes. It can estimate an individual adult patient’s risk for each year of follow-up, for up to 10 years.
The QCancer2 (10-year risk) lung model has the best model performance in discrimination and calibration when compared with the other eight models (QCancer (10-year risk), LLPv2, LLPv3, LCRAT, PLCOM2012, PLCOM2014, Pittsburgh, and Bach) in three predictive horizons (5/6/10 years) and two sub-populations (smokers and non-smokers aged 40-84 years and ever-smokers aged 55-74 years).
The QCancer2 (10-year risk) lung model can be applied to the English primary care population to select eligible patients for the Targeted Lung Health Check programme or lung cancer screening using low dose CT.
Competing Interest Statement
JH-C is an unpaid director of QResearch, a not-for-profit organisation in a partnership between the University of Oxford and EMIS Health, who supply the QResearch database for this work. JH-C is a founder and shareholder of ClinRisk Ltd and was its medical director until 31 May 2019. ClinRisk Ltd produces open and closed source software to implement clinical risk algorithms into clinical computer systems, including the original QCancer algorithms referred to above. The other authors have no interests to declare for this submitted work.
Clinical Protocols
https://www.medrxiv.org/content/10.1101/2022.01.07.22268789v1
Funding Statement
The DART project is funded by Innovate UK (UK Research and Innovation, grant reference: 40255). During the conduct of the study, QResearch received funding from the NIHR Biomedical Research Centre, Oxford, and grants from the John Fell Oxford University Press Research Fund, from Cancer Research UK (grant number C5255/A18085) through the Cancer Research UK Oxford Centre, and from the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This project was approved by the QResearch Scientific Committee on 8 March 2021. QResearch is a research ethics approved database, confirmed by the East Midlands - Derby Research Ethics Committee (Research ethics reference: 18/EM/0400, project reference: OX37 DART).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes