RT Journal Article
SR Electronic
T1 Assessing eligibility for lung cancer screening: Parsimonious multi-country ensemble machine learning models for lung cancer prediction
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2023.01.27.23284974
DO 10.1101/2023.01.27.23284974
A1 Callender, Thomas
A1 Imrie, Fergus
A1 Cebere, Bogdan
A1 Pashayan, Nora
A1 Navani, Neal
A1 van der Schaar, Mihaela
A1 Janes, Sam M
YR 2023
UL http://medrxiv.org/content/early/2023/01/29/2023.01.27.23284974.abstract
AB Background: Ensemble machine learning could support the development of highly parsimonious prediction models that maintain the performance of more complex models whilst maximising simplicity and generalisability, supporting the widespread adoption of personalised screening. In this work, we aimed to develop and validate ensemble machine learning models to determine eligibility for risk-based lung cancer screening. Methods: For model development, we used data from 216,714 ever-smokers in the UK Biobank prospective cohort and 26,616 high-risk ever-smokers in the control arm of the US National Lung Screening randomised controlled trial. We externally validated our models amongst the 49,593 participants in the chest radiography arm and amongst all 80,659 ever-smoking participants in the US Prostate, Lung, Colorectal and Ovarian Screening Trial (PLCO). Models were developed to predict the risk of two outcomes within five years from baseline: diagnosis of lung cancer, and death from lung cancer.
We assessed model discrimination (area under the receiver operating characteristic curve, AUC), calibration (calibration curves and expected/observed ratio), overall performance (Brier scores), and net benefit with decision curve analysis. Results: Models predicting lung cancer death (UCL-D) and incidence (UCL-I) using three variables – age, smoking duration, and pack-years – achieved or exceeded parity in discrimination, overall performance, and net benefit with comparators currently in use, despite requiring only one-quarter of the predictors. In external validation in the PLCO trial, UCL-D had an AUC of 0.803 (95% CI: 0.783-0.824) and was well calibrated, with an expected/observed (E/O) ratio of 1.05 (95% CI: 0.95-1.19). UCL-I had an AUC of 0.787 (95% CI: 0.771-0.802) and an E/O ratio of 1.0 (0.92-1.07). The sensitivity of UCL-D was 85.5% and that of UCL-I was 83.9%, at 5-year risk thresholds of 0.68% and 1.17%, respectively; these sensitivities were 7.9% and 6.2% higher, respectively, than the USPSTF-2021 criteria at the same specificity. Conclusions: We present parsimonious ensemble machine learning models to predict the risk of lung cancer in ever-smokers, demonstrating a novel approach that could simplify the implementation of risk-based lung cancer screening in multiple settings. Competing Interest Statement: NN reports honoraria for non-promotional educational talks, conference support or advisory boards from Amgen, AstraZeneca, Boehringer Ingelheim, Bristol Myers Squibb, Guardant Health, Janssen, Lilly, Merck Sharp & Dohme, Olympus, OncLive, PeerVoice, Pfizer, and Takeda. SMJ is supported by a CRUK programme grant (EDDCPGM\100002) and an MRC Programme grant (MR/W025051/1). SMJ receives support from the CRUK Lung Cancer Centre and the CRUK City of London Centre, the Rosetrees Trust, the Roy Castle Lung Cancer Foundation, the Longfonds BREATH Consortia, MRC UKRMP2 Consortia, the Garfield Weston Trust and UCLH Charitable Foundation.
SMJ has received fees for advisory board membership in the last three years from AstraZeneca, Bard1 Lifescience, and Johnson and Johnson. He has received grant income from Owlstone and GRAIL Inc. He has received assistance with travel to an academic meeting from Chiesi. TC and SMJ are founders of, and own stock in, Mortimer Health. Funding Statement: This work was supported by the Wellcome Trust through a Wellcome Clinical PhD Training Fellowship granted to TC. FI and MvdS are supported by the National Science Foundation, grant number 1722516. NN is supported by a Medical Research Council Clinical Academic Research Partnership (MR/T02481X/1). This work was partly undertaken at University College London Hospitals/University College London, which received a proportion of funding from the Department of Health's National Institute for Health Research (NIHR) Biomedical Research Centre's funding scheme. The funders had no role in the design or conduct of this study. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The ethics committee of University College London gave ethical approval for this work (reference: 19131/001). I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. UK Biobank, NLST, and PLCO data were used under license (references 68073, NLST-806 and PLCO-801, respectively). These data cannot be shared directly; however, researchers can apply for them from the UK Biobank and the US National Institutes of Health.