Abstract
Importance There has been growing interest in the use of artificial intelligence, in particular deep learning (DL), to help achieve early diagnosis of prevalent diseases. Nowhere is this more so than in lung cancer, where a combination of factors, including the high prevalence of nodules, the low prevalence of malignant nodules, and the indeterminacy of many nodules, makes it fertile ground for the deployment of accurate, high-throughput DL-based tools.
Objective To survey the landscape of externally validated DL-based computer-aided diagnostic (CADx) models, and assess their diagnostic performance for predicting the risk of malignancy in computed tomography (CT)-detected pulmonary nodules.
Data sources An electronic search was performed in the MEDLINE (PubMed), EMBASE, Science Citation Index, and Cochrane Library databases, from inception to 10 April 2023.
Study selection Studies were deemed eligible if they were peer-reviewed experimental or observational articles that analysed the diagnostic performance of externally validated DL-based CADx models for the prediction of malignancy risk, with a direct comparison to models widely used in clinical practice.
Data extraction and synthesis PRISMA guidelines were followed for the identification, screening, and selection process. A bivariate random-effects model was used for the meta-analysis of the included studies. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess risk of bias and applicability.
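A bivariate random-effects model pools logit sensitivity and logit specificity jointly, together with their between-study correlation, and is normally fitted by maximum likelihood in dedicated statistical software. As a simpler illustration of the random-effects idea only, the sketch below pools a single proportion (for example, per-study sensitivity) on the logit scale using the univariate DerSimonian-Laird estimator. This is a swapped-in simplification, not the authors' method, and the function name and continuity-correction choice are assumptions for illustration.

```python
import math
from typing import List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_dl(events: List[int], totals: List[int]) -> Tuple[float, float, float]:
    """Hypothetical helper: univariate DerSimonian-Laird random-effects pooling
    of proportions on the logit scale. Returns (pooled, ci_low, ci_high)."""
    ys, vs = [], []
    for e, n in zip(events, totals):
        # Assumption: a 0.5 continuity correction is applied to every study
        # for simplicity, which guards against 0 or n events.
        e_adj, n_adj = e + 0.5, n + 1.0
        ys.append(logit(e_adj / n_adj))
        # Approximate variance of a logit proportion: 1/events + 1/non-events.
        vs.append(1.0 / e_adj + 1.0 / (n_adj - e_adj))

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # DerSimonian-Laird between-study variance tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sw_re
    se = math.sqrt(1.0 / sw_re)
    return expit(y_re), expit(y_re - 1.96 * se), expit(y_re + 1.96 * se)
```

Because the confidence interval is built on the logit scale and back-transformed, it stays within 0 to 1 and is generally asymmetric around the pooled value, as in the intervals reported in the Results.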
Main outcomes and measures Main outcomes included sensitivity, specificity, and area under the curve (AUC).
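For a single study, these outcomes follow from the counts of true and false positives and negatives and from the model's malignancy scores. The minimal sketch below uses hypothetical helper names and computes the empirical AUC as the probability that a randomly chosen malignant nodule scores higher than a randomly chosen benign one, with ties counted as one half.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of malignant nodules correctly flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of benign nodules correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)


def empirical_auc(scores_malignant: Sequence[float], scores_benign: Sequence[float]) -> float:
    """Mann-Whitney form of the AUC: fraction of (malignant, benign) pairs in
    which the malignant nodule receives the higher score; ties count 0.5."""
    wins = 0.0
    for p in scores_malignant:
        for n in scores_benign:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


# Illustrative counts and scores only, not data from the included studies.
print(sensitivity(tp=88, fn=12))                        # 0.88
print(empirical_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8 of 9 pairs ranked correctly
```

Sensitivity and specificity depend on the chosen operating threshold, whereas the AUC summarises ranking performance across all thresholds, which is why the pooled analysis also reports a summary AUC.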
Results After screening, 20 studies were included, comprising 7,664 participants and 10,128 nodules, of which 2,126 were malignant. DL-based CADx models were 15.8% more sensitive than physician judgement alone and 35.4% more sensitive than clinical risk models alone. Their pooled specificity was similar to that of physician judgement alone (0.77 [95% CI: 0.69–0.84] vs 0.80 [95% CI: 0.71–0.86], respectively), but they were 5.5% more specific than clinical risk models alone. Accounting for threshold effects, DL-based CADx models had superior summary areas under the receiver operating characteristic curve (sAUROC), with relative sAUROCs of 1.06 (95% CI: 1.03–1.08) and 1.22 (95% CI: 1.19–1.24) compared with physician judgement and clinical risk models alone, respectively.
Conclusions and relevance DL-based models show superior or comparable diagnostic performance when externally validated against widely used methods, such as the Brock and Mayo models. They have the potential to fulfil an unmet clinical-management need alongside experienced physician image readers. The included studies reported a high degree of heterogeneity, with threshold effects particularly prominent. Future research may consider more prospective studies and human-experimental studies.
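A threshold effect arises when studies operate at different positivity thresholds, so that sensitivity and specificity trade off against each other across studies. One commonly used check, shown here as a sketch rather than the analysis performed in this review, is the Spearman rank correlation between per-study logit sensitivity and logit (1 − specificity); a strong positive correlation suggests a threshold effect. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(va * vb)


def threshold_effect_rho(sens: Sequence[float], spec: Sequence[float]) -> float:
    """Spearman correlation of logit(sensitivity) with logit(1 - specificity)."""
    logit = lambda p: math.log(p / (1.0 - p))
    x = [logit(s) for s in sens]
    y = [logit(1.0 - sp) for sp in spec]
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))
```

Because the logit is monotonic, the ranks, and therefore rho, are the same as for the raw proportions; the logit form is kept only because it matches the scale used in bivariate models.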
Question How effective are image-based, computer-aided diagnostic models that use deep learning methods to predict the malignancy risk of pulmonary nodules as compared with other methods used in clinical practice?
Findings This systematic review and meta-analysis identified 20 observational studies (7,664 participants; 10,128 pulmonary nodules) from which pooled analyses found deep learning-based models to have a sensitivity of 0.88, specificity of 0.77, and summary area under the curve of 0.90 in predicting malignancy in pulmonary nodules. This was superior or comparable to other methods routinely used in clinical practice.
Meaning Deep learning-based models are already being used in clinical practice in certain settings for nodule management. The results show their diagnostic performance justifies wider and more routine deployment.
Competing Interest Statement
JW is an employee of Optellum Ltd; Optellum holds some patents in the area.
Funding Statement
This study was funded by Optellum Ltd, Oxford, United Kingdom
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used only openly available human data located in the MEDLINE (PubMed), EMBASE, Science Citation Index, and Cochrane Library databases.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Results updated
Data Availability
All data produced in the present study are available upon reasonable request to the authors