Abstract
There is growing demand for diagnostic services in the UK. This rapid review aimed to assess the effectiveness of artificial intelligence (AI) in diagnostic radiology with a focus on cancer diagnosis. A range of AI models, including machine learning, deep learning and ensemble models, were assessed in this review.
The review included an initial broad mapping exercise and a more in-depth synthesis of a specific sub-set of the evidence. The review included evidence available from 2018 until June 2023.
A total of 92 comparative primary studies were included in the evidence map. The evidence map identified 52 studies in which the AI models were in the early stages of development and validation, and highlighted breast, lung and prostate cancers as the types of cancer most frequently reported on. 28 studies evaluating an established model and focusing on the diagnosis of breast, lung, and prostate cancer were included in the in-depth synthesis. All studies included in the in-depth synthesis were classified as diagnostic accuracy studies. Only one study evaluated an AI model that was commercially available in the UK.
Most studies reported results in favour of the AI models; however, these improvements were not always statistically significant. The studies also varied considerably in terms of the AI models studied, type of cancer, images used, and comparisons made, and were limited in terms of their methodology. When used as a standalone diagnostic tool, there is evidence to suggest that AI can improve diagnostic accuracy or is comparable to experienced radiologists; however, this may depend on the AI model being used. There is evidence to suggest that AI may be beneficial when used as a support tool for clinicians/radiologists with less experience. The impact of AI on the timeline involved in diagnosis appeared inconsistent. AI may speed up the diagnostic timeline when the level of cancer suspicion is low but may increase diagnostic timelines when the level of cancer suspicion is high. The evidence suggests that clinicians are accepting of AI-based assistance for cancer diagnosis.
Policy and practice implications
The overall evidence for effectiveness appeared in favour of AI and several factors were identified that impact the effectiveness of the AI models. AI may improve diagnostic accuracy in clinicians/radiologists with less experience of interpreting radiological images. However, further well-designed high-quality research is needed from the UK and similar countries to better understand the effectiveness of AI in cancer diagnosis.
Economic considerations
There is little evidence on the cost-effectiveness of using AI for cancer diagnosis. In theory, it might be possible for AI to assist with earlier diagnosis of cancer with both health and economic benefits.
Funding statement
The Public Health Wales Observatory was funded for this work by the Health and Care Research Wales Evidence Centre, itself funded by Health and Care Research Wales on behalf of Welsh Government.
What is a Rapid Review?
Our rapid reviews (RR) use a variation of the systematic review approach, abbreviating or omitting some components to generate the evidence to inform stakeholders promptly whilst maintaining attention to bias.
Who is this Rapid Review for?
The review question was suggested by the Health Sciences Directorate (Policy).
Background / Aim of Rapid Review
There is growing demand for diagnostic services in the UK. The use of artificial intelligence in diagnosis is part of the Welsh Government’s programme for transforming and modernising planned care and reducing waiting lists in Wales. This rapid review aimed to assess the effectiveness of artificial intelligence (AI) in diagnostic radiology with a focus on cancer diagnosis. A range of AI models, including machine learning, deep learning and ensemble models, were assessed in this review. The term ‘AI models’ was therefore used to encompass the different types of AI models described in the literature. The review included an initial broad mapping exercise and a more in-depth synthesis of a specific sub-set of the evidence. The focus of the in-depth synthesis was informed by the review’s stakeholders based on the findings of the mapping exercise.
Recency of the evidence base
The review included evidence available from 2018 until June 2023.
Extent of the evidence base
A total of 92 comparative primary studies were included in the evidence map.
The evidence map identified 52 studies in which the AI models were in the early stages of development and validation, and highlighted breast, lung and prostate cancers as the types of cancer most frequently reported on.
28 studies evaluating an established model and focusing on the diagnosis of breast (n=14), lung (n=7) and prostate (n=7) cancer were included in the in-depth synthesis.
Studies included in the in-depth synthesis were conducted in the USA (n=8), Japan (n=5), UK (n=2), Italy (n=2), Turkey (n=2), Germany (n=2), Netherlands (n=2), Portugal (n=1), Greece (n=1) and Norway (n=1). Two studies were conducted across multiple countries.
All studies included in the in-depth synthesis were classified as diagnostic accuracy studies.
Only one study evaluated an AI model that was commercially available in the UK.
A total of 14 studies compared AI models to human readers or to other diagnostic methods used in practice, 13 studies compared the impact of AI on human interpretation of radiologic images when diagnosing cancer, four studies compared multiple AI models, and one study compared an inexperienced AI-assisted reader with an experienced reader without AI.
Five studies reported on the impact of AI on diagnostic timelines (time to diagnosis, assessment time, evaluation times, and reading time).
Four studies also reported on the impact of AI on inter/intra-reader variability, reliability, and agreement.
One study reported on clinicians’ acceptance of, and receptiveness to, the use of AI for cancer diagnosis.
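Diagnostic accuracy studies such as those counted above are commonly summarised with sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), all of which follow from a 2x2 table of test results against a reference standard. The sketch below is illustrative only: the class name, function name and counts are hypothetical and are not taken from any study included in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of index-test results against the reference standard."""
    tp: int  # cancer present, test positive
    fp: int  # cancer absent, test positive
    fn: int  # cancer present, test negative
    tn: int  # cancer absent, test negative


def accuracy_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic accuracy measures for one 2x2 table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # proportion of cancers detected
        "specificity": c.tn / (c.tn + c.fp),  # proportion of non-cancers cleared
        "ppv": c.tp / (c.tp + c.fp),          # positives that are true cancers
        "npv": c.tn / (c.tn + c.fn),          # negatives that are truly clear
    }


# Hypothetical counts for illustration only (not data from the review).
example = ConfusionCounts(tp=80, fp=30, fn=20, tn=870)
metrics = accuracy_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on how common cancer is in the sample, so values from an enriched study dataset may not carry over to a screening population.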
Key findings and certainty of the evidence
Most studies reported results in favour of the AI models; however, these improvements were not always statistically significant. The studies also varied considerably in terms of the AI models studied, type of cancer, images used, and comparisons made, and were limited in terms of their methodology (unclear level of certainty).
When used as a standalone diagnostic tool, there is evidence to suggest that AI can improve diagnostic accuracy or is comparable to experienced radiologists; however, this may depend on the AI model being used (unclear level of certainty).
There is evidence to suggest that AI may be beneficial when used as a support tool for clinicians/radiologists with less experience (unclear level of certainty).
The impact of AI on the timeline involved in diagnosis appeared inconsistent. AI may speed up the diagnostic timeline when the level of cancer suspicion is low but may increase diagnostic timelines when the level of cancer suspicion is high (low level of certainty).
The evidence suggests that clinicians are accepting of AI-based assistance for cancer diagnosis (low level of certainty).
Research Implications and Evidence Gaps
No study reported on any patient outcomes, including patient harms.
No study reported on any economic outcomes.
No study reported on equity outcomes, including equity of access.
Further research in a real-world setting is needed to better understand the cost implications and impact on patient safety of AI for cancer diagnosis.
Policy and Practice Implications
The overall evidence for effectiveness appeared in favour of AI and several factors were identified that impact the effectiveness of the AI models.
AI may improve diagnostic accuracy in clinicians/radiologists with less experience of interpreting radiological images.
AI models are continually being developed and updated and findings are likely to vary between different AI models.
Further well-designed high-quality research is needed from the UK and similar countries to better understand the effectiveness of AI in cancer diagnosis.
Economic considerations
In theory, it might be possible for AI to assist with earlier diagnosis of cancer with both health and economic benefits.
There is little evidence on the cost-effectiveness of using AI for cancer diagnosis. One modelling paper from the United States (US) suggests that using AI in lung cancer screening with low-dose computerised tomography (CT) scans can be cost-effective, up to a cost of $1,240 per patient screened.
The UK (and its constituent countries) perform consistently poorly against European and international comparators in terms of cancer survival rates. Cancer screening was suspended and routine diagnostic work deferred in the UK during the COVID-19 pandemic.
The cost of cancer to the UK economy in 2019 was estimated to be at least £1.4 billion a year in lost wages and benefits alone. When widening the perspective to include mortality, this figure rises to £7.6 billion a year. Pro-rating both figures to the Welsh economy and adjusting for inflation gives figures of £79 million and £429 million per annum respectively.
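The review does not state the Welsh share or the inflation index used for the pro-rating. As a rough consistency check, the sketch below back-calculates the combined scaling factor (Welsh share multiplied by the inflation uplift) implied by each pair of figures quoted above. The variable names are illustrative, and the factor is an inference from the quoted numbers, not a value reported by the authors.

```python
# Figures quoted in the text, expressed in GBP millions per year.
uk_lost_wages_gbp_m = 1_400.0       # UK, lost wages and benefits (2019 estimate)
uk_with_mortality_gbp_m = 7_600.0   # UK, widened to include mortality
wales_lost_wages_gbp_m = 79.0       # Wales, pro-rated and inflation-adjusted
wales_with_mortality_gbp_m = 429.0  # Wales, pro-rated and inflation-adjusted

# Combined factor = (Welsh share of the UK economy) x (inflation uplift).
# Only the product can be recovered from the quoted figures, not its parts.
factor_lost_wages = wales_lost_wages_gbp_m / uk_lost_wages_gbp_m
factor_with_mortality = wales_with_mortality_gbp_m / uk_with_mortality_gbp_m

print(f"Implied factor (lost wages):     {factor_lost_wages:.4f}")
print(f"Implied factor (with mortality): {factor_with_mortality:.4f}")
```

Both ratios come out at roughly 5.6%, which suggests that the same scaling was applied to each figure and that the small difference between them reflects rounding.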
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The Public Health Wales Observatory was funded for this work by the Health and Care Research Wales Evidence Centre, itself funded by Health and Care Research Wales on behalf of Welsh Government.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Abbreviations
- Acronym: Full Description
- AI: Artificial intelligence
- AIS: Artificially Intelligent Systems
- AUC: Area Under The Curve
- BI-RADS: Breast Imaging Reporting and Data System
- bpMRI: Biparametric Magnetic Resonance Imaging
- BPN: Back Propagation Neural Networks
- CAD: Computer Aided Diagnosis
- CANARY: Computer-Aided Nodule Assessment and Risk Yield
- CBCT: Cone-beam Computed Tomography
- CE: Conformité Européenne
- CI: Confidence Interval
- CL-bpMRI: Conventional Biparametric Magnetic Resonance Imaging
- CNN: Convolutional Neural Network
- CQC: Care Quality Commission
- CsPCa: Clinically significant prostate cancer
- CT: Computed Tomography
- DBT: Digital Breast Tomosynthesis
- DCE-MRI: Dynamic Contrast Material–Enhanced Magnetic Resonance Imaging
- DCNN: Deep Convolutional Neural Network
- DL: Deep learning
- DLCAD: Deep Learning Computer Aided Diagnosis Software
- DLCNN: Deep Learning Convolutional Neural Network
- DL-bpMRI: Deep Learning-Accelerated Biparametric Magnetic Resonance Imaging
- DNN: Deep Neural Network
- DRE: Digital Rectal Examination
- FDA: The United States Food and Drug Administration
- GAN: Generative Adversarial Networks
- kNN: k-Nearest Neighbour
- LCP-CNN: Lung Cancer Prediction Convolutional Neural Network
- ML: Machine Learning
- MRI: Magnetic Resonance Imaging
- mpMRI: Multiparametric Magnetic Resonance Imaging
- mRMR: Minimum Redundancy Maximum Relevance
- NHS: National Health Service
- NICE: The National Institute for Health and Care Excellence
- NPV: Negative Predictive Value
- PCa: Prostate cancer
- PI-QUAL: Prostate Imaging Quality
- PI-RADS: Prostate Imaging Reporting & Data System
- PPV: Positive Predictive Value
- PSA: Prostate-Specific Antigen
- ROC: Receiver Operating Characteristics
- ROC-AUC: Area Under The Receiver Operating Characteristic curve
- ROI: Region of Interest
- RR: Rapid Review
- SD: Standard Deviation
- SVM: Support Vector Machine
- UK: United Kingdom
- USA: United States of America