ABSTRACT
Introduction Infectious keratitis (IK) represents the 5th leading cause of blindness worldwide. A delay in diagnosis is often a major factor in progression to irreversible visual impairment and/or blindness from IK. The diagnostic challenge is further compounded by low microbiological culture yield, long turnaround time, poorly differentiated clinical features, and polymicrobial infections. In recent years, deep learning (DL), a subfield of artificial intelligence, has rapidly emerged as a promising tool for assisting automated medical diagnosis, clinical triage and decision making, and for improving workflow efficiency in healthcare services. Recent studies have demonstrated the potential of DL in assisting the diagnosis of IK, though its diagnostic accuracy remains to be established. This systematic review and meta-analysis aims to critically examine and compare the performance of various DL models with that of clinical experts and/or microbiological results (the current “gold standard”) in diagnosing IK, to inform clinical practice on the applicability and deployment of DL-assisted diagnostic models.
Methods and analysis This review will consider studies that applied any DL model to diagnose patients with suspected IK of bacterial, fungal, protozoal and/or viral origin. We will search various electronic databases, including EMBASE and MEDLINE. There will be no restriction on language or publication date. Two independent reviewers will assess the titles, abstracts and full-text articles. Extracted data will include details of each primary study, including title, year of publication, authors, types of DL models used, populations, sample size, decision threshold, and diagnostic performance. We will perform meta-analyses of the included primary studies when there is sufficient similarity in outcome reporting.
Ethics and dissemination No ethical approval is required for this systematic review. We plan to disseminate our findings via presentation/publication in a peer-reviewed journal.
Protocol registration This systematic review protocol will be registered with PROSPERO after peer review.
STRENGTH AND LIMITATIONS OF THIS STUDY
- This study will serve as the most up-to-date systematic review and meta-analysis specifically evaluating the diagnostic performance of deep learning in infectious keratitis.
- The quality of this review will depend on the quality of the available published literature on this topic.
- This study will help identify the gaps in the current clinical evidence, which may be related to study design, quality of the research methodologies, setting of reference standard, risk of bias, and outcome reporting.
INTRODUCTION
Owing to worldwide population ageing and urbanisation, close to 900 million people are expected to have distance vision impairment by 2050, of whom 61 million will be blind.1 Infectious keratitis (IK), also commonly known as corneal infection, currently represents the 5th leading cause of blindness worldwide.2, 3 It can be caused by a wide variety of pathogens, including bacteria, fungi, protozoa, and viruses.3, 4 Once considered a “silent epidemic” in low- and middle-income countries (LMICs), IK has so far caused ∼5 million cases of blindness worldwide and is estimated to cause ∼2 million cases of monocular blindness each year, placing a significant burden on global health.3, 5 A recent meta-analysis by Brown et al.6 estimated that the global incidence of fungal keratitis alone (without accounting for other types of IK) is likely >1 million cases per year, primarily affecting populations in Africa and Asia. Previous studies have also consistently reported a disproportionately higher incidence of IK in LMICs (113-799 per 100,000 population per year) than in high-income countries (HICs; 2.5-40.3 per 100,000 population per year),3, 7, 8 which is likely attributable to an increased risk of trauma from agricultural and other occupational activities, environmental factors, the use of traditional eye medicines (which may contain pathogens), and limited access to primary and secondary eye care.3, 9-11
Patients affected by IK are often debilitated by severe ocular pain and sight loss, and some are at risk of losing the eye due to intractable infection.6, 11-14 The outcome of IK critically depends on a timely and accurate diagnosis, followed by appropriate medical and/or surgical intervention. In current clinical practice, IK is usually diagnosed on clinical grounds with support from additional tests, including microbiological investigations [e.g. smear microscopy, culture and sensitivity testing, and polymerase chain reaction (PCR)] and/or corneal imaging [e.g. in vivo confocal microscopy (IVCM)].15-17 However, these approaches face multiple challenges, including the need for clinical expertise and equipment, low microbiological culture yield, long turnaround time, poorly differentiated clinical features, and polymicrobial infections.7, 15, 18, 19 Moreover, such microbiological and imaging investigations are unavailable in many ophthalmic units in LMICs, leading to a reliance on empirical treatment. Diagnosis based on clinical features alone can result in misdiagnosis and the use of incorrect antimicrobial therapy (e.g., fungal keratitis being treated only with antibacterial agents). This delays the initiation of effective treatment, with consequently poorer clinical outcomes and a higher risk of ocular complications.
In recent years, interest in integrating artificial intelligence (AI) into clinical medicine to improve the quality of healthcare services has been reignited,20 primarily owing to advances in deep learning (DL) techniques, improvements in computing power, and the increased availability of big data.21-24 DL, a subfield of AI, has shown promise in assisting automated medical diagnosis, clinical triage and decision making, and in improving workflow efficiency in healthcare services in both developed and developing countries.22-28 Within ophthalmology, DL research has previously focussed mainly on posterior segment diseases (e.g., age-related macular degeneration, diabetic retinopathy, and glaucoma), demonstrating diagnostic accuracy comparable to, if not better than, that of healthcare professionals.22, 23, 29-31 Although several recent studies have demonstrated the potential of DL in assisting the diagnosis of IK and in distinguishing IK from other ocular diseases,32-35 the diagnostic accuracy of these DL models remains to be established.
To the best of our knowledge, no systematic review or meta-analysis has specifically evaluated the diagnostic performance of DL in IK. In view of the current diagnostic challenges of IK and the potential of DL to address these limitations, this systematic review and meta-analysis aims to critically examine and compare the performance of various DL models with that of clinical experts and/or microbiological results (the current “gold standard”) in diagnosing IK, to help inform the potential clinical applicability and deployment of these DL models.
REVIEW QUESTIONS / OBJECTIVES
The proposed systematic review aims to answer the following main questions:
What is the diagnostic accuracy of DL models in detecting and differentiating IK from healthy eyes?
What is the diagnostic performance of DL models in differentiating IK from other types of corneal or ocular diseases?
What is the diagnostic accuracy of DL models in differentiating the types of IK (e.g., bacterial keratitis vs. fungal keratitis)?
METHODS
This protocol was produced based on the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P).36 This systematic review will be conducted in accordance with the recommendations of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. We will write the resulting paper following the Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy (PRISMA-DTA)37 and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS).38
Eligibility criteria
This diagnostic accuracy systematic review will consider all relevant clinical studies, including prospective and retrospective comparative cohort studies, case-control studies, and cross-sectional studies, that examined the accuracy of DL in diagnosing any type of IK, encompassing bacterial, fungal, Acanthamoeba, and/or viral keratitis. We will exclude case reports and reviews. We will only include studies that employed corneal imaging, which may include slit-lamp photography, IVCM, anterior segment optical coherence tomography (AS-OCT), and/or corneal topography/tomography. We will exclude AI studies that used only non-imaging data and those that focused on image segmentation rather than disease classification. There will be no restriction on patients’ age, gender, ethnicity, or geographical location, nor on the number and proportion of images used at each stage of DL model development (training, validation, and testing).
Information sources and search strategy
We will search various bibliographic databases, including EMBASE (OVID), MEDLINE (OVID), and the Cochrane Central Register of Controlled Trials (CENTRAL), as well as trial registries including the ISRCTN registry (www.isrctn.com/), the US National Institutes of Health Ongoing Trials Register (https://www.clinicaltrials.gov/), and the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP), for primary research related to DL for diagnosing IK. The search strategy aims to locate both published and unpublished studies and will combine two concepts to capture relevant articles: (i) artificial intelligence and (ii) infectious keratitis. There will be no restriction on study design, date, or language. The search strategy, including all identified keywords and index terms, will be adapted for each information source. The reference lists of all eligible studies will be manually screened for additional studies. An example of the search strategy is provided in Table 1.
Table 1. A summary of the search strategy using EMBASE for studies related to artificial intelligence in diagnosing infectious keratitis.
Study selection
Following the search, all identified citations will be imported into EndNote 20 (Clarivate Analytics, PA, USA) and duplicates will be removed. Titles and abstracts will be screened against the inclusion criteria by two independent reviewers using the Rayyan AI platform (Qatar).39 The full texts of selected citations will then be assessed in detail against the inclusion criteria by two independent reviewers. Reasons for excluding full-text studies will be recorded and reported in the systematic review. Any disagreements between reviewers at each stage of study selection will be resolved through discussion or consultation with a third reviewer. The results of the search will be reported in full in the final systematic review and presented in a PRISMA-DTA flow diagram.37
Data collection and data items
Data will be extracted from the included articles by two independent reviewers using a standardised, pilot-tested data extraction tool in RevMan 5.4 (Copenhagen: The Nordic Cochrane Centre, Cochrane). The extracted data will include the names of authors, study title, year of publication, countries of study, populations (including diseased and healthy cases), demographic factors (i.e. age, gender, ethnicity), sample size, study methods, types of DL algorithms, decision threshold, types of reference standard [which may include expert consensus, microbiological results confirmed by smear microscopy, culture or PCR, and/or corneal imaging such as IVCM], and diagnostic accuracy (including sensitivity and specificity) of the index test (i.e. DL algorithms) and the comparator (i.e. non-expert healthcare professionals), if available. We will extract sufficient information to build a 2×2 contingency table at the reported threshold for each study, comprising true positives, false positives, true negatives, and false negatives, from which sensitivity and specificity will be calculated. If multiple contingency tables are reported for the same or different algorithms within a study, they will be treated as independent of each other, as the aim of this work is to provide an overview of results across studies rather than precise point estimates.28 Any disagreements will be resolved by group consensus. Authors of eligible studies will be contacted to request any missing data, where required.
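To illustrate how sensitivity and specificity follow from each extracted 2×2 table, a minimal Python sketch is given below. The counts are hypothetical and do not come from any study, the `TwoByTwo` class and function names are illustrative only, and the Wilson score interval is just one of several ways to attach a 95% confidence interval to a proportion; it is an assumption here, not necessarily the method RevMan 5.4 applies.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 table: DL index test vs the reference standard."""
    tp: int  # reference-positive IK, DL-positive
    fp: int  # reference-negative, DL-positive
    fn: int  # reference-positive IK, DL-negative
    tn: int  # reference-negative, DL-negative


def sensitivity(t: TwoByTwo) -> float:
    # Proportion of reference-positive eyes that the DL model flags as IK.
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    # Proportion of reference-negative eyes that the DL model calls non-IK.
    return t.tn / (t.tn + t.fp)


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed choice of CI)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    table = TwoByTwo(tp=90, fp=15, fn=10, tn=185)
    print("Sensitivity", sensitivity(table), wilson_ci(table.tp, table.tp + table.fn))
    print("Specificity", specificity(table), wilson_ci(table.tn, table.tn + table.fp))
```

With these made-up counts the sketch gives a sensitivity of 0.90 (Wilson 95% CI roughly 0.83 to 0.94) and a specificity of 0.925.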
Outcomes
The primary outcome will be the diagnostic accuracy of DL algorithms in distinguishing IK from healthy eyes and/or eyes with other types of corneal disease, compared with the reference standard. The diagnostic accuracy for each group will be reported as sensitivity and specificity.40 The secondary outcomes will involve a comparison of the accuracy in differentiating between types of IK and in differentiating IK from other types of corneal or ocular surface disease. For studies that focused on distinguishing IK (any type) from healthy corneas or corneas with other non-IK pathologies, the reference standard will be expert consensus and/or microbiological results. For studies that focused on differentiating the causative organisms (e.g., bacteria vs. fungi), microbiological results, or expert consensus where microscopy or culture results are unavailable, will be the reference standard. Other potential outcomes will include the accuracy of DL in predicting culture positivity and clinical outcomes of IK from the initial presenting images. These secondary outcomes may not be feasible if the included studies do not address these more specific questions, but they are nonetheless of interest.
Risk of bias assessment
Eligible studies will be critically appraised for methodological quality at the outcome level by two independent reviewers using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.41 The QUADAS-AI tool (an AI-specific extension of QUADAS-2)42 will be used if it is available when this systematic review is conducted. Specifically, we will assess the risk of bias for our primary outcomes (i.e., accuracy of DL versus the reference standard for IK). The questions in these tools are split into four domains: patient selection, index test, reference standard, and flow and timing. These domains assess the risk of bias arising from patient selection, the conduct and interpretation of the index test, the conduct and interpretation of the reference standard, and the flow and timing of the study, respectively. We will also assess whether the AI systems have been evaluated on an external validation test set. Authors will be contacted to request additional data for clarification, where necessary. Any disagreements will be resolved by discussion or by seeking the advice of a third reviewer. All studies, regardless of their methodological quality, will undergo data extraction and synthesis. The results of the critical appraisal will be reported in the final systematic review in both narrative and tabular formats.
Data synthesis and analysis
The analysis will be conducted at two levels: (1) a systematic synthesis of all eligible studies; and (2) a meta-analysis of all relevant studies with similar outcome reporting. For the meta-analysis, the intervention group (i.e., the “index test”) will refer to image-based DL algorithms for diagnosing IK or differentiating IK from other ocular diseases. The reference group will be expert consensus and/or microbiological results, also known as the gold standard or “ground truth” for the DL algorithms. The comparator group, if available, will be non-expert healthcare professionals.
Where possible, we will pool similar measures of accuracy across studies by statistical meta-analysis using RevMan 5.4. ‘Paired’ forest plots, one for sensitivity and one for specificity, will be presented side by side.43 The point estimates and 95% confidence intervals of sensitivity and specificity for each primary study will be presented alongside the numbers of true positives, false positives, true negatives, and false negatives, wherever appropriate. Summary receiver operating characteristic (SROC) curves will also be plotted using the sensitivity and specificity of each primary study. Chi-square or Fisher’s exact tests will be used to assess heterogeneity objectively, if needed.43 We expect heterogeneity in the types of DL systems and algorithms used across studies; we will consider all of them acceptable “interventions” for analysis, as our question is meant to assess the general accuracy of any DL system. In view of the anticipated between-study heterogeneity, a random-effects model will be used to estimate the pooled sensitivity and specificity of the included studies. A fixed-effect model may be used if there is no significant heterogeneity. Subgroup analyses will be conducted where there are sufficient data to investigate different types of IK and other ocular surface diseases, as per our pre-specified secondary outcomes.
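The protocol does not name a specific random-effects estimator. As a simplified illustration only, the sketch below assumes a univariate DerSimonian-Laird model applied to logit-transformed sensitivities (the same function can be applied to specificities), and reports Cochran's Q, which is compared against a chi-square distribution with k - 1 degrees of freedom, together with I² as heterogeneity summaries. Bivariate models that pool sensitivity and specificity jointly are widely used in diagnostic test accuracy reviews and are not shown here; the function name, the 0.5 continuity correction, and the example counts are all assumptions.

```python
import math


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def dersimonian_laird(events: list[int], totals: list[int]) -> dict[str, float]:
    """Pool per-study proportions (e.g. sensitivities) on the logit scale.

    Hypothetical helper: adds 0.5 to successes and failures of every study
    so that studies with 0% or 100% sensitivity stay finite.
    """
    k = len(events)
    if k < 2 or k != len(totals):
        raise ValueError("need at least two studies with matching totals")
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, (n - e) + 0.5   # corrected successes and failures
        y.append(math.log(a / b))       # logit of the proportion
        v.append(1 / a + 1 / b)         # approximate variance of the logit
    w = [1 / vi for vi in v]            # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)       # between-study variance (DL estimator)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # variation beyond chance
    w_re = [1 / (vi + tau2) for vi in v]            # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "pooled": inv_logit(y_re),
        "ci_low": inv_logit(y_re - 1.96 * se_re),
        "ci_high": inv_logit(y_re + 1.96 * se_re),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }


if __name__ == "__main__":
    # Hypothetical true positives and diseased totals from three studies.
    print(dersimonian_laird([90, 60, 75], [100, 100, 90]))
```

Back-transforming the pooled logit keeps the summary estimate and its confidence limits within 0 to 1, which is one reason the logit scale is a common choice for pooling proportions.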
Heterogeneity between studies will also be assessed visually through paired forest plots and/or SROC curves. We will evaluate potential publication bias in the pooled data using Deeks’ funnel plot asymmetry test, with p<0.05 considered indicative of significant publication bias.44, 45 The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach will be followed for grading the certainty of evidence.36 A Summary of Findings (SoF) table, created using GRADEpro software, will be presented. Where appropriate, the SoF table will include the number and type(s) of studies contributing to each outcome, the total sample size contributing to each outcome, and a rating of the certainty of the evidence based on risk of bias, heterogeneity, directness, publication bias, and precision. We will include the following outcomes in the SoF table: area under the curve (AUC), sensitivity, and specificity for IK overall (i.e., the primary outcome).
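Deeks’ test regresses each study's log diagnostic odds ratio (DOR) on 1/√ESS, where ESS is the effective sample size 4·n1·n2/(n1 + n2) for n1 diseased and n2 non-diseased participants, with ESS as the regression weight; a slope that differs from zero suggests funnel plot asymmetry. The sketch below is a hypothetical standard-library implementation, assuming a 0.5 correction for tables with any zero cell. It returns the slope, its t statistic, and the degrees of freedom; the p value would then come from a t distribution with k - 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t_stat), df)` if SciPy is available.

```python
import math


def deeks_test(tables: list[tuple[int, int, int, int]]) -> tuple[float, float, int]:
    """Deeks' funnel plot asymmetry test (illustrative sketch).

    Each table is (tp, fp, fn, tn). Fits a weighted least-squares regression of
    ln(DOR) on 1/sqrt(ESS) with weights ESS; returns (slope, t statistic, df).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn        # diseased and non-diseased group sizes
        if 0 in (tp, fp, fn, tn):
            # Assumed continuity correction so that ln(DOR) stays finite.
            tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        ys.append(math.log((tp * tn) / (fp * fn)))  # ln diagnostic odds ratio
        xs.append(1 / math.sqrt(ess))
        ws.append(ess)
    k = len(xs)
    if k < 3:
        raise ValueError("need at least three studies")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance with k - 2 degrees of freedom.
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    df = k - 2
    se_slope = math.sqrt(rss / df / sxx)
    return slope, slope / se_slope, df


if __name__ == "__main__":
    # Hypothetical tables in which smaller studies report larger DORs.
    example = [(10, 1, 1, 10), (40, 8, 10, 42), (80, 20, 20, 80), (150, 45, 50, 155)]
    print(deeks_test(example))
```

In this made-up example the smaller studies report higher DORs, so the fitted slope is positive, which is the pattern the test is designed to flag.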
Patient and public involvement
DSJT had previously involved patients who were affected by IK to help identify the research need and priority in relation to IK. Many of the patients with IK have highlighted the importance of timely and accurate diagnosis of IK as the delay in diagnosis has negatively affected their visual outcomes. This serves as one of the key reasons for conducting this systematic review and meta-analysis, which aims to improve the diagnosis of IK in clinical settings.
Clinical relevance of this systematic review
The results of this systematic review and meta-analysis will provide high-quality evidence on the diagnostic accuracy of DL in IK. This study will help identify gaps (if any) in the current clinical evidence, which may be related to study design, quality of the research methodologies, setting of the reference standard, risk of bias, and outcome reporting. Identifying these issues can help refine the design of future clinical trials evaluating the diagnostic accuracy of DL in IK in a real-world setting. These findings will also help inform clinicians, researchers, policy makers and regulatory bodies on the clinical applicability of DL in diagnosing IK, with the aim of developing more accessible investigations for IK in the future, in both HICs and LMICs.
Financial Support / Funding
MJB is supported by the Wellcome Trust (Grant number 207472/Z/17/Z). DSJT is supported by a Medical Research Council / Fight for Sight Clinical Research Fellowship (MR/T001674/1). The funders had no role in developing this protocol.
Author contributions
Study design and conceptualisation: DSJT; Development of study protocol: All authors; Data collection: ZZO, YS; Drafting of initial manuscript: ZZO, DSJT; Review of the manuscript: All authors; Guarantor of the study: DSJT
Conflict of Interest
None to declare.