Abstract
Introduction We propose an Explainable AI (XAI) model for Clinical Decision Support Systems (CDSSs). It supports physicians’ Differential Diagnosis (DDx) with Evidence-based Medicine (EBM). It identifies the instances of the case data that contribute to the predicted diseases. Each instance of case data is linked to the medical literature from which it was sourced. Therefore, this model can provide medical professionals with evidence for the predicted diseases.
Methods The source of the case data (training data) is medical literature. The prediction model (the main model) uses a Neural Network (NN) + Learning To Rank (LTR). Physicians’ DDx and machines’ LTR are remarkably similar. The XAI model (the surrogate model) is the k-Nearest Neighbors Surrogate model (k-NN Surrogate model). The k-NN Surrogate model combines Example-based explanations, a local surrogate model, and k-Nearest Neighbors (k-NN). The requirements of XAI for CDSS and the features of this XAI model match remarkably well. To improve the surrogate model’s performance, it performs “Selecting its data closest to the main model.” We evaluated the prediction and XAI performance of the models.
Results With “Selecting,” the surrogate model’s prediction and XAI performance are higher than those of the “standalone” surrogate model.
Conclusions The k-NN Surrogate model is a useful XAI model for CDSS. For CDSSs with similar aims and features, the k-NN Surrogate model is helpful and easy to implement. The k-NN Surrogate model is an Evidence-based XAI for CDSSs.
Unlike current commercial Large Language Models (LLMs), our CDSS shows evidence of predicted diseases to medical professionals.
Introduction
Clinical Decision Support System
Clinical Decision Support Systems (CDSSs) aim to improve medical care quality. CDSSs support medical decisions with clinical knowledge, patient information, and other medical information. (1)
The CDSS of this article is summarized as follows:
– Objectives
– To support Differential Diagnosis (DDx) by physicians and general practitioners.
– To prevent diagnostic errors for medical professionals.
– To reduce diagnostic uncertainty.
– Medical literature, metadata, and case data
Medical literature (textbooks, original articles, case reports, etc.) is a source of metadata and case data.
Metadata (bibliographic and disease-related information) is obtained from medical literature by text-mining.
Case data (symptoms and diseases) is obtained from the metadata by coding. It is used for training and testing the models.
Symptoms include signs and symptoms, laboratory and imaging test results, etc. (defined as “symptoms” from now on).
Diseases include confirmed diseases, differential diseases, and their scores.
– Prediction model
A medical professional inputs the patient’s symptoms.
The CDSS outputs diseases predicted by Artificial Intelligence (AI).
The predicted diseases are a ranked list of diseases. (2)
– XAI model
Our CDSS is open to medical professionals on the Internet. (5) (See: Fig 1, S Table 1)
Fig 1. Screen image of our Clinical Decision Support System. Case citation: (6)
CDSSs with similar aims and features are available worldwide. (7), (8), (9), (10)
Clinical Decision Support System and Learning To Rank
The prediction model of our CDSS uses Neural Network (NN) + Learning To Rank (LTR).
Physicians’ DDx and machines’ LTR are remarkably similar. (2)
Explainable AI
Explainable AI (XAI) aims to help humans accept the AI’s behaviors.
The key points are interpretability and explainability. (11)
XAI has a variety of methods. (12), (13), (See: S Table 2)
Clinical Decision Support System and Explainable AI
Poor interpretability and explainability of CDSS are significant problems in medical ethics. These problems have severe and far-reaching consequences for personal and public health. (14)
The XAI for CDSS has a variety of requirements. (15), (See: S Table 3)
XAI research in medicine is focused on image diagnosis and less on differential diagnosis. (16)
Research on XAI with CDSS for a single disease has been reported for “Type 1 diabetes + SHAPE.” (17) Little research on XAI with CDSS for multiple diseases has been reported.
To our knowledge, no research on XAI to support Evidence-Based Medicine (EBM) has been reported.
This manuscript proposes the Evidence-based XAI model for CDSSs.
This model supports the physicians’ DDx with evidence.
Design
Objective
The aims of this article’s XAI model are as follows:
– Requests from medical professionals:
Show evidence of predicted diseases by CDSS.
– Response from the XAI:
The evidence is the case data obtained from the medical literature.
Technical background for XAI
The AI techniques used for implementation are as follows:
Example-based explanations
Surrogate model
K-Nearest Neighbors
Based on the XAI requirements for CDSS, we selected these techniques.
Example-based explanations
Example-based explanations explain each individual prediction.
The explanation consists of the instances of the training data identified as contributing to that prediction. (12) The features of Example-based explanations are as follows: (12)
The simplest XAI method, with interpretability and explainability
Explain predicted data by identifying instances of the training data contributing to it
Mostly model-agnostic
Available only if a human can understand the instance’s contents
The case data of the CDSS is obtained from medical literature.
If the XAI model can identify instances of the training data (= case data), it can show evidence of the predicted data (= predicted diseases).
However, the Neural Network (NN) used in the CDSS’s prediction model is not suited to Example-based explanations.
Surrogate model
The (local) surrogate model explains each individual prediction of the main model. (12) The differences between the main and surrogate models are as follows: (12)
The main model is often uninterpretable (black box).
The surrogate model is an interpretable and explainable (white box) model.
The prediction performance of the surrogate model is lower than that of the main model.
The surrogate model’s predicted data is not identical to that of the main model.
K-Nearest Neighbors
The features of the k-Nearest Neighbors (k-NN) are as follows:
The simplest AI method, with interpretability and explainability
Support Example-based Explanations by instances of the training data
Wide adaptability
Non-parametric supervised learning method
Pattern matching
Various distance and evaluation functions
High-dimensional data (ex: varieties of symptoms and diseases)
Scarce training data (ex: rare diseases and cases)
Prediction performance is lower than that of the NN
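As a minimal sketch of how k-NN supports Example-based explanations, the following snippet returns the nearest training instances together with their identifiers. The binary symptom vectors, the Hamming distance, and the case IDs are illustrative assumptions and are not the data or the distance function of the CDSS.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two binary symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cases(
    query: Sequence[int],
    cases: Sequence[Tuple[str, Sequence[int]]],
    k: int,
) -> List[Tuple[str, int]]:
    """Return the k training cases closest to the query as (case_id, distance)."""
    scored = [(case_id, hamming(query, vector)) for case_id, vector in cases]
    scored.sort(key=lambda item: item[1])  # stable sort: ties keep input order
    return scored[:k]


# Hypothetical training cases: (case ID, binary symptom vector).
training_cases = [
    ("case-001", [1, 0, 1, 0]),
    ("case-002", [1, 1, 1, 0]),
    ("case-003", [0, 0, 0, 1]),
]

# The returned case IDs are the instances that would be shown as examples.
neighbors = nearest_cases([1, 0, 1, 1], training_cases, k=2)
```

Because the neighbors are actual training instances, each one can be traced back to its source, which is what makes the explanation readable by a human.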
k-Nearest Neighbors Surrogate model for Clinical Decision Support System
We propose the k-Nearest Neighbors Surrogate model (k-NN Surrogate model) as the XAI model for CDSS.
The requirements of CDSS and the features of k-NN match remarkably well, except for k-NN’s prediction performance.
The requirements of XAI for CDSS and the features of Example-based explanations also match remarkably well. The k-NN Surrogate model combines Example-based explanations, a local surrogate model, and k-NN.
It improves prediction and XAI performance by “Selecting its data closest to the main model.”
In a typical k-NN use case, the number of neighbors (k) is selected by hyperparameter optimization during the design phase. The value of k is fixed during the prediction phase.
In the k-NN Surrogate model, the value of k changes for each prediction. The model makes multiple predictions with multiple numbers of neighbors (ex: k = 1 to 10). The closest value of k (k_closest) is selected by the value of the evaluation function between the main model’s predicted data and each of the surrogate model’s predicted data. This process brings the surrogate model’s predicted data as close as possible to the main model’s predicted data (defined as “Selecting” from now on).
With “Selecting,” the surrogate model’s data (the predicted data and their instances) are good surrogates for the main model. The k-NN Surrogate model can therefore show evidence of predicted diseases to medical professionals.
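The “Selecting” step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the surrogate prediction is assumed to be the average of the disease-score vectors of the k nearest cases, the agreement with the main model is assumed to be NDCG with linear gain (using the main model's scores as relevance), and all names (`select_k`, `knn_predict`, `neighbor_scores`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain (an assumption)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """NDCG of the ranking induced by `predicted`, with `reference` as relevance."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ideal = dcg(sorted(reference, reverse=True))
    return dcg([reference[i] for i in order]) / ideal if ideal > 0 else 0.0


def knn_predict(neighbor_scores: Sequence[Sequence[float]], k: int) -> List[float]:
    """Average the disease-score vectors of the k nearest cases (nearest first)."""
    return [sum(column) / k for column in zip(*neighbor_scores[:k])]


def select_k(
    main_pred: Sequence[float],
    neighbor_scores: Sequence[Sequence[float]],
    k_max: int,
) -> Tuple[int, float, List[float]]:
    """Try k = 1..k_max and keep the k whose prediction is closest to the main model."""
    best: Optional[Tuple[int, float, List[float]]] = None
    for k in range(1, min(k_max, len(neighbor_scores)) + 1):
        surrogate_pred = knn_predict(neighbor_scores, k)
        agreement = ndcg(main_pred, surrogate_pred)
        if best is None or agreement > best[1]:  # ties keep the smaller k
            best = (k, agreement, surrogate_pred)
    if best is None:
        raise ValueError("at least one neighbor is required")
    return best


# Hypothetical scores for three diseases.
main_pred = [0.7, 0.2, 0.1]          # main model (NN) output
neighbor_scores = [                  # disease scores of the nearest cases
    [0.2, 0.8, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
k_closest, agreement, surrogate_pred = select_k(main_pred, neighbor_scores, k_max=3)
```

In this sketch, the first `k_closest` neighbors are the instances that would be presented as evidence for the prediction.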
Implementation
Definitions of data (See: Table 1)
Medical literature, metadata, and case data
Medical literature
Medical literature is a source of metadata and case data. It includes medical textbooks, original articles, case reports, etc. It is selected carefully for the CDSS’s purpose and targets. It is limited to clearly sourced, peer-reviewed content.
Metadata
Metadata is obtained from medical literature by text-mining. It includes bibliographic, disease-related, and other information.
Case data
Case data is obtained from the metadata by coding. It is used as training and test data for models. It includes symptoms and diseases.
Symptoms include signs and symptoms, laboratory and imaging test results, etc. Diseases include confirmed diseases, differential diseases, and their scores.
The Case ID identifies and cross-references the metadata and the case data. (See: Table 2, Fig 2)
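The cross-reference by Case ID can be pictured with a small data-structure sketch. The field names, the `source_of` helper, and the sample values are hypothetical placeholders; the actual schema is the one defined in Table 2.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Metadata:
    case_id: str
    source: str  # bibliographic information (placeholder text in this sketch)


@dataclass(frozen=True)
class CaseData:
    case_id: str
    symptoms: List[str]
    diseases: Dict[str, float]  # disease name -> score


def source_of(case: CaseData, metadata_by_id: Dict[str, Metadata]) -> str:
    """Follow the shared Case ID from a case-data instance to its literature source."""
    return metadata_by_id[case.case_id].source


# Placeholder records; no real citation is implied.
metadata_by_id = {
    "case-001": Metadata("case-001", "placeholder citation for case-001"),
}
case = CaseData("case-001", ["fever", "cough"], {"disease A": 1.0, "disease B": 0.5})
evidence = source_of(case, metadata_by_id)
```

Keeping the Case ID on both records is what lets an instance identified by the XAI model be shown together with the literature it came from.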
Environments for development and execution
The evaluation function for the main (NN) and surrogate (k-NN) models is the Normalized Discounted Cumulative Gain (NDCG).
Fig 2. Activity diagram of text-mining and coding
The loss function for the main model (NN) is the approximate NDCG loss (ANDCG). (19) (See: Table 3, Table 4)
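For reference, one common definition of NDCG is given below. The cutoff p, the relevance r_i of the disease at rank i, and the gain function G are notation introduced here; the text does not state which gain is used, and both G(r) = r and G(r) = 2^r - 1 appear in practice.

```latex
\mathrm{DCG@}p = \sum_{i=1}^{p} \frac{G(r_i)}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}p = \frac{\mathrm{DCG@}p}{\mathrm{IDCG@}p}
```

Here IDCG@p is the DCG@p of the ideal ordering, in which the diseases are sorted by decreasing relevance, so NDCG@p lies between 0 and 1 and equals 1 for a perfect ranking.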
Training of the models
The main and surrogate models use the same datasets. (See: Fig 3)
Fig 3. Activity diagram of training (See: S Code 1)
Evaluation
The datasets for evaluation are produced by k-fold cross-validation of the case data. (See: Fig 6)
Fig 4. Activity diagram of prediction (See: S Code 1)
Fig 5. Activity diagram of explanation (See: S Code 1)
Fig 6. Activity diagram of evaluation (See: S Code 1)
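The k-fold split of the case data can be sketched as follows. This is a generic illustration, not the code of S Code 1: the function name and the parameter `n_folds` are hypothetical (named so as not to be confused with the k of k-NN), and shuffling and stratification are omitted.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_cases: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) per fold; fold sizes differ by at most one."""
    if not 2 <= n_folds <= n_cases:
        raise ValueError("n_folds must be between 2 and n_cases")
    indices = list(range(n_cases))
    base, extra = divmod(n_cases, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Ten hypothetical cases split into three folds.
folds = list(k_fold_indices(n_cases=10, n_folds=3))
```

Each case appears in exactly one test fold, so both the main and the surrogate model can be trained on the remaining cases of a fold and evaluated on the held-out ones.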
Results and Discussion
Prediction performance
The prediction performance of the main and surrogate models was evaluated. (See: Table 5)
XAI performance
The XAI performance of the surrogate model was evaluated. (See: Table 6)
Conclusions
The k-NN Surrogate model is a useful XAI model for CDSS.
It combines Example-based explanations, a local surrogate model, and k-NN.
k-NN is well suited to high-dimensional data (ex: varieties of symptoms and diseases) and scarce case data (ex: rare diseases and cases).
Our CDSS’s case data correspond directly to the evidence in medical literature.
For CDSSs with similar aims and features, Example-based explanations are helpful and easy to implement. The k-NN Surrogate model is an Evidence-based XAI for CDSS.
Uncertainty Quantification (UQ) is an essential issue for AI and CDSS. (20), (21) k-NN is well suited to Conformal Prediction (CP), one of the UQ methods. (22), (23) The k-NN Surrogate model can also contribute to improvements in UQ.
Unlike current commercial Large Language Models (LLMs), our CDSS shows evidence of the predicted diseases to medical professionals.
This emphasis on evidence can give medical professionals reassurance and confidence in the system’s output. XAI and the k-NN Surrogate model are beneficial and can substantially improve CDSSs.
Additional Information
Author Approval
This manuscript has been seen and approved by all listed authors. This manuscript has not been accepted or published by a journal.
Competing Interests
The authors have declared no competing interest.
Declarations
I have the right to post this manuscript and confirm that all authors have assented to posting of the manuscript and inclusion as authors.
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. The study does not describe the use of any human data, samples, or any research involving human subjects.
I confirm that all necessary patient/participant consent (including consent to publish) has been obtained and the appropriate institutional forms have been archived. I confirm that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided.
I am legally responsible for the content of the article.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Funding Statement
This study did not receive any funding.
Supporting information
S Table 1 Example of predicted diseases
Filename: S_table_01_Example_of_the_predicted_diseases.pdf
Case citation: (6)
Notes: The predicted diseases of Fig 1.
S Table 2 List of Explainable Artificial Intelligence methods
S Table 3 Requirements of Explainable Artificial Intelligence for Clinical Decision Support System
Filename: S_table_03_Requirements_of_XAI_for_CDSS.pdf
Citation: (15)
S Code 1 Pseudo code for k-Nearest Neighbors Surrogate model
Filename: S_code_01_Pseudo_code.pdf
Notes1: The code of the activity diagrams (Fig 3, Fig 4, Fig 5, Fig 6).
Notes2: No guarantee of executability.
Acknowledgements
Not applicable.
Footnotes
Fixing errors in figure and table references. Polishing the text.