Abstract
Background Traditional clinical assessments often lack individualization, relying on standardized procedures that may not accommodate the diverse needs of patients, especially in early stages where personalized diagnosis could offer significant benefits. We aim to provide a machine-learning framework that addresses the individualized feature addition problem and enhances diagnostic accuracy for clinical assessments.
Methods The Individualized Clinical Assessment Recommendation System (iCARE) employs locally weighted logistic regression and Shapley Additive Explanations (SHAP) value analysis to tailor feature selection to individual patient characteristics. Evaluations were conducted on synthetic and real-world datasets, including early-stage diabetes risk prediction and heart failure clinical records from the UCI Machine Learning Repository. We compared iCARE with a Global approach for selecting the best additional features, using statistical analysis of accuracy and area under the ROC curve (AUC).
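The idea of combining a locally weighted logistic regression with SHAP-style attributions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `tau`, the gradient-ascent fit, the toy data generator, and all function names are hypothetical. The local model here is fitted on the candidate features only, and the attribution uses the closed form for a linear model with independent features, phi_j = beta_j * (x_j - E[x_j]), as a stand-in for a full SHAP analysis.

```python
import math
import random

def kernel_weights(X_known, query_known, tau=0.5):
    """Gaussian kernel weight of each training patient relative to the query patient."""
    weights = []
    for row in X_known:
        d2 = sum((a - b) ** 2 for a, b in zip(row, query_known))
        weights.append(math.exp(-d2 / (2.0 * tau ** 2)))
    return weights

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_logreg(X, y, w, lr=0.5, epochs=300, l2=1e-3):
    """Weighted logistic regression by batch gradient ascent; returns (bias, coefs)."""
    n_feat = len(X[0])
    total = sum(w)
    bias, coefs = 0.0, [0.0] * n_feat
    for _ in range(epochs):
        g_bias, g_coefs = 0.0, [0.0] * n_feat
        for xi, yi, wi in zip(X, y, w):
            pred = sigmoid(bias + sum(c * v for c, v in zip(coefs, xi)))
            err = wi * (yi - pred)
            g_bias += err
            for j, v in enumerate(xi):
                g_coefs[j] += err * v
        bias += lr * g_bias / total
        coefs = [c + lr * (g / total - l2 * c) for c, g in zip(coefs, g_coefs)]
    return bias, coefs

def rank_candidates(X_known, X_cand, y, query_known, tau=0.5):
    """Rank candidate features by locally weighted mean |beta_j * (x_j - mean_j)|."""
    w = kernel_weights(X_known, query_known, tau)
    _, coefs = fit_weighted_logreg(X_cand, y, w)
    total = sum(w)
    scores = []
    for j, c in enumerate(coefs):
        mean_j = sum(wi * row[j] for wi, row in zip(w, X_cand)) / total
        contrib = sum(wi * abs(c * (row[j] - mean_j)) for wi, row in zip(w, X_cand))
        scores.append(contrib / total)
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return order, scores

def make_toy_data(n=200, seed=0):
    """Hypothetical data: which candidate feature is informative depends on the known one."""
    rng = random.Random(seed)
    X_known, X_cand, y = [], [], []
    for i in range(n):
        side = -1.0 if i % 2 == 0 else 1.0
        known = side + rng.gauss(0.0, 0.1)
        c0, c1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        informative = c0 if side < 0 else c1
        X_known.append([known])
        X_cand.append([c0, c1])
        y.append(1 if informative > 0 else 0)
    return X_known, X_cand, y

X_known, X_cand, y = make_toy_data()
order_a, scores_a = rank_candidates(X_known, X_cand, y, query_known=[-1.0])
order_b, scores_b = rank_candidates(X_known, X_cand, y, query_known=[1.0])
print("patient A candidate ranking:", order_a)
print("patient B candidate ranking:", order_b)
```

In this toy setting the two query patients differ only in their known feature, yet the local fits recommend different additional features for them. A single global model fitted with uniform weights would have to assign one ranking to every patient.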
Findings The iCARE framework enhances predictive accuracy and AUC when additional features exhibit distinct predictive capabilities, as evidenced by synthetic datasets 1-3 and the early diabetes dataset. Specifically, in synthetic dataset 1, iCARE achieved an accuracy of 0·999 and an AUC of 1·000, outperforming the Global approach, which achieved an accuracy of 0·689 and an AUC of 0·639. In the early diabetes dataset, iCARE showed improvements of 1·5-3·5% in accuracy and AUC across different numbers of initial features. Conversely, in synthetic datasets 4-5 and the heart failure dataset, where features lack discernible predictive distinctions, iCARE showed no significant advantage over the Global approach in accuracy or AUC.
Interpretation iCARE provides personalized feature recommendations that enhance diagnostic accuracy in scenarios where individualized approaches are critical, improving the precision and effectiveness of medical diagnoses.
Funding This work was supported by startup funding from the Department of Psychology at the University of Kansas provided to A.A., and the R01MH125740 award from NIH partially supported J.M.G.’s work.
Competing Interest Statement
The authors have declared no competing interest.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used ONLY openly available human data that were originally located at: 1. https://doi.org/10.24432/C5VG8H 2. https://doi.org/10.24432/C5Z89R 3. https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Section 3 (Findings and Interpretation) now includes a new result, Comparisons with Other Frameworks, which compares the feature selection of iCARE with that of an existing framework using the F1 metric.
Data Availability
All synthetic data generation procedures are described in the supplementary document. The real-world datasets are available online at https://doi.org/10.24432/C5VG8H, https://doi.org/10.24432/C5Z89R, and https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset.