PT - JOURNAL ARTICLE
AU - Cawiding, Olive R.
AU - Lee, Sieun
AU - Jo, Hyeontae
AU - Kim, Sungmoon
AU - Suh, Sooyeon
AU - Joo, Eun Yeon
AU - Chung, Seockhoon
AU - Kim, Jae Kyoung
TI - SymScore: Machine Learning Accuracy Meets Transparency in a Symbolic Regression-Based Clinical Score Generator
AID - 10.1101/2024.10.28.24316164
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.10.28.24316164
4099 - http://medrxiv.org/content/early/2024/11/04/2024.10.28.24316164.short
4100 - http://medrxiv.org/content/early/2024/11/04/2024.10.28.24316164.full
AB - Self-report questionnaires play a crucial role in healthcare for assessing disease risks, yet their extensive length can be burdensome for respondents, potentially compromising data quality. To address this, machine learning-based shortened questionnaires have been developed. While these questionnaires possess high levels of accuracy, their practical use in clinical settings is hindered by a lack of transparency and the need for specialized machine learning expertise. This makes their integration into clinical workflows challenging and also decreases trust among healthcare professionals who prefer interpretable tools for decision-making. To preserve both predictive accuracy and interpretability, this study introduces the Symbolic Regression-Based Clinical Score Generator (SymScore). SymScore produces score tables for shortened questionnaires, which enable clinicians to estimate the results that reflect those of the original questionnaires. SymScore generates the score tables by optimally grouping responses, assigning weights based on predictive importance, imposing necessary constraints, and fitting models via symbolic regression. We compared SymScore’s performance with the machine learning-based shortened questionnaires MCQI-6 (n = 310) and SLEEPS (n = 4257), both renowned for their high accuracy in assessing sleep disorders.
SymScore’s questionnaire demonstrated comparable performance (MAE = 10.73, R² = 0.77) to that of the MCQI-6 (MAE = 9.94, R² = 0.82) and achieved AU-ROC values of 0.85-0.91 for various sleep disorders, closely matching those of SLEEPS (0.88-0.94). By generating accurate and interpretable score tables, SymScore ensures that healthcare professionals can easily explain and trust its results without specialized machine learning knowledge. Thus, SymScore advances explainable AI for healthcare by offering a user-friendly and resource-efficient alternative to machine learning-based questionnaires, supporting improved patient outcomes and workflow efficiency.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: We thank the following organizations for their support of this study: Institute for Basic Science (Institute for Basic Science Grant IBS-R029-C3 to J.K.K), Samsung Medical Center (Samsung Medical Center Grant OTC1190671 to E.Y.J), and Hyundai Motor's Chung Mong-Koo Global Scholarship (to O.R.C).
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The IRB of Samsung Medical Center (approval 2022-07-003) gave ethical approval for this work, which was conducted in accordance with the principles of the Declaration of Helsinki. Participant informed consent was waived due to the retrospective nature of the study. (SLEEPS data) The IRB of Sungshin Women's University, Seoul, South Korea (SSWUIRB-2020-009) gave ethical approval for this work. Written informed consent was waived. The survey was administered anonymously, and no personal information was gathered. The survey form was developed according to the Checklist for Reporting Results of Internet e-Surveys (CHERRIES) guidelines.
(MCQ-I data)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
All data produced in the present study are available upon reasonable request to the authors. The MCQI and SLEEPS datasets are not publicly available but are available from the corresponding author on reasonable request.