ABSTRACT
Background Diabetes is a serious and progressive medical condition that demands efficient diagnostic methods, particularly because its symptoms overlap with those of other medical conditions. While various studies have explored early detection of diabetes across different age groups, few have focused specifically on middle-aged adults. This study focused on this demographic, aiming to assess associations between symptoms and diabetes status, investigate the relevance and relative influence of symptomatic and demographic features in the prediction of diabetes, and identify the most efficient machine learning (ML) model for predicting diabetes.
Methods Using a dataset from a previous study conducted at the Sylhet Diabetes Hospital in Sylhet, Bangladesh, comprising 520 participants, both diabetic and non-diabetic, we extracted and analyzed demographic and symptom-related information from 296 middle-aged adults aged 40 to 60 years. We evaluated symptom-diabetes associations with chi-square tests and used the Boruta algorithm to investigate symptom importance and influence. Seven ML models, namely K-Nearest Neighbor (KNN), Naïve Bayes (NB) classifier, Support Vector Machines with linear, polynomial, and radial basis function kernels, Random Forest (RF) classifier, and Logistic Regression, were then assessed for optimal predictive performance.
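As an illustration of the chi-square step, the following minimal sketch computes the Pearson statistic and p-value for one symptom against diabetes status. The counts are hypothetical and are not taken from the study data. The function name `chi_square_2x2` is an assumption, and no continuity correction is applied, which may differ from the procedure used in the study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    `table` is [[a, b], [c, d]], e.g. rows = symptom present/absent and
    columns = diabetic/non-diabetic. No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of symptom and diabetes status.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only (not the study's data).
stat, p_value = chi_square_2x2([[30, 10], [10, 30]])
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

A small p-value in such a test would suggest that the symptom and diabetes status are not independent in the sample.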
Results Out of the 296 participants of this study, 179 (60%) were diabetic. Significant associations were found between diabetes status in middle-aged adults and symptoms such as polyuria, polydipsia, weakness, sudden weight loss, partial paresis, polyphagia, and visual blurring, as confirmed by the p-values of their respective chi-square tests. All features studied, including demographics and symptoms, were confirmed as relevant for predicting diabetes in middle-aged adults. Notably, polyuria, polydipsia, gender, alopecia, irritability, and sudden weight loss were identified as the most influential features. Among the seven ML models, RF showed the highest sensitivity (98.59%), while KNN excelled in specificity (97.83%). RF demonstrated the best accuracy (96.58%) and area under the curve score (96.00%), making it the most efficient ML model for predicting diabetes among middle-aged adults.
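For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from confusion-matrix counts. The counts are hypothetical and do not reproduce the study's results; the function name `classification_metrics` is an assumption introduced for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of diabetic patients correctly identified
    specificity = tn / (tn + fp)  # share of non-diabetic patients correctly identified
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # share of all patients correctly classified
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not the study's results).
sens, spec, acc = classification_metrics(tp=45, fn=5, fp=10, tn=40)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Because sensitivity and specificity are computed on different subsets of patients, one model can lead on sensitivity while another leads on specificity.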
Conclusion The findings of this study highlight the importance of using diabetes-related symptoms for early detection of diabetes within the middle-aged adult population. The RF model demonstrated robust diagnostic capabilities, underscoring its potential for predicting diabetes in middle-aged adults. Further exploration of genetic, lifestyle, and environmental factors is warranted to improve understanding and diagnostic accuracy in this demographic.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used ONLY openly available human data that were originally located at Mendeley Data (https://doi.org/10.17632/7zcc8v6hvp.1)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced are available online at https://doi.org/10.17632/7zcc8v6hvp.1