PT - JOURNAL ARTICLE
AU - Chowdhury, Mohammad Mihrab
AU - Ayon, Ragib Shahariar
AU - Hossain, Md Sakhawat
TI - Diabetes Diagnosis through Machine Learning: Investigating Algorithms and Data Augmentation for Class Imbalanced BRFSS Dataset
AID - 10.1101/2023.10.18.23292250
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.10.18.23292250
4099 - http://medrxiv.org/content/early/2023/10/19/2023.10.18.23292250.short
4100 - http://medrxiv.org/content/early/2023/10/19/2023.10.18.23292250.full
AB - Diabetes is a prevalent chronic condition that poses significant challenges to early diagnosis and identifying at-risk individuals. Machine learning plays a crucial role in diabetes detection by leveraging its ability to process large volumes of data and identify complex patterns. However, imbalanced data, where the number of diabetic cases is substantially smaller than non-diabetic cases, complicates the identification of individuals with diabetes using machine learning algorithms. Our study focuses on predicting whether a person is at risk of diabetes, considering the individual’s health and socio-economic conditions while mitigating the challenges posed by imbalanced data. To minimize the impact of imbalanced data, we employed several data augmentation techniques such as oversampling (SMOTE-N), undersampling (ENN), and hybrid sampling techniques (SMOTE-Tomek and SMOTE-ENN) on training data before applying machine learning algorithms. Our study sheds light on the significance of carefully utilizing data augmentation techniques, without any data leakage, in enhancing the effectiveness of machine learning algorithms.
Moreover, it offers a complete machine learning framework for healthcare practitioners, from data acquisition to ML prediction, enabling them to develop data-informed strategies.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study did not receive any funding.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: https://www.cdc.gov/brfss/index.html
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes