Abstract
Background Chronic kidney disease (CKD) is a significant complication in people with diabetes, leading to serious adverse health outcomes and increased costs worldwide, both for individuals and for healthcare systems. The problem is more acute in low- and middle-income countries, including Rwanda, where access to early diagnostic services is limited. Early prediction and intervention can improve patient outcomes and reduce the burden on healthcare systems.
Objective This study aimed to develop and evaluate a machine learning model for predicting CKD in diabetic patients, tailored to the Rwandan population, using electronic medical record data.
Methodology Secondary data were extracted from OpenClinic, an electronic medical record (EMR) system used at Kigali University Hospital, covering a period of 10 years from 2013 to 2023. The final cleaned dataset was used to train four machine learning models: Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), and Extreme Gradient Boosting (XGBoost). XGBoost was the best performer, with an AUC score of 0.98 and an accuracy of 95.67%.
Results The findings revealed that XGBoost was highly effective in predicting chronic kidney disease, achieving an accuracy of 95.76% and an AUC score of 0.98. Given that the dataset was collected from the local population, this study confirms that machine learning algorithms can assist clinicians in Rwanda in diagnosing chronic kidney disease in its early stages.
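For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and AUC can be computed from predicted probabilities. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption, not a value taken from the study.
    """
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(preds, y_true))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data): 1 = CKD, 0 = no CKD.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1]

print(f"accuracy = {accuracy(y_true, y_prob):.4f}")
print(f"AUC      = {roc_auc(y_true, y_prob):.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.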
Conclusion This study demonstrates the potential of machine learning models in predicting chronic kidney disease (CKD) in diabetic patients, highlighting the importance of local datasets for optimizing model performance in specific populations. These findings suggest that machine learning can effectively complement existing clinical methods in the early diagnosis of CKD in Rwanda.
Author summary In this study, we trained a machine learning model to predict the risk of chronic kidney disease (CKD) in patients with diabetes, using a dataset collected in Rwanda. Early detection of CKD is crucial because it allows healthcare providers to intervene sooner, improving patient outcomes and potentially reducing the financial and health burden on patients. We cleaned the data to address quality issues and derived new features, such as anemia status and length of hospital stay, to improve the model’s predictions. The final model, XGBoost, can help health providers identify high-risk patients and plan personalized care more effectively.
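As an illustration of the kind of derived features mentioned above, the sketch below computes an anemia flag and a length of hospital stay from raw record fields. The summary does not state how either feature was defined, so everything here is an assumption: the field names are hypothetical, and the anemia cut-offs are the WHO hemoglobin thresholds for non-pregnant adults (below 13 g/dL for men, below 12 g/dL for women), used only as a plausible default.

```python
from datetime import date
from typing import Optional

# Assumed cut-offs in g/dL (WHO thresholds for non-pregnant adults);
# the study's actual definition of anemia is not given in the summary.
ANEMIA_CUTOFF_G_DL = {"M": 13.0, "F": 12.0}


def anemia_status(hemoglobin_g_dl: Optional[float], sex: str) -> Optional[int]:
    """Return 1 if hemoglobin is below the sex-specific cut-off, 0 if not,
    and None when the measurement is missing."""
    if hemoglobin_g_dl is None:
        return None
    return int(hemoglobin_g_dl < ANEMIA_CUTOFF_G_DL[sex])


def length_of_stay_days(admitted: date, discharged: date) -> int:
    """Number of days between admission and discharge."""
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    return (discharged - admitted).days


# Hypothetical record; field names are illustrative, not the OpenClinic schema.
record = {
    "sex": "F",
    "hemoglobin_g_dl": 11.5,
    "admitted": date(2020, 1, 1),
    "discharged": date(2020, 1, 8),
}

features = {
    "anemia": anemia_status(record["hemoglobin_g_dl"], record["sex"]),
    "length_of_stay": length_of_stay_days(record["admitted"], record["discharged"]),
}
print(features)
```

Returning None for a missing hemoglobin value keeps missingness explicit, so it can be handled in a separate imputation step rather than silently coded as "not anemic".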
This study highlights how data-driven solutions can support healthcare delivery in resource-limited settings by enhancing early diagnosis, especially at the primary healthcare level. By integrating this predictive tool into routine clinical workflows within the electronic medical record, healthcare institutions can make better clinical decisions that improve patient care and outcomes. This project contributes to the growing field of health informatics in Africa and shows the potential of applying advanced analytics to solve local health challenges.
Competing Interest Statement
The authors have declared no competing interest.
Clinical Trial
NA
Funding Statement
The author(s) received no specific funding for this work.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of the University of Rwanda, College of Medicine and Health Sciences, issued ethical clearance. Additionally, the ethics committee at the University Teaching Hospital granted clearance to access the data.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The dataset used to train the model in this study is available in the main author's GitHub repository. No identifying information is included in this dataset, to protect the privacy of the subjects involved.
https://github.com/rug997/Masters-Thesis/blob/main/CKD%20Analysis.ipynb