PT - JOURNAL ARTICLE
AU - Madewell, Zachary J.
AU - Rodriguez, Dania M.
AU - Thayer, Maile B.
AU - Rivera-Amill, Vanessa
AU - Aponte, Jomil Torres
AU - Marzan-Rodriguez, Melissa
AU - Paz-Bailey, Gabriela
AU - Adams, Laura E.
AU - Wong, Joshua M.
TI - Machine learning for improved dengue diagnosis, Puerto Rico
AID - 10.1101/2024.11.13.24317272
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.11.13.24317272
4099 - http://medrxiv.org/content/early/2024/11/13/2024.11.13.24317272.short
4100 - http://medrxiv.org/content/early/2024/11/13/2024.11.13.24317272.full
AB - Background: Diagnosing dengue accurately, especially in resource-limited settings, remains challenging due to overlapping symptoms with other febrile illnesses and limitations of current diagnostic methods. This study aimed to develop machine learning (ML) models that leverage readily available clinical data to improve diagnostic accuracy for dengue, potentially offering a more accessible and rapid diagnostic tool for healthcare providers. Methods: We used data from the Sentinel Enhanced Dengue Surveillance System (SEDSS) in Puerto Rico (May 2012 to June 2024). SEDSS primarily targets acute febrile illness but also includes cases with other symptoms during outbreaks (e.g., Zika and COVID-19). ML models (logistic regression, random forest, support vector machine, artificial neural network, adaptive boosting, light gradient boosting machine [LightGBM], and extreme gradient boosting [XGBoost]) were evaluated across different feature sets, including demographic, clinical, laboratory, and epidemiological variables.
Model performance was assessed using the area under the receiver operating characteristic curve (AUC), where higher AUC values indicate better performance in distinguishing dengue cases from non-dengue cases. Results: Among 49,679 patients in SEDSS, 1,640 laboratory-confirmed dengue cases were identified. The XGBoost and LightGBM models achieved the highest diagnostic accuracy, with AUCs exceeding 90%, particularly with comprehensive feature sets. Incorporating predictors such as monthly dengue incidence, leukopenia, thrombocytopenia, rash, age, and absence of nasal discharge significantly enhanced model sensitivity and specificity for diagnosing dengue. Adding more relevant clinical and epidemiological features consistently improved the models’ ability to correctly identify dengue cases. Conclusions: ML models, especially XGBoost and LightGBM, show promise for improving diagnostic accuracy for dengue using widely accessible clinical data, even in resource-limited settings. Future research should focus on developing user-friendly tools, such as mobile apps, web-based platforms, or clinical decision systems integrated into electronic health records, to implement these models in clinical practice, and on exploring their application for predicting dengue. Author summary: Dengue is a tropical disease caused by the dengue virus, which is transmitted by mosquitoes. It affects millions of people worldwide every year, leading to severe illness and even death in some cases. Accurate and timely diagnosis of dengue is crucial for proper treatment and controlling the spread of the virus. Traditionally, diagnosing dengue relies on symptoms and laboratory tests, which can be non-specific or not immediately available, making it hard to distinguish dengue from other similar illnesses. In our study, we explored the use of machine learning, a type of artificial intelligence, to improve dengue diagnosis using patient information from Puerto Rico.
Our models, which use information like age, symptoms, and specific blood cell counts, can accurately predict whether someone has dengue. We found that some simple information, like whether a patient has a rash or low blood cell counts, can be very helpful in making a diagnosis. While more complex models performed slightly better, simpler models can also be effective, especially in places with limited resources. Our study shows that using computer models can improve dengue diagnosis and help healthcare providers make better decisions for their patients. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This research was funded by Centers for Disease Control and Prevention, grant numbers U01CK000473 and U01CK000580 (VRA). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Institutional Review Boards at the Centers for Disease Control and Prevention (CDC), Auxilio Mutuo, and Ponce Medical School Foundation approved the SEDSS study protocols 6214, and 120308-VR/2311173707, respectively. Written consent to participate was obtained from all adult participants and emancipated minors.
For minors aged 14 to 20 years, written consent was obtained, and for those aged 7 to 13 years, parental written consent and participant assent were obtained. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. Data cannot be shared publicly because data cannot be deidentified at the granular level of analyses performed. Data are available from the CDC and PHSU study management team (contact: dengue@cdc.gov) for researchers who meet the criteria for access to confidential data.
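
Illustrative note on the AUC metric named in the abstract: the sketch below shows one way the area under the ROC curve can be computed, as the fraction of (dengue, non-dengue) patient pairs in which the dengue case receives the higher predicted score, with ties counted as one half. It is not the authors' code. The function name `roc_auc` and the toy labels and scores are hypothetical, invented purely for illustration, and bear no relation to the SEDSS data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation of the AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = laboratory-confirmed dengue, 0 = non-dengue.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical model-predicted probabilities of dengue for the same patients.
toy_scores = [0.90, 0.20, 0.65, 0.70, 0.10, 0.80, 0.30, 0.05]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of patients and is meant only to make the definition concrete. For a dataset of the size described above, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.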