ABSTRACT
Background This study explores the utility of machine learning (ML) models in predicting complicated Ovarian Hyperstimulation Syndrome (OHSS) in patients undergoing infertility treatments, addressing the challenge posed by highly imbalanced datasets.
Objective This study fills the existing gap by introducing a detailed framework for building diverse machine learning models and refining data augmentation methods to predict complicated OHSS effectively. It also aims to identify the key factors that influence OHSS.
Method This retrospective study employed an ML framework to predict complicated OHSS in patients undergoing infertility treatment. The dataset included various patient characteristics, treatment details, ovarian response variables, oocyte quality indicators, embryonic development metrics, sperm quality assessments, and treatment specifics. The target variable was OHSS, categorized as painless, mild, moderate, or severe. The ML framework incorporated Ray Tune for hyperparameter tuning and SMOTE variants for addressing data imbalance. Multiple ML models were applied, including Decision Trees, Logistic Regression, SVM, XGBoost, LightGBM, Ridge Regression, KNN, and SGD. The models were combined in a voting classifier, which was then optimized. The SHAP package was used to interpret model outcomes and feature contributions.
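As a rough illustration of the kind of pipeline described in the Method, the sketch below combines three of the listed model families (SGD, SVM, Ridge) in a scikit-learn hard-voting ensemble trained on an oversampled training split. It is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, naive random oversampling stands in for the SMOTE variants, all hyperparameters are library defaults, and no Ray Tune search is performed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import RidgeClassifier, SGDClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (roughly 10% positives).
# ASSUMPTION: this is NOT the study's dataset.
X, y = make_classification(
    n_samples=600, n_features=12, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Naive random oversampling of the minority class, applied to the
# training split only so the test split stays untouched.
# ASSUMPTION: a simple stand-in for the SMOTE variants named in the text.
rng = np.random.default_rng(0)
minority_idx = np.flatnonzero(y_train == 1)
majority_idx = np.flatnonzero(y_train == 0)
extra = rng.choice(
    minority_idx, size=len(majority_idx) - len(minority_idx), replace=True
)
X_bal = np.vstack([X_train, X_train[extra]])
y_bal = np.concatenate([y_train, y_train[extra]])

# Hard (majority) voting is used because RidgeClassifier does not
# expose predict_proba, which soft voting would require.
ensemble = make_pipeline(
    StandardScaler(),
    VotingClassifier(
        estimators=[
            ("sgd", SGDClassifier(random_state=0)),
            ("svc", SVC(random_state=0)),
            ("ridge", RidgeClassifier()),
        ],
        voting="hard",
    ),
)
ensemble.fit(X_bal, y_bal)
y_pred = ensemble.predict(X_test)

print("recall:", recall_score(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
```

Oversampling only the training split avoids leaking duplicated minority samples into the evaluation data. The printed scores depend on the synthetic data and say nothing about the study's results.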
Results The best model combined IPADE-ID augmentation with an ensemble of classifiers (SGDClassifier, SVC, RidgeClassifier), reaching a recall of 0.9 for predicting OHSS occurrence and an accuracy of 0.76. SHAP analysis identified key factors: GnRH antagonist use, longer stimulation, female infertility factors, irregular menses, higher weight, hCG triggers, and, notably, a higher number of embryos.
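To make the two reported metrics concrete, the following sketch computes recall and accuracy from confusion-matrix counts. The counts are hypothetical, chosen only so that the ratios match the reported 0.9 and 0.76; the study's actual confusion matrix is not given here.

```python
def recall(tp: int, fn: int) -> float:
    """Share of true positive cases that the model detects."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases that the model classifies correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# HYPOTHETICAL counts for 100 patients, 10 of whom are positive cases.
tp, fn = 9, 1    # positives: detected vs. missed
fp, tn = 23, 67  # negatives: false alarms vs. correctly cleared

print("recall:", recall(tp, fn))
print("accuracy:", accuracy(tp, tn, fp, fn))
```

With a rare positive class, a model tuned for high recall can miss few true cases while still producing many false alarms, which is one way a recall of 0.9 can coexist with a lower accuracy.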
Conclusion This novel study demonstrates ML’s potential for predicting complicated OHSS. The optimized model provides insights into contributory factors, challenging certain conventional assumptions. The findings highlight the importance of considering patient-specific factors and treatment details in OHSS risk assessment.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The author(s) received no specific funding for this work.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This retrospective study analyzed anonymized medical records with approval from the Mashhad University of Medical Sciences ethics committee (IR.MUMS.REC.1395.326). Data were accessed on July 11, 2023, and contained no identifying variables. Verbal consent was obtained from all patients for the use of their anonymized data, in accordance with ethical guidelines and data protection regulations.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
DATA AVAILABILITY STATEMENT
The dataset supporting the conclusions of this article is confidential and cannot be made publicly available. However, data may be available from the corresponding author upon reasonable request and with permission of the original data providers, subject to compliance with applicable confidentiality agreements and data protection laws.
Declaration of Generative AI and AI-assisted Technologies in the Writing Process
In the development and composition of this document, generative AI was not utilized beyond the scope of fundamental tools for grammar, spelling, and reference verification.
Summary Table
What was already known on the topic:
Machine learning models have been applied to predict outcomes like oocyte retrieval numbers, IUI success, pregnancy rates, and live birth rates in assisted reproductive technology (ART)
Ovarian hyperstimulation syndrome (OHSS) is a potentially life-threatening complication of ART, but prediction of complicated OHSS using machine learning has not been explored previously
Imbalanced datasets are a major challenge for developing effective machine learning models in medical domains
What this study added to our knowledge:
This is the first study to develop and optimize machine learning models specifically for predicting complicated cases of OHSS, addressing the obstacle of imbalanced data
The optimized ensemble model provides insights that challenge certain conventional assumptions about OHSS risk factors: it de-emphasizes oocyte numbers and highlights the number of embryos as a predictor
Novel data augmentation techniques like IPADE-ID were effectively applied to tackle the highly imbalanced nature of the OHSS dataset
The study identified key influential factors like GnRH antagonist use, stimulation duration, female infertility factors, irregular menses, higher weight, hCG trigger usage, and number of embryos through interpretation of the optimized model