ABSTRACT
Background Demand is growing for methods that improve the understanding and prediction of disease. This study focuses on sepsis, a life-threatening organ dysfunction caused by a dysregulated host response to infection. Our objective is to improve early identification and mortality prediction for patients meeting the Sepsis-3 criteria, with the broader aim of improving the allocation of hospital resources.
Methods We developed a Machine Learning (ML) framework to predict 30-day mortality among Intensive Care Unit (ICU) patients diagnosed with Sepsis-3. Using the Medical Information Mart for Intensive Care III (MIMIC-III) database, we identified eligible patients with big data extraction tools such as Snowflake. We used decision tree models to estimate the importance of candidate features and analyzed entropy across decision nodes to refine feature selection. In collaboration with clinical experts, we curated a list of 30 relevant features. We then trained a Light Gradient Boosting Machine (LightGBM) model, chosen for its gradient boosting architecture and computational efficiency.
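The entropy analysis at decision nodes described in the Methods can be illustrated with a minimal sketch. It ranks features by the largest entropy reduction (information gain) that a single threshold split achieves. The feature names and toy values are hypothetical and are not drawn from MIMIC-III; the function names are also assumptions made for illustration, not the authors' code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting a node on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split_gain(values, labels):
    """Largest information gain over midpoints between sorted distinct values."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in thresholds), default=0.0)


# Hypothetical toy data: 1 = died within 30 days, 0 = survived.
died = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # one split separates the classes
    "feature_b": [1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0],  # weaker signal
}

# Rank features from most to least informative at the root node.
ranking = sorted(features, key=lambda f: best_split_gain(features[f], died), reverse=True)
```

A full decision tree repeats this computation recursively at every node; the sketch covers only the root split, which is enough to show how entropy drives the ranking.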
Results The cohort comprised 9118 patients diagnosed with Sepsis-3. Our preprocessing techniques markedly improved both the Area Under the Curve (AUC) and accuracy. The LightGBM model achieved an AUC of 0.983 (95% confidence interval 0.980-0.990), an accuracy of 0.966, and an F1-score of 0.910. LightGBM improved on our best baseline model by 6% and on the best result in the existing literature by 14%. We attribute these gains to two factors: (I) the inclusion of Hospital Length of Stay (HOSP_LOS), a pivotal feature that has not been used in previous literature; and (II) LightGBM’s gradient boosting architecture, which yields robust predictions on high-dimensional data while remaining computationally efficient, as evidenced by its learning curve.
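For readers unfamiliar with the metrics reported above, the following sketch shows one common way to compute AUC, accuracy, F1-score, and a 95% confidence interval for AUC. The abstract does not state how the interval was obtained, so the percentile bootstrap used here is an assumption, and the labels and scores are hypothetical toy values rather than study data.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return correct / len(y_true), f1


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy labels (1 = died within 30 days) and predicted risk scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.35, 0.60, 0.70, 0.80, 0.90, 0.95]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

auc = roc_auc(y_true, scores)
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
ci_low, ci_high = bootstrap_auc_ci(y_true, scores)
```

The pairwise AUC computation is quadratic in the number of patients and is meant only to make the definition explicit; on a cohort of thousands, a rank-based implementation from an established library would be the practical choice.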
Conclusions The proposed preprocessing methodology substantially reduced the number of features relative to the best existing literature, lowering computational complexity, and identified a crucial feature that previous work had overlooked. With these features and careful parameter tuning, the proposed model achieved high predictive power, and its learning curve indicates that it generalizes to unseen data. These results underscore the potential of ML models as tools in the dynamic environment of the ICU. Our model could streamline resource allocation within ICUs, helping clinicians work more efficiently and tailor interventions for patients with Sepsis-3.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Medical Information Mart for Intensive Care III (MIMIC-III) database. This database is publicly available and the link is: https://physionet.org/content/mimiciii/1.4/
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Revised the placement of inserted images, added data availability in the declaration section, included the country and city for affiliations, and corrected an erroneous citation.
Data Availability
All data produced are available online at https://physionet.org/content/mimiciii/1.4/
ABBREVIATIONS
- ML: Machine Learning
- ICU: Intensive Care Unit
- SAPS-II: Simplified Acute Physiology Score-II
- SOFA: Sequential Organ Failure Assessment
- AUROC: Average Area Under the Receiver Operating Characteristic Curve