Abstract
Background and Objective Paralytic Ileus (PI) is the pseudo-obstruction of the intestine secondary to intestinal muscle paralysis. PI is caused by several reasons such as overuse of medications, spinal injuries, inflammation, abdominal surgery, etc. We have developed an early mortality prediction framework that can help intensivist, surgeons and other medical professionals to optimize clinical management for PI patients in terms of optimal treatment strategy and resource planning.
Methods We used publicly available ICU database called MIMIC III v1.4, extracted patients that had paralytic ileus as primary diagnosis over the age of 18 years old. We developed FLAIM Framework a two-phase model (Phase I: Statistical testing and Phase II: Machine Learning application) that was compare to traditional methods of machine learning. We used five different machine learning algorithms to test the validity of our Framework. We evaluated the effectiveness of the proposed framework by comparing accuracy, sensitivity, specificity, Receiver Operating Characteristic (ROC) curves, and area under the curve (AUC) for each model.
Results The highest improvement in AUC of 7.78% was observed due to application of the proposed FLAIM method. Additionally, almost for all the machine learning models, improvement in accuracy was also observed. With the FLAIM framework, we recorded an accuracy of 81.30% and AUC of 81.38% under support vector machine (with RBF kernel) model in predicting mortality during a hospital stay for the PI patients
Discussion Our results show promising clinical outcome prediction and application for individual patients admitted to the ICU with paralytic ileus after the first 24 hours of admission.
1. Introduction and Background
Paralytic ileus (PI) or intestinal pseudo-obstruction is a rare disorder of the intestine which is characterized by the paralysis of the intestinal peristalsis without any machinal cause. This leads to an impaired movement of food or fecal matter along the gastrointestinal tract [1]. Diagnosis is made clinically and it can present with symptoms like abdominal bloating, pain, nausea, vomiting and constipation as initially described by Dudley et al [2]. PI is caused by either primary or idiopathic dysfunction of gastrointestinal motility (acquired or inherited mutations) [3]. Additionally, it could be due to abdominal and/or pelvic surgery [4], systemic lupus erythematosus [5], scleroderma [6], Parkinson’s disease [7], infections [8], medications like opioids and antidepressants [9], radiation to the abdomen [10] and malignancies lung cancer [11]. Unfortunately, paralytic ileus has a poor clinical outcome [12] and an early prediction could be helpful for care delivery.
The mortality of patients with PI can be as high as 40% [13] in the ICU setting. Patients admitted to the intensive care are especially at risk of dying because of the seriousness of their condition. Some of these ICU may have additional risks associated when these patients undergo colonic decompression [14] and while providing bowel rest with reduced food intake [15]. With this in mind, it becomes important to have an approach that can predict the risk of mortality in these patients and help clinicians in making a decision faster and allocate adequate resources to these patients.
The MIMIC-III (Medical Information Mart for Intensive Care III) database is a freely-available. large database that has de-identified patient data associated with more than forty thousand patients that were admitted to the critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 and developed by the Massachusetts Institute of Technology (MIT) [16]. This dataset has been used to predict mortality for multiple diseases [17]. In this paper, we utilized risk factors that are based on laboratory test results in order to predict mortality in PI patients.
To obtain an optimal mortality risk prediction performance, we proposed an automated framework called FLAIM. The proposed FLAIM framework works in two phases. In the first phase, it uses Cox-regression (univariate and multivariate) and Kaplan-Meier analysis based statistical methods to find out a relevant (reduced) set of risk factors while the second phase explores the feasibility of different machine learning models which are used for classification purposes. Machine learning models are provided with some training data, the model learns a hypothesis i.e. a fitting function denoted by h(x) by optimizing error achieved on the training data. Thus, the model learns the behavior of the distribution of the data by analyzing the training data set. The model performance i.e. the performance of the learned hypothesis function is tested on the unseen testing data. The main objective is to construct such a model that would show better performance on the training data and would also generalize to unseen data i.e. the testing data. Hence, such a constructed model can be deployed in hospitals for predictions in future.
In this paper, we have explored the hybridization of six different machine learning models with the above-mentioned statistical models that are utilized for evaluating most relevant (reduced) set of risk factors. The machine learning models used for mortality prediction of PI patients include linear discriminant analysis (LDA), (GNB), Decision Tree (DT) model (also known as CART in literature), K nearest neighbors (KNN) and support vector machines (SVMs) with linear and radial basis functions (RBF). Linear Discriminant Analysis (LDA) is a method used in statistics, pattern recognition and machine learning (ML) to find a linear combination of features which characterizes or separates two or more classes of objects or events [18]. SVM has been used in bioinformatics fields for its excellent performance in classification by drawing a hyperplane [22] [23]. A hyperplane is a line that divides two classes where each labeled data lay on one side. To separate classes, there may be many possible hyperplanes that can differentiate instances, however, the goal is to find an optimal hyperplane that has a maximum margin or maximum distance from the data points of both classes [20,24]. When the samples are not separable then an error term is considered as cost function that leads to poor classification. Regularization methods are used to deal poor classification issues such as overfitting and underfitting.
In addition to the classification models stated above, in FLAIM we have also used Decision Tree model. A decision tree has nodes (attributes) and edges which make a machine processable tree-like structure which helps to build a parseable logic that leads to a decision, where edges are the outcomes of the split to the next node. The root node performs the first split, whereas leaves, terminal nodes predict the outcome. Entropy and information gain are the mathematical methods that have been used for the splitting of nodes [25]. Another machine learning model used for mortality prediction of PI patients is GNB model. It uses Bayes theorem of probability for prediction of unknown class. Usually, three approaches have been used for classification in Naive Bayes. 1) Gaussian: It assumes that continuous features follow a normal distribution, 2) Multinomial: When features are discrete, 3) Bernoulli: When features are binary [26,27][28] is most frequently non-parametric methods used in data science. In kNN, (k) is the number of nearest neighbors where unknown instances or data-points are classified based on the nearest known instances in the feature space. The instances which are close to each other are more likely of the same class. Distance between instances is commonly measured using the Euclidean distance [29]. The principal of kNN during training is to store all data points then the prediction algorithm calculates the distance from new data-point (x) to all points in the data and finally assigns the instance to that class which has the shortest distance.
2. Methods
2.1 Study Type
Retrospective investigation of clinical electronic health records
2.2 Participants
The data was obtained from a publicly available database to researchers called MIMIC III. The was collected by many different intensive care units including surgical, trauma surgery, medical, coronary and cardiac surgery at Beth Israel Deaconess Medical Center (BIDMC), Boston, Massachusetts from 2001-2012. All patients’ records are completely anonymous, and all the data collected has received a prior Institutional Review Board (IRB) approval from BIDMC and Massachusetts Institute of Technology (MIT). The institutional IRB requirement was waived because the database is HIPAA-complaint deidentified and our study complies with the ethical standards set by the Helsinki declaration (1964) and its amendments [16]. This database contains clinical parameters from de-identified ICU admission patients to the above mentioned.
We selected all patients using the ICD-9 code for paralytic ileus (560.1) [30] from all and extracted all the data from the rest of the datasets including initial labs, demographic information, ICU stay data, microbiological reports, and prescription information. We excluded calculated the age using the date of birth and date of admission and remove all patients that were under the age of 18 for this analysis.
2.3 Testing methods
Initial labs for the first day of ICU admission were extracted from the data set using the MIMIC code available for public access repository on GitHub, which included hemoglobin, hematocrit, white blood cells, platelets, serum electrolytes (sodium, potassium, chloride, bicarbonate), blood urea nitrogen (BUN), creatinine, glucose, anion gap, lactate, bilirubin, albumin, prothrombin time (PT), partial-thromboplastin time (PTT) and international normalization ratio (INR). All of these labs were expressed as minimum and maximum values for each blood marker, we calculated the average and subdivided the values into either normal range and abnormal values. Reference ranges were used to calculate these values and can be seen in Table 2.
2.4 Analysis
We have developed the FLAIM (Fahad-Liaqat-Ahmed Intensive Model) a machine learning framework for the early mortality risk prediction of patients with the PI disease. Our proposed framework works in two phases (Figure 1).
This proposed hybrid model can be considered as a black-box that hybridizes the risk survival statistical methods with machine learning-based predictive models. The first stage/phase is significant as it improves the predictive capabilities of machine learning-based predictive models (phase II) as well as reduces the time complexity of the predictive models by reducing the number of risk factors that need to be processed by the machine learning models.
Phase I
For the initial data extraction, we used PostgreSQL (version 11.2, www.postgresql.org) and data analysis was done with IBM SPSS (version 24.0.0.0) [31] for demographic, univariate, multivariate and Kaplan Meier curves for outcome assessment. A p-value of <0.05 was considered statistically. Statistical analysis is carried out on all the variables, this includes univariate and multivariate (covariates with age, gender, and ethnicity) cox-regression analysis followed by Kaplan-Meier survival analysis (Figure 2).
The cox-regression analysis [32] gives us the hazard ratio for these risk factors which can be used in conjunction with the Kaplan-Meier analysis [33] to make a rank order list of these variables ranked from highest to lowest; two lists were generated full-risk-factors list (22 variables) and reduced-risk-factors list (only statistically significant results were used, 15 variables). The reduced set of risk factors that are obtained based on the statistical methods, they are supplied to different machine learning models for classification. The outcome of this phase can be viewed in Figure 3.
Phase II
In this phase, different machine learning models are applied to risk factors that have been found significant in Phase I, and higher weight can be given to the variables/risk factors based on hazard ratios as shown in Table 3. During the second phase, the data (risk factors) is partitioned in two parts i.e., training and testing datasets (Figure 4). The training dataset with risk factors or feature vectors denoted by X_train along with the label information denoted by Y_train are supplied to the machine learning models. The machine learning models analyze the distribution of the training data and tries to learn a hypothesis function i.e., a fitting function that would represent the behavior or distribution of the whole dataset. In the next step, the performance of the trained or learned machine learning model is tested by considering the testing dataset. During the testing process, only feature vector or risk factors denoted by X_test is supplied to the trained or learned machine learning model, the model predicts the label based on the learned knowledge or rules from the training data. The predicted labels are compared with the true labels of samples of the testing dataset and based on the comparison, the prediction accuracy is evaluated.
3. Results
3.1 Participants
We used the complete list of risk factors (full list) and a selected list of risk factors (reduced list) (Table 3.) to predict outcome in the test-set using Linear Discriminant Analysis, Gaussian Naive Bayes, Decision Trees, K-Nearest neighbor and support vector machines (Supp 1.) In order to evaluate the effectiveness of the proposed method, we used the publicly available MIMIC III dataset. A total of 1021 patients’ data were extracted from the original dataset of the MIMIC III database that were diagnosed with PI. All the patients were above the age of 18 years of age at their first admission. The median age of patients was 63.2 years, most of the people that were admitted to the ICU were women (63.8%) and mostly Caucasian (74%) from all the ethnic groups. These patients had a median stay of 14.5 ± 16.2 days of hospital stay and 4.2 ± 10.7 days of intensive care stay. Emergency admission comprised of most admission making 76.7% of all patients that were admitted to the ICU while most admissions came through referrals around 490 patients (48%). Mortality among these patients was 16.8% (173), while 441 (43.2%) were transferred out to different health facilities and 407 (40%) patients were discharged home. All of this can be seen in Table 1.
The univariate analysis showed that patients with abnormal blood levels for sodium, potassium, bicarbonate, BUN, creatinine, glucose, anion gap, lactate, bilirubin, platelets, PT, PTT, and INR had worse survival than those that had normal ranges of these blood related risk factors or biomarkers. The multivariate analysis was also done so that we normalize these findings for age, gender and ethnicity and all these markers were still significant (Table 2.)
3.2 Analysis Participants
Kaplan-Meier survival analysis of these blood markers revealed that at the first 24 hours of ICU admission labs show a relationship between better overall survival during these patient’s hospital stay. The largest difference between median patient survival was seen with abnormal vs normal anion gap (37 vs 129 days(d)) a difference of 92 days. Serum abnormal vs normal bicarbonate (55d vs 129d), PT (55 vs median not reached (MNR)), BUN (51d vs MNR), bilirubin (37d vs 67d), creatinine (57d vs 78d), PTT (54d vs 67d) and platelets (55d vs 66d) were also significant (Table 2, Figure 4 and Figure 5). Showing abnormalities of these markers lead to worse patient survival during their hospital stay in these patients.
These results were used to make an ordered list of risk factors that affect patient outcomes from most to least as described in table 3. We used previously established machine learning algorithms as shown in supplemental 1.
Subsequently, we used two third (2/3) i.e., n=684 samples for training the machine learning models while one third of the dataset (n=337) for testing purposes. The results for each machine learning model is presented in Table 3. Our assessment of the algorithm was based on testing set accuracy sensitivity, specificity and area under the ROC curve (AUC) to assess for the best algorithm. As in machine learning, the quality of output of a model is judged by AUC, hence, we utilize the AUC and accuracy as the primary evaluation metrics for selecting optimal model under the proposed FLAIM architecture. Our best algorithm SVM RBF on reduced risk factors yielded highest AUC of 81.38%, testing accuracy of 81.30%, training accuracy of 83.18%, sensitivity of 35.59%, a specificity of 91%. The second-best algorithm was LDA with AUC of 82.02%, testing accuracy of 70.32%, training accuracy of 69.59%, sensitivity of 83%, a specificity of 58% using the reduced set of risk factors.
5. Discussion
Paralytic ileus although a rare but a morbid and fatal condition [34] and almost 16.8% mortality to all patients that were admitted to the ICU in our patient population. For that reason, it becomes important to identify patients that may be at risk of dying during their ICU stay. Out dataset revealed that the majority of admitted patients are elderly individuals with a median age of 63 years [34–36]. In this cohort of patients, we observed labs which were performed in the first 24 hours to predict overall survival. The best predictors of patient survival were serum anion gap (37d vs 129d), bicarbonate (55d vs 129d), PT (55d vs MNR), BUN (51d vs MNR), bilirubin (37d vs 67d), creatinine (57d vs 78d), PTT (54d vs 67d) and platelets (55d vs 66d). While other labs like serum chloride, hematocrit, hemoglobin, white blood cells, and albumin were non-significant. The abnormal initial anion gap is associated with adverse outcomes in admitted to the ICU [37] which is also seen in the study as well. Similarly, abnormal potassium[38] bicarbonate [39], BUN [40], bilirubin [41], creatinine [42], platelets [43]. Abnormal coagulation studies in a patient with sepsis have shown worse outcomes in ICU patients. We considered patients with age > years along with their demographic information such as gender and ethnicity and prepared a final list of all risk factors and further ranked them according to their severity (Table 3).
In order to validate the effectiveness of the proposed hybrid model, we developed two types of experiments. In the first experiment, we utilized the conventional machine learning models such as LDA, GNB, DT, KNN, and SVM as baseline using full set of risk factors. In the second experiment, the same machine learning models were hybridized with the statistical models using the proposed FLAIM method (see Figure 1). The main objective of developing two different types of experimental settings was to validate the fact that the proposed hybrid method is capable of reducing the complexity of machine learning models as well as enhancing their predictive performance. The performance comparison or evaluation of the two experiments were carried out from two aspects i.e. using mortality prediction accuracy and ROC curve to yield AUC. Both the accuracy and AUC are well used metrics to evaluate a model performance. Therefore, these two-evaluation metrics were exploited for comparison of the proposed methods with baseline methods.
From Table 3, the improvement in AUC of conventional machine learning (baseline) models can be observed. The baseline models and their AUC have been reported in red and the AUC for the corresponding proposed or developed methods (under FLAIM architecture) have been reported in green color. It can be seen that the optimal performance is obtained with SVM RBF model as it yields accuracy rate of 81.30% and AUC of 81.38% which is better compared to the other baseline and proposed methods. Additionally, the constructed model has another benefit of lower complexity i.e., it uses reduced set of risk factors. This means if we provide risk factors (serum lactate, BUN, PT, anion gap, creatinine, INR, bilirubin, potassium, PTT, bicarbonate, platelets, sodium, glucose, Age more than 65 years and gender) the model will correctly predict ∼81% of the time if the patient will die during their hospital stay or not. Other models that showed potential similar results but were less accurate than the optimal model, were LDA (reduced risk factors), DT (reduced risk factors) and SVM (Linear) with full risk factors and reduced set of risk factors. To effectiveness of the proposed SVM (RBF) under FLAIM architecture is validated by observing the AUCs in Figure 4 of conventional SVM (RBF) and the proposed version of SVM (RBF). The AUC of conventional SVM RBF is 78.89% while the AUC of the proposed SVM RBF is 81.38%, hence, the improvement of 2.49% can be observed. Similarly, the ROC charts for each model (conventional and proposed) are depicted in Figure 3 (see supplementary section). For almost all the models, the improvement in AUCs can be observed due to application of the proposed FLAIM architecture. Hence, our proposed framework could potentially facilitate the critical care physicians in determining which patients are at the highest risk of mortality while their hospital stay and should be treated with more care.
Our future direction is to develop a more sophisticated model that incorporates more clinical parameters and has a higher AUC and accuracy of predicting clinical outcome for paralytic ileus patients admitted to the intensive care unit. Further experiments are required to evaluate the performance of the proposed framework with different age groups in different clinical situations.
6. Conclusion
FLAIM is a statistically robust machine learning framework which is developed for clinical use to predict mortality risk in PI patients. The mortality prediction was modelled based on clinically relevant labs or risk factors that were collected from the PI patients during their first 24-hours of stay in the intensive care unit. Experimental results evidently show that the proposed method reduces the complexity of conventional learning models by effectively reducing the number of risk factors and also improves the predictive performance and output quality of conventional machine learning models. From the simulation results, it can be safely concluded that the proposed method can also be expanded in the future to predict mortality and other risks associated with the ICU admitted patients for various diseases.
Data Availability
MIMIC III database version 1.4