Abstract
Background Predicting accurately the frequent Emergency Department (ED) visitors is critical for hospitals because they often consume significant ED resources, including staff time, equipment, and medical supplies. Furthermore, frequent ED visitors may contribute to increased wait times for all patients. Therefore, by accurately predicting and identifying these individuals, hospitals can help reduce the burden on the ED and decrease wait times for all patients, improving the overall quality of care.
Objective This study proposed an effective and adaptive ensemble learning prediction model to identify frequent visitors in the emergency department.
Methods This was a retrospective population-based study of patients and utilised medical and administrative databases at Canberra Hospital, a tertiary public hospital in ACT, Australia, between January 1997 and December 2022. The study focuses on a wide age range of the population with 20 viral chronic diseases. The definition of frequent ED use is considered as having at least three visits within a year. This study developed an Adaptive ensemble learning–based prediction model and compared the performance with 16 popular machine learning models. In addition, three techniques are compared to handle the imbalanced data issue, and we also proposed a hybrid feature selection composed of Elastic-Net and local search to find the best combination of features. In order to hyper-parameter tuning, two techniques were compared: a population-based evolutionary algorithm and a local search.
Results The study included 535,474 patient visits and 1.6 million episodes, with 25% overall frequent visitors. We compared the performance of the proposed prediction model with that of the other 16 popular classifiers. According to the prediction results, the proposed model considerably outperformed other models in terms of five metrics: accuracy, Recall, F1-score, Area under the ROC curve (AUC), and Log loss at 0.78 (95% CI 0.78-0.79), 0.68 (95% CI 0.68-0.68), 0.68 (95% CI 0.68-0.69), 0.69 (95% CI 0.69-0.70), and 7.4 (95% CI 7.2-7.5), respectively.
Conclusions We proposed an adaptive ensemble learning model combining XGBoost Elastic-net with local search and Differential evolution to address the imbalanced nature of the frequent ED visitors’ data. Our approach aimed to enhance the prediction capability of the classifier substantially. To tackle the class imbalance, we employed both under-sampling and adjusted weights for the positive class. Through extensive testing and evaluation, we demonstrated that these strategies effectively improved the model’s performance. Further, we emphasised the importance of employing a robust feature selection method and a fast hyper-parameter optimiser. These elements were essential for enhancing the identification of frequent ED visitors. By incorporating these techniques, our study contributes to developing more accurate and reliable models for predicting frequent ED visitors, thereby assisting hospitals in resource allocation and patient care management.
1. Introduction
The Emergency Department (ED) is an essential component of hospitals as it fulfils multiple crucial roles [1]. It serves as a primary point of contact for immediate and life-saving care, ensuring round-the-clock availability to address emergent healthcare needs. The ED utilises a triage system to prioritize patient needs, facilitating efficient and timely care delivery. Definitions may vary, yet individuals who visit the ED at least thrice a year are categorised as “frequent users” [2]. These frequent ED visitors demonstrate a diverse array of characteristics, encompassing mental health issues, physical comorbidities, and disadvantaged socioeconomic status [3]. The requirements of these individuals are often too intricate to be adequately catered to within the confines of the ED environment [4]. Many of these recurrent ED visitors grapple with several chronic ailments, such as coronary artery disease or chronic obstructive pulmonary disease, conditions that could potentially be better managed in a primary care setting, thus averting acute deteriorations that necessitate ED visits [5]. The use of the ED to address complex needs points towards a less-than-optimal scenario where these needs have not been satisfactorily addressed within the realm of primary care. This form of ED use is linked with unfavourable outcomes for patients, including heightened rates of hospital admissions and mortality [6]. Additionally, the costs associated with ED services tend to be elevated in comparison to those in primary care, thereby imposing a socioeconomic strain on the healthcare system [7]. As an illustration, a particular study conducted in the province of Quebec, Canada, [2] revealed that individuals classified as frequent ED users with chronic diseases constituted approximately 9.2% of the total ED user population. Surprisingly, these frequent users were regarded as a significantly higher proportion, around 28.8%, of all ED visits. Another research [8] endeavour focused on defining frequent users as those with four or more ED visits annually. The findings indicated that this subset of patients represented approximately 4.5% to 8% of the overall ED patient population. Strikingly, they contributed to a substantial 21% to 28% of all ED visits. These statistics highlight frequent users’ significant impact and disproportionate utilisation of ED resources, emphasising the need for targeted interventions and alternative care strategies to address their specific needs and reduce the strain on the ED system.
Accurately identifying high-rate visitors among normal visitors is a crucial step in mitigating unnecessary ED visits. Extensive research has been conducted to tackle this challenge using statistical models [9]. Logistic regression (LR) has emerged as a standard and widely employed statistical model for this purpose [10]. Although LR is known for its effectiveness in prediction tasks, it encounters limitations when applied to predict frequent ED visitors. Its linear nature assumes a linear relationship between predictor variables and the target, which may not hold true for this case study. The relationship between predictors, such as demographic factors and medical history, and the target of being a frequent visitor often exhibit complexity and nonlinearity [11]. LR may struggle to accurately capture and model these intricate nonlinear associations, potentially compromising its predictive performance in this context.
Traditional statistical models face limitations in identifying highly visited ED patients due to their reliance on predetermined rules centred around specific clinical predictors. On the other hand, machine learning (ML) prediction models present a more adaptable approach through the utilisation of nonparametric algorithms that can encompass a broad spectrum of intricate predictors while upholding robust predictive capacities. The ML-based techniques application brings about advantages such as learning error reduction, runtime and cost savings, as well as an enhancement in the quality of care services provided [12]. Prior studies have focused on applying ML models in predicting frequent visitors, with numerous research works disseminating their discoveries in this domain [13]. This exploration has showcased the potential of ML in revolutionising the identification and management of frequent ED patients, marking a significant leap in the realm of healthcare analytics.
When predicting frequent visitors in the Emergency Department (ED), neural networks have gained significant attention among the various machine learning (ML) models utilised. This popularity can be attributed to their remarkable ability to learn and model intricate patterns and relationships within both small and large-scale datasets. Neural networks (NNs) encompass a range of architectures, including feed-forward backpropagation (FFNN) [2], multilayer perceptron (MLP) [14], deep neural networks (DNN) [15] and hybrid deep learning models [16]. These architectures enable neural networks to effectively capture the complexity of the data, making them well-suited for tasks involving frequent ED visitor prediction. Despite their remarkable capabilities, NNs are subject to various constraints. A significant drawback is the lack of interpretability. Moreover, NNs demand substantial computational power and resources, resulting in elevated computational and resource demands. Overfitting presents another issue: NNs may excessively specialise in the training data, leading to subpar generalisation. Lastly, identifying optimal hyper-parameters for NNs can be intricate, necessitating meticulous adjustments.
Recently, a notable surge of interest has been in utilising tree-based and ensemble learning models to predict frequent ED visitors. The Random Forest (RF) [2, 17, 18] model is one popular choice in related studies and is known for its robust classification capabilities [19]. Another extensively employed model is XGB [20], which excels in handling small datasets with complex patterns. Among the commonly utilised models, XGB often exhibits superior predictive performance [21]. However, it is not without its limitations, such as the need for hyperparameter tuning. To strike a favourable balance, some studies have adopted ensemble models like the voting classifier [14], which combines multiple models with specific weighting ratios. This approach allows for a more comprehensive and nuanced prediction by leveraging the strengths of different models within the ensemble.
In this study, we build upon previous research by developing a series of binary classifiers using a dataset consisting of 535,474 patient visits spanning 25 years at the ED of Canberra Hospital in ACT, Australia. For each visit, we extracted 18 variables, including patient demographics, primary diagnosis, up to 20 secondary diagnoses coded in ICD-10, admission and release dates, triage category, and patient disposition. To predict frequent ED visitors, we establish a comprehensive comparative framework employing 16 machine learning models. These models encompass linear and logistic regression, various types of neural networks, popular ensemble learning models, and deep learning models, enabling us to capture the nonlinear relationships among the variables. Furthermore, we introduce an adaptive feature selection approach by combining Elastic-Net and a local search technique to optimise the performance of our feature set. To fine-tune the best-performing model, we implement an effective hyper-parameter optimisation method utilising Differential Evolution (DE). Through this research, we aim to provide insights into predicting frequent ED visitors and enhance the overall understanding of patient visit patterns in the healthcare system.
2. Study design and data sources
In this population-based retrospective cohort study, data was collected from the Emergency Department of Canberra Hospital, which serves as the largest and oldest public hospital in Canberra, Australia’s ACT province. The collected period spans 25 years, from January 1997 to December 2022. The dataset encompasses various components, including the patient demographic register, which provides information such as sex, date of birth, and place of residence. Additionally, features extracted from the hospital register include the primary diagnosis based on the International Classification of Diseases, tenth revision (ICD-9), as well as up to 20 secondary diagnoses coded using ICD-10. Other relevant details encompass admission and release dates, triage category, and patient disposition. For further insights into the specific features utilised in this study, Supplementary Table 1 can be referred to.
Two exclusion criteria were applied in this study, as depicted in Figure 1. The first criterion involved excluding patients with missing information, which accounted for a negligible rate of 0.004%. The second criterion entailed excluding patients who had more than three Emergency Department (ED) visits per year, which constituted a substantial portion of 89% of the total number of patients. By implementing these exclusion criteria, the study aimed to ensure data completeness and focus on individuals with a higher frequency of ED visits for a more comprehensive analysis.
As reported, the number of ED visitors at Canberra Hospital has increased by an average of 4% over the years (See Figure 2). However, a notable dip was observed in 2020, reflecting a substantial decrease in ED visitors due to the implementation of the COVID-19 lockdown situation. Based on predictive analysis, it is estimated that the monthly average of ED episodes will reach 11,000 in 2032 and subsequently rise to 13,000 in 2042. These projections indicate a steady upward trend in ED visits over the next two decades, with the exception of the temporary decline observed in 2020.
2.1 Characteristics of participants
Descriptive statistics for various populations, including whole ED visitors and frequent visitors (3+ visits per year), as well as information on triage category, patient location (state and suburb), and patient disposition, are presented in Table 1. The table highlights a significant finding: the largest proportion of non-visitors and frequent visitors is among individuals under 18 years of age, accounting for 29.15% and 23.78% of their respective populations. Furthermore, among frequent visitors, approximately half of the patients were discharged to their homes, indicating a common disposition. Additionally, it is notable that the triage category labelled as “green” exhibits the highest rate at 36.70% among the different triage categories, indicating a relatively lower level of urgency or severity in those cases.
A comprehensive statistical analysis was conducted on the monthly (Table 2) and weekday (Table 3) data of whole, non-, and frequent visitors at Canberra Hospital. The findings indicate that October and November emerge as the busiest months, accounting for 10.81% and 9.88% of the total high-frequency visitors, respectively. These peaks could be attributed to various factors, including the coinciding spring season in Australia, which brings about an allergy season, increased prevalence of respiratory infections, and a higher risk of seasonal asthma [22] exacerbations. Additional contributing factors may also play a role in the observed patterns. The distribution of non- and frequent ED visitors across the seven weekdays is presented in Table 3. The analysis reveals that Monday experiences the highest overload in the Canberra Hospital’s ED, accounting for 15.76% of the total visits. This finding suggests that Mondays exhibit a greater demand for emergency services compared to other weekdays.
In order to gain insights into the primary reasons for ED visits, we conducted an analysis of viral diagnoses among both non- and highly frequent visitors. Table 4 presents a compilation of the top 25 viral diagnoses, sorted in descending order. Notably, Chest pain (R07.4), Abdominal Pain (T79.4), and Pain in the abdomen (including colic) (R10.4) emerged as the most prevalent diagnoses, accounting for rates of 3.9%, 3.44%, and 3.44%, respectively. These findings highlight the significance of these conditions and underscore the need for appropriate management and attention to these specific health concerns.
A comprehensive analysis of all 1.6 million episodes in the ED at Canberra Hospital from 1997 to 2022, including frequent visitors with more than 3 and 5 visits per year, was conducted to identify the top 20 reasons for attending. Figure 3 showcases these findings, and remarkably, the most prevalent reason for seeking ED services was visitors who did not wait for medical attention, encompassing roughly 6% of the total episodes across all three groups. Subsequently, the second and third most common reasons were identified as Abdominal Pain and Chest Pain, respectively.
To gain a deeper understanding of the relationship between the age of frequent visitors and the 20 most common viral reasons for ED visits, we divided the patients into five age groups, as depicted in Figure 4(a). The analysis revealed distinct age patterns associated with viral reasons. Notably, patients aged ≤ 14 represented the highest proportion of ED visitors for Viral infection (B34.9), Asthma (J45.9), Head Injury (S09.9), Fever (R50.9), and Upper respiratory tract infection (J06.9), accounting for 60%, 59%, 49%, 58%, and 70% of the total episodes, respectively. Conversely, senior visitors (age ≥ 64 years) exhibited a different pattern, with Pneumonia (J18.9), Syncope (R55), Urinary tract infection (N39.0), and Constipation (K59.0) being the prominent viral reasons. These conditions were observed at rates of 50%, 50%, 39%, and 35%, respectively, among the other population. Having insights into the age-specific trends in viral reasons for ED visits can be helpful for healthcare professionals to tailor their approaches to meet the unique needs of different age groups. The seasonality analysis of the most common disorders among highly-frequent visitors is presented in Figure 4(b). The findings indicate that, apart from viral infection, pneumonia, and respiratory tract infection, which exhibit a notable pattern of being more prevalent during winter, no specific seasonal trend was observed for the other disorders. These three conditions demonstrate a higher frequency during the winter months, potentially influenced by factors such as increased viral circulation, colder temperatures, and reduced immunity.
In Figure 5, a noteworthy pattern is evident concerning the gender distribution of frequent visitors in the 20 most prevalent reasons for ED visits. Specifically, there are significant gender differences in the contribution percentages for certain conditions. Among the identified conditions, it is observed that females account for a higher proportion in several categories. For instance, in Urinary tract infections, females contribute to ED visits at a rate of 137% compared to males. Similarly, for Abdominal pain, Headache, Suicidal ideation, Nausea and vomiting, and Low back pain, the female contribution percentages are 108.71%, 74.68%, 65.11%, 56.92%, and 24.77%, respectively. On the other hand, there is a noticeable trend of increased representation among males for specific disorders when compared to females. Conditions such as dressing changes, Head injuries, Psychotic episodes, Fever, and Pneumonia exhibit higher contribution rates among males, with percentages of 51.77%, 40.02%, 41.63%, 25.31%, and 34.66%, respectively.
3. Methods
3.1 Ensemble Machine learning models
In opposition to conventional data-centric modelling strategies that entail fitting a single predictive model, ensemble-based models amalgamate the forecasts of numerous base estimators to amplify resilience [23]. The ensemble learning techniques at the core encompass bagging [24], boosting [25], and stacking [26]. Bagging entails the random selection, with replacement, of multiple subsets of training data to train an identical model. The training of these models operates independently of one another and can be executed concurrently. The ultimate forecast is the mean of predictions generated by all models. Bagging enhances generalisation and mitigates overfitting. In contradistinction to bagging, the boosting methodology involves the sequential training of models, where each fresh model strives to refine predictions by concentrating on the feeble forecasts of preceding models within the ensemble. Boosting stands out as one of the most potent learning methodologies unveiled in the realm of machine learning [27]. Stacking emerges as a more intricate ensemble technique. Instead of directly averaging or sequentially amalgamating base model predictions, akin to bagging and boosting, stacking harnesses these individual base models’ predictions as input for a superior-level model. This meta-model is schooled on these varied base model predictions to craft precise final predictions. The essence of stacking lies in its ability to leverage the diversity among the base models to enhance the overall predictive accuracy.
3.2 Gradient Boosting Machine (GBM)
This innovative method follows a forward-thinking, gradual procedure where new cutting-edge predictive models are integrated into the group one by one. The process typically relies on regression trees (RTs) as the basic weak learners. These RTs are generated sequentially, aiming to capture any prediction mistakes left by the preceding RTs. The Gradient Boosting Machine [28] (GBM) begins by creating a differentiable loss function, often using the squared error as the loss function, which can be symbolized as a mathematical expression. The GBM then iterates through the ensemble, emphasising correcting the errors made by the previous models by adjusting the weights of each model based on the loss function’s gradient. This iterative process continues until a specified stopping criterion is met, producing a powerful ensemble model that combines the strengths of individual weak learners to make accurate predictions. Within the domain of mathematical symbolism, the symbols y and F carry a unique and profound meaning, representing the actual output results and the forecasts produced by the less powerful learner. One search through the commencement phase involves reflecting on a fixed forecast value attributed to each individual data point, a particular computation that is executed in a subsequent manner: Suppose the decision is made to utilise Equation 1 as the designated loss function; the effects of Equation 2 will materialise in the form of a direct calculation representing the mean of the actual results associated with the training dataset. Following this, the subsequent steps entail performing calculations to determine pseudo-residuals for every single data point, a process that is executed by following a specific approach and methodology. This systematic approach involves analysing each data point to accurately compute the residual values, which is crucial in further refining the model’s performance and optimising the training process. Delving deeper into the realm of advanced algorithms, the powerful and sophisticated GBM embarks on the challenging task of training a base learner, which can take the form of a tree or any other type of model, and then adjusts it in accordance with the pseudo-residuals meticulously crafted during the process. This particular base learner is symbolically represented as Fm(x), with the training dataset laying the essential groundwork for this intricate journey towards optimisation. As the journey progresses to the next phase, the GBM meticulously computes the multiplier ρm by carefully addressing the optimisation problem that is elegantly presented in a structured manner. The interplay between the base learners and the pseudo-residuals within the GBM framework showcases a harmonious blend of mathematical precision and algorithmic finesse, resulting in a dynamic and adaptive learning process that continually refines the model’s predictive capabilities. Each step in the GBM’s iterative process unfolds like a careful control, where each base learner contributes unique insights while guided by the significant objective of minimising the loss function. Through this intricate interplay of components and calculations, the GBM methodically enhances the model’s predictive power, unlocking new possibilities and insights in the realm of machine learning that is presented in the following manner: therefore, the model is updated as below:
3.3. Extreme Gradient Boosting (XGBoost)
This innovative technique significantly improves the Gradient Boosting Machine (GBM) by incorporating novel regularization methods that effectively reduce the chances of overfitting and complex modeling [29]. The main objective of this advanced approach, as demonstrated in Equation 6, is to balance the training loss and model complexity, resulting in the development of streamlined prediction models with exceptional predictive accuracy. In this function, the initial portion accurately calculates the loss function (denoted as l(yi, F(xi))), effectively measuring the difference between the actual and predicted values of the output variable. Furthermore, the regularization penalty term (Ω(fi)) plays a crucial role in promoting a more simplified model. It is worth mentioning that this penalty term often incorporates a combination of L1 and L2 regularization techniques, as illustrated below: Within this mathematical expression, it is imperative to acknowledge the presence of three distinct regularisation parameters denoted as γ, α, and λ. The parameter γ, when multiplied by the number of leaf nodes represented by T, plays a pivotal role in regulating the complexity of the model. Moreover, both α and λ act as penalty coefficients pertaining to the L1 and L2 regularisation terms, respectively. These coefficients hold significant importance in attaining optimal regularisation outcomes, further amplifying the model’s simplicity and resilience.
The XGBoost algorithm possesses a remarkable advantage due to its capability to effectively manage a variety of data preprocessing tasks, which encompass activities like feature engineering, data normalization, and feature scaling. Consequently, this ability diminishes the necessity for extensive manual preprocessing, consequently saving valuable time and effort throughout the feature engineering phase as indicated by [30]. Furthermore, XGBoost integrates built-in functionalities designed to address missing values within the dataset, thus obviating the requirement for imputation techniques.
Simultaneously, XGBoost garners acclaim for its rapid processing speed and scalability, rendering it highly suitable for efficiently handling extensive datasets. It outshines numerous other machine learning algorithms in terms of both processing speed and predictive accuracy. Another notable attribute of XGBoost is its capacity to avert overfitting, thereby ensuring the generalization of unseen data instances. Nevertheless, XGBoost does confront certain constraints. One such limitation pertains to the substantial number of hyper-parameters it encompasses, which can pose challenges during the parameter tuning process, as highlighted in the study by [31]. Discovering the optimum combination of hyper-parameter values necessitates meticulous experimentation and can prove to be a time-intensive endeavour.
3.4. Proposed Hybrid Boosting Prediction Model
We devised an adaptive ensemble learning model to enhance the accuracy and reliability of predicting frequent ED visitors. The model incorporates an XGBoost algorithm as the foundation, augmented with a feature selection method that combines Elastic-Net and Local search techniques. To optimise the performance of XGBoost, we implemented a competitive approach using Differential Evolution (DE) and Nelder-Mead optimisation algorithms for hyper-parameter tuning. Besides, we evaluated three strategies for handling imbalanced data: under-sampling, oversampling (SMOTE), and adjusting class weights. By employing this comprehensive approach, we aimed to develop a robust prediction model that effectively identifies frequent ED visitors with improved precision and generalisability. Various components of the proposed prediction model are described as follows.
3.4.1. Solutions for addressing imbalanced data
During our investigation, we encountered the challenge of imbalanced data, where the class distribution within the dataset exhibited a substantial disparity. This issue [32] arises when one class is represented by only a limited number of samples, referred to as the minority class. At the same time, the remaining samples belong to the other class, known as the majority class. Imbalanced data classification poses a predicament for classifiers as they tend to exhibit a bias towards the majority class, leading to imbalanced performance. This bias manifests in solutions favouring accuracy for the majority class but resulting in poor accuracy for the minority class. To address this issue, we explored and compared three techniques: over-sampling (employing synthetic samples through methods like SMOTE [33]), under-sampling [34] (randomly selecting a subset of samples from the majority class to achieve class balance while disregarding the remaining samples), and adjusting class weights [35] (assigning higher weights to samples from the minority class). Through our testing and analysis, we assessed the effectiveness of these techniques in mitigating the challenges posed by imbalanced data.
3.4.2. Adaptive Elastic Net feature selection
With regard to selecting the most compelling features, we proposed a hybrid feature selection technique consisting of Elastic-net [36] combined with a fast local search called hill-climbing (HC) [37]. The role of HC is tuning the control parameters of Elastic-net during the feature selection to improve the performance of the most frequent visitor prediction model.
Elastic Net (as proposed by Zou and Hastie in 2005 [36]) is a sophisticated linear regression approach that cleverly marries the L1 regularisation technique of Lasso with the L2 regularisation method of Ridge regression. This unique fusion of regularisation methods allows Elastic Net to offer a balanced solution that benefits from the strengths of both Lasso and Ridge regression. The L1 regularization component in Elastic Net plays a crucial role in promoting sparsity within the solutions it generates, making it a valuable tool for feature selection as it can effectively drive numerous coefficients to zero. On the other hand, the L2 regularisation element in Elastic Net serves to combat overfitting by gently pulling the coefficients closer to zero, thus enhancing the model’s generalisation capabilities. Moving on to the scenario where P represents the count of predictors denoted by x1, x2, …, xP, the predicted value of the response variable Y can be expressed as Y′ = ω0 + ω1 x1 + ω2 x2 + … + ωP xP in line with the fundamental definition of linear regression. To determine the coefficients vector ω′ = [ω0, ω1, …, ωP]T, the objective is to minimise the sum of squared errors of residuals (SSE), which is represented as S S E = ||Y− Xω′||. In cases where the number of observations surpasses the dimension of features, the optimisation process involves minimising the loss function L to compute the coefficients in the following manner effectively: The penalty amount in Elastic Net is determined by adjusting weights and the parameter ρ. Fine-tuning both β and ρ is crucial for optimal results. Elastic Net has two special cases: ρ = 1 transforms it into LASSO, creating a sparse model with zero coefficients. ρ = 0 makes Elastic Net similar to ridge regression, allowing correlated predictors without restriction on the number of selected predictors.
Although Elastic Net’s capability shines in handling high-dimensional data, conducting feature selection, and managing correlated features [38], fine-tuning its control parameters presents a tough challenge. To address this issue, the current research introduces an efficient local search strategy designed to adjust the parameters of Elastic Net, encompassing L1-ratio, alpha, and selection methods (‘cyclic’, ‘random’). This optimisation technique adopts a heuristic search approach known as Hill climbing, which focuses on enhancing the Elastic Net’s configuration by incrementally refining it based on neighbouring solutions. The iterative nature of this method ensures that the Elastic Net gradually converges towards the most optimal configuration, striving to achieve the best possible solution. This iterative process persists until a local optimum is attained, signifying that no further enhancements can be achieved by transitioning to neighbouring solutions. The detailed steps of this adaptive Elastic Net approach are meticulously outlined in Algorithm 1, providing a comprehensive guide for implementing this sophisticated optimisation method.
3.4.3. Hyper-parameters tuning
Tuning hyper-parameters in machine learning is of utmost importance due to its direct influence on the model’s performance and generalisation capabilities. The impact of hyper-parameter tuning is profound and multifaceted, as it directly affects how well a model can learn from data and make predictions. Indeed, this optimisation process aims to enhance the model’s accuracy, reduce error rates, and improve other relevant metrics, ultimately leading to better predictive performance. The ability to properly tune hyper-parameters enables the model to effectively capture the underlying patterns in the data, thereby enhancing its overall efficacy. Moreover, hyper-parameter tuning ensures that a machine learning model can generalise well to unseen data. Generalisation is critical to model performance, as it determines how well a model can predict new, unseen instances. Through hyper-parameter tuning, one can strike the right balance between under-fitting and overfitting.
In this study, we employed an effective optimisation method, Differential Evolution [39] (DE) and Nelder-mead [40] (NM), to fine-tune four hyper-parameters of the XGBoost model, which exhibited the best performance. The hyperparameters under consideration were the tree depth, number of estimators, learning rate, and sub-sample rate. We utilised both accuracy and AUC as metrics to adjust the XGBoost parameters effectively.
DE is a groundbreaking concept in evolutionary algorithms that incorporates differential vectors into a triangle search and has emerged as a highly favoured population-based optimisation technique, widely adopted for tackling a diverse array of real problems characterised by noise, dynamics, and multimodality. Among the various components of DE, the mutation operator stands out as a crucial element driving the algorithm’s success. Several mutation operators have been proposed within the realm of DE, each offering distinct convergence rates and exploration capabilities. One of the well-known mutation strategies is DE/best/1/bin, renowned for its ability to converge swiftly, particularly in handling unimodal problems. However, its efficacy diminishes when faced with challenges posed by local optima, often culminating in premature convergence while navigating through multi-modal problem landscapes. Equation (8) mathematically defines this mutation scheme, encapsulating its operational essence within DE. where is a vector of differential of three solutions, including , and . In the meantime, ζ is the mutation factor which can adjust the step size and speed of exploration ability. Moving on to another evolutionary operator within DE, we encounter the pivotal role of crossover. The predominant form of crossover method employed is the binomial approach, characterised by a formulation that involves the trial vector, denoted as , and the crossover rate, symbolised as Cr, ranging from zero to one as follows. During the crossover process, a specific number of candidates, represented by sn, are selected. This culminates in the creation of a novel solution through a reasonable combination of the parent and offspring elements, perpetuating the evolutionary cycle of optimisation as follows.
4. Experimental results and discussions
In this section, to identify the most effective classification model for predicting frequent ED visitors, we conducted a comparative analysis of 16 popular machine-learning algorithms. These models encompassed logistic regression (LR), Stochastic gradient descent (SGD), k-nearest neighbours (KNN), three neural networks (MLP, Perceptron, and DNN), Decision trees (DT), Extra trees [41] (ET), Random forests [42] (RF), and six ensemble boosting models (XGBoost [29], AdaBoost [43], CatBoost [44], GBM [28], LightGBM [45], and HGBM [46]).
The prediction results of these models are detailed in Section 4.1. Subsequently, we explored the impact of two optimisation techniques, DE and Nelder-Mead, on tuning the hyper-parameters of the best-performing classifier, XG-Boost. The landscape analysis and AUC convergence results are provided in Section 4.3. Furthermore, we extensively discussed the performance of the proposed feature selection method in Section 4.4. Lastly, we presented a comparative evaluation of three approaches for addressing imbalanced data in Section 4.1. These comprehensive analyses and comparisons shed light on the most accurate and robust techniques for predicting frequent ED visitors, contributing to developing an effective classification model.
4.1 Fundamental prediction results
The performance of nine ensemble learning models and seven ML methods was assessed, and the statistical results are presented in Figure 6. Due to utilising 10-fold cross-validation for classifier validation, each box plot represents the minimum, maximum, and median metrics, with outliers indicated by a plus sign. Analysis of Figure 6(a), (b), (c), and (d) reveals that XGBoost consistently outperforms other models, except for precision, where LightGBM and GBM exhibit superior performance. However, all nine ensemble models demonstrate competitive performance, with accuracy ranging from 76.6% to 78.2%. On average, ensemble models outperform other ML models, including LR, NNs, and KNN. Supplementary Table A.4 provides the formulation of evaluation metrics employed in this study. Table 5 and Supplementary Table A.5 present detailed classification results for eight ensemble and eight ML models, respectively, with six evaluation metrics. Regarding accuracy, XGBoost is the top-performing model, with an average of 78%. Notably, LightGBM demonstrates high precision at 85.7% and outperforms other models in this regard.
We employed the Friedman test, a ranked-based statistical test, to determine the best-performing classifier among the 16 models utilised. This test extends the Wilcoxon signed-rank test and is the nonparametric equivalent of one-way repeated measures. We calculated the rank of each model based on all evaluation metrics and presented the results of the Friedman test in Table 6. Particularly, XGBoost achieved the highest rank compared to other models in terms of accuracy, F1 score, and log loss. While our models have demonstrated high accuracy and precision, the AUC metric presents an area for improvement, as noted in Table 5. The AUC reflects the trade-off between the true positive rate and the false positive rate, and achieving a higher AUC value is desirable. To address this concern and enhance the AUC performance, we have explored three methods, which are discussed in the subsequent section. These methods aim to rectify the issue and optimise the balance between true positive and false positive rates, ultimately improving the AUC metric.
4.2. The results of handling imbalanced data
To address the challenge of imbalanced data and improve the classifier’s AUC performance, we conducted comparative experiments using three methods: over-sampling (SMOTE), under-sampling, and adjusting class weights. Initially, we identified the best-performing model among the 16 ML models: XGBoost. Subsequently, we examined the impact of different weights assigned to the minority (positive) class, ranging from two to five. Figure 7 illustrates the performance of XGBoost with weighted class in terms of AUC and accuracy. Interestingly, we observed a direct relationship between increasing the class weight and AUC. However, there was a decrease in accuracy when raising the class weight from 80% to 67%. Therefore, it is crucial to identify an optimal weight that strikes a balance between AUC and accuracy. In this particular case study, the optimal weight was determined to be 3, representing a suitable trade-off between AUC performance and accuracy.
Figure 8 presents a comparison of three imbalanced data handling techniques, along with the original XGBoost model, using four evaluation metrics. When considering AUC, the methods of adjusting class weight and undersampling emerge as the best performers, achieving rates of 68% and 67%, respectively. Notably, these two methods have substantially improved the recall rate from 21% to 55% and 68%. Enhancing recall is of paramount importance as it enables the model to more accurately identify positive instances, thereby reducing the occurrence of false negatives. This is particularly significant in domains where the repercussions of missing positive instances, such as in medical diagnoses or fraud detection, can be severe. A higher recall rate signifies a reduced number of missed positive instances, resulting in heightened sensitivity and a more reliable identification of true positives.
4.3. Hyper-parameters tuning
Figure 9 illustrates the landscape of hyper-parameter tuning for XGBoost. The results indicate that increasing the depth of trees enhances accuracy; however, for achieving a high AUC, the optimal range appears to be around 10. Moreover, we observed that a larger number of estimators yielded better performance. Interestingly, utilising the entire training dataset (subsample=1.0) for training the trees demonstrated the highest accuracy and AUC. Finally, based on Figure 9 (d), the optimal learning rate value was found to be approximately 0.4.
In order to demonstrate the convergence speed of both DE and Nelder-Mead in optimising the hyper-parameters of XGBoost, we conducted ten independent runs for each optimiser with randomly initialised parameters. Figure 10 indicates the convergence pattern of the two optimisers. Nelder-Mead exhibits rapid initial convergence within the first few evaluations (less than 200); however, it becomes trapped in a local optimum and fails to improve the AUC further. Conversely, although DE has a slower convergence rate compared to Nelder-Mead, it maintains an upward trend throughout the optimisation process and ultimately surpasses Nelder-Mead’s performance. Therefore, if the training runtime is not a constraint, we recommend utilising DE for hyper-parameters tuning as it demonstrates better overall performance. Conversely, if runtime is a crucial factor, Nelder-Mead can be a more suitable choice for fine-tuning the hyper-parameters.
4.4. Feature selection results
The effectiveness of the proposed adaptive method for optimal feature selection, as outlined in Section 3.4.2, is demonstrated in Figure 11. The graph showcases the efficiency achieved when utilising the optimal subset of features. It is observed that increasing the number of optimal features from five to seven leads to improvements in both accuracy and AUC, with gains of approximately 6% and 4%, respectively. However, beyond seven features, both accuracy and AUC experienced a decline. Consequently, the optimal number of features for this particular scenario is seven. These seven features comprise Age, Sex, ICD10, Disposition, Year (arrival time in ED), State, and Suburb, which collectively offer the best performance in terms of accuracy and AUC for the given dataset and classification task.
5. Conclusions
Frequent ED visitors are a significant concern within primary and emergency care settings. Machine learning models have gained popularity in medicine and healthcare and are being increasingly utilised. ML models continuously advance, present novel possibilities, and witness notable theoretical advancements. These advancements have led to improved accuracy and enhanced reliability of ML models.
In this research, an Adaptive ensemble learning–based prediction model was devised and evaluated against 16 prominent machine learning models using a large-scale dataset collected from Canberra Hospital, a tertiary public hospital in ACT, Australia, between 1997 and 2022 with 1.6 million records of patients’ episodes. Also, three ways of adjusting were examined to deal with the challenge of imbalanced data. At the same time, a combination of two feature selection approaches incorporating Elastic-Net and local search was proposed to identify the best feature combination. Hyper-parameter tuning was managed and done using a population-based set of computer instructions and a local search. The performance of these ways of doing things was compared to determine their effectiveness in optimising the model’s abilities.
The experimental modelling results revealed that the proposed model considerably outperformed other models in terms of five metrics: accuracy, Recall, F1-score, Area under the ROC curve (AUC), and Log loss at 0.78 (95% CI 0.78-0.79), 0.68 (95% CI 0.68-0.68), 0.68 (95% CI 0.68-0.69), 0.68 (95% CI 0.69-0.70), and 7.4 (95% CI 7.2-7.5), respectively.
Our future research aims to explore the potential impact of additional features on enhancing prediction results. Specifically, we plan to investigate the inclusion of weather parameters and pollen count levels, as these factors have been found to influence various health conditions. Furthermore, we aim to expand our analysis to encompass a higher rate of ED visits, such as ten visits per year, to gain a deeper understanding of the predictive capabilities of our model. Additionally, we intend to explore time-series forecasting machine learning models as part of our methodology. By incorporating these advancements, we anticipate further improving the accuracy and reliability of our predictions in the context of frequent ED visits.
Data Availability
All data produced in the present study are available upon reasonable request to Michael Phipps (michael.phipps{at}act.gov.au), the esteemed Senior Director of the Canberra Health Service.
Ethics approval and consent to participate
Data and Code availability
For information regarding the availability of data used in this study, interested individuals can obtain access by contacting the corresponding author, Michael Phipps (michael.phipps{at}act.gov.au). Python 3.9.16 served as the programming language for implementing this study. Various open-source Python libraries were utilized, including Scikit-learn (0.23.2) 1, NumPy (1.19.1), Pandas (1.1.0), SciPy (1.5.2), and TensorFlow (2.3.0). Scikit-learn played a central role in supplying a wide range of machine learning models employed in this research. Furthermore, ensemble models such as XGBoost 2, LightGBM 3, and CatBoost4, accessible open-source platforms were utilized. TensorFlow 5, an open-source machine learning framework developed by Google, was specifically employed for implementing deep neural networks (DNN) and convolutional deep neural networks (CDNN). In order to tune the hyper-parameters, we applied a well-known optimisation method, Differential Evolution (DE)6, from the Scipy 7 library. By combining these libraries and frameworks, a comprehensive array of tools and algorithms were utilized to develop and analyze machine learning models effectively.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics approval
This study has received ethical approval from The ACT Health Human Research Ethics Committee (Approval No. 2024/ETH01037).
Data availability
The dataset regarding ED Presentations used in this research is owned by Canberra Health Services and the ACT Health Directorate. Access to data for our study relies on obtaining anonymised datasets, a task made possible through collaboration with Michael Phipps (michael.phipps{at}act.gov.au), the esteemed Senior Director of the Canberra Health Service.
Appendix A. Supplementary materials
Footnotes
mehdi.neshat{at}ieee.org, michael.phipps{at}act.gov.au
nikhil.jha{at}act.gov.au
chris.browne{at}anu.edu.au
Walter.P.Abhayaratna{at}act.gov.au
Walter.P.Abhayaratna{at}anu.edu.au
↵6 https://github.com/scipy/scipy/blob/main/scipy/optimize/differentialevolution.py