Leveraging patients’ longitudinal data to improve the Hospital One-year Mortality Risk ======================================================================================== * Hakima Laribi * Nicolas Raymond * Ryeyan Taseen * Dan Poenaru * Martin Vallières ## Abstract **Purpose** Predicting medium-term survival after admission is necessary for identifying end-of-life patients who may benefit from goals of care (GOC) discussions. Considering that several patients have multiple hospital admissions, this study leverages patients’ longitudinal data and information collected routinely at admission to predict the Hospital One-year Mortality Risk. **Methods** We propose an Ensemble Long Short-term Memory neural network (ELSTM) to predict one-year mortality using patients’ longitudinal records. The model was evaluated: (i) with only predictors reported upon admission (AdmDemo); and (ii) also with diagnoses available later during patients’ stay (AdmDemoDx). Using records of 123,646 patients with 250,812 hospitalizations from 2011-2021, our dataset was split into a learning set (2011-2017) to compare models with and without longitudinal information using nested cross-validation, and a holdout set (2017-2021) to assess clinical utility towards GOC discussions. **Results** The ELSTM achieved a significant increase in predictive performance using longitudinal information (*p*-value < 0.05) for both the AdmDemo and AdmDemoDx predictors. For randomly selected hospitalizations in the holdout set, the ELSTM showed: (i) AUROCs of 0.83 (AdmDemo) and 0.87 (AdmDemoDx); and (ii) superior decision-making properties, notably with an increase in precision from 0.25 for the standard process to 0.28 (AdmDemo) and 0.36 (AdmDemoDx). Feature importance analysis confirmed that the utility of the longitudinal information increases with the number of patient hospitalizations. **Conclusion** Integrating patients’ longitudinal data provides better insights into the severity of illness and the overall patient condition, in particular when limited information is available during their stay. Keywords * Machine learning * Long Short-Term Memory neural network * Longitudinal data, Mortality risk * Administrative data ## 1 Introduction Estimating the life expectancy of patients helps identifying high-risk individuals and improve the quality of care they receive in hospital settings [1–3]. Unlike patients with cancer who receive palliative care in their final months of life, patients with other less predictable conditions are only referred for these services in their final weeks or days, if at all [4]. In Canada, despite common individual preference for most individuals to die in community and other home-like settings [5], 58% of those who died in 2015 were hospitalized more than once in their last year of life, and 61% died in hospital [6]. An early identification of these high-risk patients would allow important discussions with healthcare providers regarding end-of-life choices, to align their preferences with the care they receive [7]. Such discussions would enable goals-of-care (GOC) documentation, including code status orders (CSOs) clarifying essential preferences for life-supporting therapy [8, 9]. Early identification would also facilitate communication between clinicians and families regarding patients’ life trajectories, ensuring informed shared decision-making [10] and potentially reduce depression and grief [11]. However, a clear and timely prognostication of high-risk patients in hospital settings is time-consuming and therefore challenging for workload-burdened clinicians [12]. An accurate automated tool not requiring human involvement could initially flag these patients, lightening the work burden of the clinical team. Several studies have investigated the ability of data available in Electronic Health Records (EHRs) to predict the mortality risk of patients, potentially driving an automated clinical deci-sion support system. *van Walraven et al* [13, 14] introduced the Hospital One-year Mortality Risk (HOMR) score, representing the probability of death within one year of patient’s admission. The original model consisted of a logistic regression using post-discharge administrative data routinely collected upon admission, evaluated using Area Under the Receiver Operating Characteristic curve (AUROC). Their goal was to flag high-risk individuals and initiate end-of-life discussions with them to decide in favor or against potentially aggressive and invasive interventions. To operate in real-time settings, subsequent studies modified the HOMR score according to the availability of data in each hospital, and included only variables available immediately when patients were admitted [15, 16]. As a result, due to specific EHRs constraints, diagnostic codes were omitted from the predictors. More recently, *Taseen and Ethier* [9] explored the clinical utility of models predicting the HOMR score, in which they developed three random forest models based on variable sets available at different times during a patient’s admission. The authors compared the discriminative power of such models with previously established linear regression models and evaluated their clinical utility within their hospital setting. Nevertheless, these studies did not include valuable longitudinal information present in patients’ records, as they focused on single visits and did not take into account the patient’s history from previous hospital admissions. This approach diverges from the clinical reality, where clinicians consistently consider the patient history before making a prognostic prediction for a given condition. Another approach proposed in previous work has been to incorporate broader covariates (e.g., medical disease codes, clinicians’ notes, social history) and aggregate patient information within and across admissions to predict their mortality risk in order to refer them for end-of-life care [17, 18]. However, these studies did not explicitly quantify the impact of integrating patient history in developing more accurate solutions. Moreover, these proposed models are more challenging in terms of data acquisition and are therefore less likely to be deployed in a clinical decision support system — unlike HOMR-based models that have already been clinically deployed [16] or are in the process of deployment [9]. In this work, we evaluated the benefits of integrating patients’ longitudinal data to improve the accuracy of the HOMR score. We built on the work of *Taseen and Ethier* [9] by reanalyzing the same data routinely collected during patients’ admissions while also integrating additional recent visits. To assess the benefits of a temporal EHR analysis, we developed and compared a Long Short-Term Memory-based ensemble model (ELSTM) that leverages patients’ longitudinal data, to baseline models that consider patients’ visits independently without including previous visits. Fig. 1 shows an overview of our study. We further analyzed the predictive power of our model in two different scenarios with different requirements of data access: (i) including only demographics and admission characteristics available on patient’s admission (“AdmDemo”), and (ii) adding also admission diagnoses and comoridity diagnoses available during patient’s hospitalization (“AdmDemoDx”). In an effort to better inform about the clinical utility of such models, we quantified the gains and losses of our ELSTM in terms of true and false positives as compared to standard human decision-making. ![Fig. 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F1.medium.gif) [Fig. 1.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F1) Fig. 1. Study overview. (a) The ELSTM averages predictions of multiple LSTMs trained using different cohorts of the same patients. Each cohort includes the patient’s history up to a specific visit. (b) Baseline models consider patients visits independently ## 2 Materials and methods ### 2.1 Dataset This retrospective study took place at an integrated university hospital network with 2 sites and 700 acute care beds in Sherbrooke, Quebec, Canada. Data was obtained from the institutional data warehouse, combining EHR and administrative information. The cohort included all adult patients admitted to a non-psychiatric service between July 1, 2011 and June 30, 2021, excluding admissions to infrequently admitting services (such as genetics) or admissions with a legal context (i.e. court-ordered). Mortality status was also extracted from the institutional data warehouse, which was sourced from the Quebec vital statistics registry. Institutional Review Board approval was obtained prior to data acquisition (Institutional Review Board of the CIUSSS de l’Estrie—CHUS Nagano #2022-4409). We followed the data extraction steps previously described by *Taseen and Ethier* [9] since we used the same source of data. Table 1 lists the predictors used for model comparisons. Comorbidity diagnoses from prior visits became accessible in the information system 6 months following a given visit, or only 2 weeks later for emergency department encounters. View this table: [Table 1:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T1) Table 1: Covariates included in all predictive models as described in *Taseen and Ethier* [9]. AdmDemo predictors include only demographics and admission characteristics while AdmDemoDx predictors include demographics, admission characteristics and comorbidity and admission diagnoses. Given the potential variations in data availability on admission across different hospital information systems, we explored the feasibility of early identification of high-risk patients in several scenarios. We evaluated two strategies with different data requirements: (i) “AdmDemo”, including only demographics and admission characteristics and, (ii) “AdmDemoDx” including demographics, admission characteristics, comorbidity diagnoses and admission diagnoses. ### 2.2 Ensemble Long Short-Term Memory neural network (ELSTM) To evaluate the impact of incorporating a patient’s longitudinal health record for improving the HOMR score, we introduce an Ensemble Long Short-Term Memory neural network (ELSTM) that leverages information learned by multiple LSTMs trained at different stages of patients’ admissions to hospital (Fig. 1a). We base our ensemble model on an LSTM architecture [19] since recurrent neural networks can handle sequences of different lengths without extra padding. This is particularly relevant in our case where patients can have varying numbers of previous visits. More formally, we define *C**k* as the temporal cohort including the visits sequence of each patient up to their *k**th* visit; if a patient has less than *k* visits, *C**k* includes all their visits. *C*last denotes the cohort including the visits sequence of each patient up to their last visit available in our dataset. The formal definition of *C**k* is given by: ![Formula][1] where *N* ∈ ℕ is the number of patients, *M**i* ∈ ℕ the number of visits for the *i**th* patient and ![Graphic][2] the *j**th* visit of the *i**th* patient. During the training phase, we train multiple LSTMs on temporal cohorts including patients with varying numbers of visits. The goal is to capture diverse information at different stages of patients’ visit sequence. Each LSTMk is trained using the temporal cohort *C**k* to aggregate a patient’s visit sequence and estimate their mortality risk at their last visit available in *C**k*, with *k* ∈ {1, …, K} ∪ {last}. The ensemble model learns from multiple visits for each patient, while each LSTMk is exclusively trained on a single visit sequence per patient. This setup guarantees that the training data for each LSTMk are independent and identically distributed (iid). We set *K* = 5 given that only 5% of patients have more than 5 visits in our dataset. We chose not to restrict *C**k* to patients with only *k* visits in order to optimize each LSTMk of the ensemble model on a larger set of data. Here, our assumption is that including patients with a full sequence of visits, even if the length was less than *k*, would make the distribution of training data more exhaustive and improve the model’s predictive performance. In the testing phase, the ELSTM averages the predictions of all LSTMs trained with patients having at least *m* visits to make a prediction at the *m**th* visit of a patient, as follows: ![Formula][3] ### 2.3 Experimental setup #### 2.3.1 Baseline models We conducted a comparative analysis of the ELSTM with two baseline models which do not use longitudinal data. The first model is the random forest (RF), as employed in prior work [9], using the scikit-learn wrapper [20] from skranger library1. The second model is a Basic LSTM (BLSTM) which does not consider previous information when making a prediction for a specific visit. Each LSTM-based model contains one single hidden layer followed by 2 fully connected layers and was implemented using the PyTorch library [21]. For a fair comparison, we added the previous visit count at each admission as a predictor to the baseline models. #### 2.3.2 Experimental design We used the experimental setup illustrated in Fig. 2 to evaluate the ELSTM and baseline models. The experiments are repeated for each group of predictors AdmDemo and AdmDemoDx. Fol-lowing a similar approach to *Taseen and Ethier* [9], we temporally split the dataset into a *learning set*, including admissions from July 1, 2011, to June 30, 2017, and a *holdout set*, including admissions from July 1, 2017, to June 30, 2021. We excluded patients admitted before June 30, 2017 from the holdout set to prevent data leakage. This design aimed to simulate the evaluation of a model trained on all available patients data and tested on subsequently admitted patients. As patients are exclusively in one set at a time, temporal models have only aggregated previous visits occurring within the last six years prior to the current admission. To evaluate the final model’s clinical utility on the holdout set, we focused on the same population eligible for GOC discussions as in previous work [9]. Therefore, we excluded hospitalizations without an overnight stay from the holdout set, since there would not be enough time for a GOC discussion to occur. Additionally, we omitted admissions to the obstetrics service, where such discussions are considered inappropriate, and admissions to the palliative care service, where GOC discussions have already occurred and are therefore unnecessary at this stage. ![Fig. 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F2.medium.gif) [Fig. 2.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F2) Fig. 2. Experimental setup for model comparisons and final evaluation. 1) Temporal division of the dataset into a *learning set* and a *holdout set*. 2) Evaluation of the predictive performance of each model on 5 *testing sets* using a 5-fold cross-validation over the patients of the learning set. The same data splits are used for all models. Baseline models (RF, BLSTM) are trained on all the visits, while each temporal model LSTMk comprising the ELSTM is trained using a temporal cohort *C**k*. The scores are reported on specific patients of the *testing sets*. The training of each model includes the optimization of hyperparameters, except for the temporal models, for which we only optimize the hyperparameters of LSTMlast. Details on hyperparameter optimization are shown in Appendix A. 3) Comparison of the temporal and non-temporal strategy and selection of the best strategy based on performance. The latter is measured using the mean and standard deviation of the scores on the 5 testing sets. 4) Final evaluation of the selected strategy on the holdout set. The final model predictions are then compared to usual care to quantify clinical utility #### 2.3.3 Model selection procedure In the model selection phase, we compare the performance of the ELSTM and baseline models to evaluate the benefits of incorporating the patients history in predicting their one-year mortality risk. To achieve this, we used a nested 5-fold cross-validation scheme. We partitioned the learning set using a 5-fold cross-validation into distinct training and test sets. Each of the training sets was subsequently separated into distinct inner training and inner test sets with an inner 5-fold cross-validation. The inner sets were entirely dedicated to optimize the hyperparameters of the models for each outer training fold. The data splitting was based on patients rather than visits, ensuring that each patient exclusively belonged to one set at a time. To train the LSTM-based models, we created an additional validation set (as well as an inner validation set) for each of the 5 cross-validation splits, enabling us to track model performance through training epochs and proceed to early stopping if necessary. Each (inner) validation set was created by randomly sampling 10% of patients from the corresponding (inner) training set. At each (inner) training split, the baseline models were trained using all patients’ visits, while each LSTMk part of the ELSTM was trained using a temporal cohort *C**k*. We assessed the benefits of the longitudinal data at each patient visit including their last visit available in our dataset, when we considered our patient’s medical trajectory completed. We define *V**t* as all the *t**th* visits of patients having at least *t* visits, and *V**t,last* as the last visits of patients having exactly *t* visits. #### 2.3.4 Hyperparameters optimization We optimized each model’s hyperparameters to find the best set leading to the highest scores. We trained each LSTM-based model using the Adam optimizer [22] with parameters *β*1 = 0.9 and *β*2 = 0.999, and a batch size of 100. We fixed the sizes of the fully connected layers to 2 and 1 respectively. Given that the ELSTM consists of multiple models, we chose to exclusively optimize the hyperparameters of LSTMlast and used the selected set to train each LSTMk. This way, we ensured consistent probability scales within the models constituting the ensemble model. For each optimized model, we sampled 100 sets of hyperparameters values from predefined search spaces, using a random sampler from the Optuna Python library [23]. Each set of hyperparameters values was evaluated by training the model with the 5 inner training sets and then measuring the AUROC on their respective inner testing sets. Here, the inner test sets included only the last visit of each patient. The set associated with the highest AUROC was selected to train the model on the whole training set of the outer loop. Models’ hyperparameters are provided in Appendix A. #### 2.3.5 Final evaluation procedure To evaluate the clinical utility of the best model selected in the previous phase, we compared its predictions to the usual care performed by clinicians on patients eligible for a GOC discussion. We aimed to quantify the gains and losses in terms of true positive and false positive alerts if this automated tool was used in a clinical decision support system to alert clinicians when a patient is identified as being at risk of one-year mortality. First, we extracted all CSOs of patients in the holdout set, and considered that a GOC occurred between a patient and a clinician (and that a patient at high risk of one-year mortality was identified by the clinical team) if a CSO was documented prior to the patient’s discharge, whether during the current admission or a previous one. Similarly to *Taseen and Ethier* [9], we defined: * True Positives (TPs) as patients with a documented CSO who died within a year. * False Positives (FPs) as patients with a documented CSO who survived beyond a year. * False Negatives (FNs) as patients without a documented CSO who died within a year. * True Negatives (TNs) as patients without a documented CSO who survived beyond a year. Next, we trained the previously selected model using the entire learning set and compared its predictions, which would represent the actions suggested by the automated tool, to the usual care the patients from the holdout set received. ## 3 Results The overall cohort consisted of 123,646 patients and 250,812 hospitalizations, with 15% of patients experiencing mortality within one year of their last admission. The learning set included 82,104 patients and 148,587 hospitalizations. For the holdout set, we excluded 40,915 hospitalizations belonging to previously admitted patients between 2011-2017, along with 7,644 patients and 11,992 hospitalizations ineligible for GOC discussions. Ultimately, the holdout set included 33,898 patients and 49,318 hospitalizations. Detailed descriptive analyses for each set can be found in Appendix B. Fig. 3a provides an overview of the proportion of mortality and survival per number of visits across the dataset. Patients who are frequently admitted to the hospital are generally fewer, but present a higher risk of one-year mortality. Fig. 3b shows the distribution of visits over time after the first hospitalization discharge. The second and third visits occur mainly in the first months following the first hospital discharge, while subsequent visits are increasingly scattered across time. ![Fig. 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F3.medium.gif) [Fig. 3.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F3) Fig. 3. Distribution of visits across the entire dataset. (a) Proportions of survival and mortality per number of visits in the dataset. *V**t*,last represents all the patients with exactly *t* visits in the dataset and *V*>*t*,last those with more than *t* visits. The number of patients decreases with the number of visits, in contrast to the mortality rate. (b) Distribution of visits over time after the first hospital discharge. *V**t* represents all the *t*th visits in the dataset and *V*>*t* all the visits after the *t*th visit ### 3.1 Model selection on the learning set In this part of our study, we explored the benefits of integrating patients’ historical data to predict their HOMR score. We assessed the baselines and the ELSTM on the learning set at various stages of patients’ hospital admissions, to understand the extent to which exploring patients’ history proves beneficial. As described earlier, we considered two groups of predictors: AdmDemo and AdmDemoDx. Table 2a presents the performance of the baseline models and ELSTM for the last visit of each patient. The ELSTM outperforms the non-temporal models with a higher AUROC for all patient groups with both sets of predictors. Statistical tests revealed a significant overall improvement, except for *V*5,last and *V*>5,last, where we note a higher variance due to fewer patients (∼ 300 and ∼ 500) that can diminish the statistical power of the test. Notably, even for patients without a historical record *V*1,last, the temporal model was effective – emphasizing that absence of recurrent visits serves as valuable insight. Experiments in Table 2b show that the impact of longitudinal data is less pronounced on intermediate visits, with non-statistically significant increases or decreases across multiple groups of patients, especially for AdmDemoDx predictors. Appendix C shows the performances of each individual LSTMk within the ELSTM. View this table: [Table 2:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T2) Table 2: Performance of the baseline models and the ELSTM on the testing sets of the learning set using both AdmDemo and AdmDemoDx predictors. RF: Random Forest; BLSTM: Basic LSTM; ELSTM: Ensemble LSTM. (a) Performance on the last visits of patients. (b) Performance on the last and intermediate visits of patients. Each testing set from the 5-fold cross-validation was divided into different groups of patients according to their number of visits to evaluate when temporal modeling is beneficial. Each group of patients included a patient at most once. The scores correspond to the *mean ± standard deviation* of the AUROC over the 5 testing sets. For each group of patients, the highest AUROC is highlighted in bold. Significant difference was quantified using the one-sided Wilcoxon signed-rank test [24]. Each *p*-value corresponds to the significance of improvement of the ELSTM over the best baseline model (RF or BLSTM) for a specific group of patients. *V**t*,last represents the *t*th visits of patients having exactly *t* visits, *V*>*t*,last the last visits of patients having more than *t* visits, *V**t* the *t*th visits of patients having at least *t* visits, and *V*>*t* one visit selected randomly that occurred after the *t*th visit for patients having more than *t* visits. Next, the group of predictors with fewer variables (AdmDemo) achieved acceptable results for all patient sets with a predictably lower AUROC compared to AdmDemoDx (Tables 2a and 2b). The AdmDemo feature set seems to benefit more from the longitudinal data, as we observed a higher AUROC improvement across all patient sets compared to the AdmDemoDx feature set when using the ELSTM. This emphasizes the importance of incorporating longitudinal data in cases where variables about the patient’s precondition are not available (e.g., comorbidity diagnoses), and the ability of such longitudinal data to provide a more comprehensive understanding of the patients through their history. In addition, the ELSTM revealed comparable AUROCs on patients admitted later in time to the AUROCs observed in the learning set (see Appendix D), thereby demonstrating its temporal validity. Overall, the ELSTM achieved the best performance for most patient groups, particularly on their last visits completing their medical trajectory. These results highlight the gains from integrating longitudinal patient data to predict the HOMR score. ### 3.2 Final evaluation on patients eligible for a GOC discussion In this section, we compared the ELSTM using AdmDemo or AdmDemoDx predictors with the usual care performed by clinicians for each patient in the holdout set. We optimized the decision threshold for considering a patient at risk of one-year mortality by maximizing the Youden’s *J*-index [25]. We set it at 0.34 for ELSTM-AdmDemo and 0.17 for ELSTM-AdmDemoDx. Results in Table 3 revealed that the ELSTM with AdmDemo predictors constitutes an automated tool with similar predictions to the usual care performed by clinicians, with overall good precision and a low rate of inappropriate alerts relative to daily clinical practice. We also observe that, although the ELSTM with AdmDemoDx predictors achieved the highest AUROC, the model is less sensitive and detects slightly fewer patients who actually died within a year of their admission (Fig. 4a). Nevertheless, this model considerably reduced the number of false positive notifications and increased the precision. The calibration curves in Fig. 4b support this result, by showing a tendency of ELSTM-AdmDemo to overestimate the risk of death compared to ELSTM-AdmDemoDx. View this table: [Table 3:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T3) Table 3: Comparisons of the final ELSTM with AdmDemo and AdmDemoDx predictors to the usual care performed by clinicians on patients of the holdout set. The scores correspond to the *mean* ± *standard deviation* of the metric over 100 bootstraps drawn with replacement. The highest value for each metric is highlighted in bold. ![Fig. 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F4.medium.gif) [Fig. 4.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F4) Fig. 4. Analyses of the final ELSTM tested on the holdout set with both AdmDemo and AdmDemoDx predictors. Shaded regions indicate variations within one standard deviation of the mean over 100 bootstraps. (a) Number of positive predictions by the ELSTM with AdmDemo and AdmDemoDx predictors, and CSOs documented by clinicians. The ELSTM shows a reduced number of false positives and a slight loss in the number of true positives. (b) ELSTM calibration curves with AdmDemo and AdmDemoDx predictors. We used interpolation to unify the predicted risk bins over the 100 bootstraps and generate a mean calibration curve with its variations. We show in Appendix C the 100 calibration curves for each ELSTM in the bootstrap sampling. The ELSTM-AdmDemoDx is almost identical to a perfectly calibrated model, while the ELSTM-AdmDemo tends to overestimate the risk of mortality Finally, we analyzed the evolution of importance for each group of features along with the number of visits per patient in the ELSTM-AdmDemoDx. Post-hoc analyses of the importance assigned to each feature by a model provided important insights into their impact on the predicted scores. Fig. 5 illustrates that as the number of patient visits increases, the importance of longitudinal information grows, and the model relies on predictors from both current and previous admissions to predict mortality risk. Conversely, as the number of visits decreases, the model relies more heavily on demographic information to make its predictions. This highlights the importance of using longitudinal data for patients with a long medical history, and is consistent with clinical reality, where the frequently admitted patients’ prognoses have a higher dependence on their overall health history than on their demographics. Feature importance and the overall performance of the ELSTM did not vary when we included the time gap between current and previous admissions (see Appendix E), demonstrating that the model was able to learn this information solely through the content of longitudinal records. ![Fig. 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F5.medium.gif) [Fig. 5.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F5) Fig. 5. Post-hoc analyses of feature importance of the final ELSTM trained with AdmDemoDx predictors. Importance of each feature is computed using feature permutation [26] over 100 bootstraps. Shaded regions indicate variations within one standard deviation of the mean over 100 bootstraps. Importance of previous features increases as the size of patients’ history gets longer ## 4 Discussion Recent years have seen efforts dedicated to developing automated models identifying patients at high risk of mortality, in order to improve end-of-life care and align patient preferences with the provided care. Recent works have explored the use of machine learning models to integrate patients’ longitudinal data in several clinical contexts [27–29], and presented interesting improvements over single-visit models. However, to date, these techniques have not been used for models predicting the HOMR score to enhance palliative care. This study introduces the Ensemble Long Short-Term Memory (ELSTM), a recurrent neural network-based ensemble model that integrates both admission and historical patient data to automatically identify individuals at an elevated risk of one-year mortality. The aim is to prompt the clinical team for end-of-life interventions, such as GOC discussions. Firstly, we developed the ELSTM, an ensemble model built upon the LSTM neural network, that leverages information learned by different LSTMs at various stages of a patient’s admission. We applied the ELSTM to patients with varying numbers of visits and estimated their HOMR score. We used patient self-reported predictors available upon admission (AdmDemo), as well as other comorbidity diagnoses available in patients’ EHR and admission diagnoses documented later during their stay (AdmDemoDx). A significant improvement in AUROC, the standard evaluation metric in the literature for measuring the discriminative power of the HOMR score [13], was observed across the majority of patients groups using both sets of predictors. Within the LSTM-based neural network, we believe the longitudinal data contributed to mortality prediction in two aspects. First, frequent visits to the hospital (i.e., more longitudinal data) likely indicate an increasing severity of illness, thus a higher risk of death. Second, the characteristics of each previous hospitalization (i.e., the content of longitudinal data) provide a superior holistic view of the patient’s overall condition. Thus, the importance of previous data grows with the length of a patient’s history. These two aspects allow each LSTM to learn long- and short-term longitudinal patterns to accurately identify patients at high risk of one-year mortality. Next, we compared the ELSTM to the usual care provided by clinicians to the population of interest eligible for end-of-life interventions. Both AdmDemo and AdmDemoDx strategies revealed considerable benefits as an automated alert prevalence tool of patients at high risk of one-year mortality in a clinical decision support system. More specifically, the ELSTM using AdmDemo predictors facilitates real-time data acquisition, as it requires fewer variables, all available immediately upon admission and can be self-reported. In addition, the model revealed similar results to human decision-making, and is hence useful in hospitals where diagnoses are encoded post-discharge as in previous studies [15, 16]. On the other hand, even though the ELSTM using AdmDemoDx predictors is more challenging in terms of data acquisition, it can significantly reduce false positive notifications and therefore the risk of an alert fatigue, making it a suitable candidate for deployment in a clinical decision support system. We have identified several limitations worth addressing in future studies. Firstly, while the overall AUROC of the ELSTM with the AdmDemoDx feature set ranges between 0.87-0.89, an examination of population subgroups shows that the oldest patients, and potentially those with the most complicated medical conditions, are less accurately predicted (see Appendix E). To better identify patients at high risk of mortality, models used on these patients should include not only administrative and diagnostic variables routinely collected on admission, but also admission-specific clinical variables such as vital signs, laboratory and imaging tests. Secondly, our evaluation of clinical utility assumes that a clinician would engage in a GOC discussion and document a CSO for all and only those patients suspected to be at high risk of death. However, this assumption has limitations. Not all high-risk patients may have the opportunity for a GOC discussion due to a lack of resources or time. Additionally, clinicians may document a patient’s CSO not only based on their risk of mortality but also on the potential need for escalated care requiring intubation or ventilation. Thirdly, although the AUROC is the main metric in the literature to evaluate models predicting the HOMR score, model selection in clinical settings should primarily maximize clinical utility, which is extremely context-dependent (based on individual hospital services and resources, typical patient origin and profile, severity of admissions, length of stay, etc). Fourthly, although our model demonstrated an acceptable temporal validity, it was not validated using external datasets. Let us however note that the HOMR prediction score has been externally validated in a previous study [14]. Finally, it is important to acknowledge that predicting a patient at high risk of mortality does not guarantee an effective GOC discussion. Future research should therefore investigate the actual impact of early detection of these patients on the quality of their end-of-life care. ## 5 Conclusion In this work, we developed an ELSTM, an ensemble recurrent neural network-based approach leveraging information available across different patient hospitalizations. We evaluated our model using data collected routinely during hospital admissions to predict the Hospital One-year Mortality Risk (HOMR) score and to identify individuals who might benefit from end-of-life discussions with healthcare providers. Our model outperformed existing approaches both when using only admission demographics and administrative variables as predictors (AdmDemo), and when integrating diagnoses as well (AdmDemoDx). Our study highlights the rich data potential available in patients’ medical records, emphasizing their ability to generate predictive models for enhancing patient care, throughout the life spectrum and at the end of life. ## Data Availability Software code allowing to run the experiments used to produce the results presented in this work is freely shared under the GNU General Public License v3.0 on the GitHub website at: [https://github.com/MEDomics-UdeS/POYM](https://github.com/MEDomics-UdeS/POYM). The hospitalization data analysed during the current study are not publicly available for confidentiality purposes overseen by the IRB (Institutional Review Board of the CIUSSS de l Estrie - CHUS Nagano \#2022-4409). However, a randomly generated dataset with the same format as used in our experiments is publicly shared in our GitHub repository to test the code implemented for this work. [https://github.com/MEDomics-UdeS/POYM](https://github.com/MEDomics-UdeS/POYM) ## Declarations ### Funding This study was supported by: (i) Canada CIFAR AI Chair, Mila; (ii) Natural Sciences and Engineering Research Council of Canada (NSERC), Discovery Grants Program (RGPIN-2021-03996); (iii) Fonds de recherche du Québec – Nature et technologies, programme relève professorale (312290). The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. ### Competing interests The authors have no competing interests to declare that are relevant to the content of this article. ### Ethics approval and consent to participate Institutional Review Board approval was obtained prior to data acquisition (Institutional Review Board of the CIUSSS de l’Estrie—CHUS Nagano #2022-4409). Given the retrospective nature of the study, the Director of Professional Services of CIUSSS de l’Estrie-CHUS waived the requirement for signed consent from participants. ### Consent for publication All authors had full access to all the data in the study and accept responsibility to submit for publication. ### Data and code availability Software code allowing to run the experiments used to produce the results presented in this work is freely shared under the GNU General Public License v3.0 on the GitHub website at: [https://github.com/MEDomics-UdeS/POYM](https://github.com/MEDomics-UdeS/POYM). The hospitalization data analysed during the current study are not publicly available for confidentiality purposes overseen by the IRB (Institutional Review Board of the CIUSSS de l’Estrie—CHUS Nagano #2022-4409). However, a synthetic dataset generated using the AVATAR method [30] in partnership with Octopize2 is publicly shared on the Zenodo website at: [https://zenodo.org/doi/10.5281/zenodo.12954672](https://zenodo.org/doi/10.5281/zenodo.12954672). The publication of this synthetic dataset has been approved under project #2022-4409 with form #F2H-60510. ### Authors’ contributions Authors’ initials are used to designate them. Conceptualization: HL, MV Data curation: HL, RT Formal Analysis: HL Funding acquisition: MV Investigation: HL, MV, RT Methodology: HL, MV Project administration: HL, MV Resources: MV Software: HL, NR Supervision: MV, DP Validation: HL, DP Visualization: HL Writing – original draft: HL Writing – review & editing: HL, MV, DP, RT, NR ## Acknowledgements We thank Jean-François Ethier, Associate Professor in the Department of Medicine at the Université de Sherbrooke, for data collection. We also thank Olivier Lefebvre, PhD student at Université de Sherbrooke, and Mahdi Ait Lhaj Loutfi, Master’s student at Université de Sherbrooke, for helpful comments and suggestions throughout the project. ## Appendix A. Hyperparameters This section provides further information on the hyperparameter optimization process (Fig. 6), including the hyperparameters of the models and their respective search spaces (Table 4 and 5). View this table: [Table 4:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T4) Table 4: Random Forest’ hyper-parameters. The hyperparameters that are not mentioned were set as the default ones from the *scikitlearn* wrapper interface of version 0.8.0 of the *skranger* library3. The *weight* hyperparameter represents the weight of the positive class. View this table: [Table 5:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T5) Table 5: Hyperparameters of LSTM-based models (BLSTM and LSTMk). The *weight decay* refers to the coefficient multi-plying the ℒ2 penalty in the cross entropy loss. The *learning rate* refers to the initial learning rate given to the Adam optimizer [22] at the beginning of the training. The *hidden size* refers to the number of neurons in the hidden layer. The *weight* hyperparameter represents the weight of the positive class. ![Fig. 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F6.medium.gif) [Fig. 6.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F6) Fig. 6. Hyperparameter optimization for each model. The process is performed automatically using a random sampler from predefined search spaces for each hyperparameter, within the framework of the Optuna [23] Python library. The training patients are divided using a 5-fold cross-validation into inner training patients and inner testing patients. Temporal models are trained using the temporal cohort *C*last. All models are tested on the last visits of patients. A total of 100 sets of hyperparameter values are sampled sequentially and evaluated on the same inner testing patients. The performance is measured using the mean of the AUROC on the 5 inner testing sets. The set of hyperparameters associated with the highest AUROC is used to train the model on the outer training patients ## Appendix B. Descriptive analyses This section provides detailed descriptive analyses of the demographic and admission characteristic features, as well as four major comorbidities, across the full dataset (Table 6), the learning set (Table 7), and the holdout set (Table 8). We present the mode of each categorical feature along with its proportion in the dataset, and the mean of each continuous feature along with its standard deviation. The *p*-values are computed using the Welch’s t-test [31] for continuous features (age, ambulance admission count, ED visit count, and weeks recently hospitalized) and the Pearson’s chi-squared test [32] for categorical and binary features, using the *scipy* [33] Python library. View this table: [Table 6:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T6) Table 6: Descriptive analysis of the demographics and admission characteristics features along with four major comorbidities on the full dataset. View this table: [Table 7:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T7) Table 7: Descriptive analysis of the demographics and admission characteristics features along with four major comorbidities on the learning set. View this table: [Table 8:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T8) Table 8: Descriptive analysis of the demographics and admission characteristics features along with four major comorbidities on the holdout set. ## Appendix C. Detailed performance of the ELSTM We present the performance of each LSTMk constituting the ELSTM on patients of the learning set in Fig. 7. ![Fig. 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F7.medium.gif) [Fig. 7.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F7) Fig. 7. Performance of each LSTMk trained with a cohort *C**k* on different groups of patients. (a) and (b) Performance on the last visits of patients. (c) and (d) Performance on the last and intermediate visits of patients. The rows represent the testing patients and the columns represent the training cohorts. The scores in the intersection of a row and a column correspond to the *mean* of the AUROC over the 5 folds of cross-validation of an LSTM trained with the corresponding cohort and tested on the corresponding patients Additionally, we provide the calibration curves of the final ELSTM, tested on 100 bootstraps of the holdout set, at each bootstrap in Fig. 8. ![Fig. 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F8.medium.gif) [Fig. 8.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F8) Fig. 8. Calibration curves of the ELSTM with AdmDemo and AdmDemoDx predictors for each of the 100 bootstraps on the holdout set ## Appendix D. Temporal validity We train the ELSTM with patients admitted between July 1, 2011 and June 30, 2017 and tested it on patients who are only admitted between July 1, 2017 and June 30, 2021, without excluding those ineligible for a GOC discussion. The testing set included 41,542 patients and 61,310 hospitalizations. The goal is to evaluate the ELSTM when using the same rules for excluding visits for training and testing, but with data from different time periods. Results in Table 9 reveal an acceptable temporal validity, with AUROCs comparable to those observed in the learning set (Table 2). View this table: [Table 9:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T9) Table 9: Temporal validity of the ELSTM without excluding patients not eligible for GOC discussions. (a) Performance on the last visits of patients. (b) Performance on the last and intermediate visits of patients. The scores correspond to the *mean* ± *standard deviation* of the AUROC over 100 bootstraps. ## Appendix E. Additional experimental results We trained the ELSTM model with the time gap between current and previous admissions included as an additional predictor to assess its impact on predicting patient mortality risk. The overall performance (Table 10) and feature importance (Fig. 9) remained consistent with the results obtained using the original set of predictors. View this table: [Table 10:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T10) Table 10: Performance of the ELSTM with AdmDemoDx predictors when including the time gap between current and previous admissions as a predictor. The scores correspond to the *mean* ± *standard deviation* of the metric over 100 bootstraps. ![Fig. 9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/07/26/2024.06.21.24309191/F9.medium.gif) [Fig. 9.](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/F9) Fig. 9. Post-hoc analyses of feature importance of the ELSTM trained with AdmDemoDx predictors when including the time gap between current and previous admissions as a predictor. Importance of each feature is computed using feature permutation [26] over 100 bootstraps. Shaded regions indicate variations within one standard deviation of the mean over 100 bootstraps We also evaluated the performance of the final ELSTM on various population subgroups within the holdout set. The model performed similarly across patients of different biological sexes (Table 11), but its accuracy varied among age subgroups, with the oldest patients being the least accurately predicted (Table 12). View this table: [Table 11:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T11) Table 11: Performance of the ELSTM on subpopulations of males and females. The scores correspond to the *mean ± standard deviation* of the AUROC over 100 bootstraps. View this table: [Table 12:](http://medrxiv.org/content/early/2024/07/26/2024.06.21.24309191/T12) Table 12: Performance of the ELSTM on subpopulations of different age groups. The scores correspond to the *mean ± standard deviation* of the AUROC over 100 bootstraps. ## Footnotes * This version of the manuscript includes a link to the synthetic dataset, which is publicly available on the Zenodo website. * 1 [https://pypi.org/project/skranger/](https://pypi.org/project/skranger/) * 2 [https://www.octopize.io](https://www.octopize.io) * 3 [https://pypi.org/project/skranger/](https://pypi.org/project/skranger/) * Received June 21, 2024. * Revision received July 26, 2024. * Accepted July 26, 2024. * © 2024, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/) ## References 1. [1].Yourman, L.C., Lee, S.J., Schonberg, M.A., Widera, E.W., Smith, A.K.: Prognostic indices for older adults: a systematic review. Journal of the American Medical Association 307(2), 182–192 (2012) doi:10.1001/jama.2011.1966 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2011.1966&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22235089&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000299038300030&link_type=ISI) 2. [2].Clarke, M., Kennedy, K., MacDonagh, R.: Development of a clinical prediction model to calculate patient life expectancy: the measure of actuarial life expectancy (male). Medical Decision Making 29(2), 239–246 (2009) doi:10.1177/0272989X08327114 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/0272989X08327114&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19047762&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) 3. [3].Kalra, S., Basourakos, S., Abouassi, A., Achim, M., Volk, R.J., Hoffman, K.E., Davis, J.W., Kim, J.: The implications of ageing and life expectancy in prostate cancer treatment. Nature Reviews Urology 13(5), 289–295 (2016) doi:10.1038/nrurol.2016.52 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/nrurol.2016.52&link_type=DOI) 4. [4].Seow, H., O’Leary, E., Perez, R., Tanuseputro, P.: Access to palliative care by disease trajectory: a population-based cohort of ontario decedents. BMJ open 8(4), 021147 (2018) doi:10.1136/bmjopen-2017-021147 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1136/bmjopen-2017-021147&link_type=DOI) 5. [5].Gomes, B., Calanzani, N., Gysels, M., Hall, S., Higginson, I.J.: Heterogeneity and changes in preferences for dying at home: a systematic review. BMC Palliative Care 12(1), 1–13 (2013) doi:10.1186/1472-684X-12-7 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1472-684X-12-7&link_type=DOI) 6. [6].Hsu, A.T., Garner, R.E.: Associations between the receipt of inpatient palliative care and acute care outcomes: a retro-spective study. Health Reports 31(10), 3–13 (2020) doi:10.25318/82-003-x202001000001-eng [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.25318/82-003-x202001200001-eng&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) 7. [7].Brinkman-Stoppelenburg, A., Rietjens, J.A., Heide, A.: The effects of advance care planning on end-of-life care: a systematic review. Palliative Medicine 28(8), 1000–1025 (2014) doi:10.1177/0269216314526272 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/0269216314526272&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24651708&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) 8. [8].Huber, M.T., Highland, J.D., Krishnamoorthi, V.R., Tang, J.W.-Y.: Utilizing the electronic health record to improve advance care planning: a systematic review. American Journal of Hospice and Palliative Medicine 35(3), 532–541 (2018) doi:10.1177/1049909117715217 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/1049909117715217&link_type=DOI) 9. [9].Taseen, R., Ethier, J.-F.: Expected clinical utility of automatable prediction models for improving palliative and end-of-life care outcomes: Toward routine decision analysis before implementation. Journal of the American Medical Informatics Association 28(11), 2366–2378 (2021) doi:10.1093/jamia/ocab140 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/jamia/ocab140&link_type=DOI) 10. [10].Heyland, D.K., Allan, D.E., Rocker, G., Dodek, P., Pichora, D., Gafni, A., End-of-Life Network (CARENET, C.R., et al: Discussing prognosis with patients and their families near the end of life: impact on satisfaction with end-of-life care. Open Medicine 3(2), 101 (2009) 11. [11].Yamaguchi, T., Maeda, I., Hatano, Y., Mori, M., Shima, Y., Tsuneto, S., Kizawa, Y., Morita, T., Yamaguchi, T., Aoyama, M., et al: Effects of end-of-life discussions on the mental health of bereaved family mem-bers and quality of patient death and care. Journal of Pain and Symptom Management 54(1), 17–26 (2017) doi:10.1016/j.jpainsymman.2017.03.008 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jpainsymman.2017.03.008&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) 12. [12].Lund, S., Richardson, A., May, C.: Barriers to advance care planning at the end of life: an explanatory systematic review of implementation studies. PloS one 10(2), 0116629 (2015) doi:10.1371/journal.pone.0116629 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0116629&link_type=DOI) 13. [13].Walraven, C.: The hospital-patient one-year mortality risk score accurately predicted long-term death risk in hospitalized patients. Journal of Clinical Epidemiology 67(9), 1025–1034 (2014) doi:10.1016/j.jclinepi.2014.05.003 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jclinepi.2014.05.003&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24973823&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) 14. [14].Walraven, C., McAlister, F.A., Bakal, J.A., Hawken, S., Donzé, J.: External validation of the hospital-patient one-year mortality risk (homr) model for predicting death within 1 year after hospital admission. Canadian Medical Association Journal 187(10), 725–733 (2015) doi:10.1503/cmaj.150209 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiY21haiI7czo1OiJyZXNpZCI7czoxMDoiMTg3LzEwLzcyNSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI0LzA3LzI2LzIwMjQuMDYuMjEuMjQzMDkxOTEuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 15. [15].Walraven, C., Forster, A.J.: The homr-now! model accurately predicts 1-year death risk for hospitalized patients on admission. The American Journal of Medicine 130(8), 991–999116 (2017) doi:10.1016/j.amjmed.2017.03.008 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.amjmed.2017.03.008&link_type=DOI) 16. [16].Wegier, P., Koo, E., Ansari, S., Kobewka, D., O’Connor, E., Wu, P., Steinberg, L., Bell, C., Walton, T., Walraven, C., et al: mhomr: a feasibility study of an automated system for identifying inpatients having an elevated risk of 1-year mortality. BMJ Quality & Safety 28(12), 971–979 (2019) doi:10.1136/bmjqs-2018-009285 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoicWhjIjtzOjU6InJlc2lkIjtzOjk6IjI4LzEyLzk3MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDI0LzA3LzI2LzIwMjQuMDYuMjEuMjQzMDkxOTEuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 17. [17].Guo, A., Foraker, R., White, P., Chivers, C., Courtright, K., Moore, N.: Using electronic health records and claims data to identify high-risk patients likely to benefit from palliative care. American Journal of Managed Care 27(1) (2021) doi:10.37765/ajmc.2021.88578 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.37765/ajmc.2021.88578&link_type=DOI) 18. [18].Beeksma, M., Verberne, S., Bosch, A., Das, E., Hendrickx, I., Groenewoud, S.: Predicting life expectancy with a long short-term memory recurrent neural network using electronic medical records. BMC Medical Informatics and Decision Making 19(1), 1–15 (2019) doi:10.1186/s12911-019-0775-2 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12911-019-0775-2&link_type=DOI) 19. [19].Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997) doi:10.1162/neco.1997.9.8.1735 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1162/neco.1997.9.8.1735&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9377276&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1997YA04500007&link_type=ISI) 20. [20].Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al: Scikit-learn: Machine learning in python. Journal of Machine Learning Research 12, 2825–2830 (2011) 21. [21].Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al: Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019) 22. [22].Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arxiv:1412.6980 (2014) doi:10.48550/arXiv.1412.6980 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.48550/arXiv.1412.6980&link_type=DOI) 23. [23].Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2623–2631 (2019). doi:10.1145/3292500.3330701 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1145/3292500.3330701&link_type=DOI) 24. [24].1. Kotz, S., 2. Johnson, N.L. Wilcoxon, F.: In: Kotz, S., Johnson, N.L. (eds.) Individual Comparisons by Ranking Methods, pp. 196–202. Springer, New York, NY (1992). doi:10.1007/978-1-4612-4380-916 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/978-1-4612-4380-916&link_type=DOI) 25. [25].Youden, W.J.: Index for rating diagnostic tests. Cancer 3(1), 32–35 (1950) [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15405679&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1950UD97200004&link_type=ISI) 26. [26].Fisher, A., Rudin, C., Dominici, F.: All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research 20(177), 1–81 (2019) 27. [27].Herman, R., Vanderheyden, M., Vavrik, B., Beles, M., Palus, T., Nelis, O., Goethals, M., Verstreken, S., Dierckx, R., Penicka, M., et al: Utilizing longitudinal data in assess-ing all-cause mortality in patients hospital-ized with heart failure. ESC Heart Failure 9(5), 3575–3584 (2022) doi:10.1002/ehf2.14011 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/ehf2.14011&link_type=DOI) 28. [28].Nitski, O., Azhie, A., Qazi-Arisar, F.A., Wang, X., Ma, S., Lilly, L., Watt, K.D., Levitsky, J., Asrani, S.K., Lee, D.S., et al: Long-term mortality risk stratifica-tion of liver transplant recipients: real-time application of deep learning algorithms on longitudinal data. Lancet Digital Health 3(5), 295–305 (2021) doi:10.1016/S2589-7500(21)00040-6 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S2589-7500(21)00040-6&link_type=DOI) 29. [29].Yang, F., Zhang, J., Chen, W., Lai, Y., Wang, Y., Zou, Q.: Deepmpm: a mortality risk prediction model using longitudinal ehr data. BMC Bioinformatics 23(1), 423 (2022) doi:10.1186/s12859-022-04975-6 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/s12859-022-04975-6&link_type=DOI) 30. [30].Guillaudeux, M., Rousseau, O., Petot, J., Bennis, Z., Dein, C.-A., Goronflot, T., Vince, N., Limou, S., Karakachoff, M., Wargny, M., et al: Patient-centric synthetic data generation, no reason to risk re-identification in biomedical data analysis. NPJ Digital Medicine 6(1), 37 (2023) doi:10.1038/s41746-023-00771-5 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41746-023-00771-5&link_type=DOI) 31. [31].Welch, B.L.: The generalization of ‘student’s’problem when several different population varlances are involved. Biometrika 34(1-2), 28–35 (1947) doi:10.1093/biomet/34.1-2.28 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/34.1-2.28&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20287819&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1947UZ95100003&link_type=ISI) 32. [32].Pearson, K.: X. on the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50(302), 157–175 (1900) doi:10.1080/14786440009463897 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1080/14786440009463897&link_type=DOI) 33. [33].Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K.J., Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C.J., Polat, İ., Feng, Y., Moore, E.W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E.A., Harris, C.R., Archibald, A.M., Ribeiro, A.H., Pedregosa, F., van Mulbregt, P., SciPy 1.0 Contributors: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17, 261–272 (2020) doi:10.1038/s41592-019-0686-2 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41592-019-0686-2&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32015543&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F26%2F2024.06.21.24309191.atom) [1]: /embed/graphic-3.gif [2]: /embed/inline-graphic-1.gif [3]: /embed/graphic-4.gif