Abstract
Respiratory failure (RF) is a frequent occurrence in critically ill patients and is associated with significant morbidity and mortality as well as resource use. To improve the monitoring and management of RF in intensive care unit (ICU) patients, we used machine learning to develop a monitoring system covering the entire management cycle of RF, from early detection and monitoring, to assessment of readiness for extubation and prediction of extubation failure risk. For patients in the ICU in the study cohort, the system predicts 80% of RF events at a precision of 45% with 65% identified 10h before the onset of an RF event. This significantly improves upon a standard clinical baseline based on the SpO2/FiO2 ratio. After a careful analysis of ICU differences, the RF alarm system was externally validated showing similar performance for patients in the external validation cohort. Our system also provides a risk score for extubation failure for patients who are clinically ready to extubate, and we illustrate how such a risk score could be used to extubate patients earlier in certain scenarios. Moreover, we demonstrate that our system, which closely monitors respiratory failure, ventilation need, and extubation readiness for individual patients can also be used for ICU-level ventilator resource planning. In particular, we predict ventilator use 8-16h into the future, corresponding to the next ICU shift, with a mean absolute error of 0.4 ventilators per 10 patients effective ICU capacity.
Introduction
Respiratory failure (RF) is common among patients in intensive care units (ICUs) and is associated with high morbidity and mortality1. RF severity is defined by the P/F ratio (PaO2/FiO2 ratio) with values below 200 mmHg corresponding to moderate and below 100 mmHg to severe RF. Treating patients with RF involves a sequence of clinical evaluations. This includes identifying RF and the need for mechanical ventilation, tracking lung function improvements, determining the right time to stop mechanical ventilation, and assessing the risk of complications after extubation.
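The severity grading above can be written as a short function. This is a minimal sketch of the thresholds stated in the text; the function name, the FiO2 range check, and the label for values at or above 200 mmHg are illustrative assumptions.

```python
def rf_severity(pao2_mmhg: float, fio2_fraction: float) -> str:
    """Grade respiratory failure from the P/F ratio (PaO2 / FiO2, in mmHg)."""
    # Assumption: FiO2 is given as a fraction (0.21 for room air up to 1.0).
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    pf_ratio = pao2_mmhg / fio2_fraction
    if pf_ratio < 100:
        return "severe"
    if pf_ratio < 200:
        return "moderate"
    # The text only grades values below 200 mmHg.
    return "not moderate or severe"
```

For example, a PaO2 of 60 mmHg at an FiO2 of 0.5 gives a P/F ratio of 120 mmHg, which falls in the moderate range.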
Optimizing clinical decision-making requires continuous monitoring of the patient state and prediction of the future clinical course. ICU physicians base their treatment decisions mostly on intermittent clinical assessments and evaluation of monitored vital signs stored in electronic patient-data management systems (PDMS). In the increasingly complex ICU environment, clinicians are confronted with large amounts of data from a multitude of monitoring systems for numerous patients. The quantity of data increases the risk that clinicians do not readily recognize, interpret, and act upon relevant information, contributing to poorer patient outcomes as well as increased ICU resource expenditure2. These large data quantities are ideal for automatic processing by machine learning (ML) algorithms3,4, which have been used to develop decision support systems for various conditions, such as acute respiratory distress syndrome (ARDS)5–9, circulatory failure10, sepsis11–13, and renal failure14.
We aim to develop a comprehensive, ML-based Respiratory Monitoring System (RMS) to simplify monitoring, expedite treatment of individual patients with RF, and optimize ICU resource planning. For individual patients, the system predicts the risk of RF and the need for mechanical ventilation, continuously monitors changes and improvements of the respiratory state, and predicts the probability of successful extubation. To facilitate total ICU resource management, we demonstrate how using respiratory state predictions from all individual patients admitted to the ICU enables estimating the future number of patients needing mechanical ventilation.
All models are developed on HiRID-II15, a new open-source dataset containing more than 55,000 admissions to a tertiary care ICU in Switzerland, which forms an integral part of this work. The models for respiratory and extubation failure are externally validated in the Amsterdam University Medical Center database16 (UMCdb).
We hypothesize that RMS can predict the relevant respiratory events throughout the treatment process of individual patients accurately and early; both in the development dataset and when validated in externally sourced data. In addition, we aim to show that ICU-level resource requirements for the respiratory treatment of patients can be accurately predicted by integrating the various RMS scores across patients in the ICU.
Results
Preparation of an extended HiRID dataset (HiRID-II)
We present the High time Resolution Intensive care unit Dataset II (HiRID-II), a substantial update to HiRID-I15, that we aim to make available to the research community on physionet.org17,18. This new dataset contains 60% more ICU admissions than its predecessor (Table 1, Extended Data Fig. 1a). Additionally, the number of meta-variables increased from 209 to 310 by merging equivalent clinical concepts and including additional respiratory variables (Extended Data Fig. 1b). The dataset was k-anonymized with respect to the variables age, weight, height, and gender, reducing the number of admissions from 60,503 to 55,858. To further reduce the risk of individual patient identification, admission dates were randomly shifted. To allow the assessment of model generalization to the future, the data set was divided into temporal splits while respecting k-anonymization (Extended Data Fig. 1c). To test generalization to other health systems, an external high-resolution evaluation data set was extracted from the Amsterdam UMCdb16 and harmonized with the HiRID-II dataset (Extended Data Fig. 1d). Preliminary analysis of the HiRID-II data set revealed strong correlations of the occurrence of RF and of extubation failure with ICU mortality, motivating our proposed respiratory monitoring system (Extended Data Fig. 2) and confirming prior results1.
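To illustrate why k-anonymization reduces the number of admissions, the following sketch suppresses every record whose combination of quasi-identifiers occurs fewer than k times. It is an assumption-laden illustration rather than the procedure used for HiRID-II: the value of k, any binning of age, weight, and height, and the function and field names are hypothetical.

```python
from collections import Counter

def suppress_rare_groups(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier combination occurs at least k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

# Hypothetical, already binned admissions (values are illustrative only).
admissions = [
    {"age": "60-65", "weight": "70-80", "height": "170-180", "gender": "F"},
    {"age": "60-65", "weight": "70-80", "height": "170-180", "gender": "F"},
    {"age": "20-25", "weight": "50-60", "height": "150-160", "gender": "M"},
]
kept = suppress_rare_groups(admissions, ["age", "weight", "height", "gender"], k=2)
```

In this toy input, the single admission with a unique combination is dropped and the two matching admissions are kept.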
Development of a continuous monitoring system for respiratory management
Continuous PaO2 estimation
The partial pressure of oxygen in arterial blood (PaO2) is one of the main determinants of arterial oxygen content, a parameter that we aim to estimate continuously. The ratio of PaO2 to the fraction of oxygen in the inspiratory gas (FiO2), known as the P/F ratio (PaO2/FiO2 in mmHg), is commonly used to determine the severity of RF19. To measure PaO2, an arterial blood sample is necessary. In contrast to PaO2, arterial oxygen saturation (SpO2) can be continuously monitored in ICU patients using pulse oximetry. The underlying physiological principles governing the binding and release of oxygen to and from hemoglobin create a correlation between SpO2 and PaO2 (refs. 20–22). This relationship allows for the use of SpO2 values to infer PaO2 levels accurately. We first developed an algorithm to continuously estimate PaO2 using SpO2 and other relevant variables determining the hemoglobin-oxygen dissociation curve. This enables us to obtain PaO2 estimates every five minutes. The algorithm outperforms the non-linear Severinghaus-Ellis baseline23 for estimating PaO2 values from non-invasive SpO2 measurements (Extended Data Fig. 3).
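For orientation, the Severinghaus equation and its closed-form inversion by Ellis can be sketched as follows. This shows the standard published form of the dissociation curve under standard conditions; whether the baseline evaluated here uses exactly this variant, or applies corrections for pH or temperature, is not stated in the text, and the function names are illustrative.

```python
import math

def severinghaus_so2(po2_mmhg: float) -> float:
    """Severinghaus equation: oxygen saturation (fraction) from PO2 in mmHg."""
    return 1.0 / (23400.0 / (po2_mmhg ** 3 + 150.0 * po2_mmhg) + 1.0)

def ellis_po2(so2_fraction: float) -> float:
    """Ellis inversion of the Severinghaus equation: PO2 in mmHg from saturation."""
    if not 0.0 < so2_fraction < 1.0:
        raise ValueError("saturation must lie strictly between 0 and 1")
    # Solves PO2^3 + 150 * PO2 = 2 * a for PO2 with Cardano's formula.
    a = 11700.0 / (1.0 / so2_fraction - 1.0)
    b = math.sqrt(50.0 ** 3 + a ** 2)
    return (b + a) ** (1.0 / 3.0) - (b - a) ** (1.0 / 3.0)
```

The curve flattens at high saturations, so small SpO2 measurement errors translate into large PaO2 errors there, which is one reason a learned estimator that uses additional variables can improve on this baseline.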
Patient State Annotation and Labeling
We aim to predict the risk of a patient developing RF within the next 24 hours throughout the ICU stay, with a risk score produced continuously every 5 minutes (Fig. 1a). For each time-point it was determined if a patient is currently in (moderate or severe) RF (P/F ratio < 200 mmHg), ventilated, or ready to be extubated. Readiness to extubate status at each time-point was defined using a clinical scoring system (REXT status score), and a score threshold was manually selected after inspection of the time series by an experienced ICU clinician (Fig. 1b). Current ventilation status was deduced from the presence of ventilator-specific requirements.
Positive labels for future RF are defined as time-points when the patient is not currently in failure, but RF occurs in the next 24 hours (“impending RF”), while a negative label is assigned if the patient remains stable in the next 24 hours. For every extubation event, we determine whether it failed (reintubation necessary within 48h after extubation) and use it as the label for extubation failure (EF). Labels for ventilation onset and readiness to extubate prediction are positive if the patient is currently not ventilated or not ready to extubate, respectively, but will be within the next 24 hours (Fig. 1b). In HiRID-II, 43.7% and 46.2% of all patients had RF events and required mechanical ventilation, respectively. Moreover, the dataset contains 23,861 extubations of which 11.1% failed. As the original dates were removed during anonymization for HiRID-II, we used an additionally provided dataset with the admission dates of the ICU patients in order to reconstruct the number of patients within the ICU and the ventilator resource use.
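The labeling rule for impending RF can be sketched on the five-minute grid as follows. This is a simplified illustration, not the released labeling pipeline: the function name is hypothetical, and labeling time-points near the end of the stay as negative when the remaining observed window contains no failure is an assumption.

```python
from typing import List, Optional

STEP_MINUTES = 5
HORIZON_STEPS = 24 * 60 // STEP_MINUTES  # 288 five-minute steps in 24 hours

def impending_rf_labels(
    in_failure: List[bool], horizon: int = HORIZON_STEPS
) -> List[Optional[int]]:
    """Label each time step: 1 = RF within the horizon, 0 = stable, None = in RF now."""
    labels: List[Optional[int]] = []
    for t, failing in enumerate(in_failure):
        if failing:
            labels.append(None)  # already in failure: no early-warning label
            continue
        # Look at the next `horizon` steps (shorter at the end of the stay).
        window = in_failure[t + 1 : t + 1 + horizon]
        labels.append(1 if any(window) else 0)
    return labels
```

With a one-step horizon, the sequence stable, stable, failure, stable yields the labels 0, 1, None, 0.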
Development of RMS Predictors
The developed RMS consists of four individual scores which are active at different stages of the RF management process. All four models are based on manual feature engineering and LightGBM24 predictors, similar to what was previously described in Hyland et al.10. Prior analyses on HiRID-I for circulatory failure and a related respiratory failure task have shown that this approach outperforms alternatives, including deep learning models10,25. The predictor for RF (RMS-RF) uses 15 clinical variables (Supplemental Table 3). As in Hyland et al.10, the system raises an alarm if the RF score rises above a certain threshold and is then silenced for 4 hours; the alarm system is reset once the patient has recovered from an event and can raise an alarm again 30 minutes after the recovery. The extubation failure (RMS-EF) predictor uses 20 clinical variables (Supplemental Table 3). The RMS-RF and RMS-EF variable sets were identified using greedy forward selection on the validation set of five data splits, separately for the two tasks. The models for the ventilator use (RMS-VENT) and extubation readiness (RMS-REXT) use the union of the variables of the two main tasks, yielding a total of 26 variables (Supplemental Table 3).
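The alarm policy described above can be sketched as a small state machine over the five-minute score series. This is one possible reading of the description, not the released implementation: the function and parameter names are hypothetical, and suppressing alarms while an event is ongoing and treating the first event-free step as the recovery time are assumptions.

```python
def raise_alarms(scores, in_event, threshold,
                 step_min=5, silence_min=240, reset_min=30):
    """Return the indices of time steps at which an alarm is raised."""
    alarms = []
    silenced_until = 0       # minute before which new alarms are suppressed
    prev_in_event = False
    for i, (score, event) in enumerate(zip(scores, in_event)):
        now = i * step_min
        if prev_in_event and not event:
            # Patient just recovered: drop any running silencing period and
            # allow alarms again once reset_min minutes have passed.
            silenced_until = now + reset_min
        prev_in_event = event
        if event:
            continue  # assumption: no early-warning alarms during an event
        if score > threshold and now >= silenced_until:
            alarms.append(i)
            silenced_until = now + silence_min  # silence for 4 hours
    return alarms
```

With a score that stays above the threshold for five hours and no event, this policy raises two alarms, one at the start and one four hours later, instead of one alarm per time step.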
We use the four risk scores to estimate mechanical ventilator resource requirements in the short-term future by training a meta-model (Fig. 1c). The resource planning problem is divided into two sub-problems: predicting the future ventilator use for already admitted ICU patients, and predicting near future ventilator requirement for newly admitted non-elective patients. We excluded elective patients as their resource use is typically known well in advance. The predictor uses date and time information as well as summary statistics regarding ventilator use and patient numbers from the ICU. A LightGBM24 regressor is used to solve both sub-problems. For admitted ICU patients, it predicts the necessity for mechanical ventilation in the short-term future, as well as the total number of ventilators required for all admitted patients as an aggregate of the individual predictions (Fig. 1d).
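The idea of aggregating individual predictions into an ICU-level estimate can be illustrated with a deliberately simple stand-in for the meta-model: summing per-patient probabilities to obtain an expected ventilator count. The LightGBM regressor described above is not reproduced here; the function name, the inputs, and the additive treatment of new admissions are assumptions made for illustration.

```python
def expected_ventilators(p_ventilated_admitted, expected_new_ventilated=0.0):
    """Expected ventilator count: sum of per-patient probabilities of needing
    ventilation at the horizon, plus a forecast for new non-elective admissions."""
    for p in p_ventilated_admitted:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if expected_new_ventilated < 0:
        raise ValueError("forecast for new admissions must be non-negative")
    return sum(p_ventilated_admitted) + expected_new_ventilated
```

For three admitted patients with probabilities 0.9, 0.1, and 0.5 and a forecast of 0.5 ventilated new admissions, the expected demand is 2.0 ventilators.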
Open Source Release
All elements of the developed system, including data preprocessing, annotation, prediction task labeling (Fig. 1e), and both training and prediction pipelines are made available under an open source license facilitating the reproducibility and reuse of the methodology and results.
RMS-RF predicts RF early with high precision and reduces false alarms compared to clinical baselines
The early prediction of RF is crucial for timely intervention, potentially reducing the severity of patient outcomes and improving overall healthcare efficiency. By accurately forecasting these events, RMS-RF may not only improve clinical decision-making but also allow physicians to commence treatment early, thereby mitigating the risk of more severe respiratory complications. We observe that the developed early alarm system RMS-RF significantly outperforms a decision tree that uses the current value of the four most relevant respiratory parameters (SpO2, FiO2, PaO2, and Positive End-Expiratory Pressure (PEEP)) as well as a clinical threshold-based system based on the SpO2/FiO2 ratio (Fig. 2a). It achieves an area under the alarm/event precision recall curve10 (AUPRC) of 0.559 with an alarm precision of 45% at an event recall of 80%. Its underlying risk score has an area under the receiver operating characteristic curve (AUROC) of 0.839 (Extended Data Fig. 4a) and is well calibrated, in contrast to the two baselines (Extended Data Fig. 4b). The system detects 65% and 78% of events at least 10 hours before they occur when set to an event recall of 80% and 90%, respectively (Fig. 2b). Compared to the SpO2/FiO2 threshold-based system, our system generates two-thirds fewer false alarms per day on days where the patient experiences no respiratory failure (Fig. 2c). We find performance increases with more data up to 25% of the total dataset size (Extended Data Fig. 4c). Performance in patients from the cardiovascular and respiratory diagnostic groups is higher than average (alarm precisions 55% and 60% at 80% event recall, respectively). Lower performance is observed in neurologic and trauma patients (Fig. 2d). Performance varies in groups determined by age and gender26 (Extended Data Fig. 4d/e). 
RMS-RF is inspectable to the clinician using SHapley Additive exPlanations (SHAP)27 values and exhibits physiologically plausible relationships of risk and clinical variables (Fig. 2e, Extended Data Fig. 5).
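The alarm precision and event recall reported in this section can be sketched as follows. This is a simplified reading of the event-based evaluation of Hyland et al.10: an alarm counts as true if an event begins within the prediction horizon after it, and an event counts as detected if at least one alarm precedes it within that horizon. The exact windows used in the paper may differ, and the function name is hypothetical.

```python
def alarm_precision_event_recall(alarm_times, event_onsets, horizon_min=24 * 60):
    """Compute (alarm precision, event recall); all times are in minutes."""
    def linked(alarm, onset):
        # The alarm precedes the event onset by at most horizon_min minutes.
        return 0 < onset - alarm <= horizon_min

    true_alarms = [a for a in alarm_times
                   if any(linked(a, e) for e in event_onsets)]
    detected = [e for e in event_onsets
                if any(linked(a, e) for a in alarm_times)]
    precision = len(true_alarms) / len(alarm_times) if alarm_times else float("nan")
    recall = len(detected) / len(event_onsets) if event_onsets else float("nan")
    return precision, recall
```

Sweeping the score threshold and recomputing both quantities traces out the alarm/event precision-recall curve whose area is reported as AUPRC.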
The proposed RMS-RF model only uses a small number of physiological parameters and ventilator settings. We excluded medication variables to reduce the effect of differences in medication policies in different hospitals. When externally validated in the Amsterdam UMCdb database16, a somewhat reduced performance is observed when the HiRID-II-based model is used, and no major performance gains are achieved by retraining using local data (Fig. 2f; 38% vs. 45% alarm precision at 80% event recall). A variant of RMS-RF including medication variables (RMS-RF-p) achieved only minor gains in internal HiRID performance (Fig. 2g) and exhibited poor transfer performance to UMCdb (Extended Data Fig. 6a). To understand these transfer issues, medication policy differences between HiRID-II and UMCdb were analyzed and could be attributed to the medications loop diuretics, heparin and propofol (Fig. 2g, Extended Data Fig. 6b/c).
RMS-EF predicts extubation failure with high precision and is well-calibrated
The accurate prediction of extubation failure is a critical aspect of patient management in intensive care, enabling clinicians to make informed decisions about the ideal timing of extubation. By utilizing RMS-EF to predict the risk of extubation failure, physicians could judiciously determine whether to proceed with or delay extubation based on a quantifiable risk threshold, potentially reducing the likelihood of complications associated with both premature extubation and unnecessary prolongation of mechanical ventilation. We compare the developed RMS-EF predictor to a threshold-based scoring system, which counts the number of violations of clinically established criteria for readiness to extubate at the time point when the prediction is made (REXT status score). RMS-EF significantly outperforms the baseline (Fig. 3a) with an AUPRC of 0.535 and an AUROC of 0.865 (Extended Data Fig. 7a). We also analyzed calibration and observed high concordance between observed risk of extubation failure and RMS-EF with a Brier score of 0.078, in contrast to the baseline (Fig. 3b). The precision for predicting EF is 80% at a recall of 20%, indicating that RMS-EF can confidently identify the highest risk patients. For 25% of correctly predicted successful extubations, RMS-EF would predict success at least 3h prior to the time point when extubation effectively takes place (Fig. 3c). As with RMS-RF, no major improvements are observed when using more than 25% of the training data (Extended Data Fig. 7b). Performance in sub-cohorts according to the diagnostic group is similar, with RMS-EF performing best in respiratory patients (Fig. 3d). We observe that the performance in female patients and older age groups is slightly inferior (Extended Data Fig. 7c/d). As RMS-EF is based almost exclusively on variables that are influenced by clinical policies which likely differ in different hospitals, it transfers poorly to the UMCdb database16 (Extended Data Fig. 7e).
However, a variant of RMS-EF can be constructed without medication variables, which transfers better to the UMCdb database with only slightly reduced internal performance (Fig. 3e; AUPRC 0.535 vs. 0.49 on HiRID-II). Accordingly, the analysis of medication policies revealed major differences for ready-to-extubate patients between HiRID-II and UMCdb (Extended Data Fig. 7f/g). SHAP value analysis28 shows that the RMS-EF risk score is dependent on several parameters determined by treatment policies, such as medications and ventilator settings (Fig. 3f, Extended Data Fig. 8). Severe loss of transfer performance resulted from the inclusion of sedatives and vasopressors in the model (Fig. 3g).
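The Brier score used above to summarize calibration is the mean squared difference between the predicted risk and the binary outcome. A minimal sketch, with a hypothetical function name:

```python
def brier_score(predicted_risk, outcome):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted_risk) != len(outcome) or not predicted_risk:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risk, outcome)]
    return sum(squared_errors) / len(squared_errors)
```

As a point of reference, a model that always predicts the event rate r has a Brier score of r(1 - r). If the evaluation set shared the overall 11.1% extubation failure rate, which is an assumption, this uninformative reference would score about 0.099, compared with 0.078 for RMS-EF.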
Integrating RMS scores of individual patients for ICU-level resource planning
Using the predictions for the four models focusing on respiratory failure (RMS-RF), extubation failure (RMS-EF), ventilation onset (RMS-VENT), and readiness to extubate (RMS-REXT), we develop a combined model predicting the number of ventilators in use at a specific future horizon. Preliminary analysis of the HiRID-II dataset shows substantial variation in demand for ventilators each day, underscoring the need for a model to aid resource planning (Fig. 4a). In a first step, we evaluated ventilation onset (RMS-VENT) and readiness to extubate (RMS-REXT) prediction 24h prior to the event on a patient-level. We observe a high discriminative performance with AUROCs of 0.914 and 0.809 (Extended Data Fig. 9a/b), event-based AUPRCs of 0.528 and 0.910 (Extended Data Fig. 9c/d), respectively, and the models are well calibrated (Extended Data Fig. 9e/f).
We then train a meta-model using the four scores to predict ventilator usage in the ICU at future time horizons every hour (4-8h, 4-12h, 8-12h, 8-16h, 16-24h; Fig. 4b). We compare it with a baseline that predicts that the future ventilator resource remains unchanged. We observe that the proposed model clearly outperforms this baseline in terms of mean absolute error (MAE), with the largest relative gain in longer prediction horizons (Fig. 4b). We observe that in 39% of time-points the model’s predictions are at least two ventilators closer to ground-truth, for predicting ventilator use in 8-16 hours into the future (Fig. 4c/d). RMS outperforms the baseline for the vast majority of ICU ventilator utilization scenarios (Fig. 4e) with the largest improvement over the baseline when the respirator use is below the maximum capacity (Fig. 4e) and for predictions of ventilator use during day hours (Fig. 4f).
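The persistence baseline and the error metric can be sketched as follows. The normalization by effective ICU capacity is simplified here to a single constant, which is an assumption, as are the function names and the hourly toy series.

```python
def persistence_forecast(ventilators_in_use, horizon_steps):
    """Baseline: the count observed at time t is the forecast for t + horizon_steps."""
    if horizon_steps < 1 or horizon_steps >= len(ventilators_in_use):
        raise ValueError("horizon_steps must be in [1, len(series) - 1]")
    return ventilators_in_use[:-horizon_steps]

def mae_per_10_patients(predicted, actual, capacity):
    """Mean absolute error in ventilators, scaled to 10 patients of ICU capacity."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return 10.0 * sum(errors) / len(errors) / capacity

# Hypothetical hourly ventilator counts in an ICU with capacity for 20 patients.
series = [10, 12, 11, 13, 12, 14]
horizon = 2
baseline = persistence_forecast(series, horizon)
targets = series[horizon:]
baseline_mae = mae_per_10_patients(baseline, targets, capacity=20)
```

In this toy series, the baseline is off by one ventilator at every time step, giving an error of 0.5 ventilators per 10 patients of capacity.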
Explorative joint analysis of RMS scores throughout the ICU stay
We analyzed the relationship of the four RMS scores produced at each time point of the ICU stay by embedding the most important parameters for respiratory failure and extubation failure prediction (union of the top 10 variables identified for each task, current value feature) using t-distributed Stochastic Neighbour Embedding (t-SNE29) with subsequent discretization into hexes. This approach produces a two-dimensional hex-map that defines subsets of comparable patient states that can be compared across different characteristics, i.e., between the panels for the hex. We observe that the space is divided into two distinct states, corresponding to time-points when the patient is ventilated or not ventilated (Fig. 5a). The region of ventilated patients is further subdivided, with patients in the upper part being more likely to be ready-to-extubate (Fig. 5b). As expected, the ventilated and not ready-to-extubate region has the highest observed 24h mortality (Fig. 5c). Patients currently experiencing respiratory failure are concentrated in a compact region in the non-ventilated space, as well as scattered throughout the ventilated space (Fig. 5d). States with high risk of future ventilation need according to RMS-VENT are close to the boundary of the ventilated region (Fig. 5e). Readiness to extubate scores show a less clear pattern, but scores tend to be higher in the upper part of the ventilated region, which is also enriched in states in which patients are ready-to-extubate (Fig. 5f). For RMS-EF, high scores are concentrated in two distinct regions at the edge of the ventilated region (Fig. 5g). Lastly, RMS-RF scores are high close to the boundary of patients already in respiratory failure (Fig. 5h). The median risk scores of hexes for respiratory failure/ventilation need are strongly positively correlated with an R2 of 0.471 (Fig. 5i). Likewise, respiratory failure and extubation failure scores are moderately positively correlated (Fig. 5j). 
For RMS-EF/RMS-REXT scores, no correlation could be observed (Extended Data Fig. 10). For three exemplary hexes with predominantly (1) non-ventilated patients but high RMS-RF score, (2) ready-to-extubate patients but high RMS-EF score, and (3) not-ready-to-extubate patients but high RMS-RF score, the distribution of clinical parameters was analyzed, showing plausible relationships with clinical parameters (Fig. 5k).
Discussion
We present an ML-based system for the comprehensive monitoring of the respiratory state of ICU patients. The respiratory monitoring system (RMS) consists of four highly accurate scoring models that predict the occurrence of respiratory failure, start of mechanical ventilation, readiness to extubate, and extubation failure. By combining the prediction scores of all admitted patients at any time point and by accounting for the likelihood of future admissions, RMS facilitates the accurate prediction of the near future cumulative number of patients requiring mechanical ventilation to help optimize resource allocation at the ICU level.
In conjunction with our study, we aim to release the extensive HiRID-II dataset, a rich resource for broad-scale analyses of ICU patient data. This dataset represents a significant advancement to HiRID-I, both in terms of number of included patients and clinical parameters. Our initial analysis of the HiRID-II dataset identified significant links: both the presence and duration of respiratory failure, as well as extubation failure, are associated with increased ICU mortality, highlighting distinct yet interconnected risk factors. These insights highlight the critical need for advanced alarm systems in clinical settings to reduce the risks associated with respiratory and extubation failure. The future availability of the HiRID-II dataset to the research community on Physionet17,18 will open up numerous possibilities for further research, allowing for more in-depth investigations into various aspects of ICU patient care and outcomes.
RMS-RF predicts respiratory failure throughout the ICU stay, and alarms for impending failure are typically triggered at least 10 hours before the event. This early warning is sufficient to enable adjustments in the patient’s medical management well in advance of the potential respiratory failure. It outperforms a baseline representing standard clinical decision-making based on SpO2 and FiO2, and reduces the number of false alarms by a factor of 3 at 80% event recall (Fig. 2c). RMS produces RF-specific alarms and silences them within a specified period of time after the model triggers an alarm, reducing alarm fatigue, which is a major issue for ventilator alarms30. Prior to respiratory failure, only 1.5 alarms per patient/day are raised, which is manageable for the clinical personnel, and unlikely to cause alarm fatigue. Reassuringly, only variables directly associated with respiratory physiology or ventilator settings were found consistently predictive of impending respiratory failure. RMS-RF demonstrates its highest precision in individuals admitted with cardiovascular or respiratory admission diagnoses, while its performance notably declines in neurologic patients. In these patients ventilatory management is often determined by the need to protect a compromised airway in patients with altered levels of consciousness and not by the presence of RF per se. A similar pattern was previously observed for circulatory failure10 and suggests that patients in the neurologic category deserve additional attention and may need to be excluded in a clinical implementation of an early alarm system based on RMS-RF. To date, few externally validated ML models to continuously predict acute respiratory failure in the ICU have been reported. Recent works by Le et al.8, Zeiberg et al.31, and Singhal et al.32 focus on mild respiratory failure (P/F index < 300 mmHg). Other models predict respiratory failure at the time of ICU admission or are only valid for specific cohorts33–35.
RMS-EF predicts extubation failure and significantly outperforms a clinical baseline based on common clinical criteria for readiness to extubate status. The model is well calibrated, with almost ideal concordance of the prediction score and observed risk of extubation failure. A potential use case would be to assess the predicted EF risk when considering extubation for patients that are ready to extubate in order to decide whether to accelerate or delay the extubation of the patient. For instance, if the risk is very low, one may speed up extubation of patients that are ready to extubate. At about 80% recall, a quarter of correctly predicted extubation successes are recommended more than 3h before the actual extubation. This suggests that our model could help clinicians to extubate patients earlier. However, in our analysis we could not ascertain whether a patient was not extubated for another reason not apparent from the data, such as staff availability. For clinical use the model could also be operated at 20% recall with very high precision (80%), to identify patients with a high likelihood that extubation will be unsuccessful. This could guide attention towards critical patients, and may caution clinicians against prematurely extubating patients. For the prediction of extubation failure, various models have been proposed36–41. The largest cohorts to date were used in the works by Zhao et al.41, who only validated the model in a cardiac ICU cohort, which limits the generalizability of the results, and Chen et al.42, who restricted the evaluation to ROC-based metrics only, which makes clinical interpretation difficult.
Machine learning (ML) has previously been used to develop support systems for the management of RF patients in the ICU. These include models for recognition of acute respiratory distress syndrome (ARDS)5–9 and COVID-19, pneumonitis patients32,43, prediction of readiness-to-extubate44–46, need for mechanical ventilation47,48, and detection of patient-ventilator asynchrony49. Existing work focuses on single aspects of RF management, often in specific patient cohorts only. Our approach aims to comprehensively monitor the respiratory state throughout the RF treatment process, by integrating relevant respiratory-system related tasks and allowing for joint analysis of risk scores and trajectories. We believe a single and universally applicable system is much more likely to be successfully implemented than multiple fragmented models pertaining to specific disease entities. A further distinguishing feature of RMS is the five-minute time resolution at which predictions are made, enabling longitudinal analysis of risk trajectories. This dynamic prediction paradigm is more flexible than traditional severity scores, which are evaluated at fixed time-points, such as at 24 h after ICU admission50, mainly to predict ICU mortality51.
For successful external validation of RMS-RF, it was key to exclude medication variables from the model, as their inclusion was detrimental to model transferability. We hypothesize that this difficulty is caused by the observed medication policy differences between the centers. Interestingly, ventilator settings, while also policy-dependent, do not appear to compromise transfer performance in the same way. Investigating and quantifying the underlying policy differences that make transfer difficult requires additional research. Model transferability is an emergent topic in robust machine learning for ICU settings, and recent works study it for sepsis13,52,53 or mortality prediction54. Our results suggest that medication variables require special attention to enable transfer. In contrast to RMS-RF, we suggest that RMS-EF be re-trained and fine-tuned using the data from the center where it is to be applied. As extubation failure predictions are necessarily tied to policy, the policy differences between different centers proved more relevant than in the case of RMS-RF.
While clinical prediction models for individual patients have been extensively studied, resource planning in the ICU has received little attention in the ML literature, but came into renewed focus due to the COVID-19 crisis55. During the COVID-19 pandemic, the first ML-based models to predict ICU occupation were proposed, such as by Lorenzen et al.55, who predict daily ventilator use up to 15 days into the future, as well as more generally hospitalization, using patient-specific features56. The proposed RMS clearly outperforms a baseline method for predicting future ventilator use at the ICU level. With a mean absolute error of 0.39 ventilators per 10 ICU patient beds used during the next shift (8 to 16 hours), it is sufficiently precise for practical purposes. Since resource allocation in the ICU depends on local policies and procedures, such a system likely needs to be retrained for every clinical facility for reliable predictions.
In this study, we developed comprehensive predictors of key aspects of respiratory failure management, including RF, EF, ventilation need, and readiness to extubate. These predictors collectively describe various aspects of a patient’s respiratory health state in the ICU which can be used for exploratory analysis. The joint analysis and visualization of risk scores alongside other vital clinical variables yield discernible clusters that correspond to specific patient states, indicating the potential for risk stratification within the patient population. We observed a separation of patient states into two main clusters that align with ventilated and non-ventilated states, with substructures within these clusters, in particular the patients that are not ready to be extubated among the ventilated patients (Fig. 5b). A subset of patients with low predicted readiness to extubate within the next 24h have the highest mortality risk (Fig. 5f,c). These patients often have a low GCS, are more likely to require controlled ventilation modes, have higher ventilator peak pressure and require higher PEEP (Fig. 5k); all indicators of more severe underlying lung pathology. We can also identify a distinct cluster of patients that are clinically ready to be extubated, have a lower risk of RF but have a very high EF risk (Fig. 5g). These patients require relatively higher ventilation pressure support and have a low respiratory rate (Fig. 5k); again established risk factors for extubation failure. Among the patients that are not currently ventilated, we find a wide range of risks for RF. Those patients with the highest RF risks have low PaO2, high (supplemental) FiO2 and high respiratory rates (Fig. 5k). However, mortality risks are relatively low (Fig. 5c).
The hex-map visualization provides a snapshot of the ICU population at any given time and allows for the monitoring of patient states over time with updates, akin to those seen in methodologies like T-DPSOM57,58. This dynamic tracking is based on the automated integration of multiple respiratory state dimensions and uses nonlinear dimensionality reduction to provide the position of an individual patient on the map of respiratory health states. We hypothesize that this visualization could assist clinicians in identifying shifts in patient states, although the practical implications of this feature require further validation. This approach differs from previous works, which mainly try to understand biological phenotypes of ARDS patients59–62 or longitudinal sub-phenotypes of a more specific patient set, such as COVID-19 patients63,64. Overall, while the visualization offers an alternative modality for monitoring the respiratory state in the ICU and can serve as a tool for more detailed exploration, the presented analysis is primarily exploratory. Further research is needed to substantiate the clinical relevance of the identified clusters and to explore how this system might integrate into the decision-making processes within the ICU.
Our study, while robust, has certain limitations. Unlike typical single-center studies, our research utilized data from two distinct centers, one for development and another for validation. This approach reduces the risk of overfitting models to a local patient cohort, although it is important to note that external applicability may still vary and retraining on local data will still be needed for parts of the proposed RMS. In our machine learning models, we have incorporated improvements based on our previous work. Unlike earlier systems heavily reliant on sporadic clinical measurements like lactate levels10, our current model leverages continuous SpO2 monitoring and ventilator data. This reduces the influence of clinician-driven decisions on our alarm systems, ensuring a more objective assessment of the patient’s condition. However, a limitation remains in the retrospective nature of our data collection. Missing data was partially imputed for respiratory failure annotation, and while this aids in model development, it introduces potential biases. Additionally, our study could not evaluate the impact of system implementation in actual clinical practice, which might alter treatment or monitoring strategies (domain shift)65. Lastly, our assessment of the extubation failure (EF) risk score was limited to scenarios of actual extubation events. While we hypothesize that the accuracy of this score would be similar in patients nearing readiness for extubation, this cannot be definitively concluded from our retrospective data. Future prospective studies are needed to fully understand the implications of our model in a live clinical setting.
Overall, we have proposed a comprehensive monitoring system for the entire respiratory failure management cycle, including resource planning at the ICU level. We hypothesize that our system can facilitate early reassessment of deteriorating patients, enable rapid treatment and improve their outcomes. However, this has to be validated in prospective randomized controlled trials that assess the impact of using RMS-RF/RMS-EF on patient outcomes. Using gradient-boosted decision trees to construct RMS allows individual predictions to be inspected using SHAP values, offering valuable insights to clinicians and ultimately increasing trust in the predictions66. Resource planning at the ICU level, which has become an important topic not only in the context of the COVID-19 pandemic55, is facilitated by a meta-model built on top of RMS. Testing such an approach for resource planning and contrasting it with current clinical practice also lies within the scope of future clinical studies.
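To illustrate what a SHAP-style explanation of a single prediction looks like, the following minimal sketch computes exact Shapley values by brute-force enumeration for a toy, hand-written risk rule. It is not the RMS model and not the TreeSHAP algorithm used for gradient-boosted trees: the function `toy_risk`, the feature names, the thresholds and the single reference patient used as a baseline are all hypothetical assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def toy_risk(x):
    """Hypothetical tree-like risk rule (an assumption, not the RMS model)."""
    score = 0.1
    if x["spo2"] < 92:
        score += 0.3
        if x["fio2"] > 0.5:  # interaction: only matters when SpO2 is low
            score += 0.2
    if x["resp_rate"] > 25:
        score += 0.1
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature subsets.

    Features outside a subset are replaced by their baseline value,
    a simplification of the background distribution used by SHAP.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [k for k in names if k != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical patient and reference values, for illustration only
patient = {"spo2": 89, "fio2": 0.6, "resp_rate": 28}
baseline = {"spo2": 97, "fio2": 0.21, "resp_rate": 16}
phi = shapley_values(toy_risk, patient, baseline)

for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property that makes such values readable as per-variable contributions to an individual alarm.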
Data Availability
Data used in this study were obtained from University Hospital Bern, and we aim to make them available on physionet.org in anonymized form, similar to the HiRID-I dataset associated with a previous study (Hyland et al., Nature Medicine, 2020).
Author contributions
M.H., X.L., M.F., G.R., T.M.M. with input from S.L.H., M.Ho., A.P., H.Y., M.B. designed the experiments; M.F., T.M.M. selected and provided the clinical data and context; A.P. with input from M.F., G.R., T.M.M., X.L. k-anonymized the data set. X.L., M.F., A.P. with contributions from T.M.M., G.R. preprocessed and cleaned the HiRID-II data; X.L., M.F. harmonized the UMCdb data set with HiRID-II. M.H., M.F. with input from G.R., T.M.M., X.L., S.L.H. defined and developed the respiratory state annotations and labels; M.H., M.F. developed the continuous estimation algorithm for PaO2; M.H., M.F. developed and extracted ML features; M.H. developed the pipeline for supervised learning including variable selection; M.H. with input from X.L., G.R., M.F., M.H. performed the fairness analysis in sub-cohorts. A.P., with input from M.H., M.F., T.M.M., G.R., performed analyses of treatment policy differences. X.L. with input from G.R., M.H., M.F., A.P. conceived and developed the model for resource planning; M.H. with input from G.R., M.F., X.L., T.M.M. implemented the joint analysis of RMS scores; T.M.M., G.R., M.F. conceived and directed the project; M.H., M.F., X.L., G.R., T.M.M., A.P., M.Ho. with input from H.Y., M.B. wrote the manuscript. X.L. with input from all authors created Fig. 1. All authors read the manuscript and provided critical feedback.
Extended data figures
Supplemental Materials
Supplemental Table 1. Details on the clinical parameters extracted in the HiRID-II dataset (downloadable XLSX file).
Supplemental Table 2. Details on the imputation parameters, such as normal value, and imputation models, for the clinical parameters (downloadable XLSX file).
Supplemental Table 3. List of important variables used for computing complex features, as a basis for variable selection, and for building the final models RMS-RF/RMS-EF/RMS-VENT/RMS-REXT (downloadable XLSX file).
Supplemental Table 4. List of severity levels for computing ‘instability history’ features, for a subset of the important variables (downloadable XLSX file).
Supplemental Table 5. Model training parameters and grid used for selection of hyperparameters for the LightGBM library (downloadable XLSX file).
Acknowledgments
This project was supported by the Grant No. 205321_176005 of the Swiss National Science Foundation (to T.M.M. and G.R.), and grant #2022-278 of the Strategic Focus Area “Personalized Health and Related Technologies (PHRT)” of the ETH Domain (Swiss Federal Institutes of Technology), and ETH core funding (to G.R.). We acknowledge discussions with and organizational, administrative or technical help by David Berger, Carmen Pfortmüller, Jörg Schefold, Daniel Vonder Mühll, Olga Mineeva, Quinten Johnson, Dinara Veshchezerova, David Meyer, Anastasia Escher, Nora Toussaint, Margarita Kuznetsova, Fedor Sergeev, Marc Zimmermann, Catherine Jutzeler, Karsten Borgwardt, Thomas Gumbsch, Bowen Fan, Jörg Goldhahn, Sonia Strangio, Ivo Schauwecker, Martina Baumann, Sergio Maffioletti, Bernd Rinn, Anna Wiegand, Diana Coman Schmidt, Matthew Levin, Robert Freeman, Thomas Fuchs, Emanuela Keller, Michael Krauthammer, Paul Elbers, and Patrick Thoral. Computational analyses were performed at the LeonhardMed Trusted Research Environment at ETH Zurich (https://sis.id.ethz.ch/services/sensitiveresearchdata/). The work by S.L.H. was done while she was working at ETH Zurich.
Footnotes
+ These authors jointly supervised this work: Tobias M. Merz, Gunnar Rätsch; e-mail: tobiasm{at}adhb.govt.nz, gunnar.raetsch{at}inf.ethz.ch.