Abstract
Background Full polysomnography, the gold standard of sleep measurement, is impractical for widespread use in the intensive care unit (ICU). Wrist-worn actigraphy and subjective sleep assessments do not measure sleep physiology adequately. Here, we explore the feasibility of estimating conventional sleep indices in the ICU with heart rate variability (HRV) and respiration signals using artificial intelligence methods.
Methods We used deep learning models to stage sleep with HRV (through electrocardiogram) and respiratory effort (through a wearable belt) signals in critically ill adult patients admitted to surgical and medical ICUs, and in covariate-matched sleep laboratory patients. We analyzed the agreement of the determined sleep stages between the HRV- and breathing-based models, computed sleep indices, and quantified breathing variables during sleep.
Results We studied 102 adult patients in the ICU across multiple days and nights, and 220 patients in a clinical sleep laboratory. We found that sleep stages predicted by HRV- and breathing-based models showed agreement in 60% of the ICU data and in 81% of the sleep laboratory data. In the ICU, deep NREM (N2 + N3) proportion of total sleep duration was reduced (ICU 39%, sleep laboratory 57%, p<0.01), REM proportion showed heavy-tailed distribution, and the number of wake transitions per hour of sleep (median = 3.6) was comparable to sleep laboratory patients with sleep-disordered breathing (median = 3.9). Sleep in the ICU was also fragmented, with 38% of sleep occurring during daytime hours. Finally, patients in the ICU showed faster and less variable breathing patterns compared to sleep laboratory patients.
Conclusions Cardiovascular and respiratory signals encode sleep state information, which can be utilized to measure sleep state in the ICU. Using these easily measurable variables can provide automated information about sleep in the ICU.
I. Introduction
Healthy sleep is a biological imperative(1, 2). Abnormal sleep impairs critical brain and body functions, including memory(3), learning, attention, and affective state(4–8), and regulation of blood pressure(9), inflammatory processes(10, 11), metabolic control(12–14), and stress responses(15–17). The intensive care unit (ICU) is associated with disrupted sleep, due to internal (e.g., pain, immunocompromisation, dyspnea and apnea) and external (e.g., noise, circadian mismatch) factors. Sleep can be so distorted in the ICU that conventional sleep stages become hard to recognize(18). Sleep disruption in the ICU contributes to delirium(19), difficult weaning from mechanical ventilation(20), and increased risk of autonomic, inflammatory and metabolic dysfunction(21).
Despite the urgency of improving sleep in the ICU, measuring sleep in this environment is challenging since conventional polysomnography is difficult to operationalize in the ICU setting(22). Subjective sleep estimates and movement analysis using actigraphy can provide crude assessments of sleep(22, 23), but are heavily confounded by common ICU experiences, including sedation, monitoring, illness, and immobility. No present method of measuring sleep in the ICU is satisfactory(24), thus alternative approaches are needed.
To address this need, our first aim for this study was to evaluate the validity of monitoring sleep physiology in ICU patients using easily obtainable biosignals, such as electrocardiogram (ECG) and respiration, using artificial intelligence methods. Although sleep states are commonly discerned through electroencephalogram (EEG) signals, they can also be decoded through analysis of non-EEG signals(25–27) since sleep modifies a variety of biosignals(28, 29), including blood pressure, heart rate, and respiration. Additionally, respiration and ECG measurements are easier to acquire, offer a more practical and repeatable diagnostic tool compared to EEG, and better reflect sleep physiology compared to actigraphy and subjective assessments. For this study, we used deep neural network models to determine sleep stages from heart rate variability (HRV), derived from ECG, and breathing signals, obtained with a single respiratory effort belt. We analyzed how well the HRV and breathing models agreed in determining sleep stages both in ICU patients and in age and sex-matched patients referred to a clinical sleep laboratory. We hypothesized that agreement between the two models signifies confidence in the determined sleep stage. To test this hypothesis, we used the sleep laboratory dataset to compare sleep staging performance between our model agreement method and the gold standard, which involves manual scoring of polysomnography EEG signals by experts. We also hypothesized that the HRV and breathing models would agree substantially in the ICU dataset, although we expected more disagreement compared to the sleep laboratory dataset given the extent of respiratory and cardiac physiology in critical illness. To characterize variables that could reduce the reliability of the presented HRV- and breathing-based sleep analysis in the ICU, we evaluated whether specific HRV- and respiratory features, medical conditions, severity of illness, and pharmacological drugs are associated with disagreement of these two sleep staging models.
The second aim of this study was to identify common patterns of sleep pathology in ICU patients compared to non-critically ill patients undergoing overnight diagnostic polysomnography recordings for suspected sleep disorders in the sleep laboratory. We computed common sleep statistics and respiratory variables such as respiratory rate and sleep breathing variability, for both cohorts, and tested the robustness of our results through sensitivity analysis.
Finally, sleep has been shown to be fragmented in the ICU(30), which prevents patients from getting adequate consolidated periods of rest. As such, the last aim of this study was to quantify sleep fragmentation in patients admitted to the ICU.
II. Methods
A. Study Oversight
Patients were enrolled after written consent in a randomized clinical trial, Investigation of Sleep in the Intensive Care Unit (NCT03355053, (31)), at the Massachusetts General Hospital (MGH) from June 2018 to November 2019. The clinical trial involved randomizing patients into three groups, where two of the groups received a low dose of dexmedetomidine (0.1 or 0.3 mcg/kg/h) overnight continuously for 11 hours, and the third group received placebo (normal saline). Exclusion criteria for the clinical trial include severe dementia, known pre-existing neurologic diseases or cognitive deficits, serious cardiac disease, severe liver dysfunction, severe renal dysfunction, and low likelihood of survival for 24 hours – all criteria can be found in the online supplement. The study was approved by the Mass General Brigham Institutional Review Board.
B. Dataset – ICU Cohort
The sample size for this study was determined by starting with enrolled patients and then excluding all patients under 45 years of age and patients who had less than 2 hours of ECG or respiratory data. Patients were non-mechanically ventilated at the time of enrollment, although some were subsequently mechanically ventilated during the course of hospitalization; see Table 1. At the start of the trial, a respiratory belt (Airgo, a CE Class IIa certified wearable medical device (32), Figure S1) was placed around a patient’s chest, as close to the floating ribs as possible, until they were transferred outside of the ICU. The belt contains a conductive silver band that measures respiratory effort by sensing changes in electrical resistance that correspond to changes in belt length induced by thoracic movements. The amplitude values of the belt were not calibrated. The sampling frequency was 10 Hz. In addition to demographic data, we collected information regarding labs, medications, vital signs, and ICD-10 codes from the hospital’s electronic medical records. Vital signs at higher time resolution (0.5 Hz) and electrocardiogram (ECG) (256 Hz) data were collected from the bedside telemetry monitors over the hospital network using BedMaster software (Excel Medical, Jupiter, FL). Signals collected through bedside monitors and the wearable respiratory device both contained real-time timestamps; correct alignment was manually reviewed for all patients. Charlson Comorbidity Index(33) and Sequential Organ Failure Assessment (SOFA)(34) scores were computed.
Sleep Laboratory Cohort: 404 patients who underwent overnight polysomnography (PSG) in the Massachusetts General Hospital sleep laboratory between January 2019 and January 2020 wore the same respiratory belt that was used in the ICU cohort. Participants were enrolled through verbal consent shortly before onset of the PSG. There were no exclusion criteria and enrollment stopped after reaching a target sample size of 404 patients. This study was also approved by the Mass General Brigham Institutional Review Board. To use sleep laboratory patients as control subjects for comparison with ICU patients, we applied the same exclusion criteria, i.e., excluding all patients under 45 years of age. We also balanced the distribution of age and sex between the ICU and sleep lab cohorts by applying k-nearest-neighbor matching, as previously described(35). We further stratified sleep lab patients according to their Apnea-Hypopnea-Index (AHI) severity into no-disordered breathing (AHI < 5) and disordered breathing (AHI > 15) groups and applied the same matching method for each of these subgroups. Seven trained sleep technicians annotated PSG studies as part of routine clinical care according to American Academy of Sleep Medicine guidelines (36).
C. Biosignals Preprocessing
Non-physiological data and data with low signal quality were removed from both the ECG and respiratory effort belt signal. For the respiratory signal, this was done with an algorithm checking for high and constant amplitude (belt not worn) and for absence of reliable breath detection (low signal quality). For each patient, we normalized the respiratory signal by subtracting the mean and dividing by the standard deviation calculated from the 1-99% quantile clipped signal. We used the open-source PhysioNet Cardiovascular Signal Toolbox(37) to filter the ECG signal, extract R peaks based on the Pan Tompkins algorithm(38), and obtain a signal quality measure. We provide parameter settings for the preprocessing steps in the online supplement.
D. Sleep Staging
We used deep neural network models that use heart rate variability (a binary sequence with 1 for a detected R-peak in the ECG, 0 else) and respiratory effort signals as inputs to assign a sleep stage (Wake, R, N1, N2 and N3) to every 30-second epoch. We previously validated this approach on datasets from the MGH sleep laboratory and the Sleep Heart Health Study(27). Here, we used a model trained with signals from the wearable respiratory belt as input. For visual representation of the resulting sleep stages for all patients, swimmer plots were created.
E. Breathing Features
From the respiratory belt’s signal, we computed four features using a moving window approach:
Respiratory rate (RR): number of breaths (inspiratory peaks) detected in 10 seconds (moving window) x 6.
Inter-breath-interval (IBI): time (seconds) between two consecutive breaths.
Ventilation coefficient of variation (ventilation CVar): we first computed a proxy of minute ventilation as the sum of positive amplitude changes (inspiration) over 10 seconds and scaled it to a reference of one minute (multiplied by 6), then computed the coefficient of variation over a 30-second window.
Variability index: we computed the coefficient of variation of the IBIs over a 30-second window, and defined the variability index as: variability index = (ventilation CVar + IBI CVar)/2, i.e., the mean of the coefficient of variation computed from the breathing timing (IBI) and breathing amplitude (ventilation).
F. Statistical Analysis – Sleep Staging
Definition - concordant and discordant sleep. For all analyses in this study, we only used data where both HRV and breathing data was simultaneously available. We applied both the HRV and breathing-based sleep staging models individually on all data. Each 30-second epoch in the data was assigned a sleep stage by the HRV- and breathing-based models, yielding two hypnograms. Because disagreement in sleep staging in human experts is common(39), and because stages between wake, N1, N2, and N3 form a continuum, we defined concordance regarding the stage assigned to a given 30-second epoch of sleep if the models agreed to within one stage, and discordance if they did not. Specifically, we defined an ordinal progression of sleep depth for NREM sleep: W < N1 < N2 < N3, such that e.g., if the HRV and breathing-based models assigned W and N1, respectively, this would be considered concordant; whereas assignments of W and N2 would be considered discordant. For REM sleep R, models were considered concordant only if both assigned a stage of R. For completeness, the full set of concordant and discordant stage assignments were:
Concordance: (W, W), (N1, N1), (N2, N2), (N3, N3), (R, R), (W, N1), (N1, N2), (N2, N3).
Discordance: (W, N2), (W, N3), (W, R), (N1, N3), (R, N1), (R, N2), (R, N3).
For the sleep lab polysomnography data, sleep staging agreements between the models and the experts were measured with confusion matrices and Cohen’s kappa(40) for both concordant and discordant data.
Sleep indices. We split each patient’s data into 24-hour segments (full day), starting and ending at 08:00, and further defined day as 08:00 – 20:00 and night as 20:00 – 08:00. For every 24-hour segment, we obtained the HRV-based and breathing-based hypnograms, and computed the following sleep indices:
Total sleep duration (in hours).
Concordant sleep duration (in hours).
Discordant sleep duration (in hours).
Proportion of discordant sleep from total sleep (in %).
Sleep fraction (%): sleep duration divided by amount of data available.
Stage R (%), Stage N1 (%), Stage N2 (%), Stage N3 (%), Stage N2+N3(%): Time spent in a specified sleep stage divided by sleep duration.
Sleep fragmentation index (SFI): Number of sleep stage transitions from (N2, N3, R) to (N1, W) divided by sleep duration.
Wake transitions / hour: Number of sleep stage transitions from (N1, N2, N3, R) to (W) divided by sleep duration.
We further computed mean sleep indices across the models: where x is any of the 8 sleep indices listed above.
To ensure robust conclusions regarding sleep indices, we performed sensitivity analysis, by computing sleep indices using three complementary approaches. The approaches vary by what segments are included for sleep index computation, and by which part of sleep (total sleep or concordant sleep) is used to compute the sleep indices:
A1. Inclusion criteria: Any amount of sleep. Sleep indices computed on total sleep.
A2. Inclusion criteria: At least two hours of concordant sleep. Sleep indices computed on total sleep.
A3. Inclusion criteria: At least two hours of concordant sleep. Sleep indices computed on concordant sleep.
For each approach, we carried out the following test procedure: For every subject we computed the mean sleep indices (mean xmean across all 24-hour segments). We then applied a Mann-Whitney U (MWU) test (H0: ‘equal distribution for both groups’) and Mood’s median test (H0: ‘equal medians for both groups’) to assess statistical difference between the ICU and sleep lab (overall and AHI subgroups) subjects. We considered sleep indices to be significantly different for two groups (Mann-Whitney U) with a significantly different effect direction (Mood’s median test) if both tests resulted in a p-value of less than 0.05. We chose to apply non-parametric tests instead of t-tests because we found that none of the data were normally distributed in either the ICU or sleep lab group, where we define non-normally distributed to mean that either the Shapiro-Wilk or D’Agostino’s K-squared test rejects null hypothesis of normality for alpha=0.05. We report test results for all applied statistical tests.
Sleep fragmentation in the ICU. We characterized sleep fragmentation in the ICU(30) by the following metrics using HRV model-based sleep stages:
Proportion of day (08:00 – 20:00) spent asleep.
Proportion of night (20:00 – 08:00) spent asleep.
Proportion of sleep occurring in the day versus in the night.
Proportion of REM sleep occurring in the day versus in the night.
Number of sleep periods, with a duration of at least one minute, per 24 hours.
Number of sleep periods, with a duration of at least five minutes, per 24 hours.
Subgroup Analysis. The ICU patients were manually grouped according to their primary and main conditions, and the median sleep indices (HRV-based model, analysis approach A2) were computed for each group. The Kruskal-Wallis H test was applied to assess differences of individual sleep indices between groups.
Latent feature representation of sleep. To assess similarities and differences in the latent feature representation between sleep epochs in the ICU and the sleep lab, we computed unsupervised UMAPs (Uniform Manifold Approximation and Projection)(41) based on the sleep staging neural networks’ last hidden layer activations. UMAPs were created separately for the HRV-based model and breathing-based model across pooled sleep lab and ICU data, see supplement for details.
Disagreement between HRV and breathing models - error analysis. We hypothesized that the following variables affect discordance in the HRV- and breathing-based models: 1) specific features from cardiac and respiratory signals; 2) daily dosing of opioids, benzodiazepines, and antipsychotics; 3) SOFA score, a measure of a patient’s illness severity. HRV and breathing features of interest were: RR interval, RR root mean square of successive differences (RMSSD), HRV very low frequency power (VLF), HRV low frequency power (LF), HRV high frequency power (HF), inter-breath-intervals, respiratory rate, respiratory variability index, ventilation CVar, cardiopulmonary coupling(42) (CPC) low frequency coupling (LFC), CPC high frequency coupling (HFC). We computed these features for concordant and discordant sleep parts for each 24-hour segment and performed a Mann–Whitney U test with a significance level of 0.01 for each feature pair. Next, we computed features for each 24-hour segment and performed multilinear regression analysis with a LASSO penalty, with the features as independent variables and the discordant proportion (log-transformed) as the dependent variable. To test if daily administered doses of opioids (in Fentanyl milligram(43)), benzodiazepines (in Midazolam milligram(44, 45)), antipsychotics (DDD method(46)) and the SOFA score were associated with discordance, we computed Pearson and Spearman correlations between each of these variables and the proportion of discordant sleep for each 24-hour segment.
G. Statistical Analysis – Breathing
For every 24-hour segment with at least two hours of concordant sleep, we computed the breathing features as described above from concordant sleep and for each sleep stage. For each patient we averaged the features across all nights. For ICU and sleep lab cohorts, we computed the mean and standard deviation of each feature and assessed statistically significant differences between cohorts with t-tests for Gaussian features or Mann-Whitney U tests and Mood’s median test (analogous to sleep stage analysis) for non-Gaussian features.
Results are reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines(47). We provide the de-identified data, the analysis code and computational models used in this study on our GitHub page(48).
III. Results
B. Dataset
For the ICU cohort, 129 patients were enrolled in the clinical trial; we excluded two patients under 45 years old, and 25 patients for having less than 2 hours of ECG or respiratory data available from our analysis, resulting in a cohort size of 102 patients (41 females, 61 males). In the sleep lab cohort, out of 404 enrolled patients, 97 were excluded due to age under 45 years, resulting in a sample size of 307 before matching. Matching resulted in 220 (98 females, 122 males) sleep lab patients, 77 (40 females, 37 males) with an AHI < 5 and 52 (18 females, 34 males) with an AHI > 15. Age distributions were similar for ICU and matched groups, both for male and female subjects; see Table S1 for details. Table 1 summarizes the baseline characteristics of the ICU and the matched sleep lab cohort.
C. Biosignals Preprocessing
After signal preprocessing, we obtained 6,728 hours (280 days) of ECG data, 3,886 hours (162 days) of breathing data, and 3,502 hours (146 days) of simultaneous ECG and breathing data for the ICU cohort. For the sleep lab cohort, the numbers were 1,634, 1,609, and 1,562 hours, respectively. Mean (SD) hours of data available per patient in the ICU (N=103) were ECG 66.0 (27.8), Breathing 38.1 (28.1), for simultaneous ECG and breathing signals 34.3 (24.6), and for the sleep lab (N=220) 7.4 (0.8), 7.3 (1.0), 7.1 (0.9) hours respectively.
F. Statistical Analysis, Sleep Staging
In the sleep laboratory data, 131,157 out of 173,977 30-second epochs (75.4%) were assigned concordant sleep stages by the HRV- and breathing-based models. The models showed higher staging agreements with experts for concordant data than for discordant data, both when agreement was evaluated with Cohen’s kappa with AASM standard stages (W, R, N1, N2, N3), as well as with combined NREM stages (W, R, NREM), see Figure 1.
The three analysis approaches resulted in the following total sleep times (TST), concordant sleep times (CST), and proportions sleep (S(%)) per 24-hour segment (numbers given as mean (SD)):
A1. Inclusion: any sleep; sleep indices computed on total sleep:
ICU: 102 subjects (274 24-hour segments), TST 6.2 (3.1) hours, S(%) 50.4 (19.7)
Sleep lab: 220 subjects (220 segments), TST 4.9 (1.6) hours, S(%) 73.9 (19.3)
A2. Inclusion: ≥2 hours of concordant sleep; sleep indices computed on total sleep:
ICU: 80 subjects (163 segments), TST 8.6 (3.0) hours, S(%) 56.4 (18.7)
Sleep lab: 190 subjects (190 segments), TST 5.3 (1.3) hours, S(%) 76.7 (16.4)
A3. Inclusion: ≥2 hours of concordant sleep; sleep indices computed on concordant sleep:
ICU: 80 subjects (163 segments), CST 4.2 (1.7) hours, S(%) 58.3 (26.9)
Sleep lab: 190 subjects (190 segments), CST 4.1 (1.4) hours, S(%) 76.8 (18.2)
Sleep staging results for each patient over time are shown in Figure 2 and Figure S2, sample hypnograms for both ICU and sleep lab patients are shown in Figure 3.
Sleep indices. For approach A2, the results of the sleep indices computation on a 24-hour level for the HRV and breathing models, together with the expert labels for the sleep lab data, are shown in Figure 4. Mean total sleep time per 24 hours in the ICU was determined to be 6.5 hours with the HRV-based model and 11.8 hours with the breathing-based model (+ 81.5%). In the sleep lab, total sleep time was determined to be 4.9 hours with HRV-based model and 5.7 hours with breathing model (+16.3%) and 5.6 hours by the human sleep expert.
Figure 5 depicts sleep indices (mean of breathing and HRV-based indices) distributions for all patients. In the ICU, there was a greater proportion of N1 compared to the AHI<5 sleep lab cohort (p MWU=0.05) and significantly less N2 compared to all sleep lab cohorts (p<0.001). The proportion of N2+N3 was reduced compared to total sleep lab (p<0.01) and sleep lab AHI<5 (p<0.001) cohorts but not compared to the sleep lab subgroup with AHI>15. While stage R distributions showed heavier tails in the ICU cohort (p MWU=0.05). Median wake transitions per hour of sleep were similar in the ICU and sleep lab AHI>15 groups (3.6 and 3.9), but lower in the sleep lab AHI<5 group (2.7, p<0.05). The ICU cohort showed a larger proportion of discordance between the breathing and HRV-based sleep staging models compared to the sleep lab cohort (41% and 19% respectively). In the ICU, a median of 25% of the day (08:00 – 20:00) and 41% of the night (20:00 – 08:00) were spent asleep. 38% of total sleep and 51% of R sleep occurred during daytime hours.
The following sleep indices results for the ICU were confirmed (same effect direction, all significant) by all analysis approaches A1-A3: Elevated discordant sleep time and proportion, reduced N2 proportion, and reduced N2+N3 proportion. The number of wake transitions per hour of sleep in the ICU was increased compared to the total sleep lab cohort (significant in 2 out of 3 analysis approaches), significantly increased compared to the sleep lab AHI<5 cohort (significant in all analysis approaches), and similar compared to the sleep lab AHI>15 cohort (not significantly different in any analysis approach). Results for the three analysis approaches are presented in Tables S2-S6; a summary is presented in Table S7.
Subgroup Analysis. The median and interquartile ranges of sleep indices for patients grouped by diagnosis are shown in Figure 6 (and Table S8 for numerical values including test results). Kruskal Wallis tests were not significant for any of the sleep indices (minimum p-value of 0.08 observed for N3 proportion). Patients with lowest N2+N3(%) observed had diagnoses of hemorrhage, shock, and encephalopathy, and patients with highest N1(%) had diagnoses of hemorrhage, sepsis and encephalopathy. Discordant sleep proportion was highest for patients with cirrhosis and liver transplant, pneumothorax, respiratory failure, and pneumonia, and lowest for patients with shock, hemorrhage, and acute kidney failure.
Latent feature representation of sleep. We generated 2-dimensional maps by using the deep neural network last hidden layer’s activation for each epoch as inputs for the UMAP (Figure 7). Coloring the data points with the predicted sleep stage shows clusters that correspond to those sleep stages (Figure 7A). For some sleep stages more than one cluster is apparent (HRV: R, N1, N3; Breathing: R, N3). For each sleep stage, estimated gaussian kernel densities of the UMAP-derived features show that the ICU features largely overlap with the sleep lab features, but are more widely dispersed (i.e., more variable) than in the sleep lab cohort (Figure 7B).
Disagreement HRV and breathing model - error analysis. Out of 44 computed biosignal-based features, 27 showed a significantly different distribution (Mann–Whitney U test, significance level 0.01) between concordant and discordant sleep, see Tables S9-S10 for details on the difference between HRV and breathing models. For N1, significantly reduced mean values in discordant sleep were observed for HRV-VLF, HRV-LF, HRV-HF, HRV-RMSSD, inter-breath-intervals, respiratory variability, ventilation CVar, CPC-LFC, CPC-HFC, and significantly increased for respiratory rate. For N2, NN-interval duration, inter-breath-intervals, respiratory variability, ventilation CVar were reduced, while HRV-VLF, HRV-LF, and respiratory rate were increased. For N3, NN-interval duration was decreased, while HRV-VLF, HRV-LF, HRV-HF, HRV-RMSSD, respiratory variability, ventilation CVar and CPC-LFC were increased. Lastly, for REM sleep, we observed increased HRV-RMSSD and CPC-HFC for discordant sleep compared to concordant equivalents.
In the multivariate linear regression analysis, the automated feature selection (see supplement for details) led to a model with 53 input variables (F-test statistic of 1.51, a F-test p-value of 0.036) and an r-squared value of 0.424. Hence, on a 24-hour segment level, the computed HRV and breathing features could explain up to 42% of the variance of discordant sleep proportion.
None of SOFA scores, amounts of opioids, benzodiazepines or antipsychotics administered correlated with the discordant sleep proportion (all p-values for Pearson and Spearman correlations >0.05, see Table S11).
G. Statistical Analysis – Breathing
Figure 8 shows the mean and standard deviation of breathing features during sleep for each patient. The mean and standard deviation of respiratory rate during sleep was significantly larger for the ICU (mean respiratory rates = 17.4 cycles per minute, respiratory rate standard deviation = 4 cycles per minute, p<0.01), compared to the sleep lab patients. The median inter-breath-interval during sleep was significantly lower in the ICU (3.8 seconds) compared to the sleep lab (4.7 seconds). Effect directions were the same (with significance p<0.001) for the sleep lab AHI<5 and sleep lab AHI>15 groups. The mean and standard deviations of the ventilation’s coefficient of variation (as a proxy for minute ventilation) was significantly smaller in the ICU (p<0.05). The mean respiratory variability index in the ICU was smaller compared to the AHI>15 group (p<0.001) and larger to the AHI<5 group (not significant). See Table S12 for a summary and Table S13 for details.
IV. Discussion
Our main contributions are the following key findings: 1) In the ICU, heart rate variability (through ECG) and respiratory information provide meaningful information about sleep stage, quality, and fragmentation – this is significant because HRV and respiratory signals are much more easily obtained in the ICU than conventional polysomnography; 2) Clear differences are evident between sleep indices of ICU patients and those of an age and sex-matched clinical sleep laboratory cohort, such as decreased N2 and N2+N3 proportions in the ICU; 3) Sleep indices of ICU patients, particularly awakening frequency, resemble those of sleep-disordered breathing (SDB) sleep laboratory patients more closely than those of non-SDB patients; 4) Sleep stage concordance between HRV- and breathing-based models was associated with higher agreement with experts compared to sleep stage discordance; 5) The proportion of sleep stage concordance was lower in the ICU (60%) compared to the sleep laboratory cohort (81%); 6) Sleep in the ICU is fragmented; 7) ICU patients showed higher respiratory rate compared to sleep laboratory patients, higher breathing variability compared to non-SDB sleep laboratory patients, and lower breathing variability compared to SDB sleep laboratory patients.
Sleep in the ICU – In this study we compared sleep of ICU patients with covariate-matched sleep laboratory patients. Sleep indices of ICU patients obtained from the HRV- and breathing-based deep neural network sleep staging models were robust across sensitivity analyses. Compared to the sleep laboratory cohort, we observed a shift of NREM sleep stage proportion from N2 and N3 towards more light and non-restorative N1. This is significant, as there has been growing evidence that deep NREM sleep has a crucial role in neurophysiological phenomena such as immunity, glucose metabolism, hormone release and memory(49). The median amount of REM sleep in ICU patients was lower compared to non-sleep disordered breathing and similar to sleep disordered breathing patients from the sleep laboratory. Further, sleep of ICU patients was highly fragmented: patients spent a median of 25% of the day and 41% of the night asleep. Moreover, 38% of total sleep and 51% of REM sleep occurred during the day.
The significant differences between the ICU and the sleep laboratory patients provide strong evidence that sleep is frequently deranged during critical care. Our results reinforce prior published findings (typically PSG studies with smaller sample size(22)), including abnormal hypnograms, high arousal index, abnormal sleep stage shifts, a reduction of deep NREM and varying amount of REM sleep (22, 30, 50–53). Furthermore, there is growing evidence that sleep in the ICU, especially for ventilated patients, is often abnormal and cannot be scored with traditional EEG-based scoring criteria (18, 54). Potentially, periods of discordance observed in our study coincide with EEG patterns that have been classified as ‘atypical sleep’ and ‘pathological wakefulness’ in previous studies examining sleep in the ICU(18, 54).
The presented results are important as there is evidence that sleep impairment in the ICU is associated with delirium(19). Delirium, a state of brain dysfunction with fluctuating awareness, disorganized thinking, and an altered level of consciousness(19), is associated with long-lasting cognitive impairment after hospital discharge and accelerated onset of dementia(55). While the mechanism of delirium is not entirely established, the absence or reduction of deep, restorative NREM sleep and REM sleep, and disturbances of circadian rhythms may be risk factors of developing delirium (19). Cardiovascular and pulmonary signals, as we have shown here, can help monitoring sleep more routinely in the ICU, which can guide measures to preserve sleep continuity, such as managing noise and light levels better or concentrating times of procedures and exams to allow more consolidated periods of rest.
Feasibility of staging sleep with heart rate variability and respiratory signals – The deep neural network activation patterns, visualized in 2D-space with embedding maps, of ICU patients were mostly similar to those of sleep lab patients, with additional activation patterns only present in the ICU. This means that most of the ICU input data were processed by the artificial networks in such a way that the learned data representations (i.e., high-dimensional features about sleep before making the final sleep stage decision) resembled representations resulting from input data that are like the training data (the networks were trained using sleep lab data). This result helps mitigate concerns about covariance shift. Because sleep appears to be fundamentally changed in the ICU, ICU activation patterns that are not present in sleep lab patterns are potentially of interest, and more research is needed to understand the significance and implications of these clusters, as they may affect reliability of models or indicate certain pathological states present in the ICU population and not in sleep laboratory populations. 27 out of 44 HRV, CPC, and breathing features significantly differed between concordant and discordant sleep epochs, indicating that these well-established and interpretable features might be used a-priori to assess the reliability in sleep stage assessment by HRV- and breathing-based sleep staging models. This result was confirmed with a multivariate linear regression model, using HRV and breathing features from 24-hour segments as independent variables, where 42% of the variance of discordance proportion was explained by these features. Surprisingly, we found no association between the discordance of the models and use of opioids, benzodiazepines and antipsychotics, or health status of the patient (SOFA score). Similarly, patients grouped by primary and secondary diagnosis, did not show statistically significant differences in model discordance. This is remarkable, as we hypothesized more severe medical conditions would lead to a higher discordance, but non-significance may be a result of low sample size per subgroup.
These results demonstrate that the presented method of measuring sleep in ICU patients, insofar as our cohort is representative, is feasible in most cases, although it does lead to more inconclusive sleep stage assessments (measured by discordance of the HRV- and breathing-based sleep staging models) than in the sleep laboratory.
Respiratory analysis – Mean respiratory rate during sleep was significantly higher in the ICU compared to all sleep lab cohorts. Variance of respiratory rate was significantly increased compared to total sleep lab cohort and AHI<5 subgroup but not compared to AHI>15 subgroup. Conversely, mean inter-breath-intervals were significantly lower in the ICU compared to all sleep lab cohorts. Interestingly, the variance of the inter-breath-intervals was also low in the ICU (comparable to sleep lab AHI<5, significantly lower than AHI>15 group). Breathing variability, which considers variability in timing and amplitude of breaths, was higher in the ICU group than in the AHI<5 sleep lab patients and lower than in AHI>15 sleep lab patients. This suggests that there is greater baseline instability of respiratory patterns in the ICU, likely reflecting acute illness, even when respiration is mildly abnormal. On the other hand, this level of instability is seen in healthier (non-ICU) patients with severe sleep apnea. Similarities of breathing statistics in ICU patients and in SDB sleep laboratory patients, and overall higher breathing variability in the ICU patients, underscore the high prevalence of sleep-disordered breathing in the ICU patients.
Limitations – In our observational study, we did not have EEG recordings available in the ICU. Therefore, we could not analyze relationships of sleep EEG and the HRV- and breathing-based sleep stage assessments. Joint analysis of cerebral cortex, cardiac, and respiratory activity would allow a more holistic analysis and understanding of sleep physiology in the ICU but are a practical challenge in this population. While the sample size of our study is relatively high compared with other sleep studies done in the ICU setting, 102 patients is still likely a relatively low number relative to the full spectrum of ICU patients and sleep physiology. As the target enrollment of the clinical trial is 750 patients, we hope to be able to follow up with more granular analysis of sleep in the ICU in the future. Patients in this cohort were admitted to surgical and medical ICUs, thus results in other types of ICU’s (e.g., neurological, cardiothoracic) might be different. Further, because we studied sleep in patients mostly not mechanically ventilated, the accuracy of the proposed sleep measurement method and reported sleep results during mechanical ventilation might differ. Finally, all patients were randomized into three groups as part of our clinical trial, where they were infused overnight for 11 hours with 0.1 or 0.3 mcg/kg/h dexmedetomidine (low dose) or placebo (normal saline). When this ongoing quadruple blinded trial is concluded, it will be possible to analyze the effects of low dose dexmedetomidine, if any, on our measures of interest.
Summary – Studying sleep in the ICU is possible and repeatable with readily accessible biological signals: ECG and respiration, with an accuracy tradeoff compared to full polysomnography. However, the patterns noted may themselves encode information about disease pathologies in the ICU setting. HRV/respiration analysis may provide important physiological information not captured by EEG analysis. We found partial explanations for discordance in sleep stages determined by the HRV- and breathing-based models. Sleep architecture and breathing during sleep in the ICU differed from an age and sex-matched sleep lab cohort, including subgroups with and without disordered breathing. As expected, signatures of deep NREM sleep were reduced in the ICU, and dispersion of sleep into the day was noted. The ability to monitor sleep or sleep-like states in the ICU without complex monitoring can enable better tracking of sleep-wake state boundaries and fragmentation, the impact of such fragmentation on delirium and other ICU outcomes, and even estimate the effects of sleep targeted therapies.
Data Availability
Data and models used in this research are available under: https://github.com/mghcdac/ecg_respiration_sleep_staging_icu/
https://github.com/mghcdac/ecg_respiration_sleep_staging_icu
Author contribution
WG: data management, developed code and models, data analysis, drafted and reviewed the manuscript.
PVK: data acquisition, clinical chart review, classification of patients into diagnoses groups, drafted and reviewed the manuscript.
SQ: data acquisition, data management, critical review and revision of the manuscript.
RAT: data management, data acquisition, critical review and revision of the manuscript
AB: data management, data acquisition, critical review and revision of the manuscript
MDSC: was part of the data acquisition, critical review and revision of the manuscript.
NA: was part of data acquisition, critical review and revision of the manuscript.
MJL: was part of the data management, critical review and revision of the manuscript.
EMY: was part of the data management, critical review and revision of the manuscript.
AH: was part of data acquisition, critical review and revision of the manuscript.
SR: was part of data acquisition, critical review and revision of the manuscript.
EP: was part of data acquisition, critical review and revision of the manuscript.
LP: was part of data acquisition, critical review and revision of the manuscript.
JH: was part of data acquisition, critical review and revision of the manuscript.
MA: was part of data acquisition, critical review and revision of the manuscript.
BC: was part of the data acquisition, critical review and revision of the manuscript.
TH: critical review and revision of the manuscript
HS: critical review and revision of the manuscript.
YS: data management, critical review and revision of the manuscript.
SC: critical review and revision of the manuscript.
BTT: critical review and revision of the manuscript.
OA: critical review and revision of the manuscript.
DK: critical review and revision of the manuscript
RJT: design of the work, supervision, critical review and revision of the manuscript
MBW: design of the work, supervision, critical review and revision of the manuscript.
Financial disclosure
Dr. Westover is a co-founder of Beacon Biosignals and reports grants from NIH, during the conduct of the study; Dr. Cash reports other COI from Neuralink, Paradromics, and Synchron, and Beacon Biosignals, outside the submitted work. Dr. Thomas reports personal fees from GLG Councils, Guidepoint Global, and Jazz Pharmaceutics, outside the submitted work. In addition, Dr. Thomas has a patent ECG-spectrogram with royalties paid by MyCardio, LLC, a patent Auto-CPAP with royalties paid by DeVilbiss-Drive, and an unlicensed patent CO2 device for central / complex sleep apnea issued. Dr. Thompson reports consulting for Bayer, Novartis, and Thetis, outside the submitted work. In addition, Mr. Kuller has a patent: Patent US10123724B2 “Breath volume monitoring system and method” issued.
Non-financial disclosure
Mr. Kuller reports non-financial support from Myair Inc., during the conduct of the study, and non-financial support from Myair Inc, outside the submitted work.
Sources of Support (Grants/Gifts/Equipment and/or Drugs):
M.B.X was supported by the Glenn Foundation for Medical Research and American Federation for Aging Research (Breakthroughs in Gerontology Grant); American Academy of Sleep Medicine (AASM Foundation Strategic Research Award); Football Players Health Study (FPHS) at Harvard University; Department of Defense through a subcontract from Moberg ICU Solutions, Inc; and NIH (1R01NS102190, 1R01NS102574, 1R01NS107291, 1RF1AG064312).
SUPPLEMENT
A. Dataset – Patients’ eligibility for the clinical trial was determined according to the following inclusion/exclusion criteria taken from the Investigation of Sleep in the Intensive Care Unit(1) ClinicalTrials.gov page:
Inclusion Criteria:
In order to be eligible to participate in this study, an individual must meet all of the following criteria:
Admitted to MGH Blake 7 or 12, or Ellison 4 ICU at Massachusetts General Hospital
Male or female, aged > 50 years.
Provision of signed and dated informed consent form (by patient or LAR).
Stated willingness to comply with all study procedures and availability for the duration of the study.
Not on mechanical ventilation at the time of enrollment.
Able to be enrolled before 7PM.
For females of reproductive potential: pregnancy test is negative.
Exclusion Criteria:
Any individual who meets any of the following criteria will be excluded from participation in this study:
Unable to be assessed for delirium (e.g. blindness or deafness).
Pregnancy or lactation.
Known allergic reactions to components of dexmedetomidine.
Follow-up would be difficult (e.g. active substance abuse, homelessness).
Severe dementia, as measured by a score of ≥3.3 on the Short Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE).
Known pre-existing neurologic disease or injury with focal neurologic or cognitive deficits.
Serious cardiac disease (e.g. sick sinus syndrome, sinus bradycardia).
Severe liver dysfunction (Child-Pugh Class C).
Severe renal dysfunction (receiving dialysis).
Low likelihood of survival >24 hours.
Low likelihood of staying in the ICU overnight
Patient is receiving either of the anticholinergic drugs scopolamine or penehyclidine.
Concomitant enrollment in another study protocol that may interfere with data acquisition or reliability of measurements.
Deemed unsuitable for selection by the research team or ICU providers due to any medical, legal, social, or interpersonal issues that would either compromise the study or the routine care of patients.
A. Biosignals Preprocessing
For ECG analysis, we mainly used the open source ‘PhysioNet Cardiovascular Signal Toolbox’ (2). The settings below correspond to the toolbox parameters we used for this study.
Window Size: 300 seconds windows; increment 30 seconds; minimum of 20% of data needs to be good quality for a window to be considered for analysis; maximum 15% of data is allowed to be missing in a window in order to be considered for analysis.
Signal Quality: low quality threshold = 0.9, comparison window length = 10 seconds; increment = 1 second; time threshold = 0.1 seconds; Margin time not to include in comparison = 2 seconds.
Preprocessing: Maximum believable gap in RR intervals = 2 seconds, Percent limit of change from one interval to the next = 100%; outlier method = ‘remove outlier points’, signal quality threshold for good data = 0.9; minimum length of good data = 30 seconds.
Frequency Domain Analysis Settings: ULF = [0 .0033]; VLF = [0.0033 .04]; LF = [.04 .15]; HF = [0.15 0.4]; Power Spectral Estimation Technique: Lomb Scargle.
Peak Detection Settings: Refectory period = 0.2 seconds; energy threshold = 0.15, window size for QRS detection = 15 seconds.
B. Sleep Staging
Latent feature representation of sleep
The last hidden layers’ dimensions (D_LHL) for the Long Short-Term Memory (LSTM) networks were (40, 1) for the HRV-based model and (200, 1) for the breathing-based model. We performed two UMAPs (Uniform Manifold Approximation and Projection) separately for the data resulting from the HRV-based and breathing-based models. The python package UMAP(3) was used to perform the UMAPs, and python packages matplotlib and seaborn were used for visualization.
Pseudo code:
Select either HRV-based or breathing-based model, and concatenate all last hidden layer activations for both ICU and sleeplab data, resulting in an array (‘LHL array’) with shape: (N_ALL, D_LHL) with N_ALL = 693,401, number of epochs from sleeplab data (324, 928) plus number of epochs from ICU data (368, 473), and D_LHL=40 for HRV- and D_LHL=200 for breathing-based model.
Perform unsupervised UMAP with input LHL array and select the first two dimensions of the resulting UMAP embedding, resulting in an array (N_ALL, 2). Parameters: number of neighbors=15, minimum distance=0.1, metric = “euclidean”.
For Figure 3A, plot UMAP embeddings for sleeplab data only, ICU data only, and sleeplab and ICU data combined (three figure panel columns). For each plot, color the data according to the assigned sleep stage.
For Figure 3B, for each panel column, select a sleep stage (W, R, N1, N2, N3) and estimate kernel densities for the UMAP embeddings of both the sleeplab and ICU data. Plot the 10% iso-proportion levels.
Disagreement HRV and breathing model - error analysis
In order to identify features in the cardiovascular and pulmonary system that are associated with discordant sleep, we compute breathing and heart rate variability related features and compare the values of those features for agreeing and discordant sleep. We use the four breathing features as described above: respiratory rate, inter-breath-intervals, variability index, and ventilation CVar. For HRV, we compute a root mean squared successive difference of NN intervals (RMSSD), very low frequency (VLF, 0.0033-0.04 Hz), low frequency (0.04-0.15 Hz), and high frequency (HF, 0.15-0.4 Hz) with a 5-minute sliding window, using Lomb-Scargle periodogram for spectral analysis. We further compute Cardiopulmonary Coupling (CPC)(4), a frequency domain analysis method that combines cross spectral power and coherence of the respiratory and HRV signals. Unlike in the original publication where respiratory signal is estimated from the ECG signal(4), we use the wearable respiratory signal. From the CPC spectrogram, we compute the amounts of low frequency coupling (LFC, 0.01-0.1 Hz) and high frequency coupling (HFC, 0.1-0.4 Hz) with an 8.5-minute sliding window.
Summary features computed:
Heart Rate Variability: NN interval, mean very low frequency power (VLF, 0.0033-0.04 Hz), low frequency power (0.04-0.15 Hz), high frequency power (HF, 0.15-0.4 Hz), RMSSD of NN intervals.
Breathing: inter-breath-interval, respiratory rate, variability index, ventilation coefficient of variation.
Cardiopulmonary Coupling: low frequency coupling (LFC, 0.01-0.1 Hz), high frequency coupling (HFC, 0.1-0.4 Hz).
We computed the mean feature values (e.g. mean NN interval) for concordant and discordant parts for every 24-hour segment (inclusion criteria: at least 2 hours of concordant sleep) and performed a Wilcoxon signed-rank test with an alpha value of 0.01 for each feature pair. Features were computed for each sleep stage. Note: Discordant sleep is partially asymmetric, an epoch that is classified as N2 by model A and as Wake by model B is included for model A analysis here but not for model B analysis. Further, if model A classifies an epoch as N1, and model B as N3, the epoch is treated as ‘N1-discordant’ for model A and ‘N3-discordant’ for model B.
For each 24-hour segment included (inclusion criteria: at least 2 hours of concordant sleep), we obtained the proportion of discordant sleep, which is a number in the [0, 1] interval. To make the variable unbounded and more gaussian, we used the following transformation: We computed the mean, standard deviation, 0.25-, 0.50-, and 0.75 quantiles for each feature for every 24-hour segment and for each sleep stage. This lead to 886 features, and as a consequence, we used the LASSO penalty for multilinear regression. We start with a penalty of 0 and with a step size of 0.1, we increase the penalty until 10. For each penalty value, we train a LASSO model, obtain the variables included in the model and with those variables train a vanilla multilinear regression model for which we obtain the F-test statistic, F-test p-value and the r-squared. The result is shown in Figure S4, indicating overfitted models close to a penalty of 0 and underfitted models close to a penalty of 10. A penalty of 4 results in a regression model that contains 53 variables, an F-statistic of 1.51, a F-test p-value of 0.036, and an r-squared of 0.42. Hence, we obtained evidence that up to 42% of the variance in the discordance sleep proportion over a full 24-hour segment can be explained by HRV and breathing based features that are computed over pooled discordant and concordant parts.
F. Statistical Analysis – Breathing
Footnotes
↵* Co-senior authors
List of abbreviations
- ECG
- electrocardiogram
- HRV
- heart rate variability
- ICU
- intensive care unit
- SOFA
- sequential organ failure assessment
- CCI
- Charlson Comorbidity Index
- TS
- total sleep
- CS
- concordant sleep
- DS
- discordant sleep
- S
- sleep percentage
- R
- stage R
- N1
- stage N1
- N2
- stageN2
- N3
- stage N3
- SFI
- sleep fragmentation index
- WT
- wake transitions per hour of sleep
- AHI
- apnea hypopnea index
- UMAP
- Uniform Manifold Approximation and Projection
- SDB
- sleep disordered breathing
- Std
- standard deviation
- IQR
- inter quartile range
- CVar
- Coefficient of Variation
- p_dag
- p-value D’Agostino’s K-squared test
- p_sha
- p-value Shapiro Wilk test
- p_tt
- p-value Student’s t-test
- s_tt
- t-statistic Student’s t-test
- p_mwu
- p-value Mann Whitney U test
- s_mwu
- Mann Whitney U statistic
- p_medians
- p-value Mood’s median test
- s_medians
- Mood’s Median test statistic