Summary
Background Sleep is essential to life. Accurate measurement and classification of sleep/wake and sleep stages is important in clinical studies for sleep disorder diagnoses and in the interpretation of data from consumer devices for monitoring physical and mental well-being. Existing non-polysomnography sleep classification techniques mainly rely on heuristic methods developed in relatively small cohorts. Thus, we aimed to establish the accuracy of wrist-worn accelerometers for sleep stage classification and subsequently describe the association between sleep duration and efficiency (proportion of total time asleep when in bed) with mortality outcomes.
Methods We developed and validated a self-supervised deep neural network for sleep stage classification using concurrent laboratory-based polysomnography and accelerometry data from three countries (Australia, the UK, and the USA). The model was validated within-cohort using subject-wise five-fold cross-validation, both for sleep-wake classification and in a three-class setting for sleep stage classification (wake, rapid-eye-movement sleep [REM], and non-rapid-eye-movement sleep [NREM]), and by external validation. We assessed the face validity of our model for population inference by applying the model to the UK Biobank with 100,000 participants, each of whom wore a wristband for up to seven days. The derived sleep parameters were used in a Cox regression model to study the association of sleep duration and sleep efficiency with all-cause mortality.
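As an illustration of what "subject-wise" cross-validation means, the sketch below assigns whole participants, rather than individual nights, to folds, so that no participant contributes data to both the training and the test side of a split. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `subject_wise_folds`, the seed, and the toy subject identifiers are all hypothetical.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Return n_folds lists of record indices, keeping each subject in one fold.

    subject_ids[i] is the (hypothetical) participant identifier of record i,
    for example one night of concurrent polysomnography and accelerometry.
    """
    # Shuffle the unique subjects reproducibly, then deal them out round-robin.
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    fold_of = {subject: i % n_folds for i, subject in enumerate(subjects)}

    # Every record follows its subject into that subject's fold.
    folds = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        folds[fold_of[subject]].append(index)
    return [folds[k] for k in range(n_folds)]


# Toy example: ten nights from six made-up participants.
nights = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5", "s6", "s6"]
folds = subject_wise_folds(nights)

for k, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(nights)) if i not in held_out]
    print(f"fold {k}: train={train_idx} test={test_idx}")
```

Splitting by participant rather than by night avoids the optimistic bias that arises when nights from the same person appear on both sides of a split.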
Findings After exclusions, 1,448 participant nights of data were used to train the sleep classifier. On external validation, the mean difference between polysomnography and the model classifications was 34.7 minutes (95% limits of agreement (LoA): −37.8 to 107.2 minutes) for total sleep duration, 2.6 minutes (95% LoA: −68.4 to 73.4 minutes) for REM duration and 32.1 minutes (95% LoA: −54.4 to 118.5 minutes) for NREM duration. The derived sleep architecture estimate in the UK Biobank sample showed good face validity. Among 66,214 UK Biobank participants, 1,642 mortality events were observed. Short sleepers (<6 hours) had a higher risk of mortality compared to participants with normal sleep duration (6 to 7.9 hours), regardless of whether they had low sleep efficiency (hazard ratio (HR): 1.69; 95% confidence interval (CI): 1.28 to 2.24) or high sleep efficiency (HR: 1.42; 95% CI: 1.14 to 1.77).
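The agreement figures above have the form of a Bland-Altman analysis: a mean difference with 95% limits of agreement at the mean plus or minus 1.96 standard deviations of the per-night differences. The sketch below shows that calculation under assumptions: the function name `bland_altman`, the sign convention (model minus polysomnography), and the four toy durations are hypothetical and are not study data.

```python
import statistics


def bland_altman(reference, estimate):
    """Return (mean difference, lower 95% LoA, upper 95% LoA).

    Differences are computed as estimate minus reference; this sign
    convention is an assumption made for the illustration.
    """
    diffs = [e - r for r, e in zip(reference, estimate)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Made-up total sleep durations in minutes for four nights.
psg_minutes = [400, 420, 380, 450]
model_minutes = [430, 440, 420, 470]

mean_diff, lower, upper = bland_altman(psg_minutes, model_minutes)
print(f"mean difference {mean_diff:.1f} min, 95% LoA {lower:.1f} to {upper:.1f} min")
```

Because the limits are symmetric about the mean difference, their midpoint recovers it; for the total sleep duration figures reported above, (−37.8 + 107.2) / 2 = 34.7 minutes.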
Interpretation Deep-learning-based sleep classification using accelerometers has fair to moderate agreement with polysomnography. Our findings suggest that having short overnight sleep confers mortality risk irrespective of sleep continuity.
Funding This research has been conducted using the UK Biobank Resource under Application Number 59070. The UK Biobank received ethical approval from the National Health Service National Research Ethics Service (Ref 21/NW/0157). We would like to acknowledge the Raine Study participants and their families for their ongoing participation in the study and the Raine Study team for study coordination and data collection. We also thank the NHMRC for their long-term contribution to funding the study over the last 30 years. The core management of the Raine Study is funded by The University of Western Australia, Curtin University, Telethon Kids Institute, Women and Infants Research Foundation, Edith Cowan University, Murdoch University, The University of Notre Dame Australia and the Raine Medical Research Foundation. The 22-year Gen2 Raine Study follow-up was funded by NHMRC project grants 1027449 & 1044840. The data collection for the Pennsylvania dataset was funded, in part, by US National Institute of Mental Health (NIMH) grant R21 MH103963 (MB).
HY, DB, and AD are supported by Novo Nordisk. RW and AD are supported by Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. AD is additionally supported by Swiss Re, Wellcome Trust [223100/Z/21/Z], and the British Heart Foundation Centre of Research Excellence (grant number RE/18/3/34214). DWR is supported by MRC programme grant MR/P023576/1 and the Wellcome Trust (107849/Z/15/Z). TP and AR are supported by the National Institute for Health Research (NIHR) Leicester Biomedical Research Centre and NIHR Applied Research Collaboration East Midlands (ARC EM). SDK is supported by the NIHR Oxford Health Biomedical Research Centre, Health Technology Assessment Programme, Efficacy and Mechanisms Evaluation Programme, Programme Grants for Applied Research, and the Wellcome Trust. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Computational aspects of this research were funded by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC) with additional support from Health Data Research (HDR) UK and the Wellcome Trust Core Award [grant number 203141/Z/16/Z].
For the purpose of open access, the author has applied a CC-BY public copyright licence to any author accepted manuscript version arising from this submission.
Evidence before this study Sleep plays a crucial role in our mental and physical health. Nonetheless, much of our understanding of sleep relies on self-reported sleep questionnaires, which are subject to recall bias. We searched Web of Science, Medline, and Google Scholar from database inception to June 23, 2023, using terms that included “wearable”, “actigraphy” or “accelerometer” in combination with “sleep stage” or “sleep classification”, and “polysomnography”. Existing studies have attempted to use machine learning to predict both sleep and sleep stages using accelerometry. However, prior methods were validated in small samples (n<100), leaving the validity of their predictions unclear. To date, no study has examined variations of accelerometer-derived sleep stage estimates in large population datasets with longitudinal disease outcomes.
Added value of this study We showed that our deep-learning-based method improves on the current state of the art in sleep staging for wrist-worn accelerometers. We quantified the model uncertainty in a large multicentre dataset with 1,448 nights of concurrent raw accelerometry and polysomnography recordings. We further demonstrated that our sleep staging method could capture population differences concerning age, season, and other sociodemographic characteristics using a large health database. Shorter overnight sleep duration was associated with an increased risk of all-cause mortality after seven years of follow-up in groups with both low and high sleep efficiencies.
Implications of all the available evidence This study helps clinicians to interpret sleep measurements from wearable sensors in routine care. Researchers can use derived sleep parameters in large-scale accelerometer datasets to advance our understanding of how sleep varies across population subgroups with different clinical characteristics. Our findings further suggest that short overnight sleep carries risk regardless of sleep quality, and that public attention is needed to counter the misconception that short sleep is acceptable as long as one sleeps well.
Competing Interest Statement
The authors have declared no competing interest.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This research has been conducted using the UK Biobank Resource under Application Number 59070. The UK Biobank received ethical approval from the National Health Service National Research Ethics Service (Ref 21/NW/0157).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes