Detecting sleep in free-living conditions without sleep-diaries: a device-agnostic, wearable heart rate sensing approach
========================================================================================================================

* Ignacio Perez-Pozuelo
* Marius Posa
* Dimitris Spathis
* Kate Westgate
* Nicholas Wareham
* Cecilia Mascolo
* Søren Brage
* Joao Palotti

## Abstract

**Study Objectives** The rise of multisensor wearable devices offers a unique opportunity for the objective inference of sleep outside laboratories, enabling longitudinal monitoring in large populations. To enhance objectivity and facilitate cross-cohort comparisons, sleep detection algorithms in free-living conditions should rely on personalized but device-agnostic features, which can be applied without laborious human annotations or sleep diaries. We developed and validated a heart rate-based algorithm that captures inter- and intra-individual sleep differences, does not require human input and can be applied in free-living conditions.

**Methods** The algorithm was evaluated across four study cohorts using different research- and consumer-grade devices for over 2,000 nights. Recording periods included both 24-hour free-living and conventional lab-based night-only data. Our method was systematically optimized and validated against polysomnography (PSG) and sleep diaries and compared to sleep periods produced by accelerometry-based angular change algorithms.

**Results** We evaluated our approach in four cohorts comprising two free-living studies with detailed sleep diaries and two PSG studies. In the free-living studies, the algorithm yielded a mean squared error (MSE) of 0.06 to 0.07 and a total sleep time deviation of −0.60 to −14.08 minutes. In the laboratory studies, the MSE ranged between 0.06 and 0.10 yielding a time deviation between −23.23 and −33.15 minutes.

**Conclusions** Our results suggest that our heart rate-based algorithm can reliably and objectively infer sleep under longitudinal, free-living conditions, independent of the wearable device used. This represents the first open-source algorithm to leverage heart rate data for inferring sleep without requiring sleep diaries or annotations.

**Statement of significance** Sleep studies in free-living conditions are becoming easier to scale due to advances in sensor technology and affordability of consumer-grade wearables. Adoption has been driven by interest in long-term health monitoring, with incentives for person-generated health data insights. Historically, sleep classification algorithms have been trained on manually annotated records from selected cohorts with monophasic, night-time sleep data. Cultural preferences and shift work mean standard approaches might fail to reveal unbiased insights over 24-hour free-living conditions. We leverage the heart rate signal recorded by most state-of-the-art wearables and evaluated our device- and annotation-agnostic labeling approach across four separate study populations against gold-standard PSG or sleep diaries. Results show that heart rate-derived labels preserve the accuracy demanded in clinical sleep studies without the need for effort-intensive human annotations. This has potential applications in longitudinal studies and personalized medicine.

Key words
*   wearable
*   sleep tracking
*   ambulatory sleep monitoring
*   polysomnography
*   sleep diary
*   validation
*   mathematical modeling of sleep
*   sleep epidemiology

## 1 Introduction

Human sleep is a reversible physiological state that is homeostatically regulated and vital for health and performance (1). The functions of sleep are not fully understood but its influence on energy restoration, brain function, cognitive performance and behaviour, alongside interactions with the immune system, promotion of healing and consequences for numerous health conditions have been studied extensively (2; 3; 4; 5; 6; 7; 8; 9; 10). As a consequence of its personal and public health significance, objective monitoring of sleep is paramount such that we can further understand its role in human health and behaviour. The gold-standard method to monitor sleep and diagnose most sleep disorders is PSG. PSG involves the collection and conveyance of different signals from many sensors operating simultaneously. Traditional PSG is limited to laboratory settings and requires an overnight stay for one or two nights, expensive and obstructive equipment and trained laboratory technicians. These factors limit its scalability and prevent its use in objective sleep monitoring in large-scale population studies, as well as for long-term surveillance. Furthermore, the unfamiliar environment in which PSG monitoring takes place may result in atypical sleep that does not reflect the study participants’ or patients’ typical pattern (11).

Actigraphy is a well-established and widely used method to objectively detect sleep non-obtrusively and longitudinally. This method, as well as its modern counterpart, accelerometry, are often integrated into wrist-worn wearable devices, offering a scalable and affordable alternative to PSG (12; 13). Actigraphy has its precedent in early telemetric measurements of motor activity in the 1970s which were used to assess sleep quality (14). Since then, a vast number of studies have assessed the use of actigraphy for sleep monitoring against PSG (12; 15; 16).

Over the past 30 years, a number of actigraphy-based algorithms have been derived for nocturnal sleep-wake scoring, showing strong validity and reliability against PSG (17; 18; 19; 20; 21; 22). These algorithms have been readily used ever since and were recently benchmarked against both each other and newer machine learning and deep learning methods, highlighting the strengths and limitations of each method (23; 24). Across multiple studies that evaluated the performance of actigraphy against PSG, it was shown that actigraphy struggled to classify wake events during the sleep period, yielding poor specificity (23; 13; 21; 25; 26). Similarly, these actigraphy-based algorithms were only optimised for nocturnal sleep-wake scoring and thus face a major challenge when applied to 24-hour recordings. In order to work, they require additional information from expert sleep scorers or reliable sleep diaries (27; 28). This requirement is also observed in most proprietary commercial brand algorithms, which require the user to report habitual sleep times when they first set up a profile. Further, these algorithms do not allow the assessment of daytime sleep, severely limiting their relevance in cultures where multiple sleep episodes are common or amongst shift workers.

Wearable devices for both research and consumer applications have increasingly adopted multimodal sensing capabilities, combining movement and cardiac sensing, usually through actigraphy or accelerometry and photoplethysmography (PPG), respectively. These devices exploit recent advances in microelectromechanical systems (MEMS) and the associated improvements in cost, battery capacity, and memory. Thus, they are attractive not only for personal health monitoring, with large technological companies investing heavily in the space, but also in large epidemiological studies, as exemplified by the “All of Us” research program (29). Given the widespread adoption of consumer-grade wearable devices and their potential in large scale cohorts, the need for validation against gold-standard sleep measures has become imperative (30). Indeed, recent studies have shown that consumer and research-grade multimodal devices can be used to predict not only sleep-wake but also sleep stages during the night period (31; 32). Furthermore, these multimodal approaches have been shown to improve the performance of models that only employ movement data in large populations, likely due to their ability to measure changes in autonomic nervous system activity reflected in heart rate (HR) and heart rate variability (HRV) (33; 24).

Whilst these approaches are valuable, they have limited applicability in other large, free-living cohort studies for three main reasons. First, they rely on machine learning and deep learning methods which were derived specifically for those datasets, hence necessitating domain adaptation to be appropriately used in a different population or device. Second, in common with all previous well-established and widely used algorithms, they are only derived for the night period, limiting their ability to infer sleep in less regular sleepers, including shift workers (34). Finally, these approaches rely on self-reported habitual sleep data, either through questionnaires (e.g., how much sleep you get on a typical weekday), or sleep diaries, which typically record times of getting into and out of bed. Both self-report measures are prone to recall bias, with survey data not providing sufficiently reliable labels (35; 36). Of note, when using sleep diaries, it can often take in excess of 6 recorded days to achieve an agreement with objective labels, even amongst those with more regular sleep patterns (37). At present, the typical recording time for large-scale studies is about one week, contingent on device battery life. Thus, annotation and device/cohort-independent algorithms have the potential to infer sleep from objective 24-hour sensor data in a hitherto unprecedented manner.

In this work, we leverage heart rate data, which can be obtained from most commercial and research-grade wearable devices, to develop a universal set of statistical attributes and an algorithm. We use these to infer both sleep periods and awakenings. In contrast to machine learning methods that require data to be sent to the cloud and large computational resources, our method does not require this process or training before deployment, making it an attractive candidate to run directly on devices. This also limits the privacy issues associated with the transfer of personally generated data, which are of paramount concern due to the nature of these data. The approach is also independent of device type and make and was evaluated in four separate settings. First, the approach was developed in a large population (n = 193) with multiple nights of recording accompanied by detailed sleep diaries. This cohort wore a combined heart rate and movement sensor, in addition to a variety of accelerometers (both wrist and hip), facilitating the comparison of the performance of our method against both the diaries and angular postural changes in multiple anatomical locations. We chose to develop the algorithm in this population because we were able to evaluate ≈ 8 nights of sleep per participant, enabling testing of both inter- and intra-individual variability. We then assessed the performance of our method in a large, diverse, open-source dataset with PSG data (n = 1,743). Moreover, to showcase the performance of our method in a readily available commercial device, we validated the approach in a cohort that wore an Apple Watch and underwent concurrent PSG (n = 31). Finally, the performance of our method in free-living conditions was further validated in a separate population that wore a triaxial accelerometer and heart rate sensor, against detailed, non-habitual sleep diaries (n = 22).

## 2 Methods

### 2.1 Datasets

Here we used four datasets, spanning a variety of devices and populations, to showcase the performance of our proposed method.

#### 2.1.1 The UK Biobank Validation Study (BBVS)

Participants of the BBVS study were recruited from the Fenland study (38). In brief, 193 participants were recruited between the ages of 40 and 70, with a BMI between 20 and 50 kg·m⁻². Recruitment aimed to balance age, sex, and BMI distributions. Participants were invited to attend an assessment centre on two separate occasions, separated by a free-living period of 9 to 14 days during which they wore three waveform triaxial accelerometers (dominant and non-dominant wrists and thigh) as well as a combined movement and heart rate sensor. During the free-living period, participants were asked to keep a detailed log of their sleep, by recording the time they fell asleep and woke up on a daily basis. Ethical approval for the study was obtained from Cambridge University Human Biology Research Ethics Committee (Ref: HBREC/2015.16). All participants provided written informed consent. Full details of the BBVS study are described elsewhere (39).

#### 2.1.2 Multi-Ethnic Study of Atherosclerosis (MESA)

The Multi-Ethnic Study of Atherosclerosis (MESA) is a multi-site prospective study that includes 6,814 men and women who identify as White, Black/African American, Hispanic, or Chinese, and are between 45 and 84 years of age (40; 41). Participants in this study were followed prospectively to evaluate risk factors for cardiovascular disease. A total of 2,237 MESA participants were enrolled in a sleep exam (MESA Sleep Ancillary Study (42)), which included seven days of wrist-worn actigraphy, one full overnight unattended polysomnography (with wrist-worn actigraphy collected concurrently), and a sleep questionnaire. MESA participants who reported regular nighttime use of nocturnal oxygen or positive airway pressure devices were excluded from participation.

All data from the MESA Sleep Ancillary Study used in this work is publicly available from the National Sleep Research Resource repository (footnote 1). Institutional Review Board approval was obtained at each MESA study site (Wake Forest University School of Medicine, Northwestern University, University of Minnesota, Columbia University, University of California Los Angeles and the Johns Hopkins University). All participants provided written informed consent.

A number of common sleep disorders were identified and logged for the MESA sleep study, representing numbers that are close to their real prevalence in similar populations. A breakdown of those diseases is presented in Table S1.

#### 2.1.3 PhysioNet Apple Watch Polysomnography Study

Data for this study was collected at the University of Michigan between 2017 and 2019. The study consisted of 39 healthy subjects with no prior diagnosis of sleep-related breathing disorders, parasomnias, restless leg syndrome, central disorders of hypersomnolence, peripheral vascular disease, cardiovascular disease, vision impairments not correctable by glasses or contact lenses or other disorders that could cause neurological or psychiatric impairment. The study also excluded participants on the basis of shift work and recent transmeridian travel. Furthermore, participants were ruled out on the basis of excessive daytime sleepiness according to the Epworth Sleepiness Scale, and after the PSG visit, participants who showed symptoms of either obstructive sleep apnoea or REM sleep behaviours were also excluded. A total of 31 subjects met the required criteria. Data for the study can be obtained through Physionet (43) and a detailed description of this data set is available elsewhere (31).

Participants in this study wore an Apple Watch to collect their activity patterns for 7 to 14 days before spending one night in a sleep lab. During the final night, participants underwent a PSG study while wearing the Apple Watch device (which collected HR and triaxial acceleration). The study was approved by the University of Michigan Review Board and all participants provided written informed consent.

#### 2.1.4 The Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH)

Data for the MMASH study was collected by BioBeats in collaboration with researchers from the University of Pisa and was obtained through Physionet (43; 44). The study collected data from 22 healthy young adult male participants, comprising continuous heart rate and triaxial accelerometry monitoring, a variety of questionnaires to assess their physical activity, psychological and sleep characteristics, and a detailed sleep diary. Participants also recorded their perceived mood (Positive and Negative Affect Schedule, PANAS) and completed the Daily Stress Inventory (DSI) during the free-living protocol, and completed a Morningness-Eveningness Questionnaire (MEQ), State-Trait Anxiety Inventory (STAI-Y), Pittsburgh Sleep Quality Questionnaire Index (PSQI) and Behavioural avoidance/inhibition (BIS/BAS) during their clinic visit. Further, anthropometric characteristics were recorded. All data was processed and recorded by sport and health scientists with the objective of assessing psychophysiological response to stress stimuli and sleep.

All participants provided written informed consent. Information was provided to them regarding the research protocol in accordance with General Data Protection Regulation: Regulation − EU 2016/679 of the European Parliament and of the Council 27/04/2016. Further, all experiments conducted were in accordance with the Helsinki Declaration as revised in 2013, and the study was approved by the Ethical Committee of the University of Pisa (#0077455/2018).

Table 1 summarizes the types of wearable devices and ground truth used in each one of the studies.

[Table 1:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T1)

Table 1: Summary of population size and devices used in the different datasets.

### 2.2 Data processing

#### 2.2.1 BBVS

Participants were fitted with a combined heart rate and movement sensor (Actiheart, CamNtech, Cambridgeshire, UK), measuring heart rate and uniaxial acceleration of the trunk every 15 seconds (45). In addition, participants were fitted with three waterproof triaxial accelerometers (AX3, Axivity, Newcastle upon Tyne, UK); one device was attached to each wrist with a standard wristband, and one to the anterior midline of the right thigh using a medical-grade adhesive dressing. These devices were set up to record raw, triaxial acceleration at 100Hz with a dynamic range of ±8*g*. BBVS participants were asked to wear all four devices continuously for the following 8 days and nights while continuing with their usual activities. In addition, they were asked to complete a diary of their sleep onset and wake times daily. This ensured that any small changes in onset and offset of sleep were captured during the recording period.

Following the download of the devices, the combined sensor heart rate data was cleaned and non-wear periods identified by the combination of non-physiological heart rate and prolonged periods of no movement (46). All signals from the triaxial accelerometers were re-sampled to a uniform 100 Hz signal by linear interpolation, and then calibrated to local gravity using a well-established technique (47; 48). Periods of non-wear were classified on the basis of windows comprising an hour or more wherein the device was inferred to be completely stationary, where stationary is defined as standard deviation in each axis not exceeding the approximate baseline noise of the device itself (13 milli-*g*). All non-wear periods were removed from the analysis. Additionally, pitch, roll and z-angles for all three accelerometry devices were calculated enabling angular postural assessments and direct comparisons to previously established approaches which only rely on acceleration data (49; 27). The residual acceleration signal can be interpreted as a measurement of the rotated gravitational field vector which can then be used to determine the accelerometer’s orientation angles (the conventional pitch and roll and z-angle, defined as the dorsal-ventral direction (49; 27)). Angles for each device were derived according to these formulae: ![Formula][1] ![Formula][2] ![Formula][3]
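As an illustration of how orientation angles can be obtained from a gravity-dominated acceleration sample, the following minimal sketch uses common arctangent conventions. The function name and the exact pitch and roll definitions are assumptions made for illustration and may differ from the formulae used in the study; only the general idea (angles derived from the rotated gravity vector) comes from the text.

```python
import math

def orientation_angles(ax, ay, az):
    """Return (pitch, roll, z_angle) in degrees for one acceleration sample in g.

    Assumed conventions (hypothetical, not confirmed by the source):
      pitch   = atan2(ax, sqrt(ay^2 + az^2))
      roll    = atan2(ay, sqrt(ax^2 + az^2))
      z_angle = atan2(az, sqrt(ax^2 + ay^2))
    """
    pitch = math.degrees(math.atan2(ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, math.hypot(ax, az)))
    z_angle = math.degrees(math.atan2(az, math.hypot(ax, ay)))
    return pitch, roll, z_angle

# Device lying flat with gravity entirely along the z axis.
print(orientation_angles(0.0, 0.0, 1.0))
```

In practice such angles would be computed on the calibrated signal and then summarized per epoch, so that changes in angle over time can serve as a posture-based indicator.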

The accelerometry and heart rate signals were summarized to a common time resolution of one observation per 30 seconds and the time-series were aligned. Participants were excluded from the final analysis if they had fewer than 72 hours of concurrent wear data (three full days of recording from all four devices). Participants with fewer than 3 nights of concurrent wear and diary data were also excluded from the final analysis. After these pre-processing steps, the resulting analytical sample comprised 158 participants. Three of these participants were on cardioreactive medication and two were taking beta-blockers.

#### 2.2.2 MESA

The MESA Sleep Study was conducted using a Compumedics Somte System for PSG, which includes the ECG signals used here to derive HR and HRV and their associated features, alongside an Actiwatch Spectrum from Philips Respironics (Pennsylvania, USA) to record actigraphy data. This device captures measurements of movement defined as “activity counts” (footnote 2) and aggregates them into 30-second epochs. The Actiwatch was securely fastened to each participant’s non-dominant wrist. These actigraphy signals and their associated features can be derived in most research-grade wearable devices.

The sensors for the Compumedics PSG comprised: cortical EEG, bilateral EOG, chin EMG, abdominal and thoracic respiratory inductance plethysmography, airflow, ECG, leg movement sensor and finger pulse oximetry. These sensors collected three types of signals: bioelectrical potentials (EEG, EOG, EMG, ECG), waveforms received from transducers (thermistors on the airflow devices, inductance respiratory bands, piezo leg sensors and position sensors from the leg device) and auxiliary devices (oximetry measures of oxyhemoglobin saturation and nasal pressure records). Full details of the setup, protocol and sampling rates are available elsewhere (footnotes 3 and 4). All participants included in our study had at least one full night of PSG recording with concurrent actigraphy and ECG. All nocturnal recordings were transmitted to a centralized reading center at the Brigham and Women’s Hospital (Boston, MA, USA) and data was scored by trained technicians using AASM guidelines.

For this study, we synchronized PSG, ECG and actigraphy records into 30-second sleep epochs for a subset of 1,743 out of the 2,237 participants included in the original study. A total of 494 participants were excluded on the basis of: (1) lack of concurrent PSG, ECG and actigraphy data; (2) lack of sufficient quality standard data (<3h of usable data from the concurrent three sensing methods); or (3) lack of data integrity or misalignment of data, removing the resulting actigraphy outlier epochs based on human expert annotations. These outliers resulted from either non-wearing periods or equipment failure periods. For actigraphy epochs labeled as outliers, their corresponding HR/HRV epochs were also removed (50). Further, to ensure that our evaluation was fair, we only included participants who had at least 30 minutes of wake time prior to sleep onset and a maximum of 240 minutes after sleep offset, resulting in a total of 1,154 participants.

To obtain HR information, we used the QRS complexes (R-points) detected using Compumedics Somte (Abbotsford, VIC, Australia) software Version 2.10 (Builds 99 to 101). The R-points were classified as normal sinus, supraventricular premature complex or ventricular premature complex. For the data cleaning, filtering and noise removal, we used the Python package HRV-analysis (footnote 5). First, RR interval outlier data was filtered using a threshold method, with an accepted range of 300 to 2000 ms, based on the approach previously described by Tanaka et al. (51); then ectopic beats were removed using the methods described in Malik et al. (52). After this step was completed, we linearly interpolated the removed R-points and grouped the RR intervals into 30-second epochs.
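The threshold filtering, interpolation and epoching steps described above can be sketched in plain Python as follows. This is a simplified illustration, not the HRV-analysis package itself: the function names are hypothetical, interpolation is done over beat index rather than time, and ectopic-beat removal is omitted.

```python
def clean_rr(rr_ms, low=300.0, high=2000.0):
    """Replace RR intervals outside [low, high] ms by linear interpolation.

    Interpolation runs over beat index between the nearest valid neighbours;
    leading or trailing outliers take the nearest valid value (an assumption).
    """
    valid = [i for i, rr in enumerate(rr_ms) if low <= rr <= high]
    if not valid:
        raise ValueError("no RR interval within the accepted range")
    cleaned = [float(rr) for rr in rr_ms]
    for i, rr in enumerate(rr_ms):
        if low <= rr <= high:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is None:
            cleaned[i] = float(rr_ms[nxt])
        elif nxt is None:
            cleaned[i] = float(rr_ms[prev])
        else:
            weight = (i - prev) / (nxt - prev)
            cleaned[i] = rr_ms[prev] + weight * (rr_ms[nxt] - rr_ms[prev])
    return cleaned

def hr_per_epoch(rr_ms, epoch_s=30.0):
    """Group RR intervals into fixed epochs by cumulative beat time.

    Returns {epoch_index: mean heart rate in beats per minute}.
    """
    epochs = {}
    elapsed_s = 0.0
    for rr in rr_ms:
        elapsed_s += rr / 1000.0
        epochs.setdefault(int(elapsed_s // epoch_s), []).append(rr)
    return {k: 60000.0 / (sum(v) / len(v)) for k, v in sorted(epochs.items())}

rr = clean_rr([800, 100, 1000, 950, 2500, 900])
print(rr)
print(hr_per_epoch(rr))
```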

#### 2.2.3 PhysioNet Apple Watch

For the PhysioNet Apple Watch study, Apple Watch raw triaxial acceleration data (x, y, z axes measured in *g*), sampled at 50 Hz, was converted into angular postural metrics such as those described for BBVS.

The Apple Watch measures HR in beats per minute, sampling every several seconds through its PPG sensor. For our analysis, we down-sampled HR by taking the mean of all samples within 15-second windows. For the PhysioNet Apple Watch study, the laboratory technicians started a “recording” period for the watch before the PSG recording started. For our final analysis, we only included participants whose sleep onset and offset were greater than 10 minutes from the start and end of the recording period, respectively. Through this process we intended to introduce a more realistic setting for our model. Details on the laboratory PSG settings can be found elsewhere (31). The final cohort consisted of 22 participants.
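The down-sampling of irregularly sampled HR into 15-second windows can be sketched as below. The function name and the choice of labelling each window by its start time are assumptions made for illustration.

```python
from collections import defaultdict

def downsample_mean(timestamps_s, hr_bpm, window_s=15):
    """Average HR samples falling in the same fixed-length window.

    Returns {window_start_in_seconds: mean HR in bpm}; windows without
    samples are simply absent from the result.
    """
    buckets = defaultdict(list)
    for t, hr in zip(timestamps_s, hr_bpm):
        buckets[int(t // window_s) * window_s].append(hr)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}

# Five irregular samples collapse into two 15-second windows.
print(downsample_mean([0, 5, 12, 16, 29], [60, 62, 64, 70, 72]))
```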

#### 2.2.4 MMASH

The 22 MMASH participants were fitted with two devices for continuous recording over 2 days: a heart rate monitor (Polar H7, Polar Electro Inc., Bethpage, NY, USA), which recorded beat-to-beat intervals and was used to obtain HR data, and a triaxial accelerometer (ActiGraph wGT3X-BT, ActiGraph LLC, Pensacola, FL, USA), which was worn on the wrist. Participants were asked to wear the devices continuously for the duration of the protocol and to complete a diary of their sleep onset and wake-up times during the recording period. For MMASH we followed the same pre-processing, data quality and noise removal protocols that we described for BBVS for both the triaxial accelerometry signal and the HR signal. Two participants were removed from analysis on the basis of missing diary entries.

A description of the cohorts we analysed and the wearable devices used to record data in each study is available in Table 1.

### 2.3 Algorithm to estimate the sleep window using heart rate

Several challenges must be accounted for when developing a method for the detection of sleep in free-living conditions. First and foremost, most methods derived for sleep-wake classification using wearable devices have been derived on and for use during the night period (20; 19; 23; 17; 31). These approaches were mostly developed in small studies using concurrent PSG and, as such, their application during the full day period greatly compromises the quality of the results. They also tend to be optimized in small, non-diverse populations, compromising their generalizability to other cohorts. Moreover, they tend to be device and make specific, often requiring conversions into arbitrary activity intensity measures or counts. Finally, most algorithms that can be applied during the 24-hour period require sleep diaries or questionnaires for guidance, which are often biased and burdensome to obtain (53).

Here we introduce a simple approach to estimate the sleep window, leveraging the HR sensing capabilities that most modern wearables have. One of the major challenges presented by large cohort studies is inter-individual differences. For instance, individuals who are fitter tend to have lower resting heart rates than those who are not as fit (54). Hence, an approach that relies on HR signals should not follow a *one size fits all* rule, but rather adapt to each individual’s own heart rate profile. To account for these considerations, we use the empirical cumulative distribution function (ECDF) of each individual’s daily heart rate profile. The cumulative distribution function, $F(x)$, is the probability that an individual’s heart rate takes a value no greater than $x$:

$$F(x) = P(X_i \le x)$$

for every $i = 1, \ldots, n$. Namely, $F(x_o)$ is the probability of the event $\{X_i \le x_o\}$. In this case, $x_o$ is a threshold heart rate value (in beats per minute). To estimate the probability of a given event, we turn to the ratio of such an event given an individual’s daily sample of heart rates. This results in

$$\hat{F}_n(x_o) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x_o)$$

as the estimator of $F(x_o)$, that is, the proportion of HR values no greater than $x_o$, where $I(\cdot)$ is the *indicator function*.

Thus, for every $x$, we can use such a quantity as an estimator, so the estimator of the cumulative distribution function $F(x)$ is $\hat{F}_n(x)$, which is referred to as the *empirical cumulative distribution function*.
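The ECDF estimator defined above, and the quantile threshold derived from it, can be written in a few lines. This is a minimal sketch: the function names are hypothetical, and taking the quantile as the smallest observed value whose ECDF reaches $q$ is one of several common conventions, assumed here for simplicity.

```python
import math

def ecdf(hr_samples, x):
    """Empirical CDF: fraction of heart rate samples less than or equal to x."""
    n = len(hr_samples)
    if n == 0:
        raise ValueError("need at least one heart rate sample")
    return sum(1 for h in hr_samples if h <= x) / n

def ecdf_quantile(hr_samples, q):
    """Smallest observed value x such that ecdf(hr_samples, x) >= q."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    ordered = sorted(hr_samples)
    k = math.ceil(q * len(ordered)) - 1
    return ordered[max(k, 0)]

day = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
threshold = ecdf_quantile(day, 0.35)
print(threshold, ecdf(day, threshold))
```

Because the threshold is taken from each person's own daily distribution, the same quantile maps to a different absolute heart rate for a fit participant than for an unfit one, and can shift from day to day within a participant.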

By using the HR cumulative distribution function for each participant and each day of recording, our method accounts for inter- and intra-individual variation. It can adjust to different levels of fitness, which often result in different resting HR during sleep (54). Further, an elevated resting heart rate (RHR) is a well-known response to infection, where it is often accompanied by fever (55), to alcohol consumption (56) and to stress (57), and can even be used to monitor influenza-like illness (58); our approach would account for such day-to-day changes. The method contains no in-built assumption of absolute time for the sleep window, and can therefore be used in night shift-workers and non-monophasic sleepers (those who have more than one principal sleep window in a 24-hour period), where the circadian HR rhythm is shifted so that most of the lower HR values still occur during sleep, independent of the absolute time at which sleep takes place. An example of our method applied to a shift worker can be observed in Supplementary Figure S3.

The first step of our heart rate sleep algorithm involves pre-processing the time series by assigning binary wake/sleep labels according to whether the participant’s heart rate is above or below a specific quantile threshold (*Q*). The threshold value is calculated from the ECDF over 24-hour windows arbitrarily starting at 15:00 each day. Figure 3 showcases this cutoff for the full BBVS population based on two intervals (full day and from 21:00 to 11:00, a conventional night). Wake/sleep labels are then smoothed with a 5-minute rolling median and the length of their sequences is calculated. Sequences of sleep labels that are longer than a minimum length (*L*) are extracted and merged with other sleep sequences if their gap is smaller than a pre-defined length (*G*). We study the behavior of the parameters *Q*, *L* and *G* for each dataset with the goal of finding the best possible combination. To be eligible as part of the final sleep window, the sleep sequence must not be preceded by more than 90 minutes of wake in the previous 4 hours of recording. The limits of the merged sleep sequences then guide a search (in a window starting 240 minutes before and ending 60 minutes after) for epochs with high HR volatility. HR volatility is measured as the rolling 10-minute standard deviation of the HR signal, and the high-volatility threshold is set to 6 beats per minute. Defining the final sleep window limits as the last and first high-volatility epochs for sleep onset and offset, respectively, is meant to increase the algorithm’s sensitivity at discriminating sedentary time just before or after sleep (e.g. reading in bed) from the sleep window itself.
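The thresholding, smoothing, length filtering and merging steps described above can be sketched as follows. This is an illustrative simplification and not the authors' implementation: all function names are hypothetical, 30-second epochs are assumed, the rolling median of binary labels is implemented as a centred majority vote, the default values of *Q*, *L* and *G* are those quoted for Figure 2, and the 90-minute wake eligibility rule and the volatility refinement are omitted.

```python
import math

def threshold_labels(hr, q=0.35):
    """Label each epoch 1 (sleep candidate) if HR is at or below the q-quantile, else 0."""
    ordered = sorted(hr)
    cutoff = ordered[max(math.ceil(q * len(ordered)) - 1, 0)]
    return [1 if h <= cutoff else 0 for h in hr]

def smooth(labels, window):
    """Centred rolling majority vote (median of binary labels; ties count as wake)."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        chunk = labels[max(0, i - half): i + half + 1]
        smoothed.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    return smoothed

def runs_of_sleep(labels):
    """Return (start, end) index pairs, end exclusive, for each run of 1s."""
    runs, start = [], None
    for i, label in enumerate(labels):
        if label and start is None:
            start = i
        elif not label and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs

def candidate_blocks(hr, q=0.35, min_len_min=40, max_gap_min=60, epoch_s=30):
    """Candidate sleep blocks for one 24-hour window of HR epochs."""
    if not hr:
        return []
    per_min = 60 // epoch_s
    labels = smooth(threshold_labels(hr, q), window=5 * per_min)
    # Keep only sleep runs longer than L minutes.
    long_runs = [r for r in runs_of_sleep(labels) if r[1] - r[0] > min_len_min * per_min]
    # Merge kept runs whose separating gap is shorter than G minutes.
    merged = []
    for start, end in long_runs:
        if merged and start - merged[-1][1] < max_gap_min * per_min:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Synthetic day: low HR between epochs 1000 and 1900, with a 5-minute arousal.
hr = [70 + (i % 30) for i in range(2880)]
for i in list(range(1000, 1400)) + list(range(1410, 1900)):
    hr[i] = 50 + (i % 5)
print(candidate_blocks(hr))
```

In the synthetic example, the brief arousal splits the low-HR period into two runs, which are merged again because the gap between them is shorter than *G*.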

Finally, the algorithm also labels naps and awakenings, but these were not used in the analysis of the present datasets. Naps are the initial sleep sequences that lie outside a 180-minute buffer window on either side of the main sleep window. For awakenings, the algorithm labels all the epochs when the HR rises above a quantile threshold *AV* extracted from the daytime (8am to 10pm) HR ECDF. From these, only the sequences longer than 5 minutes are kept; sequences separated by less than 5 minutes of sleep are then merged and labeled as the final awakenings.
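The awakening rule described above can be sketched in the same spirit. The function name is hypothetical, 30-second epochs are assumed, and because the value of *AV* is not given in this section it is left as a required argument.

```python
import math
from itertools import groupby

def label_awakenings(sleep_hr, daytime_hr, av, min_len_min=5, max_gap_min=5, epoch_s=30):
    """Return (start, end) epoch index pairs, end exclusive, of awakenings.

    sleep_hr   : HR epochs inside the main sleep window
    daytime_hr : HR epochs from the daytime period used to build the ECDF
    av         : quantile of the daytime ECDF used as the awakening threshold
    """
    per_min = 60 // epoch_s
    ordered = sorted(daytime_hr)
    cutoff = ordered[max(math.ceil(av * len(ordered)) - 1, 0)]
    above = [h > cutoff for h in sleep_hr]

    # Keep runs above the threshold that last longer than min_len_min minutes.
    runs, position = [], 0
    for is_above, group in groupby(above):
        length = len(list(group))
        if is_above and length > min_len_min * per_min:
            runs.append((position, position + length))
        position += length

    # Merge kept runs separated by less than max_gap_min minutes of sleep.
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < max_gap_min * per_min:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

daytime = list(range(60, 100))
night = [55] * 200
for i in list(range(50, 65)) + list(range(70, 85)) + list(range(150, 155)):
    night[i] = 80
print(label_awakenings(night, daytime, av=0.25))
```

Here the two 7.5-minute bouts are merged into one awakening, while the isolated 2.5-minute bout is discarded for being too short.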

Pseudocode for the approach is provided in the Supplementary Material 1. A visual overview of the algorithm is provided in Figure 1 and Figure 2 showcases the application of the algorithm to a random participant trace.
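As a complement to the supplementary pseudocode, the volatility-based refinement of a candidate sleep limit described in this section might be sketched as follows. The function names are hypothetical, and several details are assumptions: a trailing (rather than centred) rolling window, the sample standard deviation, and a fallback to the unrefined candidate when no high-volatility epoch is found.

```python
import statistics

def rolling_sd(hr, window):
    """Trailing rolling sample standard deviation; None until a full window exists."""
    out = [None] * len(hr)
    for i in range(window - 1, len(hr)):
        out[i] = statistics.stdev(hr[i - window + 1: i + 1])
    return out

def refine_boundary(volatility, candidate, before, after, threshold=6.0, pick="last"):
    """Move a candidate limit to a nearby high-volatility epoch.

    The search covers epochs candidate - before .. candidate + after.
    pick="last" selects the last such epoch (used here for sleep onset),
    pick="first" selects the first one (used here for sleep offset).
    """
    lo = max(0, candidate - before)
    hi = min(len(volatility), candidate + after + 1)
    hits = [i for i in range(lo, hi)
            if volatility[i] is not None and volatility[i] >= threshold]
    if not hits:
        return candidate
    return hits[-1] if pick == "last" else hits[0]

# Flat HR with a short burst of variability around epochs 40 to 44.
hr = [60.0] * 100
hr[40] = hr[42] = hr[44] = 90.0
vol = rolling_sd(hr, window=20)  # 10 minutes of 30-second epochs
print(refine_boundary(vol, candidate=50, before=30, after=10, pick="last"))
print(refine_boundary(vol, candidate=50, before=30, after=10, pick="first"))
```

With 30-second epochs, the 240-minute and 60-minute search limits quoted in the text would correspond to `before=480` and `after=120`; the smaller values above are only for the toy example.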

![Figure 1:](https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F1.medium.gif)

[Figure 1:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F1)

Figure 1: Heart rate sleep algorithm description.
The approach can be broken down into three distinct steps. The first step involves obtaining the wearable sensor HR data, pre-processing that data and setting initial sleep blocks through ECDF quantile thresholds *Q*. Blocks longer than *L* minutes are kept and merged with other blocks if their gap is smaller than *G* minutes. We extract the limits of the resulting blocks as candidates for sleep onset and offset. Next, rolling heart rate volatility is used to refine these candidate times by finding nearby periods where this volatility is high. Finally, naps and awakenings are labeled, the former coming from the candidate sleep blocks not included in the largest sleep window, while the latter are short periods (<60 minutes) within the sleep window when the heart rate exceeds the daytime threshold. A detailed description of this algorithm and parameters used can be found in the methods section.

![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F2.medium.gif)

[Figure 2:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F2)

Figure 2: Heart rate sleep algorithm in action for a participant chosen at random.
The first step involves setting initial sleep blocks through ECDF quantile thresholds (in this experiment, *Q* = 0.35). Blocks longer than *L* = 40 minutes are kept and merged if the gap between blocks is smaller than *G* = 60 minutes. We extract the limits of the resulting blocks as candidate state changes. The bottom panel highlights the use of rolling heart rate volatility to refine these candidate times by finding nearby periods where this volatility is high. The resulting candidate times designate each day’s main sleep window.
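
The volatility refinement shown in the bottom panel can be sketched as follows. This is an illustrative reading of the method rather than the released code: it assumes 1-minute epochs, a trailing 10-epoch window, the sample standard deviation, a search window placed around each candidate limit, and a fallback to the unrefined candidate when no epoch reaches the 6 beats per minute threshold. The name `refine_limit` and its `pick` argument are hypothetical.

```python
from statistics import stdev


def refine_limit(hr, candidate, pick, before=240, after=60,
                 width=10, threshold=6.0):
    """Move a candidate onset/offset epoch to a high-volatility epoch.

    pick="last" returns the last high-volatility epoch in the search
    window (used here for onset); pick="first" returns the first one
    (used here for offset). Volatility is the sample standard deviation
    of the trailing `width` HR values (assumption: 1-minute epochs).
    """
    lo = max(candidate - before, width - 1)   # need a full trailing window
    hi = min(candidate + after, len(hr) - 1)
    high = [i for i in range(lo, hi + 1)
            if stdev(hr[i - width + 1: i + 1]) >= threshold]
    if not high:
        return candidate                      # nothing to refine against
    return high[-1] if pick == "last" else high[0]
```

A centred window or a population standard deviation would shift the refined epochs by a few minutes, so these choices should be checked against the pseudocode in the Supplementary Material before reuse.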

![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F3.medium.gif)

[Figure 3:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F3)

Figure 3: Cumulative distribution function for BBVS heart rates.
The figure shows the HR ECDF for the full-day across all participants and all days, where the yellow dotted line shows the 0.35 HR quantile cutoff. Each individual line represents one participant for one day of recording.

## 2.4 Validation of the proposed approach

We used the four previously described cohorts to validate our method against gold-standard measures of sleep using PSG (MESA, Apple Watch PhysioNet) and detailed silver-standard measures through sleep diaries, as opposed to habitual sleep diaries which could be subject to recall bias (BBVS, MMASH). Although an ideal experimental protocol would have multiple days of PSG and free-living wearable sensor data, detailed sleep diaries allowed us to evaluate the algorithm across more than one or two nights, showcasing the strength of our method at discerning both inter- and intra-individual variability.

We performed epoch-by-epoch evaluation on all four cohorts and compared the performance of our method with regard to total sleep time (TST), sleep onset and sleep offset time.

### 2.4.1 Evaluation metrics

The following performance metrics were used to evaluate against the ground truth in each study: differences in onset/offset/total sleep block duration (minutes), mean square error (MSE) and Cohen’s kappa. We evaluated our algorithm systematically for individual HR CDF quantiles *Q* ∈ [0.25, 0.9] with step size 0.025, window lengths *L* ∈ [25, 50] minutes with step size of 2.5 minutes, and gap between blocks *G* ∈ [30, 360] minutes with step size of 30 minutes, optimizing for MSE.
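
For illustration, the search grid described above can be enumerated as in the sketch below. The helper `best_parameters` and its `score` callback are hypothetical; in practice `score` would run the algorithm with a given (*Q*, *L*, *G*) triple and return the MSE against ground truth.

```python
from itertools import product

# Build each range from an integer index to avoid accumulated float error.
quantiles = [round(0.25 + 0.025 * k, 3) for k in range(27)]  # 0.25 ... 0.90
min_lengths = [25 + 2.5 * k for k in range(11)]              # 25 ... 50 min
max_gaps = list(range(30, 361, 30))                          # 30 ... 360 min

grid = list(product(quantiles, min_lengths, max_gaps))


def best_parameters(score, candidates=grid):
    """Return the (Q, L, G) triple with the lowest score (e.g. MSE)."""
    return min(candidates, key=lambda params: score(*params))
```

The grid contains 27 × 11 × 12 = 3,564 parameter combinations, which is small enough for an exhaustive search per cohort.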

We defined MSE as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathit{algo}_i - \mathit{ground\ truth}_i\right)^2$$

where *algo* and *ground truth* are the binary labels for epoch *i* (1 for sleep, 0 for wake) out of *n* epochs in each subject’s heart rate time series. Epoch length is specified by the different study cohorts (1 minute in BBVS, 30 seconds in MESA, 15 seconds in PhysioNet Apple Watch and 5 seconds in MMASH). Thus, if the sleep windows found by the HR algorithm match the ground truth labels exactly, *MSE* = 0. If the algorithm labels all epochs as wake, then MSE is the proportion of sleep in the time series according to ground truth, while if the algorithm and ground truth labels diverge entirely, MSE will be the sum of their sleep proportions out of the total time series. For all four cohorts we performed systematic parameter optimization for best MSE on the basis of quantile, window length and window merge values. We also computed Cohen’s kappa, which is used to determine the classifier agreement with ground truth (PSG or sleep diary), relative to chance (59). Cohen’s kappa is calculated as $(p_o - p_e)/(1 - p_e)$, where $p_o$ is the proportion of observed classifications in agreement and $p_e$ is the proportion of classifications expected to agree by chance. Finally, tests of statistical significance were conducted using a two-tailed t-test (60).
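
Both metrics are straightforward to compute on per-epoch binary labels. The following sketch uses plain Python lists; the function names are illustrative and the chance agreement is the standard two-class form computed from each rater's sleep proportion.

```python
def mse(algo, truth):
    """Mean squared error between two binary label sequences."""
    return sum((a - t) ** 2 for a, t in zip(algo, truth)) / len(truth)


def cohens_kappa(algo, truth):
    """Cohen's kappa for binary sleep (1) / wake (0) labels."""
    n = len(truth)
    p_o = sum(a == t for a, t in zip(algo, truth)) / n
    algo_sleep, truth_sleep = sum(algo) / n, sum(truth) / n
    # Chance agreement: both say sleep, or both say wake.
    p_e = algo_sleep * truth_sleep + (1 - algo_sleep) * (1 - truth_sleep)
    return (p_o - p_e) / (1 - p_e)
```

For binary labels the squared difference is 1 exactly when the two labels disagree, so MSE here equals the proportion of misclassified epochs. Note that `cohens_kappa` is undefined when both sequences contain a single identical class, since the chance agreement is then 1.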

### 2.4.2 Evaluation with sleep diary and angle change: BBVS

In the BBVS study, participants wore a variety of wearable devices and recorded the time they went to bed and woke up on a daily basis, providing detailed sleep diaries. As such, we conducted two types of evaluations on this cohort.

#### Evaluation with sleep diary

First, we compared the performance of our method against those sleep diaries. For our evaluation, we only included participants who had filled out those diaries and had more than three days of concurrent sensing and diary data. We evaluated our model against the diaries in terms of total sleep time, sleep onset and offset.

#### Evaluation with angle change algorithm

We assessed the performance of our approach versus an angular change algorithm inspired by previous work (27; 49). The angular change approach started with calculating the pitch, roll and z-angle using triaxial acceleration for the device being evaluated. To isolate the gravitational acceleration for each axis, we applied a low-pass filter (0.2 Hertz) to each of the three axes (X, Y and Z) of every recording being evaluated.

Pitch, roll and z-angles were then calculated, and the difference between successive epoch values was smoothed using a 5-minute rolling median window. A threshold method (values below the 10th percentile of that day’s values multiplied by 15) was applied to both columns, dividing the time series into initial sleep and wake blocks. Of these blocks, only those larger than 30 minutes were kept. Blocks separated by less than 60 minutes were then merged and the largest block was deemed as the main sleep block within the day (27).
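
As a rough illustration of the angle-based pipeline, the sketch below computes a z-angle from low-pass-filtered triaxial acceleration and flags low-movement epochs. The z-angle expression is the arctangent form commonly used in the accelerometry literature and is assumed here rather than taken from this paper; the nearest-rank percentile, the function names and the omission of the rolling median are likewise simplifications.

```python
from math import atan2, ceil, degrees, hypot


def z_angle(ax, ay, az):
    """Angle (degrees) between the device z-axis and the horizontal plane."""
    return degrees(atan2(az, hypot(ax, ay)))


def low_movement_mask(angles, factor=15):
    """Flag epoch-to-epoch angle changes below 10th percentile * factor.

    `angles` holds one angle per epoch for a single day. The rolling
    median smoothing described in the text is left out of this sketch.
    """
    changes = [abs(b - a) for a, b in zip(angles, angles[1:])]
    ordered = sorted(changes)
    p10 = ordered[max(ceil(0.1 * len(ordered)) - 1, 0)]  # nearest rank
    threshold = p10 * factor
    return [change < threshold for change in changes]
```

The resulting mask can then be passed through the same run-length filtering and merging logic as the HR labels, using the 30-minute and 60-minute values given above. If the 10th percentile of the changes is zero, the strict comparison flags no epochs, which a production implementation would need to handle.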

Two different angular change evaluations were performed. First, the intersection of the epochs in which both pitch and roll calculations agreed on a sleep label was used as a voting system for a more reliable final sleep window. Alternatively, z-angle-only measures were used to generate those sleep metrics as previously described (27). No significant difference was found when comparing the performance of these two approaches, so we only report the values obtained from the z-angle measures. All the previous steps were done separately for each limb (dominant and non-dominant wrists, and thigh) on which BBVS participants wore a device.

In BBVS, HR is recorded continuously across the 24-hr period. Thus, the threshold quantile is expected to be lower the longer the sampling interval for the ECDF given that sleep occupies a smaller proportion of the total interval being evaluated. To evaluate the effect of the chosen ECDF, we analyzed the optimal thresholds and their associated results to better understand how parameter choice may affect the performance of our approach.

### 2.4.3 Evaluation with polysomnography and sleep diary: MESA

#### Evaluation with polysomnography

The recording time for PSG started when the subject’s setup was complete, yielding a period of sedentary wakefulness prior to sleep onset. While in an ideal scenario the participant would have been subject to ground truth recording also during the day, this is not a possibility given the nature of PSG. However, this limitation was addressed by evaluating PSG against sleep diary on the same dataset and evaluating our approach against both PSG and diary data. For this evaluation we compared the resulting sleep blocks from PSG, defined as epochs where the participant was in either NREM (N1, N2, N3) or REM sleep, to the sleeping window obtained through our HR algorithm.

Further, in MESA, we explored how our algorithm performed in healthy participants versus participants with sleep disorders. To do so, we first evaluated in the full cohort (n = 1,154) and then on the subset of participants with (n = 189, 16.4%) and without (n = 965, 83.6%) any sleep disorders. The goal of this analysis was to caution and inform about potential limitations that our method may have when evaluating in diseased participants.

#### Evaluation with sleep diary

PSG-derived sleeping windows were compared to sleep diary records in the MESA cohort. This comparison allowed us to further understand the deviations of habitual self-reported sleep from objectively monitored ground truth obtained through PSG. For the evaluation we used the same metrics as in the evaluation against PSG.

### 2.4.4 Evaluation with polysomnography and angle change: PhysioNet Apple Watch Polysomnography Study

#### Evaluation with polysomnography

The PhysioNet Apple Watch study provided a unique opportunity to test our method in a commercial-grade wrist-worn wearable sensor that was concurrently worn during PSG. For this study, we used the same evaluation method explored in MESA, exploring our method based on the night-time concurrent recordings of wearable HR and PSG.

#### Evaluation with angle change algorithm

Given the multimodal nature of the study, we evaluated both the HR based algorithm and the angular change based algorithm on this population. For this evaluation we followed the same procedure as previously described on BBVS.

### 2.4.5 Evaluation with sleep diary and angle change: MMASH

In the MMASH study, participants wore an HR strap and a triaxial wrist accelerometer and recorded detailed sleep diaries, including the time they fell asleep and woke up, which were filled in on a daily basis. For this cohort, we also conducted two types of evaluation following the procedures used during the BBVS evaluation.

#### Evaluation with sleep diary

First, we compared the performance of our method against the sleep diaries of each participant. We evaluated our approach against the sleep diaries in terms of total sleep time, sleep onset and offset.

#### Evaluation with angle change algorithm

Similar to our second evaluation in BBVS, we also assessed the performance of our approach against the angular change approach previously described.

## 2.5 Code availability

The implementation of the HR sleep period method described in this paper is available in an open-source Python 3.7 package ([https://github.com/placeholder](https://github.com/placeholder)). This package also provides a number of tools to analyze data from a variety of wearable devices and derive sleep, circadian rhythm and physical activity inferences. The Python code used for the evaluations with PSG can be found here: ([https://github.com/placeholder2](https://github.com/placeholder2)).

## 3 Results

### 3.1 Evaluation of the algorithm in the BBVS

The results of the evaluation on the BBVS study are summarized in Table 2, and figures summarizing the results of the optimal parameter search for this cohort can be found in the Supplementary Material (Figure S1). Our HR algorithm estimated TST on average 0.60 minutes less than that reported through sleep diary. The optimal quantile was 0.35, yielding an MSE of 0.06. The angular change approach overestimated TST by 125.37 minutes on the best performing wrist-worn device (non-dominant wrist). The results across all three accelerometers for this approach were comparable, as summarized in Table S2, each yielding an MSE of 0.10.

View this table:
[Table 2:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T2)

Table 2: Comparison of HR and angle change algorithm performance for the BBVS dataset.
In this table, the angle change algorithm presented was applied on data from the device worn on the non-dominant wrist (ndw). Results for devices worn on other limbs are available in the Supplementary Material. The lower the MSE, the better; the higher the Cohen’s Kappa, the better; the closer to zero the time differences, the better. BBVS TST for diaries mean ± 95% CI = 7.739 ± 0.073 hours (464.34 ± 4.38 minutes).

Our HR model estimated sleep onset on average 1.14 minutes later than the sleep diary, while the angular change approach resulted in an average underestimation of 60.17 minutes. For sleep offset, the HR algorithm’s estimate was on average 0.54 minutes later than the diary, while the angular change estimate was 65.20 minutes earlier for the non-dominant wrist. Finally, modified Bland-Altman plots for the HR and angle approaches against sleep diary for the BBVS cohort are presented in Figure 4.

![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F4.medium.gif)

[Figure 4:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F4)

Figure 4: Modified Bland-Altman plot for BBVS.
Modified Bland-Altman plot on the left shows the TST differences (delta) between the HR algorithm and diary in the Y-axis and the X-axis shows the TST average for every participant. The figure to the right shows the same comparison for the angle algorithm and diaries in BBVS. TST: total sleep time

### 3.2 Evaluation of the algorithm in the MESA study

In MESA, we evaluated our algorithm against polysomnography for the full population, the subset of the population that were deemed healthy sleepers and those that were diagnosed with sleep disorders; a full breakdown of the results is presented in Table 3. Further, we evaluated the performance of sleep diaries against polysomnography in this study as well. Modified Bland-Altman plots for both the evaluation of our method against PSG and the diaries are presented in Figure 6.

View this table:
[Table 3:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T3)

Table 3: Results for the MESA dataset.
Both the HR algorithm and sleep diaries are evaluated against PSG. Results are also shown for the subset of healthy participants and participants with sleep disorders. MESA TST for PSG mean ± 95% CI = 7.433 ± 0.079 hours (445.95 ± 4.71 minutes).

![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F5.medium.gif)

[Figure 5:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F5)

Figure 5: Example participant (chosen at random), showcasing estimated sleep through the heart rate sleep window algorithm, sleep diary sleep onset and offset and angle changes for both wrists and the thigh accelerometers
The algorithm picks up subtle sleep regularity differences at a participant level. This approach overlaps more closely with the sleep diary than any of the accelerometer-based approaches. Notice that, for the angle change approach, the algorithm is more effective on the non-dominant wrist accelerometer than on the dominant wrist or thigh accelerometer for most nights. TST: total sleep time

![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F6.medium.gif)

[Figure 6:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F6)

Figure 6: Modified Bland-Altman plot for MESA.
Modified Bland-Altman plot on the left shows the TST differences (delta) between the HR algorithm and PSG in the Y-axis and the X-axis shows the TST average for every participant. The figure to the right shows the same comparison for the sleep diaries and PSG in MESA. Further, healthy participants are color coded in blue for both plots and participants that were diagnosed with sleep disorders are shown in orange.

The results in MESA show that our HR algorithm performs better than sleep diaries in terms of MSE for the full population (0.10 versus 0.13 MSE) as well as for the subset of the population with sleep disorders and healthy participants (also 0.10 versus 0.13 MSE). For all three analyses the best quantile was 0.85, likely due to the short amount of wake and active time in the recordings. The time differences in total sleep time are lower for the HR approach than for the sleep diary (−33.15 versus −34.04 minutes in the full population). Interestingly, while the time difference for sleep onset was around 20 minutes for the algorithm, it was only around 6 minutes for the sleep diary. In contrast, the algorithm fared much better at inferring sleep offset (between −5 and −10 minutes of time deviation), whereas the diary underestimated sleep offset by almost 30 minutes. The HR approach performed similarly in the subset of participants that had sleep disorders and the healthy subset, outperforming sleep diaries in both cases. Finally, the HR approach yielded a stronger Cohen’s kappa than the sleep diary for all three sub-analyses.

### 3.3 Evaluation of the algorithm in the PhysioNet Apple Watch Polysomnography study

Our algorithm was applied to data obtained from a commercial, readily available wrist-worn wearable and evaluated against gold-standard measures of sleep obtained with PSG. In this cohort, we evaluated both the HR algorithm and the angle change approach given the presence of triaxial accelerometry. The HR algorithm resulted in an MSE of 0.06 while the wrist-based angular change approach yielded an MSE of 0.12. As in MESA, the best performing quantile threshold was high (0.8, with 0.85 yielding almost identical results). This high quantile is likely due to the nature of the evaluation protocol, which consisted of concurrent PSG and wearable recording without much out-of-bed activity. Total sleep time deviation was −23.23 minutes for the HR approach and 44.39 minutes for the angle change approach. Sleep onset time deviation was 15.12 minutes for the HR approach and −21.77 minutes for the angle change approach, while the sleep offset deviations were −8.10 and 22.61 minutes, respectively. However, Cohen’s kappa was slightly lower for the HR approach (0.67) than for the angle change algorithm (0.71). These results are summarized in Table 4.

View this table:
[Table 4:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T4)

Table 4: Results for the PhysioNet Apple Watch dataset.
The table presents results for both the HR and angle change algorithm for total sleep time, sleep onset and sleep offset in the PhysioNet Apple Watch dataset. PhysioNet Apple Watch TST for diaries mean ± 95% CI = 7.165 ± 0.544 hours (429.89 ± 32.65 minutes). ndw: Non-dominant Wrist

### 3.4 Evaluation of the algorithm in the MMASH study

Our final set of evaluations took place in the MMASH cohort, which included both HR and triaxial accelerometer data recorded continuously for full-day periods. As in BBVS, we evaluated both the proposed HR approach and the angle change approach against detailed sleep diaries. We found that the results validated the findings from BBVS, resulting in the same optimal quantile (0.35) yielding an MSE of 0.07 and total sleep time difference of −14.08 minutes, with a Cohen’s kappa of 0.85 for the HR approach. On the other hand, the angle change approach resulted in an MSE of 0.08 and Cohen’s kappa of 0.83, but the total time deviation was substantially worse, yielding a total sleep time difference of −60.17 minutes. Full results for the MMASH cohort are presented in Table 5. Similar to BBVS, results of the optimal parameter search for the MMASH cohort can be found in the Supplementary Material (Figure S2).

View this table:
[Table 5:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T5)

Table 5: Results for the MMASH dataset.
The table presents results for both the HR and angle change algorithm for total sleep time, sleep onset and sleep offset in the MMASH dataset. MMASH TST for diaries mean ± 95% CI = 6.200 ± 0.622 hours (371.98 ± 37.33 minutes). ndw: Non-dominant Wrist.

## 4 Discussion

Objective and unobtrusive measurement of sleep in large, free-living populations at scale will help facilitate epidemiological investigations powered to explore the relationships between sleep, physical behaviours and disease. Concurrently, the rapid growth and adoption of commercial grade wearable devices offers a unique opportunity for the objective monitoring of sleep at scale. However, most commercial devices use algorithms that are not open-source or do not report thorough validation against gold-standard measures. Similarly, conventional algorithms tend to rely on device specific metrics, such as counts, requiring extensive adaptation for each device and cohort tested, as well as a predefined search window through expert annotations or sleep diaries. This often renders evaluation across devices and without sleep diaries futile.

Here we introduced a device agnostic algorithm that exploits the HR sensing capabilities present in most modern wearable devices. We presented an algorithm based on a personalized HR feature that allows detection of sleeping windows under free-living conditions. The proposed method relies on the well-established changes in HR that occur when individuals transition from wake to sleep (61). Hence, it is able to infer sleep on individuals regardless of fitness level or illness and can be used amongst shift workers who exhibit sleep episodes outside of the night period. These qualities may be particularly relevant when evaluating sleep in populations with fragmented sleep, in countries where sleep timing changes due to seasonality or where cross-cultural sleep differences are observed (62). The value of this approach lies in the fact that it is device-agnostic, does not require sleep diary or questionnaire data and adapts to inter- and intra-individual (day-to-day) variability, allowing for accurate and reliable sleep window labeling.

We evaluated our HR-based algorithm in four cohorts: BBVS, MMASH, PhysioNet and MESA. Both BBVS and MMASH include free-living HR, movement and sleep diary data for multiple days. By contrast, PhysioNet and MESA provide lab-based HR data and gold-standard PSG. Our aim was to evaluate the algorithm’s performance in free-living conditions in the first two cohorts and compare it to existing measures that could be leveraged in these cohorts, whilst using the last two cohorts to evaluate its validity against gold-standard measures. Further, through this process, we aimed to identify the range of parameters (*Q, L, G*) that produce the best results in free-living conditions, allowing for application and deployment in the absence of any ground truth.

For the first evaluation in the BBVS study, we found that the proposed method performed strongly in free-living conditions, with an average time deviation for total sleep time compared to non-habitual sleep diaries of −0.6 minutes. In this study, we performed optimal parameter search using both full-day measures of HR and night-only measures to analyze how the availability of sensor data or the design of the experiment affects the choice of best parameters. The parameter search for the optimal MSE was performed based on quantile, window merge and window length values, and the results are presented in Figure S1. We found that the optimal parameters for this cohort were 0.35 for the quantile (*Q*), 30 minutes for the window length (*L*) and 120 minutes for the time merge block (*G*). The resulting optimal quantile, 0.35, also makes intuitive sense as it represents about 8 h of a 24-hour day (0.35 × 24 h = 8.4 h), which is around the expected time spent sleeping for most individuals in a day.

The algorithm performed better at detecting sleep offset (wake up) than sleep onset, yielding time differences of 0.54 and 1.14 minutes, respectively. This may be due to the fact that the sleep diaries used for validation of sleep onset and offset, while more detailed than traditional sleep diaries, rely on self-report and may not be wholly accurate. While sleep offset is relatively straightforward to annotate as most people wake up with alarm clocks, the exact time of sleep onset cannot be recorded, and is prone to measurement bias, if attempted at the time, or recall bias, if filled in the next day.

Thus, the quality of self-reported sleep may vary based on the sleep onset latency of each participant for each given night. Nevertheless, the performance of the method across a diverse population and multiple nights of recording showcases its potential for free-living applications.

Finally, in the BBVS cohort, we evaluated the performance of an angle change-based algorithm inspired by previous work (27; 49) leveraging the multiple accelerometers available to evaluate angle-based postural changes. We found that this approach is valuable, but the results were more modest than those of our proposed method, yielding a total sleep time MSE of 0.10 and a time deviation of 125.37 minutes for the non-dominant wrist device. We also found that using the combined pitch and roll approach versus only the z-angle did not significantly alter the results. In sum, while valuable, the angle change approach performed significantly worse than our HR-based algorithm in the BBVS cohort. These results suggest that when HR is available, it should be used in preference, but triaxial accelerometry is a valuable second option in the absence of HR.

The algorithm was also evaluated in the MESA cohort, a large, diverse population where gold-standard sleep measures through PSG were available, alongside self-reported sleep (through sleep diaries). In MESA we optimized the method to minimize MSE and additionally evaluated it in subsets of the population with and without sleep disorders, yielding the results reported in Table 3. In MESA, the deviation of total sleep time versus gold-standard measures of sleep was −33.15 minutes, with an MSE of 0.10 for the full population, whereas the same comparison between PSG and sleep diaries yielded a total sleep time deviation of −34.04 minutes and an MSE of 0.13. This shows that our HR-based method can reliably and objectively monitor sleep in the absence of PSG and performs better than sleep diaries. It is also worth noting that the HR approach was significantly better at detecting sleep offset (wake up), with a time difference versus PSG of −9.76 minutes compared to −27.79 minutes for the diaries. These results further highlight that our algorithm can be used in the absence of sleep diaries and also shows superior performance in terms of MSE to conventional, habitual sleep diaries in this large cohort. Furthermore, the comparable results for the analysis carried out in the subset of the cohort with formally diagnosed sleep disorders suggest that our method may also be valuable when monitoring sleep in people suffering from these conditions. To the best of our knowledge, this is the first study that conducts these types of sensitivity analyses on a subset of diseased subjects to show the validity of the proposed method in individuals who suffer from sleep disorders. Future work should carry out more thorough validation on a larger population sample with sleep disorders to confirm these findings.

For the MESA cohort, recording in laboratory conditions and for less than 24 hours produces an HR ECDF that is different in shape from free-living conditions. The scarcity of non-sedentary activities means that the optimum quantile (*Q*) threshold needs to be higher to preserve its sensitivity at detecting sleep versus sedentary periods (optimum quantile of 0.85). This could have potentially constrained the performance of our method in this population. Future studies should explore continuous recording of HR during the day paired with full-PSG to test the validity of our method in a similar setting to that of BBVS or MMASH.

We further examined the performance of the HR algorithm in the PhysioNet Apple Watch cohort, which recorded concurrent Apple Watch data and a night of PSG. This study followed a similar experimental protocol to that of the MESA study. In this cohort, the HR algorithm yielded an MSE of 0.06 and a time deviation of −23.23 minutes when compared to gold-standard measures of sleep through PSG. Similar to MESA, the optimal quantile (*Q*) was quite high (0.8, with 0.85 producing the same results), which is likely due to the nature of the experimental setting, where the Apple Watch recording period only started when the PSG was set up. Nevertheless, these results showcase the potential of the method in commercial-grade wearable devices that obtain HR through PPG. We also examined the angle change approach in this cohort, with this method performing less well than it did in BBVS, yielding an MSE of 0.12 and a total sleep time deviation of 44.39 minutes.

Finally, we validated our method in the MMASH cohort, where free-living HR, movement and sleep diary data were available. In this cohort, our algorithm’s performance was optimal using a 0.35 quantile (*Q*) and a window length (*L*) that matched the result obtained for BBVS. Our HR approach yielded an MSE of 0.07 and a total sleep time deviation of −14.08 minutes. The angle change approach resulted in an MSE of 0.08 and a total sleep time deviation of −60.17 minutes. Our results in MMASH confirm the strong validation performance we observed in BBVS when using our algorithm in free-living conditions across multiple days of recording. Based on the results obtained in BBVS, which were then validated in MMASH, a quantile (*Q*) of 0.35 led to the best MSE results in free-living conditions. Similarly, we recommend using a window length (*L*) in the range of 30-45 minutes and a merge block (*G*) in the range of 60-240 minutes, as summarized in the pseudocode formulation of our approach (Supplementary Material 1). These values can be used as priors for our algorithm and can be further fine-tuned in the presence of ground truth.

One important limitation of the BBVS and MMASH studies is that they did not include PSG-derived ground truth sleep annotations. Although an ideal experimental protocol would have multiple days of PSG and free-living wearable sensor data, detailed sleep diaries allowed us to evaluate the algorithm across more than one or two nights, showcasing the strength of our method in discerning both inter- and intra-individual variability. Similarly, the accelerometers included in these studies offer an important perspective on accelerometry-based angular postural changes and how they compare to our proposed approach. Moreover, in ideal circumstances, HR for the full day would have been available in both the MESA and PhysioNet cohorts, optimizing the results of our approach by having exposure to non-sedentary wake behaviors. However, the results in these two datasets showcase the validity of our approach even under constrained laboratory conditions.

Future work should explore the robustness of the HR-based algorithm in cohorts such as inpatients. As the algorithm relies on HR signals already monitored continuously for other medical purposes, no additional accelerometer sensor would be required. Accurately labeling sleep in inpatients is challenging due to other factors that influence the HR ECDF, such as limited mobility, fever, medication, physiological and psychological stress, drug and alcohol use and cardiovascular conditions. However, objectively monitoring sleep without additional obtrusion could help improve sleep quality during hospital stays, which is a challenge for most patients (63), and hence promote both healing and patient satisfaction. Moreover, optimization of the angle change approach should be explored so that it can be used more reliably in the absence of HR sensor data; in this investigation we limited our evaluation of this method to the original parameters reported (27). Parameter optimization could yield more generalizable and stronger outcomes for this approach. Finally, our method could be used in combination with some of the well-established activity-based approaches where multimodal settings are present. For instance, using conditional programming, traditional methods could complement our approach in the detection of awakenings and assist in the derivation of conventional and novel sleep metrics.

Overall, our work highlights the potential of HR to detect the sleeping window not only in research and clinical contexts, but also in ecologically valid free-living conditions, enabling the objective monitoring of sleep in large-scale populations without PSG labels or sleep diary guidance. The low effort involved in collecting and analysing objectively inferred sleep data coupled with low exclusion rate due to technical issues, missed diary entries or dropout would likely result in larger and more diverse study cohorts, as well as facilitating long-term objective data collection. For instance, few studies have been able to properly test the longitudinal, and likely synergistic, association between sleep quality and disease. Where this has taken place, sleep data is often collected through questionnaires (64) or with short, arbitrary follow-up periods. These studies could have missed long-term trends that significantly influence health status over months or years.

In sum, our proposed method was shown to accurately infer sleep in both free-living and laboratory conditions without the need for sleep diaries. As highlighted by Depner and colleagues (65), our analysis and evaluation will help enable the translation of findings from laboratory-based sleep studies into large-scale cohort studies and clinical trials by providing an objective, device-agnostic method to monitor sleep.

## Data Availability

Data for three of the four datasets used in this work are freely available online.

## Funding

Our work was supported by GlaxoSmithKline and EPSRC through an iCase fellowship (17100053), the Embiricos Trust Scholarship of Jesus College Cambridge, and EPSRC through Grant DTP (EP/N509620/1). The work of KW is supported by the NIHR Cambridge Biomedical Research Centre (IS-BRC-1215-20014). The icons used in some of the figures are licensed under Creative Commons by [https://thenounproject.com](https://thenounproject.com).

## Conflict of interest statement

The authors declare no conflict of interest.

## Supplementary Material

[Table S1:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T6)

Table S1: Sleep Disorder Population details for the MESA study
The MESA study allowed us to evaluate our method in a population in which sleep disorders were present with roughly the same prevalence as in the general population.

![Figure7](https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F7.medium.gif)

[Figure7](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F7)

![Figure S1:](https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F8.medium.gif)

[Figure S1:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F8)

Figure S1: Mean squared error (MSE) results for the Biobank Validation Study (BBVS) using the full-day Empirical Distribution Function method to detect sleep windows
The MSE was calculated against sleep diaries. The Y axis shows the quantiles tested and the X axis shows the window lengths. The optimal combination found through this search was a quantile (*Q*) of 0.35, a time merge block (*G*) of 120 minutes and a window length (*L*) of 30 minutes, yielding an MSE of 0.06 in the BBVS.
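
The kind of parameter search summarized in this caption can be illustrated with a minimal Python sketch. It is not the released implementation. It assumes a simplified reading of the method: a minute-level HR series, a threshold set at the *Q*-th quantile of that recording's own HR values, candidate sleep blocks of at least *L* consecutive below-threshold minutes, blocks merged when separated by less than *G* minutes, and an MSE computed against a binary diary label. The function names (`detect_sleep`, `mse`, `grid_search`) and the synthetic recording are hypothetical.

```python
import itertools
import numpy as np

def _runs(mask):
    """Return (start, end) pairs (end exclusive) for runs of consecutive True values."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2], edges[1::2]))

def detect_sleep(hr, q, window_len, merge_gap):
    """Label each minute as sleep (1) or wake (0). Simplified, assumed logic."""
    hr = np.asarray(hr, dtype=float)
    threshold = np.quantile(hr, q)  # personalised threshold from the recording itself
    below = hr < threshold
    # Keep only blocks of at least window_len consecutive below-threshold minutes.
    blocks = [(s, e) for s, e in _runs(below) if e - s >= window_len]
    # Merge blocks separated by less than merge_gap minutes.
    merged = []
    for s, e in blocks:
        if merged and s - merged[-1][1] < merge_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    sleep = np.zeros(hr.size, dtype=int)
    for s, e in merged:
        sleep[s:e] = 1
    return sleep

def mse(pred, truth):
    """Mean squared error between two binary sleep/wake sequences."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((pred - truth) ** 2))

def grid_search(hr, diary, quantiles, window_lens, merge_gaps):
    """Return (mse, Q, L, G) for the combination with the lowest MSE."""
    best = None
    for q, l, g in itertools.product(quantiles, window_lens, merge_gaps):
        err = mse(detect_sleep(hr, q, l, g), diary)
        if best is None or err < best[0]:
            best = (err, q, l, g)
    return best

# Synthetic 24-hour recording (hypothetical values): 8 hours of low HR
# with a 10-minute awakening, followed by 16 hours of wake-level HR.
hr = np.full(1440, 80.0)
hr[:480] = 55.0
hr[200:210] = 85.0
diary = np.zeros(1440, dtype=int)
diary[:480] = 1  # diary-reported sleep window

best = grid_search(hr, diary, quantiles=[0.2, 0.35], window_lens=[30], merge_gaps=[5, 120])
print(best)
```

On this toy recording, a merge gap longer than the brief awakening lets the two low-HR blocks be reported as a single sleep window, which is the role the caption assigns to *G*.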

[Table S2:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/T7)

Table S2: Comparison of angle algorithm performance for the BBVS dataset by the limb on which the device was worn
All participants wore devices on their dominant (dw) and non-dominant (ndw) wrists as well as on their thigh. The best performance metrics were obtained for the non-dominant wrist device, but thigh-worn devices gave the smallest time differences overall in terms of total sleep time (TST), sleep onset and offset. Diary-reported TST in BBVS (mean ± 95% CI): 7.739 ± 0.073 hours (464.34 ± 4.38 minutes).

![Figure S2:](https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F9.medium.gif)

[Figure S2:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F9)

Figure S2: Applying the HR sleep algorithm on a shift worker
The free-living trace shows the subtle day-of-the-week changes picked up by the algorithm, with 2 sleep windows detected on Saturday, when the participant was not at work during the night. HR: Heart Rate; Sed: Sedentary; LPA: Light Physical Activity; ACC: Acceleration.

![Figure S3:](https://www.medrxiv.org/content/medrxiv/early/2020/09/08/2020.09.05.20188367/F10.medium.gif)

[Figure S3:](http://medrxiv.org/content/early/2020/09/08/2020.09.05.20188367/F10)

Figure S3: Applying the HR sleep algorithm on a shift worker
The free-living trace shows the subtle day-of-the-week changes picked up by the algorithm, with 2 sleep windows detected on Saturday, when the participant was not at work during the night. HR: Heart Rate; Sed: Sedentary; LPA: Light Physical Activity; ACC: Acceleration.

## Acknowledgements

For BBVS, we thank all the participants and the staff from the MRC Epidemiology Unit Functional Group Teams. In particular we would like to thank Lewis Griffiths and Stefanie Hollidge for their contribution to data collection and data preparation. We thank all the teams involved in collection and open access management of the MESA, PhysioNet and MMASH cohorts. These initiatives allow for greater transparency and advancement of wearable-based objective monitoring of physical activity and sleep. We would also like to thank Emma Clifton for her insightful comments on the implications of objective monitoring of sleep in the field of epidemiology.

## Footnotes

*   1 [https://sleepdata.org/datasets/mesa](https://sleepdata.org/datasets/mesa)

*   2 [https://www.salusa.se/Filer/Produktinfo/Aktivitet/TheActiwatchUserManualV7.2.pdf](https://www.salusa.se/Filer/Produktinfo/Aktivitet/TheActiwatchUserManualV7.2.pdf)

*   3 [https://sleepdata.org/datasets/mesa/pages/equipment/montage-and-sampling-rate-information.md](https://sleepdata.org/datasets/mesa/pages/equipment/montage-and-sampling-rate-information.md)

*   4 [https://sleepdata.org/datasets/mesa/files/documentation](https://sleepdata.org/datasets/mesa/files/documentation)

*   5 [https://pypi.org/project/hrv-analysis/](https://pypi.org/project/hrv-analysis/)

*   Received September 5, 2020.
*   Revision received September 5, 2020.
*   Accepted September 8, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  [1]. B. Rasch and  J. Born, “About sleep’s role in memory,” Physiological reviews, vol. 93, no. 2, pp. 681–766, 2013.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1152/physrev.00032.2012&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23589831&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

2.  [2]. J. R. Schwartz and  T. Roth, “Neurophysiology of sleep and wakefulness: basic science and clinical implications,” Current neuropharmacology, vol. 6, no. 4, pp. 367–378, 2008.
    
    
3.  [3]. L. Imeri and  M. R. Opp, “How (and why) the immune system makes us sleep,” Nature Reviews Neuroscience, vol. 10, no. 3, pp. 199–210, 2009.
    
    
4.  [4]. L. Xie, H. Kang, Q. Xu, M. J. Chen, Y. Liao, M. Thiyagarajan, J. O’Donnell, D. J. Christensen, C. Nicholson, J. J. Iliff, et al., “Sleep drives metabolite clearance from the adult brain,” Science, vol. 342, no. 6156, pp. 373–377, 2013.

5.  [5]. K. Adam and I. Oswald, “Sleep helps healing,” British medical journal (Clinical research ed.), vol. 289, no. 6456, p. 1400, 1984.

6.  [6]. J. H. Benington and  H. C. Heller, “Restoration of brain energy metabolism as the function of sleep,” Progress in neurobiology, vol. 45, no. 4, pp. 347–360, 1995.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0301-0082(94)00057-O&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7624482&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995QN92200003&link_type=ISI) 

7.  [7]. S. M. Bertisch,  B. D. Pollock,  M. A. Mittleman,  D. J. Buysse,  L. A. Bazzano,  D. J. Gottlieb, and  S. Redline, “Insomnia with objective short sleep duration and risk of incident cardiovascular disease and all-cause mortality: Sleep heart health study,” Sleep, vol. 41, no. 6, p. zsy047, 2018.
    

8.  [8]. D. Dawson and  K. Reid, “Fatigue, alcohol and performance impairment,” Nature, vol. 388, no. 6639, pp. 235–235, 1997.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9230429&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

9.  [9]. E. Van Cauter,  K. Spiegel,  E. Tasali, and  R. Leproult, “Metabolic consequences of sleep and sleep loss,” Sleep medicine, vol. 9, pp. S23–S28, 2008.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1389-9457(08)70013-3&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18929315&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000260820700005&link_type=ISI) 

10. [10]. M.-P. St-Onge, M. A. Grandner, D. Brown, M. B. Conroy, G. Jean-Louis, M. Coons, and D. L. Bhatt, “Sleep duration and quality: impact on lifestyle behaviors and cardiometabolic health: a scientific statement from the American Heart Association,” Circulation, vol. 134, no. 18, pp. e367–e386, 2016.

11. [11]. H. Agnew Jr, W. B. Webb, and R. L. Williams, “The first night effect: an EEG study of sleep,” Psychophysiology, vol. 2, no. 3, pp. 263–266, 1966.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1469-8986.1966.tb02650.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=5903579&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

12. [12]. S. Ancoli-Israel,  R. Cole,  C. Alessi,  M. Chambers,  W. Moorcroft, and  C. P. Pollak, “The role of actigraphy in the study of sleep and circadian rhythms,” Sleep, vol. 26, no. 3, pp. 342–392, 2003.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/sleep/26.3.342&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12749557&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000182439100020&link_type=ISI) 

13. [13]. M. Marino,  Y. Li,  M. N. Rueschman,  J. W. Winkelman,  J. Ellenbogen,  J. M. Solet,  H. Dulin,  L. F. Berkman, and  O. M. Buxton, “Measuring sleep: accuracy, sensitivity, and specificity of wrist actigraphy compared to polysomnography,” Sleep, vol. 36, no. 11, pp. 1747–1755, 2013.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5665/sleep.3142&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24179309&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

14. [14]. D. J. Kupfer, T. P. Detre, G. Foster, G. J. Tucker, and J. Delgado, “The application of Delgado’s telemetric mobility recorder for human studies,” Behavioral biology, vol. 7, no. 4, pp. 585–590, 1972.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0091-6773(72)80220-7&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=5050143&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1972N221700012&link_type=ISI) 

15. [15]. M. T. Smith, C. S. McCrae, J. Cheung, J. L. Martin, C. G. Harrod, J. L. Heald, and K. A. Carden, “Use of actigraphy for the evaluation of sleep disorders and circadian rhythm sleep-wake disorders: an American Academy of Sleep Medicine clinical practice guideline,” Journal of Clinical Sleep Medicine, vol. 14, no. 07, pp. 1231–1237, 2018.
    
    
16. [16]. A. Sadeh and  C. Acebo, “The role of actigraphy in sleep medicine,” Sleep medicine reviews, vol. 6, no. 2, pp. 113–124, 2002.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1053/smrv.2001.0182&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12531147&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000176231600004&link_type=ISI) 

17. [17]. R. J. Cole,  D. F. Kripke,  W. Gruen,  D. J. Mullaney, and  J. C. Gillin, “Automatic sleep/wake identification from wrist activity,” Sleep, vol. 15, no. 5, pp. 461–469, 1992.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/sleep/15.5.461&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1455130&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1992JT08400012&link_type=ISI) 

18. [18]. L. de Souza,  A. A. Benedito-Silva,  M. L. N. Pires,  D. Poyares,  S. Tufik, and  H. M. Calil, “Further validation of actigraphy for sleep studies,” Sleep, vol. 26, no. 1, pp. 81–85, 2003.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/sleep/26.1.81&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=12627737&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000181724800016&link_type=ISI) 

19. [19]. E. Sazonov, N. Sazonova, S. Schuckers, M. Neuman, C. S. Group, et al., “Activity-based sleep–wake identification in infants,” Physiological measurement, vol. 25, no. 5, p. 1291, 2004.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1088/0967-3334/25/5/018&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15535193&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000224710800019&link_type=ISI) 

20. [20]. A. Sadeh,  M. Sharkey, and  M. A. Carskadon, “Activity-based sleep-wake identification: an empirical test of methodological issues,” Sleep, vol. 17, no. 3, pp. 201–207, 1994.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/sleep/17.3.201&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7939118&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1994NQ80400001&link_type=ISI) 

21. [21]. J. Tilmanne, J. Urbain, M. V. Kothare, A. V. Wouwer, and S. V. Kothare, “Algorithms for sleep–wake identification using actigraphy: a comparative study and new results,” Journal of sleep research, vol. 18, no. 1, pp. 85–98, 2009.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19250177&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

22. [22]. D. F. Kripke,  E. K. Hahn,  A. P. Grizas,  K. H. Wadiak,  R. T. Loving,  J. S. Poceta,  F. F. Shadan,  J. W. Cronin, and  L. E. Kline, “Wrist actigraphic scoring for sleep laboratory patients: algorithm development,” Journal of sleep research, vol. 19, no. 4, pp. 612–619, 2010.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1365-2869.2010.00835.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20408923&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000283598500015&link_type=ISI) 

23. [23]. J. Palotti,  R. Mall,  M. Aupetit,  M. Rueschman,  M. Singh,  A. Sathyanarayana,  S. Taheri, and  L. Fernandez-Luque, “Benchmark on a large cohort for sleep-wake classification with machine learning techniques,” NPJ digital medicine, vol. 2, no. 1, pp. 1–9, 2019.
    
    
24. [24]. B. Zhai,  I. Perez-Pozuelo,  E. A. Clifton,  J. Palotti, and  Y. Guan, “Making sense of sleep: Multimodal sleep stage classification in a large, diverse population using movement and cardiac sensing,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 4, no. 2, pp. 1–33, 2020.
    
    
25. [25]. M. L. Blood,  R. L. Sack,  D. C. Percy, and  J. C. Pen, “A comparison of sleep detection by wrist actigraphy, behavioral response, and polysomnography,” Sleep, vol. 20, no. 6, pp. 388–395, 1997.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9302721&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1997XX25300002&link_type=ISI) 

26. [26]. J. Paquet, A. Kawinska, and J. Carrier, “Wake detection capacity of actigraphy during sleep,” Sleep, vol. 30, no. 10, pp. 1362–1369, 2007.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17969470&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000250019300014&link_type=ISI) 

27. [27]. V. T. van Hees,  S. Sabia,  S. E. Jones,  A. R. Wood,  K. N. Anderson,  M. Kivimäki,  T. M. Frayling,  A. I. Pack,  M. Bucan,  M. Trenell, et al., “Estimating sleep parameters using an accelerometer without sleep diary,” Scientific reports, vol. 8, no. 1, pp. 1–11, 2018.
    
    
28. [28]. A. Doherty, D. Jackson, N. Hammerla, T. Plötz, P. Olivier, M. H. Granat, T. White, V. T. Van Hees, M. I. Trenell, C. G. Owen, et al., “Large scale population assessment of physical activity using wrist worn accelerometers: the UK Biobank study,” PloS one, vol. 12, no. 2, 2017.
    
    
29. [29]. All of Us Research Program Investigators, “The ‘All of Us’ Research Program,” New England Journal of Medicine, vol. 381, no. 7, pp. 668–676, 2019.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMsr1809937&link_type=DOI) 

30. [30]. I. Perez-Pozuelo,  B. Zhai,  J. Palotti,  R. Mall,  M. Aupetit,  J. M. Garcia-Gomez,  S. Taheri,  Y. Guan, and  L. Fernandez-Luque, “The future of sleep health: a data-driven revolution in sleep science and medicine,” NPJ digital medicine, vol. 3, no. 1, pp. 1–15, 2020.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41746-020-0300-0&link_type=DOI) 

31. [31]. O. Walch,  Y. Huang,  D. Forger, and  C. Goldstein, “Sleep stage prediction with raw acceleration and photoplethysmography heart rate data derived from a consumer wearable device,” Sleep, vol. 42, no. 12, p. zsz180, 2019.
    
    
32. [32]. D. M. Roberts,  M. M. Schade,  G. M. Mathew,  D. Gartenberg, and  O. M. Buxton, “Detecting sleep using heart rate and motion data from multisensor consumer-grade wearables, relative to wrist actigraphy and polysomnography,” Sleep, 2020.
    
    
33. [33]. M. de Zambotti,  J. Trinder,  A. Silvani,  I. M. Colrain, and  F. C. Baker, “Dynamic coupling between the central and autonomic nervous systems during sleep: a review,” Neuroscience & Biobehavioral Reviews, vol. 90, pp. 84–103, 2018.
    
    
34. [34]. H. Park and B. Suh, “Association between sleep quality and physical activity according to gender and shift work,” Journal of Sleep Research, p. e12924, 2019. eprint: [https://onlinelibrary.wiley.com/doi/pdf/10.1111/jsr.12924](https://onlinelibrary.wiley.com/doi/pdf/10.1111/jsr.12924).
    
    
35. [35]. T. Arora,  E. Broglia,  D. Pushpakumar,  T. Lodhi, and  S. Taheri, “An Investigation into the Strength of the Association and Agreement Levels between Subjective and Objective Sleep Duration in Adolescents,” PLOS ONE, vol. 8, p. e72406, Aug. 2013. Publisher: Public Library of Science.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0072406&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18752997&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

36. [36]. D. S. Lauderdale,  K. L. Knutson,  L. L. Yan,  K. Liu, and  P. J. Rathouz, “Self-reported and measured sleep duration: how similar are they?,” Epidemiology, pp. 838–845, 2008.
    
    
37. [37]. K. Aili, S. Åström Paulsson, U. Stoetzer, M. Svartengren, and L. Hillert, “Reliability of Actigraphy and Subjective Sleep Measurements in Adults: The Design of Sleep Assessments,” Journal of Clinical Sleep Medicine, vol. 13, pp. 39–47, 2017. Publisher: American Academy of Sleep Medicine.
    
    
38. [38]. L. O’Connor, S. Brage, S. J. Griffin, N. J. Wareham, and N. G. Forouhi, “The cross-sectional association between snacking behaviour and measures of adiposity: the Fenland Study, UK,” British journal of nutrition, vol. 114, no. 8, pp. 1286–1293, 2015.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S000711451500269X&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26343512&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

39. [39]. T. White,  K. Westgate,  S. Hollidge,  M. Venables,  P. Olivier,  N. Wareham, and  S. Brage, “Estimating energy expenditure from wrist and thigh accelerometry in free-living adults: a doubly labelled water study,” International Journal of Obesity, vol. 43, no. 11, pp. 2333–2342, 2019.
    

40. [40]. D. A. Dean, A. L. Goldberger, R. Mueller, M. Kim, M. Rueschman, D. Mobley, S. S. Sahoo, C. P. Jayapandian, L. Cui, M. G. Morrical, S. Surovec, G.-Q. Zhang, and S. Redline, “Scaling Up Scientific Discovery in Sleep Medicine: The National Sleep Research Resource,” Sleep, vol. 39, pp. 1151–1164, May 2016.
    
    
41. [41]. G.-Q. Zhang, L. Cui, R. Mueller, S. Tao, M. Kim, M. Rueschman, S. Mariani, D. Mobley, and S. Redline, “The National Sleep Research Resource: towards a sleep data commons,” Journal of the American Medical Informatics Association, vol. 25, pp. 1351–1358, Oct. 2018.
    
    
42. [42]. X. Chen, R. Wang, P. Zee, P. L. Lutsey, S. Javaheri, C. Alcántara, C. L. Jackson, M. A. Williams, and S. Redline, “Racial/ethnic differences in sleep disturbances: the Multi-Ethnic Study of Atherosclerosis (MESA),” Sleep, vol. 38, no. 6, pp. 877–888, 2015.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25409106&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

43. [43]. A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.

44. [44]. A. Rossi,  E. Da Pozzo,  D. Menicagli,  C. Tremolanti,  C. Priami,  A. Sirbu,  D. Clifton,  C. Martini, and  D. Morelli, “Multilevel monitoring of activity and sleep in healthy people (version 1.0.0),” PhysioNet, 2020.
    
    
45. [45]. S. Brage, N. Brage, P. W. Franks, U. Ekelund, and N. J. Wareham, “Reliability and validity of the combined heart rate and movement sensor Actiheart,” European journal of clinical nutrition, vol. 59, no. 4, pp. 561–570, 2005.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/sj.ejcn.1602118&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15714212&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000228060200013&link_type=ISI) 

46. [46]. O. Stegle,  S. V. Fallert,  D. J. MacKay, and  S. Brage, “Gaussian process robust regression for noisy heart rate data,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 9, pp. 2143–2151, 2008.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TBME.2008.923118&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18713683&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000258722200004&link_type=ISI) 

47. [47]. V. T. Van Hees,  Z. Fang,  J. Langford,  F. Assah,  A. Mohammad,  I. C. da Silva,  M. I. Trenell,  T. White,  N. J. Wareham, and  S. Brage, “Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents,” Journal of Applied Physiology, vol. 117, no. 7, pp. 738–744, 2014.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1152/japplphysiol.00421.2014&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25103964&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 

48. [48]. P. Lukowicz,  H. Junker, and  G. Tröster, “Automatic calibration of body worn acceleration sensors,” in International Conference on Pervasive Computing, pp. 176–181, Springer, 2004.
    
    
49. [49]. I. Perez-Pozuelo,  T. White,  K. Westgate,  K. Wijndaele,  N. J. Wareham, and  S. Brage, “Diurnal profiles of physical activity and postures derived from wrist-worn accelerometry in UK adults,” Journal for the Measurement of Physical Behaviour, vol. 1, no. aop, pp. 1–11, 2019.
    
    
50. [50]. A. Varri,  B. Kemp,  T. Penzel, and  A. Schlogl, “Standards for biomedical signal databases,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 33–37, 2001.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=11321718&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000168046800006&link_type=ISI) 

51. [51]. H. Tanaka,  K. D. Monahan, and  D. R. Seals, “Age-predicted maximal heart rate revisited,” Journal of the American College of Cardiology, vol. 37, no. 1, pp. 153–156, 2001.
    

52. [52]. M. Malik, “Heart Rate Variability,” Annals of Noninvasive Electrocardiology, vol. 1, pp. 151–181, Apr. 1996.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1542-474X.1996.tb00275.x&link_type=DOI) 

53. [53]. C. L. Jackson, S. R. Patel, W. B. Jackson, P. L. Lutsey, and S. Redline, “Agreement between self-reported and objectively measured sleep duration among White, Black, Hispanic, and Chinese adults in the United States: Multi-Ethnic Study of Atherosclerosis,” Sleep, vol. 41, no. 6, p. zsy057, 2018.
    
    
54. [54]. T. I. Gonzales, J. Y. Jeon, T. Lindsay, K. Westgate, I. Perez-Pozuelo, S. Hollidge, K. Wijndaele, K. Rennie, N. Forouhi, S. Griffin, et al., “Resting heart rate as a biomarker for tracking change in cardiorespiratory fitness of UK adults: the Fenland Study,” medRxiv, 2020.
    
    
55. [55]. J. Karjalainen and  M. Viitasalo, “Fever and cardiac rhythm,” Archives of internal medicine, vol. 146, no. 6, pp. 1169–1171, 1986.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/archinte.1986.00360180179026&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=2424378&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1986C626200021&link_type=ISI) 

56. [56]. J. Ryan and  L. Howes, “Relations between alcohol consumption, heart rate, and heart rate variability in men,” Heart, vol. 88, no. 6, pp. 641–642, 2002.
    

57. [57]. T. G. Vrijkotte,  L. J. Van Doornen, and  E. J. De Geus, “Effects of work stress on ambulatory blood pressure, heart rate, and heart rate variability,” Hypertension, vol. 35, no. 4, pp. 880–886, 2000.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1161/01.HYP.35.4.880&link_type=DOI) 

58. [58]. J. M. Radin, N. E. Wineinger, E. J. Topol, and S. R. Steinhubl, “Harnessing wearable device data to improve state-level real-time surveillance of influenza-like illness in the USA: a population-based study,” The Lancet Digital Health, 2020.
    
    
59. [59]. J. Cohen, “A coefficient of agreement for nominal scales,” Educational and psychological measurement, vol. 20, no. 1, pp. 37–46, 1960.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/001316446002000104&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1960CCC3600004&link_type=ISI) 

60. [60]. J. E. Freund, Modern Elementary Statistics. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988.
    
    
61. [61]. F. Snyder,  J. A. Hobson,  D. F. Morrison, and  F. Goldfrank, “Changes in respiration, heart rate, and systolic blood pressure in human sleep,” Journal of applied physiology, vol. 19, no. 3, pp. 417–422, 1964.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=14174589&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A19643725B00034&link_type=ISI) 

62. [62]. D. L. Bliwise, “Invited commentary: cross-cultural influences on sleep—broadening the environmental landscape,” American journal of epidemiology, vol. 168, no. 12, pp. 1365–1366, 2008.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwn336&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=18936435&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F08%2F2020.09.05.20188367.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000261683100003&link_type=ISI) 

63. [63]. H. M. Wesselius,  E. S. v. d. Ende,  J. Alsma,  J. C. t. Maaten,  S. C. E. Schuit,  P. M. Stassen,  O. J. d. Vries,  K. H. A. H. Kaasjager,  H. R. Haak,  F. F. v. Doormaal,  J. J. Hoogerwerf,  C. B. Terwee,  P. M. v. d. Ven,  F. H. Bosch,  E. J. W. v. Someren, and  P. W. B. Nanayakkara, “Quality and Quantity of Sleep and Factors Associated With Sleep Disturbance in Hospitalized Patients,” JAMA Internal Medicine, vol. 178, pp. 1201–1208, Sept. 2018. Publisher: American Medical Association.
    
    
64. [64]. R. K. Dishman,  X. Sui,  T. S. Church,  C. E. Kline,  S. D. Youngstedt, and  S. N. Blair, “Decline in Cardiorespiratory Fitness and Odds of Incident Sleep Complaints,” Medicine & Science in Sports & Exercise, vol. 47, pp. 960–966, May 2015.
    
    
65. [65]. C. M. Depner, P. C. Cheng, J. K. Devine, S. Khosla, M. de Zambotti, R. Robillard, A. Vakulin, and S. P. Drummond, “Wearable technologies for developing sleep and circadian biomarkers: a summary of workshop discussions,” Sleep, vol. 43, no. 2, p. zsz254, 2020.

 [1]: /embed/graphic-2.gif
 [2]: /embed/graphic-3.gif
 [3]: /embed/graphic-4.gif
 [4]: /embed/graphic-5.gif
 [5]: /embed/graphic-6.gif
 [6]: /embed/inline-graphic-1.gif
 [7]: /embed/graphic-10.gif