Abstract
Introduction A better understanding of the heterogeneity in the cognitive and mood symptoms of Parkinson’s disease will require research conducted in large samples of patients. Fully online and remote research assessments present interesting opportunities for scaling up research, but the feasibility and reliability of remote and fully unsupervised performance-based cognitive testing in individuals with Parkinson’s disease are unknown. This study aims to establish the feasibility and reliability of this testing modality in Parkinson’s patients.
Methods Sixty-seven Parkinson’s patients and 36 older adults completed two sessions of an at-home, online battery of five cognitive tasks and three self-report questionnaires. Feasibility was established by examining completion rates and data quality. Test-retest reliability was evaluated using the Intraclass Correlation Coefficient (ICC (2,1)).
Results Overall completion rates and data quality were high, with few participant exclusions across tasks. With regard to test-retest reliability, intraclass correlation coefficients were quite variable across measures extracted from a task as well as across tasks, but at least one standard measure from each task achieved moderate to good reliability levels. Self-report questionnaires achieved higher test-retest reliability than cognitive tasks. Feasibility and reliability were similar between Parkinson’s patients and older adults.
Conclusion These results demonstrate that remote and unsupervised testing is a feasible and reliable method of measuring cognition and mood in Parkinson’s patients that achieves levels of test-retest reliability that are comparable to those reported for standard in-person testing.
Introduction
Although Parkinson’s disease is clinically diagnosed on the basis of motor symptoms, cognitive and mood symptoms are also prevalent and impactful [1]. However, the presence, timing and severity of these neuropsychiatric symptoms vary significantly from patient to patient, and little is known about the causes underlying this heterogeneity [2]. Better understanding this heterogeneity will require the evaluation of large samples of Parkinson’s patients with comprehensive assessments of cognitive and mood function. To this end, there has been an increasing interest in turning to fully online and remote research assessments, as this mode of data collection is more easily scalable than in-person research protocols [3]. However, most studies that rely on online assessments, particularly those in clinical populations such as Parkinson’s patients, include only self-reported measures of symptoms and function rather than performance-based measures, which are necessary to better characterize and quantify the cognitive deficits of Parkinson’s disease [4]. One reason for the omission of cognitive performance tests in large-scale online studies is that the feasibility and reliability of fully remote and unsupervised online cognitive assessments have not yet been established in Parkinson’s disease. This is especially relevant as factors such as computer literacy in older age and disease-related impairments (e.g., motor slowing or mild cognitive impairment) may interfere with the computer-based and unsupervised nature of this mode of data collection [5].
Prior studies of remote, unsupervised cognitive testing have primarily established its reliability in non-clinical adult populations [6–9]. For instance, one study conducted remote unsupervised testing in older adults who completed a battery of tasks assessing visual memory, attention, and executive function on a monthly basis [6]. They found that reliability, measured with the Intraclass Correlation Coefficient (ICC), ranged from 0.50 to 0.76 across tasks, indicating moderate to good reliability, which is in keeping with reliability calculations found in similar studies [7–9].
In clinical populations of patients with neurodegenerative diseases, comparatively little work has been done to establish the test-retest reliability of remote, unsupervised cognitive testing. This likely reflects concerns that motor and cognitive symptoms could interfere with data quality and make this mode of testing unfeasible. In Parkinson’s disease, however, there are promising preliminary results. First, with respect to feasibility, a recent study conducted fully online and unsupervised research assessments that consisted of an extensive battery of questionnaires in a large sample of over 20,000 Parkinson’s patients [3]. This study showed good rates of questionnaire completion, with only a 10.1% drop-off rate from the first to the last assessment in the battery [3]. In a follow-up study, Parkinson’s patients within this cohort were shown to be demographically comparable to Parkinson’s patients tested in traditional, in-person cohorts [4]. Though these studies did not include any performance-based tasks, which are likely more susceptible to the effects of poor task engagement than self-report questionnaires, the results nonetheless suggest that fully remote and unsupervised interactions with participants are feasible for Parkinson’s disease research. Second, with respect to reliability, one study compared the test-retest reliability of standard in-person testing to supervised virtual testing in Parkinson’s patients, where patients were at home and supervised by a rater on a video call [10]. The results of this study showed that, though test-retest reliability between the in-person and virtual administration varied across tasks, many of the tasks achieved at least moderate reliability, which is a level of reliability comparable to that of standard in-person paper-based cognitive testing in Parkinson’s patients [11].
These results are promising but the test-retest reliability of fully remote, unsupervised testing in individuals with Parkinson’s disease remains to be determined prior to large-scale adoption of this mode of data collection.
To address this gap, the objectives of this study were first, to evaluate the feasibility of remote unsupervised cognitive and mood testing by assessing completion rates and data quality, and second, to evaluate the test-retest reliability of the unsupervised tasks and questionnaires. Sixty-seven individuals with Parkinson’s disease and 36 older adults completed two sessions of the online protocol, which consisted of five cognitive tasks targeting working memory, executive function, sustained attention and perceptual decision-making, and three self-report questionnaires assessing mood and cognitive function. Overall, we found that completion rates and data quality were high, with very few participants failing inclusion criteria and attention checks. Although reliability was quite variable across measures, we found that task performance and questionnaire scores achieved at least moderate to good levels of reliability and that reliability was generally similar for Parkinson’s patients and older adult controls. These results suggest that remote and unsupervised testing is a feasible and reliable method of measuring cognition and mood in Parkinson’s patients that achieves levels of test-retest reliability that are comparable to those reported for standard in-person testing.
Methods
Participants
We recruited a sample of 70 Parkinson’s patients (PD) and 36 older adults (OA) between the ages of 50 and 90 through a collaboration with two Parkinson’s patient registries, the Quebec Parkinson’s Network (QPN) and the Canadian Open Parkinson’s Network (C-OPN), which rely on a neurologist-confirmed Parkinson’s disease diagnosis for inclusion in the registry [12]. No additional diagnostic confirmation was conducted in the patients for the purposes of the present study. Older adults were additionally recruited from the community. To capture a sample as representative as possible, we had no exclusion criteria other than age. The 106 participants of the present study were recruited from a larger sample of 223 participants (144 PD and 74 OA) who had completed a more extensive initial online assessment (the results of which will be reported separately), and who agreed to complete testing at a second timepoint after an interval that ranged from 21 to 135 days. The participants included in the present study are those who completed, at minimum, one task of the second assessment (Time 2). Three Parkinson’s patients were excluded because the interval between their assessments was either less than 21 days or more than 135 days. The final sample included 67 Parkinson’s patients and 36 older adults (Table 1). Of these, 8 Parkinson’s patients and 2 older adults (9.71% of total participants) had only partial data for Time 2. All participants were entered in a draw to win one of ten $100 gift cards. All procedures were evaluated and approved by the McGill University Health Centre (MUHC) Research Ethics Board.
Procedure
Potential participants, i.e. those who had completed the Time 1 online assessment, were emailed with information on our study and a link to the consent form. The Time 1 assessment was more extensive and contained a battery of ten cognitive tasks and six mood questionnaires. Of these, to enhance recruitment and keep the burden on participants to a minimum, only five cognitive tasks and three mood questionnaires were selected for inclusion in the reliability assessment (Time 2), and only this subset will be described here. Expected total time required to complete these tests was 45 minutes and participants could pause between (but not during) tasks. All testing was conducted fully remotely, from personal desktop or laptop computers (smartphones and tablets were not allowed), and there was no interaction with the research team during testing.
Cognitive tasks & mood questionnaires
The cognitive tasks included in the protocol were selected because they are tests of domains of cognition affected early in Parkinson’s disease [13]. We included computerized versions of Trail Making [14], Digit Span [15], Color-word Stroop [16], and Sustained Attention to Response (SART) [17] tasks. We also included a signal-detection task [18], which is not a standard measure of cognition in Parkinson’s patients but was included because it is considerably longer (15 minutes) and therefore could be more susceptible to the effects of remote unsupervised testing on task engagement. The self-report questionnaires assessed apathy (Apathy Evaluation Scale; AES) [19], impulsivity (Barratt Impulsiveness Scale-11; BIS-11) [20], and everyday cognitive function (Parkinson’s Daily Activities Questionnaire; PDAQ) [21].
Each task started with a set of detailed on-screen instructions, followed by a short practice, and then a review of the key instructions before the initiation of the main phase of the task. In addition, each questionnaire embedded an extra item that was a prompt to check for attention (e.g., “Select ‘Not likely’. This is simply to ensure that you are paying attention”), a common practice in online behavioural research [22]. More detailed information on our administration of the tasks and questionnaires can be found in the Supplementary Methods.
Assessing data quality to establish feasibility
We used data quality as our main measure of feasibility, which we defined as the proportion of participants who were excluded because they failed minimum data quality checks.
Cognitive tasks
We excluded participants who performed two standard deviations below the mean at one timepoint of a given task and two standard deviations above the mean at the other timepoint of that same task. We assumed that such a large variation in performance suggests that external circumstances leading to poor task engagement may have affected their performance on one of the two testing instances. Additionally, in the case of the Stroop, signal detection and SART tasks, we excluded participants who either did not input a response to any of their trials or responded using the same key for each trial at either of the two timepoints. Proportions of participants excluded were computed for each task.
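A minimal sketch of how these two exclusion rules could be applied is given here in Python for illustration; the study's own analyses were run in R, and the authors' scripts are not reproduced. The function names, the use of each timepoint's own group mean and standard deviation, and the coding of a missed trial as `None` are all assumptions of this sketch.

```python
from statistics import mean, stdev


def flag_inconsistent(time1, time2, n_sd=2.0):
    """Return indices of participants who score more than n_sd SDs below the
    group mean at one timepoint and more than n_sd SDs above it at the other.

    time1 and time2 are equal-length lists of scores, one entry per participant.
    """
    m1, s1 = mean(time1), stdev(time1)
    m2, s2 = mean(time2), stdev(time2)
    if s1 == 0 or s2 == 0:
        return []  # no spread at a timepoint, so no one can be an outlier
    flagged = []
    for i, (x1, x2) in enumerate(zip(time1, time2)):
        z1 = (x1 - m1) / s1
        z2 = (x2 - m2) / s2
        if (z1 < -n_sd and z2 > n_sd) or (z1 > n_sd and z2 < -n_sd):
            flagged.append(i)
    return flagged


def flag_invalid_responses(keys):
    """Return True if no trial received a response, or if every trial
    received a response and all responses used the same key.

    keys is a list of key labels, with None marking a trial with no response.
    """
    responses = [k for k in keys if k is not None]
    if not responses:
        return True
    return len(responses) == len(keys) and len(set(responses)) == 1
```

Under these assumptions, a participant would be excluded from a task if their index appears in `flag_inconsistent(...)` for that task, or if `flag_invalid_responses(...)` is true at either timepoint.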
Questionnaires
We computed the proportion of participants excluded for failing the attention check.
Data Analysis
Welch two-sample t tests were used to evaluate group differences, and intraclass correlation coefficients (ICC) were calculated to assess test-retest reliability, separately for the Parkinson’s patients and older adults. According to the guidelines outlined in Koo & Li [23], a two-way mixed effects model for consistency and with a single rater was used (ICC (2,1)). The correlation coefficient was interpreted according to the following ranges, with ICCs <0.5, 0.5-0.75, 0.75-0.90 and >0.90 indicating poor, moderate, good and excellent reliability, respectively [23]. Paired t tests were used to determine differences in performance between the two timepoints and evaluate the presence of practice effects. The critical p value in all analyses was set to 0.05. All analyses were conducted in R version 4.2.3 [24].
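For readers less familiar with the ICC, the following Python sketch computes a single-measurement, consistency-type ICC from the mean squares of a two-way decomposition (participants by sessions) and maps it onto the interpretation bands above. It is an illustration only: the analyses reported here were run in R, and whether the authors' code used exactly this estimator is an assumption. The function names are hypothetical, and the handling of values that fall exactly on a band boundary (0.5, 0.75, 0.90) is also an assumption.

```python
def icc_consistency(time1, time2):
    """Single-measurement consistency ICC for two sessions.

    Computed as (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error),
    where rows are participants and k = 2 is the number of sessions.
    """
    n, k = len(time1), 2
    rows = list(zip(time1, time2))
    grand = (sum(time1) + sum(time2)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(time1) / n, sum(time2) / n]

    # Sums of squares for participants, sessions, and the residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def interpret_icc(icc):
    """Map an ICC onto the poor / moderate / good / excellent bands."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"
```

Because this form measures consistency rather than absolute agreement, a uniform shift between sessions (for example, every participant improving by the same amount) does not lower the coefficient.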
Results
Assessment of data quality
For self-report questionnaires, an average of 3.33 (4.97%) Parkinson’s patients (PD) and 0.33 (0.92%) older adults (OA) were excluded for each questionnaire after failing the ‘attention check’ at either timepoint. For the cognitive tasks, only Trail Making, Stroop and SART had participant exclusions based on our previously defined criteria. In Trail Making, 6 PD (10.17%) and 4 OA (11.43%) were excluded from part A and 2 PD (3.39%) and 2 OA (5.71%) were excluded from part B. In Stroop, 4 PD (6.06%) and 1 OA (2.86%) were excluded. In SART, 2 PD (3.23%) and 1 OA (2.86%) were excluded from both conditions. Ten participants (9.71%), of whom 8 were PD (11.94%) and 2 were OA (5.56%), dropped out before completing Time 2.
Comparison of performance between Parkinson’s patients and older adults
As shown in Table 1, both patients and older adults reported similar levels of mood symptoms except that PD patients reported a higher level of apathy symptoms (AES: t(90.69)=3.24, p=0.0017) as well as a higher impairment of daily activities (PDAQ: t(83.11)=1.99, p=0.0497).
With respect to the cognitive tasks, there were no statistically significant differences in performance between the PD and OA groups on the Digit Span, Stroop and SART tasks (all ps>0.05; Table 2). Parkinson’s patients were slower on Trail Making Part A (t(81.83)=3.74, p<0.001) and on Trail Making Part B (t(71.47)=3.18, p=0.0022). On the Signal Detection task, Parkinson’s patients were slower (t(94.95)=2.25, p=0.027) and had lower discriminability (t(81.52)=4.75, p<0.001). Additional secondary outcome measures that can be derived from each task, such as accuracy, also showed similar performance between groups (Supplementary Table S1).
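The non-integer degrees of freedom in the tests above (e.g., t(81.83)) follow from the Welch-Satterthwaite approximation used by Welch's t test, which does not assume equal variances in the two groups. The Python sketch below shows how the statistic and its degrees of freedom are obtained; it is an illustration rather than the authors' R code, the function name is hypothetical, and the p value (which requires the t distribution) is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t test on samples a and b."""
    # Squared standard error of each group mean
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

When the two groups have equal sizes and equal sample variances, the approximation reduces to the familiar n1 + n2 - 2; unequal variances or group sizes pull the value below that, which is why the reported degrees of freedom differ from test to test.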
Test-retest reliability of cognitive tasks
For each of the tasks, at least one of the outcome measures extracted demonstrated moderate to good reliability in both Parkinson’s patients and older adults, but there was notable variability between tasks and even between measures extracted from different conditions of the same task (Table 3). For instance, in Parkinson’s patients, forward Digit Span demonstrated an ICC of 0.49 and backward Digit Span had an ICC of 0.71, whereas the ICCs for parts A and B of Trail Making were very similar. In the case of the Stroop task, the derived measure (i.e. ‘Stroop effect’) had very poor reliability (ICC=0.035), whereas the reliability of average response times on both congruent and non-congruent trials was higher (part A: ICC=0.73; part B: ICC=0.41). In the case of the SART, the time-dependent outcome measures (Ascending-RT: ICC=0.73; Descending-RT: ICC=0.74) had higher reliability than the accuracy-based outcome measures, for which ICCs ranged from 0.36 to 0.60. For the Signal Detection task, both the raw response time-based outcome measure (RT: ICC=0.80) and the derived outcome measure (Discriminability: ICC=0.78) demonstrated good reliability. Reliability of alternative outcome measures was in a similar range (Supplementary Table S4). Similar ranges of ICCs across tasks were also found in older adults (Table 3 and Supplementary Table S4).
Test-retest reliability of self-report questionnaires
As shown in Table 4, test-retest reliability for the mood and cognitive function questionnaires was good in both the PD patients (ICC range: 0.77-0.87) and the OAs, with the exception of the PDAQ, which was in the moderate range in OAs (ICC=0.56).
Practice effects
In Parkinson’s patients, no significant improvement in performance was observed across the two timepoints (ps>0.05; Supplementary Table S2), with two exceptions: a speeding of responses in the ascending and random conditions of the SART (ascending: mean change=-15.96 ms, p=0.048; random: mean change=-23.44 ms, p=0.018) and an increased discriminability in the signal detection task (mean change=0.28, p<0.001). In OAs, a speeding of responses in the random condition of the SART was observed (mean change=-39.42 ms, p=0.0087), as well as an increased span in the reverse Digit Span condition (mean change=1.00, p<0.001). Practice effects are also reported for alternative measures of interest in Supplementary Table S3.
Discussion
The goals of this study were to determine the feasibility and test-retest reliability of online unsupervised performance-based cognitive tasks and mood questionnaires in Parkinson’s patients. We found that data quality was high across all tasks and questionnaires, resulting in very few exclusions, and that test-retest reliability was at least moderate to good for at least one measure of interest from each task and was similar in both Parkinson’s patients and older adults. These results suggest that online and fully unsupervised measurements of mood and cognition are feasible and reliable in Parkinson’s patients and that this mode of testing can be incorporated into online clinical research studies.
We found moderate or higher test-retest reliability (ICCs >0.5) for at least one of the standard outcome measures extracted from each performance-based cognitive task in both Parkinson’s patients and older adults. These results build on recent work, conducted primarily in older adults, that has shown that test-retest reliability of remote, unsupervised administrations of cognitive tasks, including tasks of attention, working memory and executive function, is in the moderate to good range [8,9,25]. For instance, one study in older adults examining test-retest reliability of a remote administration of the Trail Making and Stroop tasks, two tasks also included in our battery, found ICC values of 0.5-0.74 [6]. More importantly, however, the level of reliability we found across tasks was comparable to that reported for more traditional supervised, in-person cognitive testing, which remains the ‘gold standard’ of cognitive testing. For instance, in a study assessing the test-retest reliability of an in-person battery of ten cognitive tasks administered to a sample of Parkinson’s patients, reliability values, calculated via weighted Cohen’s kappa, ranged from 0.40 to 0.75 [11]. Our results therefore suggest that, despite the early mild cognitive deficits and the motor deficits that can be present in patients with Parkinson’s disease, both of which could interfere with remote cognitive testing, having Parkinson’s disease does not disproportionately impact the reliability of remote cognitive testing when compared to non-clinical older adult populations. This suggests that unsupervised remote cognitive testing of Parkinson’s patients can be considered a reliable alternative to paper-based, supervised assessments in the design of clinical research protocols.
Test-retest reliability of the self-report questionnaires assessing apathy, impulsivity and daily living ability ranged from 0.77 to 0.87, indicating good reliability. This is consistent with the range of reliability levels reported in the original, paper-based administrations of these questionnaires. Reported test-retest reliability for the Apathy Evaluation Scale and Barratt Impulsiveness Scale was 0.76-0.94 [19] and 0.83 [26], respectively. For the Parkinson’s Daily Activities Questionnaire, the test-retest reliability of the shorter, 15-item version used in our study has not been assessed; however, the original 50-item version of the questionnaire was demonstrated to have a high ICC of 0.97 [27].
As expected, the test-retest reliability of the self-report questionnaires was, on average, higher than that of the cognitive measures. This aligns with one other study providing a comparison of the reliability of performance-based cognitive tasks and self-report questionnaires [28]. The authors suggested that the lower test-retest reliability of performance-based tasks might reflect the fact that cognitive tasks, in contrast to mood questionnaires, are typically designed to maximize between-group differences, which results in comparatively lower between-subject variability. Because test-retest reliability is computed as the ratio of between-subject variance to total variance, the resulting ICC of cognitive tasks is lower [28,29]. Given the increasing interest in the field of Parkinson’s disease in relating individual differences in cognitive function to underlying features of the neurodegenerative process, clinical research protocols aimed at cognitive phenotyping will have to ensure the selection of performance-based measures with good psychometric properties.
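The variance-ratio argument above can be made concrete with a short Python sketch. It assumes the simplest decomposition, in which total variance is the sum of between-subject variance and error variance; the function name and the variance values are invented purely for illustration and are not estimates from our data.

```python
def icc_from_variances(between_var, error_var):
    """ICC as the share of total variance attributable to stable
    between-subject differences (total = between-subject + error)."""
    return between_var / (between_var + error_var)


# Same measurement error, progressively smaller between-subject spread
wide_spread = icc_from_variances(4.0, 1.0)     # 0.8
medium_spread = icc_from_variances(1.0, 1.0)   # 0.5
narrow_spread = icc_from_variances(0.25, 1.0)  # 0.2
```

With measurement error held fixed, shrinking the between-subject variance alone moves the coefficient from the good range to the poor range, which is the pattern expected for tasks on which most participants perform similarly.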
We found that data quality, across both the performance-based cognitive tasks and the self-report questionnaires, was high in both the Parkinson’s patients and the older adults. Fewer than 5% of participants, on average, were excluded for failing attention checks on the questionnaires. Similarly, fewer than 4% of PD and OA participants were excluded based on data quality, on average, for each cognitive task. Additionally, over 90% of participants completed the full protocol. Although the feasibility of remote, unsupervised performance-based cognitive testing in Parkinson’s patients has not previously been explicitly explored, other studies have demonstrated the feasibility of using self-report questionnaires for large-scale, remote research in Parkinson’s patients. For instance, the Fox Insight study, a large online-only study, has successfully recruited a very large cohort of individuals with and without Parkinson’s disease to complete longitudinal assessments and has shown that self-report assessments obtained online are comparable to those obtained during in-person assessments [4,5]. Our results suggest that the added complexity of cognitive testing does not significantly hinder the feasibility of the remote, unsupervised testing modality. However, future work is needed in patients with more advanced disease and a higher burden of cognitive deficits to ensure this mode of testing can be expanded to a more representative patient population and, in particular, to ensure that large-scale online testing can be leveraged to better understand the progression to mild cognitive impairment and to dementia.
Another factor potentially limiting the utility of performance-based cognitive tasks is the presence of practice effects. It is conceivable that such effects might be accentuated in online computer-based research given that participants’ performance might benefit from prior exposure to the computer-based and at-home setting (e.g., learning to minimize distractions, gaining familiarity with typical key responses, etc.). Overall, however, we did not find a pattern of performance change over time consistent with practice. Performance was slightly worse on two tasks, and slightly better (faster responses) for one of the measures of a different task. Given the inconsistent changes, it is unlikely that this represents meaningful disease-related changes. Furthermore, given that the interval between testing sessions (21 to 135 days) was much shorter than the typical interval between assessments in longitudinal research, even possible small practice effects would likely abate over longer intervals.
One limitation of our study is that our sample is likely not representative of the greater Parkinson’s patient population. Though we successfully recruited patients with a range of disease durations (mean duration 6.82 years), as well as a higher proportion of females (40% female) than is typically reported [30], the education level was high and, perhaps most telling, the cognitive performance and mood differences between the patients and older adults were minimal. This suggests a selection bias towards patients with earlier and/or milder disease. Though the inclusion of patients with significant cognitive impairment might be beyond the scope of online research, we think that certain modifications, such as shortening the protocol, ‘gamifying’ the cognitive tasks, and developing smartphone-compatible assessments, could help bolster recruitment from a more diverse population of patients. Future work engaging with patients is needed in order to identify and address the full range of barriers to participation.
In summary, these results indicate that online and unsupervised performance-based cognitive testing and self-report-based mood testing conducted in Parkinson’s patients is feasible and produces levels of reliability that are comparable to those of standard in-person testing. There is already evidence that leveraging remote online testing results in enrolment of much larger samples of Parkinson’s patients than would otherwise be possible with in-person testing. Our results suggest that more in-depth cognitive and neuropsychiatric phenotyping is also possible on this scale, which is an important step towards designing research studies that are aimed at identifying the potential mechanisms underlying the heterogeneity in Parkinson’s disease.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Funding
M.S. received funding from the Canadian Institutes of Health Research (CIHR 180588) and from the Fonds de Recherche du Québec – Santé.
Author contributions
Nasri Balit: Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft. Sophie Sun: Investigation, Methodology, Writing – review & editing. Yilin Zhang: Investigation, Methodology, Writing – review & editing. Madeleine Sharp: Conceptualization, Methodology, Project administration, Resources, Supervision, Writing – review & editing.
Competing interests
The authors declare no conflicts of interest.
Data and materials availability
All cognitive task code is openly available at https://github.com/cognitive-neuroscience/neuron and all analysis scripts are available at https://osf.io/vdw5h/. De-identified raw data are available upon request.
Acknowledgements
We would like to thank the participants for their time.