Abstract
Online, remote neuropsychological assessment paradigms may offer a cost-effective alternative to in-person assessment for people who experience subjective cognitive decline (SCD). However, it is vital to establish the psychometric properties of such paradigms. The present study (i) evaluates test-retest reliability of remote, online neuropsychological tests from the NeurOn software platform in people with and without SCD (Non-SCD) recruited from the general population; and (ii) investigates potential group differences in baseline performance and longitudinal change. Ninety-nine participants (SCD N = 44, Non-SCD N = 55) completed seven tests from the NeurOn battery, covering visual and verbal memory, working memory, attention and psychomotor speed. Sixty-nine participants (SCD N = 34, Non-SCD N = 35) repeated the assessment six (+/-one) months later. SCD was classified using the Cognitive Change Index questionnaire. Test-retest reliability of the NeurOn test outcome measures ranged from poor to good, with the strongest evidence of reliability shown for the Sustained Attention to Response Test and Picture Recognition. The SCD group was significantly older than the Non-SCD group so group differences were investigated using analysis of covariance whilst controlling for the effect of age. SCD scored significantly better than Non-SCD for Digit Span Backwards (maximum sequence length) and Picture Recognition (recall of object position) at baseline. However, these were not significant when using the Bonferroni-adjusted alpha level. There were no differences between SCD and Non-SCD in longitudinal change. The results suggest online, remote neuropsychological assessment is a promising option for assessing and monitoring SCD.
Author summary
A considerable proportion of the older adult population experiences subjective decline in their thinking skills even though they score within ‘normal’ limits on screening tests for mild cognitive impairment or dementia. Research suggests that, for a small percentage of these people, their experience of a decline in their thinking skills might indicate an early stage of dementia. It is important for research to identify the earliest markers of dementia as this is when treatments may be most effective. By harnessing computing technology to improve on the accuracy and availability of cognitive assessments, we may be able to identify early and subtle cognitive changes caused by dementia. This study investigated whether online and remote cognitive assessment is a reliable method to assess and monitor thinking skills in the general older adult population. We were able to identify tasks which showed the best evidence for reliability when completed online and remotely by people with and without a subjective experience of cognitive decline, and therefore may be appropriate for monitoring thinking skills in people who are concerned about their cognitive ability. Our findings suggest online cognitive assessment may be a useful and cost-effective alternative to in-person clinic-based assessment.
Introduction
Cognitive and functional impairment associated with dementia places a significant burden on healthcare. This is projected to rise in line with the ageing population in the United Kingdom [1]. Research into treatments has been hampered by the lack of biomarkers for early or pre-symptomatic detection of neurodegenerative disease [2]. Pathophysiological changes of neurodegenerative disease occur years before symptom onset [3–5]. Therefore, earlier detection of dementia is a key priority for research as this is when disease-modifying treatments may be most effective [6,7]. There is emerging evidence that subtle cognitive changes are detectable years before diagnosis in sporadic neurodegenerative disease [6,8].
Neuropsychological assessment is a key tool for the detection and monitoring of cognitive impairment associated with dementia [9]. Better assessment methods are required to detect subtle cognitive changes in early disease stages [10]. The ability to harness advances in technology to collect more comprehensive and frequent data is a key area of interest in dementia research, including the use of digital methods for in-home monitoring of cognition [11]. Unsupervised, online neuropsychological assessment has the potential to increase the availability and frequency of cognitive assessments in order to detect and track subtle changes in cognitive ability [12].
It has been suggested that subjective cognitive decline (SCD) might be an early marker of cognitive impairment due to neurodegeneration [13]. SCD is the self-perception of a decline in cognitive performance despite unimpaired performance on standardised tests sensitive to mild cognitive impairment (MCI) or dementia [14]. Most people with SCD do not progress to MCI or dementia. However, research suggests they are at increased risk of doing so compared to people without SCD [14–16]. Specific factors have been identified to be associated with an increased risk of cognitive decline in people who experience SCD (known as the “SCD plus” criteria): subjective decline in memory, onset within the last five years, onset at age 60+, persistence of SCD, presentation at a memory clinic, and informant-reported cognitive decline [14,15]. A recent meta-analysis identified additional risk factors for objective cognitive decline in people with SCD beyond the SCD plus criteria [17], including biomarkers of Alzheimer’s disease pathology (e.g. high amyloid β/ high total tau protein in the brain and/or hippocampal atrophy), the presence of apolipoprotein E4 genotype, comorbid depression or anxiety, smoking status, fewer years of education, and poorer performance on a measure of executive functioning (investigated using Trail-Making Test B performance).
Given that most individuals with SCD will not progress to MCI, it is not recommended to monitor everyone. However, for those with additional risk factors, remote, online neuropsychology offers a low-cost method to assess and monitor cognition over time. Further, given the projected increase in the average age of people living in rural areas in England [1], remote assessment options offer a practical method to support accessibility of neuropsychological assessment services for rural populations [18]. Although such research is in its infancy, initial evidence suggests that online neuropsychological assessment, completed remotely, can detect subtle deficits in cognition in people with SCD [19], making it a promising tool for the assessment and monitoring of SCD.
It is unclear whether online neuropsychological tests, completed remotely and unsupervised, show comparable psychometric properties to the ‘gold-standard’ in-person pen and paper tests. Various factors associated with online, remote test completion may impact on the reliability of results, such as technical issues, computer skills, cognitive and physical abilities affecting computer use, and a lack of supervision and additional instruction [20], meaning that equivalence to in-person tests cannot be assumed. A number of online neuropsychological assessment batteries have been developed which have shown low to high validity and reliability [21–26]. However, there is heterogeneity between the studies in terms of study populations and methods used. Therefore, more data are needed in different populations, particularly for online, remote neuropsychological assessment, to inform its use in clinical practice [20,27]. The present study evaluates the reliability of remote, online neuropsychological tests, completed without supervision by people with and without SCD recruited online from the general population.
The primary objective of the study is to establish the test-retest reliability of online tests from the NeurOn software platform in people with and without SCD. A selection of NeurOn tests was previously found to have moderate test-retest reliability in healthy older adults and to be feasible to complete remotely [26]. The secondary objective of the study is to characterise online neuropsychological test performance in people with and without SCD by investigating group differences in baseline performance and baseline-to-follow-up change.
Hypotheses
We predicted that:
Materials and methods
Ethical approval
Ethical approval was obtained from the University of East Anglia Faculty of Medicine and Health Sciences Research Ethics Subcommittee (ETH2223-0113). All participants provided informed consent electronically via an online consent form.
The Mantal and NeurOn software platforms
The Mantal software platform (https://mantal.co.uk/) from AAH Software Limited was developed by Alex Howard, Software Lead within the Norwich Research Park to facilitate the management of online clinical research studies. The NeurOn platform (https://neuropsychology.online/) was created by Professor Michael Hornberger in collaboration with Dr Emma Woodberry, Consultant Clinical Psychologist and Alex Howard, Software Lead as an alternative to in-person neuropsychological testing for clinicians and researchers. The NeurOn platform currently contains cognitive tests covering domains including memory, language, visuospatial ability, executive functioning and attention. Some standardised data are available and new tests are being developed. The tests feature randomised stimulus sets to allow longitudinal cognitive testing with minimal test-retest effects. NeurOn tests can be accessed within the Mantal software platform via an application programming interface. Therefore, participants are only required to create an account with one platform (Mantal) where they can then complete the relevant cognitive tests, pre-selected by the research team.
Test-retest reliability has been evaluated for a selection of the NeurOn tests (Reaction Time, a Go/No-Go test and the Virtual Supermarket Task) in a healthy control group who completed the online tests in-person (baseline) and remotely (follow-up), one week apart [26]. These tests showed moderate test-retest reliability. In the present analysis, we extended these findings by assessing test-retest reliability for a larger selection of NeurOn tests in SCD and Non-SCD groups, separately, and for fully remote participation.
Participants
Participants were included if they met the following eligibility criteria:
Inclusion criteria
Age 60+ in line with the World Health Organisation definition of old age
Capacity to give informed consent
Sufficient computer literacy to complete the online Consent Form
Fluent in English
Access to a device (computer or laptop) for the completion of the study
Exclusion criteria
A diagnosis of a neurological or neurodegenerative condition
A diagnosis of mild cognitive impairment
Being under the care of a secondary mental health service, due to the link between severe psychiatric disorders (and some pharmacological treatments) with cognitive dysfunction [28].
We aimed to recruit a sample size of 50 people per group (SCD; Non-SCD) based on similar studies of normative neuropsychological test data [29,30]. Longitudinal research studies with older adults have reported drop-out rates of between 5% and 37% [31,32]. Therefore, we aimed to recruit 120 participants to allow for an attrition rate in this region (assuming roughly 20%).
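The arithmetic behind this recruitment target can be illustrated with a short sketch. It is not part of the study's analysis: the function name `expected_completers` is hypothetical, and the only inputs are the figures quoted above (120 recruited, attrition between 5% and 37%, with roughly 20% assumed).

```python
def expected_completers(n_recruited: int, attrition_rate: float) -> float:
    """Expected number of participants remaining at follow-up.

    attrition_rate is the assumed proportion lost between baseline and
    follow-up (e.g. 0.20 for 20%). Hypothetical helper for illustration.
    """
    if not 0.0 <= attrition_rate <= 1.0:
        raise ValueError("attrition_rate must be between 0 and 1")
    return n_recruited * (1.0 - attrition_rate)


if __name__ == "__main__":
    # Lower bound, assumed value and upper bound of the attrition range in the text.
    for rate in (0.05, 0.20, 0.37):
        remaining = expected_completers(120, rate)
        print(f"attrition {rate:.0%}: about {remaining:.0f} of 120 expected at follow-up")
```

Running the loop over the reported range shows how sensitive the follow-up sample is to the assumed drop-out rate.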
Recruitment
Recruitment began in April 2023. Participants were recruited via advertisement on social media, within the University of East Anglia campus, and via the National Institute for Health Research “Join Dementia Research” register (http://www.joindementiaresearch.nihr.ac.uk) in Norwich.
Procedure
The Participant Information Sheet was sent to potential participants via email along with a link to the study, hosted on the Mantal clinical research software platform. Potential participants were advised they could take as much time as they liked to consider the information sheet. People who decided to take part in the study were able to register with the study website (using their email address) and complete an online consent form. After completing the consent form, participants were able to access an online eligibility screen which they were asked to complete by indicating whether they met each of the eligibility criteria via check boxes. If participants met all eligibility criteria, they were then able to access the full baseline study session. Participants were instructed to use a laptop or desktop computer to complete the study as some of the current versions of the NeurOn tasks do not function correctly if the screen size is too small.
At the baseline session, participants provided demographic information before completing the study measures (mood questionnaires, SCD questionnaire, and NeurOn tests). The following demographic data were collected: age; sex; level of education (1 = did not complete GCSE, 2 = GCSE or equivalent, 3 = A Level or equivalent, 4 = Undergraduate degree or equivalent, 5 = Master’s degree or equivalent, 6 = Doctoral degree); self-rated confidence using computers (1 = not at all confident; 5 = very confident), since computer literacy may be related to online cognitive test performance [33]; self-estimated average sleep time; social interaction (measured using the Social Interaction subscale of the Duke Social Support Index [34]; maximum score = 12, with higher scores indicating greater social interaction); previous COVID-19 infection or long-covid, since a previous infection has been shown to affect cognition [35]; occupation; first part of postcode (as a proxy measure of socioeconomic status); and whether participants had a diagnosis of dyslexia. The first part of the postcode was converted to a socioeconomic status score using the Indices of Multiple Deprivation produced by the Ministry of Housing, Communities and Local Government [36] to derive an income deprivation percentage for the relevant local authority. Higher scores indicate greater levels of deprivation in the local authority area.
Participants were contacted by email five months after completing their baseline session to invite them to complete their six- (+/-one) month follow up session. Participants repeated the mood questionnaires and the NeurOn tests at follow up.
Measures
Participants completed the following measures online via the Mantal study website:
Assessment of subjective cognitive decline
Given the recruitment method precluded detailed screening of participants, we used a validated questionnaire to assess SCD, the 20-item Cognitive Change Index (CCI) [37]. The CCI was developed to assess cognitive complaints in older adults. We defined SCD as a score of 20 or above on the first 12 items of the CCI in accordance with recommendations by the developers of the measure [38]. Participants completed the CCI during the baseline session.
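The grouping rule described above can be sketched as follows. This is a minimal illustration rather than the study's scoring script: the function name `classify_scd` and the example responses are hypothetical, and the number of items and the cut-off are passed in as parameters taken from the text rather than hard-coded.

```python
from typing import Sequence


def classify_scd(item_scores: Sequence[int], n_items: int, cutoff: int) -> str:
    """Classify a participant as 'SCD' or 'Non-SCD'.

    Sums the first n_items questionnaire responses and compares the total
    with the cut-off ('cutoff or above' counts as SCD, as in the text).
    """
    if len(item_scores) < n_items:
        raise ValueError("not enough item responses to compute the subscore")
    subscore = sum(item_scores[:n_items])
    return "SCD" if subscore >= cutoff else "Non-SCD"


if __name__ == "__main__":
    # Hypothetical responses to a 20-item questionnaire (illustrative values only).
    responses = [2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 2] + [1] * 8
    # Parameters as stated in the text: first 12 items, score of 20 or above.
    print(classify_scd(responses, n_items=12, cutoff=20))
```

Keeping the cut-off as an explicit argument makes it straightforward to re-run the grouping under a different threshold as a sensitivity check.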
Mood questionnaires
Mood was assessed since there are well-documented links between mood and cognitive performance [39]. The 15-item version of the Geriatric Depression Scale (GDS-15) [40] was used to screen for depression. The maximum score is 15. A score of five or above indicates mild depression symptoms; a score of nine or above indicates moderate depression symptoms. The Geriatric Anxiety Inventory (GAI) [41] was used to screen for anxiety. The maximum score is 20. A score of nine or above indicates clinically significant anxiety symptoms. These scales were chosen as they were developed for use in older adult populations, therefore avoiding misattributing signs of normal ageing to depression or anxiety, and are well validated and commonly used.
Online neuropsychological assessment
Participants completed computerised neuropsychological tests from the NeurOn software platform within their Mantal account via an application programming interface within the Mantal study website. The tests can be completed using either touch screen or keyboard input, depending on the capabilities of the equipment used by participants. Participants completed the following tests in the order shown:
Picture Encoding: a stimulus encoding phase in which everyday objects are presented on screen at varying locations (top, bottom, left or right). Participants are instructed to remember the pictures and where on the screen they were presented.
Simple Reaction Time: participants are instructed to respond to repeated, on-screen stimuli as fast as they can.
Digit Span backwards (working memory): participants are required to remember a sequence of digits which are presented one by one on the screen. They must recall the digits in reverse order. The length of the sequence increases until two trials of a sequence length are failed, ending the test.
Picture Recognition (visual memory): a recognition phase in which everyday objects (made up of a mixture of previously presented objects during the Picture Encoding phase, and novel objects) are presented on screen. For each item, participants must indicate whether they saw the object before. If they answer ‘yes’, they are then asked where on the screen the object was presented.
Word Encoding: a stimulus encoding phase in which a series of high-frequency words are presented on screen at varying locations (top, bottom, left or right). Participants are instructed to remember the words and where on the screen they were presented.
Sustained Attention to Response Test (attention): participants are presented with a series of digits and are instructed to respond to each digit apart from one (the ‘no-go’ target stimulus). There are 255 trials in the test, therefore requiring sustained attention over time. The task records reaction time, and will identify responses that are “too soon” or anticipatory (i.e. indicating responses that are faster than would be possible if following the rules of the task).
Word Recognition (verbal memory): a recognition phase in which a series of words (made up of a mixture of previously presented words during the Word Encoding phase, and novel words) are presented on screen. For each item, participants must indicate whether they saw the word before. If they answer ‘yes’, they are then asked where on the screen the word was presented.
Trail-Making Tests A and B (psychomotor speed, attention): participants are required to click 25 symbols in a certain order as fast as possible. For Trail-Making Test A participants must click numbered circles in order from smallest to largest, whereas for Trail-Making Test B they must alternate between numbers and letters in ascending order.
These tests were selected as they measure cognitive abilities commonly affected in early stages of dementia [42–44].
There was a delay of approximately 10 minutes between the picture/word encoding and recognition subtasks. The full neuropsychological test battery took approximately 20-30 minutes to complete. While the neuropsychological test battery was required to be completed in one sitting, participants were informed they could complete the neuropsychological tests and the questionnaires in separate sittings.
Analysis
The study used a longitudinal observational case-control design. Participants were grouped (SCD; Non-SCD) according to their score on the CCI. Test-retest reliability of the online neuropsychological tests was assessed in both groups, separately. Performance on the online neuropsychological tests at baseline and the change over time was compared between the two groups. The selected outcome measures for each cognitive test are detailed in Table 1.
Test-retest reliability was assessed using two-way mixed effects intraclass correlation coefficients (ICC) with absolute agreement as is recommended [45]. Koo and Li [46] suggest the following interpretation of ICC values: less than 0.5 indicates poor reliability, 0.5-0.75 indicates moderate reliability, 0.75-0.9 indicates good reliability, and greater than 0.9 indicates excellent reliability.
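To make the reliability statistic concrete, the sketch below computes a two-way, absolute-agreement ICC from baseline and follow-up scores and maps it onto the interpretation bands quoted above. Several points are assumptions rather than details from the study: the single-measurement form of the coefficient is used (the text does not state whether single or average measures were reported), the function names are hypothetical, the handling of values exactly on a band boundary is a choice made here, and the example scores are invented.

```python
from typing import List, Sequence


def icc_absolute_agreement(scores: Sequence[Sequence[float]]) -> float:
    """Two-way, absolute-agreement, single-measurement ICC.

    scores holds one row per participant and one column per session.
    Computed from the two-way ANOVA mean squares:
    (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n).
    """
    n = len(scores)          # participants
    k = len(scores[0])       # sessions (2 for baseline and follow-up)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means: List[float] = [sum(row) / k for row in scores]
    col_means: List[float] = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-participant mean square
    msc = ss_cols / (k - 1)                 # between-session mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(icc: float) -> str:
    """Interpretation bands from the text; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Invented scores: every participant improves by exactly one point at follow-up.
    data = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
    value = icc_absolute_agreement(data)
    print(round(value, 3), interpret_icc(value))
```

In the invented example the rank order of participants is perfectly preserved, yet the coefficient falls below 1 because absolute agreement also penalises the systematic shift between sessions, which is the reason this form is suited to test-retest designs where practice effects are possible.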
Chi-square tests were conducted to investigate differences in sex, previous COVID-19 infection, long-covid prevalence, and dyslexia prevalence between the two groups. Continuous demographic data were assessed for normality using the Shapiro-Wilk test. The assumption of normality was violated for all continuous demographic measures. Therefore, Mann-Whitney U tests were used to test for group differences (SCD versus Non-SCD) in these variables, and group statistics are reported using the median and interquartile range. Analysis of covariance (ANCOVA) was used to explore group differences in baseline and baseline-to-follow-up change scores for each neuropsychological test outcome measure while controlling for the effect of age. Omega squared (ω²) was used as a measure of effect size as it is less biased than other effect size measures in small samples [47]. Given each set of ANCOVAs examined 18 dependent variables (NeurOn test outcome measures), a Bonferroni-adjusted alpha level of 0.05/18 = 0.003 was used for the ANCOVA results. Change scores were calculated by subtracting baseline scores from follow-up scores.
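The multiple-comparison threshold and the change-score definition can be expressed in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the example p-value and scores are invented rather than taken from the results.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def change_score(baseline: float, follow_up: float) -> float:
    """Follow-up minus baseline, so positive values mean the score increased."""
    return follow_up - baseline


if __name__ == "__main__":
    adjusted = bonferroni_alpha(0.05, 18)   # 18 outcome measures, as in the text
    print(f"adjusted alpha = {adjusted:.4f}")

    p_value = 0.02                          # invented p-value for illustration
    print("significant at 0.05:", p_value < 0.05)
    print("significant after adjustment:", p_value < adjusted)

    print("change score:", change_score(baseline=5.0, follow_up=6.0))
```

A result with a p-value between the adjusted and unadjusted thresholds, as in the invented example, is significant only at the unadjusted level.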
Data analysis was conducted using JASP (version 0.18.3) [48], R (version 4.0.2) and RStudio (version 2023.12.1) [49].
Results
Participant demographics
Figure 1 shows participation and completion rates for each part of the study. Twelve people registered and provided consent to participate but then did not complete the eligibility screen. Therefore, it is presumed they did not meet eligibility for the study. Two people did not complete the CCI and were therefore excluded from the group comparisons. In total, 108 people (SCD N = 47, Non-SCD N = 61) completed the CCI and at least the study questionnaires at baseline. The demographics and questionnaire scores of the 108 participants are summarised in Table 2. All participants lived in the United Kingdom. The SCD group were significantly older than the Non-SCD group and scored significantly higher for depression and anxiety; however, the medians were well below the clinical range for both measures. As expected, CCI score was significantly higher in the SCD group.
Test-retest reliability
Table 3 shows the ICC values for each outcome measure, separated by group (SCD, Non-SCD). Two of the Digit Span Backwards outcome measures (‘N correct’ and ‘Max length’) showed moderate test-retest reliability in the SCD group. However, in the Non-SCD group, reliability was poor for all Digit Span Backwards measures. Word Recognition subscores showed poor to good reliability in the Non-SCD group, but poor reliability in the SCD group. ICC values for the Simple Reaction Time task indicated poor reliability in both groups. Completion time for Trail-Making Test A showed moderate reliability in the Non-SCD group only, whereas completion time for Trail-Making Test B showed moderate reliability in the SCD group only. ICC values for Picture Recognition indicated moderate to good reliability in the Non-SCD group for all measures, and moderate reliability in the SCD group for ‘N position correct’. The Sustained Attention to Response Test showed moderate to good reliability in both groups, for all outcome measures.
Group differences in online neuropsychological test performance
Eighty-five participants completed the full neuropsychological test battery at baseline (SCD N = 42, Non-SCD N = 43; Figure 1). Up to 98 participants provided data for each individual test. The ANCOVA results for group differences in baseline neuropsychological test scores while controlling for age are presented in Table 4. The assumption of homogeneity of regression was tested for each ANCOVA and was met in all cases (all tests non-significant). Using the unadjusted alpha level of 0.05, the SCD group scored significantly better than the Non-SCD group for Digit Span Backwards – ‘Max length’, and Picture Recognition – ‘N position correct’. However, these differences were not significant when using the Bonferroni-adjusted alpha level of 0.003. There were no other significant group differences in baseline neuropsychological test performance.
The ANCOVA results for group differences in baseline-to-follow-up change in scores while controlling for age are presented in Table 5. The assumption of homogeneity of regression was violated for the ANCOVAs of group differences in change scores for the following: Digit Span - N correct, Digit Span - Max Length, Word Recognition - N correct. There were no significant group differences in baseline-to-follow up change in neuropsychological test scores.
Discussion
The aim of the current study was to investigate the test-retest reliability of online, remote neuropsychological assessment in people with and without SCD. Seven online neuropsychological tests were investigated, covering cognitive domains of visual and verbal memory, working memory, attention and psychomotor speed. Reliability ranged from poor to good across the outcome measures. We predicted that the tests would show moderate reliability in line with a previous study [26]; however, the present study used a larger battery with different tests and featured a greater number of outcome measures. Therefore, our results showed greater variability in terms of estimates of reliability. Overall, the best evidence of reliability was found for the Sustained Attention to Response Test and Picture Recognition, as these showed moderate to good reliability across both groups for at least one outcome measure. These tests may therefore be the most suitable candidates for remote, repeated assessment.
A second aim of the study was to explore whether there are group differences (SCD versus Non-SCD) in baseline and longitudinal change in online neuropsychological test scores. At baseline, the SCD group scored significantly better than the Non-SCD group for Digit Span Backwards – ‘Max length’ (a measure of working memory), and Picture Recognition – ‘N position correct’ (a measure of spatial memory), which is opposite to what we predicted based on previous research [19]. However, these were not significant when using the Bonferroni-adjusted alpha level which accounts for multiple testing. Therefore, it is possible that these represent false positive results. Additionally, there were no group differences in baseline to follow-up change scores. Given that most of the research into cognition in SCD has employed in-person assessment, it was unclear whether subtle impairment would be detected using online, remote assessment, for which reliability can be impacted by factors specific to this method [20]. It is important to identify reliable online tests as a first step to exploring group differences in performance, and, given the subtle differences reported in the literature to date [19,50], large sample sizes may be required to detect changes when using online assessment methods.
Our results suggest that some NeurOn online neuropsychological tests, in particular the Sustained Attention to Response Test and Picture Recognition, have moderate test-retest reliability in people with and without SCD. In-person equivalents of these tests have shown test-retest reliability estimates of 0.76 (one-week follow-up [51]) and 0.60 (one-month follow-up, visual memory [52]), respectively, in healthy control populations. Therefore, the online versions of these tests show comparable reliability in this population of healthy older adults when completed remotely. This suggests that online, remote completion of these tests is a reliable method for monitoring changes in cognition in this population.
Limitations
There are a number of limitations to the present study. Some participants discontinued the baseline or follow-up neuropsychological testing sessions and, therefore, there were missing data for the tests. This may have been due to the fully online, remote methodology (i.e. due to lack of additional instruction). Group sizes differed across neuropsychological tests for this reason. However, since the aim of the present research is to understand the feasibility of this methodology for research and clinical practice, this is likely an inevitable consequence of this study design. Future research should investigate whether the rate of non-completion during online, remote assessment paradigms is above that seen in studies using in-person/ supervised assessment methods. Reasons for non-completion were unclear unless participants contacted the lead researcher directly. Therefore, it is not possible to draw firm conclusions about factors contributing to discontinuation of testing in the present study.
There was no option to ‘skip’ a neuropsychological test during the testing session, meaning that if people encountered technical issues they would be unable to complete the later tasks. This may have reduced the sample sizes for neuropsychological tests towards the end of the battery.
Our definition of SCD was based on the recommended cut-off score on a validated questionnaire (the CCI). This is in line with other studies which have defined SCD using the CCI [38]. However, this method may not completely map on to the definition of SCD proposed by the SCD-Initiative working group [53]. There is considerable variability across studies in the methods used to define SCD making it difficult to compare findings [54]. Therefore, it is not clear whether the finding of no group difference in performance between SCD and Non-SCD in the present study reflects differences in the tests used in the current study to those used in a previous study which found subtle impairment in SCD [19], or whether this reflects differences in the criteria used to define SCD across studies. This should be explored further. There is a need to improve consistency across studies in the definition of SCD. This study was conducted fully online, precluding in-person screening of SCD. It will be particularly important to establish the most suitable method of classifying SCD for online studies.
Finally, while the results of the present study show moderate reliability for a subset of the included tests when completed online and remotely, these results are not generalisable to other online neuropsychological test platforms, whose tests may differ from those assessed in the current study.
Conclusion
We found moderate test-retest reliability for NeurOn tests of memory and attention in people with and without SCD. This suggests online, remote neuropsychological assessment is a promising option for assessing and monitoring SCD, offering a cheaper alternative to in-person assessment and potentially increasing accessibility for some people. While there are practical issues to be resolved in future research, including exploring issues relating to drop-out, online and remote neuropsychological assessment has the potential to improve efficiency and accuracy of neuropsychological assessment.
Declaration of interest
Professor Michael Hornberger is the research director at NeurOn.
Funding
No funding was received for conducting this study.
Data availability statement
The data that support the findings of the current research are available from the corresponding author on reasonable request.
Author contributions
KAP, AL and MH contributed to conceptualisation of the research, development of the methodology, and project administration. KAP collected the data and conducted the formal statistical analysis. AL and MH supervised the research activity. KAP drafted the manuscript. AL and MH contributed to review and editing of the manuscript.
Acknowledgements
The authors would like to thank Alex Howard, AAH Software Ltd, for his advice and support with setting up the study website and the National Institute for Health Research Join Dementia Research service for support provided during recruitment.