Abstract
INTRODUCTION Mobile app-based unsupervised monitoring of cognition holds the promise to facilitate case-finding in clinical care and the individual detection of cognitive impairment in clinical and research settings. In the context of Alzheimer’s disease, this is particularly relevant for patients who seek medical advice due to memory complaints.
OBJECTIVE We developed a Remote Digital Memory Composite score from an unsupervised remote and mobile cognitive assessment battery focused on episodic memory and long-term recall and assessed its construct validity using a neuropsychological composite score for early cognitive impairment in Alzheimer’s disease, the Preclinical Alzheimer’s Cognitive Composite (PACC5). We also assessed the test-retest reliability of the Remote Digital Memory Composite score across two independent test sessions. Finally, we assessed the diagnostic accuracy of the remote and unsupervised cognitive assessment battery when predicting PACC5-based cognitive impairment in a memory clinic sample and healthy controls.
SETTING This was an add-on study of the DZNE-Longitudinal Cognitive Impairment and Dementia Study (DELCODE) which was also performed in a separate memory clinic-based sample.
PARTICIPANTS A total of 102 study participants were included as healthy controls (HC; n=25), cognitively unimpaired first-degree relatives of AD patients (REL; n=7), individuals with subjective cognitive decline (SCD; n= 48) or patients with mild cognitive impairment (MCI; n=22).
MEASUREMENTS We analyzed results from the objects-in-rooms recall (ORR) test, the mnemonic discrimination for objects and scenes (MDT-OS) test and the complex scene recognition (CSR) test implemented on the neotiv digital platform to derive a Remote Digital Memory Composite. Participants used the neotiv mobile app to complete one unsupervised test session every two weeks on their own mobile device in an environment of their choice. We assessed the relationships of the Remote Digital Memory Composite acquired through the mobile app and in-clinic measures of the PACC5 conducted by trained neuropsychologists in the memory clinics participating in the DELCODE study.
RESULTS A total of 102 participants provided technically complete data for at least one session of each of the three test paradigms, of whom 87 provided data from at least two test sessions of each task. The derived Remote Digital Memory Composite score was highly correlated with the PACC5 score across all participants (r=.75, p<0.001), and also in those without complaints (HC and REL, r=.51, p=0.003) and those with complaints separately (SCD and MCI, r=.76, p<0.001). Good test-retest reliability for the Remote Digital Memory Composite score was observed in those with at least two assessments of the three tests (r=.74, p<.0001). Diagnostic accuracy for discriminating PACC5-based memory impairment from no impairment was high (AUC = 0.9) with a sensitivity of 0.83 and a specificity of 0.74.
CONCLUSION Our results indicate that unsupervised mobile cognitive assessments in a memory clinic setting, as implemented in the neotiv digital platform, have high construct validity and discriminate well between cognitively impaired and unimpaired individuals based on the PACC5 score. Thus, it is feasible to complement neuropsychological assessment of episodic memory with unsupervised, remote assessments on mobile devices. This contributes to recent efforts to implement remotely performed episodic memory assessment for case-finding and monitoring in large research trials and clinical care.
Background
Differentiating mild cognitive impairment (MCI) from subjective cognitive impairment is important to provide prognosis regarding future cognitive decline as well as regarding the potential eligibility for treatments at the MCI stage of Alzheimer’s disease (AD). However, differentiating MCI from subjective cognitive impairment is still very challenging using brief cognitive tests (Petrazzuoli et al., 2020). Older adults who seek medical advice due to memory complaints and who are later found to have an Alzheimer’s biomarker profile, have an amnestic variant in which a major component of the impairment affects episodic memory in more than 80% of the cases (Xie et al., 2014). Indeed, episodic memory, the ability to recall spatial and temporal relationships of personally experienced events (Tulving, 2002), is a key component of the neuropsychological assessment of individuals with suspected AD (Costa et al., 2017). Not surprisingly, episodic recall is an important element of the Preclinical Alzheimer Cognitive Composite (PACC5) (Donohue et al., 2014; Papp et al., 2017).
The aim of the PACC5 is to provide a comprehensive assessment of AD-relevant cognitive impairment and to serve as a tool with validated sensitivity to detect cognitive decline over time (Donohue et al., 2014; Papp et al., 2017). The assessment of the PACC5 is time-consuming and requires supervision by a trained neuropsychologist (Donohue et al., 2014). This severely restricts its utility and implementation in primary care, especially with regard to equal access to PACC5-like assessments in rural areas, and it hampers high-frequency monitoring of cognitive functions in clinical trials and research studies. In general, the long test duration and specialized supervision make the high-frequency longitudinal use of established neuropsychological assessments practically impossible. There is, thus, a strong need for unsupervised, remote, high-frequency cognitive assessment that can provide a meaningful approximation of PACC5-like composite scores.
Given that the PACC5 draws heavily on episodic memory measures (WMS-R Logical Memory Delayed Recall and Free and Cued Selective Reminding Test), implementing a mobile and remote proxy for a neuropsychological assessment such as the PACC5 also offers the opportunity to overcome some of the shortcomings of neuropsychological tests. One potential disadvantage of established neuropsychological assessments of episodic memory is, for example, that they rely heavily on verbal abilities, which makes it difficult to assess episodic memory in multilingual settings or when verbal abilities are already impaired (Costa et al., 2017). In addition, implementing new cognitive tests makes it possible to take into account the latest insights into the functional architecture of episodic memory and the spread of AD pathology. Recent work on the functional neuroanatomy of episodic memory showed that episodic memory involves a network including medial temporal, midline parietal and cortical regions, each of which serves different functions and is affected at different stages of AD (Grothe et al., 2017). Episodic memory requires pattern separation processes that are mediated by the dentate gyrus (Bakker et al., 2008; Berron et al., 2016) and reduce memory interference between similar events, and pattern completion processes that are mediated by hippocampal Cornu Ammonis 3 (CA3) and enable the recollection of details from a past event in interplay with neocortical regions (Grande et al., 2019). The medial temporal lobe regions provide information to the hippocampus mainly through the entorhinal cortex. The entorhinal cortex, in turn, receives partly domain-segregated information such that object representations are transferred via the perirhinal cortex and the anterior-lateral entorhinal subdivision, and scene representations via the parahippocampal cortex and posterior-medial parts of the entorhinal cortex (Berron et al., 2019, 2018; Maass et al., 2019, 2015; Schröder et al., 2015).
Taken together, there is converging evidence that in addition to long-term recall, short-term mnemonic discrimination of object and scene representations is impaired in the predementia stages of AD (Grande et al., 2021). Besides pattern separation and completion, a third aspect of episodic memory is recognition memory (Düzel et al., 2018, 2011). Although the neurobiology of recognition memory is complex and it is likely to have a non-episodic, familiarity-based component (Düzel et al., 2001, 1999; Horner et al., 2012), it is evident that medial temporal lobe dysfunction can impair recognition memory alongside impairments of recall (Horner et al., 2012).
A set of anatomically-informed and non-verbal tasks for episodic memory that incorporate these recent insights into the functional anatomy of episodic memory is available on the neotiv digital platform (https://www.neotiv.com/en) and has been implemented in prospective cohort studies of the German Center for Neurodegenerative Diseases (DZNE). There are three different tests of memory. First, a short-term mnemonic discrimination test tapping into pattern separation, separately implemented for object and scene stimuli, second, a short- and long-term cued-recall test of object-scene associations tapping into pattern-completion and, third, a long-term photographic scene recognition memory test.
Here we evaluate these three memory measures in a remote and unsupervised fashion using mobile devices. To that end, we develop a Remote Digital Memory Composite score and assess its construct validity using PACC5 in-clinic testing as well as its retest reliability across independent test sessions. Finally, we assess the diagnostic accuracy of the Remote Digital Memory Composite score when differentiating between individuals with and without PACC5-based cognitive impairment in a memory clinic sample.
Materials and Methods
DELCODE study design
DELCODE is an observational longitudinal memory clinic-based multicenter study in Germany. The detailed study design of DELCODE is reported in (Jessen et al., 2018). In total, 1079 individuals aged 60 years or older were enrolled in the study between April 2014 and August 2018. Participants were included as individuals with subjective cognitive decline (SCD; n=445) if they presented to a memory clinic with a complaint of cognitive decline, performed better than -1.5 standard deviations (SD) of the age-, sex- and education-adjusted normal range on all subtests of the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) neuropsychological test battery, and fulfilled the SCD research criteria (Jessen et al., 2014; Molinuevo et al., 2016). Participants with amnestic MCI (MCI; n=190) and mild dementia of the Alzheimer’s type (DAT; Mini-Mental-State-Examination, MMSE, ≥ 18 points; n=126) were enrolled based on the memory clinics’ diagnoses, which were guided by the current research criteria for MCI and DAT (National Institute on Aging and Alzheimer’s Association - NIA-AA) (Albert et al., 2011; McKhann et al., 2011). First-degree relatives of individuals with DAT were recruited by advertisement (REL; n=82). DAT in the relatives had to be confirmed by medical documentation. Healthy control participants (HC; n=236) were also recruited by advertisement, which explicitly addressed individuals who felt no relevant cognitive impairment. Ten university-based memory centers are participating, all of which collaborate with local DZNE sites. All local institutional review boards (IRB) and ethical committees approved the study protocol.
Remote mobile monitoring add-on study
The remote mobile monitoring add-on study started in 2019 after a separate approval by IRBs and ethical committees of each participating site. All DELCODE participants except patients with DAT were eligible if they owned a smartphone or tablet with internet access that was technically suitable for installing the mobile app and that they could operate on their own. Seven DELCODE sites successfully recruited 77 participants into the remote mobile monitoring add-on study. One memory clinic associated with the DZNE, the memory clinic of the Department of Neurology and the Institute of Cognitive Neurology and Dementia Research at the Medical Faculty of the University Hospital of the Otto-von-Guericke University, recruited an additional 25 memory complainers who were referred by general practitioners (GPs) following memory complaints. The PACC5 was conducted according to the same Standard Operating Procedures in all participating memory clinics throughout the study. DELCODE participants were asked at their regularly scheduled annual follow-up visit, and memory clinic patients during their in-clinic visit, whether they would like to participate in the add-on study and perform one remote cognitive test every two weeks on their smartphone for 1.5 years. If they agreed, study personnel helped install the app from the respective app store on the participant’s own mobile device (smartphone or tablet computer), but participants received no further verbal instructions apart from a printed manual. The Object-in-Room Recall test (ORR), the Mnemonic Discrimination Test for Objects and Scenes (MDT-OS) and the Complex Scene Recognition Test (CSR) were completed by participants remotely and unsupervised using their mobile device. Participants were asked to complete memory assessments every two weeks, each of which consisted of a 2-phase session separated by a short delay.
The two phases were either two halves of mnemonic discrimination, or encoding and retrieval phases of complex scene recognition and object-in-room recall (see details of the tasks below). Every phase took around 10 minutes. The three different paradigms alternated over the weeks in the following order: CSR, ORR, MDT-OS. Note that we only present the results of the first test session of each task (and used the second session for reliability measures). Tests were remotely initiated every two weeks via push notifications, which were sent at the same time of day as the registration, but participants had the possibility to postpone test sessions. This approach was chosen in order not to urge participants to take the test under suboptimal conditions such as distraction, fatigue or temporary illness. Daily reminders were sent via push notifications until the respective task was completed, and the actual time of testing was recorded. Before each test session, participants were reminded by the app to perform the test in a quiet environment, to put their glasses on if needed and to ensure that their screen was bright enough to see the pictures clearly. They also received a short practice session at the beginning of each session. After each test session, participants were asked within the app if they were distracted by things happening around them during the session (yes/no decision) and to rate their concentration level and subjective performance (1=very bad, 2=bad, 3=middling, 4=good and 5=very good). Hence, participants received the instructions for the cognitive tests remotely and performed the test fully unsupervised.
Clinical and neuropsychological assessments
The annual neuropsychological testing in DELCODE included the PACC5 (Papp et al., 2017) and other assessments reported in full in (Jessen et al., 2018). The PACC5 z-score was calculated as the mean performance z-score across the MMSE (Folstein et al., 1975), a 30 item composite screening test, the WMS-R Logical Memory Delayed Recall (Wechsler and Stone, 1987), a test of delayed (30 min) story recall, the Digit-Symbol Coding Test (DSCT; 0–93) (Wechsler, 1981), a test of memory, executive function and processing speed, the Free and Cued Selective Reminding Test–Free Total Recall (FCSRT96; 0–96) (Grober et al., 2008), a test of free and cued recall of newly learned associations, and the Category Fluency Test, a test of semantic memory and executive function. The z-scores for the PACC5 in our analysis were derived using the mean and standard deviation of healthy controls, participants with SCD as well as relatives of patients with dementia in the entire DELCODE study. A PACC5 composite score was calculated when at least three of its five components were available while making sure that at least the MMSE, one memory and either category fluency or DSCT were included (out of the 102 participants, eight provided four PACC5 elements, four participants provided three elements, and 90 provided all five elements).
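The averaging and availability rule described above can be sketched as follows. This is a minimal illustration, not the DELCODE analysis code: the component keys, the function name and the reference norms used in any example call are hypothetical, and the grouping of components into "memory" and "fluency or speed" simply mirrors the inclusion rule stated in the text.

```python
from statistics import mean

# Hypothetical component keys (not taken from the study code base).
MEMORY = ("logical_memory", "fcsrt")
FLUENCY_OR_SPEED = ("category_fluency", "dsct")
COMPONENTS = ("mmse",) + MEMORY + FLUENCY_OR_SPEED


def pacc5_composite(raw, norms):
    """Return the mean z-score over the available components, or None.

    raw   -- dict: component name -> raw score (None if missing)
    norms -- dict: component name -> (reference mean, reference SD)
    """
    available = [c for c in COMPONENTS if raw.get(c) is not None]
    has_memory = any(c in available for c in MEMORY)
    has_fluency_or_speed = any(c in available for c in FLUENCY_OR_SPEED)
    # Rule from the text: at least three components, including the MMSE,
    # one memory test, and either category fluency or the DSCT.
    if len(available) < 3 or "mmse" not in available:
        return None
    if not (has_memory and has_fluency_or_speed):
        return None
    z_scores = [(raw[c] - norms[c][0]) / norms[c][1] for c in available]
    return mean(z_scores)
```

Returning None rather than a partial mean keeps participants who fail the availability rule out of downstream analyses explicitly.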
In the DELCODE cohort, the clinical labels (HC, REL, SCD, MCI) were established in the baseline assessment of each participant. Therefore, the PACC5 assessment provided a more accurate and up-to-date assessment of the cognitive impairment of each participant with respect to the time at which the Remote Digital Memory assessment was conducted (mean time between baseline assessment and app-based testing was 1.2 years while mean time between closest-in-time PACC5 visit and app-based testing was only 0.7 years). Furthermore, the PACC5 assessment is a composite of widely used and well-established cognitive tests and thus allows generalizability of our findings that is stronger than what would be achievable with a single neuropsychological test-based clinical classification. In the DELCODE cohort, clinical assessments also included the Clinical Dementia Rating (CDR).
Mnemonic Discrimination Test of Objects and Scenes (MDT-OS)
Figure 1A shows the outline of the MDT-OS test (Berron et al., 2019, 2018; Güsten et al., 2021; Maass et al., 2019). In this test, participants are presented with 3D rendered computer-generated objects and scenes that are repeated either identically or in slightly modified versions. Participants need to decide whether a repeated presentation shows a repetition of the original picture or a modified version. They indicate their response by either tapping on a button (for an exact repetition) or by tapping on the location of a change (for a modified version). They see 32 object and 32 scene pairs where half are repeated or modified respectively. One session was split into two phases and completed on two consecutive days following a 24-hour delay. The first phase was presented as a one-back task while the second phase was presented as a two-back task. The test provides a hit rate, a false alarm rate and a corrected hit rate for both the object and scene condition. The corrected hit rate for the scene condition is used for the Remote Digital Memory Composite.
Object-in-Room Recall Test (ORR)
Figure 1B shows the outline of the ORR test (for a discussion of the principles of pattern completion on which this test is based see (Grande et al., 2019)). In this test, participants are presented with 3D-rendered computer-generated rooms, in which two 3D-rendered objects are placed. In an immediate recall test, participants recall which object was placed at a specific location, cued by a colored circle in the empty room. They indicate their recall decision by tapping on one of three objects displayed below the empty room: the correct object for that location, the object that was also present in the room but at a different location (correct source distractor) and a completely unrelated object (incorrect source distractor). They learn 25 such object-scene associations. After a delay of either 30 minutes or 24 hours, the same recall test is repeated. In the ORR test, the ability to recall the correct association is graded, which makes it possible to separate correct episodic recall from incorrect source memory. Thus, correct recall excludes the choice of an object that was present in the same room but at a different location (wrong source memory for specific location) and an object that was not present in the room but nevertheless associated with the objects belonging to the room during encoding (wrong source memory for overall location). The test provides several outcome measures. Total recall: the number of correct immediate plus correct long-term recalled items, with a maximum of 50 correct responses. Total delayed recall: the number of correctly recalled items at the delayed recall. Total recall and cued recognition: the number of correct choices of the target object and the correct source distractor (but not the incorrect source distractor). Delayed recall of successfully encoded items: the number of correct immediately recalled items plus those items that have additionally been recalled after a delay. The latter measure is used here for the Remote Digital Memory Composite.
In the DELCODE add-on study, 12 test sessions of the ORR test were scheduled, alternating between 30-minute and 24-hour delay versions (test sessions with odd numbers had 30-minute delays while test sessions with even numbers had 24-hour delays). Here, we only report results of the first session, i.e. using a 30-minute delay. For reliability measures, we use data from the first and the third ORR test, since they both have 30-minute delays.
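The alternation of delay versions can be written down compactly. The following sketch only restates the odd/even rule from the text; the function name is hypothetical and the snippet is not part of the app.

```python
def orr_delay_minutes(session_number):
    """Nominal ORR retrieval delay for a 1-based session number.

    Odd-numbered sessions use the 30-minute delay,
    even-numbered sessions the 24-hour delay.
    """
    if session_number < 1:
        raise ValueError("session numbers start at 1")
    return 30 if session_number % 2 == 1 else 24 * 60


# Sessions among the 12 scheduled ones that share the 30-minute delay.
short_delay_sessions = [n for n in range(1, 13) if orr_delay_minutes(n) == 30]
```

Under this rule, sessions 1 and 3 are the first two sessions with matching delays, which is why they are paired for the reliability estimate.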
Complex Scene Recognition Test (CSR)
Figure 1C shows the outline of the CSR test (Bainbridge et al., 2019; Düzel et al., 2018, 2011). Participants see 60 photographic images depicting indoor and outdoor scenes. For encoding, participants make a button-press decision whether the presented scene is indoors or outdoors. After a delay of 65 minutes, the participants are informed via push notification to complete the second phase of the task. Here, the encoded images are presented together with 30 new images and participants make old/new/uncertain recognition memory decisions. The test provides a hit rate, a false alarm rate and a corrected hit rate. The corrected hit rate is used for the Remote Digital Memory Composite.
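As an illustration of the three outcome measures, the sketch below assumes the common definition of the corrected hit rate as hit rate minus false alarm rate; the text does not spell out the formula. It further assumes that "uncertain" responses count neither as hits nor as false alarms. The function name and data layout are hypothetical.

```python
def corrected_hit_rate(responses):
    """Compute (hit rate, false alarm rate, corrected hit rate).

    responses -- list of (is_old_item, answer) pairs, where answer is
                 one of "old", "new" or "uncertain".
    Assumption: corrected hit rate = hit rate - false alarm rate, and
    "uncertain" answers are treated as neither hit nor false alarm.
    """
    n_old = sum(1 for is_old, _ in responses if is_old)
    n_new = len(responses) - n_old
    if n_old == 0 or n_new == 0:
        raise ValueError("need at least one old and one new item")
    hits = sum(1 for is_old, ans in responses if is_old and ans == "old")
    false_alarms = sum(
        1 for is_old, ans in responses if not is_old and ans == "old"
    )
    hit_rate = hits / n_old
    false_alarm_rate = false_alarms / n_new
    return hit_rate, false_alarm_rate, hit_rate - false_alarm_rate
```

Subtracting the false alarm rate penalizes a liberal response bias, so a participant who answers "old" to every image does not obtain a high score.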
Data handling and quality control
DELCODE participants used the app with a pseudonymized ID (no identifying information or clinical information was available or required in the mobile app) provided to them during a memory clinic visit. The app data were transferred directly to the clinical research platform of the DZNE in accordance with the General Data Protection Regulation. The mobile app data were then linked to the clinical data by the clinical research platform of the DZNE and subsequently released to DELCODE Principal Investigators and to neotiv GmbH. Data handling and quality control procedures for the clinical DELCODE data are reported in (Jessen et al., 2018).
Statistical analysis
All statistical analyses were performed in R (R Core Team, 2020). We correlated the Remote Digital Memory Composite score with the PACC5 score to assess convergent validity using the Pearson correlation coefficient. We conducted this analysis for the entire group of participants and also separately for those without (HC and REL) and with memory complaints (SCD and MCI), given that the latter subgroup is especially relevant in health care settings. Multiple regression models were used to assess the relationship of age, sex and years of education with the PACC5 as well as with the Remote Digital Memory Composite. In addition, we assessed the influence of the Time-of-Day, the Time-to-Retrieval and the screen size of the mobile device on the individual components of the Remote Digital Memory Composite as well as on the Remote Digital Memory Composite itself. We also assessed test-retest reliability using the Pearson correlation coefficient. Diagnostic accuracy and receiver operating characteristic (ROC) analyses were performed using the pROC package (Robin et al., 2011).
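For readers less familiar with the statistic, the Pearson correlation coefficient used for both convergent validity and test-retest reliability can be sketched as below. The analyses reported here were run in R; this Python snippet is only an illustrative re-statement of the formula, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)
```

For test-retest reliability, x and y would hold the composite scores of the same participants at the first and second assessment, in the same order.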
Results
Recruitment and adherence
Here we considered the first 102 study participants who completed at least one session of each of the three cognitive tests (25 healthy controls, 48 individuals with SCD, 7 relatives of DAT patients and 22 MCI patients, see Table 1 for sample characteristics). In addition, 87 of these participants completed at least two sessions of each cognitive test which allows us to estimate the test-retest reliability. Thus, 15% of those that have completed the first composite (at 6 weeks) have not yet reached the second completion (at 12 weeks). The DZNE site in Magdeburg obtained additional recruitment data to quantify interest and identify reasons to decline participation in the add-on study. Of the first 90 participants that were asked to participate, 51% agreed and were successfully recruited. 28% expressed interest, but could not be recruited for technical reasons (either they owned no mobile device, their mobile device was technically too old, or they had no mobile plan or WIFI at home). 4% were undecided and agreed to be asked again at the next annual DELCODE visit. 5% expressed mistrust towards apps and 12% were not interested. The remote mobile monitoring add-on study in DELCODE is scheduled for 1.5 years and the corresponding long-term adherence data will be published once available.
Contextual factors
Across all three cognitive tests, participants reported high concentration levels during the task (mean = 4, scale 1-5, which translates to good concentration), and high subjectively rated task performance (mean = 3.7, scale 1-5 which translates to good subjectively rated performance). While concentration levels were similar across tasks (3.63, 4.11, 4.22 for MDT-OS, ORR and CSR respectively), subjective performance indicated higher task difficulty for the MDT-OS (2.87) compared to ORR and CSR (4 and 4.3 respectively). In addition, 89% of the participants reported no distractions during their test sessions.
The time between encoding and retrieval in the ORR and CSR tests was adhered to as follows. 45% of participants completed the retrieval within 1.5 hours, 22% within 6 hours, 19% within 48 hours and 13% took more than 48 hours. Participants were invited to the retrieval phase of the ORR after 30 minutes, and their actual median delay was 57 minutes, while they were invited to the CSR retrieval after 65 minutes, and completed it after a median delay of 2 hours 46 minutes.
Across tasks, individual test sessions were performed between 8:20 AM and 8:30 PM (mean 1:54 PM, SD = 2 hours 22 minutes). Mobile devices had a screen diagonal between 10.15 and 27.65 cm (mean 13.7 cm, SD = 3.6 cm), indicating the use of smartphones as well as tablet computers.
Development of the Remote Digital Memory Composite
We built a Remote Digital Memory Composite score using equal weights where each component (each of the three cognitive tests) had the same weight. The mnemonic discrimination test comes in two task conditions, one for scenes and one for objects. For the Remote Digital Memory Composite, we decided to include scene mnemonic discrimination but not object mnemonic discrimination for the following reasons. First, we were aiming for a rather short overall testing time for the future Remote Digital Memory Composite score and therefore wanted to only include a single mnemonic discrimination condition. Second, our earlier work showed that while object mnemonic discrimination (MDT-O) has been associated with measures of tau pathology in cognitively unimpaired individuals (Berron et al., 2019; Maass et al., 2019), scene mnemonic discrimination (MDT-S) was associated with amyloid load in posterior brain networks known to be affected at the MCI stage (Maass et al., 2019). All individual components (ORR, MDT-S and CSR) were z-standardized using the mean and standard deviation of the cognitively unimpaired participants (HC, REL, SCD). The resulting three z-scores were averaged to derive the final Remote Digital Memory Composite score. The test-retest reliability between two independent time points was good (r = 0.74, p<.001).
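The construction of the composite can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary keys and function name are hypothetical, and the sample standard deviation (n-1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

TESTS = ("orr", "mdt_s", "csr")        # hypothetical score keys
UNIMPAIRED = {"HC", "REL", "SCD"}      # reference groups named in the text


def remote_digital_memory_composite(participants):
    """Return one composite score per participant, in input order.

    participants -- list of dicts with a "group" label and one raw
                    score per test in TESTS.
    """
    reference = [p for p in participants if p["group"] in UNIMPAIRED]
    # Norms come from the cognitively unimpaired participants only.
    norms = {
        t: (mean([p[t] for p in reference]), stdev([p[t] for p in reference]))
        for t in TESTS
    }
    # Equal weights: the composite is the plain mean of the three z-scores.
    return [
        mean([(p[t] - norms[t][0]) / norms[t][1] for t in TESTS])
        for p in participants
    ]
```

Because the norms are taken from the unimpaired reference groups, a composite of 0 corresponds to average unimpaired performance and negative values indicate performance below that reference.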
Relationship between the Remote Digital Memory Composite and the PACC5
Given that the participants are part of a longitudinal cohort study, we used the PACC5 score from the closest-in-time in-clinic visit (to the mobile app add-on study) to perform a correlation analysis between the Remote Digital Memory Composite and the PACC5 score to assess convergent validity. In DELCODE, data release is conducted by the clinical research platform, and for those individuals for whom the closest-in-time data had not yet been released, we used data from the second closest assessment. The average time interval between the in-clinic visits and the remote app assessments was 0.7 years. The first Remote Digital Memory Composite correlated highly (r=.75, p<.001) with the closest-in-time available in-clinic PACC5 scores. When considering only participants with memory complaints, meaning those who were referred to the memory clinics by their GP and fulfilled either SCD or MCI criteria, the construct validity of the Remote Digital Memory Composite remained very high (r = .76, p<.001). The construct validity in individuals without memory complaints (HC and REL) was moderate (r = .51, p=.003). Results for the whole cohort, and separately for memory complainers, are presented in Figure 2. A multiple regression model including all individual mobile components (ORR, MDT-S and CSR) predicting the PACC5 score showed a significant effect for each predictor (adjusted R² = 0.55, βORR = 0.44; βCSR = 0.3; βMDT-S = 0.25). For completeness, we also ran a multiple regression model including all available tests, i.e. also the MDT-O. While ORR, MDT-S and CSR again showed a significant effect, MDT-O did not contribute significantly to the model in addition to the other three components (adjusted R² = 0.56, βORR = 0.41; βCSR = 0.29; βMDT-S = 0.2; βMDT-O = 0.14).
Relationship with Age, Sex, Education and other factors
Multiple regression models with age, sex, years of education, Time-of-Day, Time-to-Retrieval and screen size were calculated to identify the relationships with individual components of the Remote Digital Memory Composite. For ORR and MDT-S, none of the above predictors was significantly associated with task performance. For CSR, however, sex, years of education and Time-to-Retrieval were significant predictors of task performance, i.e. female participants and those with higher education performed better in the task, and the longer the delay between encoding and retrieval, the worse the participants’ performance.
With respect to the Remote Digital Memory Composite, female sex (βsex = 0.49, p=0.017) and more years of education (βedu = 0.28, p=0.005) were associated with higher task performance, but not age and screen size. In comparison, the PACC5 was also associated with sex (βsex = 0.58, p=0.003) and years of education (βedu = 0.29, p=0.002), i.e. women and participants with more years of education received a higher PACC5 score.
Diagnostic accuracy
In order to assess how well the Remote Digital Memory Composite score differentiates cognitively impaired and cognitively unimpaired individuals based on the PACC5 score, we calculated a PACC5 cut-off score across all non-demented participants in the DELCODE cohort (n=933; 235 HC, 440 SCD, 82 REL and 176 MCI patients) that distinguishes MCI from cognitively unimpaired participants (HC, REL and SCD), with an optimal cut-off prioritising sensitivity > 0.8. This resulted in a cut-off of -0.515 and yielded sensitivity and specificity of 0.82 (female: 0.83, male: 0.8) and 0.8 (female: 0.9, male: 0.69), respectively. No other cut-off resulted in more favorable values for men. Based on that cut-off, we divided the entire sample of the add-on study into cognitively unimpaired (CU_PACC5, n=73) and cognitively impaired (CI_PACC5, n=29) participants (see Table 1 for participants’ characteristics). The Remote Digital Memory Composite score differentiated both groups with an Area under the Curve (AUC) of 0.9 and a sensitivity and specificity of 0.83 and 0.74, respectively (optimal cut-off = -0.3). The ROC curve and classifications with the cut-off are presented in Figure 3. When restricting the sample to subjects with tests where the Time-to-Retrieval was below 24 hours (n=88), the AUC remained stable (0.92).
In order to test whether all three components of the Remote Digital Memory Composite are needed to achieve the best possible classification, we performed AUC analyses for each individual component (ORR = 0.85; MDT-S = 0.8; CSR = 0.73) as well as for alternative composite scores covering all possible combinations of only two test paradigms (ORR/MDT-S = 0.88; ORR/CSR = 0.83; MDT-S/CSR = 0.8). However, no individual component or composite combining two components reached an AUC of 0.9.
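The two quantities reported above, the AUC and the sensitivity and specificity at a given cut-off, can be illustrated with the following sketch. It is not the pROC-based analysis used in the study: the AUC is computed through its rank interpretation (the probability that a randomly chosen impaired participant scores lower than a randomly chosen unimpaired one, counting ties as one half), and classifying scores strictly below the cut-off as impaired is an assumption. Function names are hypothetical.

```python
def auc_lower_is_impaired(impaired, unimpaired):
    """AUC via pairwise comparisons; lower scores indicate impairment."""
    wins = 0.0
    for i in impaired:
        for u in unimpaired:
            if i < u:
                wins += 1.0
            elif i == u:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(impaired) * len(unimpaired))


def sensitivity_specificity(impaired, unimpaired, cutoff):
    """Classify scores strictly below the cut-off as impaired (assumption)."""
    true_pos = sum(1 for s in impaired if s < cutoff)
    true_neg = sum(1 for s in unimpaired if s >= cutoff)
    return true_pos / len(impaired), true_neg / len(unimpaired)
```

Here impaired and unimpaired would hold the composite scores of the two PACC5-defined groups; moving the cut-off trades sensitivity against specificity along the ROC curve.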
Functional impairment
We also investigated whether the Remote Digital Memory Composite was related to clinical functional impairment. A subgroup analysis within individuals from the DELCODE study allowed us to determine the AUC for the differentiation of individuals with a Clinical Dementia Rating scale (CDR) global score of 0 from those with higher scores. Scores higher than 0 indicate that participants are already somewhat constrained in their everyday life. For this analysis, the AUC was 0.69 and a cut-off of -0.3 resulted in a sensitivity of 0.52 and a specificity of 0.73. This suggests that a majority of those who were identified as cognitively impaired based on the Remote Digital Memory Composite had some level of clinical functional impairment in daily life. Hence, the cognitive impairment uncovered on the basis of the PACC5 by the Remote Digital Memory Composite indeed bears clinical relevance with respect to independence in everyday life.
Discussion
We developed an unsupervised Remote Digital Memory Composite based on a single test session from each of three equally weighted memory tests (ORR, MDT-OS and CSR), which were performed remotely and fully unsupervised. The resulting Remote Digital Memory Composite showed high construct validity in relation to the PACC5 score and good retest reliability in a subsample that performed each test twice. Finally, the Remote Digital Memory Composite differentiated between individuals with and without PACC5-based cognitive impairment with an AUC of 0.9, demonstrating high diagnostic accuracy.
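As an illustration of what an equally weighted composite of three tests can look like, the following minimal sketch standardises each test to z-scores across participants and averages them per participant. The z-scoring step, the function names and the toy values are assumptions made for this example; the exact standardisation used for the Remote Digital Memory Composite is not specified in this section.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise raw scores to mean 0 and sample standard deviation 1."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def equally_weighted_composite(tests):
    """Average the z-scored tests per participant, each test weighted equally.

    `tests` maps a test name to a list of raw scores, one per participant,
    with participants in the same order for every test."""
    standardised = [z_scores(raw) for raw in tests.values()]
    return [mean(per_person) for per_person in zip(*standardised)]


# Toy raw scores (made up) for three participants.
tests = {
    "ORR": [0.2, 0.5, 0.8],
    "MDT-OS": [0.4, 0.6, 0.8],
    "CSR": [0.5, 0.7, 0.9],
}

composite_scores = equally_weighted_composite(tests)
print(composite_scores)
```

Standardising before averaging keeps a test with a wider raw score range from dominating the composite, which is what "equally weighted" is taken to mean here.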
In terms of construct validity, we found a strong correlation between the Remote Digital Memory Composite and the PACC5. This correlation was present in both non-complaining healthy older adults and those with memory complaints, indicating that the correlation was not driven by collating an impaired and a non-impaired group as two extremes into the same analysis. The fact that the correlation also held within memory complainers (SCD and MCI) and that all of these individuals were recruited on the basis of referrals (as opposed to recruitment advertisements) indicates that the construct validity would also hold in a health care setting. In terms of reliability, we found a high correlation between two different instances of the Remote Digital Memory Composite conducted within a time interval of ∼12 weeks.
The Remote Digital Memory Composite identified individuals with an MCI-grade impairment in the PACC5 with an AUC of 0.9. This allowed us to identify individuals with MCI-grade impairment with a sensitivity of 0.83 and a specificity of 0.74 on the basis of a single assessment of the Remote Digital Memory Composite using optimal cut-offs. In this study, we used the PACC5 to define an MCI-grade cut-off between impaired and unimpaired individuals for several reasons. First, neuropsychological assessments that are used to identify memory impairment in the context of MCI are often based on a single test, such as delayed verbal recall. Validating against a single test could potentially undermine the generalizability of the Remote Digital Memory Composite across different clinical settings and MCI populations where a different test was used as a criterion. Validating against a composite that includes several dedicated assessments protects against potential validation distortions caused by single test-based criteria. Second, the PACC5 is also a measure optimized to detect longitudinal decline. Hence, validating against the PACC5 also holds the promise that the Remote Digital Memory Composite would be equally sensitive to longitudinal decline, but much easier to implement widely. Third, in the DELCODE sample, the diagnostic classification of each individual was performed at the baseline visit. However, participants were recruited into the mobile add-on study on average 1.5 years later. Hence, there was the possibility that some of the SCD participants had already progressed to MCI or that some of the MCI diagnoses had to be reverted to SCD. Given this uncertainty, defining a cut-off distinguishing between MCI and all pre-MCI groups based on the closest-in-time PACC5 assessment provided a more accurate approach for classifying impaired and non-impaired individuals several years after their established diagnoses.
The Remote Digital Memory Composite allowed us to differentiate individuals with and without PACC5-based MCI-grade impairment with high diagnostic accuracy. This accuracy is higher than or comparable to that of several other recently reported unsupervised (Mackin et al., 2018) or in-clinic and supervised digital cognitive assessments (Alden et al., 2021; Groppell et al., 2019; Kalafatis et al., 2021; Ye et al., 2020). Importantly, however, several earlier approaches reported outcomes by comparing MCI patients against samples that exclusively consisted of healthy asymptomatic older adults (Alden et al., 2021; Groppell et al., 2019; Kalafatis et al., 2021; Mackin et al., 2018; Maruff et al., 2013; Ye et al., 2020). In health care settings, the main challenge is to identify significant impairment within memory complainers. Therefore, we believe that our focus on memory complainers and the inclusion of a large number of SCD patients who sought medical advice (and hence were not recruited through advertisements) in this sample is a major advance in the validation and critical for future application.
Usability is a major limitation for mobile device-based assessments of cognition in old age and particularly in preclinical and prodromal AD. While participants were assisted during the installation of the neotiv mobile app and received a printed manual when their consent was obtained at the memory clinic, all three tests were conducted fully remotely and without supervision. Participants received a push notification on their mobile device each time a test was available. All instructions and guidance for performing the tests were provided in the app and included a training run of each test. Participants were also instructed to seek a quiet place where they would not be distracted, and after each test they were asked in a questionnaire whether they had been able to perform the test without distraction. Adherence to the mobile tests was good, with a maximum of 15% of participants dropping out after 6 tests within a period of at least 12 weeks. Our results thus indicate that it is possible to achieve the level of usability that is required to perform a detailed assessment of episodic memory fully remotely and without any supervision in a memory complainer cohort.
The total testing time required to obtain the Remote Digital Memory Composite (a single run of ORR, CSR and MDT-OS) was ∼45 minutes. In principle, all three tests could be completed within a single day. However, we decided not to enforce the shortest possible acquisition time. Instead, we decided to leverage the opportunities of mobile and unsupervised testing to achieve a more meaningful implementation. To that end, we stretched out the assessment over several weeks to enable a more representative sampling of memory performance over time and thereby make the composite less vulnerable to day-to-day performance fluctuations. We also used the spaced testing to ease stress for the patients and to avoid worries and complaints from patients who felt they were being tested on a bad day. Thus, the Remote Digital Memory Composite reflects memory performance over a period of several weeks rather than a single day, something that would be very difficult to implement with a supervised testing approach.
Episodic memory tests such as the FCSRT (Buschke, 1984) and the other elements of the PACC5 place heavy demands on verbal abilities. This significantly reduces applicability in international trials or in conditions with mild language disorders (e.g., due to a vascular event or primary progressive aphasia) (Costa et al., 2017). The three tests of the Remote Digital Memory Composite established here, however, are not dependent on verbal abilities such as naming, word-finding or pronunciation and thereby facilitate testing across different dementia syndromes and subtypes of AD as well as in international comparisons. Furthermore, the Remote Digital Memory Composite shows no overlap with the PACC5 in terms of the paradigm and modalities tested so that there would be no interference with a memory-clinic or trial-based PACC5 assessment following case-finding.
In the currently used implementation of the ORR and CSR tests, we did not strictly enforce adherence to the planned retrieval-delay intervals of these tests, which led some individuals to perform recall and recognition assessments after longer than planned delays. When we restricted the diagnostic accuracy analysis of the Remote Digital Memory Composite for discriminating MCI-grade impairment in the PACC5 to those individuals who adhered more strictly to the delay intervals in the ORR and CSR, the AUC increased numerically to 0.92. This might indicate that in a health care implementation of the Remote Digital Memory Composite, it could be beneficial to optimize usability towards a stricter enforcement of delay intervals.
This study has a number of shortcomings. First, our results are based on a single study with a modest sample size and thus need to be cross-validated across independent cohorts and different countries. Second, while we could show evidence for limited relationships between the Remote Digital Memory Composite and sample demographics, a large and diverse norm sample is needed in order to adjust norm scores for various covariates. Third, our sample size was not yet sufficient to assess the relationship with AD biomarkers and the diagnostic accuracy in biomarker-stratified subgroups. Finally, the number of follow-up remote assessments in our sample did not yet allow us to assess the added benefit of calculating a mean composite across several repetitions of each test over a longer assessment period.
Taken together, the high construct validity and retest reliability of the Remote Digital Memory Composite score in a memory clinic setting pave the way for implementing mobile app-based remote assessment in clinical studies as well as in health care. The current data indicate that the Remote Digital Memory Composite can facilitate case-finding whenever the main question is about an individual’s impairment based on a comprehensive neuropsychological assessment score. Future studies need to show whether repeated assessments of the Remote Digital Memory Composite over time will be sensitive to cognitive change.
Data Availability
The data that support this study are not publicly available, but may be provided upon reasonable request to the authors and pending a material transfer agreement with the DZNE.
Acknowledgments
ED and DB are co-founders and hold shares of neotiv GmbH. OB and IH are full employees of neotiv GmbH.
Footnotes
Registration German Clinical Trials Register (DRKS00007966), retrospectively registered (04/May/2015)