Abstract
Objective The Stricker Learning Span (SLS) is a computer-adaptive digital word list memory test specifically designed for remote assessment and self-administration on a web-based multi-device platform (Mayo Test Drive). We aimed to establish criterion validity of the SLS by comparing its ability to differentiate biomarker-defined groups to the person-administered Rey’s Auditory Verbal Learning Test (AVLT).
Participants and Methods Participants (N=353; mean age=71, SD=11; 93% cognitively unimpaired [CU]) completed the AVLT during an in-person visit, the SLS remotely (within 3 months) and had brain amyloid and tau PET scans available (within 3 years). Overlapping groups were formed for 1) those on the Alzheimer’s disease (AD) continuum (amyloid PET positive, A+, n=125) or not (A-, n=228), and those with biological AD (amyloid and tau PET positive, A+T+, n=55) vs no evidence of AD pathology (A-T-, n=195). Analyses were repeated among CU participants only.
Results The SLS and AVLT showed similar ability to differentiate biomarker-defined groups when comparing AUROCs (p’s>.05). In logistic regression models, SLS contributed significantly to predicting biomarker group beyond age, education and sex, including when limited to CU participants. Medium (A- vs A+) to large (A-T- vs A+T+) unadjusted effect sizes were observed for both SLS and AVLT. Learning and delay variables were similar in terms of ability to separate biomarker groups.
Conclusions Remotely administered SLS performed similarly to in-person-administered AVLT in its ability to separate biomarker-defined groups, providing evidence of criterion validity. Results suggest the SLS may be sensitive to detecting subtle objective cognitive decline in preclinical AD.
1. Introduction
The need for efficient and scalable approaches for identifying individuals at risk for preclinical and prodromal Alzheimer’s disease (PAD) is paramount to ongoing clinical trial efforts, emerging decentralized trials, and for identifying individuals who will most benefit from currently available pharmacologic or behavioral treatments, or those on the horizon1,2. Self-administered cognitive measures that can be completed “remotely” (i.e., outside of a typical clinical setting, including at home) are a critical component of an early PAD detection strategy3,4. Frequently, tests originally designed for and validated within clinic settings are converted to remote use to increase access for those unable to readily visit research centers or due to necessity during the COVID-pandemic5-7. The limitation of this “conversion” approach is that tests are not developed specifically with remote self-administration as a priority for test design decisions, which can contribute to mixed findings when performance is compared across settings8-10. There is an urgent need for valid self-administered cognitive assessment tools designed specifically for remote use. Verbal memory measures are among the most sensitive to early changes in the Alzheimer’s disease (AD) process11 but are also challenging to adapt to remote, self-administered methods5.
The Stricker Learning Span (SLS) is a digital computer-adaptive word list memory test specifically designed for remote assessment12. The SLS is administered via a web-based multi-device platform designed for unsupervised self-administration of digital cognitive tests, Mayo Test Drive (MTD): test Development through Rapid Iteration, Validation and Expansion (DRIVE)13. Recent work has highlighted learning as a key deficit in PAD, conceptualized as a failure to benefit from repeated exposure14 or a lack of benefit from practice15,16. In line with this, SLS test design emphasizes learning. The SLS paradigm was influenced by cognitive science principles and neural network process simulations12. The SLS stresses the contextual system during learning through use of high frequency word stimuli and variations in word item-level imagery to increase difficulty.
Preliminary support for the feasibility, reliability and validity of the SLS was previously reported in an all-female older adult sample13. Whereas that prior study used traditional approaches to test validation, the current study aimed to establish test validity using a novel approach to avoid the inherent issues with existing validation approaches. For example, one common approach is to correlate a new test with existing cognitive tests. However, existing tests, while well established, are themselves imperfect measures of hypothesized underlying constructs17. Another frequent approach is to establish validity by examining the ability of a new test to differentiate clinically defined groups. In the AD field, for example, it is common to establish the clinical validity of a new test by comparing individuals who are cognitively “normal” or unimpaired to individuals with mild cognitive impairment (MCI) or dementia; however, this introduces circularity because the use of cognitive tests is central to establishing those syndromal classifications. In vivo biomarkers offer an alternative ground truth for test validation studies that is completely independent of cognitive test performance. This is akin to validation with neuropathological diagnosis at autopsy given the correspondence between antemortem PET imaging and autopsy findings but has the notable benefit of being feasible during life18,19. A research framework is now available to use available AD biomarkers to characterize participants using the amyloid (A), tau (T) and neurodegeneration (N), or the AT(N), system20. Imaging biomarkers of N are considered non-specific to AD and will not be included in the current manuscript to limit the number of subgroups. Individuals with evidence of elevated amyloid (A+) are considered to show Alzheimer’s pathologic change. An in vivo biological diagnosis of AD is defined by the presence of both A+ and elevated tau (T+).
The objective of this study was to determine the criterion validity of the SLS. Critically, this validation study was limited to unsupervised completion of the SLS in a remote environment outside of a typical clinical research setting. Our primary study hypothesis (Aim 1) was that remotely administered SLS and in-person-administered Rey’s Auditory Verbal Learning Test (AVLT) would differentiate AD biomarker-defined groups similarly. This hypothesis is tested in groups defined by biomarker status alone (A+ vs A- and A+T+ vs A-T-) to avoid circularity. Secondary hypotheses included that the SLS would be sensitive to preclinical AD in analyses limited to CU participants (Aim 2), SLS and AVLT would show significant correlations to support convergent validity (Aim 3), and that word list learning vs. delay indices would show similar sensitivity to biologically defined AD (A-T- vs A+T+; Aim 4).
2. Methods
Most participants were recruited from the Mayo Clinic Study of Aging (MCSA), a longitudinal population-based study of aging among Olmsted County, Minnesota, residents. Participants are randomly sampled by age- and sex-stratified groups using the resources of the Rochester Epidemiology Project medical records-linkage system, which links the medical records from all county providers21. All participants are without dementia at MCSA enrollment. Exclusion criteria include terminal illness or hospice. Participants complete study visits every 15 months that include a physician exam, study coordinator interview, and neuropsychological testing22. The physician exam includes a medical history review, complete neurological exam, and the Short Test of Mental Status (STMS)23. The study coordinator interview with an informant includes the Clinical Dementia Rating® scale24. Participants complete multi-domain battery of nine neuropsychological tests administered by a psychometrist22. The interviewing study coordinator, examining physician, and neuropsychologist initially each make an independent diagnostic determination. A final diagnosis of cognitively unimpaired, MCI25 or dementia26 is then achieved through consensus agreement22,25. The diagnostic evaluation does not consider prior clinical information, prior diagnoses, SLS performance, or knowledge of biomarkers.
To enrich the sample for participants with cognitive impairment, additional participants were recruited from the Mayo Alzheimer’s Disease Research Center (ADRC).
The study protocols were approved by the Mayo Clinic and Olmsted Medical Center Institutional Review Boards. All participants provided informed consent.
In Vivo Neuroimaging Markers of Amyloid and Tau
The most recent imaging available ±3 years of baseline SLS was used. Amyloid and tau positivity is determined using Pittsburgh Compound B PET (PiB-PET) and tau PET (flortaucipir)27-29. PET images are acquired using a GE Discovery RX or DXT PET/CT scanner. A global cortical PiB PET standard uptake value ratio (SUVR) is computed by calculating the median uptake over voxels in the prefrontal, orbitofrontal, parietal, temporal, anterior cingulate, and posterior cingulate/precuneus regions of interest (ROIs) for each participant and dividing this by the median uptake over voxels in the cerebellar crus gray matter; for tau PET we utilize median uptake over the voxels in the meta regions consisting of entorhinal, amygdala, parahippocampal, fusiform, inferior temporal, and middle temporal ROIs normalized to the cerebellar crus gray matter28. Cutoffs to determine amyloid and tau positivity are SUVR ≥ 1.48 (centiloid 22)30 and ≥ 1.2528 respectively, to maintain consistency with our past Cogstate-focused work31-33.
Person-administered AVLT completed in clinic
The psychometrist reads a list of 15 words (List A) aloud and asks the participant to repeat back as many words as they can recall. This is repeated 5 times (learning trials 1-5 total). A distractor list (List B) is presented, followed by short-delay (Trial 6) of list A words. Recall of List A is again tested after 30-minutes (30-minute delay), followed by written recognition34,35. The primary variable for this study is sum of trials (trials 1–5 total + trial 6 + 30-min recall; range 0-105), which is sensitive to early changes in memory36. Additional variables include correct words on trials 1-5 total and trial 5 (thought to reflect learning), as well as 30-minute delay. Long-term percentage retention (AVLT 30-minute delay / Trial 5), thought to reflect delayed memory or storage/savings, is also reported.
Self-administered SLS completed remotely (not in clinic)
All participants completed the SLS remotely and without supervision or assistance. Participants followed a link provided in an email to complete the test session. The SLS is administered via the Mayo Test Drive (MTD) platform13.
The SLS is a 5-trial adaptive list learning task. Single words are visually presented sequentially during learning trials. After each list presentation, memory for the word list is tested with 4-choice recognition. Following a computer adaptive testing approach, the SLS starts with 8 items and then the number of words either stays the same, increases by 5 or decreases by 2 according to pre-specified rules based on percentage of correct responses to extend the floor and ceiling relative to traditional word list memory tests (range 2-23 words; Figure 1). Short delay follows the Symbols Test13,37,38; all items presented on any learning trial are tested during delay (range 8-23).
The SLS uses common, high frequency words that are easier to recall, but harder to recognize39, as previously described12. There are 23 item bins with 4 words each, and words within a bin have similar imagery ratings40. Each successive item bin has lower imagery ratings, thus increasing the difficulty of subsequent items. Most (90%) of the 92 total words used are on the Dolch sight words reading list (preschool through Grade 3), with half at the preschool level41. Each test session randomly selects 1 word from each item bin as the “target,” the 3 others serve as the foils and a 23-item target word list is generated, even if not all items are presented due to the adaptive procedure. To reduce recency effects, the order of item presentation is randomized for each trial and the last item presented is never the first tested. The primary variable is sum of trials (total correct across learning trials 1-5 plus delay, range 0-108). Secondary variables include maximum (max) learning span across any trial (range 0-23), total correct across learning trials (1-5 total, range 0-85), and total correct short delay (range 0-23). Percent retention (max span / delay) is also reported to allow within-test comparisons, but this measure is not meant to be compared to AVLT percent retention as differences are expected based on differences in test design.
Inclusion Criteria
To be included in this study, participants had to have both SLS sum of trials and AVLT sum of trials available and an amyloid PET scan within 3 years. Participants who completed the SLS as of 7/7/22 were included in this manuscript. Available linked data as of 8/22/22 were included.
Statistical methods
Demographics and clinical characteristics were descriptively summarized using counts and percentages for categorical data and means and standard deviations for continuous data. Data distributions across groups (A- vs A+ and A-T- vs A+T+) were compared using chi-square tests for categorical variables and linear model ANOVA tests for continuous variables. Pearson correlation coefficients were used to characterize the linear relationship between AVLT and SLS variables. Unadjusted and adjusted Hedge’s G with weighted and pooled standard deviation was used to assess effect size for group comparisons. Unadjusted and adjusted logistic regression models were used to determine the predictive accuracy of AVLT and SLS sum of trials in predicting abnormal amyloid PET (A+ vs A-) and abnormal amyloid and tau PET (A+T+ vs A-T-). To formally compare the ability of both tests to differentiate biomarker-defined groups, the AUROC from models with AVLT were directly compared to models with SLS42. For models that adjusted for demographic variables, age, sex, and education were the adjustment terms for both Hedge’s G and logistic regression models. A 2-sided p-value <0.05 was considered statistically significant. All analyses were performed using R version 4.1.2.
3. Results
3.1 Participant Characteristics and Convergent Validity
The mean age of the 353 participants was 71.8 (SD=10.8) years, mean education was 15.7 (SD=2.4), 53.5% were male, 98.0% were White and 92.6% were cognitively unimpaired (Table 1). MTD remote testing was completed within half a month (mean) of the in-person visit. Participant characteristics by biomarker subgroups are reported in Table 2. As expected, based on known increases in A+ and T+ rates with increased age27-29, biomarker positive groups were older than biomarker negative groups (p’s < .05). Biomarker positive and negative groups showed similar years of education and sex distribution (p’s > .05).
SLS sum of trials and AVLT sum of trials were strongly correlated (r = 0.62, p < .001). Additional correlations are reported in Table 3 and Supplemental Figure 1.
3.2 The SLS shows similar ability to differentiate PET-defined biomarker groups compared to the AVLT (Aim 1, all participants)
3.2.1 AUROC comparisons
Total AUROC values for SLS sum of trials vs. AVLT sum of trials were similar for differentiating biomarker groups (p’s > .05; Table 4, Figure 2). This similarity was seen for all pairwise AUROC comparisons (adjusted and unadjusted models; A- vs A+ and A-T- vs A+T+).
3.2.1a Amyloid Groups
Unadjusted models using only the primary cognitive variable as the predictor show that both the SLS and AVLT significantly differentiate A- vs A+ (AUROCs of 0.63 and 0.64, respectively). Adjusted models that include demographic variables increase the overall AUROC values of the full model (both AUROCs=0.76), and both the SLS and AVLT significantly improve biomarker group prediction over and above the demographic variables.
3.2.1b Amyloid and Tau Groups
Unadjusted models using only the primary cognitive variable as the predictor show that both the SLS and AVLT significantly differentiate individuals without AD biomarkers (A-T-) from those with biological AD (A+T+; AUROCs of 0.72-0.73). Adjusted models that include demographic variables increase the overall AUROC values of the full model (AUROCs of 0.83-0.84), and both the SLS and AVLT significantly improve biomarker group prediction over and above the demographic variables.
3.2.2 Descriptive effect sizes from group difference analyses (Table 5)
3.2.2a Amyloid Groups
Unadjusted analyses showed medium group effect sizes for both SLS sum of trials (g = -0.51) and AVLT sum of trials (g = -0.55) for differentiating A- and A+ groups (Figure 3). Effect sizes after adjusting for demographics were decreased and small in magnitude (g’s = -0.26 to -0.27), and group differences remained significant.
3.2.2b Amyloid and Tau Groups
Unadjusted analyses showed large group difference effect sizes for SLS sum of trials and AVLT sum of trials (both g’s = -0.88) for differentiating A- T- and A+T+ groups (Figure 3). Effect sizes after adjusting for demographics were decreased but remained moderate in magnitude and significant for both SLS sum of trials (g = -0.53) and AVLT sum of trials (g = -0.51).
3.3 The SLS shows similar ability to differentiate PET-defined biomarker groups compared to the AVLT for preclinical AD (CU participants only; Aim 2)
Because the majority (93%) of participants in the overall sample are CU, limiting analyses only to CU participants had relatively little impact on the results. Findings show that the SLS is sensitive to preclinical AD, consistent with our Aim 2 hypothesis.
3.3.1 AUROC comparisons
Total AUROC values for SLS sum of trials vs. AVLT sum of trials were similar for differentiating biomarker groups in CU participants (p’s > .05 for each pairwise comparison; Table 4).
3.3.1a Amyloid Groups
Unadjusted models using only the primary cognitive variable as the predictor show that both the SLS and AVLT significantly differentiate A- vs A+ (both AUROCs = 0.63). Adjusted models that include demographic variables increase the overall AUROC values of the full model (AUROCs=0.76-0.77), and both the SLS and AVLT significantly improve biomarker group prediction over and above the demographic variables.
3.3.1b Amyloid and Tau Groups
Unadjusted models using only the primary cognitive variable as the predictor show that both the SLS and AVLT significantly differentiate individuals without AD biomarkers (A-T-) from those with biological AD (A+T+; AUROCs = 0.67-0.69).
Adjusted models that include demographic variables increase the overall AUROC values of the full model (AUROCs = 0.81-0.83). The SLS significantly improved biomarker group prediction over and above the demographic variables; the AVLT approached significance (p=0.06).
3.3.2 Descriptive effect sizes from group difference analyses (Table 5)
3.3.2a Amyloid Groups
Unadjusted analyses showed small to medium group effect sizes for both SLS sum of trials (g = -0.47) and AVLT sum of trials (g = -0.48) for differentiating A- and A+ groups (Figure 3). Effect sizes after adjusting for demographics were decreased and small in magnitude (g’s = -0.18 to -0.22), and group differences did not reach significance.
3.3.2b Amyloid and Tau Groups
Unadjusted analyses showed medium group difference effect sizes for SLS sum of trials (g = -0.74) and AVLT sum of trials (g = -0.62) for differentiating A-T- and A+T+ groups (Figure 3). Effect sizes after adjusting for demographics were decreased (small in magnitude) with SLS sum of trials remaining significant (g = -0.37) and AVLT sum of trials no longer reaching significance (g = -0.25).
3.4 Learning measures show similar ability as delay memory measures to differentiate A-T- and A+T+ groups for both the SLS and the AVLT
3.4.1 All participants
All secondary SLS and AVLT variables show results in the expected direction with lower performance in the A+T+ compared to A-T-group (p’s < .05 for both unadjusted and adjusted analyses), with generally similar effect sizes across comparable SLS and AVLT variable pairs (Table 5). Within-test descriptive comparisons show that learning variables (1-5 total) show effect sizes that are similar in magnitude as delay variables. For example, SLS 1-5 (g = -.86) and delay (g = -.88) both show large unadjusted effect sizes, and AVLT 1-5 (g = -.82) and delay (g = -.86) also show large unadjusted effect sizes (Figure 4). Trials 1-5 total (g=-.86 SLS and -.82 AVLT) may be a slightly more advantageous learning measure for group discrimination relative to SLS max span (−.81) or AVLT Trial 5 (−.74). Similarly, comparison of two different types of delayed memory measures suggests a slight advantage for delay total correct relative to retention (−.88 vs -.55 for SLS, respectively and -.86 vs -.79 for AVLT, respectively). See Figure 5 for a visualization of trial-by-trial data for both the SLS and AVLT. See Supplemental Tables 1 and 2 for results of other memory tests.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request and with appropriate institutional approvals.
4. Discussion
This study followed a novel approach to test validation and established the criterion validity of an unsupervised computer adaptive word list memory test (SLS) completed outside of a clinic setting. The SLS differentiates AD biomarker-defined groups as well as a traditional word list recall test (AVLT) administered by trained psychometrists in a clinic setting. Specifically, our Aim 1 hypothesis was supported by AUROC comparisons that showed remotely administered SLS sum of trials and in-person-administered AVLT sum of trials have comparable ability to differentiate individuals on the Alzheimer’s continuum (A+) or not (A-) and individuals meeting a research framework for a biological diagnosis of AD (A+T+) or not (A-T-) in a predominantly cognitively unimpaired sample.
In line with our prior results showing that the AVLT has the potential to be useful for detecting subtle objective cognitive decline in preclinical AD 33, our current results extend this prior AVLT finding and suggest that the SLS also has promise in this regard (Aim 2). Specifically, when limiting the sample to CU participants, our AUROC results show that the SLS by itself could help predict, better than chance, which individuals had elevated brain amyloid vs. did not and which had elevated brain amyloid and tau vs did not. In contrast, our prior work examining the utility of the Learning/Working Memory index comprised of visual recognition and working memory tasks from the Cogstate Brief Battery administered in clinic did not significantly differentiate biomarker groups better than chance (CU A-T- vs CU A+T+ or CU A- T- vs CU A+T-) and showed that the AVLT was significantly better than the Learning/Working Memory Index for differentiating CU A-T- vs CU A+T- when comparing total AUROCs. While the predictive ability of the SLS by itself is relatively modest, predictive ability improves when demographic variables are added to the model and the SLS continues to show an independent effect over and above demographic variables. For example, a model with age, sex, education and SLS sum of trials together had an AUROC of 0.83 for predicting A+T+ status in CU participants. Thus, our current results suggest that the SLS could be a scalable, easily accessible addition to a multivariable model approach that improves overall prediction of AD risk. Given its capacity for remote self-administration, the SLS would be a good candidate screening measure in non-specialty care settings to use in combination with plasma biomarkers to inform the need for further work-up.
The highly overlapping results for the ability of the SLS and AVLT to discriminate AD biomarker groups are particularly interesting given that although the SLS was designed to mimic the sensitivity of the AVLT, it was not designed to be a one-to-one adaptation of the AVLT. Results of correlation analyses align with this intent and support our hypothesis that the SLS and AVLT would show a significant correlation (r = 0.62 observed), and therefore further support convergent validity (Aim 3). Our initial pilot study similarly supported the convergent validity of the SLS, but slightly lower correlation coefficients were observed in that study, likely due to homogeneity of that sample (e.g., all female, restricted age range, excluded individuals with dementia), and potentially due to a longer duration between in-person and remote testing (average 10 months)13. The current results are particularly notable given that an exact computerized replication of the AVLT facilitated by audio recording and speech recognition to allow self-administered on an iPad showed only slightly higher correlations (r=0.63-0.70) with the AVLT as typically administered in a well-controlled cross-over design completed in a clinic setting43.
We also qualitatively compared commonly derived supraspan wordlist indices within each test given that the learning indices may have equivalent utility for the early detection of AD as delayed memory indices44,45. Results supported our hypothesis that word list learning measures would show similar sensitivity as word list delay memory measures to biologically defined AD (A-T- vs A+T+; Aim 4). The effect sizes of learning trials and delayed memory are similar (Figure 4). We chose to focus on delay items correct instead of percent retention (i.e., savings) for the comparison to learning indices. It is important to note that retention is largely dependent on specific test design characteristics including the influence of serial position effects46-48. The SLS randomizes word order to minimize the recency effect often observed in individuals with Alzheimer’s dementia that leads to an over-estimate of true forgetting49. For example, learning to criterion studies demonstrate that individuals in the early stages of AD take a longer time to reach criterion, reflective of lower learning ability, whereas once learning is equated by reaching criterion rates of forgetting are similar to healthy control participants48,50,51. Accordingly, SLS learning indices (1-5, max span) demonstrate a larger effect size than SLS retention in biologically defined AD versus those without AD biomarkers, whereas AVLT retention effect size is more similar to AVLT learning indices.
This study has several strengths. Most studies examining the ability of cognitive measures to differentiate biomarker groups have focused on differentiation of A+ versus A-individuals52,53. Our inclusion of tau status defined by PET imaging, in addition to amyloid, is a strength. Our population-based sample helps to increase generalizability to clinical settings where comorbidities are common. Our approach of reporting both unadjusted and adjusted effect sizes helps illustrate the robust biomarker-group difference effect sizes observed in unadjusted analyses. For example, the SLS showed a large group difference effect size across A-T- and A+T+ groups in all participants (−0.88), that decreased to a medium effect size when controlling for demographics (−0.53) and was further attenuated when limited to CU only (−0.74 unadjusted, -0.37 adjusted). There is growing evidence that cognitive decline is not a normal part of the aging process, but rather, is reflective of previously undetected neuropathologies that increase in prevalence with increasing age54-56. To maximize the utility of cognitive measures for informing risk of AD biomarker positivity, we recommend the use of raw scores and argue that the effect of age should not be routinely “adjusted” away as it decreases the predictive power of cognition.
Several limitations should also be noted. First, because this is a population-based study, the racial and ethnic characteristics reflect that of Olmsted County from which participants are randomly sampled, resulting in a predominantly White, Non-Hispanic sample. Second, the predominantly CU composition of this sample may have decreased AUROC values and magnitude of effect sizes for differentiating biomarker positive and negative groups, as suggested by the highly similar results seen when limiting analyses only to CU participants. A more balanced sample design, with more inclusion of individuals with mild to moderate dementia, could suggest greater utility of these memory measures for identifying individuals at risk for biomarker positivity; the current results are more relevant for preclinical detection given that 93% of our sample is CU. Third, a majority of the sample (83%) had prior exposure to the AVLT given the longitudinal nature of the MCSA and ADRC studies, thus practice effects could have impacted the ability of the AVLT to discriminate biomarker groups. Because biomarker negative participants benefit more from practice effects than biomarker positive participants, it is possible this could have amplified group difference effects for the AVLT15,57. Future work is needed to replicate these results in a setting where both the SLS and AVLT are baseline administrations. Similarly, the entirely unsupervised and remote approach for the SLS could dampen the sensitivity of the SLS as the results presented in this study include all available remote data. While we capture participant-reported information about test interference, noise in the test environment, and participant comments that can provide additional information about test interruptions or environmental considerations, because our goal was to establish the robust criterion validity of the SLS “in the wild,” we did not apply any exclusionary criterion based on this information in the present study. Another reason we did not want to prematurely apply such exclusionary criteria is that individuals who are less able to follow instructions provided to establish the recommended test environment may be more likely to have cognitive impairment. Thus, increased likelihood of lower test performance in an uncontrolled environment, worsened by environmental distractions, could also be related to risk of cognitive decline. If cognitive screening / risk for cognitive decline is the goal, worse performance in remote settings may help identify risk in a way not captured by controlled clinical settings, adding an element of ecological validity in terms of ability to adapt to a new task without assistance. Future work will examine whether and to what degree these factors may influence test performance, as increased distractions in the home environment can negatively impact performance58. Finally, while use of a strictly biomarker-defined ground truth is a novel aspect of this study, in vivo PET biomarkers also have some limitations such as high but imperfect reliability (manifested by “noise” in the trajectories of imaging results over time in some individuals), and the fact that PET measures of amyloid and tau pathology have a sensitivity floor and medically significant pathology can exist that lies beneath this detection threshold59. Also, we adopted a liberal window for inclusion of available biomarker data to allow for some missed scanning opportunities during the COVID-19 pandemic, to maximize the sample size and because of the generally good stability of amyloid and tau classifications60, but this also decreases study precision relative to a narrower time window.
In summary, SLS test design prioritized remote assessment needs and a computer-adaptive approach12. Even though the SLS is not a direct adaptation of the AVLT, our results show highly similar ability of the remotely-administered SLS and in-person-administered AVLT to differentiate AD biomarker-defined groups. These results challenge preconceived notions about memory assessment by showing that creative use of a recognition memory paradigm that emphasizes learning in an all-remote unsupervised sample differentiates AD biomarker-defined groups as effectively as a traditional word list memory measure based on free recall responses.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request and with appropriate institutional approvals.
Acknowledgment
The authors wish to thank the participants and staff at the Mayo Clinic Study of Aging and Mayo Alzheimer’s Disease Research Center. Research reported in this publication was supported by the Kevin Merszei Career Development Award in Neurodegenerative Diseases Research IHO Janet Vittone, MD, the Rochester Epidemiology Project (R01 AG034676), the National Institute on Aging of the National Institutes of Health (grant numbers R21AG073967, P30 AG062677, P50 AG016574, U01 AG006786, RF1 AG55151, R01 AG041851, R37 AG011378), the Robert Wood Johnson Foundation, The Elsie and Marvin Dekelboum Family Foundation, GHR Foundation, Alzheimer’s Association, and the Mayo Foundation for Education and Research. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. A Mayo Clinic invention disclosure has been submitted for the Stricker Learning Span and the Mayo Test Drive platform (NHS, JLS). We have no other conflicts of interest to disclose related to this work.
The following disclosures are present, outside the scope of this work: NHS has received research support from NIH and Biogen. MMMi has consulted for Biogen, Eli Lilly, Lab Corp, Merck, and Siemens Healthineers and receives research support from NIH. MMMa and JAF receive research support from NIH. DSK serves on a Data Safety Monitoring Board for the DIAN-TU study and is an investigator in clinical trials sponsored by Lilly Pharmaceuticals, Biogen, and the University of Southern California. RCP has served as a consultant for Hoffman-La Roche Inc., Merck Inc., Genentech Inc., Biogen Inc., Eisai, Inc. and Nestle, Inc. He also receives NIH support. CRJ serves on an independent data monitoring board for Roche but he receives no personal compensation from any commercial entity. CRJ receives research support from NIH and the Alexander Family Alzheimer’s Disease Research Professorship of the Mayo Clinic.