Abstract
INTRODUCTION Over 50% of logopenic variant primary progressive aphasia (lvPPA) cases are associated with Alzheimer’s disease (AD) pathology, yet the speech of lvPPA patients has rarely been compared directly with that of amnestic AD patients. In this study, we compared the speech of AD and lvPPA patients in a biologically confirmed cohort.
METHODS We extracted language variables with automated lexical and acoustic pipelines from oral picture descriptions produced by 44 AD and 21 lvPPA patients.
RESULTS LvPPA patients produced fewer verbs and adjectives and more fillers, with lower lexical diversity and a higher pause rate, than AD patients. Compared with healthy controls (HC), both groups showed some shared language impairments, including more frequent and shorter words. Some of these measures were related to clinical test scores and CSF p-Tau levels.
DISCUSSION Our speech measures captured subtle differences between the two phenotypes. Also, shared speech markers were linked to the common underlying pathology. This work demonstrates the potential of natural speech in detection of underlying AD pathology.
1. Background
Speech production is a complex behavior, involving coordinated activation of multiple regions of the brain; thus, examining speech production provides potential opportunities to identify neurodegenerative disease markers. Since Alzheimer’s disease (AD) accounts for up to 80% of patients with dementia,1 much attention has been paid to cognitive profiling of AD, including the linguistic domain. Language produced by amnestic AD patients has been described as “empty,” with an abundance of nonspecific words, circumlocutions, and sparse content.2
Logopenic variant primary progressive aphasia (lvPPA) is a recently identified PPA variant,3,4 and it is an atypical, non-amnestic form of AD pathology,3,5–9 with over 50% of cases associated with underlying AD pathology.9,10 Since the identification of this PPA variant, many studies have been dedicated to characterizing its linguistic features. Previous studies have shown that lvPPA patients speak slowly, with impaired lexical access,11 poor phonemic discrimination,12 limited auditory-verbal short-term memory, naming impairment,4,5 and dysfluencies.13
Previous studies of neurodegenerative patients suggest that language features are especially useful as a prescreening tool,14–18 because speech is easy to collect and non-invasive, yet highly sensitive to cognitive impairments. Despite the shared pathology of AD and lvPPA, most previous studies have profiled the language of the two syndromes separately, and few have compared them directly. This leaves an important gap in the literature. To use language as a prescreening tool for AD pathology and to benefit lvPPA patients by including them in AD clinical trials, understanding the language characteristics of both AD and lvPPA is crucial. In this study, we identified similarities and differences between biologically confirmed AD and lvPPA patients by analyzing semi-structured, natural speech samples with automated methods. Based on previous studies, we hypothesized that lvPPA patients would produce more dysfluent speech with more limited lexical content than AD patients. We also hypothesized that, because of their shared pathology, AD and lvPPA patients would share linguistic features, including decreased speech production. We associated language variables with clinical test scores and CSF analytes for additional validation and mechanistic clarification.
2. Methods
2.1 Participants
We examined oral picture descriptions that were collected from 93 participants in the Department of Neurology at the Hospital of the University of Pennsylvania. Forty-four participants had amnestic AD and 21 had lvPPA; all 65 patients had confirmed underlying AD pathology, based on autopsy (n=15) or on cerebrospinal fluid (CSF) analyte levels (n=50), using published cutoffs19,20 of phosphorylated Tau (p-Tau)/beta-amyloid 42 (Aβ) ≥ 0.09 and total Tau/Aβ ≥ 0.34. Twenty-eight matched elderly healthy controls (HC) were included as a control group. Participants with other neurological, psychiatric, or medical conditions that could impact cognition were excluded.
Table 1 shows the demographic and clinical characteristics of the participants. The groups did not differ in age, sex, or education level. The patient groups did not differ from each other in disease duration (p=0.45), the Mini-Mental State Exam (MMSE; p=0.12), the Boston Naming Test (BNT; p=0.32), CSF p-Tau level (p=0.81), CSF Aβ level (p=0.36), or CSF p-Tau/Aβ ratio (p=0.66).
2.2 Data collection
We digitally recorded the participants’ descriptions of the Cookie Theft picture from the Boston Diagnostic Aphasia Examination.21 Recordings were orthographically transcribed. Only the earliest recording of each participant was analyzed with our lexical and acoustic pipelines, as described below.
2.3 Lexical pipeline
We automatically tagged the part-of-speech (POS) category of all tokens, using spaCy22 with its large English model (‘en_core_web_lg’). The count of each POS category was tallied and converted to a count per 100 words to control for the total number of words per participant. The number of tense-inflected verbs per 100 words was also calculated by summing the number of modal auxiliaries, past-tense verbs, and present-tense verbs in the POS tags. Dysfluency markers, including fillers, repetitions, and partial words, were counted separately and converted to counts per 100 words.
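The per-100-word normalization can be illustrated with a minimal Python sketch. It is not the published pipeline: the function name `pos_per_100_words` and the hand-tagged sample are hypothetical, and the tags are supplied directly so that the example runs without the spaCy model that would normally produce them.

```python
from collections import Counter


def pos_per_100_words(tagged_tokens):
    """Return the count of each POS tag per 100 words.

    tagged_tokens: list of (token, pos) pairs, with punctuation already
    removed. In a real pipeline the tags would come from a POS tagger.
    """
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(pos for _, pos in tagged_tokens)
    return {pos: 100.0 * n / total for pos, n in counts.items()}


# Hypothetical hand-tagged sample (Universal POS tags), for illustration only.
sample = [
    ("the", "DET"),
    ("boy", "NOUN"),
    ("is", "AUX"),
    ("taking", "VERB"),
    ("cookies", "NOUN"),
]
rates = pos_per_100_words(sample)
```

Dividing by the total word count keeps the measure comparable between speakers who produce descriptions of very different lengths.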
We rated each word for concreteness,23 semantic ambiguity,24 frequency,25 age of acquisition (AoA),26 familiarity,26 and word length by the number of phonemes,27 based on published norms. We calculated the mean scores of these measures for content words (nouns, verbs, adjectives, and adverbs) per participant.
Lastly, we measured lexical diversity, i.e., how diverse one’s word usage is in the description, using the moving-average type-token ratio (MATTR),28 which has been described as one of the most reliable measures of lexical diversity.29 The window length was set at 15 words. A detailed description of the lexical pipeline and a validation of the POS tagging accuracy have been published previously.30
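For readers unfamiliar with MATTR, the computation can be sketched as follows: the type-token ratio is taken in every window of consecutive words and the window ratios are averaged. This is a minimal illustration with the 15-word window stated above; the lowercasing step and the fallback for samples shorter than one window are assumptions, not details taken from the published pipeline.

```python
def mattr(tokens, window=15):
    """Moving-average type-token ratio over a sliding window of words."""
    tokens = [t.lower() for t in tokens]  # assumption: case-insensitive types
    if not tokens:
        return 0.0
    if len(tokens) < window:
        # Assumption: use the plain type-token ratio for very short samples.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical transcript fragment, for illustration only.
words = "the boy is on the stool and the stool is tipping over and the water is running".split()
diversity = mattr(words, window=15)
```

Because every window has the same length, the score does not shrink simply because a speaker talks for longer, which is the main weakness of the plain type-token ratio.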
2.4 Acoustic pipeline
We employed a speech activity detector (SAD) developed at LDC to segment audio recordings into speech segments and silent pauses. After segmenting the audio files, we visually reviewed the segments to validate the SAD output. Non-speech segments at the beginning and end of each recording and interviewer’s prompts were excluded.
Using the SAD output, we calculated five duration-related measurements, including mean speech segment duration, mean pause segment duration, total speech time, percent of speech, and pause rate per minute. We summed the number of syllables of all words from the published norms27 and computed articulation rate as the number of syllables per second.
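The duration measures can be sketched from a list of labeled segments, as below. This is an illustration only: the segment format, the function names, and the exact definitions (pause rate as pauses per minute of total time, articulation rate as syllables per second of speech time) are assumptions rather than details confirmed by the pipeline description.

```python
def duration_measures(segments):
    """Compute duration measures from (label, start_sec, end_sec) tuples.

    label is "speech" or "pause"; the format is hypothetical.
    """
    speech = [end - start for label, start, end in segments if label == "speech"]
    pauses = [end - start for label, start, end in segments if label == "pause"]
    total_speech = sum(speech)
    total_time = total_speech + sum(pauses)
    return {
        "mean_speech_dur": total_speech / len(speech) if speech else 0.0,
        "mean_pause_dur": sum(pauses) / len(pauses) if pauses else 0.0,
        "total_speech_time": total_speech,
        "percent_speech": 100.0 * total_speech / total_time if total_time else 0.0,
        # Assumption: pauses counted per minute of total (speech + pause) time.
        "pause_rate_per_min": 60.0 * len(pauses) / total_time if total_time else 0.0,
    }


def articulation_rate(n_syllables, total_speech_time):
    """Syllables per second of speech time (assumed to exclude pauses)."""
    return n_syllables / total_speech_time if total_speech_time else 0.0


# Hypothetical segmentation of a six-second recording.
example = [("speech", 0.0, 2.0), ("pause", 2.0, 3.0), ("speech", 3.0, 6.0)]
measures = duration_measures(example)
rate = articulation_rate(15, measures["total_speech_time"])
```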
Additionally, we pitch-tracked all speech segments with Praat31 and calculated the 10th to 90th percentile pitch estimates (f0). To normalize physiological differences in voice, we converted the pitch values from Hz to semitones (st) using the 10th percentile of each participant as a baseline: st=12*log2(f0/baseline f0). We used the converted 90th percentile as a measure of the pitch range of each speaker. Detailed description of the acoustic pipeline has been published previously,32 and the full list of analyzed features is included in the Appendix.
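The semitone conversion can be written directly from the formula above. In this minimal sketch the 10th and 90th percentile values are hypothetical inputs; in practice they would be taken from the pitch tracker output for each speaker.

```python
import math


def hz_to_semitones(f0_hz, baseline_hz):
    """Convert a pitch value in Hz to semitones relative to a baseline in Hz."""
    return 12.0 * math.log2(f0_hz / baseline_hz)


# Hypothetical percentile estimates for one speaker, in Hz.
p10_hz = 100.0
p90_hz = 200.0

# Pitch range: the 90th percentile expressed in semitones above the
# speaker's own 10th percentile.
pitch_range_st = hz_to_semitones(p90_hz, p10_hz)
```

A doubling of frequency corresponds to 12 semitones, so the measure expresses each speaker's range on a scale that does not depend on how high or low the voice is overall.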
2.5 CSF analysis
Forty-two (30 AD and 12 lvPPA) patients had CSF biomarkers collected within one year of the Cookie Theft recording (mean interval = 4.7 months ± 3.5), including Aβ and p-Tau. This subset did not differ demographically or clinically from the larger group of patients. CSF was analyzed on one of two platforms, Luminex xMAP or Innotest ELISA; ELISA values were then transformed to the Luminex scale.33 We previously related CSF levels of p-Tau directly to cerebral burden of tau in our autopsy cohort.33 The two subset groups did not differ in age (p=0.64), sex (p=0.99), education (p=0.82), disease duration (p=0.75), or the time difference between CSF sample collection and the Cookie Theft recording (p=0.43). To determine the association of language features with in vivo measures of pathology, we examined the relationship between our language variables and the two CSF biomarkers of amyloid and tau, Aβ and p-Tau, which are the most commonly observed in AD and are required for a pathological diagnosis of AD.
2.6 Statistical considerations
To compare the groups, we tested if requirements for parametric tests were met with a Levene’s test. If the data met the requirements for parametric tests, we performed an Analysis of Variance. If not, we performed a Kruskal-Wallis test. We visually assessed residuals of the models to make sure the data was suitable for parametric tests. When a group difference was significant, we additionally performed a posthoc test for pairwise group comparisons (either pairwise t-tests or pairwise Wilcoxon rank sum tests), adjusting p-values for multiple group comparisons (n=3) with the false discovery rate. We reported the effect size of each group comparison using Cohen’s d.
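The two summary computations in this step, the false discovery rate adjustment and Cohen’s d, can be sketched in plain Python. The sketch assumes the Benjamini-Hochberg procedure (which is what the "fdr" method of R’s `p.adjust` computes) and the pooled-standard-deviation form of Cohen’s d; neither variant is stated explicitly in the text, and the input values are made up.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def fdr_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for the three pairwise group comparisons.
raw_p = [0.01, 0.04, 0.03]
adj_p = fdr_adjust(raw_p)

# Hypothetical scores for two groups.
effect = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```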
Patients’ language variables were z-scored using HC’s mean and standard deviation. These z-scores were used for visualization and linear regressions to estimate relations of our language variables and clinical ratings. We did not use z-scores to determine significant group differences with Z-tests, since the test statistic in some variables did not follow a normal distribution.
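The z-scoring against the control group can be sketched as below. The use of the sample standard deviation and the example values are assumptions made for illustration.

```python
from statistics import mean, stdev


def z_score_against_controls(patient_values, control_values):
    """Express patient values in units of the controls' standard deviation."""
    mu = mean(control_values)
    sd = stdev(control_values)  # assumption: sample (n - 1) standard deviation
    return [(value - mu) / sd for value in patient_values]


# Hypothetical values of one language variable.
controls = [1.0, 2.0, 3.0]
patients = [4.0, 2.0]
z_scores = z_score_against_controls(patients, controls)
```

A z-score of 0 therefore means that a patient matches the control mean, and the sign shows the direction of the deviation from controls.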
The z-scored language variables that showed significant group differences were associated with patients’ MMSE and BNT scores to investigate the relations of our language features to clinical ratings of cognitive and language impairment. To examine significant interactions of language measures and phenotype, we included phenotype as an interaction term (MMSE/BNT ∼ language variable*phenotype).
To validate our findings with levels of specific CSF biomarkers, we also correlated patients’ CSF levels with the language variables that AD and lvPPA shared, using Pearson correlation tests. CSF p-Tau levels were log-transformed to normalize the data. We also used linear regression models to check whether patients’ clinical phenotype and the time difference between the Cookie Theft recording and CSF sample collection were significant factors. Since neither factor was significant for any of the language measures that showed significant relations with the CSF biomarkers, we report only the results of simple correlations. All statistical analyses were carried out with R version 4.1.034 and RStudio version 1.4.1717.35
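The correlation step can be illustrated with a short sketch that log-transforms the biomarker before computing Pearson’s r. The natural logarithm is an assumption (the base of the transform is not stated), the data are invented, and the significance test that accompanies each correlation is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical p-Tau levels and z-scored values of one language variable.
ptau = [10.0, 100.0, 1000.0]
language_z = [3.0, 2.0, 1.0]

log_ptau = [math.log(value) for value in ptau]  # assumption: natural log
r = pearson_r(log_ptau, language_z)
```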
2.7 Ethics
The Institutional Review Board of the Hospital of the University of Pennsylvania approved the study of human subjects, and all participants agreed to participate in the study by written consent. All digital data was stored in secured HIPAA-compliant servers and handled by personnel trained in PPI protection.
3. Results
3.1 Differences between amnestic AD and lvPPA patients
LvPPA patients produced fewer tense-inflected verbs compared to AD (p=0.001, |d|=0.94) and HC (p=0.048, |d|=0.61; Fig.1A). AD patients’ tense-inflected verb counts did not significantly differ from HC (p=0.124, |d|=0.4). Patients with lvPPA showed lower lexical diversity than AD (p=0.05, |d|=0.52) and HC (p=0.005, |d|=1.02), yet AD patients did not differ from HC (p=0.149, |d|=0.39; Fig.1B). LvPPA patients produced fewer adjectives than AD patients (p=0.019, |d|=0.66) and HC (p<0.001, |d|=1.72); AD patients also produced fewer adjectives than HC (p=0.003, |d|=0.75; Fig.1C). Thus, both patient groups were impaired in their adjective production, but lvPPA was more severely impaired compared with AD speakers.
LvPPA patients also produced more fillers than AD (p=0.022, |d|=0.6) and HC (p=0.01, |d|=1.06; Fig.1D), while AD did not significantly differ from HC (p=0.383, |d|=0.23). Patients with lvPPA showed a higher pause rate than AD (p=0.015, |d|=0.55) and HC (p<0.001, |d|=1.58; Fig.1E); AD patients’ pause rate was also higher than HC (p<0.001, |d|=1.34). Lastly, lvPPA patients produced more partial words than HC (p=0.042, |d|=0.61), but they did not significantly differ from AD speakers (p=0.147, |d|=0.3); AD patients did not differ from HC (p=0.313, |d|=0.24).
Thus, lvPPA speakers produced an abnormal number of partial words relative to HC, while AD speakers did not.
3.2 Impaired speech features in both AD and lvPPA
AD and lvPPA patients produced fewer prepositions and nouns than HC; patients produced shorter speech segments than HC, and their percent of speech out of total time was also lower than HC (p<0.001, |d|>0.8 for all comparisons). Both groups’ content words were shorter, more frequent (length & frequency: p<0.001, |d|>0.8), earlier-acquired (lvPPA: p<0.001, |d|=0.95; AD: p<0.001, |d|=0.56), and less concrete (lvPPA: p=0.028, |d|=0.7; AD: p=0.002, |d|=0.86) than HC’s. Both patient groups produced more adverbs (lvPPA: p=0.029, |d|=0.84; AD: p=0.029, |d|=0.73) and repetitions (lvPPA: p<0.001, |d|=1.39; AD: p<0.001, |d|=0.79) than HC. Patients also spoke more slowly (lvPPA: p<0.001, |d|=1.01; AD: p<0.001, |d|=0.57), and they produced fewer words in total than HC (lvPPA: p=0.032, |d|=0.7; AD: p=0.007, |d|=0.73).
3.3 Relationships to clinical measures
MMSE was significantly related to eight language variables. Patients with low MMSE produced more frequent words (β=-1.6, p<0.001), paused more frequently (β=-1.21, p<0.001), and produced more adverbs (β=-1.9, p<0.001) and partial words (β=-1.52, p=0.005). Also, patients with low MMSE scores produced fewer adjectives (β=1.2, p=0.032), prepositions (β=1.27, p=0.022), and nouns (β=1.77, p=0.033); their content words were also shorter (β=1.75, p<0.001).
BNT was significantly associated with ten variables. Patients with low BNT produced more frequent words (β=-4.36, p<0.001) and more adverbs (β=-2.41, p=0.018), and had a higher pause rate (β=-2.09, p=0.017). Patients with low BNT showed a lower percent of speech during the picture description (β=2.4, p=0.022), and they produced earlier-acquired (β=4.68, p=0.001), shorter (β=2.94, p<0.001), and less concrete (β=4.24, p<0.001) content words, with shorter speech segments (β=3.89, p=0.014). Patients with low BNT also produced fewer prepositions (β=2.42, p=0.027) and nouns (β=5.18, p<0.001).
Three variables showed significant interaction with phenotype. AD patients with low MMSE scores produced more adverbs (β=-1.9, p<0.001), yet lvPPA patients with low MMSE scores produced fewer adverbs (β=2.67, p<0.001). In contrast, AD patients with low BNT scores produced fewer prepositions (β=2.42, p=0.027) and earlier-acquired words (β=4.68, p=0.001), but lvPPA patients with low BNT did not show the same trends (preposition: β=-5.37, p=0.032; AoA: β=-5.36, p=0.009).
3.4 CSF results
Higher CSF p-Tau levels were correlated with lower preposition counts (r=-0.36, p=0.019), lower noun counts (r=-0.31, p=0.047), and a shorter mean speech segment duration (r=-0.33, p=0.032; Fig.4A-C). Higher p-Tau levels were also positively correlated with pause rate (r=0.34, p=0.026) and word frequency (r=0.33, p=0.036; Fig.4D-E). Aβ alone was not significantly correlated with any of the language measures.
4. Discussion
LvPPA is most frequently associated with underlying AD pathology, but direct comparisons of lvPPA with amnestic AD have rarely been reported. The current study characterized the language similarities and differences between lvPPA and AD in a biologically confirmed cohort. We used fully automated lexical and acoustic analyses to characterize linguistic markers of AD pathology. We expected that lvPPA patients with non-amnestic AD would produce more dysfluent speech with more limited vocabulary than amnestic AD patients because of lvPPA’s phenotypic characteristics. Results confirmed that lvPPA patients produced fewer adjectives and tense-inflected verbs with lower lexical diversity than amnestic AD patients and HC. LvPPA patients also paused more frequently and produced more fillers and partial words than HC and/or amnestic AD patients. However, we also found that both patient groups shared impairments in some speech features relative to HC. For example, both patient groups produced more adverbs but fewer prepositions and nouns than HC. Also, patients’ content words were earlier-acquired, shorter, more frequent, and less concrete than those of HC. Patients produced more repetitions and fewer total words with a slower articulation rate and a shorter mean speech segment duration than HC. Some of these language variables were significantly related to clinical test scores and p-Tau levels in CSF. We discuss important findings below.
The patient groups significantly differed on six language measures: pause rate, partial words, fillers, adjectives, tense-inflected verbs, and lexical diversity. Fillers, tense-inflected verbs, and lexical diversity were significantly more impaired in lvPPA than AD patients, emphasizing the deficits in lexical retrieval and poor fluency in these patients. However, these were not related to patients’ MMSE or BNT. It is thus important to monitor these speech features, since they are not easily explained by more general measures such as the BNT and MMSE. To our knowledge, the result that tense-inflected verb production differed between lvPPA and AD has not been previously reported. This finding seems to suggest that lvPPA patients produced fewer complete sentences, assuming that there was one tense-inflected verb per tense phrase [cite TP]. Frequent fillers in lvPPA were in line with previous observations.11,36,37 Lexical diversity has been frequently examined in the AD literature,14,15,38,39 where previous studies have found that AD patients’ lexical diversity was lower than that of HC. We showed that lexical diversity was even lower in lvPPA than in AD. The fact that lvPPA and AD patients significantly differed on these measures suggests that our language variables may capture subtle but unique phenotypic differences between lvPPA and AD, which are unrelated to traditional clinical ratings. Also, none of the six variables, except pause rate, correlated with CSF p-Tau, suggesting that these speech markers are related to the phenotype and not necessarily to the underlying AD pathology. Further studies, including the anatomic distribution of pathology in an autopsy cohort with quantitative measures of pathological burden, may help shed light on this issue.
Pause rate showed more impairment in lvPPA than AD. This might indicate word-finding difficulty in lvPPA which could provoke frequent pausing to recall an appropriate vocabulary item from their lexicon. It could also be that lvPPA patients spoke slowly – patients’ articulation rate was lower than that of HC – due to their difficulty in retrieving words to generate utterances. Pause rate was significantly related to both MMSE, an indicator of general cognitive impairment, and BNT, a measure of confrontation naming; elevated pause rate, therefore, may reflect in part both patients’ word-finding difficulties and general cognitive impairments. In contrast, partial word count, which was impaired only in lvPPA, only correlated with MMSE, suggesting that it reflected in part lvPPA patients’ disease severity and general cognitive impairments, but not impaired object naming.
Word-finding difficulty in AD and lvPPA has been previously noted,11,40–45 where studies have shown that patients had impairments in auditory-verbal short-term memory and could not recall the phonological form of a vocabulary item. However, no comparative studies have examined whether amnestic AD and non-amnestic lvPPA patients show word-finding difficulty to a similar degree. In our study, both patient groups produced content words that were more abstract, earlier-acquired, more frequent, and shorter than those of HC, suggesting that they had difficulties in retrieving lexical items needed to describe the picture. Word frequency and length were significantly related to both MMSE and BNT scores, which suggests that these lexical measures reflect in part patients’ disease severity and word-finding difficulties. On the other hand, concreteness was significantly associated only with BNT, indicating that it may be more sensitive to word-finding difficulty in patients. AD and lvPPA patients did not significantly vary in these measures, except AoA, confirming that some degree of word-finding difficulty is present in both amnestic and non-amnestic AD.
Adverb counts were greater in patients compared to HC. This may be related to patients frequently using “pro-adverbs,” including “here” and “there,” which replaced locational prepositional phrases. Patients typically produced utterances like “Mom is standing here,” for example, when HC produced “Mom is standing in front of the sink.” Elevated adverb counts were associated with BNT scores in AD patients only, suggesting that greater adverb use reflected AD patients’ difficulties in naming locations. Their difficulty in producing locational phrases was also partly reflected in the decreased preposition counts compared to HC. Decreased prepositions were related to MMSE scores in AD, which might be because the grammatical function of prepositions is diverse, as they are not only used in locational phrases but also in various other situations. Therefore, decreased prepositions better reflected AD patients’ general cognitive impairment than impairment in confrontation naming.
Some of our language variables correlated with CSF p-Tau levels, but not with Aβ. This finding is in line with previous findings that patients’ cognitive impairment is generally not related to Aβ levels but to accumulation of p-Tau.47 Language production is one of the most essential daily functions of humans; it needs to be taken into consideration in AD clinical trials and may serve in monitoring response to treatment. Since our automated procedures for collecting speech features are highly reliable and reproducible, investigations of speech variables as secondary outcome measures should be considered in disease-modifying trials targeting tau.
The strengths of our study include that it is the first study, as far as we are aware, comparing speech features in AD and lvPPA patients with biological evidence of underlying AD pathology. We inspected language differences and similarities in these groups and showed that our language variables could capture subtle linguistic differences between the two phenotypes. Also, we implemented fully automated, reproducible, objective methods in analyzing patients’ language characteristics, which can be potentially applied in screening for underlying AD pathology. These methods may be useful in monitoring disease progression and response to therapeutic interventions, because collecting one-minute speech samples is easy, highly reproducible, and less costly compared with other biomarkers. Future studies, testing the value of these speech features in longitudinal datasets and automatically screening patients with AD pathology using these speech features, would be valuable.
5. Conclusion
We implemented automated methods in analyzing acoustic and lexical characteristics of amnestic and non-amnestic AD patients’ natural speech. We found speech markers that were shared between these two AD phenotypes and linked to the common underlying pathology. We also noted speech markers that differed between the groups, which seemed to relate to the clinical phenotype and less so to the underlying pathology. This work demonstrates the potential of natural speech in detection of underlying AD pathology in specific clinical syndromes. Considering the cost effectiveness of speech data, such markers could serve for screening in AD clinical trials in a more precise and inclusive way.
Data Availability
Data will be available upon request from appropriate research groups.
Funding
The authors thank the patients and caregivers who participated in this study. This study was funded by grants from the National Institutes of Health (AG066597, AG054519, NS109260, P30 AG072979), the Alzheimer’s Association (AACSF-18-567131, AARF-D-619473, AARF-D-619473-RAPID), and the Department of Defense (PR192041).