Abstract
Objectives The main goal of machine learning approaches to classify people into healthy, increased Alzheimer’s disease (AD) risk, and AD is the identification of valuable predictors for valid classification, prediction of conversion, and automatization of the process. While biomarkers from cerebrospinal fluid (CSF) are the best-established predictors for AD, other less invasive, easy-to-assess candidate predictors have been identified. Here, we evaluated the predictive value of such less invasive predictors separately and in different combinations for classification of healthy controls (HC), subjective cognitive decline (SCD), mild cognitive impairment (MCI), and mild AD.
Methods We evaluated the predictive value of personality scores, geriatric anxiety and depression scores, a resting-state functional magnetic resonance imaging (fMRI) marker (mPerAF), apolipoprotein E (ApoE), and CSF markers (tTau, pTau181, Aβ42/40 ratio) separately and in different combinations in multi-class support vector machine classification. Participants (189 HC, 338 SCD, 132 MCI, 74 mild AD) were recruited from the multi-center DZNE-Longitudinal Cognitive Impairment and Dementia Study (DELCODE).
Results HC were best predicted by a feature set comprised of personality, anxiety, and depression scores, while participants with AD were best predicted by a feature set containing CSF markers. Both feature sets had equally high overall decoding accuracy. However, all assessed feature sets performed relatively poorly in the classification of SCD and MCI.
Conclusion Our results highlight that SCD and MCI are heterogeneous groups, underscoring the importance of optimizing their diagnostic criteria. Moreover, CSF biomarkers, personality, depression, and anxiety appear to offer complementary value for class prediction, which should be followed up on in future studies.
Key Points
Using multi-class support vector machine, we compared the predictive value of well-established versus non-invasive, easy-to-assess candidate variables for classification of participants with healthy cognition, subjective cognitive decline, mild cognitive impairment, and mild Alzheimer’s disease.
Personality traits, geriatric anxiety and depression scores, resting-state mPerAF, ApoE genotype, and CSF markers were comparatively evaluated both separately and in different combinations.
Predictive accuracy was similarly high for a combination of personality, anxiety and depression scores as for CSF markers.
Both established and candidate variables performed poorly in classifying SCD and MCI, highlighting the heterogeneous causes of those cognitive states.
CSF biomarkers and extended personality measures show complementary value for class prediction, which should be followed up on in future studies.
1 Introduction
Alzheimer’s disease (AD) is characterized by the formation of extracellular plaques of amyloid beta (Aβ) and intracellular neurofibrillary tangles of hyperphosphorylated tau proteins, ultimately resulting in progressive neurodegeneration and cognitive decline (Li et al. 2015; Hansson et al. 2018; Leuzy et al. 2021). Subjective Cognitive Decline (SCD) refers to a self-perceived, but not clinically relevant decline in cognitive performance as assessed by neuropsychological testing, whereas the diagnosis of Mild Cognitive Impairment (MCI) requires a measurable deviation from normal cognitive performance (Jessen et al. 2014). Both SCD and MCI can be caused by various conditions, including AD. Early intervention in AD – preferably before the onset of neurodegeneration – is considered a crucial prerequisite for effective treatment (Blennow et al. 2010; Sperling et al. 2011; Binnewijzend et al. 2012; Buchhave et al. 2012; Jessen et al. 2014; Badhwar et al. 2017; Jessen et al. 2018).
Established markers for diagnosing AD and associated risk stages are altered levels of amyloid beta (Aβ1-42), total tau (tTau) and phosphorylated tau (pTau181) in cerebrospinal fluid (CSF; Blennow et al. 2010; Olsson et al. 2016; Badhwar et al. 2017). However, obtaining CSF samples requires an invasive lumbar puncture and is therefore typically only performed in cases of clinically suspected dementia or substantially elevated risk. To allow for a broader, potentially population-wide screening for AD and its risk states SCD and MCI, less invasive measures are required. Here, we assessed multiple candidates and compared their predictive values: resting-state functional activity, personality traits, depression, anxiety, and apolipoprotein E (ApoE) genotype.
Changes in personality traits (based on the Big Five model; McCrae and Costa 1987) can be observed during both the development of dementia (Duchek et al. 2007; Yoneda et al. 2016; Terracciano et al. 2017) and in pre-clinical AD (Mendez Rubio et al. 2013; Caselli et al. 2018). Compared to healthy controls (HC), AD patients were observed to score higher on neuroticism in both self-reports and informant ratings, while they scored lower in agreeableness, extraversion, and especially conscientiousness and openness (Duchek et al. 2007). Higher levels of neuroticism have also been observed in stages preceding AD, i.e. during the transition from normal cognition to MCI, while extraversion, openness, and conscientiousness decreased, with rather sparse evidence indicating lower agreeableness as well (Caselli et al. 2018). Depression and anxiety, two core facets of neuroticism (Soto & John 2009; Rammstedt & Danner 2017), have also been linked to MCI and AD. Prevalence of depression is reportedly increased in individuals with MCI (Orgeta et al. 2015) and AD (Zhao et al. 2015). A meta-analysis found that higher levels of anxiety were associated with an increased relative risk of 1.45 for developing AD (Santabárbara et al. 2019).
The default mode network (DMN; Raichle et al. 2001) is typically engaged in self-generated or self-related (e.g., autobiographical) thought, social cognition, episodic and semantic memory retrieval (Buckner et al. 2008; Soch et al. 2016; Smallwood et al. 2021) and can be measured using resting-state fMRI (Andrews-Hanna et al. 2014). Changes in resting-state fMRI have been observed in both SCD (Sun et al. 2016) and MCI (Lau et al. 2016). Patterns of Aβ plaques and disturbances in functional connectivity of the DMN show considerable overlap, and DMN functional changes have been observed in individuals with AD as well as those at increased risk (Lau et al. 2016; Mohan et al. 2016). Since DMN functional alterations in individuals with MCI and AD (Hafkemeijer et al. 2012) have repeatedly been described for a range of measures, such as global and regional connectivity, task-related deactivation, or amplitude of low frequency fluctuation, they may also be of diagnostic value in identifying AD and its risk states (Blennow et al. 2010; Mevel et al. 2011; Cha et al. 2013; Badhwar et al. 2017).
The ε4 allele in apolipoprotein E (ApoE) is a well-documented genetic risk factor of AD (Blennow et al. 2010; Sperling et al. 2011; Dubois et al. 2014; Jansen et al. 2015; Hansson et al. 2018; Jessen et al. 2018; Leuzy et al. 2021). A 2011 meta-analysis found increased risk in ApoE ε4-carriers for progression from MCI to AD, with a reported odds ratio of 2.29 (Elias-Sonnenschein et al. 2011).
In previous research, the aforementioned predictors have mostly been tested individually in differentiating cognitively healthy individuals from individuals at risk and/or individuals with AD. Here, we aimed to assess their predictive value individually and in combinations (Figure 1). Importantly, instead of performing only binary classifications (e.g. HC vs. MCI or MCI vs. AD) (Khazaee et al. 2015; Schouten et al. 2016; Bi et al. 2018; Duchek et al. 2020), we assessed prediction accuracies in a multi-class classification approach (Ramzan et al. 2019) including all four potential diagnostic groups at once, akin to a fully automated diagnosis.
We hypothesized that a feature set of resting-state DMN activity, personality, depression, anxiety scores, and ApoE would outperform or be equal to CSF biomarkers (tTau, pTau181, and Aβ42/40 ratio). Additionally, we hypothesized that personality alone would yield class accuracies above chance level. Lastly, we hypothesized that combining personality with depression and anxiety scores would yield higher prediction accuracy than personality alone.
2 Materials and Methods
2.1 Participants
Participants were recruited as part of DELCODE (Jessen et al. 2018). For our study, we used baseline data sets, which comprised a total of 843 participants and yielded 733 useable data sets after exclusion based on missing or low-quality data.
Based on diagnosis at the time of enrolment, participants were split into four groups: HC, SCD, MCI, and AD (Table 1). It should be noted that only participants with mild, early stages of AD were included. Participants were assigned to the SCD group when they reported a subjectively perceived decline in cognitive performance within at least the last six months and at most the last five years in a clinical interview. In addition, they had to have normal performance (less than 1.5 SD below the norm) in all subcategories of the CERAD-plus administered for screening. Subjects were assigned to the MCI group if their performance on the CERAD was worse than average (more than 1.5 SD below the norm) on the “recall word list” subtest, they reported decreased cognitive performance, and they did not meet dementia criteria. By selecting a memory-related subtest, primarily amnestic MCI (aMCI) patients were included in the study. Frequently, subjects with aMCI were also conspicuous in other subcategories of the CERAD, but non-amnestic MCI individuals were specifically screened out. Assignment to the AD group was based on the Mini Mental Status Examination (MMSE), and only subjects with mild dementia (>18 points and <26 points on the MMSE) were included. Participants were defined as healthy if they showed memory test performances within 1.5 SD of the age-, gender-, and education-adjusted normal performance on all subtests of the CERAD and did not meet the SCD criteria.
2.2 MRI data acquisition
Structural and functional MRI data were acquired on 3T Siemens scanners according to the DELCODE study protocol (Jessen et al. 2018; Düzel et al. 2019). A T1-weighted MPRAGE image (TR = 2.5 s, TE = 4.37 ms, flip-α = 7°; 192 slices, 256 × 256 in-plane resolution, voxel size = 1 × 1 × 1 mm) was acquired for co-registration and improved spatial normalization. Phase and magnitude fieldmap images were acquired to improve correction for artifacts resulting from magnetic field inhomogeneities.
The MPRAGE was followed by a 7:54 min resting-state fMRI (rs-fMRI) acquisition, during which T2*-weighted echo-planar images (EPI; TR = 2.58 s, TE = 30 ms, flip-α = 80°; 47 axial slices, 64 × 64 in-plane resolution, voxel size = 3.5 × 3.5 × 3.5 mm) were acquired in odd-even interleaved-ascending slice order. Participants were instructed to lie inside the scanner with eyes closed, but without falling asleep. Directly after, phase and magnitude fieldmap images were acquired to improve correction for artifacts resulting from magnetic field inhomogeneities via unwarping. This was followed by brief co-planar T1-weighted inversion recovery EPIs.
The complete study protocol also included additional scanning sequences (T2-weighted images, T2*-weighted EPIs for task-based fMRI, fast low angle shot, fluid attenuated inversion recovery, susceptibility-weighted imaging) not used in the analyses reported here (Jessen et al. 2018).
2.3 fMRI data preprocessing and analysis
Data preprocessing and computation of mPerAF maps were performed using Statistical Parametric Mapping (SPM12; Wellcome Trust Center for Neuroimaging, University College London, London, UK) and the RESTplus toolbox (Jia et al. 2019), following a recently described protocol (Kizilirmak et al. 2022). EPIs were corrected for acquisition time delay (slice timing), head motion (realignment), and magnetic field inhomogeneities (unwarping), using voxel-displacement maps (VDMs) derived from the fieldmaps. The MPRAGE image was spatially co-registered to the mean unwarped image and segmented into six tissue types, using the unified segmentation and normalization algorithm implemented in SPM12. The resulting forward deformation parameters were used to normalize unwarped EPIs into a standard stereotactic reference frame (Montreal Neurological Institute, MNI; voxel size = 3 × 3 × 3 mm). Normalized images were spatially smoothed using an isotropic Gaussian kernel of 6 mm full width at half maximum.
PerAF is a scale-independent measure of the percentage of BOLD fluctuations relative to the mean BOLD signal intensity, computed for each time point and then averaged across the whole time series (Jia et al. 2020). PerAF was computed from rs-fMRI using an adapted version (see footnote 1) of the “RESTplus” toolbox (Jia et al. 2019). It was computed for BOLD variations in the range of 0.01-0.08 Hz. We used mean PerAF (mPerAF), that is, the global-mean-adjusted PerAF. A DMN mask was applied that represented a composite of functionally defined regions of interest (ROIs) created by Shirer et al. (2012); mPerAF of the DMN was included as a voxel-wise mean-centered predictor variable.
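The verbal definition above can be written out as follows. This is a sketch based on the description in the text, not a transcription of the toolbox code: X_i is taken to be the BOLD signal of one voxel at time point i (restricted to the 0.01-0.08 Hz band), n the number of time points, and the global-mean adjustment is assumed to be a division by the mean PerAF across all voxels in the brain mask, which the text does not state explicitly.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathrm{PerAF} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{X_i-\mu}{\mu}\right| \times 100\%, \qquad
\mathrm{mPerAF} = \frac{\mathrm{PerAF}}{\overline{\mathrm{PerAF}}_{\mathrm{global}}}
```

Because each deviation is divided by the voxel's own mean signal, PerAF does not depend on the arbitrary scaling of the BOLD signal, which is what makes it scale-independent.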
2.4 Predictor variables and evaluated predictor sets for classification
For each subject, the following predictor variables were extracted (for overview, see Supplementary Table S1).
Chronological age: Chronological age was included as a standardized (mean = 0, SD = 1) predictor.
Acquisition site: Participants were scanned at ten different DZNE sites across Germany, which were included as dummy-coded predictors (ten binary predictors).
Gender: Gender was added as a dummy-coded predictor (two binary predictors).
Resting-state mPerAF of the DMN: for details, see Section 2.3.
Personality traits: Personality was assessed using the 10-item short form of the Big Five Inventory (BFI-10; Rammstedt and John 2007; Rammstedt et al. 2017). Scores of the five personality scales (each computed as the mean of both respective items) were included as five standardized predictors.
Depression and Anxiety: Depression was assessed using the Geriatric Depression Scale (GDS; Sheikh & Yesavage 1986). Anxiety was assessed using the short form of the Geriatric Anxiety Inventory (GAI-SF; Byrne and Pachana 2011). GDS and GAI-SF sum scores were included as two standardized predictors.
ApoE genotype: ApoE genotyping was performed with three alleles possible: ε2, ε3, and ε4 (Jessen et al. 2018). Genotypes with no ε4 allele (ε2/ε2, ε2/ε3, ε3/ε3) were coded as 0, genotypes with one ε4 allele (ε2/ε4, ε3/ε4) were coded as 1, and genotypes with two ε4 alleles (ε4/ε4) were coded as 2.
CSF biomarkers: AD biomarkers (tTau, pTau181, and Aβ42/40 ratio; collectively referred to as CSF biomarkers) were determined using commercially available kits according to vendor specifications: V-PLEX Aβ Peptide Panel 1 (6E10) Kit (K15200E) and V-PLEX Human Total Tau Kit (K151LAE) (Mesoscale Diagnostics LLC, Rockville, USA), and Innotest Phospho-Tau(181P) (81581; Fujirebio Germany GmbH, Hannover, Germany). For further CSF analyses of DELCODE study data, see Düzel et al. (2022) and Jessen et al. (2022).
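The three coding schemes used for the predictors above (standardization, dummy coding, and the ε4 allele count) can be sketched in a few lines of Python. This is an illustration only, not the authors' MATLAB code: the function names and the genotype string format ("e3/e4") are hypothetical, and the use of the population SD for standardization is an assumption, since the text does not say which SD was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and SD 1 (population SD; an assumption)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


def dummy_code(labels, levels):
    """One binary column per level, e.g. ten columns for ten sites."""
    return [[1 if lab == lev else 0 for lev in levels] for lab in labels]


def apoe_e4_count(genotype):
    """Count e4 alleles in a genotype string such as 'e3/e4' (0, 1, or 2)."""
    return sum(1 for allele in genotype.split("/") if allele.strip() == "e4")


# Hypothetical example values, not study data
ages_z = standardize([65.0, 70.0, 75.0])
gender_cols = dummy_code(["f", "m", "f"], levels=["f", "m"])
apoe_codes = [apoe_e4_count(g) for g in ["e3/e3", "e2/e4", "e4/e4"]]
```

With this coding, the ApoE predictor is a single ordinal variable rather than a set of dummy columns, matching the 0/1/2 scheme described for the genotype.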
To assess the combined value of the predictors for AD risk estimation, we evaluated SVM classification accuracy using the following feature sets:
1. Base model: age, gender, site
2. mPerAF: base model + mPerAF maps
3. Personality: base model + BFI scores
4. Personality extended: base model + BFI + depression + anxiety scores
5. ApoE: base model + ApoE
6. All without CSF: base model + mPerAF + BFI + depression + anxiety + ApoE
7. CSF: base model + CSF markers
2.5 Handling of missing values and unbalanced class sizes
For feature sets 1 to 6, participants were excluded based on missing values in any of the following predictors: chronological age, gender, site, DMN, personality, depression, anxiety, and ApoE genotype (N = 663; 179 HC, 308 SCD, 113 MCI, 63 AD). For feature set 7, participants were excluded based on missing values in any of the aforementioned predictors as well as CSF markers (N = 341; 75 HC, 155 SCD, 71 MCI, 40 AD). This led to a lower sample size for feature set 7, since only about half of all participants consented to CSF collection via lumbar puncture. To maintain statistical power in feature sets 1-6, divergent sample sizes were kept, at the expense of feature set 7 not being included in inferential comparisons (Table 3). A variant with equal sample sizes (N = 311) across all feature sets is reported in the supplement (Table S4). Another variant with SCD and MCI merged as a common risk group is reported in the supplement (Table S2).
Subsampling was used to ensure an equal number of subjects in each participant group when performing Support Vector Classification (SVC). The size of each group in subsampling was based on the smallest group (rounded to the nearest ten). A total of 30 subsamples were created, with each subsample undergoing 1000 permutations of group membership to establish a null distribution. Permutations were performed to calculate the p-value of the prediction accuracy.
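The subsampling and permutation logic described above can be sketched as follows. This is a minimal illustration in Python, not the ML4ML implementation: the function names are hypothetical, the classifier itself is omitted, rounding is assumed to mean rounding down to a multiple of ten (63 becomes 60), and the "+1" correction in the p-value is a common convention that the text does not confirm.

```python
import random


def balanced_subsample(labels, n_per_class, rng):
    """Draw n_per_class random indices from every class; return them sorted."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    chosen = []
    for lab in sorted(by_class):
        chosen.extend(rng.sample(by_class[lab], n_per_class))
    return sorted(chosen)


def permutation_p_value(observed, null_values):
    """Share of null accuracies >= observed, with +1 correction (assumption)."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Group sizes for feature sets 1 to 6 as reported in the text
group_sizes = {"HC": 179, "SCD": 308, "MCI": 113, "AD": 63}
labels = [g for g, n in group_sizes.items() for _ in range(n)]

# Assumption: round the smallest group size down to a multiple of ten
n_per_class = (min(group_sizes.values()) // 10) * 10

rng = random.Random(0)  # fixed seed so the sketch is reproducible
subsamples = [balanced_subsample(labels, n_per_class, rng) for _ in range(30)]
```

In the full procedure, each subsample would be classified once with the true labels and 1000 times with shuffled labels; the shuffled-label accuracies form `null_values` and the true-label accuracy is `observed`.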
2.6 Prediction of outcome from predictor variables and performance assessment
For prediction of the outcome variable (participant class) from feature sets, we used SVC using linear SVMs with soft-margin parameter C = 1 and 10-fold cross-validation (CV). All SVM analyses were implemented using LibSVM in MATLAB via in-house scripts available from GitHub (https://github.com/JoramSoch/ML4ML).
Predictive performance of predicting participant classes was assessed using decoding accuracy (DA) and class accuracy (CA; each ranging between 0 and 1).
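The two performance measures can be made concrete with a small Python sketch. It assumes that DA is the overall proportion of correct predictions and that CA is the proportion of each class's members assigned to the correct class; the text does not define them formally, so these definitions, the function names, and the toy labels below are illustrative assumptions.

```python
def decoding_accuracy(y_true, y_pred):
    """Proportion of all predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracies(y_true, y_pred):
    """Per class: proportion of its members that were predicted correctly."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy labels with two participants per class (not study data)
y_true = ["HC", "HC", "SCD", "SCD", "MCI", "MCI", "AD", "AD"]
y_pred = ["HC", "SCD", "SCD", "HC", "AD", "MCI", "AD", "AD"]

da = decoding_accuracy(y_true, y_pred)
cas = class_accuracies(y_true, y_pred)
```

With equal class sizes, as produced by subsampling, DA equals the mean of the four CAs, so a feature set can reach a moderate DA while performing at or below chance for individual classes.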
For each feature set, significant difference from chance-level prediction for DA and CAs was tested and pair-wise comparisons of each feature set against the base model were performed. This was done via one-sided paired t-tests for each feature set’s classification performance against that of the base model, where each pair consists of a subsample assessed using both feature sets. Additionally, a subsample-by-subsample correlation matrix from DAs across all permutations was calculated and incorporated into a general linear model of the pair-wise accuracy differences across all subsamples.
All scripts used to perform analyses are available under https://github.com/jmkizilirmak/DELCODE162.
3 Results
The top three performing feature sets were “Personality extended”, “CSF markers”, and “ApoE” (Figure 2; for direct inferential statistical comparisons, see Section 3.7). Across all feature sets, overall prediction accuracy was highest for classes of HC and mild AD (Figure 3). Class accuracies for SCD and MCI were never significantly above chance, even when using CSF markers as predictors (Table 2). However, respective CAs from different feature sets showed large variation.
3.1 Base model: Low predictive value of age, gender, and site
The base model yielded the lowest DA across all feature sets (Figure 2) and no statistically significant CAs for any group (Figure 3), but DA was statistically significantly different from chance level (Table 2). CA was highest for HC (CA = .48, p = .051), but not significantly above chance (Figure 3 and Table 3). This trend can be explained by age being the strongest overall risk factor for dementia in general (Terracciano & Sutin 2019). On average, AD participants were significantly older than HC and SCD (Table 1), highlighting the importance of including age for the purpose of avoiding misattribution to other predictors.
3.2 mPerAF: Low but above-chance performance of resting-state amplitude
Feature set “mPerAF” showed statistically significant above-chance prediction accuracy (mean DA = .35, mean p = .010, Table 2). CAs were above chance for HC (CA = .42, p = .026) and AD (CA = .45, p = .016), but not for SCD (CA = .29, p = .299) and MCI (CA = .26, p = .419).
3.3 Personality: Highest prediction accuracy combined with depression and anxiety scores
Feature set 3 “Personality” had higher CAs than feature set 2 “mPerAF” for all classes (Table 2) while in turn being outperformed by feature set 4 “Personality extended” which yielded the overall highest CAs for both HC (.56) and MCI (.30). Regarding overall performance, feature sets 4 and 7 produced the highest DAs significantly above chance.
3.4 ApoE: Third best predictive ability
The presence of no, one, or two ε4 allele(s) yielded the third-best predictive performance (DA = .40, p = .002). It also yielded statistically significant above-chance CAs for HC (CA = .52, p = .021) and AD (CA = .52, p = .023), but performances for SCD and MCI were not significantly above chance.
3.5 Relatively poor performance of combined predictors without CSF markers
CAs of feature set 6 were consistently lower compared to those of feature sets 3, 4, and 5 (Table 2). In fact, feature set 6 was never in the top three CAs for any participant class (see Table 3). Although DA was rather low, predictive performance was relatively stable above chance, as indicated by the narrow confidence interval (DA = .36, 90% CI = [.31, .42], p = .006).
3.6 CSF biomarkers predict AD best, but perform poorly for HC
Feature set 7 “CSF” yielded the highest CAs for SCD (CA = .35, p = .301) and AD (CA = .65, p = .009) (Table 2). However, CAs for HC (CA = .43, p = .156) and MCI (CA = .19, p = .675) were comparatively low and non-significant (Table 3). Only the base model and feature set 7 “CSF” did not achieve significant prediction accuracy for HC. Moreover, feature set 7 “CSF” yielded the overall lowest CA for MCI.
3.7 Comparison of feature sets and summary
The best DAs were yielded by feature sets 4 “Personality extended” and 7 “CSF”, with equally high overall accuracy (DA = .41), followed by “ApoE” (DA = .40). All feature sets, except feature set 2 “mPerAF”, performed significantly better than the base model in predicting class membership (see Table 3). Results of other variants of analysis are provided in the supplement.
To perform inferential statistical comparisons with feature set 7 “CSF”, we reran the same analysis with a reduced, but equal sample size of N = 311 participants (Supplementary Results). Our supplementary analyses show that feature set CSF yielded significantly higher DA than other feature sets, but class accuracies for HC and SCD were not significantly different from chance. The highest CA for HC was achieved by feature set 4 “Personality extended”. Overall, our results indicated poor performance of all assessed feature sets in predicting class membership for SCD and MCI, with no feature set performing significantly above chance for either of the two classes. Therefore, we merged the SCD and MCI classes to create a “risk class” and reran SVM classifications (see Supplementary Results, subsection 2.1). Overall prediction accuracy did not improve, given that chance level was now at 1/3 instead of 1/4. Importantly, no feature set achieved statistically significant above-chance class accuracy for the risk class.
4 Discussion
In this study, we set out to assess the diagnostic value of several feature sets for AD, associated risk states (SCD, MCI), and healthy controls, with a focus on performance of combining personality, depression, and anxiety scores as well as resting-state fMRI and ApoE genotype, which each on their own have shown differences in MCI and AD (Mendez Rubio et al. 2013; Lau et al. 2016; Yoneda et al. 2016; Ikeda et al. 2017; Terracciano et al. 2017; Caselli et al. 2018).
All feature sets performed significantly above chance regarding their predictive accuracy (Table 2) and all, except feature set 2, performed significantly better than the base model (Table 3). Moreover, there were clear performance differences for HC, SCD, MCI, and AD. Feature sets with the highest decoding accuracy were (i) feature set 4 “personality extended”, containing the five personality scales’ scores of the BFI-10 in combination with the sum score of the geriatric anxiety inventory and that of the geriatric depression scale, (ii) feature set 7 “CSF”, containing three established CSF biomarkers for AD (tTau, pTau181, and Aβ42/40 ratio), and (iii) feature set 5 “ApoE”. Descriptively, feature sets CSF and personality extended performed equally well, yielding 41% DA in multi-class SVM classification. An inferential statistical comparison of the feature sets’ performances was only possible with a substantially reduced sample size (Supplementary Results 2.2). That supplementary analysis showed that with smaller sample sizes (N = 311 instead of N = 663), overall predictive accuracy was significantly higher for CSF compared to all other feature sets, as one would expect from the literature (Olsson et al. 2016; Düzel et al. 2022). However, CA of “CSF” for HC was not significantly different from chance.
4.1 Inferiority of the combined predictor and poor prediction accuracy of resting-state mPerAF
One of our central hypotheses was that a combined set of relevant predictors would yield higher or equal prediction accuracies compared to CSF markers. This hypothesis was not supported by our data. CAs of the combined predictor were similar to those of the mPerAF predictor, suggesting that inclusion of mPerAF reduced mean prediction accuracy. On average, resting-state mPerAF of the DMN performed better than chance, but not significantly better than the base model, which contained only age, gender and acquisition site. This finding contradicts prior reports on the DMN (Mevel et al. 2011; Schouten et al. 2016; Ramzan et al. 2019). However, most studies on the DMN used functional connectivity as opposed to voxel-wise amplitude of low-frequency fluctuation.
One potential explanation for the diverging results may be that we entered all classes at once, akin to a fully automated diagnosis process, instead of performing several binary decisions between two very different classes like HC and AD. The latter is a common approach in classification studies for research purposes (for review, see Jo et al. 2019), resulting in higher accuracies since the chance level lies at 50% (instead of e.g. 25% with four classes). The DAs will be even higher when sample sizes of groups are unbalanced. Unequal class sizes introduce bias in classification, which is well known, and several approaches have been proposed to counter the problem (e.g. Brodersen et al. 2010; Mathew et al. 2018). Unlike the current study, not all studies actively address this problem, which leaves said bias uncorrected (Jo et al. 2019).
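The effect of unbalanced group sizes on apparent accuracy can be illustrated with simple arithmetic on the group sizes reported in Section 2.5. The sketch below considers a hypothetical trivial classifier that always predicts the largest class; this classifier was not part of the study and only serves to show why chance level depends on class balance.

```python
# Group sizes for feature sets 1 to 6 as reported in Section 2.5
group_sizes = {"HC": 179, "SCD": 308, "MCI": 113, "AD": 63}
n_total = sum(group_sizes.values())

# Chance level with four equally sized classes
chance_balanced = 1 / len(group_sizes)

# Hypothetical classifier that always predicts the largest class
majority = max(group_sizes, key=group_sizes.get)
majority_da = group_sizes[majority] / n_total

# Its class accuracies: 1 for the majority class, 0 for all others
majority_cas = {g: 1.0 if g == majority else 0.0 for g in group_sizes}
mean_ca = sum(majority_cas.values()) / len(majority_cas)

print(f"always predicting {majority}: DA = {majority_da:.3f}, mean CA = {mean_ca:.2f}")
```

On the unbalanced sample, this uninformative classifier would reach a DA of about .465, higher than any DA reported here, while its mean class accuracy stays at the balanced chance level of .25. Subsampling to equal group sizes removes this inflation.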
4.2 Personality, anxiety, and depression scores yield relatively high prediction accuracy
Personality alone yielded class accuracies significantly above chance level, confirming our hypothesis. Nonetheless, personality alone was consistently outperformed by feature set 4 “personality extended”. These results suggest additional predictive value of both depression and anxiety. A more extensive questionnaire assessing facets of the Big Five might offer higher predictive value than the economic BFI-10. Depression and anxiety may be viewed as proxies for personality, leading to a more comprehensive assessment of personality and thus higher prediction performance. Our finding is in line with previous research showing that patients with MCI show changes in various personality traits in addition to apathy and other affective symptoms (Mendez Rubio et al. 2013; Terracciano et al. 2017; Caselli et al. 2018). First-onset depression in older age has been proposed to represent an early manifestation of clinical dementia (Panza et al. 2010) and a more recent meta-analysis found that depression has high prevalence in MCI (Ismail et al. 2017). Moreover, anxiety has been associated with amyloid positivity in AD (Mendez 2021) and has been reported to predict conversion from MCI to AD (Palmer et al. 2007).
4.3 Poor classification accuracy for SCD and MCI with any feature set
No feature set yielded CAs significantly above chance for SCD and MCI. This trend remained after merging SCD and MCI to an “increased AD risk” group (Table S2). Group membership was assigned based on entry diagnosis and etiologies of SCD and MCI were not assessed (see Section 2.1). Only a limited share of individuals diagnosed with SCD and MCI will convert to AD, and reported annual conversion rates vary considerably, depending on diagnostic criteria (Chételat et al. 2005; Johns et al. 2012; Bessi et al. 2018). Thus, participants diagnosed with SCD and MCI represent heterogeneous study groups, which obfuscates prediction accuracy.
This underlines the need to apply clear, multifaceted diagnostic criteria (Morris 2012; Jessen et al. 2014; Jack et al. 2016; Petersen 2016).
4.4 Limitations
While sample sizes of most feature sets are identical, the sample size of feature set 7 “CSF” differs (Section 2.5), since only about half of all participants underwent CSF biomarker assessments. Reducing the sample size (311 versus 663) for all feature sets leads to better comparability, but at the expense of statistical power. Such a variant is reported in the supplement.
4.5 Conclusions
Our results show that there is no feature set that yields superior CAs for all assessed groups. They further suggest that CSF biomarkers and extended personality measures show complementary value for class prediction, which should be followed up on in future studies and extended by assessing the predictive value for conversion rates. Lastly, we showed that SCD and MCI remain heterogeneous groups that are hard to classify by machine learning approaches when more than a dichotomous classification is required, pointing out the need for coherent multi-modal diagnosis criteria.
Data Availability
All scripts used to perform the analyses are available under https://github.com/jmkizilirmak/DELCODE162. Data can be made available to cooperation partners of the DZNE after setting up appropriate data sharing contracts.
6 Acknowledgments
We would like to express our gratitude to all the participants of the DELCODE study, and all technical, medical, and psychological staff without whom this study would not have been possible. Special thanks go to the MRI centers at the Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), at the Freie Universität Berlin Center for Cognitive Neuroscience Berlin (CCNB), and at the Bernstein Center for Computational Neuroscience, Berlin.
Footnotes
Email addresses k.waschkies{at}stud.uni-goettingen.de, Joram.Soch{at}dzne.de, margarita.darna{at}lin-magdeburg.de, Anni.Richter{at}lin-magdeburg.de, slawek.altenstein{at}charite.de, Aline.Beyle{at}dzne.de, Frederic.Brosseron{at}dzne.de, andrea.lohse{at}charite.de, Michaela.Butryn{at}dzne.de, laura.dobisch{at}dzne.de, michael.ewers{at}med.uni-muenchen.de, Klaus.Fliessbach{at}ukbonn.de, tatjana.gabelin{at}charite.de, Wenzel.Glanz{at}dzne.de, doreen.goerss{at}dzne.de, daria.gref{at}charite.de, Daniel.Janowitz{at}med.uni-muenchen.de, Ingo.Kilimann{at}dzne.de, friederike.buchholz{at}charite.de, Matthias.Munk{at}med.uni-tuebingen.de, boris.rauchmann{at}med.uni-muenchen.de, ayda.rostamzadeh{at}uk-koeln.de, nina.roy{at}dzne.de, eike.spruth{at}charite.de, pdechen{at}gwdg.de, Michael.Heneka{at}dzne.de, stefan.hetzer{at}charite.de, alfredo.ramirez-zuniga{at}uk-koeln.de, klaus.scheffler{at}med.uni-tuebingen.de, katharina.buerger{at}med.uni-muenchen.de, Christoph.Laske{at}med.uni-tuebingen.de, Robert.Perneczky{at}med.uni-muenchen.de, oliver.peters{at}charite.de, josef.priller{at}charite.de, anja.schneider{at}dzne.de, annika.spottke{at}dzne.de, stefan.teipel{at}med.uni-rostock.de, emrah.duezel{at}dzne.de, frank.jessen{at}uk-koeln.de, jens.wiltfang{at}med.uni-goettingen.de, Bjoern-Hendrik.Schott{at}dzne.de, Jasmin.Kizilirmak{at}dzne.de
Data availability statement All scripts (https://github.com/jmkizilirmak/DELCODE162) and the machine learning toolbox for Matlab (https://github.com/JoramSoch/ML4ML) are provided online. Data, study protocol, and biomaterials can be shared with cooperation partners based on individual data and biomaterial transfer agreements with the DZNE.
Funding statement The study was funded by the German Center for Neurodegenerative Diseases (Deutsches Zentrum für Neurodegenerative Erkrankungen [DZNE]), reference number BN012.
Conflict of interest disclosure F. Jessen received fees for consultations and presentations between 2019 and 2022 from AC Immune, Biogen, Danone/Nutricia, Eisai, GE Healthcare, Grifols, Janssen, Lilly, MSD, Novo Nordisk, and Roche. E. Düzel is cofounder of neotiv GmbH. The remaining authors report no disclosures relevant to the manuscript.
Ethics approval statement The study protocol was approved by Institutional Review Boards of all participating study centers of the DZNE. The process was led and coordinated by the ethical committee of the medical faculty of the University of Bonn (registration number 117/13).
Patient consent statement All participants provided written informed consent.
Permission to reproduce material from other sources Not applicable.
Clinical trial registration The DELCODE study was registered as a clinical trial under study acronym “DELCODE”, ID DRKS00007966 at the German Clinical Trials Register.
1 Since the “RESTplus” toolbox only provides 4 default masks, a group-level mask fitting the dimensions and voxel sizes of our preprocessed task-based fMRI was generated and added to the mask directory. Additionally, the parallel processing mode using outdated MATLAB commands had to be switched off.
5 List of Abbreviations
- Aβ: amyloid beta
- AD: Alzheimer’s disease
- aMCI: amnestic mild cognitive impairment
- ANOVA: analysis of variance
- BFI: Big Five Inventory
- BFI-10: Big Five Inventory 10-item short form
- BOLD: blood oxygenation level-dependent
- CERAD: Consortium to Establish a Registry for Alzheimer’s Disease
- CA: class accuracy
- CI: confidence interval
- CSF: cerebrospinal fluid
- CV: cross-validation
- DA: decoding accuracy
- DMN: default mode network
- DELCODE: DZNE-Longitudinal Cognitive Impairment and Dementia Study
- DZNE: Deutsches Zentrum für neurodegenerative Erkrankungen
- EPI: echo-planar imaging
- fMRI: functional magnetic resonance imaging
- FWHM: full width at half maximum
- GAI-SF: Geriatric Anxiety Inventory, Short Form
- GDS: Geriatric Depression Scale
- HC: healthy controls
- Hz: Hertz
- MCI: mild cognitive impairment
- NIA: National Institute on Aging
- MMSE: Mini Mental Status Examination
- MNI: Montreal Neurological Institute
- mPerAF: mean percent amplitude of fluctuation
- MPRAGE: Magnetization Prepared Rapid Gradient Echo
- MRI: magnetic resonance imaging
- NEO PI-R: Revised NEO Personality Inventory
- PerAF: percent amplitude of fluctuation
- pTau181: phosphorylated tau181
- ROI: region of interest
- rs-fMRI: resting-state functional magnetic resonance imaging
- SCD: subjective cognitive decline
- SD: standard deviation
- SPM: Statistical Parametric Mapping
- SVC: support vector classification
- SVM: support vector machine
- TE: echo time
- TR: time to repetition
- tTau: total tau
- VDM: voxel-displacement map
- yrs: years