Abstract
Background Protein abundance levels, sensitive to both physiological changes and external interventions, are useful for assessing the Alzheimer’s disease (AD) risk and treatment efficacy. However, identifying proteomic prognostic markers for AD is challenging by their high dimensionality and inherent correlations.
Methods Our study analyzed 1128 plasma proteins, measured by the SOMAscan platform, from 858 participants 55 years and older (mean age 63 years, 52.9% women) of the Framingham Heart Study (FHS) Offspring cohort. We conducted regression analysis and machine learning models, including LASSO-based Cox proportional hazard regression model (LASSO) and generalized boosted regression model (GBM), to identify protein prognostic markers. These markers were used to construct a weighted proteomic composite score, the AD prediction performance of which was assessed using time-dependent area under the curve (AUC). The association between the composite score and memory domain was examined in 339 (of the 858) participants with available memory scores, and in an independent group of 430 participants younger than 55 years (mean age 46, 56.7% women).
Results Over a mean follow-up of 20 years, 132 (15.4%) participants developed AD. After adjusting for baseline age, sex, education, and APOE ε4+ status, regression models identified 309 proteins (P ≤ 0.2). After applying machine learning methods, nine of these proteins were selected to develop a composite score. This score improved AD prediction beyond the factors of age, sex, education, and APOE ε4+ status across 15 to 25 years of follow-up, achieving its peak AUC of 0.84 in the LASSO model at the 22-year follow-up. It also showed a consistent negative association with memory scores in 339 participants (beta = -0.061, P = 0.046), 430 independent participants (beta = -0.060, P = 0.018), and the pooled 769 samples (beta = -0.058, P = 0.003).
Conclusion These findings highlight the utility of proteomic markers in improving AD prediction and emphasize the complex pathology of AD. The composite score may aid early AD detection and efficacy monitoring, warranting further validation in diverse populations.
Introduction
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that gradually impairs cognitive functions such as memory and reasoning abilities[1]. This disease significantly affects patients’ ability to perform daily tasks independently [2] and imposes a considerable burden on caregivers and healthcare systems [3]. With the global population aging, the incidence of AD is rising, posing a growing threat to public health and necessitating preventive strategies and effective treatment [4]. The complex nature of AD, especially its prolonged asymptomatic phase, presents challenges for early detection but also opportunities to develop interventions aimed at modifying the disease’s trajectory for secondary prevention [5, 6]. Therefore, identifying AD prognostic markers is crucial due to the disease’s insidious onset and progression and the lack of effective treatments for AD [7].
Plasma proteomic markers are sensitive to both internal physiological changes and external interventions[8], making them excellent candidates for tracking AD progression and response to treatment. Mang studies have highlighted significant associations between specific proteins and the risk of AD, along with associations between changes in protein levels and structural brain alterations over time[9–13]. Research has also identified a relationship between proteomic markers and amyloid burden, suggesting that plasma protein testing could be used to assess brain amyloid deposition[14]. Despite these advancements, the challenge of identifying proteomic prognostic markers remains, largely due to their high dimensionality and strong correlations among proteins. Moreover, there is a significant gap concerning the long-term predictive capacity of these proteomic markers for AD. This gap highlights the need for further research to evaluate how these markers perform in predicting AD incidence at specific future time points, potentially improving early detection and timely intervention strategies for AD.
Survival machine learning is particularly effective at addressing these challenges[15]. These methods are able to account for higher-order interactions and nonlinear relationships, which are crucial for selecting features based on variable importance[16]. Furthermore, survival machine learning models can incorporate time-to-event data and consider censored data. Therefore, we conducted association analysis and applied machine learning models to identify proteomic prognostic markers and constructed a weighted proteomic composite score in a community-based cohort. We aim to identify proteomic biomarkers and construct a composite score to enhance AD prediction across various follow-up periods.
Methods
Study Population
Initiated in 1948, the Framingham Heart Study (FHS) was a prospective cohort study based in a community setting[17]. In 1971, the study expanded to include the FHS Offspring cohort, comprising the children of the original participants and the spouses of these children [18]. Since the first Offspring exam cycle between 1971 and 1975, participants have undergone 10 health examinations approximately every four to six years[19]. This study included 1,913 individuals from the Offspring cohort who participated in the fifth examination cycle between 1991 and 1995, during which their blood was collected for proteomics profiling assessments. Participants were excluded if they had prevalent AD, incident non-AD dementia (n = 62), lacked education (n = 59) and APOE (n= 65) information. In analyses with machine learning, models tend to bias towards the majority class, which can lead to inflated performance metrics during training[20]. To address this imbalance and minimize potential model bias, we excluded participants who were 55 years or younger at baseline (n = 869), leaving 858 participants to identify protein markers to predict AD (Figure 1). This helped balance the age differences between cases and controls without significantly reducing the number of incident AD cases. For the association analysis of the proteomic composite score and memory domain, 339 of 858 participants with neuropsychological (NP) test measures were included (Figure 1). Additionally, to validate the early detection capabilities of the proteomic composite score for AD, we incorporated a separate, younger group of 430 independent participants into this analysis (Figure 1). All participants provided their written consent for genetic studies. The study protocol received approval from the Institutional Review Boards at Boston University Medical Center, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center. The study adhered strictly to regulations and guidelines to ensure compliance.
Proteomics Profiling
Previous studies have detailed the methods of proteomics profiling[21, 22]. In brief, plasma was obtained from blood samples collected at clinical visits and preserved at -80°C[23]. The quantification of protein concentrations with these plasma samples was conducted using the SOMAscan platform[24]. This approach leverages single-stranded DNA aptamers to identify and bind to specific proteins. The efficacy of this technology has been validated through its application in cardiovascular disease research[21, 25]. The protein profiling analysis was conducted on the samples in two distinct batches with 821 and 1,092 participants, respectively. Across the two batches, a total of 1,373 proteins were examined. Logarithmic transformations were applied to the protein measurements, followed by inverse normal transformation to achieve normality. Linear models were used to obtain residuals by regressing the transformed proteins on Plate ID to minimize batch effect. Plate ID denotes the plate in the machine on which a given sample was run. Proteins (n=245) with more than 20% missing data were excluded (Figure 1). The residuals of protein markers were combined from two batches and a total of 1,128 proteins were used for subsequent analyses.
Ascertainment of AD
A participant was identified with incident AD if they were cognitively intact at the time of proteomics profiling but diagnosed with AD during the follow-up. The methodology for monitoring and diagnosing AD within the FHS has been documented in prior publications[26, 27]. In brief, each participant diagnosed with AD was confirmed by a review panel comprising at least a neurologist and a neuropsychologist in FHS. This confirmation is based on available data from neurological and neuropsychological (NP) evaluations, FHS health examinations, clinical records, and discussions with relatives[28]. The criteria for AD diagnosis follow those established by the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS–ADRDA)[29]. For the participants who developed AD, the follow-up duration was calculated from the baseline up to the earliest documented date of AD onset. For those who did not develop AD, the follow-up duration was terminated between the baseline and December 31 of 2022, the date of the last recorded follow-up, or the date of death, depending on which occurred first.
Neuropsychological assessment
The administration of the neuropsychological tests in FHS has been detailed in prior studies [30, 31]. In brief, we obtained the z-scores of six Wechsler Memory Scale (WMS) scores, including WMS Logical Memory Immediate Recall, WMS Logical Memory Delayed Recall, WMS Visual Reproductions Immediate Recall, WMS Visual Reproductions Delayed Recall, WMS Paired Associates Immediate Recall, and WMS Paired Associates Delayed Recall [32, 33]. A z-score was calculated by subtract a score to its mean and divided by its standard deviation. The total memory domain score was the sum of the z-scores from the six WMS variables.
Statistical Analyses
Descriptive statistics
This study conducted a comparative analysis of baseline characteristics between participants who developed AD during the follow-up period and those who did not. Continuous variables were compared using the t-test, and categorical variables were assessed using the Chi-square test to identify any significant differences between the two groups.
Association Analysis and Survival Machine Learning for Identifying Proteomic Prognostic Markers
The association of each protein marker with incident AD was examined using Cox proportional hazard regression models. These models were adjusted for baseline demographic factors, including age, sex, education, and APOE ε4+ status. Proteins with a P value ≤0.2 were selected for further analysis with the machine learning methods.
This study evaluated the importance of proteins using two survival machine learning methods[34]: the LASSO-based Cox proportional hazard regression model (LASSO)[35] and the generalized boosted regression model (GBM) [36]. LASSO utilizes L1 regularization to induce sparsity, setting the coefficients of less important variables to zero, thus simplifying model complexity and preventing overfitting [37]. In contrast, GBM enhances model accuracy through an iterative process where each new model corrects errors from preceding ones, adeptly managing complex non-linear data patterns[38]. Both LASSO and GBM were adapted to accommodate censored data in analyzing time-to-event data. LASSO incorporates a Cox proportional hazards model, applying L1 penalties to enhance model selection and penalize less significant variables. GBM extends to survival scenarios by employing survival trees within its boosting framework. In determining protein importance, LASSO measures it by the magnitude of the coefficients, with larger values indicating a stronger impact on AD incidence[39]. GBM assesses protein importance by measuring its frequency in tree splitting and contribution to model performance, known as relative importance[40].
The survival LASSO and GBM models were developed using proteins with a P value ≤ 0.2. The proteins were initially ranked according to their importance. Beginning with the most significant protein marker, we sequentially built machine learning models, incorporating the next most important protein into each successive model. The mean Harrell’s c-index was calculated using a ten-fold cross-validation approach for each model iteration[41]. Ultimately, the final selected model for LASSO and GBM was determined by identifying the one with the fewest proteins among the top 5 models that achieved the highest mean Harrell’s c-index.
To minimize collinearity issue in regression model and bias in constructing a proteomic composite score, we calculated the pairwise Pearson correlation coefficients for each pair of proteins to eliminate highly correlated proteins identified by the machine learning models. For pairs where the correlation coefficient exceeded 0.3, only the protein with a more significant association with incident AD was retained as the proteomic prognostic marker.
Construction of Proteomic Composite Score
To explore the cumulative impact of proteins on AD development, we constructed weighted composite scores using proteins that were previously identified by the machine learning models. The weights assigned to each protein were derived from their regression coefficients, obtained using a Cox proportional hazard model that adjusted for age, sex, education, and APOE ε4+ status. For proteins that achieved nominal significance (P < 0.05), the composite score was formulated as a linear combination of these weighted proteins.
Association of Proteomic Composite Score with Incident AD and Memory Domain
The association between proteomic composite score and incident AD was examined by Cox proportional hazard regression model, adjusting for age, sex, education, and APOE ε4+ status. To further validate the early detection capabilities of the proteomic composite score for cognitive decline, we assessed its association with the memory domain score in a linear regression model, adjusting for age, sex, and education. We conducted the linear models in three groups of participants to investigate the association between the memory domain scores and composite scores, adjusting for age, sex, education, and APOE ε4+ status: 339 of the 858 participants with available memory domain scores (the older age group), a separate younger group of 430 independent participants (mean age 46, 56.7% women, the younger age group), and the pooled samples (n=769, the combined sample of 339 and 430 participants) (Figure 1).
Assessment of AD Prediction Performance of Proteomic Composite Score
We compared the capacity of the proteomic composite score in predicting AD. The base model included age, sex, education, and APOE ε4+ status as predictors. A second model added proteins with P≤0.2 to the base model. The third model incorporated proteomic prognostic markers into the base model. The fourth model included the proteomic composite score into the base model. Both LASSO and GBM were utilized to evaluate these models. The performance of these models was evaluated using a 10-fold cross-validation. The time-dependent area under the receiver-operating characteristic curve (AUC) for each year during a follow-up period of 15 to 25 years was calculated to determine the model’s prediction performance [42].
Results
Baseline Demographics
To identify proteomic prognostic markers, this study included 858 FHS Offspring participants who were cognitively intact at baseline (mean age 63±5, 52.9% women, 34.0% college or above) (Table 1). During a mean follow-up of 20 years, 132 (15.4%) incident AD cases were identified.
The Association Between Proteins and Incident AD
Among the 1,128 proteins evaluated, 106 proteins were associated with incident AD with nominal significance after adjusting for age, sex, education, and APOE ε4+ status (P<0.05) (Supplementary Table 1). However, none of these proteins remain significant after false discovery rate correction. Among these, 73 proteins were positively associated with incident AD, with the most significant association identified in death-associated protein kinase 2 (DAPK2). Each SD higher level of DAPK2 was associated with a 44% higher risk of incident AD (95% CI: 1.19, 1.73; P = 8.27E-05). Conversely, 33 proteins showed negative associations with AD incidence, with the strongest association for hepatocyte growth factor receptors. Each SD increase in the plasma levels of hepatocyte growth factor receptor was associated with a 27% lower risk of incident AD (95% CI: 0.60, 0.89; P = 1.66E-03).
Proteomic Prognostic Markers
For the 309 proteins associated with incident AD with a significance of P<0.2, we further evaluated their predictive capacity for AD using LASSO and GBM models. Figure 2 presents the importance rankings of these proteins as determined by both models. Starting with the most significant protein, we incrementally added proteins to both the LASSO and GBM models to enhance AD prediction. Figure 2 also illustrates the mean Harrell’s c-index from ten-fold cross-validation as additional proteins are incorporated into the models. The results indicated that the GBM model achieves its highest Harrell’s c-index, 0.750, when the top 19 proteins are included. The LASSO model achieved its optimal predictive performance with a Harrell’s c-index of 0.804 when 19 proteins were included. Five proteins, including GFRa-1, FCN1, Activated Protein C, Siglec-3, LIGHT, were identified by both the LASSO and GBM models. Therefore, a total of 33 proteins were identified by either LASSO or GBM. The univariate association of each of these 33 proteins with incident AD is shown in Table 2, with 6 of these proteins showing a negative association with AD.
Protein selection by LASSO and GBM. The bar charts show the feature importance of each protein, while the blue line depict the Harrell’s c-index, illustrating how the model’s predictive performance improves with the sequential inclusion of proteins. Proteins ultimately selected are highlighted within a green box.
Figure 3 displays a heatmap of the Pearson correlation coefficients for the 33 proteins. Among these, 15 pairs of proteins exhibited correlation coefficients greater than 0.3. Consequently, we eliminated the less significant proteins associated with incident AD from these 15 pairs. After this filtering process, 18 proteins were retained for further analysis.
The correlation of proteins selected by LASSO and GBM. The color gradient represents the correlation levels, where darker red indicates a higher positive correlation and lighter pink signifies a higher negative correlation between the protein pairs selected by the two methods.
Association Analysis of Proteomic Composite Score with the Memory Domain Score
A multivariable Cox regression model assessed the associations between 18 proteins and incident AD, adjusting for age, sex, education, and APOE ε4+ status as covariates. In this model, 9 proteins demonstrated significance with a P value less than 0.05. Consequently, these 9 proteins were used to construct a composite score (Figure 4). In association analyses, the composite score was positively associated with the incidence of AD. Each unit increase in the composite score was associated with a 2.3 times higher risk of developing AD (HR = 2.33; 95% CI: 1.85, 2.79; P = 5.8E-15). To test if the proteomic composite score was predictive to memory score, we conducted a linear regression and found that the proteomic composite score was negatively associated with the memory domain score in the 339 (of the 858) participants with available memory scores (beta= -0.061, SE = 0.030, P = 0.046), adjusting for age, sex, education, and APOE ε4+ status (Figure 5). This negative association was consistent in the independent 430 participants aged 55 and below (beta = -0.060, SE = 0.025, P = 0.018), in the pooled 769 samples (beta = -0.058, SE = 0.019, P = 0.003), adjusting for age, sex, education, and APOE ε4+ status (Figure 5).
Volcano plots illustrate the HR on the x-axis and -log10(P value) on the y-axis, showing the association of proteins with incident AD. Proteins located above the horizontal blue line indicate the significance (P < 0.2) in their association with incident AD. The 9 selected proteomic prognostic markers were highlighted with blue dots.
Forest plot showing the association of memory domain score with proteomic composite score. The association was examined by linear regression model adjusting for baseline age, sex, education, APOE ε4+ status. SE: standard error.
Performance Comparison of AD Predictive Models Across Varying Follow-Up Periods
We evaluated the prediction performance of incident AD using different models over a follow-up time ranging from 15 to 25 years (Figure 6). In the GBM analysis, Model 4, which integrates age, sex, education, and APOE ε4+ status with a proteomic composite score, consistently outperforms the other models, maintaining AUC values above 0.797 throughout the period. Model 1, which includes only age, sex, education, and APOE ε4+ status, ranks as the second-best model, achieving its optimal AUC of 0.800 at the 21-year follow-up. Similar patterns are observed with the LASSO model, where Model 4 also consistently achieves the highest AUC. Additionally, Models 2 and 3 which incorporated proteins, generally remain above the performance of Model 1 (optimal AUC 0.826 at 15-year follow-up). For Model 4 fitting with LASSO, the predictive performance for AD starts with an AUC of 0.83 at the 15-year follow-up, peaks at 0.84 at the 22-year follow-up, and then generally shows a declining trend in predictive performance as the follow-up period extends (mean AUC: 0.79).
Time-dependent AUC estimates of different models at each year of follow-up between 15 years and 25 years. Model 1: age, sex, education, and APOE ε4+ status; Model 2: age, sex, education, APOE ε4+ status, and 309 proteins; Model 3: age, sex, education, APOE ε4+ status, and 9 proteomic prognostic markers; Model 4: age, sex, education, APOE ε4+ status, and proteomic composite score. The horizontal axis represents follow-up time in years and the vertical axis represents the estimated area under the ROC curve for survival at the time of interest.
Discussion
This study identified a set of proteomic prognostic markers for AD by utilizing association analysis and survival machine learning models. A proteomic composite score was developed from 9 selected proteins, which, when integrated with clinical risk factors, significantly enhanced the performance of models for AD prediction across various follow-up periods, reaching a peak AUC of 0.84 at the 22-year follow-up. To our knowledge, this is the first study to employ a survival machine learning-based approach to identify proteomic prognostic markers for AD, consistently achieving high predictive results from 15 to 25 years.
Identifying prognostic markers capable of predicting AD risk across various follow-up periods is crucial for both understanding the disease’s progression and selecting participants for clinical trials focused on preventive therapies. Proteins with their ability to reflect biological processes and responses to treatments makes them particularly valuable in tracking the gradual development of AD and assessing intervention efficacy over time. In this study, 106 out of 1,128 proteins showed nominal significance (P < 0.05) in association with incident AD after adjusting for baseline demographics. The most significant association was observed between DAPK2 and the risk of AD. DAPK2 shares a high degree of homology with DAPK1, particularly in their catalytic domains[43]. DAPK1 is widely expressed throughout the central nervous system and its dysregulation has been linked to neurological disorders, including AD[44]. Notably, activated protein C emerged as significantly associated with an increased risk of AD, despite its known neuroprotective effects, including anti-inflammatory properties and roles in promoting blood flow and preventing brain cell death [45]. This indicates that elevated levels of activated protein C might be a compensatory response to early pathophysiological changes in AD, where the body attempts to counteract damage but may inadvertently exacerbate other underlying mechanisms that promote AD progression. Another possible reason could be the presence of other confounding factors affecting its association with AD. Conversely, hepatocyte growth factor receptor showed the most significant negative association with AD incidence, aligning with findings from other research [46]. Activation of hepatocyte growth factor facilitates stem cell differentiation and neurogenesis, and offers protection against damage in various cells, including neurons[47]. These findings not only contribute to our understanding of the biological pathways involved in AD but also underscore the complexity of the disease’s pathophysiology, suggesting that both inflammatory processes and regenerative mechanisms may play crucial roles.
The use of survival machine learning models to further refine the selection of proteomic markers based on their predictive relevance represents an advanced approach to handling high-dimensional data and incorporating the impact of time. Traditional methods like stepwise selection, which assume independence among predictors, may be biased given correlation among proteins. In contrast, LASSO addresses this issue by employing L1 optimization, which minimizes the total sum of coefficients, selecting one protein from a highly correlated pair. However, due to the high degree of correlation (or collinearity) among proteins, LASSO may randomly select one protein from correlated pairs, resulting in variability with each run. To minimize the issue, we implemented cross-validation to average the feature importance of each protein, providing a more stable and reliable approach to manage collinearity. Additionally, we employed the GBM, a tree-based ensemble method that uses a boosting framework to train models iteratively. Unlike methods that rely on single-variable fitting, GBM utilizes a subset of variables for each tree, enhancing its capacity to integrate and analyze proteins collectively. This method effectively captures the complex interdependencies among proteins and enhances the model’s predictive performance for AD by leveraging multiple variables simultaneously. The ability of these models to achieve high Harrell’s c-index with a relatively small subset of proteins suggests that machine learning techniques can effectively extract key information about the relationship between proteins and AD risk. Notably, both models consistently identified a core set of five proteins, including GFRa-1, FCN1, Activated Protein C, Siglec-3, LIGHT, that share predictive relevance for AD, providing a strong validation for these findings and underscoring the robustness of these methods in identifying critical markers. This alignment with prior research further reinforces the significance of these proteins in AD. For example, GFRa-1 is implicated in neural cell survival and repair mechanisms, highlighting its potential role in AD [48]. FCN1 (Ficolin-1) has been shown to be differentially expressed in AD cases[49]. Siglec-3, expressed in myeloid cells, plays a role in the immune response of neurodegenerative diseases [50]. Tumor necrosis factor has been shown to be associated with AD[51].
Researchers have been investigating the potential of plasma proteomic profiles to predict incident AD[13, 52]. In this study, we developed a proteomic composite score from ten proteomic prognostic markers and evaluated its association with AD incidence. This score positively associated with AD risk, suggesting it could be a crucial part of a broader diagnostic framework to identify individuals at increased risk of AD before symptoms appear. Previous studies have focused on predicting AD incidence at a single future time point[13, 52]. In contrast, our study extends this approach by assessing the composite score’s ability to predict AD incidence over multiple follow-up periods. The enhanced predictive performance, evidenced by increased time-dependent AUC values when integrating proteomic markers with baseline demographics, is particularly noteworthy. Additionally, we calculated the weighted composite scores for individuals under 55 years of age with regression coefficients obtained in the older age group and analyzed its association with memory domain scores. The consistent significant negative association with the general population indicates that the composite score is effective at detecting early changes in cognitive function. The validation of the proteomic composite score is warranted in future studies with larger sample sizes of both older and younger age groups.
We recognize several limitations in our study. Our participants consisted solely of non-Hispanic whites, highlighting the need for future research in diverse ethnic and racial groups. Moreover, despite the robust predictive performance of our selected markers across multiple follow-up times, potential confounders could still influence the association between the proteins and incident AD. Future studies should consider incorporating additional factors to address these potential confounders. This expansion is crucial for enhancing the generalizability of our findings and facilitating external validation. Moreover, similar to other high-throughput “omics” studies, batch effects could affect the reproducibility of our findings. Furthermore, although our proteomics platform is one of the most comprehensive available, it is limited to detecting only the proteins that are incorporated into the platform. Future studies should be conducted to validate our findings in larger cohorts. This study has several advantages. First, the integration of survival machine learning with association analysis enables the selection of markers that can accommodate the complex interactions within high-dimensional proteomic data and include time-to-event information. Second, this study developed protein composite scores that are highly interpretable and easy to use. Further, we validated these composite scores by examining their association with memory domain scores in a separate younger group. This demonstrated the effectiveness of the scores in detecting early cognitive changes. We utilized time-dependent AUC to assess the predictive power of protein composite score for AD incidence risk across various follow-up periods. This method allowed us to track how the prognostic abilities of these markers evolved over time, offering a comprehensive assessment of their long-term efficacy in predicting AD progression.
In summary, this study significantly advances the identification and application of AD proteomic prognostic markers through survival machine learning methods. It demonstrated the proteomic composite score’s ability to predict AD risk consistently across multiple follow-up periods. Further studies involving external validation are essential to ensure the generalizability of these findings.
Declarations
Ethics approval and consent to participate
All participants provided their written consent for genetic studies. The study protocol received approval from the Institutional Review Boards at Boston University Medical Center, Massachusetts General Hospital, and Beth Israel Deaconess Medical Center. The study adhered strictly to regulations and guidelines to ensure compliance.
Availability of data and materials
The data could be requested through an application to the FHS (https://www.framinghamheartstudy.org/fhs-for-researchers).
Competing interests
Dr. Au is a scientific advisor to Signant Health and NovoNordisk, and a consultant to the Davos Alzheimer’s Collaborative. Dr. Doraiswamy has received research grants, advisory/board fees, and/or stock from several companies and is a co-inventor of several patents related to the diagnosis and treatment of dementia. The other authors state that this study was carried out without any commercial or financial affiliations that might be seen as a possible conflict of interest.
Authors’ contributions
CL designed the study. YL analyzed the data and trained the model. HD made significant contributions to writing the manuscript and interpreting the results. TA, RA, and MD reviewed and edited the manuscript. All authors read and approved the final manuscript.
Fundings
Data collection for FHS was supported by N01-HC-25195, HHSN268201500001, and by grants (R01AG059727, R01AG016495, R01AG008122, RF1AG062109, U19 AG068753) from the National Institute on Aging.
Acknowledgments
We would like to thank the participants of the Framingham Heart Study for their commitment. This research could not have been conducted without their involvement.