Abstract
Biological age reflects actual aging and overall health, but current aging clocks are often complex and difficult to interpret, limiting their clinical application. In this study, we introduced a Gompertz law-based biological age (GOLD BioAge) model that simplified aging assessment. We estimated GOLD BioAge using clinical biomarkers and found significant associations of the difference from chronological age (BioAgeDiff) with risks of morbidity and mortality in NHANES. Moreover, we developed GOLD ProtAge and MetAge using proteomics and metabolomics data, which outperformed the clinical-only model in predicting mortality and chronic disease risks in UK Biobank. Benchmark analysis illustrated that our models exceeded common aging clocks in predicting mortality across diverse age groups in both NHANES and UK Biobank. The results demonstrated that the GOLD BioAge algorithm effectively applied to both clinical and omics data, showing excellent performance in predicting age-related outcomes. Additionally, we created a simplified version called the Light BioAge, which used three biomarkers for aging assessment. The Light model reliably captured mortality risks in three validation cohorts (CHARLS, RuLAS, CLHLS). It significantly predicted the onset of frailty, stratified frail individuals, and collectively identified individuals at high risk of mortality. In summary, the algorithm of GOLD BioAge could provide a valuable framework for aging assessment in public health and clinical practice.
Highlights
The Gompertz law-based biological age (GOLD BioAge) algorithm was proposed to construct biological aging clocks with convenient and interpretable calculations, which showed better performance in predicting mortality risks.
Our approach was applicable to proteomics and metabolomics, yielding ProtAge and MetAge, which show clinical promise for improving the accuracy of aging assessment and preventing age-related diseases.
The Light BioAge, a simplified version, was developed using age and three biomarkers, and it independently predicted mortality in three cohorts.
The Light BioAgeDiff significantly predicted the onset of frailty, stratified frail individuals, and collectively identified individuals at high risk of mortality.
Introduction
Human aging manifests as progressive physiological changes and declines in physical and cognitive function, leading to an increased risk of mortality1. There is significant heterogeneity among individuals during the aging process2, and chronological age may not accurately reflect the actual pace of aging. In addition, aging is the greatest risk factor for most chronic diseases, suggesting that targeting aging itself has the potential to delay multiple aging-associated disease processes3. Consequently, aging assessments and treatments have the potential to forecast and prevent functional decline and age-related chronic disease4. Some routine clinical biomarkers serve as biomarkers of aging, predicting the risks of functional decline and mortality after adjusting for chronological age5. In addition, integrating these biomarkers into composite panels could offer a more comprehensive and powerful assessment of aging than single biomarkers alone.
Biological age measures an organism’s biological functioning compared to the expected level for a specific chronological age, reflecting overall health status6,7. Levine’s phenotypic age, which integrated nine biomarkers with chronological age, predicted mortality more accurately than chronological age alone8. Building on the concept of phenotypic age, Sheng et al. proposed PCAge to estimate biological age through linear dimensionality reduction; however, such approaches could be sensitive to outliers and thresholding effects9. Correspondingly, Wei et al. presented ENABLAge, integrating machine-learning models with explainable artificial intelligence to ensure high prediction accuracy10. In addition to clinical aging clocks, omics-based aging clocks hold significant promise, as they capture more precise dynamic molecular interactions and pathways closely tied to the biological aging process11. Specifically, epigenetic biomarkers have been extensively utilized in DNA methylation aging clocks, such as the Horvath Clock12 and GrimAge Clock13. Furthermore, multi-tissue aging clocks provide insights into how complex organisms undergo molecular changes with age, offering more detailed information about aging and disease states11,14,15. Recently, the emergence of proteomics and metabolomics data from large cohorts has accelerated the development of plasma proteomic and metabolomic aging clocks16–19. These proteomic aging clocks showed promising accuracy in predicting mortality and multimorbidity14,16. Although these biological aging clocks performed well in forecasting diseases and mortality, their clinical translation remained limited, owing to the gap between scientific research and its application in clinical settings20.
The complexity of aging clock models, along with issues of interpretability, required features, and generalization capacity, may impede their clinical application. For instance, while DNA methylation clocks are the most widely used, biosample collection and high-throughput sequencing can be both time-consuming and expensive. Therefore, the development of aging clocks should emphasize delivering clinically actionable insights while ensuring affordability, accessibility, and robustness across diverse populations. To tackle these challenges in clinical practice, it is essential to create a computational algorithm that can calculate simplified, robust, and practical biological aging clocks using a small number of effective biomarkers.
The Gompertz law is one of the most widely used mathematical models for describing mortality; it effectively captures the exponential increase in mortality hazard across adult ages and fits empirical mortality data well21. The model’s simplicity and flexibility also allow it to be applied across a wide range of settings. For example, Levine’s phenotypic age was derived from 10-year mortality risk using the Gompertz model8. Similarly, Kuo et al. proposed a proteomic aging clock based on cumulative mortality risk under the Gompertz model19. Therefore, the Gompertz model provides a theoretical basis for optimizing the phenotypic age for clinical practice.
Here, we developed an algorithmic framework for Gompertz law-based biological age (GOLD BioAge). GOLD BioAge constructed aging clocks as a linear combination of chronological age and biomarkers, and linked its difference from chronological age to morbidity and mortality risks. Then, we applied the GOLD BioAge algorithm to metabolomics and proteomics data in the UKB to investigate the validity of the algorithm on omics data. Moreover, we compared its mortality prediction performance with that of common aging clocks using data from the National Health and Nutrition Examination Survey (NHANES) and UK Biobank (UKB). Finally, we refined and simplified GOLD BioAge as a Light model and validated it across three independent Chinese cohorts: the China Health and Retirement Longitudinal Study (CHARLS), the Chinese Longitudinal Healthy Longevity Survey (CLHLS), and the Rugao Longevity and Ageing Study (RuLAS).
Results
Definition and development of the GOLD BioAge model
Biological age refers to the age that accurately reflects an individual’s risk of mortality; a higher mortality risk corresponds to an older biological age. Based on the Gompertz law, we linked chronological age and biomarkers to mortality hazard through an exponential relationship (Figure 1A). Consequently, the Gompertz law-based biological age (GOLD BioAge) was estimated as the age that aligned with the joint mortality hazard derived from both chronological age and biomarkers. GOLD BioAge was therefore calculated as a linear combination of chronological age and biomarkers.
In the NHANES, 39,348 samples (49.5 ± 18.0 years old) with complete blood count (CBC) and biochemical assay biomarkers were included in the analysis. After feature selection using LASSO Cox regression (Figure S1), we developed a clinical aging clock based on 10 biomarkers, referred to as GOLD BioAge, which showed a strong correlation with chronological age (R = 0.969, Figure 1B). GOLD BioAge was a linear combination of chronological age, red blood cell distribution width (RDW), albumin (ALB), creatinine, and other biomarkers (Figure 1C). This model provided an intuitive interpretation of how biomarker values relate to biological age.
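The hazard-matching idea can be sketched in a few lines of algebra. The notation below ($a_0$, $\gamma_0$, $b_0$, $\gamma_1$, $\beta_j$, $x_j$) is assumed here for illustration only and is not necessarily the parameterization of the fitted models; the hazards are written at baseline, ignoring any follow-up-time term.

```latex
\begin{align}
h_{0}(\mathrm{CA}) &= \exp\!\left(a_0 + \gamma_0\,\mathrm{CA}\right)
  && \text{(age-only model)} \\
h_{1}(\mathrm{CA}, x) &= \exp\!\Big(b_0 + \gamma_1\,\mathrm{CA} + \sum_j \beta_j x_j\Big)
  && \text{(age plus biomarkers)} \\
h_{0}(\mathrm{BioAge}) = h_{1}(\mathrm{CA}, x)
  \;&\Longrightarrow\;
  \mathrm{BioAge} = \frac{(b_0 - a_0) + \gamma_1\,\mathrm{CA} + \sum_j \beta_j x_j}{\gamma_0}
\end{align}
```

Under these assumptions, taking logarithms of both hazards turns the matching condition into a linear equation, which is why the resulting biological age is linear in chronological age and the biomarkers.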
GOLD BioAgeDiff as a novel aging metric
We then introduced the GOLD Biological Age Difference (BioAgeDiff), defined as the difference between BioAge and chronological age (CA), to quantify how far an individual’s biological age deviated from their chronological age (Figure 1A, S2). A BioAgeDiff greater (or lower) than 0 meant that the person was biologically older (or younger) than their CA. As a linear combination of biomarkers, the BioAgeDiff established a clear relationship between changes in biomarkers and shifts in biological age, making it easy to understand how deviations in biomarkers from reference values affect biological age. For instance, if an individual’s blood glucose level increased by 1 mmol/L, the BioAge would rise by 0.58 years (Figure 1C).
Figure 1D illustrates the distribution of the BioAgeDiff, which was close to a normal distribution (mean: 0, SD: 5.707). When major chronic diseases were counted, participants with comorbidities had a higher BioAgeDiff than those without any disease (Figure 1E). Notably, individuals with four diseases were approximately 5 years older in BioAge. Considering health status, a higher BioAgeDiff was cross-sectionally associated with poorer self-rated health (Figure 1F). Additionally, unhealthy lifestyles, such as smoking and alcohol use, were associated with a higher BioAgeDiff (Figure 1G). These BioAgeDiff results were validated in the UKB (Figure S3).
The BioAgeDiff was associated with mortality risk in NHANES and UKB (Table 1), with hazard ratios (HRs) of 1.155 and 1.133, respectively. Survival curve analysis (Figure 2) over 20 years of follow-up revealed that participants in the highest 20% of BioAgeDiff showed a steeper decline in survival probability than those in the lowest 20%, especially among middle-aged and older age groups. For instance, among individuals aged 65-74 years, about 80% of those in the high-risk group had died after about 16 years, whereas only about 30% of those in the low-risk group had died. The BioAgeDiff can be regarded as a measure obtained through linear dimension reduction or projection. We therefore compared the performance of BioAgeDiff with common metrics of this kind, including the Mahalanobis distance statistic22,23 (MDS) and principal component analysis24 (PCA). In middle-aged (45-64 years) and older (65-85 years) age groups, BioAgeDiff outperformed the other linear metrics in identifying individuals at high risk of mortality (Figure 2).
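A minimal sketch of this additive reading of BioAgeDiff follows. Only the glucose weight of 0.58 years per mmol/L is taken from the text; the function name, the reference level, and the example person are hypothetical, and the full model would carry one weight per selected biomarker.

```python
from typing import Mapping


def bioage_diff(values: Mapping[str, float],
                reference: Mapping[str, float],
                years_per_unit: Mapping[str, float]) -> float:
    """Sum each biomarker's deviation from its reference, scaled to years."""
    return sum(
        years_per_unit[name] * (values[name] - reference[name])
        for name in years_per_unit
    )


# Only the glucose weight (0.58 years per mmol/L) comes from the text.
# The reference level of 5.0 mmol/L is a placeholder for illustration.
YEARS_PER_UNIT = {"glucose_mmol_per_l": 0.58}
REFERENCE = {"glucose_mmol_per_l": 5.0}

person = {"glucose_mmol_per_l": 6.0}   # 1 mmol/L above the reference
chronological_age = 60.0               # hypothetical

diff = bioage_diff(person, REFERENCE, YEARS_PER_UNIT)
bio_age = chronological_age + diff
print(f"BioAgeDiff = {diff:+.2f} years, BioAge = {bio_age:.2f} years")
```

Because the score is a plain weighted sum, each biomarker's contribution in years can be read off separately, which is the interpretability property described above.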
Application of GOLD BioAge on metabolomics and proteomics
To further investigate the utility of GOLD BioAge with multi-omics biomarkers, we applied our algorithm to create the MetAge and ProtAge models based on blood NMR metabolomics and proteomics data in the UKB, respectively. Like the clinical BioAge, the omics-based aging clocks showed strong correlations with chronological age and age-related factors (Figure 3A, S4). ProtAge captured mortality risk better than MetAge, clinical BioAge, and chronological age (Figure 3B). For all-cause mortality, ProtAge achieved a C-index of 0.790, while MetAge and BioAge reached 0.747 and 0.738, respectively. These results were consistent across age groups and cause-specific mortality (Table S11). Notably, among young adults (<45 years), ProtAge achieved a C-index of 0.793 in survival analysis, highlighting its effectiveness in predicting premature mortality risk (Figure 3C). For cause-specific mortality, ProtAge reached a C-index of 0.754 for cancer mortality and 0.850 for heart disease mortality, the highest among the three aging clocks. Individuals in the top 20% of ProtAgeDiff exhibited higher cumulative mortality throughout the follow-up than those in the top 20% of MetAgeDiff or BioAgeDiff (Figure S5). These findings further emphasized the superiority of proteomic biomarkers over metabolomic and clinical biomarkers in predicting mortality.
Then, we decomposed the ProtAgeDiff into contributions from cardiometabolic (CardioDiff), inflammatory (InflamDiff), neurological (NeuroDiff), and oncological (OncoDiff) proteins (Figure 3D-E), which may reveal different aspects of aging mechanisms. CardioDiff and NeuroDiff emerged as the top two contributors to ProtAgeDiff, showing the highest C-index values in survival analysis (Figure 3F, S6). Among these proteins (Table 2), GDF15, NTproBNP, and EGFR have been identified as aging biomarkers, while NEFL is frequently highlighted among neurological proteins. Given the relative independence of the four ProtAgeDiff categories (Figure 3G), we counted the number of categories in which an individual fell into the high-risk group (top 20% of CardioDiff, NeuroDiff, InflamDiff, or OncoDiff), yielding a risk score ranging from 0 to 4. This risk score effectively identified individuals at high risk of mortality (Figure 3H); for example, over 60% of those scoring 4 had died from any cause within approximately 16 years. In summary, ProtAge and ProtAgeDiff serve as strong aging clocks for predicting mortality risk, and the ProtAgeDiff calculation allows the aging process to be analyzed across four distinct biological categories.
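The 0 to 4 risk score can be illustrated with a short sketch. The function names, the exact cutoff rule, and the toy values are assumptions made for this example (they are not study data), and ties at the cutoff may place slightly more than 20% of people in the high-risk group.

```python
from typing import Dict, List, Sequence


def top_fraction_cutoff(values: Sequence[float], fraction: float = 0.20) -> float:
    """Smallest value that still belongs to the top `fraction` of `values`."""
    ordered = sorted(values)
    n_top = max(1, round(len(ordered) * fraction))
    return ordered[len(ordered) - n_top]


def category_risk_scores(diffs: Dict[str, Sequence[float]]) -> List[int]:
    """Count, per person, how many categories place them in the top 20%."""
    n_people = len(next(iter(diffs.values())))
    scores = [0] * n_people
    for values in diffs.values():
        cutoff = top_fraction_cutoff(values)
        for i, value in enumerate(values):
            if value >= cutoff:
                scores[i] += 1
    return scores


# Toy values for ten hypothetical people (not study data).
toy = {
    "CardioDiff": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "NeuroDiff":  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
    "InflamDiff": [0, 1, 2, 3, 4, 5, 6, 7, 9, 8],
    "OncoDiff":   [9, 0, 1, 2, 3, 4, 5, 6, 7, 8],
}
scores = category_risk_scores(toy)
print(scores)
```

Counting categories instead of summing the four Diff values keeps the score on a small integer scale, which makes sense only if the categories are roughly independent, as Figure 3G is reported to show.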
GOLD BioAge and incident chronic diseases
To investigate the potential of GOLD BioAge in predicting the incidence of common chronic diseases, we included cancer, myocardial infarction, heart failure, stroke, chronic obstructive pulmonary disease (COPD), and dementia in the association analysis. The Cox proportional hazards model showed that a 1-year increase in biological age was associated with elevated disease risks (Figure 4A). For instance, for cancer, a 1-year increment in ProtAge, MetAge, and BioAge was associated with a 2.7%, 1.8%, and 1.6% increase in hazard, respectively. This trend was consistent across other diseases, such as myocardial infarction and stroke. Moreover, the ProtAge model showed slightly higher HRs and C-index values than the BioAge model for most specific diseases. For dementia, for example, the HR of ProtAge reached 1.078, while BioAge had an HR of 1.051. Similarly, the MetAge model performed robustly for diseases such as myocardial infarction and stroke, with HRs of 1.081 and 1.066, respectively. These results highlighted the value of ProtAge and MetAge in predicting incident chronic diseases in large cohorts.
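Per-year hazard ratios can look small, so it may help to see how they compound. The sketch below assumes the age term enters the Cox model linearly on the log-hazard scale, in which case the HR for a gap of k years is the per-year HR raised to the power k. The function name and the 5-year gap are illustrative choices; the per-year values are those implied by the reported percentage increases for cancer.

```python
def hazard_ratio_over_gap(hr_per_year: float, gap_years: float) -> float:
    """HR for a `gap_years` difference, assuming a log-linear age effect."""
    if hr_per_year <= 0:
        raise ValueError("hazard ratio must be positive")
    return hr_per_year ** gap_years


# Per-year cancer HRs implied by the reported 2.7%, 1.8%, and 1.6% increases.
CANCER_HR_PER_YEAR = {"ProtAge": 1.027, "MetAge": 1.018, "BioAge": 1.016}

for clock, hr in CANCER_HR_PER_YEAR.items():
    # A 5-year gap is a hypothetical example, not a value from the study.
    print(clock, round(hazard_ratio_over_gap(hr, 5), 3))
```

Under this assumption, a 5-year difference in ProtAge corresponds to roughly a 14% higher cancer hazard.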
Cumulative disease incidence trajectories were presented based on aging pace, measured by ProtAgeDiff, MetAgeDiff, and BioAgeDiff (Figure 4B). The differences between the highest and lowest ProtAgeDiff groups were the most pronounced among the three metrics, indicating that ProtAge was particularly effective in predicting the onset of chronic diseases. Over a follow-up period of 16 years, the cumulative incidence of cancer, myocardial infarction, heart failure, stroke, COPD, and dementia in the high ProtAgeDiff group was 28.46%, 13.24%, 5.49%, 8.58%, 18.20%, and 9.49%, respectively. Overall, these findings underscore the significance of ProtAge and MetAge in forecasting age-related chronic diseases.
Comparison with other aging clocks
To investigate the validity of our models, we compared the mortality prediction performance of the GOLD BioAge model with Levine phenotypic age, KDM biological age (KDM-BA), and chronological age in the NHANES (8,106 participants, aged 47.0 ± 16.3 years) and UKB (265,541 participants, aged 56.5 ± 8.0 years). These aging clock models were constructed using clinical biomarkers, with chronological age included as a reference.
Figure 5 shows the C-index from survival analysis and the AUC values for 10-year mortality prediction for these aging clocks. The BioAge model showed significantly better overall performance than the other biological age measures and chronological age across the NHANES and UKB datasets, both in the overall sample and within specific age groups. For example, the BioAge model achieved a C-index of 0.847 in the entire cohort, outperforming Levine’s phenotypic age (0.845), KDM (0.827), and chronological age (0.822). For cause-specific mortality, the GOLD BioAge model showed the highest C-index and AUC values among these aging clocks. For example, for respiratory disease mortality, the C-index of BioAge was 0.885 in NHANES and 0.828 in UKB. With NHANES III as the validation dataset (Figure S7), GOLD BioAge also showed competitive performance compared with these common aging clocks. These results demonstrated the validity of our biological age algorithm and its ability to capture mortality risks.
Light BioAge for practical simplicity
To simplify clinical use, we refined GOLD BioAge into a lighter version called the Light BioAge (Figure S8). The Light BioAge model included age, serum creatinine, glucose, and CRP (log-transformed). The formula is as follows:
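For readers less familiar with the C-index used throughout these comparisons, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a plain quadratic-time illustration with toy inputs, not the implementation used in the study, and it simply skips pairs with tied follow-up times.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      scores: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by `scores`.

    A pair (i, j) is comparable when subject i died and did so before
    subject j's last follow-up time. Higher scores mean higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up years, death indicator, and a biological-age-like score.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
bio_age = [80.0, 70.0, 60.0, 50.0]
print(concordance_index(times, events, bio_age))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so differences such as 0.847 versus 0.822 reflect the share of comparable pairs that a clock orders correctly.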
In NHANES, 38,001 samples (49.6 ± 18.3 years old) with these three biomarkers were included. The Light BioAge was strongly correlated with chronological age (R = 0.987) and accounted for 93.73% of the variance in the GOLD BioAge. Its difference from chronological age (Light BioAgeDiff) was positively correlated with age (Figure 6B) and followed a nearly normal distribution (Figure 6C). It also showed significant associations with comorbidity, self-rated health, unhealthy lifestyles, and risks of mortality (Figure 6D-G, Table 1).
Compared with the GOLD BioAge model, the Light BioAge model, which used the fewest indicators, demonstrated competitive predictive accuracy (Figure 5). In the NHANES dataset, the full BioAge model achieved a C-index of 0.832 for all-cause mortality, while the Light model remained competitive with a C-index of 0.811. Further, the C-index of the Light BioAge was very close to those of prominent existing measures, such as Levine’s phenotypic age and KDM. For example, for cerebrovascular disease mortality, the C-index of Light BioAge reached 0.910, comparable to phenotypic age (0.902) and KDM (0.914) in NHANES. To assess clinical applicability, we also evaluated its performance in predicting incident chronic disease. For instance, the Light BioAge model showed HRs of 1.116, 1.099, and 1.077 for COPD, myocardial infarction, and stroke, respectively (Figure 6H). These results highlighted that the Light BioAge provided a robust and practical alternative while remaining competitive with other aging metrics.
Light BioAge predicted mortality in validation cohorts
We further validated the Light BioAge in three independent datasets: CHARLS (17,163 participants, aged 58.4 ± 10.05 years), RuLAS (1,785 participants, aged 77.0 ± 4.2 years), and CLHLS (2,499 participants, aged 85.5 ± 12.0 years). In the three cohorts (Table 1), 1,752, 186, and 813 deaths were documented during median follow-up periods of 9.0, 4.0, and 4.1 years, respectively.
The Light BioAge was strongly correlated with chronological age across the three cohorts (Figure 7A). In the full samples, the Light BioAge achieved AUC values of 0.792 in CHARLS, 0.809 in CLHLS, and 0.746 in RuLAS (Figure 7B). These values were higher than those for chronological age, which were 0.774, 0.790, and 0.646, respectively. Notably, the Light BioAge outperformed chronological age in individuals aged 60-79 years, with AUC exceeding 0.790 in both CLHLS and RuLAS; it also maintained a robust AUC near 0.8 for those aged 75 and older, significantly outperforming chronological age. Participants with high BioAgeDiff (top 20%) experienced a more pronounced decline in survival probability than those with low BioAgeDiff (bottom 20%) across CHARLS, RuLAS, and CLHLS (Figure 7C). By the end of the follow-up period in each cohort, the survival probabilities of individuals in the high-risk groups were about 75%, 85%, and 55%, respectively.
Because human aging is a longitudinal process, we examined the dynamic changes in Light BioAgeDiff between wave 1 and wave 3 of CHARLS (Figure 8A). The Light BioAge values in the two waves were strongly correlated (R=0.915, Figure 8B), while the Light BioAgeDiff showed a moderate correlation (R=0.475). According to Light BioAgeDiff, participants were classified into slow (Diff<0), normal (0<=Diff<5) and fast (Diff>5) aging groups, and then into seven categories based on their aging status across both waves (Figure 8C). The stable slow-aging group across the two waves was taken as the reference. Compared with the reference, the stable fast-aging and accelerated aging (slow/normal to fast) groups exhibited the highest mortality risks (Figure 8D-E). In addition, the decelerated aging group (fast to slow/normal) had reduced mortality risks.
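The grouping rule can be written down directly. This sketch assumes that a Diff of exactly 5 falls in the fast group, since the stated thresholds leave that boundary unassigned; the function names are hypothetical, and the collapsing of the nine possible wave-to-wave transitions into the seven reported categories is not reproduced here.

```python
def aging_group(diff: float) -> str:
    """Map a Light BioAgeDiff value (in years) to an aging-pace group."""
    if diff < 0:
        return "slow"
    if diff < 5:
        return "normal"
    return "fast"  # assumption: Diff == 5 is treated as fast


def transition(diff_wave1: float, diff_wave3: float) -> str:
    """Label the change in aging group between two survey waves."""
    return f"{aging_group(diff_wave1)}->{aging_group(diff_wave3)}"


# Hypothetical participants: (Diff at wave 1, Diff at wave 3).
examples = [(-2.0, -1.0), (-2.0, 6.5), (7.0, 1.0)]
for wave1, wave3 in examples:
    print(wave1, wave3, transition(wave1, wave3))
```

Here "slow->slow" corresponds to the stable slow-aging reference, "slow->fast" to accelerated aging, and "fast->normal" to decelerated aging.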
Light BioAgeDiff, frailty and mortality risks
Next, we explored the associations of Light BioAgeDiff with frailty, as assessed by a frailty index that included age-related chronic diseases, self-rated health, basic and instrumental activities of daily living, and mobility capacity. In CHARLS 2011 and 2015, frailty status was associated with BioAgeDiff: frail individuals were 1.14 and 1.20 years older than their robust counterparts, respectively (Figure 9A). During longitudinal follow-up (2011-2015, 2015-2018), baseline BioAgeDiff was associated with incident frailty (odds ratio [95% CI]: 1.02 [1.01-1.04]; 1.04 [1.01-1.07], Figure 9B). Participants categorized within the fourth quantile of BioAgeDiff had the highest risks. Using the BioAgeDiff as a measure of biological aging, we examined the mediating role of functional decline, measured by the frailty index, in the association of BioAgeDiff with mortality risk (Figure 9C). The mediation proportion of the frailty index was about 26.4% (p<0.001), while its increase accounted for 6.48%.
The Light BioAgeDiff performed comparably to frailty in predicting mortality: the C-indices of BioAgeDiff and frailty were 0.634 and 0.633 in CHARLS, respectively. Using both BioAgeDiff and frailty as measures of biological and functional aging, we examined their combined effectiveness in identifying individuals at high risk of mortality. The survival probability of frail individuals was about 70% during the 9-year follow-up in CHARLS (Figure 9D). In contrast, frail individuals with the highest BioAgeDiff had a mortality rate of approximately 55% during this period (Figure 9E). These findings highlight the potential role of Light BioAgeDiff in preventing incident frailty and its joint contribution with frailty in identifying individuals at elevated risk of mortality.
Discussion
In this study, we presented an elegant algorithm for estimating biological age as a linear combination of chronological age and various biomarkers. The GOLD biological age and its difference from chronological age provided insights into the relationship between individual biomarker values and the pace of aging. Notably, the implementation of our algorithm in proteomics and metabolomics demonstrated the significant potential of omics biomarkers in identifying risks of mortality and age-related chronic diseases. Furthermore, benchmark analysis demonstrated that our models outperformed traditional aging clocks in predicting the risks of both all-cause and cause-specific mortality across different age groups. We also developed a simplified version termed the Light BioAge, which provides a practical and efficient alternative with simplified calculations. The Light BioAge exhibited strong predictive capabilities in assessing mortality risks across three elderly validation cohorts and was associated with the onset of frailty, collectively forecasting mortality risks associated with frailty. In summary, our algorithm was validated as a general framework for constructing aging clocks. Importantly, both the BioAge and its light version can serve as convenient tools for aging assessment in clinical practice.
The robustness of the GOLD BioAge algorithm and the aging clocks was validated in multiple respects. The evaluation of GOLD BioAge primarily focused on the correlation between BioAgeDiff and chronological age, the prediction of all-cause and cause-specific mortality, the incidence of multiple age-related chronic diseases, the onset of frailty, and validations across diverse populations. Additionally, benchmark analyses of mortality prediction demonstrated the superiority and sensitivity of the GOLD BioAge model. Consequently, GOLD BioAge served as a general and reliable measure of biological aging, offering simple and practical calculations for aging assessment and public health.
The pace of individual aging experiences dynamic changes throughout life, influenced by modifiable lifestyle choices, environmental factors, psychological influences, and health conditions. Identifying individuals at high risk of premature aging can enhance primary prevention efforts and reduce the healthcare and socioeconomic burdens linked to age-related diseases. In this study, the innovative biological aging clock exhibited stronger associations with morbidity and mortality than chronological age, providing a direct measure of an individual’s aging progression. To further advance the application of biological age in public health and clinical settings, we introduced the Light BioAge, a simple and practical aging clock that utilized just three accessible biomarkers alongside chronological age. The Light BioAge demonstrated applicability across various independent cohorts (NHANES, UKB, CHARLS, CLHLS, and RuLAS) with differing study designs, participant characteristics, and morbidity profiles. This model incorporated serum creatinine, blood glucose, and C-reactive protein levels with chronological age to reflect kidney function, metabolic and inflammatory status. These biomarkers were commonly used in medical examinations and were readily available at a low cost. Therefore, Light BioAge offered a convenient tool for ongoing monitoring of aging trajectories to prevent functional decline and age-related diseases.
Like Levine’s phenotypic age, GOLD BioAge estimated biological age by fitting the Gompertz distribution to empirical mortality data. Levine’s phenotypic age has been widely used in aging-related studies and outperformed earlier biological age methods in predicting all-cause mortality and various diseases25. GOLD BioAge exhibited a strong correlation with Levine’s phenotypic age in the NHANES and UKB datasets, supporting the robustness and reliability of our algorithm. However, Levine’s phenotypic age relied on the Gompertz cumulative distribution function to estimate the 10-year mortality risk, and its calculation involved a double logarithmic transformation, which hindered its clinical interpretation. In comparison, our approach simplified the calculation by using the hazard function to capture instantaneous mortality risk, resulting in higher predictive accuracy for mortality outcomes.
Our study introduced ProtAge and MetAge, integrating omics data into the GOLD biological age framework. These novel aging clocks generally outperformed clinical marker-based clocks in predicting mortality, probably owing to the higher sensitivity of omics data in capturing aging-related information26. For ProtAge, proteins were categorized into four groups based on their physiological function, each contributing to a distinct age estimate. Proteomics plays a crucial role in characterizing the aging process, as changes in protein expression and post-translational modifications, particularly those linked to inflammation, oxidative stress, and cell cycle regulation, provide stable, long-term biomarkers. These molecular signatures offered deeper insights into biological aging than clinical markers. Additionally, because protein alterations often preceded the onset of chronic diseases, proteomics enhanced early disease detection, making ProtAge a valuable tool for predicting mortality and early-stage health risks27–29. Metabolomics, in turn, reflected rapid, short-term fluctuations in the body’s biochemical processes, offering insights into how recent changes in diet, physical activity, and stress affect aging. The integration of both proteomic and metabolomic data into the aging clock framework created a more comprehensive tool for estimating biological age. It also provided the potential for personalized health interventions to mitigate aging-related risks.
This study has several limitations. First, although the omics-based aging clocks demonstrated superior performance compared to those using clinical biomarkers in the UKB dataset, further validation in other elderly cohorts is essential to confirm these findings. Additionally, the selection of biomarkers for the aging clocks was performed using LASSO penalized regression to enhance accuracy; however, different feature selection methods could yield alternative sets of biomarkers, indicating potential for further optimization of biomarker panels in clinical applications. Furthermore, we validated the Light BioAge in three Chinese cohorts, leaving uncertain whether the full GOLD BioAge model would more accurately capture the risks associated with geriatric syndromes and mortality.
Methods and materials
Study populations
Our study used data from NHANES 1999-2018, UKB, CHARLS, CLHLS, and RuLAS. The US NHANES is a nationally representative cross-sectional survey of civilians living in the US, approved by the National Center for Health Statistics (NCHS) Ethics Review Board30. The UK Biobank is a large-scale prospective cohort that collected data from over 500,000 participants across 22 centers in England, Scotland, and Wales. UKB received ethics approval from the North West Multicenter Research Ethics Committee31. CHARLS is an ongoing prospective population-based longitudinal cohort study of middle-aged and older Chinese adults. CHARLS was approved by the Ethics Review Board of Peking University and was conducted in accordance with the Declaration of Helsinki and other relevant guidelines and regulations32. The CLHLS is a nationwide longitudinal study of the older Chinese population. The project was approved by the Biomedical Ethics Committee of Peking University, China (IRB00001052-13074)33. The Rugao Longevity and Ageing Study (RuLAS) is a population-based prospective study, which consisted of a longevity cohort and an aging cohort in Rugao, China34. The RuLAS was approved by the Human Ethics Committee of Fudan University School of Life Sciences. All participants provided written informed consent. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines for cohort studies35.
Clinical biomarker selection for constructing GOLD BioAge
We used NHANES 1999-2018 data for variable selection and refined the biomarker panel for constructing GOLD BioAge. In total, 26 common biomarkers from complete blood count (CBC) tests and biochemical assays were included (Table S1). LASSO Cox regression models were employed for biomarker selection, with five-fold cross-validation to determine the optimal parameter value (lambda.1se) of 0.0166. First, the 26 biomarkers and chronological age were analyzed using LASSO Cox regression. Then, 10 biomarkers were retained in the model (Figure S1), including chronological age, creatinine, and glucose, among others. This set of biomarkers formed the basis for the novel biological age model (GOLD BioAge). To simplify the panel for practical use, feature selection was performed on 10 blood biomarkers (biochemical and hematological) that were consistently observed across various cohorts (Table S2). This simplified panel included chronological age, serum creatinine, glucose, and C-reactive protein (CRP) for the light GOLD biological age model (Light BioAge).
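The lambda.1se choice mentioned above follows the one-standard-error rule: among all penalties whose cross-validated error lies within one standard error of the minimum, take the largest, which gives the sparsest model. The sketch below shows only this selection step on made-up cross-validation summaries; fitting the LASSO Cox models themselves is out of scope, and the function name is hypothetical.

```python
from typing import Sequence


def lambda_one_se(lambdas: Sequence[float],
                  cv_mean: Sequence[float],
                  cv_se: Sequence[float]) -> float:
    """Largest lambda whose mean CV error is within one SE of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= limit]
    return max(eligible)


# Made-up cross-validation summaries (not results from the study).
lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]
cv_mean = [12.00, 11.20, 10.55, 10.50, 10.52]   # mean CV error per lambda
cv_se = [0.10, 0.10, 0.10, 0.10, 0.10]          # standard error per lambda

print(lambda_one_se(lambdas, cv_mean, cv_se))
```

In this toy example the minimum error occurs at 0.03, but 0.1 is returned because its error is still within one standard error of that minimum while penalizing more strongly.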
Metabolomics and proteomics biomarker selection
We then applied the GOLD BioAge algorithm to metabolomic and proteomic biomarkers separately, using data from the UK Biobank (UKB, 2006-2010). For the GOLD Proteomic Age model (GOLD ProtAge), we analyzed 2,923 proteins from 53,014 participants. To preserve sample completeness, we removed proteins with more than 10% missing data, leaving 1,459 protein biomarkers for analysis. Using a LASSO-Cox regression model with five-fold cross-validation, we calculated protein-predicted age (ProtAge) in the entire sample (n = 39,772). To balance simplicity and predictive power, we chose a lambda value of exp(−6) (Figure S1). This selected 22 protein biomarkers along with chronological age. Detailed descriptions of all the selected proteins are available in the supplementary materials. For the GOLD Metabolic Aging Clock (GOLD MetAge), we used nuclear magnetic resonance (NMR)-based blood metabolomic profiling data from UKB. A total of 248,202 UKB participants were included, each with measurements of 251 circulating metabolomic markers. We performed LASSO-Cox regression using five-fold cross-validation, selecting features based on a lambda value of exp(−6), which corresponded to a one standard deviation increase over the lambda with minimum mean-squared error. Among the 251 metabolic biomarkers, 27 were selected, along with chronological age, to develop the GOLD MetAge.
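The missing-data filter described above can be sketched as follows. This is a minimal illustration that assumes measurements are held as one dictionary per participant with `None` for a missing value; the function name and the protein column names are hypothetical.

```python
def columns_within_missingness(rows, max_missing_fraction=0.10):
    """Return the columns whose missing fraction does not exceed the limit.

    rows : list of dicts mapping column name -> value, with None if missing
    """
    n = len(rows)
    kept = []
    for col in rows[0]:
        n_missing = sum(1 for row in rows if row.get(col) is None)
        # "More than 10% missing" is removed, so exactly 10% is still kept.
        if n_missing / n <= max_missing_fraction:
            kept.append(col)
    return kept


# Ten hypothetical participants and three hypothetical proteins.
rows = [{"prot_a": 1.0, "prot_b": 0.5, "prot_c": 2.0} for _ in range(10)]
rows[0]["prot_a"] = None   # prot_a: 1 of 10 missing (10%, kept)
rows[1]["prot_b"] = None   # prot_b: 2 of 10 missing (20%, removed)
rows[2]["prot_b"] = None

kept = columns_within_missingness(rows)
print(kept)
```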
GOLD BioAge model training
We fitted two Gompertz regression models for biological age model training. The first Gompertz regression model included only chronological age as a predictor of time-to-mortality data. The second Gompertz regression model incorporated both chronological age and the selected biomarkers as predictors. We defined the GOLD biological age (GOLD BioAge) as the age that accounts for the actual mortality hazard given both chronological age (CA) and the additional biomarkers (Figure S2). The models were specified as follows:
Model 1: Chronological Age Only:
Model 2: Chronological Age and Selected Biomarkers:
In Model 2, biomarkers denoted the selected biomarkers included in the model, and βi (coef) denoted the coefficient of each biomarker. The GOLD BioAge integrated chronological age with relevant biomarkers to better reflect mortality hazard and aging status. Let h1(BioAge, t = 0) ≈ h2(CA, Biomarkers, t = 0). In real data, the empirical distributions of h1 and h2 differed (Figure S2), which led to underestimation of biological age in the whole population. We therefore added a constant (γ) to correct the bias and let h1 = γ ∗ h2.
Thus, GOLD BioAge was derived as follows:
For further simplicity and robustness, we constrained the coefficient of CA in Model 2 (β2) to equal the coefficient of CA in Model 1 (β1), and then estimated the rate, shape, and biomarker coefficients of Model 2.
When β1 = β2, the formula was further simplified as follows:
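Written out, the two models and the expressions derived from them can take the following form. This is a sketch, not the published equations. It assumes the proportional-hazards Gompertz parameterization used by flexsurv, in which the baseline hazard is rate × exp(shape × t) and covariates act log-linearly on the rate. The symbols b_k (rate), a_k (shape), and x_i (value of biomarker i) are notation introduced here, and the index i runs over the selected biomarkers only.

```latex
\begin{align}
% Model 1: chronological age only
h_1(\mathrm{CA}, t) &= b_1 \exp(\beta_1 \mathrm{CA}) \exp(a_1 t) \\
% Model 2: chronological age and selected biomarkers
h_2(\mathrm{CA}, x, t) &= b_2 \exp\Big(\beta_2 \mathrm{CA} + \sum_i \beta_i x_i\Big) \exp(a_2 t) \\
% Matching hazards at t = 0 with the bias correction h_1 = \gamma h_2
b_1 \exp(\beta_1 \mathrm{BioAge}) &= \gamma\, b_2 \exp\Big(\beta_2 \mathrm{CA} + \sum_i \beta_i x_i\Big) \\
\mathrm{BioAge} &= \frac{\beta_2 \mathrm{CA} + \sum_i \beta_i x_i + \ln(\gamma\, b_2 / b_1)}{\beta_1} \\
% Simplified form when \beta_1 = \beta_2
\mathrm{BioAge} &= \mathrm{CA} + \frac{\sum_i \beta_i x_i + \ln(\gamma\, b_2 / b_1)}{\beta_1}
\end{align}
```

In the simplified form, the difference between BioAge and CA depends only on the biomarkers and a constant, so BioAgeDiff can be read as the biomarker contribution to the log hazard expressed in years of chronological age.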
The Gompertz distribution parameters (rate, shape, and coefficients) were estimated by maximum likelihood using the “flexsurv” R package. The coefficients of GOLD BioAge, ProtAge, and MetAge are shown in Tables S5-8. The algorithm of GOLD BioAge was implemented as an R package (http://github.com/Jerryhaom/GOLDBioAge).
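Once the parameters are estimated, the calculation reduces to a few arithmetic steps. The Python sketch below assumes the simplified case β1 = β2 and a hazard of the form rate × exp(β × CA + Σ βi × biomarker_i), under which BioAge = CA + (Σ βi × biomarker_i + ln(γ × rate2 / rate1)) / β1. It is not the interface of the authors' R package, and every name and number in it is hypothetical.

```python
import math


def gold_bioage(ca, biomarkers, coefs, beta_ca, rate1, rate2, gamma=1.0):
    """Biological age under the simplified (beta1 == beta2) form.

    ca         : chronological age in years
    biomarkers : dict of biomarker name -> measured value
    coefs      : dict of biomarker name -> coefficient from Model 2
    beta_ca    : shared coefficient of chronological age
    rate1      : Gompertz rate of Model 1 (age only)
    rate2      : Gompertz rate of Model 2 (age plus biomarkers)
    gamma      : bias-correction constant
    """
    # Biomarker contribution to the log hazard.
    linear = sum(coefs[name] * value for name, value in biomarkers.items())
    # Convert the log-hazard difference into years of chronological age.
    return ca + (linear + math.log(gamma * rate2 / rate1)) / beta_ca


# Hypothetical parameter values, not the coefficients in Tables S5-8.
bioage = gold_bioage(
    ca=60.0,
    biomarkers={"marker": 1.0},
    coefs={"marker": 0.2},
    beta_ca=0.1,
    rate1=1e-5,
    rate2=1e-5,
)
bioage_diff = bioage - 60.0
print(bioage, bioage_diff)
```

With these made-up values, a biomarker contribution of 0.2 on the log-hazard scale and an age coefficient of 0.1 per year give a BioAgeDiff of about two years.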
Benchmark of biological age models
To assess the robustness of our models, we evaluated their performance in NHANES and UKB against other common phenotypic aging clocks and dimension-reduction methods. The Levine phenotypic age, KDM age, and Mahalanobis distance statistic were calculated using the ‘BioAge’ R package [36]. The PCA age was obtained by principal component analysis, regressing chronological age on the first five components. In the NHANES and UKB datasets, we compared the concordance index (C-index) from survival analysis and the area under the curve (AUC) for 10-year mortality prediction in the full samples and across three age groups: young adults, middle-aged adults, and older adults. This allowed us to determine the models’ ability to discriminate mortality risk across age groups. In addition to all-cause mortality, we evaluated how well these aging clocks predicted cause-specific mortality. Because each aging clock is a single variable, predicted values for 10-year survival status were estimated from logistic regressions on the aging clocks. Together, these analyses benchmarked our models against established aging clocks. The associations of GOLD BioAge, Light BioAge, ProtAge, and MetAge with all-cause and cause-specific mortality are shown in Tables S9-11.
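As an illustration of the AUC comparison, the sketch below computes the AUC as the probability that a randomly chosen decedent has a higher score than a randomly chosen survivor. A logistic regression on a single predictor is a monotone transformation of that predictor, so when the fitted slope is positive the AUC of the predicted probabilities equals the AUC of the clock itself. The data, intercept, and slope below are invented; no model is fitted here.

```python
import math


def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def logistic(x, intercept, slope):
    """Predicted probability from a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Invented example: aging-clock values and 10-year death indicators.
clock = [45.0, 52.0, 60.0, 63.0, 71.0, 78.0]
died = [0, 0, 1, 0, 1, 1]

# Hypothetical coefficients standing in for a fitted logistic regression.
risk = [logistic(x, intercept=-6.0, slope=0.1) for x in clock]

auc_clock = auc(clock, died)
auc_risk = auc(risk, died)
print(auc_clock, auc_risk)
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based formula gives the same value more efficiently on cohort-scale data.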
Validation in independent elderly cohorts
We validated the Light BioAge model in three other cohorts of older adults: CHARLS, RuLAS, and CLHLS. Five waves of CHARLS data (2011-2020) were used, with blood-based bioassay data used to construct the Light BioAge. Health and function questionnaires were collected for frailty assessment [37] (Table S12). For RuLAS, wave 2 (2016) was taken as baseline, and blood biomarker data were obtained. CLHLS data from 2014-2018 were used to validate the Light BioAge. We calculated ROC curves to evaluate the predictive performance of Light BioAge and chronological age in the full sample, as well as in subpopulations stratified by age (60-79 years; ≥ 80 years). Additionally, we fitted survival curves for low-risk and high-risk groups defined by Light BioAgeDiff.
Assessment of mortality and onset of chronic diseases
In NHANES, death information was based on linked records from the National Death Index (NDI) through December 31, 2019, provided through the Centers for Disease Control and Prevention. Data on mortality status and length of follow-up (in person-months) were available for nearly all participants. In UKB, death information was obtained from death certificates held within the National Health Service (NHS) Information Centre (England and Wales) and the NHS Central Register (Scotland) up to November 30, 2022. We calculated participants’ follow-up time from baseline to the date of death, date of loss to follow-up, or date of last follow-up record, whichever came first. We used the International Statistical Classification of Diseases, 10th Revision (ICD-10), to define causes of death. Cause-specific mortality included mortality from malignant neoplasm, heart disease, cerebrovascular disease, respiratory disease, Alzheimer disease, diabetes, and other causes.
In addition, dates of diagnosis of incident chronic diseases in UKB were collected, including cancer, myocardial infarction, heart failure, stroke, chronic obstructive pulmonary disease (COPD), and dementia.
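The "whichever came first" rule for follow-up time can be sketched as below. The function and argument names are hypothetical, and the dates are invented; the sketch returns follow-up in days together with an event indicator (1 for death, 0 for censoring).

```python
from datetime import date


def follow_up(baseline, death=None, lost=None, last_record=None):
    """Return (days of follow-up, event indicator) for one participant.

    Follow-up ends at the earliest available date among death, loss to
    follow-up, and last follow-up record. The event indicator is 1 only
    if that earliest date is the date of death.
    """
    candidates = [
        (d, kind)
        for d, kind in ((death, "death"), (lost, "lost"), (last_record, "last"))
        if d is not None
    ]
    # min() keeps the first entry on ties, so death wins a same-day tie.
    end, kind = min(candidates, key=lambda item: item[0])
    return (end - baseline).days, int(kind == "death")


# Invented participants.
died = follow_up(date(2008, 1, 1), death=date(2015, 1, 1),
                 last_record=date(2022, 11, 30))
censored = follow_up(date(2008, 1, 1), lost=date(2008, 1, 31),
                     last_record=date(2022, 11, 30))
print(died, censored)
```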
Assessment of health-related factors and outcomes
The unhealthy lifestyle score was based on six modifiable lifestyle factors defined by the World Health Organization: smoking, alcohol consumption, physical activity, diet, body mass index (BMI), and sedentary behavior. The score was categorized into five groups (0, 1, 2, 3, and 4 or more unhealthy factors). Multimorbidity was defined as the number of lifetime disease diagnoses. In NHANES, we included diabetes, high blood pressure, congestive heart failure, coronary heart disease, heart attack, stroke, cancer or malignancy, and chronic bronchitis. In UKB, we included cancer, myocardial infarction, heart failure, stroke, chronic obstructive pulmonary disease (COPD), and dementia. Disease count was classified into five categories: no disease, 1, 2, 3, and 4 or more diseases. Self-rated health was recorded in four levels: excellent or very good, good, fair, and poor. The distributions of GOLD BioAge, Light BioAge, ProtAge, and MetAge by unhealthy lifestyle, comorbidity, and self-rated health are shown in Table S13.
Statistical analysis
Survival analysis was conducted in different age groups. Within each group, participants were classified into quintiles based on their BioAge difference (BioAgeDiff), with the top 20% representing individuals at highest risk of death. Kaplan-Meier survival curves were then plotted to compare survival probabilities between the highest and lowest quintiles of the novel BioAge. Harrell’s concordance index (C-index) was used to assess predictive discrimination in survival analysis, and the area under the curve (AUC) was used to evaluate 10-year prediction ability. Cox proportional hazards models were fitted to assess the associations of the different biological aging clocks with mortality and the onset of chronic diseases. The Cox models were adjusted for sex and chronological age. All statistical analyses were performed using R version 4.3.3.
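For readers unfamiliar with Harrell's C-index, a minimal sketch is given below. It counts, among comparable pairs (the participant with the shorter time had an observed death), the fraction in which the participant who died earlier also had the higher risk score. Tied event times are ignored for brevity, and the data are invented.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data.

    times  : follow-up time for each participant
    events : 1 if death was observed, 0 if censored
    scores : risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if i died strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Invented example: follow-up in years, death indicator, BioAgeDiff.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
bioage_diff = [5.0, 1.0, 3.0, -2.0, -4.0]

c_index = harrell_c(times, events, bioage_diff)
print(c_index)
```

Here seven of the eight comparable pairs are ordered correctly, so the C-index is 0.875; a value of 0.5 would indicate no discrimination.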
Funding
This work was supported by grants from the National Natural Science Foundation of China-Youth Science Fund (82301768, 32300533, 32100510) and the Shanghai Sailing Program (23YF1430500).
Conflict of interest
None declared.
Author contributions
Concept and design: Meng Hao, Yi Li, Hui Zhang.
Acquisition, analysis, or interpretation of data: Meng Hao, Zixin Hu, Shuai Jiang.
Drafting of the manuscript: Meng Hao, Jingyi Wu, Hui Zhang.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Meng Hao, Hui Zhang, Jingyi Wu.
Administrative, technical, or material support: Xiangnan Li, Shuming Wang, Meijia Wang, Yaqi Huang, Jiaofeng Wang, Jie Chen, Zhijun Bao, Li Jin.
Supervision: Meng Hao, Yi Li, Shuai Jiang, Zixin Hu, Xiaofeng Wang.
Acknowledgments
The data used in this research were obtained from the NHANES, UK Biobank, CHARLS, RuLAS and CLHLS. We would like to thank the workers, researchers, and participants involved in these cohorts.