Abstract
Background An individual’s biological age is a measurement of health status and provides a mechanistic understanding of aging. Age clocks estimate an individual’s biological age from their features. Existing clocks are limited by an undesirable tradeoff between accuracy (i.e., predictive performance for chronological age or mortality, often achieved by complex, black-box models) and interpretability (i.e., the contributions of features to biological age). Here, we present ‘ENABL (ExplaiNAble BioLogical) Age’, a computational framework that combines machine learning (ML) models with explainable AI (XAI) methods to accurately estimate biological age with individualized explanations.
Methods To construct an ENABL Age clock, we first predict an age-related outcome of interest (e.g., all-cause or cause-specific mortality), and then rescale the predictions nonlinearly to estimate biological age. We trained and evaluated the ENABL Age clock using the UK Biobank (501,366 samples with 825 features) and NHANES 1999-2014 (47,084 samples with 158 features) datasets. To explain the ENABL Age clock, we extended existing XAI methods so that we could linearly decompose any individual’s ENABL Age into contributing risk factors. To make the ENABL Age clock broadly accessible, we developed two versions: (1) ENABL Age-L, which is based on popular blood tests, and (2) ENABL Age-Q, which is based on questionnaire features. Finally, we created ENABL Age clocks based on predictions of different age-related outcomes and validated that each one captures sensible, yet disparate, aging mechanisms by performing GWAS analyses.
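The rescaling step described above can be illustrated with a minimal sketch. The abstract does not specify the rescaling function, so the approach below is an assumption chosen for illustration only: it maps a predicted risk score to the chronological age at which a reference population has the same average predicted risk. The function names `fit_age_risk_curve` and `risk_to_age` and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_age_risk_curve(ages, risks):
    """Average predicted risk at each integer age in a reference population.

    Assumption for illustration: the curve is forced to be increasing so
    that it can be inverted; the study's actual rescaling may differ.
    """
    ages = np.round(np.asarray(ages, dtype=float))
    risks = np.asarray(risks, dtype=float)
    grid = np.unique(ages)
    mean_risk = np.array([risks[ages == a].mean() for a in grid])
    # Running maximum removes dips; the tiny ramp breaks exact ties so the
    # curve is strictly increasing, as np.interp expects for its x-values.
    mean_risk = np.maximum.accumulate(mean_risk) + 1e-9 * np.arange(grid.size)
    return grid, mean_risk

def risk_to_age(risk, grid, mean_risk):
    """Nonlinear rescaling: the age whose average risk equals `risk`.

    Risks outside the reference range are clipped to the youngest or
    oldest age on the grid.
    """
    return np.interp(risk, mean_risk, grid)

# Synthetic reference population (hypothetical): risk grows exponentially with age.
ref_ages = np.repeat(np.arange(40, 71), 5)
ref_risks = 0.01 * np.exp(0.08 * (ref_ages - 40))
grid, curve = fit_age_risk_curve(ref_ages, ref_risks)

# An individual whose predicted risk equals the average risk of 50-year-olds
# is assigned a biological age of about 50, whatever their chronological age.
bio_age = risk_to_age(0.01 * np.exp(0.08 * 10), grid, curve)
```

Under this assumption, a 45-year-old whose predicted risk matches the average 50-year-old's would be assigned a biological age of roughly 50, which is the sense in which predictions are expressed in units of years.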
Findings Our results indicate that ENABL Age clocks successfully separate healthy from unhealthy aging individuals and are stronger predictors of mortality than existing age clocks. We externally validated our results by training ENABL Age clocks on UK Biobank data and testing on NHANES data. The individualized explanations, which reveal the contribution of specific features to ENABL Age, provide insights into the important features for biological age. Association analysis with risk factors and aging-related morbidities, and genome-wide association study (GWAS) results on ENABL Age clocks trained on different mortality causes, show that each one captures sensible aging mechanisms.
Interpretation We developed and validated a new ML and XAI-based approach to calculate and interpret biological age based on multiple aging mechanisms. Our results show strong mortality prediction power, interpretability, and flexibility. ENABL Age takes a consequential step towards accurate, interpretable biological age prediction built with complex, high-performance ML models.
Evidence before this study Biological age plays an important role in understanding the mechanisms underlying aging. We searched PubMed for original articles published in any language up to June 22, 2022, using the term “biological age”. Most prior studies focus on the first generation of biological age clocks, which are designed to predict chronological age. These clocks have weak and variable associations with mortality risk and other aging outcomes. Only a few studies present the second generation of biological age clocks, which are built directly with aging outcomes. However, these studies use linear models and do not provide individualized explanations. Moreover, previous biological age clocks cannot specify which aging process they capture. Unlike our study, none of the previous studies combined a complex machine learning (ML) model with an explainable artificial intelligence (XAI) method, a combination that allows us to build biological ages that are both accurate and interpretable.
Added value of this study In this study, we present ENABL Age, a new approach to estimate and understand biological age that combines complex ML models and XAI methods. The ENABL Age approach is designed to construct second-generation biological age clocks by directly predicting age-related outcomes. Our results indicate that ENABL Age accurately reflects individual health status. We also introduce two variants of ENABL Age clocks: (1) ENABL Age-L, which takes popular blood tests as inputs (usable by medical professionals), and (2) ENABL Age-Q, which takes questionnaire features as inputs (usable by non-professional healthcare consumers). We extend existing XAI methods to calculate the contributions of input features to the ENABL Age estimate in units of years, which makes our biological age clocks more human-interpretable. Our association analysis and GWAS results show that ENABL Age clocks trained on different age-related outcomes can capture different aging mechanisms.
Implications of all the available evidence We develop and validate a new ML and XAI-based approach to measure and interpret biological age based on multiple aging mechanisms. Our results demonstrate that ENABL Age has strong mortality prediction power, is interpretable, and is flexible. ENABL Age takes a consequential step towards applying XAI to interpret biological age models. Its flexibility allows for many future extensions, including omics data, multi-omic data, and multi-task learning.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was funded by the National Science Foundation [DBI-1759487, DBI-1552309, DBI-1355899, DGE-1762114] and the National Institutes of Health [R35 GM 128638, R01 NIA AG 061132 and P30 AG 013280].
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used openly available human data from NHANES (https://www.cdc.gov/nchs/nhanes/index.htm) and UK Biobank (https://www.ukbiobank.ac.uk/)
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The study used openly available human data from NHANES (https://www.cdc.gov/nchs/nhanes/index.htm) and UK Biobank (https://www.ukbiobank.ac.uk/)