Summary
Background Alzheimer’s disease and related dementias (ADRD) and Parkinson’s disease (PD) are the most common neurodegenerative conditions. These central nervous system disorders affect both the structure and function of the brain and may produce imaging changes that precede symptoms. Patients with ADRD or PD have long asymptomatic phases and show substantial heterogeneity. Quantitative measures that provide early indicators of disease are therefore needed to improve patient stratification, clinical care, and clinical trial design. This work uses machine learning techniques to derive such a quantitative marker from T1-weighted (T1w) brain magnetic resonance imaging (MRI).
Methods In this retrospective study, we developed machine learning (ML)-based, disease-specific scores from T1w brain MRI using the Parkinson’s Progression Markers Initiative (PPMI) and Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohorts. We evaluated the potential of these ML-based scores for early diagnosis, prognosis, and monitoring of ADRD and PD in an independent, large-scale, population-based longitudinal cohort, the UK Biobank.
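The imaging score is described only as the normalized probabilistic output of the classifier, and hazard ratios are reported per S.D. of that score. A minimal Python sketch of one plausible normalization follows, assuming z-scoring (subtract the mean, divide by the standard deviation) against a reference set of probabilities. The function name imaging_scores and the default reference (the same cohort) are illustrative assumptions, not the study's documented procedure.

```python
from statistics import mean, pstdev


def imaging_scores(probabilities, reference=None):
    """Standardize classifier probabilities into unit-S.D. imaging scores.

    Assumption: "normalized" is interpreted as z-scoring against a
    reference set (by default, the probabilities themselves). The study
    may use a different normalization.
    """
    ref = list(reference) if reference is not None else list(probabilities)
    mu = mean(ref)        # reference mean probability
    sd = pstdev(ref)      # population standard deviation of the reference
    if sd == 0:
        raise ValueError("reference probabilities have zero variance")
    # One unit of the returned score corresponds to one S.D. of the reference.
    return [(p - mu) / sd for p in probabilities]
```

For example, imaging_scores([0.1, 0.2, 0.3, 0.4]) returns scores with mean 0 and standard deviation 1, preserving the original ordering. Passing a separate reference (for instance, healthy-control probabilities) would instead express each score relative to that group.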
Findings A total of 1,826 dementia images from 731 participants and 3,161 healthy control images from 925 participants from the ADNI cohort, and 684 PD images from 319 participants and 232 healthy control images from 145 participants from the PPMI cohort, were used to train machine learning models. Using 790 extracted structural brain features, the models achieved an area under the receiver operating characteristic curve (AUC) of 0.94 [95% CI: 0.93-0.96] for ADRD detection and 0.63 [95% CI: 0.57-0.71] for PD detection. The most predictive regions include the hippocampus and temporal brain regions in ADRD and the substantia nigra in PD. The normalized probabilistic output of the ML models (the ADRD and PD imaging scores) was evaluated in 42,835 participants with imaging data from the UK Biobank, among whom 66 ADRD cases and 40 PD cases had T1w brain MRI acquired during the pre-diagnostic phase. For diagnoses occurring within 5 years, the integrated survival model achieves a time-dependent AUC of 0.86 [95% CI: 0.80-0.92] for dementia and 0.89 [95% CI: 0.85-0.94] for PD. The ADRD imaging score is strongly associated with dementia-free survival (hazard ratio (HR) 1.76 [95% CI: 1.50-2.05] per S.D. of imaging score), and the PD imaging score is associated with PD-free survival (HR 2.33 [95% CI: 1.55-3.50]) in our integrated model. For PD, HR and prevalence increased stepwise over imaging score quartiles, demonstrating heterogeneity. As a proxy for diagnosis, we compared the imaging scores against AD and PD polygenic risk scores in the 42,835 subjects and found highly significant associations after adjusting for covariates. In both the PPMI and ADNI cohorts, the scores are associated with clinical assessments, including the Mini-Mental State Examination (MMSE) and the Alzheimer’s Disease Assessment Scale-cognitive subscale (ADAS-Cog), and with pathological markers, including amyloid and tau. Finally, imaging scores are associated with polygenic risk scores for multiple diseases.
Our results suggest that imaging scores could be used in the future to study the genetic architecture of these disorders.
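Because the ADRD hazard ratio is reported per S.D. of imaging score, it can be rescaled to other score differences. Assuming a Cox proportional hazards model in which the standardized score enters linearly (so that log HR is proportional to the score difference), a difference of d standard deviations corresponds to HR raised to the power d. The helper below is a hypothetical name written for illustration, not part of the study's code.

```python
import math


def hazard_ratio_for_difference(hr_per_sd, delta_sd):
    """Hazard ratio implied by a score difference of `delta_sd` S.D.

    Assumes a Cox model with the standardized score entering linearly,
    so the log-hazard coefficient scales with the score difference.
    """
    beta = math.log(hr_per_sd)        # log-hazard per 1 S.D.
    return math.exp(beta * delta_sd)  # equivalent to hr_per_sd ** delta_sd
```

For example, hazard_ratio_for_difference(1.76, 2) gives about 3.10, and applying the same function to the reported 95% CI bounds (1.50 and 2.05) gives roughly 2.25 to 4.20 for a 2-S.D. difference. This bound-by-bound rescaling holds for positive d and only under the linearity assumption above.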
Interpretation Our study demonstrates the use of machine learning-derived quantitative markers for ADRD and PD. We show that disease probability scores obtained from structural brain features are useful for early detection, prognosis prediction, and monitoring of disease progression. To facilitate community engagement and external testing of model utility, an interactive app for exploring summary-level data from this study and applying the models to external data is available at https://ndds-brainimaging-ml.streamlit.app. To our knowledge, this is the first publicly available cloud-based MRI prediction application.
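Early detection is summarized with a time-dependent AUC for diagnoses within 5 years. The sketch below computes an unweighted cumulative/dynamic AUC at a fixed horizon: participants diagnosed at or before the horizon are cases, participants still event-free after it are controls, and the AUC is the share of case-control pairs that the score ranks correctly. It deliberately omits inverse-probability-of-censoring weighting, which standard estimators use to account for participants censored before the horizon, so it is a simplification and not the estimator used in the study. The function name and inputs are hypothetical.

```python
def time_dependent_auc(scores, times, events, horizon):
    """Unweighted cumulative/dynamic AUC at `horizon` (e.g. 5 years).

    Cases: diagnosed (event == 1) at or before `horizon`.
    Controls: follow-up extends beyond `horizon` (event-free at horizon).
    Participants censored at or before `horizon` are dropped; a full
    estimator would reweight them instead.
    """
    cases = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0   # correctly ranked pair
            elif case_score == control_score:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))
```

For example, with scores [2.0, 1.5, 0.2, -0.5, 0.1, 1.8], times in years [2, 4, 6, 7, 3, 8], events [1, 1, 0, 0, 0, 1], and horizon 5, the fifth participant (censored at 3 years) is dropped, the sixth counts as a control because the diagnosis occurs after the horizon, and the result is 5/6.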
Funding US National Institute on Aging and US National Institutes of Health.
Evidence before this study We searched PubMed for articles published in English from database inception to May 11, 2023, on the use of machine learning with brain imaging data in Alzheimer’s disease (AD), dementia, and Parkinson’s disease (PD) populations, using the search terms “machine learning” AND “brain imaging” AND “neurodegenerative disorders” AND “quantitative biomarkers”. The search identified 25 studies. Most focused on Alzheimer’s disease, using machine learning either to predict conversion from mild cognitive impairment to dementia or to build a classification tool. Many also analyzed positron emission tomography (PET) images rather than the more cost-effective T1w MRI. None focused on detecting disease during the asymptomatic phase of dementia or PD. The identified studies were limited in sample size (on the order of hundreds of samples) and in the number of extracted features, and assessments of the clinical utility of the disease probabilities predicted by machine learning models were scarce. Notably, none attempted to validate their algorithm in an external cohort. In this work, we limited our review to scientific studies that are transparent and reproducible, including those that provide code and validate their findings on a reasonable sample size.
Added value of this study This study developed machine learning-based quantitative scores to measure the risk, severity, and prognosis of Alzheimer’s disease and related dementias (ADRD) and Parkinson’s disease (PD) from brain imaging data. Neurodegenerative disorders affect multiple body functions and vary substantially in etiology and clinical presentation. Patients with these conditions may experience prolonged asymptomatic periods. Disease-modifying therapies are most effective during the early, asymptomatic stage of disease, making early intervention crucial. However, the lack of biomarkers for early diagnosis and for monitoring disease progression remains a significant obstacle to this goal. We leveraged the disease-specific ADNI (1,826 images from 731 dementia participants) and PPMI (684 images from 329 PD participants) cohorts to develop machine learning classifiers for AD and PD detection from T1w brain imaging data. From these trained models, we obtain disease-specific imaging scores as normalized disease probability scores. In a large external biobank, the UK Biobank (42,835 participants), we found that these scores have strong predictive power for the occurrence of PD or dementia during a 5-year follow-up. The occurrence of PD increased stepwise across ascending imaging score quantiles, reflecting heterogeneity within the PD population. Imaging scores are also associated with pathological and clinical assessment measures. Our study indicates that each score could serve as a single numeric indicator of disease-specific abnormality in T1w brain imaging. The association of imaging scores with polygenic risk scores for related disorders suggests a genetic basis for these scores. We also identified the top brain regions associated with dementia and Parkinson’s disease using feature interpretation tools.
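The stepwise increase in PD occurrence across imaging score quantiles can be checked by binning scores into quartiles and computing the proportion diagnosed within each bin. A minimal sketch follows, using quartile cut points from Python's statistics.quantiles. The function name, the default ("exclusive") interpolation method, and the tie handling (a score equal to a cut point falls into the lower quartile) are assumptions made for illustration.

```python
from statistics import quantiles


def prevalence_by_quartile(scores, diagnosed):
    """Proportion of diagnosed participants within each imaging-score quartile.

    Cut points come from statistics.quantiles with its default method;
    the study's exact binning may differ.
    """
    cuts = quantiles(scores, n=4)          # three cut points: Q1, median, Q3
    counts = [[0, 0] for _ in range(4)]    # [diagnosed, total] per quartile
    for score, dx in zip(scores, diagnosed):
        q = sum(score > c for c in cuts)   # quartile index 0..3
        counts[q][0] += int(bool(dx))
        counts[q][1] += 1
    return [d / n if n else float("nan") for d, n in counts]
```

For example, prevalence_by_quartile(list(range(1, 9)), [0, 0, 0, 0, 0, 1, 1, 1]) returns [0.0, 0.0, 0.5, 1.0], a stepwise pattern of the kind described for PD.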
Implications of all the available evidence The findings should improve our ability to create practical passive surveillance plans for individuals at heightened risk of neurodegenerative disease. We have shown that imaging scores complement other risk factors, such as age and polygenic risk scores, for early detection. The integrated model could serve as a tool for early intervention and study enrollment. Understanding the genetic basis of imaging scores can provide valuable insights into the biology of neurodegenerative disorders. Additionally, high-accuracy models that facilitate early detection at biobank scale can empower precision medicine trial recruitment strategies as well as future paths of care. We have also developed an interactive web server (https://ndds-brainimaging-ml.streamlit.app) that enables the community to process their own data with our models and explore the utility and applicability of these findings for themselves. Users can upload a NIfTI or DICOM file containing their MRI image, and the server handles the entire pre-processing and prediction process. All computations are performed on Google Cloud Platform. In addition, we provide an interpretation of each ML prediction that highlights the brain areas contributing to the decision, and a what-if analysis tool with which users can explore different scenarios and their effect on the prediction.
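The what-if analysis tool lets users change inputs and see how the prediction responds. The idea can be sketched with a toy logistic model: recompute the predicted probability after changing one standardized regional feature while holding the others fixed. The feature names, weights, and intercept below are hypothetical and are not taken from the study's models or the web app's code.

```python
import math

# Hypothetical weights for illustration only; NOT the study's trained model.
WEIGHTS = {"hippocampus_volume_z": -1.2, "temporal_thickness_z": -0.8}
INTERCEPT = -1.0


def predict_probability(features):
    """Logistic probability from standardized regional features (toy model)."""
    logit = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


def what_if(features, name, new_value):
    """Return (baseline, counterfactual) probabilities after changing one feature."""
    baseline = predict_probability(features)
    changed = dict(features)      # copy so the caller's input is untouched
    changed[name] = new_value
    return baseline, predict_probability(changed)
```

For example, what_if({"hippocampus_volume_z": 0.0, "temporal_thickness_z": 0.0}, "hippocampus_volume_z", -2.0) returns roughly (0.27, 0.80): in this toy model, a hippocampal volume 2 S.D. below average raises the predicted probability.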
Competing Interest Statement
A.D., M.T., M.A.N., H.I., and F.F. declare the following competing financial interests, as their participation in this project was part of a competitive contract awarded to Data Tecnica International, LLC, by the NIH to support open science research. M.A.N. also currently serves on the scientific advisory board for Character Bio and is an advisor to Neuron23, Inc. B.A. is an employee of REALM IDx. The study's funders had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. All authors and the public can access all data and statistical programming code used in this project for the analyses and results generation. F.F. takes final responsibility for the decision to submit the paper for publication.
Funding Statement
US National Institute on Aging and US National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
We have developed an interactive website (https://ndds-brainimaging-ml.streamlit.app) where researchers can investigate components of the predictive model and explore feature effects at the sample and cohort levels. To facilitate replication and extension of our work, we have made the notebook publicly available on GitHub at https://github.com/NIH-CARD/NDDsImagingStreamlitApp and https://github.com/NIH-CARD/NDDsImaging. It includes all code, figures, models, and supplements for this study. The code is part of the supplemental information and includes the rendered Jupyter notebook with full step-by-step data preprocessing, statistical, and machine learning analyses.