PT - JOURNAL ARTICLE
AU - Haridas, Namitha Thalekkara
AU - Sanchez-Bornot, Jose M.
AU - McClean, Paula L.
AU - Wong-Lin, KongFatt
TI - Autoencoder Imputation of Missing Heterogeneous Data for Alzheimer’s Disease Classification
AID - 10.1101/2024.07.18.24310625
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.07.18.24310625
4099 - http://medrxiv.org/content/early/2024/07/18/2024.07.18.24310625.short
4100 - http://medrxiv.org/content/early/2024/07/18/2024.07.18.24310625.full
AB - Accurate diagnosis of Alzheimer’s disease (AD) relies heavily on the availability of complete and reliable data. Yet missingness in heterogeneous medical and clinical data is prevalent and poses significant challenges. Previous studies have explored various data imputation strategies and methods on heterogeneous data, but the evaluation of deep learning algorithms for imputing heterogeneous AD data is limited. In this study, we addressed this gap by investigating the efficacy of denoising autoencoder-based imputation of missing key features in a heterogeneous dataset comprising tau-PET, MRI, cognitive and functional assessments, genotype, sociodemographic data, and medical history. We focused on extreme levels (40-70%) of missingness at random in key features that depend on AD progression; we identified these as history of mother having AD, APoE ε4 alleles, and clinical dementia rating. Along with features selected using traditional feature selection methods, we included latent features extracted from the denoising autoencoder for subsequent classification. Using random forest classification with 10-fold cross-validation, we evaluated the AD predictive performance of the imputed datasets and found robust classification performance, with accuracy of 79-85% and precision of 71-85% across different levels of missingness. Additionally, our results demonstrated high recall values for identifying individuals with AD, particularly in datasets with 40% missingness in key features.
Further, our feature-selected dataset, obtained using feature selection methods that included the autoencoder, achieved a higher classification score than the original complete dataset. These results highlight the effectiveness and robustness of the autoencoder in imputing crucial information for reliable AD prediction in AI-based clinical decision support systems.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This study did not receive any funding.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
Data Availability: Source code for the present study is available upon reasonable request to the authors. The original data used in the present study are available from the Alzheimer's Disease Neuroimaging Initiative (ADNI): https://adni.loni.usc.edu/
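Note: The abstract describes inducing 40-70% missingness at random (MAR) in key features before imputation. The authors' code is not public, so the following is only a minimal sketch of one way such masking could be set up, not their procedure. It assumes that the chance of a value going missing depends on an always-observed progression variable. The column names `stage` and `cdr`, the function `mask_mar`, and the weighting rule are all hypothetical and chosen for illustration.

```python
import random


def mask_mar(rows, target, driver, rate, weight=lambda d: 1.0 + d, seed=0):
    """Return a copy of `rows` with a fraction `rate` of `target` set to None.

    Missingness is MAR under this sketch's assumption: the chance that a row
    loses `target` depends only on the always-observed `driver` column,
    never on the hidden value itself. `weight` must return a positive number.
    """
    rng = random.Random(seed)
    k = round(rate * len(rows))
    # Efraimidis-Spirakis weighted sampling without replacement:
    # draw u ~ U[0, 1) per row, rank rows by u ** (1 / w), keep the k largest.
    keys = [(rng.random() ** (1.0 / weight(r[driver])), i)
            for i, r in enumerate(rows)]
    chosen = {i for _, i in sorted(keys, reverse=True)[:k]}
    return [dict(r, **{target: None}) if i in chosen else dict(r)
            for i, r in enumerate(rows)]


def missing_rate_by(rows, target, driver):
    """Fraction of missing `target` values within each `driver` group."""
    totals, missing = {}, {}
    for r in rows:
        g = r[driver]
        totals[g] = totals.get(g, 0) + 1
        missing[g] = missing.get(g, 0) + (r[target] is None)
    return {g: missing[g] / totals[g] for g in sorted(totals)}


# Hypothetical toy cohort: `stage` stands in for an observed progression
# variable and `cdr` for a key feature to be masked (not real ADNI data).
toy = [{"stage": i % 3, "cdr": 0.5 * (i % 3)} for i in range(3000)]

if __name__ == "__main__":
    for rate in (0.4, 0.7):
        masked = mask_mar(toy, "cdr", "stage", rate)
        print(rate, missing_rate_by(masked, "cdr", "stage"))
```

Sampling a fixed number of rows, rather than flipping an independent coin per row, keeps the overall missingness exactly at the requested level, which makes comparisons across the 40-70% range easier to interpret. An imputation model would then be trained on the masked copy and scored against the held-back original values.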