Abstract
A growing number of studies use deep neural networks (DNNs) to identify diseases from recordings of brain activity. DNN studies of electroencephalography (EEG) typically use cross-validation to test how accurately a model can predict the disease state of held-out test data. In these studies, segments of EEG data are often randomly assigned to the training or test sets. As a consequence, data from individual subjects appears in both the training and test sets. Could high test-set accuracy reflect leakage from subject-specific representations, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (where EEG segments from one subject can appear in both the training and test sets), and comparing this to their performance using subject-based holdout (where individual subjects’ data appears exclusively in either the training set or the test set). We compare segment-based and subject-based holdout in two EEG datasets: one classifying Alzheimer’s disease, and the other classifying epileptic seizures. In both datasets, we find that performance on previously unseen subjects is strongly overestimated when models are trained using segment-based holdout. Next, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout, and therefore overestimate model performance on new subjects. In a hospital or doctor’s office, clinicians need to diagnose new patients whose data was not used in training the model; segment-based holdout, therefore, does not reflect the real-world performance of a translational DNN model. When evaluating how DNNs could be used for medical diagnosis, models must be tested on subjects whose data was not included in the training set.
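The difference between the two holdout schemes can be illustrated with a minimal sketch. This is not the code used in the study: the function names, the toy data (10 subjects with 20 segments each), and the 20% test fraction are all assumptions chosen for illustration. The sketch only shows how the two schemes split the data, and checks whether any subject ends up in both sets.

```python
import random

def make_segments(n_subjects=10, segments_per_subject=20):
    """Toy stand-in for EEG data: a list of (subject_id, segment_index) pairs."""
    return [(s, i) for s in range(n_subjects) for i in range(segments_per_subject)]

def segment_based_holdout(segments, test_fraction=0.2, seed=0):
    """Randomly assign individual segments to train/test, ignoring subject identity."""
    rng = random.Random(seed)
    shuffled = segments[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def subject_based_holdout(segments, test_fraction=0.2, seed=0):
    """Assign whole subjects to train or test, so no subject spans both sets."""
    rng = random.Random(seed)
    subjects = sorted({s for s, _ in segments})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [seg for seg in segments if seg[0] not in test_subjects]
    test = [seg for seg in segments if seg[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Subjects whose segments appear in both sets (a potential source of leakage)."""
    return {s for s, _ in train} & {s for s, _ in test}

segments = make_segments()
seg_train, seg_test = segment_based_holdout(segments)
sub_train, sub_test = subject_based_holdout(segments)

print("segment-based: subjects in both sets =", len(shared_subjects(seg_train, seg_test)))
print("subject-based: subjects in both sets =", len(shared_subjects(sub_train, sub_test)))
```

Under subject-based holdout the overlap is empty by construction, which is the property a clinical evaluation on new patients requires. Under segment-based holdout, subjects are typically shared between the two sets, so a model can score well by recognizing individuals rather than disease.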
Competing Interest Statement
Geoffrey Brookshire, Keith J. Yoder, Spencer Gerrol, and Ché Lucero are employed at SPARK Neuro Inc., a medical technology company developing diagnostic aids to help clinicians identify and assess neurodegenerative disease.
Funding Statement
This study was funded by SPARK Neuro, Inc.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
For experiment 1, the IRB of the St. John's Cancer Institute gave ethical approval for this work (Protocol JWCI-19-1101). For experiment 2, the data were obtained from a publicly available repository. The Ethical Committee of the University of Siena gave ethical approval for collecting these data and posting them for public access.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
EEG data for experiment 1 were provided by the Pacific Neuroscience Institute. These data are described by Ganapathi and colleagues (2022), and can be accessed through agreement with the authors of that study. EEG data for experiment 2 were downloaded from the publicly available Siena Scalp EEG Database (Detti et al., 2020a; 2020b) hosted on PhysioNet (Goldberger et al., 2000).
Paolo Detti. Siena Scalp EEG Database (version 1.0.0). PhysioNet, 2020a. URL https://doi.org/10.13026/5d4a-j060.
Paolo Detti, Giampaolo Vatti, and Garazi Zabalo Manrique de Lara. EEG synchronization analysis for seizure prediction: A study on data of noninvasive recordings. Processes, 8(7):846, 2020b.
Aarthi S Ganapathi, Ryan M Glatt, Tess H Bookheimer, Emily S Popa, Morgan L Ingemanson, Casey J Richards, John F Hodes, Kyron P Pierce, Colby B Slyapich, Fatima Iqbal, et al. Differentiation of subjective cognitive decline, mild cognitive impairment, and dementia using qEEG/ERP-based cognitive testing and volumetric MRI in an outpatient specialty memory clinic. Journal of Alzheimer's Disease, pages 1-9, 2022 (preprint).
Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215-e220, 2000.