ABSTRACT
Interictal high-frequency oscillation (HFO) is a promising biomarker of the epileptogenic zone (EZ). However, objective definitions to distinguish between pathological and physiological HFOs have remained elusive, impeding HFOs’ clinical applications. We employed self-supervised deep generative variational autoencoders to learn such discriminative HFO features directly from their morphologies in a data-driven manner. We studied a large retrospective cohort of 185 patients who underwent intracranial monitoring and analyzed 686,410 candidate HFO events collected from 18,265 brain contacts across diverse brain regions. The model automatically clustered HFOs into distinct morphological groups in the latent space. One cluster consisted of putative morphologically defined pathological HFOs (mpHFOs): HFOs in that cluster were observed to be associated with spikes and exhibited high signal intensity both in the HFO band (>80 Hz) at detection and in the sub-HFO band (10-80 Hz) surrounding the detection and were primarily localized in the seizure onset zone (SOZ). Moreover, resection of brain regions based on a higher prevalence of interictal mpHFOs better predicted postoperative seizure outcomes than current clinical standards based on SOZ removal. Our self-supervised, explainable, deep generative model distills pathological HFOs and thus potentially helps delineate the EZ purely from interictal intracranial EEG data.
INTRODUCTION
Over a third of people with epilepsy do not respond to medication, thereby becoming potential candidates for epilepsy surgery.1 Presently, surgical plans are mainly informed by neuroimaging and electroencephalogram (EEG), including interictal spikes and the seizure onset zone (SOZ). Nonetheless, the success rate of surgical intervention in achieving seizure freedom is less than ideal, varying between 50% and 75%.2,3 Discovering a biomarker that can precisely define the boundaries of the epileptogenic zone (EZ: the brain regions responsible for generating seizures) would be a significant breakthrough. Epilepsy studies in both humans and animals have indicated that interictal high-frequency oscillations (HFOs) in intracranial EEG (iEEG) is a promising spatial biomarker for delineating the EZ.4–7 Numerous retrospective studies have shown a correlation between the excision of brain areas generating HFOs and postoperative seizure freedom.8–11 A fundamental challenge arises because HFOs with similar frequency ranges appear despite significantly different mechanisms: pathological HFOs and physiological HFOs.12 Pathological HFOs are conceptually associated with epileptogenesis, occurs exclusively in epileptogenic brain regions. Physiological HFOs are related to cognitive and sensorimotor function and occur in healthy brain regions.13 Indeed, it has been shown that sparing brain regions with physiological HFOs from resection can result in seizure freedom.14–17 A recent clinical trial investigating the effect of HFO-guided intraoperative resective surgery excluded patients with occipital lobe epilepsy from the study due to the concern of abundant physiological HFOs in the visual cortex.18 Hence, using HFOs as a spatial biomarker for directing epilepsy surgery requires developing a method to distinguish between pathological and physiological HFOs. The resection of brain regions generating pathological HFOs would be associated with favorable postoperative seizure outcomes. Studies have shown pathological HFOs generated by abnormal synchronous burst firing are morphologically distinct from physiological HFOs generated by inhibitory synchronous postsynaptic potentials when recorded with research microelectrodes.12,13 However, typical iEEG signals are recorded using clinical macro electrodes, and algorithms to characterize such morphological differences based on conventional signal processing have proven insufficient.19,20
Supervised machine learning approaches, especially deep learning (DL), have proven effective when human annotators accurately label large-scale datasets. DL models can learn complex patterns representative of labeled categories and then perform automated classification on unseen signals and images, leading to significant advances in medical imaging applications.21,22 If such frameworks are used to classify pathological HFOs, certified experts would be required to label large repositories of HFOs from diverse groups of patients accurately and consistently. However, accurately annotating pathological HFOs is difficult because there is no consensus on what constitutes pathological HFOs. Our recent studies have shown that instead of expert labels, clinical evidence from a channel’s resection status, postoperative seizure outcomes, or functional brain mapping results can be used to design a weakly supervised framework that yields relatively accurate labels for a small cohort size, as verified by retrospective postoperative seizure outcome prediction.23,24 Since these labels are still noisy—not all identified HFOs in resected areas or the SOZ are necessarily pathological—extending such a weakly supervised approach to ensure surgical outcomes for unseen patients would require a much larger patient cohort with postoperative outcomes. However, this approach faces challenges: first, an increasingly significant percentage of patients are being recorded through stereotactic EEG (SEEG) for diagnostic purposes or neuromodulation, and only a small percentage of such patients go through the brain resection; thus, data from patients undergoing diagnostic SEEG cannot be used in the aforementioned weakly supervised method, limiting the availability of the training dataset; second, while such SEEG patients still need accurate predictions of pathological HFOs to delineate the EZ, they would no longer be represented in the training set, thereby potentially limiting the applicability of the weakly supervised model to these patients.
One effective way to overcome such difficulties in using a supervised learning approach is to investigate the original observation directly: since pathological and physiological HFOs indeed differ in their biological mechanisms, leading to distinct morphologies, a generative AI model for the HFOs, trained in a self-supervised manner, should be able to simulate these underlying processes if a sufficiently large set of HFOs is provided. In particular, one should be able to discover the underlying discriminative morphological patterns without any labels. Autoencoders, such as variational autoencoders (VAEs),25 is a DL framework based on a self-supervised training paradigm, which can provide such a capability and has been proven to be successful in many natural language processing,26 computer vision tasks,27 medical data, including EEG analysis.28,29 VAEs are generative models that learn efficient low-dimensional representations (latent encodings) of unlabeled data by requiring that the original high-dimensional data be reconstructed from their latent codes. It is particularly effective in discovering clusters in the latent space, where each cluster corresponds to a different mechanism that led to the generation of data in it. Therefore, clusters automatically produced in the latent space of a VAE can potentially capture different underlying generative mechanisms. The HFO morphologies observed within these different clusters would have unique characteristics and can be interpreted through direct visualization and interpolations in the latent space.
In this study, we utilized a large multi-institutional cohort of 185 patients with epilepsy who underwent iEEG monitoring with grid or SEEG electrodes, providing comprehensive coverage of both deep and superficial brain regions. We developed a VAE framework that analyzes 686,410 HFOs to characterize three different classes of HFOs: 1) morphologically defined putative pathological HFOs (mpHFOs), 2) morphologically defined putative non-pathological HFOs (non-mpHFOs), and 3) morphologically defined artifacts (mArtifacts) of extra-cerebral origin. We leveraged the interpretability of VAEs to explore various neurophysiological characteristics of HFOs, such as (i) characterization of cross-frequency power distributions of HFOs, (ii) exploring the dependence of HFO morphologies on variables, including the origin of iEEG datasets and patient-specific variables such as age, sex, and epilepsy etiology based on histopathology (referred to pathology in the rest of the manuscript), and (iii) investigating HFO morphologies based on anatomical locations of the brain.
Our results showed that mpHFOs, VAE-discovered putative pathological HFOs, generally originated within the SOZ and were associated with spikes. We also demonstrated that a latent space discovered by a VAE model exhibited refined characteristics of pathological HFOs within time-frequency plots. Finally, we showed that the proportion of the removal of mpHFOs outperformed the resection status of the SOZ in predicting postoperative seizure freedom with cross-validation. Our results demonstrate that deep generative models can characterize neurophysiological signatures of pathological HFOs. From a short duration of interictal iEEG data, one can potentially infer the EZ by differentiating pathological from physiological HFOs, thereby enhancing the use of these biomarkers in guiding epilepsy surgery.
RESULTS
Building an explainable self-supervised DL algorithm for HFO analysis
We studied 185 patients (91 females) from two centers who met the eligibility criteria (Table 1). The median age at surgery was 13 years (range: 2–44 years). A total of 18,265 artifact-free electrode sites (median: 106 per patient; range: 29–152) within 34 regions of interest (ROIs) were available for analysis (Figure 1a). There were 1,670 electrode sites sampled within the SOZ (mean: 9.42 per patient; frontal: 437; temporal: 464; parietal: 393; occipital: 135; limbic: 241), and 7,732 sites sampled within non-epileptogenic brain regions, defined as spared brain regions in patients with postoperative seizure freedom (frontal: 2943; temporal: 1806; parietal: 1853; occipital: 635; limbic: 495) (Table 2). The median duration of analyzed EEG data for the UCLA grid/strip dataset was 91.5 min [IQR: 90.6-94.8 min], and for the UCLA SEEG, it was 90.3 minutes [IQR: 87.1-96.8 min]. The median analyzed EEG recording duration for the Detroit grid/strip dataset was 5.3 minutes [IQR: 5.1-5.7 min]. In total, 686,410 putative HFOs were detected from all the datasets by the PyHFO, an automated HFO detection platform, using both STE and MNI detectors.30 The median rate of HFOs (number of detections/min) within SOZ contacts was 3.11 (range: 0.73-7.17), and the median rate of HFOs within the non-epileptogenic contacts was 1.08 (range: 0.63-2.15) across the ROIs. All the detected HFOs were used for the unsupervised VAE model training (Figure 1b). Using the trained VAE model, such HFOs were automatically classified into artifacts (mArtifacts) and non-artifactual HFOs. A cluster for mArtifacts was determined based on detected HFOs’ high reconstruction loss due to the high variability of their morphologies (Figure 1c). All the non-artifactual HFOs were further classified into two clusters—the cluster with a higher resection percentage in seizure-free patients after resection was deemed pathological. HFOs originating from such a pathological cluster were defined as mpHFOs, while those from another cluster were defined as non-mpHFOs. All the detected HFOs were also classified into artifacts, HFOs with spikes (spkHFOs), and HFOs without spikes (non-spkHFOs) using our previous classification algorithm23 (Supplementary Table 1) for the subsequent correlational analysis, including interpretability analysis (Figure 1d). The detailed workflow of the study is described in Supplementary Figures 1-3. There were 163 patients who underwent resective surgery, and 110 patients (67.5 %) achieved seizure freedom. Of the patients who had resection, pathology results were as follows: focal cortical dysplasia (FCD) (41.1%), hippocampal sclerosis (HS) (6.7%), tumor (19.0%), and others (33.1%).
Characterization of pathological HFOs based on the self-supervised VAE algorithm
With five-fold subjective-wise cross-validation, the VAE classification was aligned with conventional knowledge that spkHFOs are more likely pathological: 93.56% of mArtifacts were predicted as artifactual HFOs, 80.1% of the mpHFOs were predicted as spkHFOs and 84.87 % of the non-mpHFOs were predicted as non-spkHFOs (p-value < 0.001). Furthermore, we visualized these findings by projecting the latent codes of each event onto a two-dimensional plane using t-distributed Stochastic Neighbor Embedding (t-SNE). (Figure 2a, b). Such clustering results were consistent throughout the five folds (Supplementary Figure 4). Moreover, the mpHFO rate (count/channel/min) was significantly higher than the non-mpHFO rate (count/channel/min) in the SOZ channels across three datasets (p-value < 0.01 for UCLA grid/strip, Detroit grid/strip, and UCLA SEEG dataset) and did not exhibit significant difference within the non-soz channels (p-value = 0.1 for Detroit grid/strip dataset, p-value = 0.49 for UCLA SEEG dataset, and p-value = 0.38 for UCLA grid/strip dataset) (Figure 2c). We then investigated the neurophysiological characteristics of mpHFO and non-mpHFO directly from the data. The morphological analysis of the time-frequency plot demonstrated that mpHFOs had higher amplitude values throughout the HFO band (≥ 80 Hz), around the center point (0ℒms, where HFOs were detected) than non-mpHFOs (Figure 2d, e). Furthermore, there were statistically higher values of mpHFOs at the sub-HFO band (10-80 Hz) throughout the time window compared to non-mpHFOs. These bands together lead to a “hanging bell” template in the time-frequency plot (Figure 2e). Such a template showed consistency regardless of the variables, including sex, the origin of the dataset, pathology, and age categories (Supplementary Figure 5). mpHFOs also showed distinct characteristics in the frequency domain with FFT analysis. The mpHFOs demonstrated relatively high amplitude within the sub-HFO band (10-80 Hz), while non-mpHFOs, likely representing physiological HFOs, exhibited discrete peaks with the HFO range, maximal at 87.7 Hz (Figure 2f, g). To understand the sub-HFO band component of the “hanging bell” template in mpHFOs, we filtered mpHFOs with a 10-80 Hz bandpass to evaluate the corresponding time-series data. We found a spike-appearing EEG signal after this procedure (Figure 2h-j) but none in non-mpHFOs (Figure 2k-m).
HFO morphology analysis based on dataset origin, sex, age, and pathology
We investigated whether HFO morphology was affected by variables including recording sites/type (UCLA grid/strip, UCLA SEEG, Detroit grid/strip), sex (female vs. male), age (0-5, 6-10, 11-15, 16-20, and 21+), and pathology (HS, FCD, Tumor, others). We first visualized the distribution of the latent space by projecting latent codes of HFOs into 2D space using t-SNE and color-coded with different subcategories within that specific demographic variable (Figure 3a, c, e, g). Three hundred projected codes were randomly sampled from each subject to achieve easier visualization. Then, the accuracy of a logistic regression model trained from actual data was compared with a model trained with label-permuted data. HFO morphology based on the dataset sources could not be differentiated by the classifier (p-value = 0.13, Figure 3b). Similarly, HFO morphology, based on demographic information such as sex, age group, and pathology, could not be differentiated (Sex: p-value = 0.44; Age Group: p-value = 0.10; Pathology: p-value = 0.83) (Figure 3d, f, h).
HFO morphology analysis based on ROIs (anatomical location)
We then investigated whether HFOs generated from different anatomic locations exhibit distinguishable morphologies (frontal, temporal, parietal, occipital, and limbic). Initially, we generated a plot of the latent space for HFOs derived from the preserved regions in patients who remained seizure-free after resection, presumably physiological HFOs. We found a cluster of HFOs corresponding to the occipital region, which were distinguishable from other areas (Figure 4a, b). The confusion matrix showed that such occipital HFOs could be successfully distinguished by the classifier, which achieved an average of 62% accuracy (Figure 4c). To further characterize such distinguishable features, we plotted the summation of the time-frequency plot of each HFO event in different anatomical locations (Figure 4d). The HFOs derived from the occipital area showed distinct peak frequency and power ratio compared to HFOs generated from other anatomical areas (Supplementary Figure 6). Nevertheless, HFOs generated from the SOZ did not demonstrate distinctive morphological differences across the anatomical locations using the same method (Figure 4e-h).
Disentanglement of the latent space to establish neurophysiological characteristics of pathological HFOs
We first plotted each dimension of the latent space and colored them with mpHFO and non-mpHFO for a specific fold (Supplementary Figure 2: Latent dimension interpretation); the mpHFO and non-mpHFO exhibited different distributions in some of the dimensions but did not in the others (Supplementary Figure 7). We then used the decoder of the VAE to understand the meaning of each latent dimension by perturbing each dimension in the latent space and visualizing the reconstructed image. Specifically, in each fold, we first took the mean of the cluster of the mpHFO and non-mpHFO (Supplementary Figure 2: Generation of representative mpHFO and non-mpHFO morphologies based on respective cluster centroids) and then perturbed a single dimension from the smallest value (1 percentile) of that dimension to the largest value of that dimension (99 percentile) (Supplementary Figure 2: Latent dimension interpretation). We found a dimension that demonstrates the separation of mpHFO and non-mpHFO. The morphological transition of the generated images showed more power within the “hanging bell” template region for both mpHFO and non-mpHFO when we traversed the value of that dimension (Figure 5a, b). At a population level, we randomly sampled actual latent code from both mpHFO and non-mpHFO and traversed that specific dimension of each latent code from the smallest (1% tile) to the largest (99% tile); the decoded images showed an increase of the power within the template region (Figure 5c). We fed the decoded time-frequency plots through our inference pipeline and plotted the distribution of the model confidence (Figure 5d), demonstrating the increasing model confidence toward mpHFO as we increased the value of that dimension in the latent space. Furthermore, we also captured dimensions that, while not separating mpHFOs and non-mpHFOs, still manifested interpretable neurophysiological properties. Within the same fold, a dimension represented the HFO’s peak frequency (Figure 5e, f), and subsequent population-level analysis indicated that an ascent in this dimension inversely correlated with the peak frequency in the time-frequency plots (Figure 5g) while the model confidence value did not change towards either the mpHFO and non-mpHFO (Figure 5h). Similarly, the slow-wave dimension (10-20Hz) was revealed (Figure 5i, j), where an increase in dimension value directly paralleled an increase in power within the slow-wave band in the decoded images (Figure 5k), but an increase in power within that dimension did not affect the model confidence towards either the mpHFO and non-mpHFO (Figure 5l).
Prediction of postoperative seizure outcomes using the classified HFOs
Based on the hypothesis that mpHFOs are pathological and their resection status can forecast postoperative seizure freedom, we investigated two prediction models. We first applied a logistic regression model with five-fold cross-validation. Notably, utilizing the resection ratio of the mpHFOs (AUC = 0.63) demonstrated superior separation capability compared to using the resection ratio of unclassified HFOs (AUC = 0.53) and traditional expert-driven classification, spkHFO (AUC = 0.61) (Figure 6a). Further, a multivariable logistic regression model was fitted, incorporating subjects’ demographic data as the baseline (age as a continuous variable and sex as a categorical variable) and the status of SOZ resection during surgery. The mpHFO resection ratio again showed enhanced classification performance over the spkHFO (AUC = 0.72 vs. 0.70) (Figure 6b). Additionally, we explored the feasibility of constructing a non-linear predictive model using a random forest classifier (Figure 6c). The random forest aimed to forecast post-surgical seizure freedom in the test set based on subject-specific features identified in the training set through a subject-wise five-fold cross-validation, ensuring that our classifier was evaluated under consistent and rigorous conditions (Supplementary Figure 3). Our findings revealed that the random forest model, trained exclusively with the resection ratio of mpHFOs, exhibited better (p-value < 0.01, for both unclassified HFOs and spkHFO) predictive performance (F1 = 0.74) compared to using unclassified HFOs (F1 = 0.69) and spkHFO (F1 = 0.66). The model’s predictive power improved when incorporating more features, such as subjects’ demographic information and the SOZ resection status. Specifically, the combination of demographic data and mpHFO resection ratio (F1 = 0.77) outperformed (p-value < 0.01) traditional clinical predictions based on demographic information and SOZ resection status (F1 = 0.73). Furthermore, a comprehensive model including all features, including demographic data, SOZ resection, and mpHFO resection ratio (F1 = 0.81) demonstrated superior (p-value < 0.01) predictive power over the traditional expert-driven HFO classification including demographic data, SOZ resection, and spkHFO resection ratio (F1 = 0.80) (Detailed numbers are listed in Supplementary Table 2). We also conducted an ablation study on predicting surgical outcomes by only using subjects with a higher number of HFOs (Supplementary Figure 10): the mean F1 score remains consistent, demonstrating the robustness of our self-supervised framework.
DISCUSSION
Leveraging the innovative self-supervised deep learning approach, we aimed to characterize pathological HFOs using a cohort of 185 epilepsy patients who underwent intracranial monitoring with grid or stereotactic EEG electrodes across 18,265 brain contacts from 34 ROIs. We analyzed 686,410 HFOs obtained via fully automated detectors. The novel VAE method and extensive data allowed us to control for variables, including recording sites/types, sex, age, pathology, and anatomical location. By analyzing the latent space of VAEs, we identified morphological features of pathological HFOs, notably high signal intensity within the HFO band at detection, extending across the sub-HFO band (10-80 Hz). These features remained consistent across all studied variables. Incorporating the resection status of the identified pathological HFOs into prediction models markedly enhanced the accuracy of postoperative seizure freedom forecasts, outperforming traditional approaches that utilize the resection status of the SOZ.
Until now, the field of HFO research has encountered significant challenges. HFOs can be generated in healthy brain regions, adding complexity to their interpretation. Despite promising results from numerous retrospective studies,8–10 a clinical trial failed to demonstrate the utility of HFOs in improving postoperative seizure outcomes.18 The major challenge in HFO research is distinguishing between pathological HFOs that need to be targeted for resection and physiological HFOs that can be spared during resection. Attempts to differentiate these include identifying HFOs associated with spike-wave discharges,31–33 which are generally considered indicative of pathology. However, there is no uniform method to define HFOs with spikes, and experts often exhibit inconsistencies and poor inter-rater reliability when using visual annotations.34 Moreover, simple analyses of HFO features such as frequency, amplitude, and duration have been ineffective in differentiating pathological from physiological HFOs.19,20 While fast ripples (250–500 Hz) may more accurately delineate epileptogenic zones than ripples (80–250 Hz), they are detected less frequently.35,36 Strategies like adjusting the detection rate of HFOs to match region-specific normative values can help estimate the degree of pathology in the corresponding brain regions,37,38 it cannot reliably classify each HFO event as pathological or physiological. This uncertainty complicates clinical decision-making, especially when treatment decisions hinge on a limited number of detected HFOs.
In this study, we adopted a fundamentally different approach. Instead of pre-selecting potential pathological HFO features, we implemented a data-driven, self-supervised strategy using VAE to define pathological HFOs. In the field of HFOs, prior studies have explored various types of neural networks, including convolutional neuronal networks,39,40 long short-term memory,41,42, and transformer43, to classify HFOs into pathological and physiological based on human-annotated labels. This reliance limits scalability on large datasets and is constrained by the subjectivity inherent in expert labeling. Our method is contingent on the premise that pathological HFOs possess distinct morphological characteristics that are separable from their physiological counterparts. This hypothesis is biologically plausible, given that HFOs with similar frequency ranges emerge via markedly different generative mechanisms: Pathological HFOs arise from abnormal synchronous burst firing, whereas physiological HFOs result from inhibitory synchronous postsynaptic potentials.12,13 These morphological differences, potentially too subtle for detection with traditional signal processing methods using clinical macroelectrodes, may be effectively discerned through our innovative approach. The VAE is a DL framework capable of efficiently learning low-dimensional representations of unlabeled data.25,44 Without providing any labels associated with the input, an unsupervised classification can be achieved by reconstructing the original high-dimensional data from their latent codes. It excels at uncovering clusters in the latent space, with each cluster representing a distinct mechanism underlying the data generation. Consequently, clusters automatically formed in the latent space of a VAE have the potential to capture various morphological features and classify artifacts and HFOs with or without spikes, as demonstrated in our study. The diverse HFO morphologies generated by these clusters possessed distinct traits, allowing for interpretation through direct visualization and interpolations in the latent space, which enabled us to interpret what the algorithm has learned. The algorithm provided us with intuitive and interpretable latent spaces, such as a pathological template domain (the “hanging bell” template on a time-frequency plot), peak frequency domain, and slow-wave domain.
Our results are compelling, as the morphologically defined pathological HFOs identified by our deep generative model exhibited features commonly associated with expert-acknowledged pathological HFOs, such as spikes. While “spike” annotations can be performed by human experts or computationally, there is no standardized method for determining what constitutes pathological spikes. Our deep generative model objectively identified distinctive features of the hanging bell template within the time-frequency plot. Specifically, pathological HFOs were associated with higher signal intensity within the sub-HFO band, peaking at 23 Hz. Such sub-HFO band EEG signals seemed to emerge as spikes in time series data. Thus, rather than arbitrarily define spikes to analyze HFOs with spikes, this data-driven approach to characterize the “spike” component associated with pathological HFOs seemed more unbiased. Another evidence of the pathological nature of mpHFOs was that they were primarily localized around the SOZ, although the model was trained without a SOZ label.
Our findings provide clinically significant insights; the resection ratio of mpHFOs outperformed traditional unclassified HFOs and spkHFOs in predicting postoperative seizure freedom. This supports the idea that mpHFOs are more indicative of pathological HFOs than unclassified HFOs and spkHFOs. The potential clinical utility is evident as the resection ratio of mpHFOs exceeded the current clinical standards regarding the removal status of SOZ in predicting postoperative outcomes. The mpHFOs can be analyzed from short interictal EEG data, offering the potential to reduce the duration of EEG monitoring, hospital stays, and associated costs for the patient. Furthermore, combining the resection status of the SOZ with the proportion of mpHFO resection further enhanced the prediction, suggesting an additive effect when mpHFOs are combined with the current clinical standard.
Another notable discovery was that non-pathological HFOs originating from the occipital lobe displayed distinct morphological features compared to HFOs from other brain regions. Physiological HFOs were reportedly abundant in the occipital lobe15,16,45 and also showed distinct coupling with slow waves.46 This study added essential findings in the literature to establish the unique morphology of HFOs originating in the occipital lobe. These findings could potentially overcome the limitation faced by the HFO trial, which necessitated the exclusion of subjects with occipital lobe epilepsy due to the likelihood of abundant physiological HFOs in such patients.18
There are several factors to consider when interpreting these results. The study was conducted using only macroelectrode recordings, and while we hypothesized that the different biological mechanisms of pathological and physiological HFOs would manifest differently in these recordings, we could not verify the actual neurophysiological mechanisms at the single-neuron level. Although our sample included 185 patients, only 18 were studied using SEEG, limiting our ability to sample from deeper brain areas. With more balanced coverage of both superficial and deep areas, network analysis to account for HFO propagation may provide better accuracy in the prediction of postoperative surgical outcomes.47 Since we aimed to investigate the HFOs’ morphology as a function of age, we primarily included the pediatric population in the study. Expanding the inclusion of the adult population will be needed to generalize our findings. Additionally, the majority of our EEG data comprised short recordings, typically around five minutes from the initial night during sleep. Also, the sampling frequency of the Detroit dataset was limited up to 1,000 Hz, limiting the analysis of the fast ripple band (250-500 Hz). Although our results suggest that the peak frequency of HFOs did not affect pathological classification, further investigation of the fast ripple band will be needed. Longer, multi-day recordings might reveal HFOs with potentially varying morphologies over time.48 The vigilance state should also be considered,49 as morphological differences between various sleep stages and wakefulness remain under-investigated. Finally, our sample size may not be sufficient to fully characterize the subtle differences in HFO morphology associated with different types of epilepsy pathologies. Conditions like focal cortical dysplasia (FCD) and tumors have distinct subtypes whose morphological differences might become clearer with a larger number of cases.
In the foreseeable future, we plan to collaborate with additional institutions to test the generalizability of our approach. Analyzing publicly available datasets will also be considered. Analyzing multi-day recordings from different sleep stages presents challenges, yet such studies could be feasible with our fully automated algorithm. Once we establish definitions for pathological HFOs across different anatomical regions, we can potentially integrate this data with other types of information, such as neuroimaging studies, to guide surgical resections. We aim to test the utility of this approach prospectively and may eventually consider a clinical trial to verify whether HFO-guided resections truly improve postoperative seizure outcomes.
METHODS
Patient cohort
This was a multi-institutional retrospective cohort study. The inclusion criteria consisted of [a] simultaneous video-iEEG recording for epilepsy surgery between August 2016 and December 2023 at UCLA Mattel Children’s Hospital or between January 2007 and May 2018 at Children’s Hospital of Michigan, Detroit, [b] iEEG sampling rate of at least 1,000Hz, [c] iEEG contained at least an artifact-free 20 min slow-wave sleep epoch at least two hours apart from clinical seizure events, and [d] known postoperative seizure outcomes over one year. The exclusion criteria included [a] undergoing hemispherectomy or hemispherectomy, and [b] the presence of massive brain malformations (such as megalencephaly and perisylvian polymicrogyria) or previous surgeries that make it difficult to identify brain anatomy during the iEEG study. The institutional review board at UCLA and Wayne State University have approved the protocol. We obtained written informed consent from patients or the guardians of pediatric patients.
Patient evaluation
All patients with medically refractory epilepsy referred during the study period underwent a standardized presurgical evaluation, which—at a minimum—consisted of inpatient video-EEG monitoring, high resolution (3.0 T) brain magnetic resonance imaging (MRI), and 18 fluoro-deoxyglucose positron emission tomography (FDG-PET), with MRI-PET co-registration. The margins and extent of resections were determined mainly based on the SOZ, clinically defined as regions initially exhibiting sustained rhythmic waveforms at the onset of habitual seizures. In some cases, the seizure onset zones were incompletely resected to prevent an unacceptable neurological deficit.
iEEG recording
Macroelectrodes, including platinum grid electrodes (10 mm intercontact distance) and depth electrodes (platinum, 5 mm intercontact distance), were surgically implanted. The placement of intracranial electrodes was guided by the results of scalp video-EEG recording and neuroimaging studies. All electrode plates were stitched to adjacent plates, the edge of the dura mater, or both to minimize movement of subdural electrodes after placement. Regarding the SEEG placement, BrainLab Elements software was used for planning the electrodes to the intended targets using T1-weighted sequences, and the trajectories were guided by a gadolinium-enhanced T1-weighted MRI. Both institutions obtained iEEG recordings using Nihon Kohden Systems (Irvine, California, USA). The sampling frequency was set at 1,000 Hz in Detroit and at 2,000 Hz in UCLA upon acquisition.
Acquisition of three-dimensional (3D) brain surface images
We obtained preoperative high-resolution 3D magnetization-prepared rapid acquisition with gradient echo (MPRAGE) T1-weighted image of the entire head. Using the FreeSurfer scripts, we created the averaged surface image to which all electrode locations were spatially normalized.23,38 In cases where the software failed to detect the pial surface accurately due to insufficient cerebral myelination, we manually delineated the pial surface using the Control Point function. For patients younger than two, we used the Infant FreeSurfer software package to reconstruct the surface image.46 The averaged surface image functioned as the template for the analysis of anatomical location.
Anatomical labeling and determination of ROIs
For the dataset from UCLA, each implanted contact was labeled visually according to the Desikan-Killiany-Tourville atlas.50 The location of electrodes was directly defined within a Freesurfer-based 3D surface image using post-implant computed tomography (CT) images using Brainstorm software.23 For the dataset from Detroit, all implanted subdural contacts were coregistered with 3D surface images within the FreeSurfer with an FSaverage vertex label.38 We defined 34 ROIs for further analysis (Table 2). For the data harmonization between the two institutions, the FSaverage vertex of Detroit datasets was converted to MNI coordinates.51 Finally, these data were combined with UCLA patients, which were projected to the MNI normalized space under Brainstorm for the co-registration image (Figure 1a).
iEEG data pre-processing
We used a customized common average reference for the grid23 and a bipolar montage for SEEG data. The EEG was first resampled to be the same sampling frequency of 1,000 Hz, and a band-reject filter was to reject the 60 Hz and its harmonics with a stopband of 2 Hz. iEEG channels not recording from grey matter (e.g., located outside of the brain) or otherwise deemed ‘bad’ (e.g., excessively noisy or artifactual) by the clinicians were discarded from the analysis. All EEG data pre-processing and analysis were performed using Python 3.9.1 (Python Software Foundation, Wilmington, DE).
Automated HFO detection
HFOs were detected by the PyHFO platform using both STE and MNI detectors to enhance the sensitivity of detection.30 For STE and MNI detectors, the detection parameters were set to default (Supplementary Table 3), except that the frequency band for filtering the signal was set to be 80-300 Hz to accommodate a minimum 1,000 Hz sampling frequency.
Overall training and inference method
The overall study flow is outlined in Figure 1 and Supplementary Figure 1-3.
Morphology-based pathological HFO classification by deep generative model
Subject-wise k-fold Cross-validation
To thoroughly test our method, we use a subject-wise five-fold cross-validation. In this setup, for each fold, we set aside 20% of the subjects as a test set (controlling sampling from UCLA grid/strip, UCLA SEEG, and Detroit grid/strip, ensuring that over five folds, every subject is tested once. All remaining data became the training set. Within the training set, we then randomly sampled ten subjects uniformly from three datasets for the validation set.
Feature Representation
Each event detected by the automatic detector was represented by a time-frequency plot (Morlet wavelet transform), as the time-frequency plot is known for representing spatial-temporal information for each HFO event. The time-frequency plot spans ± 285 ms (centered in the middle of the HFO event) in time and 10-290 Hz in frequency. This plot was resized to a 64 x 64 resolution, and the value was normalized within the range of 0 to 1 before being sent to the VAE encoder.
VAE architecture
VAE emerges as a prominent deep learning mode. Distinct from traditional autoencoders, VAE introduces a probabilistic approach; the encoder encodes inputs as distributions. During the decoding process, the model stochastically samples from these distributions, enabling it not only to accurately reproduce known data but also requires the decoder to generate new similar data. The loss of training the VAE, reconstruction loss, and variational regularization, the reconstruction loss measures the distance from the input to the output, and the variational regularization ensures the distribution generated from the encoder follows the normal distribution. We’ve chosen ResNet52 as the backbone for both the encoder and decoder, drawing inspiration from its widespread application in the community of computer vision, as evidenced by sources, as well as the successful application of using ResNet in capturing morphological information within the time-frequency plot.23,24 The designated size for the latent space was fixed at 8. As an ablation study, we set the latent dimension to 16 and demonstrated that it would produce more redundant latent space by visualization (Supplementary Figure 8).
Self-supervised VAE training
Our VAE underwent training over 80 epochs with the main goal of reconstructing the input image (Figure 1a). During the training, we used Adam optimizer with a learning rate of 3e-4 and batch size of 512. Furthermore, the time-frequency plot was augmented by randomly flipping at the time axis to improve the model’s generalization ability. To ensure an even representation of each subject during every epoch, we employed a stratified sampling method, capping the sample number at 2,500 per subject in each epoch of training. Moreover, we selected perceptual loss as our reconstruction loss criterion to capture the morphological discrepancies more effectively between input and output images. We adopted the beta-VAE44 in training to better disentangle the latent space; therefore, the final loss function loss = perceptual loss + beta*KL divergence, where beta = 0.1. The model iteration (epoch) exhibiting the minimum validation loss was the one selected for subsequent utilization in the unsupervised classifier assembly and inference.
Unsupervised discovery of HFO clusters
We designed a hierarchical (two-stage) Gaussian Mixture Model (GMM) based clustering pipeline to discover morphologically different classes of HFOs (Figure 1c). Once the VAE had been trained, we processed the entire training dataset through the VAE encoder, thereby extracting the associated latent codes (Figure 1b). After this extraction, at the first stage, we employed the GMM clustering algorithms, setting the cluster number, k, to 2. We adopted stratified sampling to balance the contribution of information from each subject. We capped the number of events from each patient at 10,000. The cluster with high reconstruction loss was defined as the cluster of artifacts of extra cerebral origin, as the artifact is diverse in morphology and the VAE cannot properly decode. Then, at the second stage, within the cluster that was classified as non-artificial events, we trained another GMM to cluster them further into two clusters. We capped the number of events from each patient at 2000. To allocate either a pathological (mpHFO) or physiological (non-mpHFO) label to each cluster, we employed a minimalistic use of clinical data. The cluster with a higher resection percentage in seizure-free patients after resection was deemed pathological. As part of an ablation study, we also explored different stratified sampling methods, which demonstrated similar performance in predicting surgical outcomes (Supplementary Figure 9).
HFO morphology inference pipeline
To predict the HFO events from a new subject, time-frequency plots were first computed. Then, these time-frequency plots were sent into the VAE encoder to extract the latent codes. The trained hierarchical GMMs predicted each latent code and assigned class labels: mpHFO, non-mpHFO, and mArtifact for each detected event.
mpHFO characterization
VAE-based HFO morphology inference pipeline
Time-frequency plot characteristics of pathological and physiological HFOs
We investigated whether the time-frequency scalogram of mpHFOs identified by the VAE exhibited a distinction from that of non-mpHFOs (Figure 2d, e). We used similar methodologies outlined in preceding studies.23,24 Specifically, we executed a t-test across every pixel in the time-frequency scalogram comparing the predicted mpHFOs and non-mpHFOs. The foundational null hypothesis is that scalogram values associated with mpHFOs are higher than those linked with non-mpHFOs. We binarized the p-value less than 0.05 (if the p-value is less than 0.05, the value is 1, and vice versa). The characteristics map displays mpHFOs’ morphological distinction within frequency and time domains.
Latent space interpretability analysis
Latent space 2D visualization
The encoder extracts latent vectors from the time-frequency plot of each event, and the latent code could contain essential morphological information that the decoder can reconstruct. To visually demonstrate the topology of the latent space trained from VAE better, we used t-SNE to project the latent space acquired from the VAE into 2D space. We used the TSNE accelerated by GPU (cuml package),53 the parameters of fitting the TSNE using default parameters as the released package except for the n_components as 2 (Supplementary Figure 2: Clinical interpretation of latent space clustering). We projected all of the latent code into the 2D space. For better visualization, we randomly sampled a maximum of 200 data points from each subject for visualization and color-coded each 2D point by different types of HFOs: mpHFO, non-mpHFO, mArtifact (Figure 2a); spkHFO, non-spkHFO, the artifact (Figure 2b); different subcategories in subjects demographic information included sex, recording sites/type, age, pathology (Figure 3) and different anatomical locations (Figure 4).
Statistical tests to evaluate the dependence of HFO morphology on the significance of different demographic subject-level information and anatomic locations of the HFO events variability
We evaluated if HFO morphology is dependent on demographic and anatomical factors. Suppose HFO morphology depended on a particular variable (for example, the subject’s sex, Figure 3c). In that case, the latent codes of the HFOs labeled with the corresponding subcategories (female or male) may be distinguishable. Consequently, a classifier designed to identify true labels from the latent codes of HFOs should achieve higher accuracy compared to a classifier trained with randomly assigned labels. The methodology can be illustrated using sex as an example. For each fold of the cross-validation, we randomly selected three subjects from each subcategory (male, n=3; female,n=3). We randomly sampled 100 HFO events from each subcategory (100 HFO events for male subjects and 100 HFO events for female subjects). This resulted in 200 (n_subcategories * 100) latent codes for training a logistic regression classifier. For computing accuracy, we repeated the sampling process for the left-out subjects for that fold. This resulted in one accuracy datapoint. We repeated this sampling, training, and testing process five times (trials) for each fold. Thus, we have 25 trails in total (5 folds * 5 trails) for a classifier to determine sex based on actual sex labels. Next, we describe the design of a surrogate classifier based on randomizing the labels. For each fold and each trial, the same latent codes sampled for the true-label case are used. However, the sex label is randomized. That is, we randomly shuffle the subjects and assign the label of male to the first three and female to the last three. Then, a surrogate logistic classifier is trained with the randomized sex labels. For testing, we again used the same data samples from the true-label case and randomized the sex assignment. This leads to another 25 accuracy samples for the surrogate classifier. A one-tailed t-test was employed to assess whether the true-label classifier’s accuracy was significantly better than that of the surrogate.
We performed two such tests, one for the dependence of the morphology of HFOs from preserved regions in seizure-free patients (presumed physiological HFOs) on anatomical locations. The second one was for the dependence of the morphology of the HFOs from SOZ regions (presumed pathological HFOs) on anatomical locations (Figure 4). Note that each channel in each subject was assigned a unique anatomical location. Given an HFO type, for each fold and each trial, the HFO events were sampled equally (n=100, in total 500) from each anatomical location across all patients. A multi-class logistic regression model was trained, and the confusion matrix was computed based on 500 samples acquired from the test subject under the same fold. Five such trials were done for each fold. Thus, for each HFO type, we obtained 25 confusion matrices, and the average of the confusion matrices and the accuracy were computed. Then, for the surrogate classifier, the randomization of labels of the 500 samples was done by shuffling samples on the channel level; for example, all samples from a particular channel that were initially labeled as “Frontal” would be reassigned to the same randomly selected label such as “Occipital”. This led to 25 surrogate accuracy samples. A one-tailed t-test was employed to assess whether the true-label classifier’s accuracy was significantly better than that of the surrogate.
Latent space disentanglement on latent dimension perturbation
By manipulating specific dimensions of the latent space and observing the changes in the reconstructed images, we can infer the significance of each dimension in relation to the data characteristics it influences (Figure 5, Supplementary Figure 2: Clinical interpretation of latent space interpolation). This approach not only can aid in the interpretability of complex models but also provide insights into the underlying data structure that the VAE has learned to encode. The latent code was a one-dimensional vector sampled from a distribution generated by the encoder using a time-frequency plot. The distribution of latent codes was plotted on each dimension and classification type (mpHFO and non-mpHFO) to identify the dimension that contributes the most to the decision boundary. To further understand the morphological characteristics contributed from each dimension, we manipulated each dimension of the latent code and then observed the resultant images generated from the decoder. The trend of the resultant images corresponding to the manipulation could reveal the morphological characteristic for that dimension. Specifically, for one latent code, we moved the value of a specific dimension to the 1st percentile and 99th percentile of all of the values within this dimension (we avoided extreme values to exclude outliers) while keeping the values in other dimensions unchanged. For visualization, we focused on the morphological impact on the two most representative latent codes, which were determined by averaging the latent vectors of all identified mpHFO and non-mpHFO events, respectively. To evaluate the effect of each dimension at a population level, we first randomly selected 500 high-confident mpHFO (model confidence > 0.99) and 500 high-confident non-mpHFO (model confidence < 0.01). We then manipulated the latent codes of 1000 randomly selected samples, irrespective of the event types (mpHFO/non-mpHFO). To quantitively evaluate the morphological change of resultant images, we designed various metrics based on simple image processing techniques on the time-frequency plot as a proxy to represent the morphological characteristic of the corresponding HFO events. To identify if an event was more pathological (determining pathological dimension), we calculated the ratio of power in the “hanging bell” template (Figure 2e) to the overall power within the time-frequency plot. To calculate the peak frequency of an event (determining the peak-frequency dimension), we averaged the information across -90 ms to 90 ms (as the center of the event to be 0 ms) of the time-frequency plot to construct a frequency-power vector and found the frequency that had the max power within 80 Hz to 290 Hz. To measure the slow-wave power of an event (determining the slow-wave dimension), we computed the power in the 10-20 Hz range on the time-frequency plot to the total power. These analyses were visualized using box plots. Furthermore, we used the inference pipeline (encoder and GMM) to classify each generated event and visualized the model confidence through the violin plot.
Clinical correlation: Predicting surgical outcomes
Once the VAE and GMM classifiers were trained on the training set, this pipeline was able to make a prediction on the test set. In the following analysis, we used all subjects who underwent resection. We evaluated the effectiveness of a discovered biomarker by predicting the surgical outcomes of the resected subjects. Specifically, features associated with each subject were extracted, and then a predictive model was trained using these subject-specific features to predict whether or not one subject would be seizure-free. In particular, we used the resection ratio (No. resected events / No. overall detected events), which was the same as in previous studies,23 and subjects’ demographic features such as sex and age. Furthermore, we also considered the resection status of the SOZ (whether the SOZ assigned by clinicians was removed or not), which is a current clinical standard to guide epilepsy surgery. We applied two approaches to validate and predict the surgical outcomes. First, we evaluated the separation of post-surgical seizure-free subjects and post-surgical non-seizure-free subjects in the feature space and also evaluated the balance of the trained unsupervised models in the five-fold cross-validation, i.e., if a model trained in one fold is too conservative or aggressive on predicting the biomarkers. Therefore, we trained a single/multivariable logistic regression model to predict post-surgical outcomes on features generated from all subjects in all test sets among the five-fold cross-validation and compared the AUC of the logistic regression. Second, we evaluated the forward predictive power of our discovered biomarker. In each fold of five-fold cross-validation, we trained a random forest model from the training set and validation set, using subject-associated features as input of random forest and surgical status as labels. We used the trained random forest to predict all subjects underwent resection in the test set. Then, one random forest model was trained for each fold, and a set of performance metrics (accuracy, precision, recall, and F1 score) were reported; we reported the mean and SEM of the F1 score based on the unbalanced dataset.
Statistical analysis
The above-mentioned statistical calculations were carried out using Python (version 3.9.1; Python Software Foundation, USA). The deep neural network was developed using PyTorch (version 2.1.0; Facebook’s AI Research lab). Quantitative measures are described by medians with interquartile or mean with standard deviations. Comparisons between groups were performed using chi-square for comparing two distributions and Student’s t-test for quantitative measures (in means with standard deviations). All comparisons were two-sided, and significant results were considered at p < 0.05 unless stated otherwise. Specific statistical tests performed for each experiment were described in each section. Machine learning model performance was evaluated using accuracy ([TP + TN]/[TP+TN+FP+FN]), recall (TP/[TP+FN]), precision (TP/[TP+FP]), and F-1 score (2/[1/recall + 1/precision]).
Data sharing and availability of the methods
All data produced in the present study are available upon reasonable request to the corresponding authors. Anonymized EEG data and metadata, including labels (channel’s resection status, SOZ, patients’ demographics, seizure outcomes, and pathology) used in this study will be available on the OpenNEURO website (https://openneuro.org) in the near future. The Python-based code used in this study is available at (https://github.com/roychowdhuryresearch/HFO-VAE). One can train and test the deep learning algorithm from their data and confirm our methods’ validity and utility.
FUNDING
The authors have no conflict of interest to disclose. HN is supported by the National Institute of Neurological Disorders and Stroke (NINDS) (K23NS128318), the Sudha Neelakantan & Venky Harinarayan Charitable Fund, the Elsie and Isaac Fogelman Endowment, and the UCLA Children’s Discovery and Innovation Institute (CDI) Junior Faculty Career Development Grant (#CDI-SEED-010124). AD is supported by the Uehara Memorial Foundation and the SENSHIN Medical Research Foundation. EA is supported by the National Institute of Health (R01NS064033). RS serves on scientific advisory boards and speakers bureaus and has received honoraria and funding for travel from Eisai, Greenwich Biosciences, UCB Pharma, Sunovion, Supernus, Lundbeck Pharma, Liva Nova, and West Therapeutics (advisory only); receives royalties from the publication of Pellock’s Pediatric Neurology (Demos Publishing, 2016) and Epilepsy: Mechanisms, Models, and Translational Perspectives (CRC Press, 2011). RS is also supported by the Sudha Neelakantan & Venky Harinarayan Charitable Fund. RJS is supported by the NINDS R01NS106957, R01NS033310, and R01NS127524. JEJ is supported by NINDS U54NS100064 and R01NS033310. RJS and JEJ are supported by the Christina Louise George Trust. The research described was also supported by NIH/National Center for Advancing Translational Science (NCATS) UCLA CTSI Grant Number UL1TR001881.
COMPETING INTERESTS
The authors have no conflict of interest to disclose.
Abbreviations
- HFOs
- high-frequency oscillations
- EEG
- electroencephalogram
- mpHFOs
- morphologically defined pathological HFOs
- SOZ
- seizure onset zone
- EZ
- epileptogenic zone
- iEEG
- intracranial EEG
- SEEG
- stereotactic EEG
- DL
- deep learning
- VAE
- variational autoencoder
- mArtifacts
- Artifacts defined based on morphological analysis using the VAE model
- spkHFOs
- HFOs with spikes
- non-spkHFOs
- HFOs without spikes
- HS
- hippocampal sclerosis
- FCD
- focal cortical dysplasia
- t-SNE
- t-distributed stochastic neighbor embedding
- ROIs
- regions of interest
- AUC
- area under the curve
- CT
- computed tomography
- MRI
- magnetic resonance imaging
- RMS
- root mean square
- SD
- standard deviation
- STE
- short-term energy
Data Availability
All data produced in the present study are available upon reasonable request to the corresponding authors.