Abstract
The potential of decoding handwriting trajectories from brain signals for brain-to-text communication has yet to be fully explored. Here, we developed a novel brain-computer interface (BCI) paradigm that fits the trajectories of imagined handwriting movements from intracortical motor neural activity and translates them into text using a machine learning approach. The handwriting trajectories of digits and multi-stroke characters were decoded using a diverse array of neural signals, achieving an average correlation coefficient of 0.75. We developed a handwriting recognition algorithm based on a speed profile identifier, which achieved a recognition rate of around 80% within an extensive database of 1000 characters. Additionally, our research uncovered a notable distinction in neuronal directional tuning between writing strokes and cohesions (air connections between strokes), which a dual-model approach could exploit to enhance performance by up to 11.7%. Collectively, these findings demonstrate a new approach for BCIs that could possibly implement a universal brain-to-text communication system for any written language.
Teaser
Handwriting trajectories were successfully decoded from brain signals for direct brain-to-text translation of any written language.
Introduction
Over the past two decades, intracortical brain-computer interfaces (BCIs) have emerged as revolutionary tools that enable direct communication between the human brain and external devices (1–5). Initially conceptualized for assisting individuals with severe motor impairments, BCIs have since expanded into various applications, ranging from restoring speech (6, 7) to restoring walking (8). By translating motor-related neural signals into actionable commands, either by classification or trajectory fitting, BCIs have opened new avenues for individuals to interact with their environment, offering a means to overcome physical limitations and engage with technology in unprecedented ways (9).
The introduction of the handwriting paradigm into BCIs represents a significant leap forward in the field, allowing users to convert imagined handwriting movements, a more natural mode of expression, into textual output. A seminal work in this domain is that of Willett et al. (10), which demonstrated the feasibility of translating neural activity into English letters and then into text. A recurrent neural network was trained to convert the neural activity into letter probabilities, which were then thresholded to emit discrete characters for real-time decoding. Remarkably, the participant achieved typing speeds of 90 characters per minute with an impressive 94.1% raw classification accuracy within a 30-character scope. The system demonstrated the potential of BCIs to facilitate complex, intuitive interactions that closely resemble the fluidity and nuance of human writing. Building on this progress, more recent studies have explored the classification of handwritten characters using neural activity recorded from scalp-based electrodes (11, 12).
Despite these advancements, the current state of handwriting BCIs presents several challenges and limitations. A principal concern is that the classification-based decoding scheme utilized in previous studies is tailored for Latin-based languages, which require the discrimination of only a few dozen letters to construct text. In contrast, non-Latin languages, such as Chinese, demand the classification of thousands of distinct characters, a task that is currently beyond the scope of neural signal-based classification for BCIs. Moreover, while neuroimaging and lesion studies have successfully identified the brain regions associated with handwriting (13, 14), the underlying neural mechanisms of the handwriting process remain poorly understood. This gap in knowledge may impede the advancement and broader application of handwriting BCIs, as it limits the ability to refine algorithms and develop systems that can effectively interpret the complex neural activity associated with writing movements across strokes.
Here, we present a novel BCI paradigm that shifts the focus from classifying character identities to reconstructing the trajectories of imagined handwriting, with the aim of realizing a universal brain-to-text system by translating the trajectories into any form of text. Concurrently, our approach allows for a detailed examination of the neuronal tuning mechanisms of handwriting, which in turn facilitates the decoding of trajectories. We achieved high-fidelity reconstruction of the trajectories and attained a recognition rate of approximately 80% within a vast character database. These advancements hold profound implications for the field of assistive technology, offering a potential new avenue for communication and expression for a broad population.
Results
We surgically implanted two Utah arrays into the left motor cortex of a patient, specifically targeting the region surrounding the hand ‘knob’ area, as depicted in the inset of Fig. 1A. The patient, a right-handed individual in his 70s, had experienced a C4-level spinal cord injury resulting in total sensory and motor loss below the shoulders. The subject was instructed to attempt to handwrite characters with his right hand using chalk on a blackboard, following a video displayed on a screen (as illustrated in Fig. 1A). The video presented the handwriting sequence of strokes and cohesions (the air connections between strokes) of a single character at a consistent speed (Fig. 1B). Fig. 1C illustrates the smoothed velocity profiles in the x- and y-directions during the writing of a character comprising three strokes and two cohesions, with each stroke or cohesion exhibiting a bell-shaped velocity curve.
We recorded raw neural signals during the imagined handwriting process. From these signals, we extracted features from both low and high-frequency bands (as shown in Fig. 1D), encompassing a range of measurements such as local field potential (LFP), single and multiple unit activity (SUA and MUA), entire spike activity (ESA), etc. (see Methods). Fig. S1 presents examples of SUA recorded across all 192 electrodes during a single session, providing a glimpse into the neural activity associated with the motor imagery of handwriting.
Neuronal tuning during handwriting
To examine the directional tuning properties of individual neurons, we began by asking the subject to perform a center-out handwriting task, which involved tracing eight directional paths from the center and two circular paths, one clockwise and one counterclockwise. A raster plot of a well-isolated example neuron is displayed in Fig. 2A, from which we inferred that the preferred direction (PD) of this neuron is predominantly downward and toward the lower right, as illustrated in Fig. 2B. In the same session, the subject was also asked to handwrite the digits from 0 to 9, and the resulting raster plot for the same neuron is presented in Fig. 2C. When the spikes were plotted back onto the digits, as shown in Fig. 2D, the majority of spikes occurred during downward strokes (such as in the digits 0, 1, 7, etc.) or strokes inclined toward the lower right (such as in the digits 5 and 8). This agrees well with the results obtained from the center-out task. Additional example neurons from one session are shown in Fig. S2A; they display a variety of tuning directions and profiles. The spike-on-digit plots in Figures S2B and S2C depict the firing patterns of another two example neurons from Fig. S2A with leftward and upward preferred directions, respectively. Once again, the directional tuning observed in the center-out task and in the digit-handwriting task was highly congruent.
In order to assess the impact of visual stimuli on neural activity, we conducted handwriting tasks both with and without video guidance within the same session. Several example neurons under both conditions are showcased in Fig. S3, where it is evident that the neural activity in the absence of video guidance remained clearly distinguishable among different digits. Most importantly, it closely resembled its counterpart with video guidance. However, because of the alignment issue, the initiation and termination points of each handwriting stroke or cohesion could not be accurately identified without the video, which hindered more detailed subsequent analysis.
To scrutinize the population activity patterns, we used principal component analysis (PCA) to reduce the dimensionality of the neural activity data, followed by visualization using t-distributed stochastic neighbor embedding (t-SNE), as presented in Fig. 2E. The neural activities corresponding to different digits were well separated; digits with similar writing styles, such as 6 and 0 or 4 and 9, were found in close proximity to one another, suggesting analogous population dynamics. A simple classifier (support vector machine, SVM) with a bin size of 200 ms achieved an average accuracy of 96.7%±2.21%, as depicted in Fig. 2F. Furthermore, we used an artificial neural network to fit the trajectory of digit writing from the population neural activity. The outcomes demonstrated human-recognizable reconstructions on a single-trial basis, as portrayed in Fig. 2G. These findings suggest that the neural representation of imagined handwriting is distinct and likely supports the decoding of more intricate handwriting patterns, such as Chinese characters.
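The text does not state how the preferred direction was computed from the center-out data. As a minimal sketch, assuming a vector-sum estimate (each direction's baseline-subtracted mean firing rate weights a unit vector, and the angle of the resultant is taken as the PD), the calculation could look as follows. The function name, the angle convention (0° rightward, counterclockwise positive, so 270° is downward) and the example firing rates are all hypothetical.

```python
import math

def preferred_direction(rates_by_angle):
    """Estimate a preferred direction in degrees (0-360) by vector summation.

    rates_by_angle maps a movement direction in degrees to a mean firing rate.
    The mean rate is subtracted first so untuned firing does not bias the sum.
    """
    baseline = sum(rates_by_angle.values()) / len(rates_by_angle)
    x = sum((r - baseline) * math.cos(math.radians(a))
            for a, r in rates_by_angle.items())
    y = sum((r - baseline) * math.sin(math.radians(a))
            for a, r in rates_by_angle.items())
    return math.degrees(math.atan2(y, x)) % 360.0

# Hypothetical cosine-tuned neuron peaking at 270 degrees (downward),
# sampled at the eight center-out directions.
angles = range(0, 360, 45)
rates = {a: 10.0 + 8.0 * math.cos(math.radians(a - 270)) for a in angles}
pd_estimate = preferred_direction(rates)
```

With evenly spaced directions and cosine tuning, the vector sum points at the tuning peak; for unevenly sampled directions a cosine regression would be a safer choice.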
Trajectory fitting of handwriting Chinese characters
To test whether the neural activity during imagined handwriting could be used to fit more complex trajectories, we asked the subject to write 180 Chinese characters in 6 sessions. These characters, illustrated in Fig. S4A, are commonly used in daily life. They are also complex, with an average of 7.06±2.78 strokes per character. First of all, the neural activity patterns for each character were highly distinct; an SVM classifier based on SUA and MUA achieved nearly perfect discrimination (98.2%±2.29% and 97.2%±2.30%, respectively) among the 30 characters in each session (Fig. 3B).
Next, we fitted the neural activity to the velocity of the handwriting and reconstructed the trajectory by integrating the velocity along the path, as depicted in Fig. 3C. We trained both a linear Kalman filter and a nonlinear long short-term memory (LSTM) network using leave-one-character-out cross-validation for trajectory fitting. The decoding correlation coefficient (CC) and mean square error (MSE) with various types of low- and high-frequency signals are presented in Fig. 3D and S4C, respectively. Across all scenarios, the LSTM demonstrated superior fitting outcomes compared to the Kalman filter. Notably, ESA yielded significantly better results than all other signal types, with an average CC of 0.753±0.18.
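The reconstruction step described above (decode velocity, then integrate) and the CC metric can be illustrated with a short sketch. It assumes simple cumulative summation at the 20 Hz kinematic sampling rate mentioned in Methods and a plain Pearson correlation; how the x and y correlations were combined into a single CC is not specified in the text, so the sketch simply averages them. The function names and the toy velocities are hypothetical.

```python
import math

def integrate_velocity(vx, vy, dt=0.05, start=(0.0, 0.0)):
    """Cumulatively sum velocity samples into a list of (x, y) positions."""
    x, y = start
    path = [(x, y)]
    for dx, dy in zip(vx, vy):
        x += dx * dt
        y += dy * dt
        path.append((x, y))
    return path

def pearson_cc(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

# Toy example: a rightward stroke followed by a downward stroke.
true_vx, true_vy = [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, -1.0, -1.0]
decoded_vx, decoded_vy = [0.9, 1.1, 0.1, -0.1], [0.1, -0.1, -0.8, -1.2]

path = integrate_velocity(decoded_vx, decoded_vy)
# Assumption: the overall CC is the mean of the x and y correlations.
cc = (pearson_cc(true_vx, decoded_vx) + pearson_cc(true_vy, decoded_vy)) / 2
```

Because position is obtained by integration, velocity errors accumulate along the path, which is one reason a wrongly decoded segment can displace everything written after it.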
We further investigated the optimization of parameters for ESA extraction, as shown in Figures S4E-4H, and discovered that the outcomes were not particularly sensitive to parameter variations within a specific range. Finally, a bidirectional LSTM (bi-LSTM) yielded even more improved decoding results (Fig. S4I), but the decoding was not causal and thus unsuitable for online use. Additionally, the computational load was much higher than that of a standard LSTM.
To provide a qualitative illustration of how the reconstructed trajectories varied with different CC values, we showcased five example reconstructions in Figure 3E, with CC values ranging from 0.1 to 0.9. Generally, a reconstruction with a CC exceeding 0.5 would result in a human recognizable shape. Further quantitative results are detailed below.
Stroke and cohesion decoding during handwriting
The act of handwriting characters, whether Latin or non-Latin, is composed of strokes and cohesions, which possess distinct movement features and are likely encoded differently at the neural level. Upon close examination of the trajectory fitting outcomes for simpler characters, it became evident that the decoding accuracy for individual strokes consistently surpassed that of cohesions. As illustrated by the four representative characters in Fig. 4A, incorrect cohesion decoding, primarily of the orientation of each cohesion, resulted in the misplacement of well-decoded strokes. This misplacement led to dissimilar profiles and, consequently, unrecognizable trajectories.
We then examined the neuronal tuning for strokes and cohesions separately. Fig. 4B shows the tuning curve of one example neuron. For strokes, the curve peaked at a preferred writing direction of 135°, yet it remained flat for cohesions across all directions. When the two were combined, the tuning curve was biased toward strokes due to their predominance. For comparison, we also assessed the neuronal tuning of strokes that were randomly divided into two halves. As depicted in Fig. 4C, the same example neuron from Fig. 4B exhibited consistent tuning properties between the two stroke groups. Additional example tuning curves contrasting strokes with cohesions are displayed in Fig. S5A, with comparisons to tuning curves of stroke halves in Fig. S5B. We quantified the differences by calculating the delta PD and the CC between the two tuning curves (Fig. 4D). The results demonstrated a significantly higher delta PD (91.4° vs. 39.1°, p = 6.93e-8) and lower CC (0.03 vs. 0.65, p = 8.87e-14) for stroke vs. cohesion than for stroke vs. stroke, indicating markedly different tuning properties between strokes and cohesions.
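The two comparison metrics used here can be sketched as follows, assuming that delta PD is the smallest absolute angular difference between two preferred directions and that the tuning-curve CC is a Pearson correlation across direction bins. Both definitions are assumptions (the text does not give formulas), and the example tuning curves are hypothetical.

```python
import math

def delta_pd(pd_a_deg, pd_b_deg):
    """Smallest absolute angular difference between two directions (0-180)."""
    d = abs(pd_a_deg - pd_b_deg) % 360.0
    return min(d, 360.0 - d)

def curve_cc(curve_a, curve_b):
    """Pearson correlation between two tuning curves on the same direction bins.

    Returns NaN when either curve is perfectly flat, because the correlation
    is undefined for zero variance.
    """
    n = len(curve_a)
    mean_a, mean_b = sum(curve_a) / n, sum(curve_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(curve_a, curve_b))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in curve_a))
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in curve_b))
    if sd_a == 0 or sd_b == 0:
        return float("nan")
    return cov / (sd_a * sd_b)

# Hypothetical tuning curves for two random halves of the stroke data.
angles = list(range(0, 360, 45))
stroke_half_a = [10 + 8 * math.cos(math.radians(a - 135)) for a in angles]
stroke_half_b = [11 + 7 * math.cos(math.radians(a - 140)) for a in angles]
cc_halves = curve_cc(stroke_half_a, stroke_half_b)
pd_gap = delta_pd(135, 140)
```

The wrap-around handling matters: directions of 350° and 10° differ by 20°, not 340°.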
Subsequently, we developed two decoding models, one trained exclusively with stroke data and the other with cohesion data, to evaluate whether this dual-model approach would outperform the single model trained with a mix of strokes and cohesions. The strokes and cohesions of the same four example characters were decoded using their respective models (Fig. 4E), showing improved fitting quality for both strokes and cohesions over the single-model depicted in Fig. 4A. It is important to note that the more precise orientation decoding for cohesions facilitated the correct placement of strokes, which is essential for character recognition. Quantitative results from a dataset of 30 characters (Fig. 4F) revealed significantly improved decoding similarity for both cohesions (0.79 vs. 0.87, p = 4e-07) and strokes (0.76 vs. 0.87, p = 2.3e-22).
Encouraged by these findings, we applied the dual-model scheme to the 180-character dataset and found that, compared with the single model, the dual model achieved a significantly lower MSE (99.0±56.9 vs. 67.4±41.2) and a higher CC (0.753±0.18 vs. 0.841±0.11) for overall trajectory fitting, as depicted in Fig. 4G; this corresponds to an improvement of around 11.7% in CC. However, for practical application of this dual-model approach, it was necessary to first determine whether a particular part was a stroke or a cohesion. We therefore employed an LSTM classifier to classify strokes and cohesions bin by bin using ESA, SUA and local motor potential (LMP) signals. ESA achieved the highest classification accuracy, 83.72%±5.83%, indicating a promising discriminatory capability (Fig. 4H).
We then constructed a decoding model by cascading the stroke/cohesion classifier and the dual-model fitting decoder. That is, for each bin, the classifier first determined whether the current bin corresponded to a stroke or a cohesion, and the velocity was then fitted using the corresponding model. However, this cascaded model did not outperform the single model in terms of either CC or MSE, as shown in Fig. 4I. Although we demonstrated that the encoding structures for strokes and cohesions were distinct during handwriting, combining the two stages of classification and fitting did not enhance the trajectory reconstruction. Further work could explore more sophisticated algorithms, but a single LSTM model was sufficient for the current study. Consequently, we continued to use a single LSTM model with the ESA signal for trajectory fitting in subsequent analyses.
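As a quick arithmetic check, the reported 11.7% is consistent with the relative gain in mean CC between the single and dual models. The snippet below assumes that this relative-gain formula is what the figure refers to; the corresponding relative MSE reduction is computed the same way for comparison.

```python
# Mean values reported in the text for the 180-character dataset.
single_cc, dual_cc = 0.753, 0.841
single_mse, dual_mse = 99.0, 67.4

# Relative change with respect to the single-model baseline, in percent.
cc_gain_pct = (dual_cc - single_cc) / single_cc * 100
mse_drop_pct = (single_mse - dual_mse) / single_mse * 100

print(f"CC gain: {cc_gain_pct:.1f}%, MSE reduction: {mse_drop_pct:.1f}%")
```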
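The cascade described above amounts to per-bin routing. The sketch below is a deliberately simplified illustration: the real classifier and decoders are LSTM networks that carry state across bins, whereas here they are replaced by stateless stand-in functions. All names and the toy models are hypothetical.

```python
def cascade_decode(features, is_stroke, stroke_model, cohesion_model):
    """Classify each bin, then decode its velocity with the matching model.

    features: per-bin feature values (any type the callables accept).
    is_stroke: callable returning True for a stroke bin, False for cohesion.
    stroke_model / cohesion_model: callables mapping a bin to (vx, vy).
    """
    velocities, labels = [], []
    for f in features:
        stroke = is_stroke(f)
        model = stroke_model if stroke else cohesion_model
        velocities.append(model(f))
        labels.append("stroke" if stroke else "cohesion")
    return velocities, labels

# Toy stand-ins for the trained classifier and decoders (not the real models).
def toy_is_stroke(f):
    return f > 0.5

def toy_stroke_model(f):
    return (f, 0.0)

def toy_cohesion_model(f):
    return (0.0, -f)

velocities, labels = cascade_decode(
    [0.9, 0.2, 0.7], toy_is_stroke, toy_stroke_model, toy_cohesion_model
)
```

In this arrangement every misclassified bin is decoded by the wrong model, so classification errors feed directly into the velocity estimate.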
Translate decoded trajectories into text
To objectively assess whether the decoded trajectories could be recognized as legible text, we initially used generic handwriting recognition software to recognize the continuous trajectory of each character. In this scenario, we used ESA, SUA and LMP for decoding and compared velocity and position decoding schemes. ESA velocity decoding yielded the highest recognition rate; however, only around a quarter (27.6%) of the trajectories could be recognized as the correct Chinese characters (Fig. 5A). This was not surprising, because the trajectory of each character was essentially a single continuous stroke, which deviates significantly from conventional stroke-by-stroke handwriting patterns (see Discussion section).
To recognize the trajectories correctly, we devised a method that finds the standard character whose speed profile is most similar to that of the decoded trajectory. The underlying concept was that each character generates a unique and distinctive speed profile identifier over the course of writing. To that end, we first built a library of the speed profiles for writing the 180 standard characters. Each decoded trajectory was then z-score normalized and matched with the most similar standard character using the dynamic time warping (DTW) algorithm. Once again, ESA with velocity decoding achieved the highest recognition rate, and this time approximately 87.2% of the trajectories were correctly recognized (Fig. 5B). Because the speed profiles were highly unique for each character, the recognition rate decreased only slightly, to 79.8%, when the library was expanded to 1000 characters (Fig. 5C). This suggested that the recognition method was sufficiently robust for recognizing a large number of characters. Compared with a CC-based method, DTW permits temporal sequences to exhibit a certain degree of delay or stretching along the time axis, thereby capturing the similarity between sequences more precisely.
Finally, we examined the consistency of the decoded trajectories for the same character across different days. The same set of 30 characters was written four times over the course of eight days. Despite variations in neural activity, the decoded trajectories maintained a high degree of similarity; even after intervals of up to eight days, all cross-day correlation coefficients exceeded 0.84 (Fig. 5D). This indicated that the imagined handwriting trajectory possesses a certain stability, which could further improve the recognition rate (e.g., by using trajectories decoded on previous days as templates). Collectively, these findings suggest that the trajectories of complex characters can be decoded and recognized as text, offering a universal brain-to-text communication solution applicable to any written language.
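The matching step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional scalar speed profile per character, the classic DTW recursion with an absolute-difference local cost and no warping-window constraint, and a two-entry library with made-up labels and profiles.

```python
import math

def zscore(seq):
    """Normalize a sequence to zero mean and unit (population) variance."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    return [(v - mean) / std for v in seq] if std > 0 else [0.0] * n

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost matrix
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of: insertion, match, deletion.
            curr.append(abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]

def recognize(decoded_speed, library):
    """Return the library label whose speed profile is closest under DTW."""
    query = zscore(decoded_speed)
    return min(library, key=lambda k: dtw_distance(query, zscore(library[k])))

# Hypothetical library: one profile with two speed peaks, one with a single peak.
library = {
    "two_peaks": [0, 1, 2, 1, 0, 0, 1, 2, 1, 0],
    "one_peak": [0, 1, 2, 3, 4, 4, 3, 2, 1, 0],
}
# Decoded profile: "two_peaks" scaled by 3 and locally stretched in time.
decoded = [0, 0, 3, 6, 6, 3, 0, 0, 3, 3, 6, 3, 0]
best = recognize(decoded, library)
```

The z-score removes the amplitude difference and DTW absorbs the local stretching, which a sample-by-sample correlation would penalize. This naive version costs O(nm) per comparison; a 1000-character library may call for a banded or pruned variant.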
Discussion
In this study, we recorded intracortical neural activity from a human patient during video-guided imagined handwriting. Our findings revealed that neurons exhibit tuning properties during the handwriting process akin to the classical motor directional tuning theory (15). Additionally, we discovered that writing strokes and cohesions are encoded with distinct rules. Leveraging these insights, we engineered decoders capable of accurately reconstructing the trajectories of imagined handwriting for complex Chinese characters. Moreover, we developed a novel matching algorithm that translates these trajectories into legible text. This approach contrasts with the previous classification method (10), a pioneering brain-to-text methodology that was suitable for letter-based languages. Specifically, our method reconstructs handwriting trajectories and subsequently recognizes these trajectories as text, a technique that holds promise for application across all written languages. This strategy advances the field of BCIs and paves the way for individuals with limited mobility to communicate through written language.
Movement trajectory fitting is a well-established technique within the realm of BCIs. Prior research has predominantly concentrated on decoding straight movements over arm-reach distances in both monkeys and humans, demonstrating control of computer cursors (3, 16) or prosthetics (2, 17). Preliminary trials have also explored the decoding of simple curved drawings in monkeys (18, 19). However, the ability to reconstruct intricate handwriting trajectories, which occur within a significantly smaller range but encompass complex spatial and temporal dynamics, remained unexplored. Willett et al. used trial-averaged activity to illustrate the reconstruction of handwriting trajectories for single letters (10). Our study first confirms that incorporating the temporal variability induced by handwriting significantly enhances classification accuracy (10), as evidenced by the nearly perfect discrimination of up to 30 characters. More notably, owing to the precise alignment between neural activity and handwriting kinematics, we were able to reconstruct complex writing trajectories as human-recognizable characters on a single-trial basis. To the best of our knowledge, this research marks the first attempt to reconstruct complex handwriting movements for brain-to-text communication. This strategy extends the application of handwriting BCIs to any written language, be it Latin-based or non-Latin, because it decodes any written trajectory as it is, thereby broadening the horizons for individuals seeking enhanced communication capabilities.
Handwriting serves as a pivotal motor task for investigating motor control (20) and assessing motor diseases (21). However, previous studies have predominantly focused on the analysis of written trajectories, often overlooking the distinct characteristics of strokes and cohesions. In reality, the execution of strokes and cohesions differs fundamentally, both kinematically (cohesions involve an additional movement dimension perpendicular to the paper plane) and kinetically (the force applied to the pen is significantly reduced during cohesions compared with strokes). While the neural substrates and mechanisms of handwriting have been primarily examined through lesion studies and neuroimaging (13, 14), our research investigates both stroke and cohesion handwriting at the single-neuron level. We discovered markedly different tuning properties between the two at the individual neuron level, a finding underscored by the superior performance of a dual-model approach over a single mixed model when the labels for strokes and cohesions were known. Nevertheless, our attempt to integrate a classification model with trajectory fitting did not surpass the performance of the single model. This was primarily attributed to the incorrect assignment of strokes and cohesions, despite a bin-by-bin classification accuracy of approximately 84%. Further exploration of the population neural dynamics (22) may yield more effective discrimination between strokes and cohesions, thereby facilitating more precise trajectory fitting and enhancing our understanding of the intricate processes underlying handwriting.
Handwriting recognition and Optical Character Recognition (OCR) have reached a high level of sophistication and are widely used in contemporary applications (23, 24). However, these standard recognition techniques are not well suited to the handwriting trajectories reconstructed from neural signals in our study, primarily for two reasons. Firstly, generic handwriting recognition systems are trained on normal handwriting patterns, which are notably distinct from the continuous, single-stroke trajectories we decoded here. Secondly, the single-model decoding approach used in our study was prone to inaccuracies, particularly for cohesions, which led to the misplacement of otherwise correctly decoded strokes. Therefore, a recognition program tailored to these specific characteristics would likely achieve a higher recognition rate. In addition, one interesting finding was that the reconstructed trajectories of the same character exhibited a high degree of similarity across different days, indicating that a consistent, person-specific signature may exist. To exploit this, a personalized recognition program could be more effective in accurately decoding imagined handwriting. The consistency of these trajectories over time underscores the potential for developing individualized algorithms that can reliably interpret the unique handwriting patterns derived from neural activity.
Our study, while illuminating, has several limitations that warrant acknowledgment. Firstly, although visual guidance was instrumental in synchronizing neural activities with handwriting kinematics, it also risked contaminating or even amplifying the handwriting-related signals, potentially leading to false positive detections. Nonetheless, this approach remains a valuable starting point for constructing an initial decoder, which can be further refined during online testing as reliance on visual cues diminishes. Another limitation is that our study still considered handwriting as a 2D plane movement, rather than employing a 3D or multi-dimensional model. Future research should integrate these additional dimensions to more fully account for the variations observed in neural data, particularly for the nuanced differences between strokes and cohesions.
The application of our findings in the near future seems highly plausible, especially considering that fully implantable electronics are now accessible in both academic (25) and industrial (26) spheres. This advancement will expedite the translation of our research into practical human applications, broadening the potential impact of our work in the field of BCIs and motor control studies.
Materials and Methods
Participant and surgery
The participant enrolled in this study was a right-handed individual who had experienced a C4-level spinal cord injury that resulted in total sensory and motor loss below the shoulders. The microelectrode implantation surgery was conducted about 3 years after the injury, when the participant was in his 70s, and data collection for this study took place around 2.5 years after the surgery. All clinical and experimental procedures received approval from the Medical Ethics Committee of the Second Affiliated Hospital of Zhejiang University and were registered in the Chinese Clinical Trial Registry (chictr.org.cn; registration number: ChiCTR2100050705).
Two 96-channel Utah microelectrode arrays (Blackrock Microsystem, USA) were implanted into the left precentral gyrus, specifically targeting the hand ‘knob’ area of motor cortex (Fig. 1A inset). The implantation site was identified prior to surgery using functional magnetic resonance imaging (fMRI) while the participant imagined reaching and grasping movements.
Video-guided handwriting paradigm
To guide the motor imagery process for the patient, a handwriting video was played on the computer monitor. The video consisted of a stroke-by-stroke writing animation of a specific character, led by a hand holding chalk (Fig. 1A). The patient was asked to attempt to write the same character with chalk on a blackboard following the guidance. We also asked the patient to write on paper with a pen, and the classification results were similar. We kept the chalk-on-blackboard paradigm based on the patient’s preference. A typical trial started by showing the character (in dark green) on the screen (500 ms), followed by an auditory prompt of the character’s pronunciation (1000 ms). After a short delay (300 ms), a sound cue was issued and the writing animation started. The writing consisted of both strokes and cohesions, i.e., air connections between strokes. The written strokes were highlighted in light green, and the cohesions were simplified as a direct line between the end of the current stroke and the start of the next stroke. The duration of writing depended on the length of the character, ranging from 4 to 8 seconds, which is slightly slower than normal writing in order to accommodate the patient. The speed of each stroke and cohesion was constant, i.e., the duration of each stroke or cohesion was proportional to its length.
The handwriting videos were synthesized artificially. Firstly, the sequence of two-dimensional coordinates for writing each character was extracted from the standard font of that character using ‘GetData Graph Digitizer’ software. Secondly, each segment, defined as a straight line or an approximation of a straight line before a sharp inflection, was labeled as a stroke or a cohesion, and the coordinates were converted into velocity sequences. The duration of each segment was proportional to the ratio of the segment length to the total length, and the velocity profiles in the x- and y-directions were defined as: Where T represents the total duration of that segment, and a is the scaling factor to fit the duration of the segment. Lastly, the handwriting animations were created frame by frame from the velocity profile above using MATLAB. The video had a black background, and a static dark trace of the entire character was shown before the actual writing started. The strokes were represented by thick light green lines drawn over the static dark character (Fig. 1B). The position and velocity data used for decoding were a 5-point smoothed version of the actual traces (sampled at 20 Hz), which resembled a bell-shaped profile (Fig. 1C).
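Two steps of this synthesis are simple enough to sketch: allocating each segment a duration in proportion to its length, and the 5-point smoothing of the resulting traces. The sketch assumes that "5-point smoothed" means a centered 5-sample moving average with a shortened window at the edges, which the text does not specify. The velocity profile within a segment is left out because its formula is not given here. Function names and example values are hypothetical.

```python
import math

def segment_durations(points, total_duration):
    """Split total_duration across consecutive segments by their lengths."""
    lengths = [math.dist(p, q) for p, q in zip(points, points[1:])]
    total_length = sum(lengths)
    return [total_duration * length / total_length for length in lengths]

def smooth5(values):
    """Centered 5-point moving average; the window shrinks at the edges."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 2): i + 3]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical character outline with segment lengths 3, 4 and 5.
outline = [(0, 0), (3, 0), (3, 4), (0, 0)]
durations = segment_durations(outline, total_duration=6.0)

# A single velocity spike is spread over five neighboring samples.
spike = smooth5([0, 0, 0, 10, 0, 0, 0])
```

Smoothing a piecewise velocity trace in this way rounds off the abrupt transitions between segments.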
Data collection sessions
Neural data were recorded while the subject attempted to write various characters during 1-2 hour sessions on scheduled days. During the experimental sessions, the patient was seated in a wheelchair, with cables connected from the patient’s head connectors to the NeuroPort data acquisition system (Blackrock Microsystem, USA), which recorded both neural signals and task timings (through a serial port) simultaneously. The character dataset used in this study included:
8 directional paths from the center and 2 circular paths both clockwise and counterclockwise, which resembled a center-out task commonly used to examine directional tuning of neurons. Each direction was repeated 10 times in pseudorandom order (Fig. 2);
The 10 digits from 0 to 9, each repeated 10 times in pseudorandom order in each session. In some sessions, both center-out and digit writing were conducted to examine the tuning properties of the same neuron (Fig. 2). In this case, only 5 repeats of each digit/direction were performed;
30 simple Chinese characters (usually 3-stroke) that were used to investigate the difference in tuning properties between strokes and cohesions (Fig. 4). For each character, 2 blocks and 3 repeats/block were conducted per session;
270 complex Chinese characters. For 180 of them (7 strokes on average), raw data were recorded so that various signal features could be extracted and used for decoding analysis (Fig. 3). For each character, 2 blocks and 3 repeats/block were conducted per session;
The same 30 Chinese characters were repeated in another separate 5 sessions to examine the stability of the decoded trajectories (Fig. 5). For each character, 2 blocks and 3 repeats/block were conducted per session.
Neural signal preprocessing
Neural signals from each channel were amplified, filtered (0.3-7500 Hz) and digitized at a sample rate of 30 kHz using NeuroPort. Various signal features were then extracted, including:
Single-unit activity (SUA), which was extracted online after further filtering (250-5000 Hz) with a threshold of -6.25 times root mean square (rms). Single units were isolated offline using Offline Sorter (Plexon, USA).
Multi-unit activity (MUA), which was extracted offline from the further filtered data (250-5000 Hz) using thresholds of -4.5 and -6.25 times rms. No further spike sorting was applied.
Local field potential (LFP), which was obtained by low-pass filtering the raw signal (below 500 Hz) and downsampling to 2000 Hz. To reduce sporadic outliers, extremes exceeding ±3 standard deviations from the mean were clipped, followed by a third-order Butterworth low-pass filter. The mean power in each frequency band (1-4, 3-10, 12-23, 27-38, 50-300 Hz) was then calculated as a signal feature.
Local motor potential (LMP), which was the moving average of the LFP in non-overlapping 50 ms windows (27).
Entire spiking activity (ESA), which was obtained by applying a first-order Butterworth high-pass filter (300 Hz) to the raw signal, rectifying by taking the absolute value, applying a first-order Butterworth low-pass filter (12 Hz), and finally downsampling to 1 kHz (28).
Spiking-band power (SBP), which was obtained by applying a second-order Butterworth bandpass filter (300-1000 Hz) to the raw signal, rectifying by taking the absolute value, and finally downsampling to 2 kHz (29).
Continuous multiunit activity (cMUA), which was obtained by applying a third-order Butterworth bandpass filter (300-6000 Hz) to the raw signal, squaring, low-pass filtering with a third-order Butterworth filter (100 Hz), clipping negative values, taking the square root, and finally downsampling to 1 kHz (30). Because cMUA was highly correlated with ESA (correlation coefficient above 0.87, Fig. S4B) and gave similar decoding results, it was not used for further analysis.
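As an illustration of two of the simpler steps in the list above (outlier clipping for the LFP and window averaging for the LMP), a minimal Python sketch is given below. It is not the authors' code: the function names are hypothetical, the use of the population standard deviation is an assumption, and the Butterworth filtering stages are deliberately left out.

```python
from statistics import mean, pstdev

def clip_outliers(signal, n_sd=3.0):
    """Clip samples lying more than n_sd standard deviations from the mean.

    Assumption: population standard deviation over the whole segment.
    """
    mu = mean(signal)
    sd = pstdev(signal)
    lo, hi = mu - n_sd * sd, mu + n_sd * sd
    return [min(max(s, lo), hi) for s in signal]

def local_motor_potential(lfp, fs_hz=2000, window_ms=50):
    """Average the LFP in consecutive, non-overlapping windows."""
    n = int(fs_hz * window_ms / 1000)   # samples per window (100 at 2 kHz)
    n_windows = len(lfp) // n           # a trailing partial window is dropped
    return [mean(lfp[k * n:(k + 1) * n]) for k in range(n_windows)]
```

With a 2 kHz LFP, each 50 ms window holds 100 samples, so one second of signal yields 20 LMP values per channel.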
To identify the actual timing of imagined handwriting after the animation started, we performed principal component analysis (PCA) and found that a significant change in neural activity in PC1 and PC2 occurred at around 300 ms after the cue (Fig. S4C). Subsequent decoding analysis confirmed that a delay of 300 ms achieved the best results. Therefore, we aligned the writing kinematics with the 300-ms-shifted neural activity in all following analyses. The bin size used to average the neural activity was also tested in a classification decoding task, ranging from 50 to 400 ms, which confirmed that a bin size of around 200 ms yielded the best results (Fig. 2F). Thus, all the neural signal features above were binned with overlapping 200 ms windows and shifted by 300 ms to align with the handwriting kinematics.
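The binning and alignment described above might be sketched as follows. The 200 ms window and 300 ms lag come from the text; the 50 ms step between overlapping windows and the function name are assumptions made for illustration only, since the step size is not stated.

```python
def bin_with_lag(feature, fs_hz, window_ms=200, step_ms=50, lag_ms=300):
    """Average one feature channel in overlapping windows, delayed by lag_ms.

    feature : list of samples, with index 0 at animation start (the cue).
    step_ms : hop between windows (assumed value, not given in the text).
    Bin k covers neural samples from lag + k*step to lag + k*step + window,
    so it can be paired with the kinematics starting at k*step.
    """
    win = int(fs_hz * window_ms / 1000)
    step = int(fs_hz * step_ms / 1000)
    start = int(fs_hz * lag_ms / 1000)
    bins = []
    while start + win <= len(feature):
        bins.append(sum(feature[start:start + win]) / win)
        start += step
    return bins
```

For a 1 s segment sampled at 1 kHz, this yields 11 bins, the first covering 300-500 ms after the cue.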
Directional tuning and visualization
During the center-out task, the average firing rate in each direction was calculated and depicted as a radar plot for each neuron (Fig. 2B). The preferred direction (PD) was determined as the direction with the highest firing rate. During the digit writing task, each spike during writing was plotted back onto the trajectory (with a small position jitter depending on the width of the stroke) to illustrate where, during the writing, a spike fired (Fig. 2D). During simple Chinese character writing, the tuning curve (i.e., firing rate vs. writing direction) was plotted separately for strokes and cohesions (Fig. 4B). As a comparison, the strokes were randomly assigned to two equal groups and the tuning curve for each group was constructed separately (Fig. 4C).
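A minimal sketch of the tuning-curve and preferred-direction computation for a single neuron is shown below. The trial format (direction in degrees, spike count, duration in seconds) and the function names are hypothetical; only the rule "PD is the direction with the highest mean firing rate" is taken from the text.

```python
from collections import defaultdict

def tuning_curve(trials):
    """Mean firing rate per direction.

    trials : iterable of (direction_deg, spike_count, duration_s) tuples
             for one neuron (hypothetical format).
    """
    rates = defaultdict(list)
    for direction, n_spikes, duration in trials:
        rates[direction].append(n_spikes / duration)
    return {d: sum(r) / len(r) for d, r in rates.items()}

def preferred_direction(curve):
    """Direction whose mean firing rate is highest."""
    return max(curve, key=curve.get)
```

The same `tuning_curve` function could be called separately on stroke and cohesion segments to compare the two curves, as in the stroke versus cohesion analysis.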
We used t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the trial-wise neural activity for visualization (Fig. 2E). The neural activity for writing digits was compiled into a matrix with dimension T × UB, where T is the number of trials, U is the number of units and B is the number of bins in each trial. We applied t-SNE to these matrices using the tsne function in MATLAB with default parameters.
Classification and fitting models and metrics
To classify the identities of the digits (10 in each session with 10 repeats each) or characters (30 in each session with 3 repeats each), we employed a support vector machine (SVM) classifier with a polynomial kernel from the libsvm library. The classifier was cross-validated with a leave-one-trial-out scheme.
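To fit the trajectory of the imagined handwriting movement from the neural signal features, we utilized both a Kalman filter (KF) and a long short-term memory (LSTM) network as decoders (28). The KF uses a linear state equation of the system, together with the observed input and output data, to estimate the system’s state optimally. The KF employs a recursive approach for state prediction and state update as follows:

\[
\begin{aligned}
\hat{x}_{k|k-1} &= A\,\hat{x}_{k-1|k-1} + B\,u_k \\
P_{k|k-1} &= A\,P_{k-1|k-1}\,A^{T} + Q \\
K_k &= P_{k|k-1}\,H^{T}\left(H\,P_{k|k-1}\,H^{T} + R\right)^{-1} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\left(z_k - H\,\hat{x}_{k|k-1}\right) \\
P_{k|k} &= \left(I - K_k H\right)P_{k|k-1}
\end{aligned}
\]

where \(\hat{x}_{k|k-1}\) is the predicted state value, \(\hat{x}_{k|k}\) is the optimal estimate of the state, \(u_k\) is the control input, \(z_k\) is the observation, \(P\) is the covariance of the state estimate, \(K_k\) is the Kalman gain, A is the state transition matrix, B is the control input matrix, H is the state observation matrix, and Q and R are the covariances that characterize the deviations of the state values and observation values, respectively.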
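To make the predict and update recursion concrete, the sketch below implements a Kalman filter for the simplest possible case: a scalar state and a scalar observation, with no control input (B = 0). This is a simplification for illustration, not the decoder used in the study, where the state would be the multi-dimensional handwriting kinematics and the observation the vector of neural features. The function name and default values are assumptions.

```python
def kalman_filter_1d(observations, a, h, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter (illustrative simplification, no control input).

    a : state transition coefficient      h : observation coefficient
    q : process noise variance            r : observation noise variance
    x0, p0 : initial state estimate and its variance (assumed defaults)
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Prediction step: propagate the state and its uncertainty.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update step: blend prediction and observation via the Kalman gain.
        k = p_pred * h / (h * p_pred * h + r)
        x = x_pred + k * (z - h * x_pred)
        p = (1.0 - k * h) * p_pred
        estimates.append(x)
    return estimates
```

In the matrix case, the scalar products become matrix products and the division becomes a matrix inverse; the structure of the loop is unchanged.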
The LSTM is a type of recurrent neural network (RNN) designed specifically to solve the issue of long-term dependencies in traditional RNNs. The core of the LSTM is the cell state, which serves to stably preserve long-term memory in the model. The LSTM utilizes gate mechanisms to control the removal or addition of information to the cell state. The forget gate determines which information should be discarded from the cell state, the input gate determines which new information should be added to the cell state, and the output gate determines the features of the cell state to be output. The description is as follows:

\[
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

where x represents the input, h represents the output, f represents the forget gate, i represents the input gate, o represents the output gate, c represents the cell memory, \(\tilde{c}\) is the candidate cell memory, and W and b are the weight matrices and biases of each gate. The symbols σ and ⊙ represent the sigmoid activation function and the element-wise multiplication operator. The number of units in the LSTM was 512 and the network was trained with a batch size of 1, a dropout rate of 0 and a learning rate of 0.001.
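The gate mechanism described above can be illustrated with a single LSTM time step written in plain Python. This is a sketch of the standard LSTM cell, not the trained 512-unit network: the parameter layout (one weight matrix and bias per gate, acting on the concatenation of the previous output and the current input) and all names are assumptions.

```python
import math

def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def _affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps 'f', 'i', 'o' (gates) and 'c' (candidate memory) to a
    (W, b) pair applied to the concatenation [h_prev, x] (assumed layout).
    Returns the new output h and the new cell memory c.
    """
    v = h_prev + x  # list concatenation
    f = [_sigmoid(a) for a in _affine(*params["f"], v)]   # forget gate
    i = [_sigmoid(a) for a in _affine(*params["i"], v)]   # input gate
    o = [_sigmoid(a) for a in _affine(*params["o"], v)]   # output gate
    g = [math.tanh(a) for a in _affine(*params["c"], v)]  # candidate memory
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

A decoder would apply `lstm_step` once per neural bin, carrying `h` and `c` forward, and map `h` to the velocity or position output through an additional linear layer.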
We also tried a bidirectional LSTM (Bi-LSTM) for trajectory fitting (10), which, in contrast to the LSTM, considers both historical and future information to determine the output. The structure of the Bi-LSTM consists of two LSTM units, one processing the input sequence from the past to the future, and the other processing it from the future to the past. Through this approach, the Bi-LSTM can achieve a more comprehensive understanding of the sequence. However, this approach is not causal and thus cannot be used for online applications.
The fitting models were cross-validated using a leave-one-character-out method, in which all repeats of the character to be tested were excluded from training. Both the velocity and the position of the handwriting were used to decode the trajectory of characters. For the velocity model, the decoded velocity was additionally integrated along the path to reconstruct the position, i.e., the trajectory. Finally, we used mean squared error (MSE) and Pearson’s correlation coefficient (CC) as evaluation metrics for decoding performance, and paired Wilcoxon signed-rank tests to assess statistical differences in decoding performance between different features and decoding methods.
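The evaluation steps in this paragraph can be sketched in a few short functions: a leave-one-character-out split, integration of decoded velocity into a trajectory, and the two metrics. The function names, the rectangle-rule integration and the choice of starting point are assumptions for illustration.

```python
import math

def leave_one_character_out(trial_labels):
    """Yield (held_out_char, train_idx, test_idx); every repeat of the
    held-out character is excluded from training."""
    for ch in sorted(set(trial_labels)):
        test = [k for k, c in enumerate(trial_labels) if c == ch]
        train = [k for k, c in enumerate(trial_labels) if c != ch]
        yield ch, train, test

def integrate_velocity(vx, vy, dt, start=(0.0, 0.0)):
    """Cumulative sum of velocity * dt (rectangle rule, an assumption)."""
    x, y = start
    xs, ys = [], []
    for dx, dy in zip(vx, vy):
        x += dx * dt
        y += dy * dt
        xs.append(x)
        ys.append(y)
    return xs, ys

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def pearson_cc(a, b):
    """Pearson's correlation coefficient between two sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)
```

Splitting by character rather than by trial means the decoder is always scored on a shape it has never seen, which is the stricter test for a trajectory-fitting model.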
Dual-model and similarity metrics
The trajectory fitting models above were trained on mixed strokes and cohesions (single-model). We also trained a stroke model and a cohesion model exclusively on stroke and cohesion data, respectively, and tested them on simple Chinese characters (dual-model, Fig. 4). To quantify the quality of reconstruction for the single- and dual-model, two similarity metrics were defined for strokes and cohesions, respectively. The cohesions were always straight lines, and their similarity was defined as a weighted sum of angular and length similarity, where w is the weight, set to 0.6 in this study to emphasize the angle of the cohesion, which is important for character reconstruction, and ΔLi and Δθi are the length and angle differences for the i-th cohesion out of the total number of cohesions N. Similarity for strokes was defined as a weighted sum of the pair-wise distance and the correlation between the trajectories, where the weight w was set to 0.6 to emphasize the pair-wise distance Δdj between the decoded trajectory and the prompted trajectory.
To classify the strokes and cohesions, another LSTM was trained with a structure and parameters similar to those above. A cascading model combining the LSTM classifier with the dual-model for fitting was constructed to decode the velocity bin by bin and reconstruct the trajectories by integration.
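The control flow of such a cascade might look like the sketch below. The three callables are hypothetical stand-ins for the trained networks; in particular, the classifier in the study is an LSTM that carries state across bins, whereas the stand-in here is a stateless per-bin function to keep the example short.

```python
def decode_cascade(bins, is_stroke, stroke_model, cohesion_model, dt):
    """Bin-by-bin dual-model decoding followed by integration.

    bins           : list of per-bin neural feature vectors
    is_stroke      : callable(features) -> bool   (stand-in classifier)
    stroke_model   : callable(features) -> (vx, vy)
    cohesion_model : callable(features) -> (vx, vy)
    dt             : time between bins, in seconds
    """
    x = y = 0.0
    trajectory, labels = [], []
    for features in bins:
        stroke = is_stroke(features)
        # Route the bin to the model trained on the matching segment type.
        model = stroke_model if stroke else cohesion_model
        vx, vy = model(features)
        x += vx * dt
        y += vy * dt
        trajectory.append((x, y))
        labels.append("stroke" if stroke else "cohesion")
    return trajectory, labels
```

Returning the labels alongside the trajectory lets the cohesion segments be dropped or drawn differently when the character is rendered.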
Recognition of handwriting trajectories
Decoded handwriting trajectories were recognized as text in two ways. Firstly, the trajectory for each character was fed into an online generic handwriting recognition software through its API (teshuzi.com). The first Chinese character output by the algorithm, i.e., the one with the highest similarity score, was selected as the recognition outcome. Secondly, we recognized the decoded trajectories by matching them against a database of velocity profiles from standard characters. To accomplish this, we extracted trajectories for up to 1000 commonly used characters (using methods above) and converted them into their corresponding velocity profiles. The 180 characters tested in this study were part of this library, but the velocity profiles in the library were not identical to the velocity prompted to the subject (due to different sampling). Dynamic time warping (DTW) and correlation coefficient (CC) were employed to quantify the similarity between the decoded velocity and the velocity profiles in the library. DTW permits temporal stretch and delay, thereby enabling a more precise capture of the inter-sequence similarity. Because the computational load of DTW was high, the fastDTW algorithm was employed to compute the DTW distances. The character with the highest similarity score, as determined by DTW or CC, was selected as the final recognition result.
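The template-matching idea can be sketched with the classic dynamic-programming form of DTW. Note that this is the exact quadratic-time algorithm, not the fastDTW approximation used in the study, and that the library format (a dictionary from character to a list of (vx, vy) samples) and the function names are assumptions.

```python
import math

def dtw_distance(a, b):
    """Exact DTW distance between two sequences of (vx, vy) samples.

    Runs in O(len(a) * len(b)) time; fastDTW approximates this more cheaply.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance per sample
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(decoded, library):
    """Return the library character whose velocity profile is nearest
    to the decoded velocity under DTW (smallest distance wins)."""
    return min(library, key=lambda ch: dtw_distance(decoded, library[ch]))
```

Because DTW allows one sample to align with several, a decoded profile that is a time-stretched copy of a template still has distance zero to it, which is the property that makes it tolerant of differences in writing speed.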
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Author contributions
Conceptualization: YH, YW; Methodology: YH, GX, KX, JZhu, YW; Investigation: YH, GX, XY, ZW, XX, JZhu, YW; Visualization: YH, GX, XY, ZW, XX; Supervision: YH, JZhang, YW; Writing—original draft: YH, GX; Writing—review & editing: YH, KX, JZhang, YW.
Competing interests
Authors declare that they have no competing interests.
Data and materials availability
All data in the main text or the supplementary materials are available upon request.
Acknowledgments
This work was supported by STI 2030—Major Projects (2021ZD0200404), National Natural Science Foundation of China (62336007), Pioneer R&D Program of Zhejiang (2024C03001), the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (SN-ZJU-SIAS-002), and the Fundamental Research Funds for the Central Universities (2023ZFJH01-01, 2024ZFJH01-01). The authors thank Mr. Xiang Li for software development and Prof. Schwartz for the implantation surgery.