Abstract
Background Neurological and functional recovery after traumatic spinal cord injury (SCI) is highly heterogeneous, challenging outcome predictions in rehabilitation and clinical trials. We propose k-nearest neighbour (k-NN) matching as a data-driven, interpretable solution.
Methods This study used acute-phase International Standards for Neurological Classification of SCI (ISNCSCI) exams to forecast 6-month motor function recovery as the primary evaluation endpoint. Secondary endpoints included severity grade improvement, independent walking, and self-care ability. Different similarity metrics were explored for nearest-neighbour (NN) matching within 1267 patients from the European Multicenter Study about Spinal Cord Injury (EMSCI) before validation in 411 patients from the Sygen trial.
Results We obtained a population-wide root-mean-squared error (RMSE) of the motor score sequence of 0.76 (0.14, 2.77) and competitive functional score predictions (AUCwalker = 0.92, AUCself-carer = 0.83). The validation cohort showed comparable results (RMSE = 0.75 (0.13, 2.57), AUCwalker = 0.92). Predictions for AIS grade B and C patients (∼30%) showed the largest deviations from true recovery scores, in line with the large heterogeneity of SCI.
Conclusions Our approach provides detailed predictions of neurological and functional recovery based on a highly interpretable unsupervised machine learning concept. The k-NN matching strategy further enables the integration of historical control data into the evaluation of clinical trials and provides a data-driven digital twin for recovery trajectory exploration.
1. Background
Traumatic spinal cord injury (SCI) greatly impacts the quality of life of affected individuals due to an impairment of sensorimotor function (1,2). Patients and their families have critical questions regarding their prognosis for recovery and probable long-term disability in order to plan their health-related, financial, and social future. Personalised outcome predictions could provide a more specific perspective and help guide physicians in their rehabilitation planning. However, accurate prediction of sensorimotor recovery from traumatic SCI for an individual is challenging, as injury and recovery patterns of traumatic SCI are highly heterogeneous (3,4). The International Standards for Neurological Classification of SCI (ISNCSCI) exam, which assesses sensorimotor function in a standardised way, represents the global clinical gold standard to determine the neurological level and severity of injury. This exam scores light touch and pinprick sensation as integers from 0 to 2 and motor function as integers from 0 to 5. The American Spinal Injury Association (ASIA) impairment scale (AIS) further gives a notion of the overall injury severity. Sensory and motor scores and the AIS grade, together with patient age, sex, and neurological level of injury (NLI), currently provide the main basis for recovery prognosis (5–7). Although some personalised biomarkers of recovery extracted from magnetic resonance imaging or injury-derived proteins have also been reported (8–12), recovery estimates are mainly determined on a group level from changes in ordinal scales such as the ISNCSCI. The current state of the art in the field is a logistic regression model by van Middendorp et al. (13) that predicts an individual’s walking ability.
Despite this success, current prognostic models are greatly limited in their flexibility to predict multiple functional abilities and neurological recovery, which may be valuable to assess the full clinical picture of a patient. To the best of our knowledge, no publicly available tool currently provides more holistic recovery predictions covering several aspects of the recovery process. At the same time, understanding the prediction process and the confounding variables leading to a specific outcome prediction is key, yet often not easily accessible for the individual. Similarly, a notion of prediction uncertainty should be provided, given the known variability of medical assessment data such as the ISNCSCI (14).
Despite numerous clinical trials over the last decades, the range of effective treatments to ameliorate traumatic damage to the spinal cord is limited. The heterogeneity of outcomes and the scarcity of reliable outcome predictors have likely contributed to the failure of SCI clinical trials, where, due to its low incidence (8.0 to 246.0 cases per million inhabitants per year), recruitment of participants is highly challenging (15). A potential mitigation strategy may be to use historic control cohorts as an extension to placebo-controlled trial designs by maximising the number of patients receiving the investigational therapy. This approach has been adopted by the Nogo Inhibition in Spinal Cord Injury (NISCI) trial (clinicaltrials.gov #NCT03935321), a current phase-II study that followed a phase-I evaluation by Kucher et al.(16). NISCI randomises patients at a ratio of 1:2 into placebo and treatment arms while accounting for the smaller randomised control group through historical controls. Integral to the success of such advanced trial designs is a reliable means of identifying suitable historic controls. Currently, a comprehensive comparison of different ways to match historical controls is also missing.
In this study, we define “historic neighbours” as the closest match(es) to a patient within a reference database in the acute phase of SCI (see Figure 1A), identified by a purely data-driven, unsupervised process. In contrast to other data-driven approaches such as supervised machine or deep learning (17–20), k-nearest neighbour matching (21) provides an inherently interpretable solution that can flexibly predict multiple endpoints simultaneously. The transparency of the prediction pipeline is essential for high-stakes clinical decision-making and for optimising the resource investment in clinical trials. As a so-called lazy learner, k-NN does not involve a specific training phase but uses the reference data at inference time to identify “similarity” between the sample of interest and the reference pool. The similarity metric therefore needs to be carefully chosen to reflect the clinical hallmarks of the task at hand.
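The lazy-learner principle can be illustrated with a minimal Python sketch (Python being the language used in this study). The class name `LazyKNN`, the Euclidean distance, and mean agglomeration are illustrative assumptions rather than the study’s published implementation: `fit` merely stores the reference pool, and all similarity computations happen at prediction time.

```python
import numpy as np


class LazyKNN:
    """Hypothetical minimal k-NN 'lazy learner': fit() only stores the reference pool."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X_ref, Y_ref):
        # No parameters are estimated; the reference data are simply kept.
        self.X_ref = np.asarray(X_ref, dtype=float)
        self.Y_ref = np.asarray(Y_ref, dtype=float)
        return self

    def predict(self, x):
        # Similarity (here: Euclidean distance) is computed only at inference time.
        d = np.linalg.norm(self.X_ref - np.asarray(x, dtype=float), axis=1)
        idx = np.argsort(d, kind="stable")[: self.k]
        # Mean of the outcomes of the k most similar reference patients,
        # returned together with the neighbour indices for inspection.
        return self.Y_ref[idx].mean(axis=0), idx


model = LazyKNN(k=2).fit([[5, 5], [0, 0], [4, 5], [1, 0]], [[5], [1], [4], [2]])
pred, idx = model.predict([5, 4])
```

Returning the neighbour indices alongside the prediction is what makes the approach interpretable: the selected reference patients can be reviewed directly.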
The objective of this study was to benchmark different ways to match patients with SCI to historic cohorts (i.e. the nearest neighbours) and to quantify the degree to which these models provide detailed, personalised recovery predictions. The central hypothesis of this study is that kNN regression is superior to patient stratification by clinical and demographic features alone. The mechanistic rationale is that several features of the retained motor and sensory function influence recovery, implying that the full motor function pattern needs to be accounted for in neighbour matching. In particular, we score both neurological recovery (at the myotome level) and functional ability to provide a full picture of clinical SCI recovery. Where possible, we benchmark our approach against the current state of the art, i.e. a logistic regression for walking ability prediction and a regularised linear (Ridge) regression for the prediction of segmental motor scores. Importantly, the analysis code is publicly available, so the new paradigms can be freely used for clinical trial design and personalised predictions.
2. Methods
Figure 1 summarizes the workflow of this study.
2.1. Included Data
We based the analysis on the European Multicenter Study about Spinal Cord Injury (EMSCI; clinicaltrials.gov #NCT01571531), comprising over 5000 patients as of April 2021. The EMSCI study is performed in accordance with the Declaration of Helsinki and approved by all contributing institutional review boards. We validated our findings in an independent cohort from the Sygen trial (22).
Patient inclusion criteria
Included traumatic SCI patients were required to have complete ISNCSCI (23,24) assessments and AIS grades in the very acute phase (within two weeks of injury) and in the recovery phase (∼26 weeks, 150-186 days). We excluded patients with an NLI in the sacral segments and those displaying notable deterioration in motor function (> 2 points per score). Given that the Sygen trial demonstrated no treatment effect, we pooled its control and intervention groups (22,25). Despite significant changes in patient management (26,27), we included all historic patient data, given that no improvement in outcomes over time has been proven (28). This resulted in 1267 EMSCI patients (Table 1, Figure S1), who served as the reference pool; kNN performance in this cohort was assessed by leave-one-out cross-validation. Sygen patients were matched to the EMSCI reference pool.
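Leave-one-out evaluation of a kNN reference pool amounts to matching each patient against all other patients. The sketch below assumes Euclidean kNN with mean agglomeration; the function name `loo_knn_predictions` is hypothetical. External patients (here, the Sygen cohort) would instead be matched against the full pool without any exclusion.

```python
import numpy as np


def loo_knn_predictions(X, Y, k=20):
    """Hypothetical leave-one-out k-NN: each patient is matched against all *other* patients."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    preds = np.empty_like(Y)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the patient of interest from its own reference pool
        nn = np.argsort(d, kind="stable")[:k]
        preds[i] = Y[nn].mean(axis=0)  # mean agglomeration of the neighbours' outcomes
    return preds


demo_preds = loo_knn_predictions([[0], [1], [10], [11]], [[0], [2], [20], [22]], k=1)
```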
Included variables for neighbour matching (2-week assessments)
All ISNCSCI scores were used, including motor (MS) and sensory scores (SS, i.e. light touch (LTS) and pinprick (PPS)), the presence or absence of deep anal pressure (DAP) sensation, and the ability for any voluntary anal contraction (VAC). Optionally, patient matching was additionally based on the NLI, patient demographics (age, sex), the AIS grade, and the lower extremity motor score (LEMS).
Evaluation endpoints (26-week assessments)
Figure 1C gives an overview of the neurological and functional evaluation endpoints used. The primary evaluation endpoint was neurological recovery of motor function, quantified for each individual as the root mean squared error between the true and predicted MSs below the NLI. Secondary endpoints included the change in LEMS (ΔLEMS), AIS grade conversion, and the ability to walk independently (13) or to care for oneself. For the latter, we binarised the Spinal Cord Independence Measure (SCIM) self-care subscores (items 1-4) into self-carers (SCIM1-4 ≥ 15, i.e. reaching at least 75%) and dependent patients (SCIM1-4 < 15). During the EMSCI study, different versions of the SCIM (II (29) and III (30)) were used; these were pooled and were available for 1189 of the 1267 patients. SCIM was not assessed in Sygen patients. Instead, Benzel et al.’s modified Japanese Orthopaedic Association Scale (31) was used with equivalent cutoffs to define walking ability. Self-care ability was not assessed.
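The primary endpoint and the self-care binarisation can be expressed in a few lines. This sketch assumes that MSs are stored as a (2, 10) array (left/right side by the ten key muscles C5-T1 and L2-S1) and that “below the NLI” means key muscles strictly caudal to the NLI segment; both conventions and the helper names are assumptions for illustration.

```python
import numpy as np

# Spinal segments in rostral-to-caudal order.
SEGMENTS = ([f"C{i}" for i in range(1, 9)] + [f"T{i}" for i in range(1, 13)]
            + [f"L{i}" for i in range(1, 6)] + [f"S{i}" for i in range(1, 6)])
# ISNCSCI key-muscle segments (one motor score per segment and body side).
KEY_MUSCLES = ["C5", "C6", "C7", "C8", "T1", "L2", "L3", "L4", "L5", "S1"]


def rmse_below_nli(ms_true, ms_pred, nli):
    """RMSE over key muscles caudal to the NLI; inputs have shape (2, 10) = (side, muscle)."""
    ms_true = np.asarray(ms_true, dtype=float)
    ms_pred = np.asarray(ms_pred, dtype=float)
    nli_idx = SEGMENTS.index(nli)
    # Assumption: the NLI segment itself is not counted as "below".
    below = np.array([SEGMENTS.index(m) > nli_idx for m in KEY_MUSCLES])
    diff = (ms_true - ms_pred)[:, below]
    return float(np.sqrt(np.mean(diff ** 2)))


def is_self_carer(scim_selfcare_score):
    """Binarise the SCIM self-care subscore (items 1-4, max. 20): >= 15 means self-carer."""
    return scim_selfcare_score >= 15
```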
2.2. Similarity metrics and optimal hyperparameter selection
All of the described concepts were implemented in Python (version 3.8). Figure 1B outlines the three-step procedure followed for neighbour matching and prediction: reference pool stratification (step 1), MS matching (step 2), and neighbour agglomeration (step 3). Step 1 comprises an optional stratification by variables not representing the motor function pattern (NLI, AIS grade, sex, age +/- 10-year bracket) to account for biases, i.e. all retained reference patients matched the patient of interest in these variables. In step 2, the sensorimotor function of the patient of interest was matched using one of four similarity metrics summarised in Table 2: reference patients were retained if their LEMS matched (+/- 5 points, Type 1), if their mean MS below the NLI matched (+/- 0.5 points, Type 2), if the RMSE between their acute-phase MSs below the NLI and those of the patient of interest was below 0.5 (Type 3), or if they were among the k nearest neighbours (kNN, Type 4, with k in [1,3,10,20]) based on either all MSs or both MSs and SSs (LTS, PPS). In the final step, the recovery-phase outcome variables of all identified neighbours are agglomerated by either mean or median to arrive at the prediction.
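The three-step procedure can be sketched as follows. The dictionary layout of the reference pool, the field names, and the function names are hypothetical; thresholds follow the text (+/- 5 LEMS points for Type 1, +/- 0.5 mean-MS points for Type 2, RMSE < 0.5 for Type 3), and Type 4 is shown with Euclidean distance over MSs only (SSs could be appended to the feature vector). MS vectors are assumed to be flattened over both body sides, with a boolean mask marking the entries below the target’s NLI.

```python
import numpy as np


def stratify(ref, target, by=("AIS",), age_bracket=10):
    """Step 1: boolean mask of reference patients matching the target on all `by` variables."""
    mask = np.ones(len(ref["MS"]), dtype=bool)
    for var in by:
        if var == "age":
            mask &= np.abs(ref["age"] - target["age"]) <= age_bracket
        else:
            mask &= ref[var] == target[var]
    return mask


def match(ref, target, mask, metric="knn", k=20):
    """Step 2: indices of neighbours among the stratified reference patients."""
    cand = np.flatnonzero(mask)
    below = target["below_nli"]  # boolean mask of MS entries caudal to the target's NLI
    ms_ref = ref["MS"][cand]
    if metric == "lems":        # Type 1: LEMS within +/- 5 points
        keep = np.abs(ref["LEMS"][cand] - target["LEMS"]) <= 5
    elif metric == "mean_ms":   # Type 2: mean MS below the NLI within +/- 0.5 points
        keep = np.abs(ms_ref[:, below].mean(axis=1) - target["MS"][below].mean()) <= 0.5
    elif metric == "rmse":      # Type 3: RMSE of the MSs below the NLI < 0.5
        keep = np.sqrt(((ms_ref[:, below] - target["MS"][below]) ** 2).mean(axis=1)) < 0.5
    elif metric == "knn":       # Type 4: k nearest neighbours over all MSs
        d = np.linalg.norm(ms_ref - target["MS"], axis=1)
        return cand[np.argsort(d, kind="stable")[:k]]
    else:
        raise ValueError(f"unknown metric: {metric}")
    return cand[keep]


def agglomerate(ref, idx, how="mean"):
    """Step 3: combine the neighbours' recovery-phase MS sequences by mean or median."""
    out = ref["MS_6m"][idx]
    return out.mean(axis=0) if how == "mean" else np.median(out, axis=0)


# Toy reference pool (hypothetical values, 4 MS entries per patient).
ref = {
    "MS": np.array([[5, 5, 0, 0], [5, 5, 1, 1], [5, 5, 4, 4], [5, 5, 0, 1]], float),
    "MS_6m": np.array([[5, 5, 1, 1], [5, 5, 3, 3], [5, 5, 5, 5], [5, 5, 2, 2]], float),
    "LEMS": np.array([0, 2, 8, 1]),
    "AIS": np.array(["A", "B", "D", "A"]),
    "age": np.array([30, 40, 50, 35]),
}
target = {"MS": np.array([5, 5, 0, 0], float), "LEMS": 0, "AIS": "A", "age": 32,
          "below_nli": np.array([False, False, True, True])}
mask = stratify(ref, target, by=("AIS",))
prediction = agglomerate(ref, match(ref, target, mask, "knn", k=2))
```

If a threshold-based metric (Types 1-3) finds no neighbours, the mean of an empty selection is undefined, so a practical implementation would need a fallback for that case.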
2.3. Statistical analysis and visualisation
Motor score assessment uncertainty and bootstrapping
Despite best efforts (32,33), ISNCSCI scores are subject to uncertainty stemming from the examiner’s level of experience (systematic) and personal fluctuation (random), as well as from the patient’s compliance with the examination and natural variation in physical fitness (random). We based our estimation of MS uncertainty distributions on a study by Bye et al. (14) to calculate the probability density function for each MS level (Figure S2). For each target patient, we generated n = 100 bootstrap samples of the MS sequence and repeated the matching for each sample. In all visualisations, we show median values with 95% confidence intervals.
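The bootstrap can be sketched as repeatedly re-drawing each observed MS from a per-level probability distribution and re-running the prediction. The transition matrix below is a made-up placeholder, not the distribution derived from Bye et al. (14); `predict` stands for any matching-plus-agglomeration function, and the 2.5th/97.5th percentiles give the 95% interval.

```python
import numpy as np


def make_transition_matrix(p_same=0.8):
    """HYPOTHETICAL uncertainty model: row s gives P(re-drawn score | observed MS level s)."""
    P = np.zeros((6, 6))
    for s in range(6):
        nbrs = [t for t in (s - 1, s + 1) if 0 <= t <= 5]
        P[s, s] = p_same
        for t in nbrs:
            P[s, t] = (1 - p_same) / len(nbrs)  # remaining mass split over adjacent levels
    return P


def bootstrap_predictions(ms, predict, P, n_boot=100, seed=0):
    """Perturb an integer MS sequence n_boot times; return median and 95% interval."""
    rng = np.random.default_rng(seed)
    ms = np.asarray(ms, dtype=int)
    preds = []
    for _ in range(n_boot):
        # Re-draw each key muscle's score from the row of P for its observed level.
        noisy = np.array([rng.choice(6, p=P[s]) for s in ms])
        preds.append(predict(noisy))
    preds = np.asarray(preds, dtype=float)
    lo, med, hi = np.percentile(preds, [2.5, 50, 97.5], axis=0)
    return med, lo, hi
```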
Prediction performance reporting
We represent the segmental MS distribution as a graph whose y-axis shows the MS and whose x-axis shows the key muscle segments from C5 to T1 and from L2 to S1 in rostral-to-caudal order. For each patient, we quantify graph agreement by the root mean squared error below the NLI (RMSEblNLI). Models are ranked by the sum of the median and the 97.5th percentile of RMSEblNLI over all patients, normalised to the minimum over all models. The resulting rank is used to identify the best-performing model for neurological recovery prediction.
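One reading of the ranking rule, sketched below, divides the median and the 97.5th percentile of RMSEblNLI by their respective minima over all models before summing them; whether the normalisation is applied per statistic or to the sum is an assumption here. The function name is hypothetical, and the normalisation presumes strictly positive minima.

```python
import numpy as np


def rank_models(rmse_by_model):
    """Rank models by normalised median + 97.5th-percentile RMSEblNLI (lower is better).

    Assumption: each statistic is divided by its minimum over all models before summing.
    """
    names = list(rmse_by_model)
    med = np.array([np.median(rmse_by_model[m]) for m in names])
    p975 = np.array([np.percentile(rmse_by_model[m], 97.5) for m in names])
    score = med / med.min() + p975 / p975.min()
    order = np.argsort(score, kind="stable")
    return [names[i] for i in order], score[order]


ranking, scores = rank_models({"b": [2, 2, 2], "a": [1, 1, 1]})
```

Including the 97.5th percentile penalises models that do well on average but fail badly for a minority of patients.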
Functional score prediction performance is quantified as the area under the receiver operating characteristic curve (ROC-AUC), based on the probability of the functional score derived from all matched neighbours of a patient.
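A sketch of how neighbour outcomes could be turned into probabilities and an ROC-AUC, assuming the predicted probability is the fraction of matched neighbours who reach the endpoint (an assumption; the text does not give the exact formula). `roc_auc_score` is scikit-learn’s standard function; the data and the helper name `endpoint_probability` are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def endpoint_probability(neighbour_idx, outcome_6m):
    """Fraction of the matched neighbours who reached the binary endpoint at 6 months."""
    return float(np.mean(outcome_6m[np.asarray(neighbour_idx)]))


# Toy reference pool: 1 = independent walker at 6 months.
walker_6m = np.array([0, 0, 1, 1, 1, 0])
# Hypothetical neighbour indices for four target patients and their true labels.
neighbours = [[0, 1, 5], [0, 2, 5], [2, 3, 4], [1, 3, 4]]
y_true = [0, 1, 1, 0]
y_prob = [endpoint_probability(nn, walker_6m) for nn in neighbours]
auc = roc_auc_score(y_true, y_prob)
```

With these toy values, `y_prob` is [0, 1/3, 1, 2/3] and the AUC is 0.75: three of the four positive/negative pairs are ranked correctly.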
Subgroup assignment and dimensionality reduction
Patient subgroups were assigned using k-means clustering (sklearn.cluster.KMeans) on all acute-phase ISNCSCI scores scaled to [0,1]. We chose eight clusters as a compromise between the minimum number of patients per cluster (99) and the subgroup detail covered. We visualise the identified patient subgroups in a two-dimensional embedding of the high-dimensional feature space using Uniform Manifold Approximation and Projection (u-map) (34), as implemented in the umap package.
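A minimal sketch of the subgrouping step with scikit-learn, assuming feature-wise min-max scaling to [0, 1] (scaling each score by its known maximum would be an alternative) and a fixed random seed; the data are synthetic and `assign_subgroups` is a hypothetical helper. The 2D embedding for visualisation could then be computed with the umap-learn package, e.g. `umap.UMAP(n_components=2).fit_transform(X_scaled)`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler


def assign_subgroups(isncsci, n_clusters=8, seed=0):
    """Cluster patients on acute-phase ISNCSCI scores scaled feature-wise to [0, 1]."""
    X = MinMaxScaler().fit_transform(np.asarray(isncsci, dtype=float))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    return km.labels_, X


rng = np.random.default_rng(0)
# Synthetic stand-in: 300 patients x 20 motor scores (0-5).
demo = rng.integers(0, 6, size=(300, 20))
labels, X_scaled = assign_subgroups(demo)
sizes = np.bincount(labels, minlength=8)  # patients per cluster
```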
Alternative prediction models
We compare our prediction of walking ability to an unregularised logistic regression (sklearn.linear_model.LogisticRegression) based on five input variables: age dichotomised at 65 years, and the larger of the left/right motor and light touch scores of the gastrocsoleus (S1) and quadriceps femoris (L3) muscles. We performed leave-one-out cross-validation on the EMSCI data, whereas a model trained on all EMSCI data was used for predictions in the Sygen cohort. Similarly, we predict MSs at 6 months with a Ridge regression model (sklearn.linear_model.Ridge) using all acute-phase MSs and SSs as inputs. Here, hyperparameters were tuned via grid search on the validation subset of a 5-fold nested cross-validation, and results are reported for the pooled test sets over all folds.
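The two baselines can be sketched with scikit-learn on synthetic data. Two assumptions are worth flagging: the unregularised logistic regression is approximated here with a very large `C`, and the grid of Ridge `alpha` values is hypothetical. The nested cross-validation places `GridSearchCV` (inner loop) inside `cross_val_predict` (outer 5 folds), so the pooled predictions come from test folds never seen during tuning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n = 80
# Synthetic stand-ins for the five walking-model inputs (hypothetical data).
age = rng.integers(18, 90, n)
ms_l3, ms_s1 = rng.integers(0, 6, n), rng.integers(0, 6, n)  # max(left, right) MS
lt_l3, lt_s1 = rng.integers(0, 3, n), rng.integers(0, 3, n)  # max(left, right) LTS
X_walk = np.column_stack([age >= 65, ms_l3, ms_s1, lt_l3, lt_s1]).astype(float)
walker = (ms_l3 + ms_s1 + rng.normal(0, 1.5, n) > 5).astype(int)

# A very large C approximates an unregularised logistic regression.
lr = LogisticRegression(C=1e6, max_iter=5000)
p_walk = cross_val_predict(lr, X_walk, walker, cv=LeaveOneOut(),
                           method="predict_proba")[:, 1]

# Ridge regression with nested CV: the inner grid search tunes alpha,
# the outer 5 folds yield pooled out-of-fold predictions of 6-month MSs.
X_acute = rng.integers(0, 6, (n, 20)).astype(float)
Y_6m = np.clip(X_acute + rng.normal(0.5, 0.5, X_acute.shape), 0, 5)
inner = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0, 100.0]}, cv=KFold(5))
Y_hat = cross_val_predict(inner, X_acute, Y_6m,
                          cv=KFold(5, shuffle=True, random_state=0))
```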
3. Results
Figure S1 shows the CONSORT-style flowchart illustrating the selection of 1,267 EMSCI (primary cohort; reference pool) and 411 Sygen (validation cohort) patients. Table 1 shows relevant summary statistics.
3.1. Contribution of reference pool stratification and neighbour agglomeration
Figure 2A shows boxplots of the number of neighbours identified at a fixed distance for the similarity metrics of Types 1-3 (Type 4 matching always yields k neighbours) with different options of reference pool stratification. Multiple neighbours, rather than a single match, are frequently identified. Particularly for LEMS and MeanMS similarity (Types 1 and 2), a large number of neighbours is identified in the absence of stratification by AIS grade, NLI, and/or sex and age.
Figure 2B shows violin plots of the RMSEblNLI achieved at 6 months between the true segmental MS sequences and the Type 1 and 2 NN predictions (results for other matching approaches and for evaluation by ΔLEMS are given in Figure S3 and showed minimal variation). Controlling for selected biases prior to NN matching, e.g. by AIS grade and/or NLI, improved performance (i.e. lower RMSEblNLI), whereas the inclusion of age and sex information did not (i.e. violins of comparable shape to no stratification). Overall, mean agglomeration resulted in narrower distributions but higher median RMSEblNLI values than median agglomeration.
3.2. Comparison of similarity metrics
Population-level performance
Table 3 (top) summarises, for each MS similarity metric, the best-performing reference pool stratification. For the patient population as a whole, matching Type 4A (20-NN) performed best. This model also performed well for AIS grade conversion and functional score predictions (best ROC-AUC for walking ability prediction, 8th for AIS grade conversion, and best model without age/sex information for self-care ability prediction). Generally, AIS grade conversion was predicted less accurately, and self-care ability prediction benefitted from the inclusion of age and sex information (Table S1). Walking ability prediction within the EMSCI cohort (ROC-AUC = 0.92) was comparable to that achieved by a logistic regression model (13,35) (ROC-AUC = 0.94, Figure 2I). Ridge regression achieved an RMSEblNLI of 0.98 (0.22, 2.57) in the same cohort, ranking 17th among the 176 approaches assessed.
Performance within patient subgroups
For visualisation, all patient data are projected into a two-dimensional embedding of the full variable space using u-map (34) (Figure 2C). Colouring patients in this embedding by selected clinical scores (AIS grade, NLI, LEMS, DAP) reveals subgroups. Figure 2D shows the resulting recovery-phase RMSEblNLI between the true and predicted MSs, indicating performance variation as a function of AIS grade and NLI.
We quantify performance within patient subgroups in Table 3 (middle). Subgroups are visualised in Figure 2E and separate the cohort by NLI, AIS grade, and retained motor and sensory function, as indicated in the accompanying heatmaps. Whereas clusters 1 to 4 comprised more heterogeneous patients, clusters 5 to 8 included similar AIS A patients, as indicated by the degree of noise in the relevant heatmaps and the AIS grade composition in Table 3. The heatmaps also indicate the approximate NLI within each patient subgroup. Within each cluster, some patients did not achieve an acceptable prediction (i.e. RMSEblNLI < 1.0), as indicated by wide 95% confidence intervals of the reported metrics. Clusters comprising a larger fraction of patients with complete (AIS A) paraplegia (clusters 7, 8) or a large proportion of AIS D patients (cluster 3) performed best. Predictions were less accurate for patients with a lumbar injury (cluster 2) or tetraplegic AIS B and C patients (cluster 4). Representative examples of patients and 20-NN predictions achieving the subgroup median performance in each cluster are shown in Figure 2F. All predictions are provided with 95% confidence bands for MSs, and with a probability in addition to a binary label for functional endpoints. This illustration gives a notion of the clinical relevance of the achieved median RMSEblNLI given the uncertainty of ISNCSCI score assessment. Independent of the chosen similarity metric and patient subpopulation, we observe substantial variation in prediction performance across patients, reflected by large ranges of the evaluation metrics.
3.3. Correlation analysis between acute and recovery phase neighbour agreement
One possible explanation for poor predictions is suboptimal NN matching in the acute phase, which motivated a correlation analysis of the acute- and recovery-phase RMSEblNLI between the agglomerated neighbour cohorts and the patient of interest. Figure 2G shows two representative examples of this analysis: the matching that performed best for the EMSCI cohort (Figure 2G, left) and the model achieving the highest correlation coefficient (no added stratification; Type 1; median; Figure 2G, right). Type 1 matching did not always achieve good agreement between the neighbours’ and the patient’s acute-phase segmental MSs (RMSEblNLI > 2), particularly for patients with an acute-phase LEMS of 0, likely because impairment of the upper extremities is not accounted for. These patients could also not be predicted well, even if matched perfectly in the acute phase (see Figure 2G, right). In contrast, Type 4A kNN identified good acute-phase matches for most patients, but this did not always translate into an accurate prediction, i.e. a low RMSEblNLI at 6 months (Figure 2G, left). Independent of the similarity and evaluation metric, all correlations were highly significant (all p-values < 0.0001), while correlation coefficients varied markedly, as summarised in Figure 2H. The highest correlation coefficients were observed for similarity assessed by LEMS and MeanMS (Types 1 and 2). A stronger correlation implies that if a good (poor) match is identified, a good (poor) recovery prediction is more likely. We observe that stronger correlation coefficients are driven by poorer matching in the acute phase, which also leads to poorer predictions. Only 62% of patients displayed a very good match (RMSEblNLI < 0.3) to their identified nearest neighbour cohort in the acute phase for any tested model. More severely injured patients were generally matched better: AIS A: 490/546 (90%), B: 154/178 (87%), C: 80/239 (33%), D: 61/304 (20%).
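The correlation between acute-phase matching quality and 6-month prediction error could be computed as below. The text does not state which correlation coefficient was used; Spearman’s rank correlation (`scipy.stats.spearmanr`) is an assumption here, and the per-patient RMSE values are simulated.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
# Simulated per-patient RMSEblNLI between neighbour cohort and patient of interest.
rmse_acute = rng.gamma(2.0, 0.3, 500)                  # matching quality at ~2 weeks
rmse_6m = 0.8 * rmse_acute + rng.gamma(2.0, 0.2, 500)  # prediction error at 6 months
rho, p_value = spearmanr(rmse_acute, rmse_6m)
very_good_match = np.mean(rmse_acute < 0.3)  # fraction with acute RMSEblNLI < 0.3
```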
The number of patients achieving an acceptable prediction (RMSEblNLI ≤ 0.5) also varied greatly depending on the matching method, ranging from 99 (stratification by age, sex, and NLI with 20-NN(MS, SS), mean agglomeration) to 512 (no stratification, 20-NN(MS), median agglomeration) of the 1,267 EMSCI patients. Specifically, matching approaches relying on stratification by age and sex resulted in fewer acceptable predictions. For models based on MSs alone (i.e. not accounting for sensory scores), neighbour agglomeration by median led to stronger correlations than agglomeration by mean. Here, we also observed that correlation coefficients increased (and the associated p-values decreased) with an increasing number of neighbours for Type 4 (kNN) matching.
3.4. Validation cohort
We tested the proposed pipeline on an independent cohort derived from the Sygen trial (Table 3, bottom). Here, the model identified as best for the EMSCI cohort (AIS; Type 4 (k=20); mean) also performed best in terms of neurological recovery and achieved a ROC-AUC for walking ability prediction comparable to that in the primary cohort and to that of a logistic regression model (Figure 2I; ROC-AUC = 0.92 for both kNN and logistic regression in the Sygen cohort). Results in terms of RMSEblNLI and ΔLEMS were also comparable for patient subgroups and for the population as a whole (Table 3, middle and bottom), with the worst performance observed in clusters 2 and 4 and the best prediction performance in clusters 7 and 8.
4. Discussion
kNN has previously been applied in the context of SCI prediction, but not for recovery prediction from ISNCSCI scores (36–38). Given its ease of calculation and direct interpretability, kNN is particularly promising in the medical sector for highly heterogeneous, yet small patient cohorts characterised by few clinical variables, such as SCI. The key findings of this study include i) the identification of an optimal historic cohort matching strategy, ii) the observation that multiple reference patients rather than a single one provide superior predictions, and iii) competitive prediction performance for neurological and functional endpoints, including prediction confidence, in two large cohorts.
We investigated several similarity metrics, aiming for robust (independently validated), highly detailed (full motor function pattern), and diverse (functional and neurological) recovery predictions. This goes beyond previous work (39,40) in all of these aspects. Historic cohorts identified by kNN are versatile and address several aspects of the recovery process. Based on ISNCSCI scores, kNN outperformed regularised linear (Ridge) regression for the prediction of motor function patterns and achieved performance comparable to supervised logistic regression for functional endpoints.
As expected, subgroup analysis showed that patients presenting with either severe (AIS A) or mild injuries (AIS D) achieved better predictive performance. For AIS A patients, this may be due to a larger number of patients (EMSCI: 43%, Sygen: 65%), better matching performance in the acute injury phase (i.e. significant correlation of early and recovery-phase RMSEblNLI), and an overall smaller recovery (floor effect). In contrast, ceiling effects may explain the better performance for AIS D patients. Similar to previous studies, we recorded the strongest variation for AIS B and C patients, who represent a small fraction of the included patients (∼30%) but display large variations in motor and sensory function.
By assigning a historic cohort rather than a single closest neighbour, 20-NN yields more generalizable predictions than 1-NN. This aligns with the notion that no two patients recover in the same way, but that similar patterns of recovery exist. For neurological recovery prediction, stratifying the reference pool by AIS grade helped, whereas no other demographic or injury characteristic added value beyond the MS-based matching. In agreement with previous work (28), age or sex information did not improve neurological outcome predictions. MSs are intrinsically graded relative to the expected maximum possible strength, implying a potential assessment bias with respect to age and sex. However, we observed that age and sex information benefitted the prediction of self-care ability.
Several studies have previously addressed SCI recovery prediction for various endpoints such as walking scores, AIS grade conversion, or total motor score recovery (17,35,41,42). Despite promising results within small cohorts using detailed additional data such as cerebrospinal fluid biomarkers or imaging data (11,43), a more generalizable approach based only on clinically well-established scores and allowing for large cohort evaluations was missing. Our analysis fills this gap with a data-driven, personalised, and multi-faceted recovery prediction for the individual. For walking ability, we achieve performance comparable to that previously reported for a logistic regression model (13,35); however, our approach provides a variety of additional endpoint predictions, each with confidence estimates. As such, we improve beyond the state of the art in terms of prediction detail and interpretability of the underlying model. The evaluation of functional endpoints in addition to neurological recovery was key here, because ΔLEMS or RMSEblNLI are not directly related to clinical outcomes, which makes it difficult to interpret the clinical relevance of the prediction performance in terms of these scores. Still, our study presents quantitative results for these metrics that serve as a baseline for future prediction models. The clinical impact of this study is twofold. Firstly, the presented concept provides patients and physicians with a personalised perspective on neurological and functional recovery, including a quantification of prediction confidence stemming from the distribution in the reference cohort. The prediction process is inherently interpretable by reviewing the selected neighbours. Moreover, we analysed patient subgrouping through clustering beyond the current gold standard of AIS grades alone, which allows for more fine-grained and noise-robust group assignments, also for other clinical evaluations.
Secondly, we provide a means to identify historic controls for individual trial patients to enlarge and improve the pool of controls in clinical trials. This is of particular importance given the large heterogeneity and low incidence of SCI. Our publicly available code enables direct application of the concept. An in silico evaluation of the statistical power of this approach to identifying historic control cohorts will be addressed in our follow-up study.
Some limitations remain. kNN performance varied among patient subgroups, particularly those reflecting the highest degree of heterogeneity in terms of injury and recovery potential. Also, despite very good performance for walking and self-care ability, AIS grade conversion was poorly predicted for AIS A patients. This is likely due to the low proportion of AIS A patients who improve, and suggests that the underlying mechanisms are not reflected in the ISNCSCI exam. Instead, factors not included here, e.g. CSF, imaging, or other modifying events, may drive this outcome. The included data lacked potentially important clinical information, e.g. the timing and success of decompression surgery, patient comorbidities, or other treatment and response effects (e.g. blood markers, medication). Such information was not available for a sufficiently large patient population in the available databases. Our approach, based on the ISNCSCI exam, was a compromise between the number of patients and the assessment detail covered. We suggest repeating the presented evaluation once more detailed data become available.
We further did not account for longitudinal variation during matching. Adverse, disease-modifying events, e.g. severe pneumonia, may greatly influence recovery (44). Since we restricted this analysis to a single time point for neighbour matching, we ignored longitudinal changes in recovery trajectory due to modifying events and excluded patients who deteriorated greatly. However, much of the observed variation in the AIS B and C categories may be driven by modifying events. The proposed analysis hence represents predictions based on an expected normal/optimal recovery trajectory. Given a more detailed data pool, it would be possible to consider matching at multiple longitudinal time points. We purposefully did not include longitudinal information from later time points, as the aim of a historic cohort in our interpretation should be i) a perspective on recovery in the very acute injury phase, and ii) historic control cohort building for trials in which patients are randomised in the acute injury phase.
Patient numbers and the comparably early time of evaluation (6 months) may be a further limitation. The EMSCI database is one of the largest longitudinal SCI databases to date, with the latest assessments performed at 26 and 52 weeks after injury. Owing to large proportions of missing data, it was not possible to include all enrolled patients, which motivated the use of the 6-month time point given its larger data availability. Our kNN relies strongly on motor and sensory scores, which are difficult to impute, leading to a complete-case analysis, i.e. discarding any patients with missing data (EMSCI 75%, Sygen 46%, see Figure S1). We stress that SCI recovery at 6 months is likely not final and further improvements are possible. However, we present a proof-of-principle evaluation for a single time point (six months) that is translatable to different evaluation endpoints.
Another limitation inherent to the concept is the lack of any underlying mechanistic modelling or description of potential relations between variables and predictions, as provided by other machine learning and mathematical models. Thus, this approach cannot infer beyond the provided pool of reference patients, whereas other algorithms may increase inference strength given a sufficiently large number of training examples. This especially limits performance for patient subgroups where variation in motor and sensory scores is large but sample numbers are limited (here AIS B and C). It would be essential to increase the pool of historic reference patients, both to boost the performance of kNN for clinical application and to allow for data-driven extrapolation and pattern mining to improve predictions. Despite investigating several similarity metrics, this is clearly not an exhaustive evaluation, and further optimisation of the matching criteria, e.g. of the LEMS or meanMS matching thresholds, may lead to different results for these matching types.
5. Conclusions
We presented a systematic analysis of the concept of nearest neighbour matching for recovery prediction of traumatic SCI based on acute injury phase ISNCSCI assessment scores. We further propose quantitative metrics to compare prediction performance and apply these to identify the optimal matching criteria for specific patient subgroups. By making our software pipeline publicly available and validating our findings in an independent test cohort, we provide a path to clinical translation of the proposed concept in terms of cohort design and evaluation of clinical trials in SCI research, in addition to providing personalised recovery predictions.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Declarations
Ethics approval and consent to participate
The EMSCI study is performed in accordance with the Declaration of Helsinki and approved by all contributing institutional review boards. EMSCI follows the ethical guidelines of the participating countries, and patients gave their written informed consent before being included in the database. The Sygen clinical trial also received ethical approval but was conducted before clinical trials were required to be registered (i.e. no clinicaltrials.gov identifier is available).
Consent for publication
Not applicable.
Availability of data and materials
Anonymized data used in this study will be made available upon request to the corresponding author and in compliance with the General Data Protection Regulation (EU GDPR). We publish all code required to reproduce the presented results in our GitHub repository: https://github.com/BorgwardtLab/SCIRecoveryPredictionPublic.git
Competing interest
The authors declare no competing interests.
Funding
This study was supported by the Swiss National Science Foundation (Ambizione Grant, #PZ00P3_186101), Wings for Life Research Foundation (#2017_044, #2020_118), and the International Foundation for Research in Paraplegia (IRP, Curt). SB was supported by the Botnar Research Centre for Child Health Postdoctoral Excellence Programme (#PEP-2021-1008). The funders did not specify the study design, data collection, analysis, or the decision to publish and preparation of the manuscript.
Authors’ contribution
SB implemented the presented models and performed the evaluation, visualisation, and interpretation of the results. LB and LL supported the design of the matching algorithms and patient cohorts. The EMSCI Study group collected data used in this analysis. AC led data collection and provided clinical insight to the prediction task. DM, RA, NW, RR, and JG gave critical feedback to improve the conceptual design and supported the clinical interpretation of the proposed findings. CJ designed the study and provided data access and continuous feedback. All authors contributed to the writing of the manuscript and reviewed the written document.
Footnotes
# Shared senior authorship
List of abbreviations
- AIS: American Spinal Injury Association impairment scale
- ASIA: The American Spinal Injury Association
- DAP: Deep anal pressure
- EMSCI: European Multicenter Study about Spinal Cord Injury
- ISNCSCI: International Standards for Neurological Classification of SCI
- kNN: k-nearest neighbour
- LEMS: Lower extremity motor score
- LR: Logistic regression
- LTS: Light touch score
- meanMS: Mean motor score
- MS: Motor score
- nan: Not a number (missing data for DAP)
- NLI: Neurological level of injury
- PPS: Pinprick score
- RMSEblNLI: Root-mean-squared error below the NLI
- ROC: Receiver operating characteristic
- SCI: Spinal cord injury
- SCIM: Spinal cord independence measure