Abstract
Changes to spatiotemporal gait metrics in gait-altering conditions are characteristic of the underlying pathology. These data can be interpreted by machine learning (ML) models, which have recently emerged as an adjunct to clinical medicine. However, the literature is undecided regarding their utility in diagnosing pathological gait and is heterogeneous in its approach to applying ML techniques. This study aims to address these gaps in knowledge. This was a prospective observational study involving 32 patients with Parkinson's disease and 88 'normative' subjects. Spatiotemporal gait metrics were gathered from all subjects using the MetaMotionC inertial measurement unit, and the data obtained were used to train and evaluate the performance of 10 machine learning models. Principal Component Analysis and a Genetic Algorithm were the two feature selection techniques used. Classification models included Logistic Regression, Support Vector Machine, Naïve Bayes, Random Forest, and Artificial Neural Networks. ML algorithms accurately distinguished pathological gait in Parkinson's disease from that of normative controls. Two models that used the Random Forest classifier, one with Principal Component Analysis and one with Genetic Algorithm feature selection, were 100% accurate in their predictions and had an F1 score of 1. A third model using Principal Component Analysis and Artificial Neural Networks was equally successful (100% accuracy, F1 = 1). We conclude that ML algorithms can accurately distinguish pathological gait in Parkinson's disease from that of normative controls. Random Forest classifiers with Genetic Algorithm feature selection are the preferred ML techniques for this purpose, as they produced the highest-performing model.
Author summary
The way humans walk is emblematic of their overall health status. These walking patterns can be captured as gait metrics by small, portable wearable sensors. Data gathered from these sensors can be interpreted by machine learning algorithms, which can then be used to accurately distinguish healthy from unhealthy patients based on their gait, or walking pattern. The applications of this technology are many and varied. Firstly, it can simply aid diagnosis, as explored in this paper. In future, researchers may use their understanding of normal and pathological gait, and of the differences between them, to quantify how severely a person's gait is affected by disease. These data could be used to track and quantify improvements or further deterioration after treatment, whether medication-based or an intervention such as surgery. Retrospective analyses of such data could be used to judge the value of an intervention in reducing a patient's disability and to inform health-related expenditure.
1. Background
1.1. Introduction to Gait analysis
Gait refers to the way a person or animal walks or runs and is a simple yet informative measure of overall health. A meta-analysis by Studenski et al. showed that in older adults, each increment of 0.1 m/s in walking speed was associated with a 12% lower risk of death (HR 0.88, 95% CI, 0.87-0.90; P<0.001)(1). Walking speed as a health metric is not restricted to the context of ageing but can also be predictive of neurological, cardiovascular, orthopaedic, and psychiatric conditions(2–6).
Gait, however, is remarkably complex and cannot be captured by walking speed alone. Gait analysis can be subdivided into qualitative and quantitative methods. Qualitative observational methods used by clinicians day-to-day are convenient, yet highly subjective, and correlate poorly with validated computerised sensors (mean r = 0.55)(7). Quantitative methods include kinetic and kinematic analyses. Kinetic analyses investigate the forces involved in locomotion, such as ground reaction force. These measures have limited clinical utility(8) and are better suited to high-performance sport, where the focus of gait analysis is not to identify disease states but to maximise the efficiency of locomotion(9).
In contrast, kinematic analyses have shown clinically significant differences between pathological and healthy gait patterns. Table 1 summarises findings from several studies in which spatiotemporal parameters in a range of conditions are compared with those of healthy age-matched controls. Table 1 is only a snapshot of the unique gait 'signatures' of various pathologies, but it illustrates the diagnostic potential of spatiotemporal gait metrics. For example, appreciable differences can be noted between Parkinson's disease(10–17) and lumbar disc herniation(18) in cadence (-6% vs -66%) and double support time (+24% vs +53%), whilst those with lumbar spinal stenosis(19–23) present with a more modest decrease in cadence (10-14%). Furthermore, statistical models created by Verghese et al. and Lord et al. using spatiotemporal data alone were able to explain up to 90% of gait variance between healthy and pathological gait using only five factors: pace, rhythm, variability, asymmetry, and postural control(24, 25).
A normal gait cycle for each leg involves a stance and a swing phase. The stance (also known as support) phase describes the entire period during which a foot is on the ground, and swing describes the time this same foot is in the air as the limb advances. When one limb is in stance, the contralateral limb is in swing, except during the overlapping periods, which occur twice per gait cycle, when both feet are on the ground; their combined duration is known as the double support time, as seen in Figure 1.
The single support time is the period during which only one limb is on the ground. Several other spatiotemporal gait metrics exist and are described in Table 2.
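The phase definitions above reduce to simple differences between gait-event times. The sketch below is illustrative only: it assumes a single right-leg stride with hypothetical heel-strike and toe-off times (in seconds), which in practice would come from an event-detection step applied to the IMU signal, and the names `StrideEvents` and `gait_phases` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class StrideEvents:
    """Gait event times (s) for one right-leg stride (hypothetical example data)."""
    right_heel_strike: float       # start of the stride (right initial contact)
    left_toe_off: float            # end of the first double-support period
    left_heel_strike: float        # start of the second double-support period
    right_toe_off: float           # end of right stance / start of right swing
    next_right_heel_strike: float  # end of the stride


def gait_phases(e: StrideEvents) -> dict:
    """Derive phase durations (s) and cadence (steps/min) for one stride."""
    stride_time = e.next_right_heel_strike - e.right_heel_strike
    stance = e.right_toe_off - e.right_heel_strike
    swing = e.next_right_heel_strike - e.right_toe_off
    # Two double-support periods per stride: right loading and right push-off.
    double_support = (e.left_toe_off - e.right_heel_strike) + (
        e.right_toe_off - e.left_heel_strike
    )
    # Only the right foot is on the ground while the left foot swings.
    single_support = e.left_heel_strike - e.left_toe_off
    return {
        "stride_time": stride_time,
        "stance": stance,
        "swing": swing,
        "double_support": double_support,
        "single_support": single_support,
        "stance_pct": 100 * stance / stride_time,
        # One stride contains two steps, so cadence = 2 steps per stride time.
        "cadence": 120.0 / stride_time,
    }


phases = gait_phases(StrideEvents(0.00, 0.12, 0.55, 0.67, 1.10))
```

Stance always equals the two double-support periods plus single support; averaging such per-stride values over a walk yields per-subject metrics.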
1.2. Measuring Gait
1.2.1. Laboratory Techniques
When it comes to gait assessment, optoelectronic stereophotogrammetry is a highly precise laboratory technique and the gold standard for clinical spatiotemporal gait analysis(27). Infrared cameras capture the three-dimensional trajectories of reflective markers placed on points of interest on the subject's body. However, the technique requires expensive equipment and skilled technicians, and is ultimately not feasible in the fast-paced everyday clinical environment(19). Furthermore, laboratory methods are susceptible to the psychological Hawthorne and "white-coat" effects, as individuals are more likely to be conscious of their gait when closely observed by a clinician. Hence laboratory techniques fail to capture 'free-living gait', which refers to the way people walk in everyday life(19). A study by Brodie et al. illustrates this well, finding that lab-based technologies tend to overestimate parameters such as cadence (8.91%, p < 0.001) whilst underestimating gait variability (81.55%, p < 0.001)(28). These drawbacks may limit the validity of laboratory-based studies and decrease the generalisability of their findings.
1.2.2. Inertial Measurement Units
In contrast, inertial measurement units (IMUs) are wearable single-point devices that contain an accelerometer, a gyroscope, and a magnetometer. Measurements made with IMUs have been shown to be largely consistent with those of laboratory analysis techniques (r > 0.83). IMUs are very promising because they are small, inexpensive, and unobtrusive to the activities of daily living, and can therefore capture free-living gait in community and home environments(29–31).
After measuring gait, researchers must distinguish healthy from pathological gait patterns. This has proven challenging, and the literature shows that mathematical(32, 33) and statistical techniques(34, 35) are popular due to their simplicity. However, purely mathematical transforms provide limited insight as they rely solely on univariate signals and data processed from wavelets, whilst statistical techniques assume normal distributions, which tend to oversimplify the complex non-linear relationships in gait data(36, 37). In contrast, recent applications of machine learning (ML), a subset of artificial intelligence (AI), have demonstrated its ability to model non-linear multidimensional data whilst being versatile in incorporating new data to improve the accuracy of predictions(38, 39).
1.3. Machine Learning in Gait Analysis
The workflow for classifying healthy and pathological gait has four key stages: feature selection, classification, cross-validation, and evaluation of model performance.
1.3.1. Feature Selection
Feature selection techniques aim to optimise the model’s performance by selecting only the features with maximal separation between classes to ensure the model is both time and cost-efficient(40, 41). Methodologies fall under three categories: filter, wrapper, and embedded methods.
Filter methods are the least computationally intensive as they evaluate the dataset without evaluating the performance of the model(40). Wrapper methods are the most computationally intensive as they select features tailored to the performance of the ML model(40). Embedded methods consider both the dataset and the performance of the model with the advantage of being much less computationally intensive than wrapper methods(40).
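To make the filter/wrapper distinction concrete, the sketch below is a minimal filter method: it scores each feature by a Fisher-style ratio of between-class separation to within-class spread, without training any classifier. The scoring rule, the function names and the toy data (columns: cadence, gait speed, and an uninformative feature) are assumptions for illustration. A wrapper method would instead retrain and evaluate the classifier for every candidate feature subset, which is why it is far more computationally intensive.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Filter-style score per feature column: squared difference of class means
    divided by the sum of class variances. No classifier is trained."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pvariance(a) + pvariance(b)
        scores.append((mean(a) - mean(b)) ** 2 / spread if spread > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical data; label 1 = Parkinson's disease, 0 = normative.
X = [[100, 1.0, 5.0], [102, 1.1, 3.0], [98, 0.9, 4.0],
     [110, 1.3, 4.5], [112, 1.35, 3.5], [108, 1.25, 4.0]]
y = [1, 1, 1, 0, 0, 0]
selected = top_k_features(X, y, k=2)
```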
The most common feature selection methods used in gait analysis are Principal Component Analysis (PCA), a filter method; the Genetic Algorithm (GA), a wrapper method; and Hill-climbing (HC), an embedded method(42–44). PCA aims to find the minimum number of features or variables required to explain the majority of variance in the data(45). The GA takes a different approach, using the Darwinian theory of natural selection to determine the 'fittest' features, i.e., those that are most discriminative and contribute meaningfully to the performance of the model. Successive iterations of the GA are termed 'generations'; each involves the 'natural selection' of fitter features and allows fit features to 'breed' and form newer, fitter composite features(46) that enhance performance. In contrast, HC is a heuristic search for a solution that maximises the separation between classes. Because it only moves to neighbouring solutions that improve on the current one, HC may miss the global maximum and settle on a local maximum instead. Hence, its heuristic nature may provide a sufficient solution in a reasonable amount of time, but this may not be the optimal solution to the classification problem(47).
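The GA loop described above (selection, 'breeding' via crossover, mutation, repeated over generations) can be sketched in plain Python. This is a hedged illustration rather than the study's implementation: it evolves binary masks that keep or drop existing features instead of the composite features the text mentions, and the toy `fitness` function (hypothetical per-feature relevance minus a size penalty) stands in for what, in a wrapper setting, would be the cross-validated performance of the classifier.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=30,
                              generations=50, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks (True = keep feature) that maximise `fitness`."""
    rng = random.Random(seed)
    # Initial population: random subsets of the available features.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        best = max(scored, key=lambda s: s[0])[1]

        def select():
            # Tournament selection: the fitter of two random individuals wins.
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa >= fb else b

        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation keeps exploring new feature combinations.
            child = [not g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children

    return max(population, key=fitness)


# Hypothetical relevance of 8 candidate gait features (illustration only).
relevance = [0.9, 0.8, 0.1, 0.05, 0.7, 0.2, 0.0, 0.6]


def fitness(mask):
    # Reward relevant features; penalise subset size to favour compact models.
    return sum(r for r, keep in zip(relevance, mask) if keep) - 0.3 * sum(mask)


best_mask = genetic_feature_selection(len(relevance), fitness)
```

Because of elitism, the best fitness never decreases between generations. The 50 generations mirror the study's setting; the population size of 30 and the mutation rate are arbitrary choices.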
PCA is the simplest technique computationally and produces the most reliable results(48) (model accuracy >95%) (Table 3). Theoretically, HC is expected to be quite promising as an embedded method and has been highly successful (>96% accuracy) in heart monitors(49). However, it still provides relatively low classification accuracy (75.5-83.3%)(50) with spatiotemporal gait data, suggesting that its use has not yet been optimised for gait analysis. Further research is recommended to realise its potential in this field.
1.3.2. Classification
Support vector machine (SVM), Naïve Bayes (NB), and artificial neural networks (ANN) were the most common ML models used for classification in the literature.
SVM uses supervised learning to compute the hyperplane with the greatest separation between the analysed classes(50), whilst NB applies Bayes' theorem, assuming that all features are independent, to create a probabilistic model(51). Finally, feed-forward ANNs consist of multiple nodes that 'synapse' upon each other in a layered system and rely on a 'transfer function' for forward propagation and classification(52).
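Of the three classifiers, NB is compact enough to write out in full. The sketch below is a from-scratch Gaussian NB, assuming continuous features that are normally distributed within each class (which NB variant was used is not stated); the function names and feature values (gait speed in m/s, cadence in steps/min) are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate a class prior plus per-feature mean and variance for each class."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(X),
            "mean": [mean(col) for col in cols],
            "var": [pvariance(col) + var_smoothing for col in cols],
        }
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        # Naive independence assumption: the joint likelihood is the product of
        # per-feature Gaussian likelihoods, i.e. a sum in log space.
        total = math.log(params["prior"])
        for xi, mu, var in zip(x, params["mean"], params["var"]):
            total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda c: log_posterior(model[c]))


# Hypothetical training data; label 1 = Parkinson's disease, 0 = normative.
X = [[0.90, 100], [0.95, 98], [0.85, 102], [1.30, 112], [1.25, 110], [1.35, 114]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
```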
SVM has shown the greatest success, with model accuracies as high as 100%(48) (Table 3). It is also the most widely used ML model(48, 53–55). NB has featured only sparingly in the literature, and more papers featuring this model are required before its utility can be determined.
1.3.3. Cross Validation
Cross-validation (CV) is used to estimate the generalisability of a model by training the algorithm on a training set and evaluating its performance on a separate validation set(44, 50, 56). The most common CV techniques are the k-fold and leave-one-out (LOO) methods, as seen in Table 3. K-fold techniques randomly partition the data into k subsets, where k is an integer; k-1 subsets are used for training, whilst the remaining subset is used to validate the model(50). This is repeated k times, with a different subset chosen as the validation set on each iteration. LOO uses the same concept, except that the partition is not random, as each subset belongs to an individual subject (i.e., k equals the number of subjects). Consequently, LOO trains the model more rigorously than k-fold and introduces levels of complexity which may overfit the model and reduce its external validity. Hence, LOO should be reserved for smaller datasets(48, 53). However, the literature does not indicate an appropriate dataset size for LOO, likely because the 'size' of a dataset is determined not only by the number of subjects but also by the amount of information associated with each subject. Hence, both CV techniques must be evaluated on spatiotemporal gait datasets of varying size before a recommendation can be made.
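The partitioning logic behind k-fold and LOO can be sketched in a few lines. This assumes one record per subject and uses only the standard library; the function names are illustrative, and libraries such as scikit-learn provide equivalent, more complete splitters (`KFold`, `LeaveOneOut`).

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs: a random partition into k folds, with
    each fold used once for validation and the other k - 1 for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def leave_one_out_splits(n_samples):
    """LOO is k-fold with k = n_samples: every subject is held out exactly once,
    so there is no randomness in the partition."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


# 120 subjects, matching the size of the final study population.
kfold = list(k_fold_splits(120, k=5))
loo = list(leave_one_out_splits(120))
```

With 32 Parkinson's and 88 normative subjects, a stratified variant that preserves the class ratio within each fold is a common refinement.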
1.3.4. Evaluation of model performance
A confusion matrix (Figure 2a) is used to represent the results of the classification model. Metrics such as accuracy, recall, precision, specificity and F1 score can be calculated from the matrix(57).
Furthermore, the generalisability of the model can be evaluated using the Mean Squared Error (MSE), which reflects the degree of underfitting or overfitting(58). All metrics are summarised in Figure 2b. Whilst accuracy is used as a metric in almost all papers (see Table 3), the literature is quite heterogeneous in its use of other metrics. Further research is required in order to recommend a more consistent and holistic approach to evaluating model performance in clinical use cases.
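The classification metrics listed above follow directly from the four cells of the confusion matrix. The sketch below uses hypothetical counts for a 32-versus-88 cohort; it does not reproduce any model from this study, and `classification_metrics` is an illustrative name.

```python
def _ratio(num, den):
    # Return 0.0 rather than raising when a metric is undefined (den == 0).
    return num / den if den else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from binary confusion-matrix counts
    (positive class = Parkinson's disease)."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "recall": _ratio(tp, tp + fn),        # sensitivity, true-positive rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "specificity": _ratio(tn, tn + fp),   # true-negative rate
        # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts: 30 of 32 patients and 85 of 88 controls classified correctly.
metrics = classification_metrics(tp=30, fp=3, fn=2, tn=85)
```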
1.4. Research Questions
This study will use multiple ML models to distinguish normative subjects from those with Parkinson's disease using spatiotemporal gait data gathered by wearable IMUs.
1.4.1. Primary Research Question
Can ML algorithms accurately distinguish patients with Parkinson's disease from normative controls?
1.4.2. Secondary Research Question
Which combination of feature selection and classification techniques is most suited to an AI model tasked with gait analysis?
1.4.3. Study Rationale
Spatiotemporal gait data are discriminative of pathologies, and IMUs are a valid and convenient means of gathering spatiotemporal data. ML has emerged as a promising adjunct to clinical medicine but has not been optimised for clinical gait analysis. The study aims to determine whether an ML model can accurately distinguish patients with Parkinson's disease from normative controls, and which combination of feature selection and classification techniques is best suited to this purpose.
1.4.4. Study Significance
Such a model could allow significantly earlier diagnosis of gait-altering pathologies such as Parkinson's disease than current means, which depend on clinicians' observational analysis. This could facilitate early intervention and improve long-term outcomes and patient quality of life.
2. Results
2.1 Study Population
After cleaning the data and before applying ML techniques, we excluded data pertaining to 68 normative subjects due to missing demographic values, 4 normative subjects due to an IMUGaitPY bug, and 8 normative subjects with excessive noise, evidenced by clearly implausible spatiotemporal parameters. After exclusion of these records, the study population consisted of 32 subjects with Parkinson's disease and 88 normative subjects.
2.2 Demographic characteristics
A summary of the demographic characteristics can be found in Table 4b. There were no statistically significant differences in height, weight, BMI, or sex. However, significant differences were noted in age, daily step count, smoking, diabetes, cholesterol, 12-month falls status, and problems with balance.
2.3 Model Performance
Confusion matrices for each of the models outlined in Figure 4 are available in Appendix 4. The performance of each model according to the metrics outlined in Figure 2b is reported in Table 5a.
Models 4, 5 and 9 had the highest accuracy (100%), sensitivity (100%), and F1 score (1.000).
See Table 5b for rankings of models according to the aforementioned metrics.
3. Discussion
Spatiotemporal gait patterns detailed in Table 2 have proven sufficiently discriminative of gait-altering pathologies such as lumbar spinal stenosis, multiple sclerosis, and Parkinson's disease. Mathematical and statistical techniques have shown the ability to distinguish between healthy and pathological gait(32–35) but are limited by their inability to model the complex non-linear relationships inherent to human gait metrics(38, 39).
Recently, ML has emerged as a promising new technique that can model both linear and non-linear relationships and is versatile in its ability to incorporate new information to improve the performance of the model. However, this field is still largely in its infancy, especially in medicine. The current literature is largely heterogeneous and undecided on the best approach to applying ML techniques to spatiotemporal gait features.
The present study applies a wide range of ML techniques to spatiotemporal gait metrics gathered (using the MetaMotionC) from participants with Parkinson's disease and normative subjects. Each feature selection technique was applied separately with each classification technique, as illustrated in Figure 4, to determine the combination that produces the highest-performing model. The aim of the study is to determine the utility of ML in diagnosing pathological gait and to find the combination of ML techniques that produces the highest-performing model.
3.1 Justification of study design
3.1.1 Data collection protocols
The present study was inspired by research by Fonseka et al(59) and Natarajan et al(60), who profiled a variety of pathological gait signatures in lumbar spinal stenosis, chronic mechanical lower back pain, and rheumatological hip and knee conditions. These authors found >92% agreement between measurements taken with the MetaMotionC and a reference standard (single-camera videography), with an intraclass correlation coefficient >0.86 (p < 0.001), and the device was hence deemed valid. Although other gait analysis studies have placed IMUs at the lower back(22, 61–64), wrist(65), ankle(63, 66) or thigh(64), the sternal angle was chosen because the flat surface of the sternum provides a simple and highly repeatable sensor attachment site, even for unskilled users(59). Furthermore, several studies(29–31) support chest-based sensor placement for spatiotemporal metrics by demonstrating high correlation (r > 0.83) with optoelectronic stereophotogrammetry, the current gold standard in gait analysis(27).
3.1.2 Machine learning techniques
The present study uses the PCA and GA feature selection techniques but omits HC, as HC is computationally intensive(67) yet performs inconsistently, with model accuracies ranging from 75%(54) to 100%(53). Hair et al.'s(45) recommendation that features chosen by PCA should explain at least 60% of the variance in a dataset is widely cited in the literature. The present study therefore applies this criterion, retaining the top 15 principal components, which together explain 86% of the variance in the dataset. For GA, although the ideal population size is specific to the application(46), the literature recommends a larger population size (up to n = 300(68)) to allow the GA to converge on a robust solution. Since our total population was n = 120, we applied the GA to our entire dataset for 50 generations, obtaining a solution with 11 hybrid or 'mutated' features.
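A variance-explained cut-off like the one described above can be computed from the eigenvalues of the feature correlation matrix. This NumPy sketch uses random placeholder data rather than the study's features, and `components_for_variance` is a hypothetical helper whose default threshold follows Hair et al.'s 60% guideline. In scikit-learn, `PCA(n_components=0.86, svd_solver="full")` selects the number of components from a variance fraction in a similar way.

```python
import numpy as np


def components_for_variance(X, threshold=0.60):
    """Return the smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`, plus the cumulative ratios."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each feature
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # largest first
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # searchsorted gives the first index whose cumulative ratio >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative


# Placeholder data: 120 subjects x 20 correlated features (random, not study data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(120, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.3 * rng.normal(size=(120, 20))
k_60, cumulative = components_for_variance(X, threshold=0.60)
k_86, _ = components_for_variance(X, threshold=0.86)
```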
The LR and ANN classifiers were fitted using the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (lbfgs), which is derived from the Broyden-Fletcher-Goldfarb-Shanno algorithm (bfgs). Both are numerical techniques for non-linear optimisation problems(69), with lbfgs having the added advantage of reduced runtime and memory usage. The choice of lbfgs is supported by biomechanics papers which found that the original bfgs offered only a limited increase in model performance for a considerably larger investment of computational power(70, 71). The SVM was fitted using a linear kernel, which is known for its shorter runtime and is preferred for datasets with many features(72). For RF models, the literature recommends 64-128 trees as a trade-off between high ROC AUC values and processing time(73); hence, we used 100 decision trees, whose predictions were aggregated to increase accuracy. Whilst the number of hidden layers in an ANN is unlimited in principle, more layers incur greater computational costs. The author chose a moderate number of hidden layers (n = 6) to increase the accuracy of predictions at a reasonable computational cost. This choice is based on previous applications in biochemistry and genetics(74–76) which used a similar number of features to the present study (n = 10-16).
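The classifier settings described above map naturally onto scikit-learn estimators. The use of 'lbfgs' and 100 trees is consistent with scikit-learn, but the toolkit is not stated here, so treat the following as an assumed configuration: the neurons per hidden layer (`HIDDEN_WIDTH`), `max_iter`, `random_state` and the Gaussian NB variant are placeholders, and wrapping each classifier with standardisation and a 15-component PCA is one way to express the PCA arm of the design.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

HIDDEN_WIDTH = 16  # hypothetical: neurons per hidden layer are not reported


def make_classifiers():
    """Fresh, unfitted instances of the five classifier types."""
    return {
        "LR": LogisticRegression(solver="lbfgs", max_iter=1000),
        "SVM": SVC(kernel="linear"),   # linear kernel
        "NB": GaussianNB(),            # assumed Gaussian variant
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "ANN": MLPClassifier(hidden_layer_sizes=(HIDDEN_WIDTH,) * 6,  # 6 hidden layers
                             solver="lbfgs", max_iter=2000, random_state=0),
    }


# PCA arm: standardise, keep 15 principal components, then classify.
pca_models = {name: make_pipeline(StandardScaler(), PCA(n_components=15), clf)
              for name, clf in make_classifiers().items()}
```

A GA arm would replace the PCA step with the GA-selected features; calling `make_classifiers()` again yields independent estimator instances, so the two arms never share fitted state.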
3.2 Evaluating Model performance
3.2.1 Metrics used to evaluate model performance
The literature is heterogeneous in the metrics used to report the performance of ML models, with different combinations of the metrics in Figure 2b being used(48, 50, 53–55). For example, Eskofier et al(48) and Pogorelc et al(55) report only accuracy, whilst Begg et al(54) and Khandoker et al(53) report accuracy, recall (sensitivity) and precision (positive predictive value). Reporting accuracy alone is problematic with an imbalanced dataset, when the condition has a low prevalence(77), and can lead to misleading conclusions. Hence recall is useful, as it quantifies the true-positive rate, whilst precision reflects the proportion of positive predictions that are false (the false discovery rate, FDR = 1 - precision). Ideally, a good test has a high sensitivity, so as not to miss subjects with the condition, but also a high precision (low FDR), so as not to incur additional costs to the healthcare system through clinic visits by healthy individuals(78).
However, models with high recall do not necessarily have high precision. As seen in Table 5b, models can be ranked differently by different metrics. For example, Model 3 in the present study is more precise than Model 8 (0.9615 vs 0.7879) but has a lower recall (0.7813 vs 0.9310). Here the Fβ score is useful (Figure 2b), as it is a composite metric of both recall and precision. The β parameter controls the relative importance of recall and precision: β < 1 focuses on precision, β > 1 focuses on recall, and β = 1 assigns equal importance to both. The F1 score (β = 1) has not yet been used in the literature concerning gait analysis but has proven insightful in studies related to COVID-19(79) as well as in the wider statistical literature(80–83). The F1 score is suited to the present study, where maximising true positives and minimising false positives are of equal importance.
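The precision-recall trade-off can be checked numerically with the standard Fβ formula, Fβ = (1 + β²)·P·R / (β²·P + R). The sketch below plugs in the precision and recall values reported above for Models 3 and 8; the function name is illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta < 1 weights precision more heavily; beta > 1 weights recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


model_3 = {"precision": 0.9615, "recall": 0.7813}
model_8 = {"precision": 0.7879, "recall": 0.9310}

# Score both models under precision-weighted, balanced and recall-weighted F.
scores = {beta: (f_beta(beta=beta, **model_3), f_beta(beta=beta, **model_8))
          for beta in (0.5, 1.0, 2.0)}
```

With these values, β = 0.5 ranks Model 3 above Model 8 and β = 2 reverses the order, whilst at β = 1 the two are close (about 0.86 vs 0.85).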
Furthermore, the Mean Squared Error (MSE) is obtained (Figure 2b) after performing cross-validation techniques such as k-fold and LOO. Although all of the ML studies in Table 3 perform cross-validation, none reports the error in any form, whether MSE, mean absolute error (MAE) or others. The MSE is a representation of the degree of bias in a model(84). A highly complex and overfitted model tends to be less biased towards its training data and to have a lower MSE, but in turn such models show greater variance with external data, are less generalisable and have poor external validity. The opposite is true for models with higher error: if they retain a high classification accuracy and F1 score, a higher error value is desirable, as it means that the model is more biased towards its training data, less overfitted, less likely to show variance with external data, and hence more generalisable and clinically useful(85).
Hence, the author recommends the combined use of accuracy and F1 score to judge the performance of ML models, considered alongside the MSE after cross-validation as an indication of the bias-variance tradeoff(85).
3.2.2 Performance of models
The present study finds high classification accuracy amongst all models (>89%). Models 4, 5 and 9 (Table 5a) are the highest performers, with 100% accuracy and an F1 score of 1, the highest possible score. Of these, Model 9 performs best as it has the highest MSE (0.125) after cross-validation and is therefore likely to have higher bias in favour of lower variance, and thus greater generalisability and external validity. These are followed by Models 1, 2 and 10, ranked from highest to lowest by accuracy and F1 score. The remaining models cannot be ranked, as their rankings differ between accuracy and F1 score.
The success of Models 4 and 9, which use RF, is consistent with Arora et al(86), who used triaxial accelerometry data from smartphones to distinguish participants with Parkinson's disease from normative controls. Their models had an average sensitivity of 98.5% and specificity of 97.5%. The slightly lower performance of their model is likely due to the smaller number of features used in their analysis and the lack of a feature selection technique. The MetaMotionC contains a magnetometer and gyroscope in addition to an accelerometer, so the present study inevitably works with more features. In summary, our findings are consistent with the literature and suggest that the RF classifier is promising in gait analysis. The author recommends using a feature selection technique, namely GA (Model 9), in combination with RF to increase the performance of the model.
In comparison, Model 5, which uses an ANN, greatly outperforms a recent study by Iosa et al(88), who used a very similarly capable IMU and hence had access to a very similar feature set. To our understanding, this team did not apply any feature selection techniques, and their participants walked only 10 m during data collection. A 2016 study by Del Din et al.(87) found that longer ambulatory bouts were more discriminative of pathological gait (Figure 5). The present study used a minimum walking distance of 50 m, which is a likely reason for our improved classification performance. Hence, we find that ANN is a good classifier in spatiotemporal gait analysis but should be used with a feature selection technique and longer ambulatory bouts for more valid predictions.
Models 3 and 8, both of which use the NB classifier, performed relatively poorly (accuracies of 92.5% and 89.1%) in the present study, and NB has performed similarly poorly in other studies using sensor-based data(89, 90). However, a study by Pogorelc et al(55), which used video analysis rather than sensor-based techniques, achieved a 97.2% classification accuracy. No feature selection techniques were used in that study, suggesting that its classification accuracy could be increased further. Early interpretations may suggest that NB classifiers are more suited to visual data than to sensor-based data, but further research is necessary before a firm conclusion can be made.
3.3 Significance of findings
The findings confirm that ML algorithms can accurately distinguish pathological from healthy gait. This field is still largely in its infancy, and the extant literature is heterogeneous in its approach to the use of ML techniques. Through this study, we have contributed to existing knowledge by showing that feature selection improves performance and should hence be used routinely in future. Furthermore, we have shown that RF classifiers in conjunction with GA outperform spatiotemporal gait analysis models that combine other techniques. In addition, we add to the extant literature by recommending the routine use of accuracy and F1 score to evaluate model performance.
3.4 Strengths and limitations
The main strength of this study is the wide scope of techniques investigated. By pairing each of two feature selection techniques with each of five different classifiers, we formed 10 different models, allowing a comprehensive recommendation on the combination of methods best suited to distinguishing pathological from healthy gait using spatiotemporal gait data.
The main limitation is in the statistically significant age difference between the Parkinson’s and normative groups. This makes age a confounding variable which may obscure the ‘true’ impact of the pathology(91) and limit the internal validity of the study. This arose largely due to difficulty obtaining older subjects who satisfied the inclusion criteria for the normative group.
In addition, the lack of an external validation dataset precludes determination of the generalisability and external validity of the model; the cross-validation techniques and MSE values calculated only predict the likely generalisability. CV techniques are common in the literature because datasets are often too small to set aside data purely for validation. Increasing the size of the dataset would allow researchers to hold out a separate validation dataset that is not used at all in training the model(92).
4. Future directions and Conclusion
Firstly, the number of older participants should be increased in a follow-up study, and participants should be stratified by age to reduce the confounding influence of age. Secondly, we aim to introduce a second pathological group, such as patients with lumbar spinal stenosis, to evaluate the performance of the model in a three-way classification problem, similar to that conducted by Mannini et al, who achieved 90.5% accuracy in distinguishing elderly subjects from post-stroke and Huntington's disease patients using SVM(93).
In addition, the utility of the models described in this paper for tracking disease progression should be determined by assessing whether they can grade the severity of gait deterioration. Similarly, the team aims to investigate whether ML models can quantify patients' response to therapy, for example, whether a model can distinguish a Parkinson's patient's gait before and after they take medication.
In conclusion, this study found that ML algorithms in combination with feature selection techniques can accurately distinguish pathological from healthy gait. In relation to Parkinson's disease, the findings suggest that an RF classifier paired with GA feature selection is the best-performing model, with 100% accuracy and an F1 score of 1.
These findings are promising, as such a tool could allow earlier diagnosis of conditions such as Parkinson's disease, facilitate early intervention, and improve patient outcomes and quality of life.
Future research should use larger datasets stratified by age and construct a model that can distinguish Parkinson's patients not only from healthy individuals but also from patients with other gait-altering pathologies (e.g., post-stroke, lumbar spinal stenosis).
5. Materials and Methods
5.1 Objectives
The present study is an observational case-control study of participants with Parkinson's disease who were compared with healthy controls. Spatiotemporal gait metrics summarised in Table 2 were collected from both groups using an IMU, and several ML models were used to classify the study population according to whether participants had Parkinson's disease.
5.2 Ethics
Approval was obtained from the South-Eastern Sydney Local Health District, New South Wales, Australia (HREC 17/184). All participants provided written informed consent.
5.3 Study Population
A total of 168 normative subjects and 32 participants with Parkinson’s Disease were recruited for the study. Details regarding the locations from which participants were recruited as well as age ranges can be found in Table 4a.
Participants in both groups were required to be older than 18 years of age; participants in the Parkinson's disease group additionally required a clinical diagnosis of Parkinson's disease.
Exclusion criteria for both groups included a BMI greater than 25, inability to walk at least 50 m independently, pregnancy, and any concurrent gait-altering pathology, including but not limited to stroke, lumbar spinal stenosis, multiple sclerosis, rheumatological conditions of the hip, knee and spine, and cauda equina syndrome.
5.4 Data collection
Participants provided written informed consent, after which they were interviewed to obtain the demographic data summarised in Table 4b. The wearable IMU used was the MetaMotionC (Mbientlab Inc.), which contains a 16-bit triaxial accelerometer (100 Hz), a gyroscope (100 Hz) and a 0.3 μT magnetometer (25 Hz). The sensor was fitted at the sternal angle (Figure 3) and, following a short pause to orient the device, participants were instructed to walk 50 m, unobserved, along a flat concrete pathway at their natural walking pace. Data were downloaded via Bluetooth™ to an Android™ smartphone running the IMUGait Recorder application, which was developed for this study. IMUGaitPY, a modified version of the open-source GaitPY Python package by Czech and Patel (94), was used to extract the spatiotemporal gait metrics (Table 2) from the raw data. Appendix 1 elaborates on this process. Setup instructions for IMUGaitPY, as well as details of the configuration files and mathematical derivations, can be found in Appendix B.
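IMUGaitPY’s gait-event detection is documented in the appendices and is not reproduced here. As a much simpler, hypothetical illustration of how a spatiotemporal metric such as cadence can be derived from raw accelerometer samples, the sketch below counts peaks in a vertical-acceleration trace with SciPy’s `find_peaks`. The function name, the 0.3 s minimum step interval and the peak-height threshold are assumptions for illustration, not GaitPY or IMUGaitPY parameters.

```python
import numpy as np
from scipy.signal import find_peaks

FS_HZ = 100  # accelerometer sampling rate reported for the MetaMotionC


def estimate_cadence(vertical_acc, fs=FS_HZ, min_step_interval_s=0.3,
                     min_height=0.1):
    """Estimate cadence (steps/min) by counting peaks in vertical acceleration.

    Simplified illustration only; this is NOT the GaitPY/IMUGaitPY algorithm.
    `min_step_interval_s` and `min_height` are hypothetical tuning parameters.
    """
    # Remove the gravity (DC) offset so peak heights are measured around zero.
    centred = np.asarray(vertical_acc, dtype=float)
    centred = centred - centred.mean()
    # Treat each peak as one step; `distance` enforces a minimum gap in samples.
    peaks, _ = find_peaks(centred, height=min_height,
                          distance=int(min_step_interval_s * fs))
    if len(peaks) < 2:
        return 0.0
    # Mean time between consecutive steps, converted to steps per minute.
    mean_step_time_s = np.mean(np.diff(peaks)) / fs
    return float(60.0 / mean_step_time_s)


# Synthetic 10 s walk at 2 steps/s (120 steps/min) with a 1 g offset.
t = np.arange(0, 10, 1 / FS_HZ)
signal = 1.0 + 0.5 * np.cos(2 * np.pi * 2.0 * t)
print(round(estimate_cadence(signal)))  # → 120
```

Real trunk-acceleration signals are far noisier than this synthetic sinusoid, which is why dedicated gait packages typically filter the signal and detect individual gait events (e.g., initial and final foot contacts) rather than counting raw peaks.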
5.5 Data Analysis
5.5.1 Demographic variables
Demographic data were assessed for normality using the Shapiro-Wilk test and visual inspection of histograms. Continuous variables such as age, height, weight and BMI were compared between groups using the independent-samples t-test for normally distributed data and the Mann-Whitney U test for non-normal data. Categorical variables such as sex, smoking, diabetes, hypertension, cholesterol and 12-month falls status were compared using the Chi-square test of independence. Statistical significance was set at p < 0.05, and analyses were performed using IBM SPSS Statistics Version 26.0 (IBM, New York, United States).
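Although the analysis was run in SPSS, the same decision rule can be expressed in Python with SciPy, as in the sketch below. It automates only the Shapiro-Wilk step (the visual histogram check is not automated), and the function names are hypothetical. Note that SciPy’s `chi2_contingency` applies Yates’ continuity correction to 2 × 2 tables by default, so its p-values may differ slightly from the uncorrected Pearson Chi-square reported by SPSS.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level used in the study


def compare_continuous(group_a, group_b, alpha=ALPHA):
    """Choose the t-test or Mann-Whitney U test based on Shapiro-Wilk results."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Treat the variable as normal only if neither group rejects normality.
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "independent-samples t-test", float(stats.ttest_ind(a, b).pvalue)
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", float(result.pvalue)


def compare_categorical(counts, alpha=ALPHA):
    """Chi-square test of independence on a (group x category) count table."""
    chi2, p, dof, expected = stats.chi2_contingency(np.asarray(counts))
    return float(p), bool(p < alpha)
```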
5.5.2 Machine learning models
Pre-processing
The dataset was cleaned by removing duplicate records and records with missing values. Structural errors such as spelling mistakes were corrected, as they can cause errors during processing. The data were then standardised (converted to z-scores) to identify and remove outliers.
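A minimal pandas/scikit-learn sketch of these cleaning steps follows. Standardisation rescales each feature to zero mean and unit variance; it does not by itself remove outliers, so the sketch drops rows with any |z| above a threshold. The threshold of 3 and the function name are assumptions, since the paper does not state its outlier rule. In a cross-validated workflow the scaler would ideally be fitted on each training fold only.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

Z_THRESHOLD = 3.0  # hypothetical outlier cut-off; the paper does not state one


def clean_and_standardise(df: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    """Drop duplicates and missing values, z-score features, drop outliers."""
    # Remove exact duplicate records, then records with any missing value.
    out = df.drop_duplicates().dropna().copy()
    # Standardise: each feature rescaled to zero mean and unit variance.
    out[feature_cols] = StandardScaler().fit_transform(out[feature_cols])
    # Keep only rows whose standardised features all lie within the threshold.
    within = (out[feature_cols].abs() <= Z_THRESHOLD).all(axis=1)
    return out[within]
```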
Recoding variables and outcomes
In preparation for binary classification, normative (healthy) subjects were assigned a value of 0, whilst those with Parkinson’s disease were assigned a value of 1. Similarly, categorical demographic variables answered as yes or no (smoking, diabetes, hypertension, cholesterol and 12-month falls status) were recoded as 1 and 0, respectively.
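The recoding can be sketched as follows. The column names (`group`, `falls_12m`, etc.) and the raw group labels (`"normative"`, `"parkinsons"`) are hypothetical placeholders; the study’s spreadsheet headers may differ.

```python
import pandas as pd

# Hypothetical column names; the study's spreadsheet headers may differ.
BINARY_COLS = ["smoking", "diabetes", "hypertension", "cholesterol", "falls_12m"]
YES_NO = {"yes": 1, "no": 0}


def recode(df: pd.DataFrame, group_col: str = "group") -> pd.DataFrame:
    """Add a 0/1 outcome label and convert yes/no answers to 1/0."""
    out = df.copy()
    # Outcome label: 0 = normative (healthy), 1 = Parkinson's disease.
    out["label"] = (out[group_col] == "parkinsons").astype(int)
    # Normalise case and whitespace before mapping, so "Yes " and "yes" agree.
    for col in BINARY_COLS:
        out[col] = out[col].str.strip().str.lower().map(YES_NO)
    return out
```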
Feature selection
Principal Component Analysis (PCA) was used to reduce the dimensionality of the feature set. The first 15 principal components (derived from a total of 75 features) together explained over 86% of the variance and were deemed sufficient to represent the data.
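With scikit-learn, retaining 15 components and checking how much variance they capture could look like the sketch below. The helper name is hypothetical, and the input is assumed to be already standardised.

```python
import numpy as np
from sklearn.decomposition import PCA

N_COMPONENTS = 15  # components retained in the study (from 75 features)


def pca_reduce(X, n_components=N_COMPONENTS):
    """Project standardised features onto the leading principal components."""
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X)
    # Cumulative share of the total variance captured by the kept components.
    explained = float(np.sum(pca.explained_variance_ratio_))
    return X_reduced, explained
```

Alternatively, `PCA(n_components=0.86, svd_solver="full")` makes scikit-learn keep the smallest number of components whose cumulative explained variance exceeds 86%.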
Separately, a Genetic Algorithm (GA) was used to reduce the dataset to the 11 most descriptive features.
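The paper does not specify its GA configuration, so the sketch below is a generic, hypothetical GA for feature selection: each individual is a binary mask over the features, fitness is the 3-fold cross-validated accuracy of a logistic-regression model on the selected columns, and new generations are produced by tournament selection, uniform crossover and bit-flip mutation with elitism. All hyperparameters and the choice of fitness model are assumptions. Unlike the study, which settled on 11 features, this sketch lets the subset size vary freely.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def ga_select(X, y, pop_size=20, n_generations=10, mutation_rate=0.05,
              cv=3, seed=0):
    """Search for a good feature subset with a simple genetic algorithm.

    X must be a NumPy array. Returns (boolean feature mask, best CV accuracy).
    The fitness model and all hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0  # an empty feature subset cannot be scored
        model = LogisticRegression(max_iter=1000)
        return cross_val_score(model, X[:, mask], y, cv=cv).mean()

    def tournament(population, scores):
        # Pick two individuals at random and return the fitter one.
        i, j = rng.integers(len(population), size=2)
        return population[i] if scores[i] >= scores[j] else population[j]

    # Initial population of random binary masks (True = feature kept).
    population = rng.random((pop_size, n_features)) < 0.5
    scores = np.array([fitness(m) for m in population])

    for _gen in range(n_generations):
        children = []
        for _child in range(pop_size):
            parent_a = tournament(population, scores)
            parent_b = tournament(population, scores)
            # Uniform crossover: each gene is taken from either parent.
            child = np.where(rng.random(n_features) < 0.5, parent_a, parent_b)
            # Bit-flip mutation.
            child = child ^ (rng.random(n_features) < mutation_rate)
            children.append(child)
        children = np.array(children)
        child_scores = np.array([fitness(m) for m in children])
        # Elitism: keep the best pop_size individuals of parents + children.
        merged = np.vstack([population, children])
        merged_scores = np.concatenate([scores, child_scores])
        keep = np.argsort(merged_scores)[::-1][:pop_size]
        population, scores = merged[keep], merged_scores[keep]

    best = int(np.argmax(scores))
    return population[best], float(scores[best])
```

Because the fitness function uses the class labels, running the GA on the full dataset before cross-validating the final classifier can inflate performance estimates; nesting the selection inside each training fold avoids this.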
Classification
The classification models used were Logistic Regression (LR), Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF) and Artificial Neural Network (ANN). Each model was applied separately to the reduced feature sets produced by PCA and by GA, giving 10 ML models in total. The process thus far is summarised in Figure 4.
The LR model was fitted using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (lbfgs) optimisation algorithm. The SVM was fitted with a linear kernel, and the NB classifier was applied using the Gaussian Naïve Bayes method. The RF model used 100 decision trees whose individual predictions were aggregated to improve predictive accuracy. Finally, the ANN, a fully connected multilayer perceptron, was also fitted with the lbfgs optimisation algorithm and included six hidden layers (n = 6).
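The configurations above map naturally onto scikit-learn estimators; the sketch below pairs each of the five classifiers with the PCA and GA feature sets to give the 10 models. It is an assumed reconstruction, not the authors’ code (which is in Appendix 3): the scaling step inside each pipeline, the `max_iter` values, the random seeds and the hidden-layer width `HIDDEN_WIDTH` are all assumptions, since the paper reports six hidden layers but not their sizes. If “n = 6” instead refers to a single hidden layer of six neurons, `hidden_layer_sizes=(6,)` would be the corresponding setting.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC

HIDDEN_WIDTH = 10  # hypothetical neurons per hidden layer (not reported)


def make_classifiers(seed=0):
    """Fresh instances of the five classifiers described in the Methods."""
    return {
        "LR": LogisticRegression(solver="lbfgs", max_iter=1000),
        "SVM": SVC(kernel="linear"),
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "ANN": MLPClassifier(solver="lbfgs", hidden_layer_sizes=(HIDDEN_WIDTH,) * 6,
                             max_iter=2000, random_state=seed),
    }


def select_columns(X, mask):
    # Keep only the GA-selected feature columns.
    return np.asarray(X)[:, mask]


def make_models(ga_mask, n_pca_components=15):
    """Pair each classifier with each feature set: 5 x 2 = 10 models."""
    models = {}
    for name, clf in make_classifiers().items():
        models[f"PCA+{name}"] = Pipeline([
            ("scale", StandardScaler()),
            ("pca", PCA(n_components=n_pca_components)),
            ("clf", clf),
        ])
    for name, clf in make_classifiers().items():
        models[f"GA+{name}"] = Pipeline([
            ("select", FunctionTransformer(select_columns,
                                           kw_args={"mask": ga_mask})),
            ("scale", StandardScaler()),
            ("clf", clf),
        ])
    return models
```

One detail worth knowing when reading the RF description: scikit-learn’s `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by a hard majority vote.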
Cross-Validation
Each model was validated independently using both k-fold cross-validation (k = 5) and leave-one-out (LOO) cross-validation.
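One way to run both validation schemes is sketched below (the function name is hypothetical). It pools the out-of-fold predictions with `cross_val_predict` before computing metrics, because with LOO each test fold contains a single subject and a per-fold F1 score is undefined; the paper does not state whether its scores were pooled or averaged across folds. Shuffling in the 5-fold split is also an assumption. Passing a `Pipeline` as `model` ensures that scaling and PCA are refitted within each training fold.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict


def validate(model, X, y, seed=0):
    """Return pooled out-of-fold accuracy and F1 for 5-fold and LOO CV."""
    schemes = {
        "5-fold": KFold(n_splits=5, shuffle=True, random_state=seed),
        "LOO": LeaveOneOut(),
    }
    results = {}
    for name, cv in schemes.items():
        # Every sample is predicted by a model that never saw it in training.
        y_pred = cross_val_predict(model, X, y, cv=cv)
        results[name] = {"accuracy": accuracy_score(y, y_pred),
                         "f1": f1_score(y, y_pred)}
    return results
```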
Evaluating performance
All metrics outlined in Figure 2b were used to evaluate the performance of the models.
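Figure 2b is not reproduced here; assuming it lists the usual binary-classification metrics (accuracy, precision, recall/sensitivity, specificity and F1), they can be computed with scikit-learn as in this sketch, treating Parkinson’s disease (label 1) as the positive class. The helper name is hypothetical.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def evaluate(y_true, y_pred):
    """Binary metrics with Parkinson's disease (label 1) as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall (sensitivity)": recall_score(y_true, y_pred, zero_division=0),
        # Specificity is not a built-in scorer, so derive it from the matrix.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }
```

For example, `evaluate([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])` gives an accuracy of 0.6, precision, recall and F1 of 2/3, and specificity of 0.5. An F1 score of 1, as reported for the best models, requires no false positives and no false negatives.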
The models were implemented in Jupyter Notebook, an open-source tool (Project Jupyter, 2014). The full code is provided in Appendix 3.
Data Availability
The deidentified datasets can be accessed via Google Drive:
Deidentified Normative Database: https://docs.google.com/spreadsheets/d/1L2ua-LERcYig1LzS2DwjU-g0PVE1SKqfS69j8WcKIZ8/edit?usp=sharing
Deidentified Parkinson's Database: https://docs.google.com/spreadsheets/d/1Sc6JL0UmtiEIJCmD1R2jbsuGXDIy24SxbmkdQKDJg-4/edit?usp=sharing
Conflict of interest
The authors declare no conflict of interest.
Acknowledgments/ Funding information
V. Fernando was responsible for data collection, curation and analysis, as well as writing the manuscript. RJM and MMM were crucial in the conceptualisation of the study. RJM, MMM, RDF and PN all provided editorial input into the manuscript through their reviews. MP and NS also aided in data collection. SMJ provided all patients with Parkinson’s disease for assessment through her clinics. This study was not funded.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.