Abstract
Objectives The objective of this paper is to systematically review and evaluate the responsiveness of different functional tests via the minimal detectable change (MDC) across different older adult population cohorts.
Design Systematic review. The ISI Web of Knowledge and PubMed databases were searched up to September 26th 2020.
Setting Community dwellings, hospital and residential homes
Participants Studies were included if participants were adults over the age of 60. This study reports data from studies of healthy community dwelling older adults, as well as older adults who are hospitalised, live in residential homes or have musculoskeletal conditions.
Interventions No interventions feature in this study
Primary and secondary outcome measures The primary outcome measure was the MDC reported for gait speed, grip strength, balance, timed up and go, and repeated chair stand, separated by older adult sub-group. The secondary outcome measure was the result of a regression analysis performed to determine the effect of the functional test, cohort, study design and MDC calculation methodology on MDC magnitude.
Results Thirty-nine studies met the inclusion criteria. Not all assessments were evaluated in the literature for all population cohorts. The MDC was affected by the functional test used, the cohort and MDC calculation methodology.
Conclusion The MDC can be assessment and population specific, and thus this should be taken into account when using the MDC. It appears acceptable that different assessors are involved in the re-assessment of the same person.
Trial registration The systematic review protocol was published in PROSPERO (CRD42019147527).
Strengths and Limitations of this Study
Strength: A range of assessments were included to determine if MDC could be used to prioritize specific assessments in interventions.
Strength: A wide range of search criteria and methods resulted in 6448 studies being assessed, which enabled the inclusion of 39 original research papers from which 138 MDC values were derived.
Strength: Analysis of MDC95 for functional tests commonly used by practitioners to assess effective change in older adults
Strength: Analyses of the impact of method design features such as different assessors on the MDC
Limitation: Limited to the settings and tests selected
Introduction
The aging process can result in various physiological and biomechanical changes 1-3, which can lead to, or result from, the onset of frailty 4, or diseases such as osteoarthritis 5, 6 or sarcopenia 1, 3. In particular, co-morbidity such as musculoskeletal conditions, and acute and chronic living conditions such as hospitalisation and residential home habituation respectively, can exacerbate the changes that occur due to aging 7. Changes in the underlying physiological, biomechanical and cognitive processes can negatively impact the functional ability of the individual. Recently, the World Health Organisation defined functional ability as the combination of the intrinsic capacity of the individual, the environment a person lives in and how people interact with their environment 8. This includes walking, balance and strength as indicators of intrinsic capacity 8. Subsequently, a decrease in the ability to perform these activities will affect their quality of life and well-being 4. To monitor an individual’s functional ability, functional performance assessments are used to establish changes due to aging that predict increased mortality risk, likelihood of falls and cognitive decline 2, 9-11.
Changes in functional performance can reflect variances in cognitive processes 2, neuromuscular control 10, endurance 12, muscle strength and ‘physical power’ 10, 13, and poorer static and dynamic balance 14. These changes can lead to fear of falling 15, perception of social isolation/participation 16, an inability to perform daily activities and immobility 17 and a generally reduced quality of life 16. Therefore, it appears important to monitor changes in functional ability over time, both across the lifespan due to aging and during acute events. Similarly, it is important to monitor those changes occurring between the short and long term, such as those due to hospitalisation, musculoskeletal conditions or residential home living. This allows monitoring of the deterioration in patient condition and the responsiveness to interventions by clinicians and therapists.
Various functional performance assessments exist that are safe, cost-effective, require limited time and equipment, and are easy to interpret 18; these tests include the five-times or 30-second Sit to Stand (5xSTS and 30sSTS respectively), Hand Grip Strength (HGS), Berg Balance Scale (BBS), Six Minute Walk (6MWT) and Two Minute Walk (2MWT), normal (NGS) and fast gait speed (FGS), Timed Up and Go (TUG), and Single Legged Stance (SLS). These tests are commonly used to measure functional capacity in both clinical and community-based settings for a range of populations 19. In addition, their scoring is primarily based on continuous data, meaning that minor changes might be identified. This is in contrast to assessments involving cut offs to score performance, such as the Short Physical Performance Battery (SPPB). Despite being popular, the SPPB has the inherent limitation that for example an increase in chair stand test duration of 2.5 seconds might not lead to a change in performance if the baseline was 14.0 seconds. Thus, particular functional assessments employing continuous data for scoring can provide valuable insight into the monitoring of patient centred outcome measures while being objectively quantified.
The ability to observe change or difference in functional assessments is influenced by the size of error within observed values 20. This error can be indicated by the absolute reliability of the data, demonstrated via the calculation of the Minimal Detectable Change (MDC) 21. This statistic provides a quantifiable value that defines the responsiveness of the functional assessment; a lower MDC suggests a better ability to detect a small improvement or deterioration in functional ability 21.
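For reference, the conventional relationships between the SEM and the MDC can be written as below. This is the standard textbook form rather than a formula quoted from the included studies; it is consistent with the factor of 2.77 mentioned in the Limitations. The symbols SD (between-participant standard deviation), ICC (intraclass correlation coefficient) and MSE (mean square error from the repeated-measures ANOVA) are assumed here in their usual meanings.

```latex
\begin{align}
\mathrm{SEM} &= \mathrm{SD}\,\sqrt{1 - \mathrm{ICC}}
  \quad\text{or}\quad
  \mathrm{SEM} = \sqrt{\mathrm{MSE}} \\
\mathrm{MDC}_{95} &= 1.96 \times \sqrt{2} \times \mathrm{SEM} \approx 2.77 \times \mathrm{SEM} \\
\mathrm{MDC}_{90} &= 1.645 \times \sqrt{2} \times \mathrm{SEM} \approx 2.33 \times \mathrm{SEM}
\end{align}
```

The factor of √2 accounts for measurement error being present in both the test and the retest score.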
Published systematic reviews exist, which detail the responsiveness of gait speed tests in older adults 22-24. These reviews do not however consider potential differences between normal gait speed and fast gait speed. Similarly, Downs et al. (2013) 25 reviewed the literature for the BBS test, reporting a curvilinear relationship between the average BBS score and the MDC statistic. The MDC of 2MWT 26, HGS 27, and 6MWT 28 have been shown to vary across patient groups suggesting that the responsiveness is population/health condition specific. It is possible that the inter-session reliability of assessments is impacted by the health condition of the patient, such as daily variations in health, pain and subjective feelings 29, whereas this is less likely to affect the intra-session reliability. Thus, musculoskeletal conditions as well as the impact of hospitalisation and residential care living could increase task performance variability compared with healthy individuals. This would increase the MDC, and mean that condition specific MDC values should be used over MDC values calculated on other clinical populations or from the general population. This is a factor not considered in the interpretation of the MDC within the aforementioned gait speed and BBS reviews. Many of these reviews also analyse the data for all age groups without exploring the MDC specifically for older adults. This population’s physiological changes may add further noise to that present due to a specific health condition and increase the MDC reported. These reviews have also excluded studies with small sample size (n < 10) 25, which along with methodological considerations such as the Standard Error of the Measurement (SEM) calculation method used in the MDC calculation, will impact the data reliability statistics 30. 
In addition, the length of time between trials will also likely impact data noise and the MDC calculated and these factors need to be understood when exploring a suitable MDC to be used when evaluating change.
The purpose of this review was to systematically search the literature to provide an updated review of MDC statistics for the 5xSTS, 30sSTS, BBS, 6MWT, 2MWT, TUG, SLS, HGS, NGS and FGS for healthy older adult populations as well as populations who have musculoskeletal conditions, are hospitalised or live in residential homes. This will enable the generation of recommendations for clinicians and therapists to monitor patients over time and, for example, determine the individual effectiveness of rehabilitation programmes. Secondary aims were to determine the impact of the study design, SEM calculation method, time between trials or assessor scoring and population size on the MDC measurement. In addition, since the selected functional tests measure similar physiological characteristics, the impact of the test chosen on MDC95 was also explored.
Methods
Data Sources and Searches
The systematic review was designed collectively by two researchers. ISI Web of Knowledge and PubMed databases were searched using the terms identified in Table 1 for all available dates up to September 26th 2020. Subsequently, manual searches from relevant systematic reviews included in the database search and references lists of included manuscripts were performed. This systematic review protocol was registered with PROSPERO (Registration Number CRD42019147527).
Study Selection
Eligible studies were those which reported data for study participants with a mean age greater than 60 years. Eligible studies also reported the SEM or the MDC, or analogous terms (Smallest Real Difference (SRD) or Smallest Detectable Change (SDC)), for the following functional assessment measures: 5xSTS and 30sSTS, BBS, HGS, 6MWT, 2MWT, TUG, SLS, FGS and NGS. Intervention studies, defined as studies which implemented a strategy to change outcome measures over a period of 3 or more weeks, were included only if the MDC and/or SEM data were established using test and retest data collected pre-intervention or post-intervention. Studies were included if they used an inter-rater (at least two assessors rating a patient’s single performance at the same time on 1 occasion), intra-patient/inter-rater (at least two assessors scoring the patient’s performance on different occasions), or patient test-retest (a patient’s performance was assessed on different occasions by the same assessor) design. Those studies which did not detail the study design, or for which the design could not be classified as one of the above, were excluded, as were those which were not original research studies (i.e. used data published elsewhere) or were not written in English. From the extracted data, the present study focussed on studies comprising healthy community dwelling, musculoskeletal condition, hospitalised or residential care home participants.
For the papers retrieved, the two authors of this paper independently screened the titles and abstracts against the inclusion criteria. Any study for which it was not clear whether the criteria were met underwent a review of the full text and was accepted or rejected accordingly. The reference lists of the accepted studies underwent the same review process, and any additional publications which met the search criteria and which were not found in the initial database search were then included. Finally, a check of similar systematic reviews also helped to ensure relevant sources were not missed.
Data Extraction and Quality Assessment
Information on the patients’ health status (healthy community dwelling, musculoskeletal conditions, hospitalisation or residential care home), the number of patients sampled and the statistical approach used to calculate the SEM and/or MDC value was extracted into a central data sheet. Additionally, the mean or median for each cohort on the first trial (baseline) was extracted (or the group mean if these values were not provided), along with the time between trials or assessor scoring. When required, data were converted to ensure the same measurement units for the MDC value across studies (e.g. cm/s to m/s). Likewise, to ensure comparable statistics, when MDC values were reported using the z-score associated with a 90% Confidence Interval, these were converted to 95% using the SEM reported in the study or by using Equation 1. All data entered into the central sheet were checked for accuracy by both authors.
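The conversions described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' extraction procedure: it assumes the conventional form MDC = z × √2 × SEM with z = 1.96 (95%) and z = 1.645 (90%), and the function names and example values are hypothetical.

```python
import math

Z_90 = 1.645  # z-score for a two-sided 90% confidence level
Z_95 = 1.96   # z-score for a two-sided 95% confidence level


def mdc_from_sem(sem, z=Z_95):
    """MDC from the SEM, assuming the conventional form z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def mdc90_to_mdc95(mdc90):
    """Rescale an MDC reported at the 90% level to the 95% level."""
    return mdc90 * Z_95 / Z_90


def cm_per_s_to_m_per_s(value):
    """Convert a gait speed (or its MDC) from cm/s to m/s."""
    return value / 100.0


# Hypothetical example values, not taken from any included study.
sem_gait = 0.05                          # SEM of gait speed in m/s
mdc95 = mdc_from_sem(sem_gait)           # MDC at the 95% level
mdc90 = mdc_from_sem(sem_gait, z=Z_90)   # MDC at the 90% level
```

Under these assumptions, rescaling `mdc90` with `mdc90_to_mdc95` gives the same value as computing `mdc95` directly from the SEM, which is why either route can be used when a study reports only one of the two.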
Data Synthesis and Analysis
The MDC95 values were averaged across all studies for each assessment and their range was determined. Extracted MDC95 values were also transformed into a percentage of the baseline assessment score if this value was not provided by the study, to enable comparison of the magnitude of the MDC95 across assessments (MDC95%). In line with Chapter 10 of the Cochrane Handbook for Systematic Reviews for Interventions (http://www.training.cochrane.org/handbook), meta-regressions can be used to investigate how an outcome variable changes with continuous and categorical explanatory variables. Therefore, to explore the variation in the MDC95 data, a multiple linear regression using the Enter method was performed with the MDC95% as the dependent variable.
Functional assessment, SEM calculation method (use of a one-way random, two-way random or two-way fixed ICC or square root of mean square error (√MSE)), study population, number of study participants, time between trials or assessor scoring (1 day or fewer, within three days, within seven days, seven days specifically, within two weeks) and study design (intra-rater/intra-patient, inter-rater/intra-patient or inter-rater) data were then used as the independent variables. For the time between trials or assessor scoring and the SEM calculation method variables, any category with fewer than five studies was combined with those which provided unclear data and grouped as ‘other’; these were included in the analysis since these studies include information relating to the other independent variables. However, these ‘other’ groupings were excluded from later interpretation. Since the functional assessment, SEM calculation method, study population, time between trials or assessor scoring and study design data were categorical, the different categories were entered into SPSS as separate columns. Dummy variables (i.e. 1 = yes or 0 = no) were then used to indicate the presence of the independent variable category (level) in the calculation of the MDC95 statistic. To obtain beta coefficients for each level, multiple regression analyses were performed, with each level within the independent variable used as the reference level in one of the analyses (i.e. omitted from the regression).
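As an illustration of the normalisation described above, the following sketch expresses an MDC95 as a percentage of the baseline score and summarises a set of values by mean and range. The function names and the numbers are hypothetical and are not taken from the included studies.

```python
def mdc_percent(mdc95, baseline_mean):
    """Express an MDC95 as a percentage of the baseline score (MDC95%)."""
    if baseline_mean <= 0:
        raise ValueError("baseline_mean must be positive")
    return 100.0 * mdc95 / baseline_mean


def summarise(values):
    """Return the mean and the (min, max) range of a list of MDC95 values."""
    return sum(values) / len(values), (min(values), max(values))


# Hypothetical TUG values in seconds, for illustration only.
tug_mdc95 = [2.0, 3.0, 4.0]
mean_mdc, (lowest, highest) = summarise(tug_mdc95)
tug_pct = mdc_percent(mdc95=3.0, baseline_mean=12.0)  # 3 s on a 12 s baseline
```

Because MDC95% is unit-free, a value for a timed test in seconds can be compared with one for grip strength in kilograms, which is what makes a single regression across assessments possible.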
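The dummy coding step can be sketched as follows. The analysis in this review was run in SPSS; the Python below is only an illustrative stand-in, with a hypothetical helper name and made-up category labels, showing how one level is left out as the reference and how re-running with a different reference changes which comparisons the coefficients describe.

```python
def dummy_code(values, reference):
    """Return (column_names, rows) of 0/1 indicators, omitting the reference level."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    # One indicator column per non-reference level.
    columns = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in columns] for value in values]
    return columns, rows


# Hypothetical population labels for five MDC95% values.
populations = ["healthy", "residential", "musculoskeletal", "healthy", "hospitalised"]

# Coefficients from a regression on these columns would be relative to "healthy".
cols_a, rows_a = dummy_code(populations, reference="healthy")

# Repeating with another reference gives comparisons against "residential" instead.
cols_b, rows_b = dummy_code(populations, reference="residential")
```

A row of all zeros identifies an observation from the reference level, so each fitted coefficient is read as the difference in MDC95% points between that level and the reference, holding the other predictors constant.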
Beta coefficients are reported for all variables, along with the r and r2 value and F-value from the ANOVA table. Significant contribution of the independent variables to the variation in dependent variable was indicated at the 95% confidence level when p < 0.05. Comparisons between groups were expressed as absolute difference in MDC95%, thus in percentage points difference in MDC95%.
Results
A total of 39 studies that reported the MDC95 or provided SEM data to calculate the MDC95 were included in this review (Figure 1). Table 2 shows the frequency of MDC95 values for each assessment per population. Across the 39 studies, MDC95 values for healthy, community dwelling older adults were available for all assessments except for the 30sSTS, while the MDC95 was less frequently available for specific populations for other assessments. The average and range of MDC95 values for each assessment are shown in Table 3 and Figure 2. A full description of the included studies is provided in Appendix A.
There were 138 MDC95 values included in the regression analysis (note that some studies reported more than one MDC95 value), distributed as follows: Functional test (FGS (n = 10), NGS (n = 26), TUG (n = 28), HGS (n = 25), 5xSTS (n = 14), 30sSTS (n = 7), SLS (n = 7), BBS (n = 2), 6MWT (n = 7), 2MWT (n = 12)); Design (intra-rater/intra-patient (n = 97), inter-rater/intra-patient (n = 39), inter-rater (n = 2)); SEM calculation method (one-way random effects (n = 2), two-way random effects (n = 27), two-way mixed effects (n = 7), mean square error (√MSE) (n = 51), other/unknown (n = 51)); Time between trials or assessor scoring (1 day or fewer (n = 44), within 3 days (n = 20), within 7 days (n = 20), 7 days (n = 21), within 2 weeks (n = 21), unknown/other (n = 12)); Population (residential (n = 25), musculoskeletal (n = 63), hospitalised (n = 9), healthy community dwellers (n = 41)).
The results of the regression analysis indicated a significant correlation (R = 0.74, p < 0.01), with the r2 indicating that the model explained 55.1% of the variance in the dependent variable and was a significant predictor of MDC95% (F(24,113) = 5.777, p < 0.001). Variation in the MDC95% could be explained by the functional test and sample population used, and the time between trials or assessor scoring (p ≤ 0.05). Conversely, the SEM calculation method and the study design (patient test-retest or intra-patient/inter-rater) were not significant predictors of MDC95% magnitude, although MDC95% values from inter-rater studies were smaller than those from intra-patient/inter-rater studies (21.085 MDC95% points). Similarly, the regression analyses demonstrated that there was no significant effect of the number of patients sampled on MDC95% (beta = -0.013, t = -1.133, p = 0.260). A full description of the beta coefficients is provided in Tables 4-8 (Appendix A).
The analysis demonstrated that the population used to calculate MDC95% impacts the size of the MDC95%. More specifically, healthy community dwellers had smaller MDC95% values than residential patients (10.728 MDC95% points). The regression also showed differences between the functional tests being used, whereby all functional tests possessed significantly smaller MDC95% than the SLS (30sSTS (39.440 MDC95% points), HGS (37.941 MDC95% points), 5xSTS (32.386 MDC95% points), BBS (36.156 MDC95% points), NGS (33.221 MDC95% points), FGS (39.460 MDC95% points), 6MWT (38.207 MDC95% points), 2MWT (44.240 MDC95% points) and TUG (30.842 MDC95% points)). In addition, the 2MWT MDC95% was smaller than that of the TUG (13.398 MDC95% points) and the HGS MDC95% was smaller than that of the NGS test (4.720 MDC95% points). Further still, when the data were collected within a day of each other, the MDC95% was smaller than when collected within a week of each other (12.139 MDC95% points); values collected within seven days were also larger than those collected specifically at 7 days (13.098 MDC95% points).
Discussion
The present systematic review has identified that there is a large range of MDC95 values reported across studies. It is the first study to confirm that factors influencing the MDC95 include the cohort population and the type of functional test being performed. In addition, aspects of the study procedures, such as the time between trials and the study design, also contribute to variation in the MDC95. These results provide additional support for specific cohort population requirements when choosing the MDC95, which allows clinicians and therapists a more considered choice of test to use with their clients.
The present systematic review is the first to determine that the MDC95 differs between functional tests. For balance assessments, the SLS possessed a greater MDC95% magnitude than the BBS and all other assessments. Consequently, the SLS is not recommended for monitoring individuals, although this disagrees with Choi et al. 31. A potential reason could be that Choi et al. 31 based their interpretation on data with higher ICCs that were specific to osteoarthritis patients.
Similarly, for gait speed assessments, the TUG and 2MWT were different, whereby the 2MWT possesses a smaller MDC95%; the values for TUG, 6MWT, FGS and NGS, which can also be used to measure walking speed, were however similar. It is important to clarify that tests such as the TUG, 2MWT and 6MWT are indicative of endurance and mobility that would impact walking speed, whilst FGS and NGS are more direct measures of typical walking speed. These findings may suggest that the 2MWT introduces less error than the TUG. Alternatively, they may suggest less sensitivity to calculation errors when the 2MWT is used. The MDC95% of FGS was also not significantly different to that of NGS, a comparison not previously explored 22-24. These results therefore highlight the importance of considering the test being used to assess patients. They suggest the 2MWT may offer a greater ability to detect smaller changes than the other measures, although it is acknowledged that this test is quite time-consuming and requires substantial space. In contrast to balance and gait speed assessment, lower leg strength as indicated via the TUG, 30sSTS and 5xSTS provided similar MDC95%, indicating equal ability to detect differences between population groups.
The present systematic review also confirms that there is variation in the MDC95 across patient populations, as suggested previously for 6MWT 28, 2MWT 26 and HGS 27. The multivariate regression showed that fluctuations in daily performance appear greater in individuals with underlying health conditions compared with healthy individuals. These relative differences will likely be due to the different levels of noise within the data, a result of the differing health conditions and their effect on the neuromuscular functioning of the body 32. However, despite the increased number of studies reporting the MDC compared to previous systematic reviews, Table 2 shows that not all populations selected in this study have been evaluated to date for multiple functional tests. Therefore, clinicians and therapists need to consider whether further reliability studies are required for these population-functional test combinations. Similarly, under-powered comparisons may prevent other differences between populations from being demonstrated in this study.
The highest MDC95% values were found when assessments were separated by up to 7 days, with values generally around 10 MDC95% points higher than for assessments done on the same day. It remains unknown why MDC95% is increased between 1 and 7 days, but speculatively this might be due to regular activities varying from day to day but repeating weekly, thus leading to similar values when a 7-day separation period was used; this also differed from data collected within a week of each other. The MDC95% at around 1 week offers some support to the view of Choi et al. (2014) 31, who recommended 7 days between trials as this was long enough for a learning effect to be mitigated but short enough for clinical status not to change. Yet, since there was no difference in MDC95% between data collected on the same day and data collected with at least 7 days of separation, it would seem advantageous to collect multiple trials on the same day.
The regression analysis further revealed that there was no effect of repeated measure study design; thus, having the same assessor measure the test and retest appears to have the same impact on the MDC95 as the use of different assessors. This is in agreement with some individual studies (e.g. 33, 34), but potential limitations of this finding include publication bias and the large heterogeneity of study methodologies included in the present review. The lack of effect of assessor was likely related to the simple nature of the tests, sufficient assessor training and the use of the same or similar equipment. Consequently, the MDC95 in the studies identified mostly reflects error due to the performer. Combined, it appears that future intervention studies should consider having multiple assessments on the same day or 7 days apart, but these do not necessarily have to be conducted by the same assessor.
The results of the regression analyses also indicate that the ICC model had little impact on the MDC95 and did not find evidence to suggest that the two-way fixed model data should not be used beyond the study in which it has been conducted 30. Compared with the two-way fixed model, the two-way random model uses the systematic and random errors as separate sources of error in the ICC calculation. Consequently, given the presence of systematic error which is of similar magnitudes, the two-way fixed model will underestimate the MDC95 calculated 30. However, given the similar magnitudes it may suggest that the level of systematic error is low and thus not impacting the values obtained. Further still, the √MSE calculation did not result in a lower MDC95 compared to the two-way random model or two-way fixed model. In comparison to the two-way random model, this was unexpected, whereas in comparison to the two-way fixed model this was as anticipated given that systematic error was also ignored by this method of SEM calculation. The one-way model produced similar MDC95% to those calculated by two-way models. These observations are unexpected since the one-way model should offer a more conservative reliability statistic and thus a larger MDC95 when errors are similar 30. A potential explanation could be that low statistical power and varying magnitudes of random and systematic errors across studies, due to different method considerations, may have had a larger impact on the difference in MDC95% between these models, especially since the number of data sets included for the one-way model was relatively low (n = 16).
From a practical implementation perspective, clinicians and therapists should consider the population and MDC calculation techniques before using the currently available MDC values, and should be aware that differently designed analyses might not provide appropriate MDC values. Moreover, there was a large range of MDC values across studies and the regression model only explained 54.4% of the variance, suggesting that many other factors could limit the use of pre-existing MDC values. Further research to better understand the variability of assessments is thus warranted. In addition, the typical MDC95 exceeding 20% suggests that monitoring an individual’s change requires on average at least a 20% change if done with a single assessment. For example, this equals around 7 or 15 years of ‘typical’ decline in hand grip strength for an 80 or 60-year-old person respectively 35, indicating that the MDC is considerably large. Therefore, future research could also explore alternative techniques to the MDC to monitor the impact of the aging process on functional ability. This could improve the ability to identify functional decline earlier and intervene earlier in a person-centred approach, and thus potentially lead to better long-term outcomes for the patient. A potential alternative approach would be to determine an MDC equivalent for each individual, based on that individual’s variation in repeated assessments. This would however require multiple measurements and would warrant ubiquitous monitoring, or the development of self-assessment techniques that allow repeated measurements to be captured automatically.
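One way the individual-level idea could be explored is sketched below. This is a speculative illustration only, not a validated method: it assumes that an individual's own standard deviation across repeated assessments can stand in for the group SEM in the conventional formula 1.96 × √2 × SEM, and the function name, the minimum of three repeats and the example scores are all hypothetical.

```python
import math
import statistics


def individual_mdc95(repeated_scores):
    """Person-specific MDC95 analogue from one individual's repeated scores.

    Assumption: the individual's own sample SD across repeats is used in
    place of the group-level SEM.
    """
    if len(repeated_scores) < 3:
        raise ValueError("need at least three repeated measurements")
    within_sd = statistics.stdev(repeated_scores)  # sample SD of the repeats
    return 1.96 * math.sqrt(2) * within_sd


# Hypothetical grip strength scores (kg) for one person under stable conditions.
scores = [28.0, 30.0, 32.0]
threshold = individual_mdc95(scores)
```

With so few repeats the within-person SD is itself very uncertain, which is why the text notes that such an approach would depend on frequent or automated measurement.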
Limitations
Limitations of this paper include that no risk of bias assessment was performed, due to a lack of existing tools 36. However, a recent Delphi study made various recommendations 36 which are similar to the methodology employed in the present systematic review, supporting the quality of the data exclusion criteria. Nevertheless, in some studies the methods used were not clear, meaning that some independent variables had to be classified as ‘other’. The number of studies available in these populations may also mean that the results for some independent variables were based on underpowered comparisons, with a potential type 2 error in the conclusions. The study also reviewed data for specific cohorts of older adults and, given the effect of these on the results obtained, future research should summarise the impact of other health conditions on the MDC95 in the older adult population; it would also be worthwhile to explore the impact of health conditions within younger populations. Furthermore, the MDC only reflects the value needed for a difference to be beyond chance and not necessarily the magnitude of change needed for clinical symptoms to change 21. A recent expert panel on frailty and sarcopenia highlighted the importance of the Minimal Clinical Important Change (MCIC) statistic to determine clinically meaningful change for physical performance measures 37; a systematic review of anchor-based values that indicate this would therefore be a useful further study. Finally, the present study did not focus on the minimal change required for groups, which can be quantified using the SEM and is effectively the MDC95 divided by 2.77 (see equation 1). Combined, a comparison of both the MDC and MCIC with the effect sizes of intervention studies could increase the understanding of the magnitude and impact of an intervention.
Conclusion
In conclusion, the MDC95 can be assessment specific, and when using a previously established MDC one should consider the impact of the method used, particularly when one-way models or √MSE values are used in the calculation. Thus, clinicians and therapists should be careful about relying on a single MDC study for their interpretation of individual patient changes, regardless of whether this is for ongoing monitoring or rehabilitation interventions. Gait speed measured via the 2MWT appears to be less sensitive to, or to result in less, variation between trials than other measures of gait speed, and the single leg stance is not recommended for use. It appears acceptable that different assessors are involved in the re-assessment of the same person. The minimal detectable change is dependent on the population of interest. However, for some assessments, population specific indicators of responsiveness are not currently available, and therefore the present systematic review provides a guide to appropriate values to employ as the minimal detectable change.
Data Availability
All data is retrievable from the manuscript and supplementary files, including the search results via EndNote files.
https://osf.io/jc7b8/?view_only=a53734d68e20463c9b007efe60453711
Funding Statement
This work was supported by European Innovation and Technology Health, no grant number.
Competing interests
There are NO associations with commercial entities that provided support for the work reported in the submitted manuscript.
There are NO associations with commercial entities that could be viewed as having an interest in the general area of the submitted manuscript.
There are NO similar financial associations involving their spouse or their children under 18 years of age.
There were NO non-financial associations that may be relevant to the submitted manuscript.
Author Contribution
Both authors were central to the conception, development of protocol, summary, analysis and discussion of data and the preparation of this manuscript.
A data sharing statement
Endnote data file of all retrieved papers is shared via https://osf.io/jc7b8/?view_only=a53734d68e20463c9b007efe60453711
Acknowledgements
There are no acknowledgements for this study
Appendix A: beta coefficients
Footnotes
Email: Daniel.low{at}brunel.ac.uk, Phone: +44(0)1895 268931