ABSTRACT
The placebo and nocebo effects highlight the importance of expectations in modulating pain perception, but in every day life we don’t need an external source of information to form expectations about pain. The brain can learn to predict pain in a more fundamental way, simply by experiencing fluctuating, non-random streams of noxious inputs, and extracting their temporal regularities. This process is called statistical learning. Here we address two key open questions: (1) does statistical learning modulate pain perception? and (2) is it different in people with chronic musculoskeletal pain? In a first experiment, we asked 27 participants to both rate and predict pain intensity levels in sequences of fluctuating heat pain. Using a computational approach, we show that probabilistic expectations and confidence were used to weight pain perception and prediction. Given that statistical learning involves supramodal processes, we developed an online, stock market game to assess the ability to explicitly predict volatile and stochastic time series, probing the most fundamental components of statistical learning. The game was played by 56 chronic back pain and 55 healthy participants. We show that back pain participants learn the statistics of the sequence more slowly than controls. In conclusion, this study shows that statistical learning shapes pain experience and can be disrupted in common chronic pain conditions, opening a new path of research into the brain mechanisms of pain regulation.
1 INTRODUCTION
Clinical pain typically varies over time; in most pain states, the brain receives a stream of volatile and noisy noxious signals, which are also auto-correlated in time. The temporal structure of these signals is important, because the human brain has evolved the exceptional ability to extract regularities from streams of auto-correlated sensory signals, a process called statistical learning [1–7]. In the context of pain, statistical learning can allow the brain to predict future pain, which is crucial for orienting behaviour and maximising well-being [8, 9]. Statistical learning might also be fundamental to the ability of the nervous system to endogenously regulate pain. Indeed, statistical learning generates predictions about forthcoming pain. We already know that pain expectations can modulate pain levels by gating the reciprocal transmission of neural signals between the brain and spinal cord, as shown by previous work on placebo and nocebo effects [10–14].
Recent work using temporal sequences of noxious inputs has shown that the pain system supports the statistical learning of the basic rate of getting pain by engaging both somatosensory and supramodal cortical regions [8]. Specifically, both sensorimotor cortical regions and the ventral striatum encode probabilistic predictions about pain intensity, which are updated as a function of learning by engaging parietal and prefrontal regions. According to a Bayesian inference framework, both the predictive inference and its confidence should, in theory, modulate the neural response to noxious inputs and affect perception, as a function of learning. In support of this conjecture, there is evidence that the confidence of probabilistic pain predictions modulates the cortical response to pain [9]. The relationship is inverse: the lower the confidence, the higher is the early cortical response to noxious inputs (and viceversa), as measured by EEG. This is expected based on Bayesian inference theory: when confidence is low, the brain relies less on his prior beliefs and more on sensory evidence to respond to the input. Bayesian inference theory also predicts that prior expectations and their confidence scale perception [15]. Thus, we hypothesise that the predictions generated by learning the statistics of noxious inputs in dynamically evolving sequences of stimuli modulate the perception of forthcoming inputs.
We address this question in the first experiment of this study. Specifically, we asked healthy participants to both rate the perceived and predicted intensity of pain, and their confidence (Fig. 1a-b), in thermal sequences with varying levels of temporal regularity (Fig. 1c). We contrasted four models of statistical learning, which varied according to the inference strategy used (i.e., optimal Bayesian inference or an heuristic) and the role of expectations on perception. All models used confidence ratings to weight the inference. We anticipate that probabilistic learning weighted by confidence and expectations modulates pain perception. This provides behavioural evidence for a link between learning and endogenous pain regulation.
One reason why this is important is that it might help understand individual differences in the ability to endogenously regulate pain. This is particularly relevant for chronic pain, given that endogenous pain regulation can be dysfunctional in several chronic pain conditions [16–21]. Although there is ample evidence for changes in the functional anatomy and connectivity of endogenous pain modulatory systems in chronic pain, their computational mechanisms are poorly understood. We speculate that there might be a close relationship between statistical learning and the ability to effectively regulate pain endogenously. Disrupted statistical learning could result in dysfunctional endogenous pain regulation, and vice versa, in a vicious loop that leads to pain deregulation (i.e. deregulated pain is more difficult to learn and predict, which makes it harder to control).
Given that (1) statistical learning modulates pain perception (Experiment 1) and (2) the core mechanisms of statistical learning are shared by a range of cognitive processes across sensory domains and modalities [22], in a second experiment we investigated whether statistical learning of dynamically-changing and noisy sequences is altered in people with chronic back pain vs. the general population. We selected back pain as condition of interest simply because it is one of the most prevalent causes of disability worldwide [23], affecting approx. 10% of adults in the U.S. [24]. The experiment was designed iteratively with people with lived experience of chronic back pain, using a systems engineering approach to study design. The result was an open-source, phone-based online game that assessed supramodal aspects of statistical learning, which could be easily scaled-up for future digital healthcare applications. Experiment 2 shows that statistical learning of fluctuating values is indeed disrupted in chronic back pain, opening a promising new path of research on the relationship between statistical learning and endogenous pain regulation.
2 RESULTS
2.1 Modelling strategy
In Experiment 1, participants were required to rate their perceived and predicted pain, whereas in Experiment 2 they were required to predict the fluctuating value of shares in noisy and volatile timeseries. In both Experiments 1 and 2, we tested two main families of learning strategies: an optimal Bayesian inference strategy (whereby uncertainty weights the learning rate) and an heuristic (a non-probabilistic delta rule, whereby the learning rate is fixed). These two approaches should be viewed as complimentary [25, 26]. As a baseline, we included a random-response model (please see Methods for a formal treatment of the computational models).
According to a Bayesian strategy, on each trial, participants update their beliefs about the feature of interest (thermal stimuli or stock market values) based on probabilistic inference, maintaining a full posterior distribution over its values [26, 27]. Operating within a Bayesian paradigm, participants are assumed to track and, following new information, update both the mean of the sequence of interest and the uncertainty around it [28]. In most cases, such inference makes an assumption about environmental dynamics. For example, a common assumption is that the underlying mean (a hidden/latent state) evolves linearly according to a Gaussian random walk, with the rate of this evolution defined by the the variance of this Gaussian walk (volatility). The observed value is then drawn from another Gaussian with that mean, which has some observation noise (stochasticity). In this case, the observer can infer the latent states through the process of Bayesian filtering [27], using the Kalman Filter (KF) algorithm [29].
Sequence learning can also be captured by an heuristic to the Bayesian learning, i.e. a simple reinforcement learning (RL) rule. Here, participants maintain and update a point-estimate of the expected value of the sequence in an adaptive manner, within a non-stationary environment. RL explicitly involves correcting the tracked mean of the sequence proportionally to a trial-by-trial prediction error (PE) - a difference between the expected and actual value of the sequence [30]. Importantly, RL agents do not assume any specific dynamics of the environment and hence are considered model-free.
Both models perform some form of error correction about the underlying sequence. The rate at which this occurs is captured by the learning rate α ∈ [0, 1] element. The higher the learning rate, the faster participants update their beliefs about the sequence after each observation. In case of the RL model, the learning rate α is a free parameter that is constant across the trials. On the other hand, the learning rate in the KF model αt (also known as the kalman gain) is calculated on every trial. It depends on participants’ trial-wise belief uncertainty as well as their overall estimation of the inherent noise in the environment (stochasticity, s). In turn, the belief uncertainty is updated after each observation and depends on participants sense of volatility (v) and stochasticity (s) in the environment.
Crucially, we also used participants’ trial-by-trial confidence ratings to measure to what extent confidence plays a role in learning. This is captured by the confidence scaling factor C, which defines the extent to which confidence affects response (un-)certainty. Intuitively, the higher the confidence scaling factor C, the less important role confidence plays in participant’s response. With relatively low values of C, when the confidence is low, participants responses are more noisy, i.e. less certain. We demonstrate this in Fig. 2 by plotting hypothetical responses (A-F) and the effect on the noise scaling (G-L) as a function of C and confidence ratings.
2.2 Experiment 1: pain perception and prediction
In the first experiment we set out to establish whether pain perception is modulated by statistical learning, i.e. whether participants rely on extracted temporal regularities in rating their pain. In particular, following the work by [26], the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli. To that end, we recruited 27 healthy participants to complete a psycho-physical experiment where we delivered four different, 80-trial-long sequences (conditions) of evolving thermal stimuli. On each trial, a 2s thermal stimulus was applied, following which participants where asked to either rate their perception of the intensity (Fig. 1a) or to predict the intensity of the next stimulus in the sequence (Fig. 1b). Participants also reported their response confidence.
As a secondary aim, we evaluated how participants perform when we manipulate levels of volatility and stochasticity [31]. The volatility can be conceived of as how quickly (or slowly) the sequence evolves over time. In Kalman-Filter models, this is referred to as the process noise. There are two main families of volatility: within-context volatility (which we explicitly manipulated in Experiment 2) and between-context volatility, i.e. how likely a context, such as a reward rate, is to switch from trial to trial [32]. In Experiment 1, we varied the level of volatility and stochasticity across blocks (i.e. conditions), whilst we fixed their overall level within each block; the level of volatility was defined by the number of trials until the mean intensity level changes. The changes were often subtle and participants were not informed when they happened. Given that volatility was fixed within condition, we treated it as a single-context scenario from the point of view of modelling [32], and we did not interpret its effect on the learning rate [31]. The stochasticity is the additional noise that is added on each trial to the underlying mean, often referred to as the observation noise. In Experiment 1, we set two levels (low/high) of each type of uncertainty, achieving a 2×2 factorial design, with the order of conditions randomised across participants. A set of four example sequences of thermal intensities delivered to one of the participant’s can be found in Fig. 1c, alongside their ratings of perception and predictions. Additionally, example confidence ratings for each type of response are plotted in Fig. 1d. We refer the reader to the Methods section for a detailed description of the generative process of the sequences.
2.2.1 Model-naive performance
Prior to modelling, we first checked whether participant’s performance in the task was affected by the sequence condition. As a measure of performance, we calculated the root mean squared error (RMSE) of participants responses (ratings and predictions) compared to the normative noxious input for each condition as in Fig. 3 (see also Methods). The lower the RMSE, the more accurate participants’ responses are. Performance in different conditions was analysed with a repeated measures ANOVA, whose results are reported in full in Supplementary Table 1. Although volatility did not affect rating accuracy (F(1,26) = 0.96, p = 0.336, , we found a 2-way interaction between the level of stochasticity of the sequence (low, high) and the type of rating provided (perceived intensity vs. prediction) (F(1,26) = 29.842, p < 0.001, . We followed up this interaction in post-hoc comparisons, as reported in Supplementary Table 2. The performance score differences between all the pairs of stochasticity and response type interactions were significant, apart from the perception ratings in the stochastic environment as compared with perception and prediction performance in the low stochastic setting. Intuitively, the RMSE score analysis revealed an overall trend of participants performing worse on the prediction task, in particular when the level of stochasticity is high.
2.2.2 Modelling results
To evaluate the effect of expectation on perceived intensity (on top of statistical learning modulating perception), we expanded the standard RL and KF models by adding a perceptual weighting element, γ ∈ [0, 1] (similarly to [26]). Essentially, γ governs how much each participant relies on the normative input on each trial, and how much their expectation of the input influences their reported perception - i.e. they take a weighted average of the two. The higher the γ, the bigger the impact of the expectation on perception. Again, in the case of the Reinforcement Learning model (eRL - expectation weighted RL), γ is a free parameter that is constant across trials, while in the Kalman Filter model (eKF - expectation weighted KF), γt is calculated on every trial and depends on: (1) the participants’ trial-wise belief uncertainty, (2) their overall estimation of the inherent noise in the environment (stochasticity, s) and (3) the participant’s subjective uncertainty about the level of intensity, ε. Thus, in total we tested 5 models in Experiment 1: RL and KF (perception not-weighted by expectations), eRL and eKF (perception weighted by expectations), and a baseline random model. We then proceeded to fit these 5 computational models to participants’ responses. For parameter estimation, we used hierarchical Bayesian methods, where we obtained group- and individual-level estimates for each model parameter (see Methods).
Sequence conditions
We fit each model for each condition sequence. Example trial-by-trial model prediction plots from one participant can be found in Supplementary Fig. 5. To establish which of the models fitted the data best, we ran model comparison analysis based on the difference in expected log point-wise predictive density (ELPD) between models. The models are ranked according to the ELPD (with the largest providing the best fit). The ratio between the ELPD difference and the standard error around it provides a significance test proxy through the sigma effect. In each condition, the expectation weighted models provided significantly better fit than models without this element (Fig. 4a-d), suggesting that regardless of the levels of volatility and stochasticity, participants still weigh perception of the stimuli with their expectation. In particular, we found that the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity this difference was short of significance against the eRL model. This suggests that in learning about temporal regularities in the sequences of thermal stimuli, participants’ expectations play a significant role in the perception of the stimulus. Moreover, this process was best captured by a model that updates the observer’s belief about the mean and the uncertainty of the sequence in a Bayesian manner.
We also found that as the confidence in the response decreases, the response uncertainty is scaled linearly with a negative slope ranging between 0.112-0.276 across conditions (Fig. 5), confirming the intuition that less confidence leads to bigger uncertainty.
As an additional check, for each participant, condition and response type (perception and prediction), we plotted participants’ ratings against model predicted ratings and calculated a grand mean correlation in Supplementary Fig. 6.
Next, we checked whether the parameters of the the winning eKF model differed across different sequence conditions. There were no differences for the group-level parameters; i.e., we did not detect significant differences between conditions in a hypothetical healthy participant group as generalised from our population of participants.
However, we found some differences at the individual-level of parameters (i.e. within our specific population of recruited participants), which we detected by performing repeated-measures ANOVAs (see Supplementary Fig. 9 for visualisation). The stochasticity parameter s was affected by the interaction between the levels of stochasticity and volatility , and was higher in highly stochastic and volatile conditions as compared to conditions where either volatility (t = 7.735, pbon f < 0.001), stochasticity (t = 9.396, pbon f < 0.001) or both were low (t = 8.826, pbon f < 0.001). This suggests that, while participants’ performance was generally worse in highly stochastic environments, participants seem to have attributed this to only one source - stochasticity (s), regardless of the source of higher uncertainty in the sequence (stochasticity or volatility).
The response noise ξ was modulated by the level of volatility , where it was smaller in highly volatile conditions. Moreover, we detected a significant interaction between volatility and stochasticity on the confidence scaling factor , where the values C were overall lower when either volatility (t = 11.570, pbon f < 0.001), stochasticity (t = − 6.165, pbon f < 0.001) or both (t = − 4.575, pbon f < 0.001) were high as compared to the conditions where both levels of noise were low. This indicates there may have been some trade-off between ξ and C, as lower values of C introduce additional uncertainty when participant’s confidence is low.
Lastly, we found the initial uncertainty belief w0 was affected by the interaction between volatility and stochasticity without a consistent pattern, with significant differences between each pair reported in Supplementary Table 17. All the other effects were not significant, and can be found in the Supplementary Tables 10-24.
In summary, we formalised the process behind pain perception and prediction in noxious time-series within the framework of sequential learning, where the best description of participants’ statistical learning was captured through Bayesian filtering, in particular using a confidence-weighted Kalman Filter model. Most importantly, we discovered that, in addition to weighing their responses with confidence, participants used their expectations about stimulus intensity levels to form a judgement as to what they perceived. This mechanism was present across various levels of uncertainty that defined the sequences (volatility and stochasticity).
2.3 Experiment 2: statistical learning and chronic back pain
Given that statistical learning is involved in endogenously modulating pain perception, we wondered whether the fundamental components of statistical learning are affected in people with chronic pain. Indeed, one of the core mechanisms proposed to explain chronic pain is a dysfunction of the descending pain modulatory system, and this is based on ample neurophysiological and pharmacological evidence [16–21]. However, the computational mechanisms of dysfunctional pain regulation in chronic pain are poorly understood. We hypothesise that there might be a close relationship between statistical learning and the ability to effectively regulate pain endogenously. Disrupted statistical learning could result in dysfunctional endogenous pain regulation, and vice versa, in a vicious loop that leads to pain deregulation (i.e. deregulated pain is more difficult to learn and predict, which makes it harder to control). Importantly, learning the statistics of time-varying sensory inputs is an ubiquitous neural function involved in many tasks across sensory modalities, from the visual to auditory systems, including the pain system. Statistical learning is governed by general computational principles, shared across modalities, and partially shared neurobiological basis [22]. Whereas Experiment 1 focused specifically on statistical learning for pain inputs, Experiment 2 targets fundamental, supramodal aspects of statistical learning which could characterise an individual statistical learning strategy.
In collaboration with a group of people with lived experience of chronic pain, we designed an online game where participants made explicit predictions about stochastic and volatile time-series. In the game, participants played the role of a stockbroker predicting how a company’s share price fluctuated over a series of days (where each trial represents a day). Participants were informed that the value of their shares was $100 before the first trial. At the start of each trial participants were asked to predict the value of their shares on that day and rate their confidence in their prediction. Then the actual value of their shares was subsequently revealed. A schematic of a single trial is depicted in Fig. 1e. This was repeated for 200 trials in total. Fig. 1g shows one example of a participant’s share price predictions and confidence ratings over the course of the game. The generative model of the sequence is detailed in the Methods. The main difference with Experiment 1 was that the level of volatility and stochasticity of the sequence slowly evolved within each sequence.
We compared the performance of people with chronic back pain (N = 56) to age-matched healthy controls (N = 55). We chose back pain as condition of interest, because it is the most common chronic pain disorder [23]; however, there is no reason to think that the findings would not generalise to other chronic pain conditions. For the chronic pain participant group, we recruited individuals who had experienced pain in their back for a duration of over 6 months [33]. Chronic pain is associated with emotional comorbidity, particularly anxiety and depression, and these have been linked with impaired cognitive functions, including statistical learning [34]. Therefore, to reduce confounding factors in the group comparison, we selected controls with low psychological questionnaire scores for anxiety and depression as well as pain. Participant questionnaire scores are displayed in the Supplementary Table 25.
Before starting the task, participants rated the intensity of pain they were experiencing in their back and their current level of fatigue out of 10. Unsurprisingly, back pain participants gave significantly higher ratings than controls for their levels of pain (Bayesian Independent Samples T-Test, BF10 = 3.136 × 1014) and fatigue (BF10 = 3.608 × 108).
Before the computational analysis, we compared how well the back pain participants performed in the game relative to the controls. Participants’ predictions and confidence ratings are displayed in the Supplementary Fig. 10-13. As a model-naive measure of performance, we calculated the RMSE of participant predictions compared to the sequence outcomes. Using this measure, we found prediction performance was comparable across groups (Bayesian Independent Samples T-Test, BF10 = 0.250).
2.3.1 Computational modelling results
We fit the RL, KF and random models to the participant prediction and confidence rating data, fitting the back pain and control participant groups separately. We based the model comparison analysis on the difference in log point-wise predictive density (ELPD) between models, as in Experiment 1, to determine which model best fit the participant data. Models were ranked according to the ELPD, where greater ELPD indicates a better fit. For each model, the ELPD difference relative the best fitting model and its standard error are shown in Table 1. Additionally shown is the sigma effect, the ratio between the ELPD and the standard error, which is a proxy for significance.
For both the pain and control groups (Fig. 6), the model comparison shows the KF and RL models fit the data significantly better than the random model. The ELPD difference between the KF and RL was not significant, indicating they provided similar closeness of fit. Additionally we calculated the correlation between participant predictions and model predictions as a measure of model accuracy. For each group, we found that the RL and KF mean Pearson correlation coefficients (r) were identical to 2 significant figures. The correlation between model prediction and participant prediction was greater for the control group (KF: r = 0.64, RL: r = 0.64) than the back pain group (KF: r = 0.56, RL: r = 0.56), see Supplementary Fig. 14. This indicates that the KF and RL models provided a similar closeness of fit to the data and similar prediction accuracy.
2.3.2 Parameter Analysis
Next, we compared the RL and KF parameter estimates between the back pain and control groups. The models were hierarchically parameterised, such that the individual-level parameters were scaled by group-level parameters. The models were simultaneously fitted to the group and to the individuals to obtain the joint probability distribution for all parameters. Therefore, we obtained estimates for group-level means as well as individual estimates, regularised by group level statistics.
To determine whether there were any differences at the group parameter level, we compared the 95% highest density intervals (HDIs) of the posterior distributions across groups. If the HDI of the difference between the back pain and control group posteriors did not contain 0, we concluded the group-level posteriors were significantly different. In the RL model, the back pain group-level mean learning rate was significantly lower than the controls (95% HDI difference (Back pain – Controls) = [-0.281, -0.010]); Fig. 7a.
For the individual-level parameters, we performed Bayesian independent samples T-tests to compare back pain and control groups. For the RL model, the learning rate α parameters were again significantly lower for the back pain vs. control groups (BF10 = 10.487); Fig. 7b. This indicates back pain participants learned the statistics of the sequences more slowly than controls, confirming findings at the group parameter level. Additionally, the confidence scaling factor C was significantly lower for the back pain vs. control group (BF10 = 806, 177.442); Fig. 7d. The parameter C modulates how the confidence rating scales the magnitude of the prediction noise, such that smaller C values increase the magnitude of the confidence scaling term, leading to noisier responses. Therefore, the lower C estimates of the back pain group indicate they gave noisier responses in the game, more strongly affected by confidence, compared to controls.
For the KF model, stochasticity s estimates were significantly higher for the back pain vs. control group (BF10 = 27, 332.251; Fig. 8b). This indicates back pain participants estimated the stochasticity of the sequences to be greater than controls. The initial uncertainty w0 parameters were significantly higher for back pain vs. control group (BF10 = 2.438 × 1011; Fig. 8d), indicating their initial estimation of noise of the sequences was higher. The KF confidence scaling factor C was significantly lower for the back pain vs. control group (BF10 = 1.170 ×106; Fig. 8f). This reiterates the finding from the RL model parameter comparison - that the chronic pain vs. control groups provided noisier responses, more strongly influenced by confidence.
Additionally, the mean Kalman gain over the whole game was calculated for each participant. The values were highly correlated with the RL learning rate α (Bayesian Pearson Correlation, Pearson’s r = 0.975, BF10 = 1.182 × 1069). As with values were significantly lower in back pain participants than controls (BF10 = 10.977; Fig. 8g). Therefore, the KF results also indicate back pain participants displayed slower learning of the statistics of the game sequences.
In summary, for both the control and back pain groups, the RL and KF models fit the participant data significantly better than the random baseline, although there was no significant difference in the closeness of fit between the RL and KF models. Comparison of model parameter estimates between groups revealed significant group differences. RL learning rate α parameters and KF average Kalman gain values were significantly lower for the back pain participants vs. controls, indicating back pain participants learned the statistics of the sequence more slowly than controls.
At the level of individual parameters, confidence scaling factor C estimates for both the RL and KF models were lower for back pain participants than controls, suggesting that back pain participants provided more uncertain predictions which were more strongly affected by confidence. KF initial uncertainty w0 and stochasticity s estimates were higher for the back pain group, indicating back pain participants had greater initial uncertainty and perceived greater stochasticity in the sequence.
3 DISCUSSION
Statistical learning allows the brain to extract regularities from streams of sensory inputs and is central to perception and cognitive function. Despite its fundamental role, it has often been overlooked in the field of pain research. Yet, chronic pain appears to fluctuate over time, in ways that are non-random. For instance, [35–37] reported that chronic back pain ratings vary periodically, over several seconds-minutes and in absence of movements. This temporal aspect of pain is important because periodic temporal structures are easy to learn for the brain [1, 8]. If the temporal evolution of pain is learned, it can be used by the brain to regulate its responses to forthcoming pain, effectively shaping how much pain it experiences. Indeed, Experiment 1 shows that healthy participants extract temporal regularities from sequences of noxious stimuli and use this probabilistic knowledge to form confidence-weighted judgements and predictions about the level of pain intensity they experience in the sequence. We formalised our results within a Bayesian inference framework, where the belief about the level of pain intensity is updated on each trial according to the the amount of uncertainty participants ascribe to the stimuli and the environment. Importantly, their perception and prediction of pain were influenced by the expected level of intensity that participants held about the sequence before responding. This phenomenon remained unchanged when we varied different levels of inherent uncertainty in the sequences of stimuli (stochasticity and volatility).
Using a similar theoretical approach, Experiment 2 shows that statistical learning for noisy and volatile sequences of fluctuating values is slower in adults with chronic back pain than age-matched controls. This is important because sub-optimal learning to predict information about the value of events in the context of pain could affect the ability of the nervous system to endogenous regulate its responses to forthcoming negative states, such as pain. This finding opens a new path of research, to determine whether maladaptive statistical learning increase both the risk and severity of chronic pain conditions.
3.1 Statistical inference and learning in pain sequences
The first main contribution of our work is towards the understanding of the phenomenon of statistical learning in the context of pain. Statistical learning is an important function that the brain employs across the lifespan, with relevance to perception, cognition and learning [6]. The large majority of past research on statistical learning focused on visual and auditory perception [1, 3, 4, 38], with the nociceptive system receiving relatively little attention [39]. Recently, we showed that the human brain can learn to predict a sequence of two pain levels (low and high) in a manner consistent with optimal Bayesian inference, by engaging sensorimotor regions, parietal, premotor regions and dorsal striatum [8]. We also found that the confidence of these probabilistic inferences modulates the cortical response to pain, as expected by hierarchical Bayesian inference theory [9]. Here we tested sequences with a much larger range of stimulus intensities to elucidate the effect of statistical learning and expectations on pain perception. As predicted by hierarchical Bayesian inference theory, we find that the pain intensity judgements are scaled by both probabilistic expectations and confidence.
Hence, our work highlights the inferential nature of the nociceptive system [26, 39–42], where in addition to the sheer input received by the nociceptors, there is a wealth of a priori knowledge and beliefs the agent holds about themselves and the environment that need to be integrated to form a judgement about pain [43–46]. This has an immediate significance for the real world, where weights need to be assigned to prior beliefs and/or stimuli to successfully protect the organism from further damage, but only to an extent to which it is beneficial.
Secondly, our results regarding the effect of expectation on pain perception relate to a much larger literature on this topic. The prime example would be placebo analgesia (i.e. the expectation of pain relief decreasing pain perception) and nocebo hyperalgesia (i.e. the expectation of high level of pain increasing its perception; [10, 42, 47, 48]. Recent work attempted to capture such expectancy effects within the Bayesian inference framework. For example, [28] showed that in addition to expectation influencing perceived pain in general, higher level of uncertainty around that expectation attenuated its effect on perception. Similarly, [49] demonstrated that when the discrepancy between the expectation and outcome (prediction error) is unusually large, the role of expectation is significantly reduced and so the placebo and nocebo effects are not that strong. An unusually large prediction error could be thought of as contributing to increased uncertainty about the stimuli, which mirrors the results from [28] Bayesian framework. Nevertheless, the types of stimuli used in the above studies (i.e. noxious stimuli cued by non-noxious stimuli) differed from the more ecologically valid sequences of pain that are reported by chronic pain patients [35], as we indicated above. Furthermore, [26] used a conditioning paradigm and also found that expectations influence both perception and learning, in a self-reinforcing loop. Our work has followed a similar modelling strategy to [26], but it goes beyond simple conditioning schedules or sequences of two-level discrete painful stimuli, showing expectancy effects even when the intensities are allowed to vary across a wider range of values and according to more complex statistical temporal structures. Additionally, given the reported role of confidence in perception of pain [9, 50], we draw a more complete picture by including participants confidence ratings in our modelling analysis.
3.2 Statistical inference in the context of chronic back pain
Statistical learning is known to be mediated by both modality-specific and supramodal mechanisms [22]. Although the former can only be probed using paradigms that involve the presentation of noxious stimuli, the most fundamental, supramodal components of statistical learning can be investigated using more abstract sequence learning tasks, such as that used in Experiment 2. We designed volatile and noisy sequences of share values, which would yield gains and losses on each trial. This is an abstraction from the dynamic time-series of pain states that patients experience, which fluctuate between states of relief (i.e. gains) and flares (i.e. losses, from the point of view of their value).
The results from Experiment 2 indicate that statistical learning in people with chronic pain is significantly altered. By fitting computational models to the online prediction game data of participants and comparing RL and KF parameter estimates of the back pain participants vs controls, we found the following key differences. Firstly, back pain participants learned the statistics of the time-series more slowly than controls - the difference was significant at both individual and group-parameters levels in both the RL and KF models. Secondly, the estimation of the initial uncertainty and stochasticity of the sequence in the KF model appeared to be greater in chronic back pain participants than controls, although only at the level of individual parameters. It is likely that the greater noise in the estimates of the back pain group underlies their lower learning rates. The greater the random noise (stochasticity) of a sequence, the less informative each observation is about the true value relative to the long term mean. For optimal learning, as the stochasticity of a sequence increases, the slower expectations are updated, hence the lower the learning rate. Finally, the predictions of back pain participants were more strongly affected by confidence, resulting in noisier responses.
Future studies would need to determine whether these general features of statistical learning extend to the perception and anticipation of noxious time-series. Our findings predict slower learning of the statistics of fluctuating pain signals, greater perceived stochasticity, and greater uncertainty in the prediction of future pain. Experiment 1 shows that statistical learning is implicated in the endogenous regulation of pain. Therefore, it could, in principle, influence how a pain state evolves. Once a pain state is initiated, how an individual learns and anticipates the fluctuating pain signals may contribute to determine how well it can be regulated by the nervous system, thus affecting the severity and recurrence of pain flares. This, in turn, would affect whether aversive associations with the instigating stimulus are extinguished or reinforced [41]. In chronic pain, dysfunctional learning may promote the amplification and maintenance of pain signals, contributing to the reinforcement of aversive associations with incident stimuli, as well as the persistence of pain [51–53].
The present finding of slower statistical learning in chronic back pain is a first step in the direction of both quantifying and understanding learning in the context of chronic pain. The idea that maladaptive learning is causally implicated in chronic pain is not new, being rooted in cognitive accounts of pain [53–55] and neuroimaging evidence of alterations in brain networks involved in value-based learning [36, 56–61]. However, very few studies have actually quantified learning in the context of chronic pain and no study, to the best of our knowledge, has directly investigated whether learning is causally implicated in the development of chronic pain. Our paper comes with open tools, which can be adapted in future studies on statistical learning in chronic pain. Online tasks can be easily used at scale. The key advantage of taking an hypothesis-driven, computational-neuroscience approach to quantify learning is that it allows to go beyond symptoms-mapping, identifying the quantifiable computational principles that mediate the link between symptoms and neural function.
3.3 Conclusions
Statistical expectations and confidence scale the judgement of pain in sequences of noxious inputs, as predicted by hierarchical Bayesian inference theory. More generally, the statistical learning of noisy and volatile sequences of fluctuating values is slower in adults with chronic back pain, possibly because they perceive the environment as being more stochastic than what it truly is. This work makes clear predictions to test in future pre-clinical studies, namely that impaired statistical learning is associated with maladaptive endogenous pain regulation. Therefore, this study opens a new avenue of research on the role of learning in chronic pain.
4 METHODS
4.1 Participants
In Experiment 1, 33 (18 female) healthy adult participants were recruited for the experiment. The mean age of participants was 22.4 ± 2.7 years old (range: 18-35). Participants had no chronic condition and no infectious illnesses, as well as no skin conditions (e.g. eczema) at the site of stimulus delivery. Moreover, we only recruited participants that had not taken any anti-anxiety, anti-depressive medication, nor any illicit substances, alcohol and pain medication (including NSAIDs such as ibuprofen and paracetamol) in the 24 hours prior to the experiment.
In Experiment 2, 724 participants were recruited online using Prolific [62]. All participants gave written informed consent in accordance with procedures approved by the Department of Engineering, University of Cambridge ethics committee, before beginning the screening process. Participants completed a preliminary screening survey consisting of a general health questionnaire, the Keele STarT Back Screening Tool, and psychological questionnaires (the State-Trait Anxiety Inventory (STAI), the Pain Anxiety Symptoms Scale (PASS-20), the Depression Patient Health Questionnaire-9 (PHQ-9), and the Pain Catastrophizing Scale (PCS)). 63 chronic pain participants and 70 controls met the eligibility criteria for the respective participant groups and were invited to take part in the study. 55 healthy control participants (34 female; mean age 34.5 years old; age range 20-75 years) and 56 participants with chronic back pain (38 female; mean age 32.7 years old; age range 22-63 years) completed the estimation task and their data was used in the analysis. Exclusion criteria included neurological and psychiatric illness and failure to pass the attention check. Selected chronic back pain participants reported experiencing persistent pain in their back for a duration of over 6 months and were classified as high-risk for chronic back pain by the STarT tool (STarT score > 4) [63]. Selected healthy controls reported no persisting pain in the general health questionnaire and were classified at low risk for back pain by the STarT tool (scoring 3 or below [63]). Control participants were also selected to have no clinical symptoms of anxiety (STAI score < 41) [64] or depression (PHQ-9 score < 10) [65]), to simplify the comparison between the control and chronic pain groups and reduce confounding factors.
All participants gave informed written consent to take part in the study, which was approved by the local ethics committee.
4.2 Protocol of Experiment 1
The experimental room’s temperature was maintained between 20°C to 23°C. Upon entry, an infrared thermometer was used to ensure participants temperature was above 36°C at the forehead and forearm of the non-dominant hand, to account for the known effects of temperature on pain perception [66]. A series of slideshows were presented, which explained the premise of the experiment and demonstrated what the participant would be asked to carry out. Throughout this presentation, questions were asked to ensure participants understood the task. Participants were given multiple opportunities to ask questions throughout the presentation.
We used the Medoc Advanced Thermosensory Stimulator 2 (TSA2) [67] to deliver thermal stimuli using the CHEPS thermode. The CHEPS thermode allowed for rapid cooling (40°C / sec) and heating (70°C / sec) so transitions between the baseline and stimuli temperatures were minimal. The TSA2 was controlled externally, via Matlab (Mathworks).
We then established the pain threshold, using the method of limits [68], in order to centre the range of temperature intensities used in the experiment. Each participant was provided with stimuli of increasing temperature, starting from 40°C going up in 0.5°C increments, using an inter-stimulus interval (ISI) of 2.5 sec and and a 2 sec duration. The participant was asked to indicate when the stimuli went from warm to painful - this temperature was noted and the stimuli ended. The procedure was repeated three times, and the average was used as an estimate of the pain threshold.
During the experiment, four sequences of thermal stimuli were delivered. Due to the phenomenon of offset analgesia (OA), where decreases in tonic pain result in a proportionally larger decrease in perceived pain [69], we chose phasic stimuli, with a duration of 2 sec and an inter-stimulus-interval (ISI) 2.5 seconds. In order to account for individual differences, the temperatures which the levels refer to are based upon the participants threshold. The median intensity level was defined as threshold, giving a max temperature of 3°C above threshold, which was found to be acceptable by participants. Before the start of the experiment each participant was provided with the highest temperature stimuli that could be presented, given their measured threshold, to ensure they where comfortable with this. Two participants found the stimulus too painful - the temperature range was lowered by 1°C and this was found to be acceptable.
After every trial of each sequence, the participant was asked for either their perception of the previous stimulus, or their prediction for the next stimulus through a 2D VAS (Fig. 1b), presented using PsychToolBox-3 [70]. The y-axis encodes the intensity of the stimulus either perceived or predicted, ranging from 0 (no heat detected/predicted) to 100 (worst pain imaginable perceived/predicted); on this scale, 50 represents pain threshold. This was done as a given sequence was centred around the threshold. The x-axis encodes confidence in either perception or prediction, ranging from 0 - completely uncertain (‘unsure’) - to 1 - complete confidence in the rating (‘sure’). Differing background colours were chosen to ensure participants were aware of what was being asked, and throughout the experiment participants were reminded to take care in answering the right question. The mouse movement was limited to be inside of the coloured box, which defined the area of participants’ input. At the beginning of each input screen, the mouse location was uniformly randomised within the input box.
The sequence of response types was randomised so as to retain 40 prediction and 40 perception ratings for each of the four sequence conditions. For an 80-trial long sequence, this gave 80 participant responses. Each sequence condition was separated by a 5 minute break, during which the thermode’s probe was slightly moved around the area of skin on the forearm to reduce sensitisation (i.e. a gradual increase in perceived intensity with repetitive noxious stimuli) [71]. In the middle of each sequence, there was a 3 minute break. During the ISI, the temperature returned to a base-line of 38°C. One participant was unable to complete the sequence as their threshold was too low, and data from four participants was lost due to Medoc software issues (the remote control failed and the data of 2 out 4 sessions were not saved). We excluded one participant’s whose ratings/predictions were inversely proportional to the noxious input. Thus, we analysed data from 27 participants.
4.2.1 Generative process of the painful sequences
We manipulated two sources of uncertainty in the sequence: the stochasticity (s) of the observation and the volatility (v) of the underlying sequence. Sequences were defined by two levels (high or low) of stochasticity and volatility, resulting in four different sequences conditions - creating a 2×2 factorial design. Each sequence was defined as a series of chunks, where the intensity for trial t, it was sampled from 𝒩(I, σ2), where σ2 indicates the level of stochasticity (σ2 = 1.75 for high level of stochasticity, σ2 = 0.25 for low level of stochasticity). The mean of the chunk, I, was drawn from 𝒰(3.5, 10.5). To ensure a noticeable difference in chunk intensity to the participant, concurrent chunk means were constrained to be at least 2 intensity levels different. Volatility was implemented by defining the length, or number of trials, of a chunk (l) drawn from 𝒰(L − a, L + a), where L is the mean o the chunk length (L = 15 for high volatility level, L = 25 for low volatility level). A jitter, a, was added around the mean to ensure the transition from one chunk to the next was not consistent or predictable. For both high and low volatility conditions we set a = 3. Sampled values were then discretised, where any intensities outside the valid intensity range [1, 13] were discarded and re-sampled resulting in an 80-trial long sequence for each condition. The mean of each sequence was centred around intensity level 7, i.e. the participants threshold. So defined, six sets of four sequences were sampled. Each participant received one set, with a randomized sequence order. See an example sequence (after subject-specific linear transformation) and one participant’s responses (including confidence ratings) in Fig. 1c-d.
4.3 Protocol of Experiment 2
After the preliminary screening, selected participants were invited through the Prolific platform to play an online game. Before being directed to the task, participants were asked to rate the intensity of pain they were experiencing in their back and their current level of fatigue out of 10. The instructions preceding the game explained that participants were to play the role of a stockbroker predicting how a company’s share price fluctuate over time. Participants were informed the initial value of their shares was $100 and this value would change over a number of days – where each trial represents a day. At the start of each trial, the participant was asked to predict the value of their shares on that day and rate their confidence in their prediction. After submitting their response, the actual outcome was revealed, alongside an accuracy score. The participant was then asked to make a prediction for the next day. The entire task was 200 trials in length, and participants had fixed 10 sec breaks every 25 trials.
For each participant, the sequence of time-varying share prices used in the task was randomly drawn from a set of 20 sequences. The sequences were designed to display variable stochasticity and volatility within a single sequence to test how participants’ learning strategy adapted to changing conditions. The sequences were generated using the following generative model.
4.3.1 Generative Process of Online Task Sequences
For each trial t, the volatility and stochasticity evolve according to a random walk.
ℒ(υ, λ) represents a Laplace distribution with location, υ, and scale, λ The fixed parameters determine the corresponding variances of the step size. The terms with decay rates αv,αs and process means µv, µs add stationarity.
The underlying mean xt evolves similarly, with decay rate αx, process mean µx, and step size with variance equal to the constrained volatility . The sequence outcome (share price) was generated from the underlying mean with added Gaussian noise with variance equal to the constrained stochasticity .
The free parameters , λv, λs, αv, and αs were adjusted to produce sequences that demonstrated noticeable variation in levels of stochasticity and volatility (, and αv,αs = 0.075). Additionally, they were checked for significant variation in autocorrelation – reflecting sufficient change in volatility.
4.4 Data pre-processing
4.4.1 Experiment 1
Since the intensity values of the noxious input were discretised between 1 and 13, while the participant’s responses (perception and prediction) were given on a 0-100 scale, we applied a linear transformation of the input to map its values onto a common 0-100 range. For each participant, for a set of inputs at perception trials from the concatenated sequence (separate sequence conditions in the order as presented), we fit a linear least-squares regression using Python’s scipy.stats.linregress function. On the rare occasions, when the transformed input was negative, we refit the line using Python’s non-linear least squares function scipy.optimize.curve_fit, constraining the intercept above 0 [72]. See the transformations in Supplementary Fig. 1. We then extracted each participant’s optimised slope and intercept and applied the transformation both to the concatenated and condition-specific sequence of inputs. So transformed, the sequences where then used in all the analyses.
To capture participant’s model-naive performance in the task, both for the concatenated and condition-specific sequence, we calculated Root Mean Square Error (RMSE) of each participant’s perception (Eq. 1) and prediction (Eq. 2) responses as compared to the input. The lower the RMSE, the higher the response accuracy. where TP is the number of perception trials, is participant’s perception response to the stimulus yt at trial t, TE is the number of prediction trials and is participant’s prediction of the next stimulus intensity yt+1 at at trial t + 1.
4.4.2 Experiment 2
For the online task, participant predictions were visually inspected to ensure the game was played properly. Five participants were excluded at this stage. One participant was excluded because the predictions were negatively correlated with the sequence outcomes, and four participants were excluded because they repeatedly entered the same response for large chunks of the game (≥ 30 trials) which indicated they did not properly engage with the game. Participants’ predictions were processed to remove typing errors (0.009% of trials). Mistakes in typing that resulted in predictions far outside the range of the sequence outcomes (predictions > 250) were replaced with the average of the previous trial and following trial. Additionally, confidence ratings were re-scaled from a 0-5 scale to a 0-1 scale.
4.5 Models
4.5.1 Reinforcement Learning
RL
In reinforcement learning models, learning is driven by discrepancies between the estimate of the expected value and observed values. Before any learning begins, at trial t = 1, participants have an initial expectation, E1 = E0, which is a free parameter that we estimate.
In experiment 1, on each trial, participants receive a thermal input Nt. We then calculate the prediction error δt, defined as the difference between the expectation Et and the input Nt (Eq. 3).
Participant is then assumed to update their expectation of the stimulus on the next trial as in Eq. 4 where α is the learning rate (free parameter), which governs how fast participants assimilate new information to update their belief.
On trials when participants rate their perceived intensity, we assume no effects on their perception other than confidence rating ct and response noise, so participants perception response is drawn from a Gaussian distribution, with the mean Pt = Nt and a confidence-scaled response noise ξ (free parameter), as in Eq. 5 where C is the confidence scaling factor (free parameter), which defines the extent to which confidence affects response uncertainty. Please, see Fig. 2 for an intuition behind confidence scaling.
On trials when participants are asked to predict the intensity of the next thermal stimulus, we use the updated expectation Et+1 to model participants prediction response Êt+1. This is similarly affected by confidence rating and response noise and is defined as in Eq. 6.
Analogously, in Experiment 2, whereby participants guess the value of the stock market, their response Êt is given based on their expectation Et and confidence rating, and is drawn from a Gaussian distribution with the mean Et and a confidence-scaled response noise, as in Eq. 7.
After the true value of the stock market Ot is revealed on each trial, we calculate the prediction error δt, defined as the difference between the expectation Et and the subsequently observed outcome Ot (Eq. 8).
The prediction error is then used to update the expectation of the stock market value for the next trial, as in Eq. 4.
To recap, the RL model has 4 free parameters: the learning rate α, response noise ξ, the initial expectation E0, and the confidence scaling factor C.
eRL
Additionally, in Experiment 1, where we investigate the effects of expectation on the perception of pain [26], we included an element that allows us express the perception as a weighted sum of the input and expectation (Eq. 9) where γ ∈ [0, 1] (free parameter) captures how much participants rely on the normative thermal input vs. their expectation. When γ = 0, the expectation plays no role and the model simplifies to that of the standard RL above. All the other equations are the same, and in total eRL has 5 free parameters.
4.5.2 Kalman Filter
KF
To capture sequential learning in a Bayesian manner, we used the Kalman filter model [26, 27, 29]. KF assumes a generative model of the environment where the latent state on trial t, xt (the mean of the sequences in our experiments), evolves according to a Gaussian random walk with a fixed drift rate, v (volatility), as in Eq. 10.
The observation on trial t, Nt (for experiment 1) or Ot (for experiment 2), is then drawn from a Gaussian (Eq. 11) with a fixed variance, which represents the observation uncertainty s (stochasticity).
As such the KF assumes stable dynamics since the generative process has fixed volatility and stochasticity.
For ease of explanation, we refer to the thermal input and observed stock market value at each trial as Ot, we also use the O1:t notation, which refers to a sequence of observations up to and including trial t. The model allows to obtain posterior beliefs about the latent state xt given the observations. This is done by tracking an internal estimate of the mean mt and the uncertainty, wt, of the latent state xt.
First, following standard KF results, on each trial, the participant is assumed to hold a prior belief (indicated with (−) superscript) about the latent state, xt (Eq. 12).
On the first trial, before any observations, we set (free parameters). In light of the new observation, Ot on trial t, the tracked mean and uncertainty of the latent state are reweighed based on the new evidence Ot and its associated observation uncertainty s as in Eq. 13.
We can then define the learning rate αt (Eq. 14), to get the update rule for the new posterior beliefs (indicated with (+) superscript) about the mean (Eq. 15) and uncertainty (Eq. 16) of xt.
Following this new belief, and the assumption about the environmental dynamics (volatility), the participant forms a new prior belief about the latent state xt+1 for the next trial t + 1 as in Eq. 17. where
We can simplify the notation to make it comparable to the RL models. We let, , and Following a new observation at trial t, we calculate the prediction error (Eq. 20) and learning rate (Eq. 21). we then update the belief about the mean (Eq. 22) and uncertainty (Eq. 23) of the latent state for the next trial.
Now, mapping this onto the experiments, the mean of the latent state is participants expectation Et = mt, and so for Experiment 1 we have participant perception rating modelled as in Eq. 24. and the prediction rating for the next trial as in Eq. 25.
Analogously, in Experiment 2, participants’ guess is defined as in Eq. 26.
In total the model has 6 free parameters: s (environmental stochasticity), v (environmental volatility), ξ (response noise), E0 (initial belief about the mean), w0 (initial belief about the uncertainty) and C (confidence scaling factor).
eKF - expectation weighted Kalman Filter
Lastly, for Experiment 1, we can introduce the effect of expectation on the pain perception, by assuming that participants treat the thermal input as an imperfect indicator of the true level of pain [26]. In this case, the input, Nt, is modelled as in Eq. 27 which forms an expression for the likelihood of the observation and adds an additional level to the inference, slightly modifying the Kalman filter assumptions such that:
However, we can apply the standard KF results and Bayes’ rule to arrive at simple update rules for the participants’ belief about the mean and uncertainty of the latent state xt. From this, we get a prior on the πt defined in Eq. 29 which, following a new input Nt, gives us the posterior belief about πt as in Eq. 30.
Now, if we define γt as in Eq. 31 we have that the posterior belief about the mean level of pain πt is calculated as: which is a weighted sum of the input Nt and participant expectation about the latent state xt, governed by the perceptual weight γt, analogously to the eRL model. Finally, the posterior belief about xt is obtained in Eq.
Now, setting the learning rate as in Eq. 34 we get:
Next, following the same notation simplification as before, we get the update rules for the prior belief about the mean (Eq. 37) and uncertainty (Eq. 38) of the latent state xt+1 for the next trial. as well as the expression for subjective perception, Pt, at trial t (Eq. 39).
The perception and prediction responses are modelled analogously as the KF model. In total, the model has 7 free parameters: ε (subjective noise), s (environmental stochasticity), v (environmental volatility), ξ (response noise), E0 (initial belief about the mean), w0 (initial belief about the uncertainty) and C (confidence scaling factor).
4.5.3 Random model
As a baseline, we also included a model that performs a random guess. For both experiment the perceptual/prediction rating as well as the guess was modelled as in Eq. 40.
The model has 3 free parameters: R, ξ, and C, where R is a constant value that participants respond with.
4.6 Model fitting
Model parameters were estimated using hierarchical Bayesian methods, performed with RStan package (v. 2.21.0) [73] in R (v. 4.0.2) based on Markov Chain Monte Carlo techniques (No-U-Turn Hamiltonian Monte Carlo). For the individual level-parameters we used non-centred parametrisation [74]. For the group-level parameters we used 𝒩(0, 1) priors for the mean, and the gamma-mixture representation of the Student-t(3, 0, 1) for the scale [75]. Parameters in the (0, 1) range were constrained using Phi_approx - a logistic approximation to the cumulative Normal distribution [76].
In Experiment 1, for each condition and each of the four chains, we ran 6000 samples (after discarding 6000 warm-up ones). For each condition, we examined R-hat values for each individual- (including the 𝒩(0, 1) error term from the non-centred parametrisation) and group-level parameters from each model to verify whether the Markov chains have converged. At the group-level and individual-level, all R-hat values had a value < 1.1, indicating convergence. In the random response model, 0.01% −0.16% iterations saturated the maximum tree depth of 11.
In Experiment 2, we fit the pain and control group data sets separately. For each of the four chains, we ran 3000 samples (after discarding 3000 warm-up ones). For each group, we examined R-hat values for each individual- and group-level parameters from each model to verify whether the Markov chains have converged. At the group-level and individual-level, all R-hat values had a value < 1.1, indicating convergence. In the random response model, 3.38% − 7.33% iterations saturated the maximum tree depth of 11.
4.6.1 Model comparison
For model comparison, we used R package loo, which provides efficient approximate leave-one-out cross-validation. The package allows to estimate the difference in models’ expected predictive accuracy through the difference in expected log point-wise predictive density (ELPD) [77]. By looking at the ratio between the ELPD difference and the standard error (SE) of the difference, we get the sigma effect - a heuristic for significance of such model differences. The closeness of fit can be also captured with LOO information criterion (LOOIC), where the lower LOOIC values indicate better fit.
4.6.2 Parameter comparison
For the comparison of group-level parameters between conditions (Experiment 1) or groups (Experiment 2), we extracted 95% High Density Intervals (HDI) of the permuted and merged (across chains) posterior samples of each group-level parameter [78]. To assess significant differences between groups/conditions, we calculated a difference between such defined intervals. In the Bayesian scenario, a significant difference is indicated by the interval not containing the value 0 [79, 80].
For experiment 2, Bayesian independent samples T-tests were performed, using JASP [81], on the two sets of individual-level parameters to determine whether there were significant differences between back pain and control groups. The Bayes Factor BF10 is a measure of the evidence for the alternative hypothesis relative to the null hypothesis, such that a greater BF10 indicates stronger support for the alternative hypothesis – that a significant difference between groups does exist. A Bayes factor BF10 ≥10 was interpreted as strong evidence for the alternative hypothesis [82].
4.6.3 Parameter and model recovery
To asses the reliability of our modelling analysis [83], for each model we performed parameter recovery analysis, where we simulated participants’ responses using newly drawn individual-level parameters from the group-level distributions.
For Experiment 1, we repurposed existing sequences of noxious inputs in the [1, 13] range (pre-transformation). When then applied a linear transformation to the input sequences using sampled slope and intercept coefficients from a Gaussian distribution of these coefficients that we estimated based on our dataset using R’s fitdistrplus package. Furthermore, we simulated the confidence ratings based on lag-1 auto-correlation across a moving window of the transformed input sequence.
For Experiment 2, we used the same sequences of share values that were used in the task. We simulated confidence ratings based on the lag-1 auto-correlation across a moving window of the share value sequence.
We then fit the same model to the simulated data and calculated Pearson correlation coefficients r between the generated and estimated individual-level parameters. The higher the coefficient r, the more reliable the estimates are, which can be categorised as: poor (if r <0.5); fair (if 0.5< r <0.75); good (0.75< r <0.9); excellent (if r >0.9) [84].
We also performed model recovery analysis [83], where we first simulated responses using each model and then fit each model-specific dataset with each model. We then counted the number of times a model fit the simulated data best (according to the LOOIC rule), effectively creating an M × M confusion matrix, where M is the number of models. In the case where we have a diagonal matrix of ones, the models are perfectly recoverable and hence as reliable as possible.
Data Availability
All code and data will be available open source, released upon acceptance of the paper in a peer-reviewed journal.
5 DATA AND CODE AVAILABILITY
All code and data will be available open source, released upon acceptance of the paper.
7 AUTHOR CONTRIBUTIONS
JO, MW, NG, GT, BS and FM designed the study. MW and NG collected the data. JO, MW, MJ and FM analysed the data. JO, MW, BS and FM wrote the article.
6 ACKOWLEDGEMENTS
The study was funded by a MRC Career Development Award to FM (MR/T010614/1) and a UKRI Advanced Pain Discovery Platform grant to both F.M. and B.S. (MR/W027593/1). B.S. was also funded by Wellcome (214251/Z/18/Z), Versus Arthritis (21537), and IITP (MSIT 2019-0-01371). This work has been performed using resources provided by the Cambridge Tier-2 system operated by the University of Cambridge Research Computing Service (www.hpc.cam.ac.uk) funded by EPSRC Tier-2 capital grant (EP/T022159/1). HPC access was additionally funded by an EPSRC research infrastructure grant to F.M.. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.
REFERENCES
- 1.↵
- 2.
- 3.↵
- 4.↵
- 5.
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.
- 12.
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.
- 18.
- 19.
- 20.
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.
- 41.↵
- 42.↵
- 43.↵
- 44.
- 45.
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.
- 53.↵
- 54.
- 55.↵
- 56.↵
- 57.
- 58.
- 59.
- 60.
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.↵
- 80.↵
- 81.↵
- 82.↵